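A MongoDB replica set provides high availability through automatic failover. The typical replica set is a 1-primary, 2-secondary architecture: when the primary fails, the remaining 2 nodes automatically elect a new primary and keep serving clients. Roles can also be changed deliberately for maintenance, by stepping the primary down to secondary and letting a secondary take over as primary. This article covers the case where only 1 node of the replica set is still alive and all the others have failed. The surviving node automatically becomes a secondary, so it can serve reads but not writes. Writes are only possible once a secondary becomes primary, so what should you do?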
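【How traditional relational databases handle this】
1. Oracle Data Guard: if the primary database fails and cannot start, the standby must be promoted to primary with a failover (a switchover needs a reachable primary, so it does not apply here). Oracle GoldenGate: the source and target databases have no enforced primary/standby roles and both are open read-write, so it is enough to point the application at the new data source.
2. MySQL master/slave: if the master fails, turn off read_only on the slave and ordinary users can write to it. (MGR behaves like MongoDB in this respect, being a majority-based distributed system.)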
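【How MongoDB turns the secondary into a primary in this situation】
1. Start the surviving node as a standalone instance instead of as a replica set member, then rebuild the other 2 nodes
2. Recover at least 2 of the 3 members so that more than half of the votes are available to elect a new primary; this only requires that the instances can start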
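The majority rule behind option 2 can be sketched as follows. This is a minimal illustration, assuming every member carries one vote as in the configuration used in this article; the function names are hypothetical and not part of MongoDB.

```javascript
// Minimal sketch: a primary can only be elected when a strict majority
// of the voting members is reachable.
function majorityNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canElectPrimary(votingMembers, reachableMembers) {
  return reachableMembers >= majorityNeeded(votingMembers);
}

console.log(majorityNeeded(3));      // 2
console.log(canElectPrimary(3, 1));  // false: the lone survivor stays SECONDARY
console.log(canElectPrimary(3, 2));  // true: two of three can elect a primary
```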
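【Rebuilding the replica set by starting in standalone mode】
1. Check the replica set configuration
Note: a 1-primary, 2-secondary replica set with no delayed members and no arbiter.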
xiaoxu:PRIMARY> rs.config();
{
"_id" : "xiaoxu",
"version" : 3,
"protocolVersion" : NumberLong(1),
"members" : [
{
"_id" : 0,
"host" : "10.130.10.149:37017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 10,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 1,
"host" : "10.130.10.150:37017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 9,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 2,
"host" : "10.130.9.149:37017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 8,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
]
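2. Simulate the failure of 2 nodes (kill -9)
Note: in production, shut instances down cleanly. With only one node left, the replica set cannot accept writes; it can only serve reads.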
xiaoxu:SECONDARY> rs.status();
{
"set" : "xiaoxu",
"members" : [
{
"_id" : 0,
"name" : "10.130.10.149:37017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
{
"_id" : 1,
"name" : "10.130.10.150:37017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
},
{
"_id" : 2,
"name" : "10.130.9.149:37017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 617240,
"optime" : {
"ts" : Timestamp(1597062517, 1),
"t" : NumberLong(7)
}
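To make the state shown above easier to read at a glance, the status document can be reduced to a few counters. The sketch below is an assumption-laden illustration: summarizeStatus is a hypothetical helper, the sample object is a trimmed copy of the output above, and it assumes every member has one vote. It runs under Node.js; in the mongo shell the result of rs.status() could be passed in instead.

```javascript
// Hypothetical helper: summarize an rs.status()-shaped document.
function summarizeStatus(status) {
  const members = status.members || [];
  const healthy = members.filter((m) => m.health === 1);
  const primary = members.find((m) => m.stateStr === "PRIMARY");
  return {
    total: members.length,
    healthy: healthy.length,
    primary: primary ? primary.name : null,
    // Assumes one vote per member, as in the configuration shown earlier.
    hasMajority: healthy.length > members.length / 2,
  };
}

// Trimmed copy of the rs.status() output above.
const sampleStatus = {
  set: "xiaoxu",
  members: [
    { _id: 0, name: "10.130.10.149:37017", health: 0, state: 8, stateStr: "(not reachable/healthy)" },
    { _id: 1, name: "10.130.10.150:37017", health: 0, state: 8, stateStr: "(not reachable/healthy)" },
    { _id: 2, name: "10.130.9.149:37017", health: 1, state: 2, stateStr: "SECONDARY" },
  ],
};

// One healthy member out of three, no primary, no majority.
console.log(summarizeStatus(sampleStatus));
```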
[Reads are allowed; any modification fails with "not master"]
xiaoxu:SECONDARY> rs.slaveOk();
xiaoxu:SECONDARY> show databases;
POCDB 0.845GB
admin 0.000GB
local 4.847GB
mongodb 0.406GB
mongodb1 0.386GB
mongodb2 0.387GB
mongodb3 0.386GB
mongodb4 0.387GB
mongodb5 0.407GB
survey 0.117GB
test 0.387GB
xiaoxu 0.386GB
xiaoxu:SECONDARY> use survey;
switched to db survey
xiaoxu:SECONDARY> db.survey.find().count();
1000000
xiaoxu:SECONDARY>
xiaoxu:SECONDARY> db.survey.drop();
2020-08-10T20:34:14.354+0800 E QUERY [thread1] Error: drop failed: {
"ok" : 0,
"errmsg" : "not master",
"code" : 10107,
"codeName" : "NotMaster"
} :
_getErrorWithCode@src/mongo/shell/utils.js:25:13
DBCollection.prototype.drop@src/mongo/shell/collection.js:752:1
@(shell):1:1
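A client or script that wants to react to this situation can recognise the reply by its code. The following is a minimal sketch, assuming the reply has the shape of the error document above; isNotMasterError is a hypothetical name, and the code 10107 and codeName "NotMaster" are taken directly from that output (other server versions may report different names).

```javascript
// Hypothetical helper: recognise the "not master" reply shown above.
// 10107 / "NotMaster" come from that error document.
function isNotMasterError(reply) {
  return Boolean(reply) && reply.ok === 0 &&
    (reply.code === 10107 || reply.codeName === "NotMaster");
}

const dropReply = { ok: 0, errmsg: "not master", code: 10107, codeName: "NotMaster" };
console.log(isNotMasterError(dropReply));  // true
console.log(isNotMasterError({ ok: 1 }));  // false
```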
3. Start in standalone mode
xiaoxu:SECONDARY> use admin;
switched to db admin
xiaoxu:SECONDARY> db.shutdownServer();
server should be down...
vi /opt/mongo37017/conf/mongodb37017.conf
dbpath=/opt/mongo37017/data
logpath=/opt/mongo37017/log/mongodb.log
pidfilepath=/opt/mongo37017/log/mongodb.pid
directoryperdb=true
logappend=true
port=37017
fork=true
storageEngine=wiredTiger
wiredTigerCacheSizeGB=1
#wiredTigerStatisticsLogDelaySecs=0
wiredTigerJournalCompressor=snappy
wiredTigerDirectoryForIndexes=true
wiredTigerCollectionBlockCompressor=snappy
wiredTigerIndexPrefixCompression=1
#replSet=xiaoxu --replSet commented out
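Since this procedure toggles the replSet line twice (off for standalone mode, back on later), the edit can be expressed as a small text transformation. This is only a sketch: setReplSetEnabled is a hypothetical helper that works on the config as a string, it assumes the INI-style key=value format shown above, and it does not handle a trailing annotation on the replSet line.

```javascript
// Hypothetical helper: comment out or restore the replSet line in an
// INI-style mongod config held in a string.
function setReplSetEnabled(configText, enabled) {
  return configText
    .split("\n")
    .map((line) => {
      const match = line.match(/^\s*#?\s*(replSet\s*=.*)$/);
      if (!match) return line; // unrelated option: keep as-is
      return enabled ? match[1] : "#" + match[1];
    })
    .join("\n");
}

const conf = "port=37017\nfork=true\nreplSet=xiaoxu";
const standaloneConf = setReplSetEnabled(conf, false);
console.log(standaloneConf); // last line is now "#replSet=xiaoxu"
console.log(setReplSetEnabled(standaloneConf, true) === conf); // true
```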
【Start from the conf file】
mongod -f /opt/mongo37017/conf/mongodb37017.conf
child process started successfully, parent exiting
mongo 127.0.0.1:37017
--A warning about starting without replSet appears, because local.system.replset still holds the replica set configuration.
2020-08-10T20:38:19.442+0800 I STORAGE [initandlisten] ** WARNING: mongod started without --replSet yet 1 documents are present in local.system.replset
2020-08-10T20:38:19.442+0800 I STORAGE [initandlisten] ** Restart with --replSet unless you are doing maintenance and no other clients are connected.
2020-08-10T20:38:19.442+0800 I STORAGE [initandlisten] ** The TTL collection monitor will not start because of this.
4. Verify reads and writes
> use survey;
switched to db survey
> show tables;
survey
test
> db.test.find()
{ "_id" : 7, "type" : "food", "item" : "ccc", "ratings" : [ 9, 5, 8 ] }
{ "_id" : 5, "type" : "food", "item" : "aaa", "ratings" : [ 5, 8, 9 ] }
{ "_id" : 8, "type" : "food", "item" : "ddd", "ratings" : [ 9, 5 ] }
{ "_id" : 9, "type" : "food", "item" : "eee", "ratings" : [ 5, 9, 5 ] }
{ "_id" : 6, "type" : "food", "item" : "bbb", "ratings" : [ 5, 9 ] }
> db.test.find().count()
5
> db.test.drop();
true
> show tables;
survey
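Note: the shell prompt now shows no role (PRIMARY, SECONDARY, OTHER and so on), only a bare >. To turn this node back into a replica set, initialize a single-member replica set on it; the remaining nodes can be added afterwards.
5. Re-initialize a new replica set
Note: drop the local database first, because it holds the replica set configuration (and the oplog). Then start the instance with replSet enabled, which effectively creates a brand-new replica set.
5.1 Drop the local db and shut down the instance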
> use local;
switched to db local
> db.dropDatabase();
{ "dropped" : "local", "ok" : 1 }
> use admin;
switched to db admin
> db.shutdownServer();
server should be down...
5.2 Restart in replica set mode by uncommenting the replSet parameter
【Edit the parameter】
vi /opt/mongo37017/conf/mongodb37017.conf
dbpath=/opt/mongo37017/data
logpath=/opt/mongo37017/log/mongodb.log
pidfilepath=/opt/mongo37017/log/mongodb.pid
directoryperdb=true
logappend=true
port=37017
fork=true
replSet=xiaoxu
【Start the 37017 instance】
mongod -f /opt/mongo37017/conf/mongodb37017.conf
5.3 Re-initialize the replica set
mongo 127.0.0.1:37017
> use admin;
switched to db admin
> rs.initiate({_id:"xiaoxu",members:[{_id:0,host:"10.130.9.149:37017"}]});
{ "ok" : 1 }
xiaoxu:SECONDARY>
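The document passed to rs.initiate above has a simple shape, and building it from a set name and a host makes that explicit. The sketch below is illustrative only: singleMemberConfig is a hypothetical helper, and the host:port check is a simplification that assumes a hostname or IPv4 address.

```javascript
// Hypothetical helper: build the single-member config passed to rs.initiate().
function singleMemberConfig(setName, host) {
  // Simplified check: expects "host:port" (hostname or IPv4 address).
  if (!/^[^:\s]+:\d+$/.test(host)) {
    throw new Error("expected host:port, got " + host);
  }
  return { _id: setName, members: [{ _id: 0, host: host }] };
}

const initConfig = singleMemberConfig("xiaoxu", "10.130.9.149:37017");
console.log(JSON.stringify(initConfig));
// {"_id":"xiaoxu","members":[{"_id":0,"host":"10.130.9.149:37017"}]}
```

In the mongo shell the resulting object would be handed to rs.initiate(), exactly as the literal document is in the transcript above.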
【The node is now primary. If the original primary recovers, can it still rejoin?】
xiaoxu:PRIMARY> show databases;
POCDB 0.845GB
admin 0.000GB
local 0.000GB
mongodb 0.406GB
mongodb1 0.386GB
mongodb2 0.387GB
mongodb3 0.386GB
mongodb4 0.387GB
mongodb5 0.407GB
survey 0.118GB
test 0.387GB
xiaoxu 0.386GB
xiaoxu:PRIMARY>
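5.4 Handle the nodes of the original replica set
Note: at this point the two are independent replica sets, so the heartbeat reports that the replica set IDs do not match.
The output below is from the 10.130.10.149 node; the new replica set's node is 10.130.9.149.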
{
"_id" : 2,
"name" : "10.130.9.149:37017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"lastHeartbeatMessage":"replica set IDs do not match, ours: 5f27d31892135040300560cf; remote node's: 5f31442979bb6521dca27356"
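【How to handle it】
Can the old instance join directly once its replica set information is cleared? Yes. This amounts to re-initializing it: start it in standalone mode, drop the local db, restart it with replSet enabled, then add it from the new primary. Handle the remaining node the same way.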
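A node that needs this treatment can be recognised from the lastHeartbeatMessage shown earlier. The following is a minimal sketch, assuming the message keeps the exact wording from that output; parseReplicaSetIdMismatch is a hypothetical helper name.

```javascript
// Hypothetical helper: extract both replica set IDs from the heartbeat
// message quoted in the output above. Returns null for other messages.
function parseReplicaSetIdMismatch(message) {
  const pattern =
    /replica set IDs do not match, ours: ([0-9a-f]{24}); remote node's: ([0-9a-f]{24})/;
  const match = pattern.exec(message || "");
  return match ? { ours: match[1], remote: match[2] } : null;
}

const heartbeatMessage =
  "replica set IDs do not match, ours: 5f27d31892135040300560cf; " +
  "remote node's: 5f31442979bb6521dca27356";
const mismatch = parseReplicaSetIdMismatch(heartbeatMessage);
console.log(mismatch.ours);   // 5f27d31892135040300560cf
console.log(mismatch.remote); // 5f31442979bb6521dca27356
```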
xiaoxu:PRIMARY> rs.add("10.130.10.149:37017");
{ "ok" : 1 }
xiaoxu:PRIMARY> rs.status();
{
"set" : "xiaoxu",
"members" : [
{
"_id" : 0,
"name" : "10.130.9.149:37017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1776,
"optime" : {
"ts" : Timestamp(1597065756, 1),
"t" : NumberLong(1)
},
{
"_id" : 1,
"name" : "10.130.10.149:37017",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",--re-initializing (initial sync).
"uptime" : 61,
"optime" : {
"ts" : Timestamp(0, 0),
"t" : NumberLong(-1)
}
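Before treating the rebuilt replica set as highly available again, it helps to confirm that no member is still syncing. The sketch below lists members that are neither PRIMARY nor SECONDARY. membersNotReady is a hypothetical helper, the sample object is a trimmed copy of the output above, and the check assumes a replica set without arbiters, as in this article.

```javascript
// Hypothetical helper: list members that are not yet PRIMARY or SECONDARY.
// Assumes no arbiters, which would otherwise be reported as not ready.
function membersNotReady(status) {
  return (status.members || [])
    .filter((m) => m.stateStr !== "PRIMARY" && m.stateStr !== "SECONDARY")
    .map((m) => m.name + " (" + m.stateStr + ")");
}

// Trimmed copy of the rs.status() output above.
const statusAfterAdd = {
  set: "xiaoxu",
  members: [
    { _id: 0, name: "10.130.9.149:37017", health: 1, state: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "10.130.10.149:37017", health: 1, state: 5, stateStr: "STARTUP2" },
  ],
};

// Lists the re-added node, which is still in STARTUP2.
console.log(membersNotReady(statusAfterAdd));
```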
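【Summary】
This article showed how to rebuild a replica set from its single surviving, read-only member so that it can serve writes again, and then re-add the remaining nodes to restore automatic failover and high availability.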