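Using the DEBUG SEGFAULT command to manually trigger a failover event.
Fault symptoms:
This article covers the following topics:
The business side reported widespread failures when querying Redis.
1. Adding a slave node.
1.1.1. Manual failover with DEBUG SEGFAULT
There are two ways to trigger a manual failover in Redis Cluster:
Method 1: run the DEBUG SEGFAULT command against a master.
Method 2: run the CLUSTER FAILOVER command against a slave.
This section covers running DEBUG SEGFAULT against a master node. The command can also be run against a slave node, but that has little to do with manual failover, so it is not covered here.
Running DEBUG SEGFAULT manually triggers a failover event, which in turn causes a slave to be promoted automatically. The slots that the original master served are taken over by the promoted slave, which becomes a master and replaces the original master.
Node state before running DEBUG SEGFAULT: master node 7009 has two slave nodes, 7006 and 7007.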
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
Connect to node 7009 and run the DEBUG SEGFAULT command.
./redis-cli -c -h 192.168.197.101 -p 7009
192.168.197.101:7009> debug segfault
Could not connect to Redis at 192.168.197.101:7009: Connection refused
(1.37s)
not connected> exit
After the command runs, node 7009 is in the FAIL state, and one of its slaves, 7006, has been promoted to become the new master.
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500109072577 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500109073080 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500109073584 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500109072074 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
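The before and after `cluster nodes` listings can also be compared mechanically. The following is a minimal Python sketch, assuming the field layout shown in the listings above (node id, address, comma-separated flags, and so on). The names `parse_nodes`, `promoted_nodes`, and `failed_nodes` are illustrative and are not part of any Redis tool.

```python
def parse_nodes(text):
    """Map node address -> set of flags from `cluster nodes` output."""
    nodes = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or truncated lines
        address, flags = fields[1], fields[2]
        nodes[address] = set(flags.split(","))
    return nodes


def promoted_nodes(before, after):
    """Addresses that were slaves in `before` and are masters in `after`."""
    old, new = parse_nodes(before), parse_nodes(after)
    return sorted(
        addr for addr, flags in new.items()
        if "master" in flags and "slave" in old.get(addr, set())
    )


def failed_nodes(after):
    """Addresses whose flags include `fail`."""
    return sorted(a for a, f in parse_nodes(after).items() if "fail" in f)


# Three lines taken from each listing above (nodes 7006, 7009, 7007).
BEFORE = """
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
"""
AFTER = """
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
"""

print(promoted_nodes(BEFORE, AFTER))
print(failed_nodes(AFTER))
```

For this sample the sketch reports 7006 as promoted and 7009 as failed, which matches the description above. Newer Redis versions may print the address field in a different form, so the parsing would need adjusting there.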
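Cluster composition:
2. Adding a master node.
1.1.2. Manual failover with CLUSTER FAILOVER
In Redis Cluster, besides running DEBUG SEGFAULT against a master node, there is another way to perform a manual failover: running the CLUSTER FAILOVER command against a slave.
The current node state is as follows: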
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500112868234 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500112868738 12 connected 0-5460
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500112867230 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500112869243 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500112868738 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500112867732 12 connected
Node 7007 is a slave node, and node 7006 is its master.
Run the CLUSTER FAILOVER command on node 7007:
./redis-cli -c -h 192.168.197.101 -p 7007
192.168.197.101:7007> cluster failover
OK
After the command succeeds, check the node state again:
192.168.197.101:7007> cluster nodes
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 master - 0 1500113387728 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 myself,master - 0 0 15 connected 0-5460
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 0 1500113387728 15 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500113389747 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500113388737 3 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048489 1500109045968 11 disconnected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500113388234 2 connected
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500113389242 2 connected 5461-10922
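In short, CLUSTER FAILOVER promotes slave node 7007 to master without putting master node 7006 into the FAIL state, and turns the original master 7006 into a slave.
After the operation completes, both 7006 and 7007 are in a normal state.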
192.168.197.101:7007> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
cluster_stats_messages_sent:131678
cluster_stats_messages_received:85262
As shown, the state of the whole cluster is also OK.
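The same check can be scripted. This is a minimal sketch, assuming the `key:value` layout of the `cluster info` output shown above. The helper names `parse_cluster_info` and `is_healthy` are hypothetical, and the health criteria are just one reasonable reading of the fields.

```python
def parse_cluster_info(text):
    """Turn `cluster info` output into a dict; numeric values become ints."""
    info = {}
    for line in text.strip().splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # not a key:value line
        info[key] = int(value) if value.isdigit() else value
    return info


def is_healthy(info, total_slots=16384):
    """True when the state is ok and every slot is served without failures."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_ok") == total_slots
        and info.get("cluster_slots_pfail") == 0
        and info.get("cluster_slots_fail") == 0
    )


# Output copied from the `cluster info` call above.
SAMPLE = """
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
"""

print(is_healthy(parse_cluster_info(SAMPLE)))
```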
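Summary:
The DEBUG SEGFAULT and CLUSTER FAILOVER commands have some similarities and some differences.
Similarities:
(a) Both are commands issued while the node is working normally, in order to forcibly simulate a failure.
(b) Both cause a slave to be promoted to master (for DEBUG SEGFAULT, only when it is run against a master node).
Differences:
(a) DEBUG SEGFAULT can be run against a master node or a slave node, whereas CLUSTER FAILOVER can only be run against a slave node; otherwise it returns an error.
(b) After DEBUG SEGFAULT completes, the original master enters the FAIL state; with CLUSTER FAILOVER it does not.
(c) After DEBUG SEGFAULT completes, the original master node is still a master node, whereas after CLUSTER FAILOVER completes, the original master node becomes a slave node.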
3 masters and 3 slaves, with 8 GB per node
3. Deleting a slave node.
Machine distribution:
4. Deleting a master node.
In the same rack,
5. Resharding (slot reassignment).
xx.x.xxx.199, xx.x.xxx.200, xx.x.xxx.201
redis-server process state:
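1.1.1. Adding a slave node
How do you add a new node to a Redis Cluster as a slave of an existing node? There are at least the following two ways:
(1) Use the redis-trib.rb tool and let it pick the master node at random.
This again uses the redis-trib.rb tool. The following command adds node 7006 to the cluster as a slave node, issuing the command through node 7001. As for which master it becomes a slave of: the tool picks one at random from the masters that have the fewest slaves.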
./redis-trib.rb add-node --slave 192.168.197.101:7006 192.168.197.101:7001
>>> Adding node 192.168.197.101:7006 to cluster 192.168.197.101:7001
>>> Performing Cluster Check (using node 192.168.197.101:7001)
M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000
slots: (0 slots) slave
replicates dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005
slots: (0 slots) slave
replicates b8be626d33d07cb10094ab9f1345d6436d18d27f
M: dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003
slots:0-5460 (5461 slots) master
1 additional replica(s)
S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004
slots: (0 slots) slave
replicates 37ccec5145b4e071687e671bda36789e124fc9ed
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Automatically selected master 192.168.197.101:7001
>>> Send CLUSTER MEET to node 192.168.197.101:7006 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 192.168.197.101:7001.
[OK] New node added correctly.
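The selection rule described above (a random choice among the masters with the fewest slaves) can be sketched as follows. This is only an illustration of the rule as stated in this article; the function `pick_master` is hypothetical, and the actual logic inside redis-trib.rb may differ in detail.

```python
import random


def pick_master(replica_counts, rng=random):
    """Pick a master at random among those with the fewest replicas.

    replica_counts maps a master's address to its current replica count.
    """
    if not replica_counts:
        raise ValueError("no masters to choose from")
    fewest = min(replica_counts.values())
    candidates = sorted(m for m, n in replica_counts.items() if n == fewest)
    return rng.choice(candidates)


# Replica counts from the cluster check above: each master has 1 replica.
counts = {
    "192.168.197.101:7001": 1,
    "192.168.197.101:7002": 1,
    "192.168.197.101:7003": 1,
}
print(pick_master(counts))
```

With all three masters tied at one replica each, any of them is a valid pick; in the run shown above the tool happened to select 7001.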
(2) Use the redis-trib.rb tool and specify the master node manually.
Use the --master-id option to specify the NODEID of the master node.
./redis-trib.rb add-node --slave --master-id 'dbcdc9682acbd8c52dd6184fe01bf5f9500b2180' 192.168.197.101:7007 192.168.197.101:7001
>>> Adding node 192.168.197.101:7007 to cluster 192.168.197.101:7001
>>> Performing Cluster Check (using node 192.168.197.101:7001)
// Several lines omitted here for brevity.
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.197.101:7007 to make it join the cluster.
Waiting for the cluster to join.
>>> Configure node as replica of 192.168.197.101:7003.
[OK] New node added correctly.
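From the earlier verification we know that the slot for the key host is served by master 7003, and 7007 has now joined the cluster as a slave of 7003. So 7007 should hold the key host; however, querying host through 7007 is redirected to its master, 7003.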
./redis-cli -c -h 192.168.197.101 -p 7007
192.168.197.101:7007> keys *
1) "host"
192.168.197.101:7007> get host
-> Redirected to slot [2130] located at 192.168.197.101:7003
"redis.coe2coe.me"
Using the command ps -eo pid,lstart | grep $pid,
1.1.2. Adding a master node
The redis-trib.rb tool makes adding a master node easy.
./redis-trib.rb add-node 192.168.197.101:7008 192.168.197.101:7001
>>> Adding node 192.168.197.101:7008 to cluster 192.168.197.101:7001
>>> Performing Cluster Check (using node 192.168.197.101:7001)
M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001
slots:5461-10922 (5462 slots) master
2 additional replica(s)
// Several lines omitted here for brevity.
S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004
slots: (0 slots) slave
replicates 37ccec5145b4e071687e671bda36789e124fc9ed
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 192.168.197.101:7008 to make it join the cluster.
[OK] New node added correctly.
Next, check the state of node 7008; it has joined as a master.
./redis-trib.rb check 192.168.197.101:7008
>>> Performing Cluster Check (using node 192.168.197.101:7008)
M: 5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008
slots: (0 slots) master
0 additional replica(s)
// Several lines omitted here for brevity.
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
The master node 7008 added by this command does not serve any slots for now, but it is indeed a node in the cluster.
./redis-cli -c -h 192.168.197.101 -p 7008
192.168.197.101:7008> keys *
(empty list or set)
192.168.197.101:7008> get host
-> Redirected to slot [2130] located at 192.168.197.101:7003
"redis.coe2coe.me"
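it was found that the process had already been running continuously for five months
1.1.3. Changing a node's master-slave relationship
At this point 7008 is a newly added master node that serves no slots.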
./redis-cli -c -h 192.168.197.101 -p 7008
192.168.197.101:7008> cluster nodes
5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,master - 0 0 0 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500101360347 2 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500101359843 3 connected 10923-16383
dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500101360851 7 connected 0-5460
// Many lines omitted here for brevity.
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500101360851 7 connected
Now use Redis Cluster's cluster replicate command to change master node 7008 into a slave of node 7003.
192.168.197.101:7008> cluster replicate dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
OK
At this point the change has succeeded. Use the cluster nodes command to check the result:
192.168.197.101:7008> cluster nodes
5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 0 0 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500101430401 2 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500101430905 3 connected 10923-16383
dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500101429897 7 connected 0-5460
// Many lines omitted here for brevity.
Verify further that the replication relationship has been established:
192.168.197.101:7008> keys *
1) "host"
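This shows that the key host has been successfully replicated from its new master.
Cluster node state before the fault occurred:
1.1.4. Deleting a slave node
First use redis-cli to look up the NODEID of the node to delete, then use the redis-trib.rb tool to delete that node.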
./redis-cli -c -h 192.168.197.101 -p 7008 cluster nodes |grep myself
5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 0 0 connected
[d@192.168.197.101:/opt/redis_cluster/7008]$./redis-trib.rb del-node 192.168.197.101:7008 5377470350bb3fec9165a24589d115ca4fc1a644
>>> Removing node 5377470350bb3fec9165a24589d115ca4fc1a644 from cluster 192.168.197.101:7008
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
At this point, node 7008 has not only been removed from the cluster; its service port has been closed as well.
xx.x.xxx.200:8371 (bedab2c537fe94f8c0363ac4ae97d56832316e65) master
xx.x.xxx.199:8373 (792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave
xx.x.xxx.201:8375 (5ab4f85306da6d633e4834b4d3327f45af02171b) master
xx.x.xxx.201:8372 (826607654f5ec81c3756a4a21f357e644efe605a) slave
xx.x.xxx.199:8370 (462cadcb41e635d460425430d318f2fe464665c5) master
xx.x.xxx.200:8374 (1238085b578390f3c8efa30824fd9a4baba10ddf) slave
1.1.5. Deleting a master node
The current node state of the cluster is shown below. We are going to delete a master node: 7003. This master currently has two slave nodes, 7000 and 7007, serves slots 0 through 5460, and holds one key: host.
./redis-cli -c -h 192.168.197.101 -p 7001 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 myself,master - 0 0 2 connected 5461-10922
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500102709303 7 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500102708296 3 connected 10923-16383
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500102707288 5 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500102708296 7 connected
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500102708296 2 connected
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500102708799 6 connected
dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500102707792 7 connected 0-5460
./redis-cli -c -h 192.168.197.101 -p 7003
192.168.197.101:7003> keys *
1) "host"
Deleting the node directly in this state does not succeed and produces the error below, because only an empty master node, one that serves no slots, can be deleted.
[d@192.168.197.101:/opt/redis_cluster/7008]$./redis-trib.rb del-node 192.168.197.101:7003 dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7003
[ERR] Node 192.168.197.101:7003 is not empty! Reshard data away and try again.
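Whether a master is "empty" can be read off its `cluster nodes` line, where the slot ranges follow the link-state field. The sketch below counts the slots a node serves, assuming the line layout shown in this article; `slot_count` and `served_slots` are illustrative names, not part of redis-trib.rb.

```python
def slot_count(ranges):
    """Count slots in entries such as "0-5460" or a single slot "7"."""
    total = 0
    for item in ranges:
        if item.startswith("["):
            continue  # bracketed entries mark slots that are being migrated
        low, _, high = item.partition("-")
        total += int(high or low) - int(low) + 1
    return total


def served_slots(node_line):
    """Slots served by the node described by one `cluster nodes` line."""
    fields = node_line.split()
    # Fields 0-7: id, address, flags, master id, ping, pong, epoch, link state.
    return slot_count(fields[8:])


# Lines copied from the listing above.
MASTER_7003 = (
    "dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 "
    "master - 0 1500102707792 7 connected 0-5460"
)
SLAVE_7000 = (
    "4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 "
    "slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500102709303 7 connected"
)

print(served_slots(MASTER_7003), served_slots(SLAVE_7000))
```

A node whose count is zero serves no slots, which is the condition the error message above is asking for.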
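There are three ways to delete a master like this:
(1) Method 1: stop the service on master 7003 so that a slave is promoted to master automatically. Start 7003 again, and it automatically becomes a slave. It can then be deleted easily, without any data loss and without any resharding of slots.
Run the following commands one by one to carry this out:
(a) Stop the 7003 service.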
./redis-cli -c -h 192.168.197.101 -p 7003 shutdown
While the service is stopped, the node cannot be deleted directly; attempting it produces the error below:
./redis-trib.rb del-node 192.168.197.101:7000 dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7000
[ERR] No such node ID dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
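(b) Restart the 7003 service.
Once you have confirmed that the promotion of 7003's slave has completed successfully, restart the 7003 service. At that point 7003 becomes a slave of 7000.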
[d@192.168.197.101:/opt/redis_cluster/7003]$./redis-server ./redis.conf
(c) Run the delete-node operation to delete node 7003.
Node 7003 can now be deleted from the cluster successfully.
./redis-trib.rb del-node 192.168.197.101:7000 dbcdc9682acbd8c52dd6184fe01bf5f9500b2180
>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7000
>>> Sending CLUSTER FORGET messages to the cluster...
>>> SHUTDOWN the node.
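At this point, the node deletion is complete.
(2) Method 2: use the CLUSTER FAILOVER command to manually trigger a failover event, which in turn promotes a slave. The basic principle is much the same as Method 1, so it is not covered here.
(3) Method 3: use Redis Cluster resharding to migrate the slots served by master 7003 to other masters, so that 7003 no longer serves any slots. 7003 then becomes an empty master and can be deleted.
This involves the resharding operation, which is not covered here.
---------------------------------Log analysis follows--------------------------------------
1.1.6. Resharding (slot reassignment)
Resharding is the process of changing some of a Redis Cluster's slots from being served by one master to being served by another master; in other words, it is the reassignment of slots.
For ease of description, first create an empty master node 7009, then move all 5461 slots on 7000 to node 7009.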
./redis-trib.rb add-node 192.168.197.101:7009 192.168.197.101:7000
眼下的节点景况如下:
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500106989599 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 4314bb678cda2ba1550e3ec1081db5d5fae74c87 0 1500106990102 10 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500106991610 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500106991914 9 connected
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500106990908 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500106992014 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 4314bb678cda2ba1550e3ec1081db5d5fae74c87 0 1500106990605 10 connected
[d@192.168.197.101:/opt/redis_cluster/7009]$./redis-cli -c -h 192.168.197.101 -p 7000
192.168.197.101:7000> keys *
1) "host"
192.168.197.101:7000> get host
"redis.coe2coe.me"
Now the actual resharding operation begins.
The following command migrates the 5461 slots served by node 7000 (NODEID: 4314bb678cda2ba1550e3ec1081db5d5fae74c87) to node 7009 (NODEID: 5d0632d76008ea3010878317d804b3c0ae50a13f).
./redis-trib.rb reshard --from 4314bb678cda2ba1550e3ec1081db5d5fae74c87 --to 5d0632d76008ea3010878317d804b3c0ae50a13f --slots 5461 --yes 192.168.197.101:7000
The output is as follows:
>>> Performing Cluster Check (using node 192.168.197.101:7000)
M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000
slots:0-5460 (5461 slots) master
2 additional replica(s)
M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006
slots: (0 slots) slave
replicates 4314bb678cda2ba1550e3ec1081db5d5fae74c87
S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004
slots: (0 slots) slave
replicates 37ccec5145b4e071687e671bda36789e124fc9ed
M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009
slots: (0 slots) master
0 additional replica(s)
M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005
slots: (0 slots) slave
replicates b8be626d33d07cb10094ab9f1345d6436d18d27f
S: f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007
slots: (0 slots) slave
replicates 4314bb678cda2ba1550e3ec1081db5d5fae74c87
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
Ready to move 5461 slots.
Source nodes:
M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000
slots:0-5460 (5461 slots) master
2 additional replica(s)
Destination node:
M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009
slots: (0 slots) master
0 additional replica(s)
Resharding plan:
Moving slot 0 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
Moving slot 1 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
Moving slot 2 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
Moving slot 3 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
Moving slot 4 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
Moving slot 5 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87
// Many lines omitted here for brevity.
Moving slot 5457 from 192.168.197.101:7000 to 192.168.197.101:7009:
Moving slot 5458 from 192.168.197.101:7000 to 192.168.197.101:7009:
Moving slot 5459 from 192.168.197.101:7000 to 192.168.197.101:7009:
Moving slot 5460 from 192.168.197.101:7000 to 192.168.197.101:7009:
At this point, all 5461 slots of 7000 are served by the new master 7009. Use the following command to verify the resharding result:
./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes
37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383
38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected
4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
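The output above shows that slots 0 through 5460, 5461 slots in total, have been successfully migrated from node 7000 to node 7009.
Query the relevant key to further verify that the key data was migrated: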
./redis-cli -c -h 192.168.197.101 -p 7000
192.168.197.101:7000> keys *
(empty list or set)
192.168.197.101:7000> get host
-> Redirected to slot [2130] located at 192.168.197.101:7009
"redis.coe2coe.me"
192.168.197.101:7009> keys *
1) "host"
The key host, which lives in slot 2130, is found on node 7009, which shows that the key data was migrated successfully.
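Slot 2130 is not arbitrary: Redis Cluster maps a key to a slot by taking the CRC16 of the key modulo 16384, hashing only the part inside the first `{...}` if the key contains a non-empty hash tag. The sketch below reimplements that mapping in Python for illustration; the function names are my own, and the CRC16 variant is the XMODEM one (polynomial 0x1021, initial value 0).

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16 with polynomial 0x1021 and initial value 0."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Return the cluster slot (0..16383) for a key."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        # Only a non-empty tag between the braces replaces the full key.
        if end != -1 and end > start + 1:
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("host"))
```

For the key host this yields 2130, the slot reported in the redirect messages above. Since slot 2130 falls in the range 0-5460, whichever master serves that range answers queries for host.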
Now check the state of the cluster with the redis-trib.rb tool:
./redis-trib.rb check 192.168.197.101:7009
>>> Performing Cluster Check (using node 192.168.197.101:7009)
M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009
slots:0-5460 (5461 slots) master
2 additional replica(s)
M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001
slots:5461-10922 (5462 slots) master
1 additional replica(s)
S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004
slots: (0 slots) slave
replicates 37ccec5145b4e071687e671bda36789e124fc9ed
S: 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006
slots: (0 slots) slave
replicates 5d0632d76008ea3010878317d804b3c0ae50a13f
M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000
slots: (0 slots) master
0 additional replica(s)
S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005
slots: (0 slots) slave
replicates b8be626d33d07cb10094ab9f1345d6436d18d27f
S: f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007
slots: (0 slots) slave
replicates 5d0632d76008ea3010878317d804b3c0ae50a13f
M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002
slots:10923-16383 (5461 slots) master
1 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
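As shown, the two slaves of 7000 have become slaves of 7009.
Summary:
When used with the reshard argument, the redis-trib.rb tool performed the following three actions:
(1) It changed the slots served by the source master so that they are served by the target master.
(2) It moved the key data stored on the source master to the target master.
(3) It changed the source master's slaves into slaves of the target master.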
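Action (1) can be modelled in a few lines. This is a toy in-memory sketch of slot reassignment only, assuming the lowest-numbered slots are moved first (which is consistent with the plan printed above); it does not talk to Redis. The real tool additionally moves the keys and updates the cluster's slot table on the nodes. `build_owner_map` and `reshard` are hypothetical names.

```python
def build_owner_map(assignments):
    """Expand {node: (low, high)} into a slot -> node dictionary."""
    owner = {}
    for node, (low, high) in assignments.items():
        for slot in range(low, high + 1):
            owner[slot] = node
    return owner


def reshard(slot_owner, source, target, count):
    """Reassign `count` lowest-numbered slots from source to target.

    Mutates slot_owner and returns the list of moved slot numbers.
    """
    movable = sorted(s for s, node in slot_owner.items() if node == source)
    if count > len(movable):
        raise ValueError("source does not serve that many slots")
    moved = movable[:count]
    for slot in moved:
        slot_owner[slot] = target
    return moved


# Slot layout before the reshard, taken from the cluster check above.
layout = build_owner_map({
    "7000": (0, 5460),
    "7001": (5461, 10922),
    "7002": (10923, 16383),
})
moved = reshard(layout, "7000", "7009", 5461)
print(len(moved), layout[2130])
```

After the call, every slot that 7000 served belongs to 7009, including slot 2130 where the key host lives, and 7000 is left as an empty master.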
Step 1: master node 8371 loses its connection to slave node 8373: 46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.
Step 2: master nodes 8370/8375 determine that 8371 is unreachable: 42645:M 09 Sep 18:57:50.117 * Marking node bedab2c537fe94f8c0363ac4ae97d56832316e65 as failing (quorum reached).
Step 3: slave nodes 8372/8373/8374 receive the message from master node 8375 saying that 8371 is unreachable: 46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306da6d633e4834b4d3327f45af02171b about bedab2c537fe94f8c0363ac4ae97d56832316e65
Step 4: master nodes 8370/8375 authorize 8373 to be promoted to master for the failover: 42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 for epoch 16
Step 5: the original master node 8371 changes its own configuration and becomes a slave of 8373: 46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6
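Because these log lines come from different nodes, it can help to merge them into one timeline. The sketch below parses the log prefix (pid, role letter, date, time, level mark) with a regular expression and sorts by time of day. It assumes all lines are from the same day and that the clocks of the machines agree; `parse_log_line` and `timeline` are illustrative names.

```python
import re

LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[A-Z]) "
    r"(?P<day>\d{2}) (?P<month>[A-Za-z]{3}) "
    r"(?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2})\.(?P<ms>\d{3}) "
    r"(?P<level>[.\-*#]) (?P<message>.*)$"
)


def parse_log_line(line):
    """Parse one Redis log line; return a dict or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    g = match.groupdict()
    seconds = (int(g["h"]) * 60 + int(g["m"])) * 60 + int(g["s"])
    return {
        "pid": int(g["pid"]),
        "role": g["role"],  # M = master, S = slave
        "ms_of_day": seconds * 1000 + int(g["ms"]),
        "level": g["level"],
        "message": g["message"],
    }


def timeline(lines):
    """Parsed entries sorted by time of day (same-day logs assumed)."""
    entries = [e for e in map(parse_log_line, lines) if e is not None]
    return sorted(entries, key=lambda e: e["ms_of_day"])


# The five log lines quoted in the steps above.
LOG = [
    "46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.",
    "42645:M 09 Sep 18:57:50.117 * Marking node bedab2c537fe94f8c0363ac4ae97d56832316e65 as failing (quorum reached).",
    "46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306da6d633e4834b4d3327f45af02171b about bedab2c537fe94f8c0363ac4ae97d56832316e65",
    "42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 for epoch 16",
    "46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6",
]

for entry in timeline(LOG):
    print(entry["ms_of_day"], entry["pid"], entry["message"][:40])
```

Sorted this way, the line quoted under Step 1 (18:57:51.379) actually falls after the lines quoted under Steps 2 to 4, so by the timestamps the other masters marked 8371 as failing before 8371 itself logged the lost connection.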