Using the DEBUG SEGFAULT command to manually trigger a failover event

2020-01-28 10:26 Source: unknown

Fault symptoms:

This article covers the following:

 

The application side suddenly reported that Redis queries were failing.

1. Adding a slave node.

1.1.1. Manual failover with DEBUG SEGFAULT

In Redis Cluster, a failover can be triggered manually in two ways:

Method 1: run the DEBUG SEGFAULT command on a master.

Method 2: run the CLUSTER FAILOVER command on a slave.

 

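This section covers running DEBUG SEGFAULT on a master node. The command can also be run on a slave node, but that has little to do with manual failover, so it is not covered here.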

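Running DEBUG SEGFAULT crashes the master process and thereby manually produces a failover event. This triggers the automatic promotion of one of its slaves: the slots the old master served are taken over by the promoted slave, which becomes a master and replaces the old master's service.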

 

Node state before running DEBUG SEGFAULT: master node 7009 has two slaves, 7006 and 7007.

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
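The cluster nodes output is easier to reason about once it is parsed. The following Python sketch assumes the Redis 3.x line layout shown here (node ID, address without a @cport suffix, flags, master ID, ping-sent, pong-recv, config epoch, link state, then slot ranges). ClusterNode, parse_cluster_nodes and replicas_by_master are hypothetical helpers, not part of any Redis client; they recover the "7009 has slaves 7006 and 7007" relationship described above.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterNode:
    """One entry of CLUSTER NODES output (Redis 3.x layout, no @cport)."""
    node_id: str
    addr: str
    flags: set
    master_id: str  # "-" for masters
    config_epoch: int
    link_state: str
    slots: list = field(default_factory=list)


def parse_cluster_nodes(text):
    """Parse CLUSTER NODES output, skipping blank or malformed lines."""
    nodes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 8:
            continue
        nodes.append(ClusterNode(
            node_id=parts[0],
            addr=parts[1],
            flags=set(parts[2].split(",")),
            master_id=parts[3],
            config_epoch=int(parts[6]),
            link_state=parts[7],
            slots=parts[8:],
        ))
    return nodes


def replicas_by_master(nodes):
    """Map each master's address to the sorted addresses of its slaves."""
    by_id = {n.node_id: n for n in nodes}
    result = {n.addr: [] for n in nodes if "master" in n.flags}
    for n in nodes:
        master = by_id.get(n.master_id)
        if "slave" in n.flags and master is not None and master.addr in result:
            result[master.addr].append(n.addr)
    return {addr: sorted(slaves) for addr, slaves in result.items()}


SAMPLE = """\
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected
"""

print(replicas_by_master(parse_cluster_nodes(SAMPLE)))
```

Only three of the eight lines are used in the sample to keep it short; the full output parses the same way.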

 

 

Connect to node 7009 and run DEBUG SEGFAULT:

./redis-cli -c -h 192.168.197.101 -p 7009

192.168.197.101:7009> debug segfault

Could not connect to Redis at 192.168.197.101:7009: Connection refused

(1.37s)

not connected> exit

Afterwards, node 7009 is in the FAIL state, and one of its slaves, 7006, has been promoted to be the new master (7007 now replicates 7006).

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500109072577 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500109073080 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500109073584 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500109072074 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500109071570 12 connected
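To see the effect of the failover programmatically, one can ask which live master serves a given slot before and after the crash. A minimal sketch, assuming the same Redis 3.x layout; slot_owner and failed_nodes are hypothetical names, and the two samples are trimmed copies of the outputs above. Slot 2130 (the slot of the key host used later in this article) moves from 7009 to 7006.

```python
def slot_owner(cluster_nodes_text, slot):
    """Return the address of the non-failed master whose ranges contain `slot`."""
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) < 9:
            continue  # no slot fields: slaves and empty masters
        flags = parts[2].split(",")
        if "master" not in flags or "fail" in flags:
            continue
        for rng in parts[8:]:
            if rng.startswith("["):
                continue  # skip [slot->-id] / [slot-<-id] migration markers
            first, _, last = rng.partition("-")
            if int(first) <= slot <= int(last or first):
                return parts[1]
    return None


def failed_nodes(cluster_nodes_text):
    """Addresses of nodes carrying the 'fail' flag ('fail?' means only PFAIL)."""
    failed = []
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) >= 8 and "fail" in parts[2].split(","):
            failed.append(parts[1])
    return failed


BEFORE = """\
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected
"""
AFTER = """\
78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500109072074 12 connected 0-5460
5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected
"""

print(slot_owner(BEFORE, 2130), "->", slot_owner(AFTER, 2130), failed_nodes(AFTER))
```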

 

 

Cluster composition:

2. Adding a master node.

1.1.2. Manual failover with CLUSTER FAILOVER

 

Besides running DEBUG SEGFAULT on a master node, Redis Cluster offers another way to perform a manual failover: running CLUSTER FAILOVER on a slave.

 

Current node state:

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500112868234 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 master - 0 1500112868738 12 connected 0-5460

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500112867230 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048874 1500109046355 11 disconnected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500112869243 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500112868738 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 0 1500112867732 12 connected

 

Node 7007 is a slave, and node 7006 is its master.

Run CLUSTER FAILOVER on node 7007:

./redis-cli -c -h 192.168.197.101 -p 7007

192.168.197.101:7007> cluster failover

OK

After it succeeds, check the node state again:

192.168.197.101:7007> cluster nodes

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 master - 0 1500113387728 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 myself,master - 0 0 15 connected 0-5460

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 0 1500113387728 15 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500113389747 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500113388737 3 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master,fail - 1500109048489 1500109045968 11 disconnected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500113388234 2 connected

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500113389242 2 connected 5461-10922

 

In short, CLUSTER FAILOVER promoted slave 7007 to master without putting master 7006 into the FAIL state, and turned the former master 7006 into a slave of 7007.

After the operation, both 7006 and 7007 are in a normal state.
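The role change produced by CLUSTER FAILOVER can be summarised as a toy model over a node-to-master map. This only illustrates the before/after states in the outputs above (the real command also makes the slave catch up with the master's replication offset before taking over); manual_failover is a hypothetical helper, and ports stand in for full addresses.

```python
def manual_failover(replica_of, slave):
    """Toy model of the roles after CLUSTER FAILOVER is sent to `slave`.

    `replica_of` maps each node to its master, or None for a master.
    Returns a new map; the input is left unchanged.
    """
    old_master = replica_of[slave]
    if old_master is None:
        raise ValueError(f"{slave} is a master; CLUSTER FAILOVER must be sent to a slave")
    after = {}
    for node, master in replica_of.items():
        if node == slave:
            after[node] = None  # promoted to master
        elif node == old_master or master == old_master:
            after[node] = slave  # old master and its other slaves follow the new master
        else:
            after[node] = master  # unrelated shards are untouched
    return after


before = {"7006": None, "7007": "7006", "7001": None, "7004": "7001"}
print(manual_failover(before, "7007"))
```

Sending the command to a master raises an error in this model, mirroring the restriction noted in the summary of this section.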

 

192.168.197.101:7007> cluster info

cluster_state:ok

cluster_slots_assigned:16384

cluster_slots_ok:16384

cluster_slots_pfail:0

cluster_slots_fail:0

cluster_known_nodes:8

cluster_size:3

cluster_current_epoch:15

cluster_my_epoch:15

cluster_stats_messages_sent:131678

cluster_stats_messages_received:85262

As shown, the cluster as a whole is also in the OK state.
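The cluster info output is a list of field:value lines. A small sketch (parse_cluster_info and cluster_healthy are hypothetical names) that parses it and applies the same reading as above: state ok, all 16384 slots assigned and served, none in PFAIL or FAIL.

```python
def parse_cluster_info(text):
    """Turn CLUSTER INFO output (one field:value per line) into a dict."""
    info = {}
    for line in text.splitlines():  # also copes with \r\n line endings
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # blank line
        info[key] = int(value) if value.isdigit() else value
    return info


def cluster_healthy(info, total_slots=16384):
    """True if the cluster reports ok and every slot is assigned and served."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_assigned") == total_slots
        and info.get("cluster_slots_ok") == total_slots
        and info.get("cluster_slots_pfail", 0) == 0
        and info.get("cluster_slots_fail", 0) == 0
    )


SAMPLE = """\
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:8
cluster_size:3
cluster_current_epoch:15
cluster_my_epoch:15
"""

info = parse_cluster_info(SAMPLE)
print(cluster_healthy(info), info["cluster_current_epoch"])
```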

 

Summary:

The DEBUG SEGFAULT and CLUSTER FAILOVER commands have some things in common and some differences.

Similarities:

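(a) Both are issued while the nodes are working normally and use a command to force a failover (DEBUG SEGFAULT does so by actually crashing the node).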

(b) Both result in a slave being promoted to master (for DEBUG SEGFAULT, only when it is run on a master node).

Differences:

(a) DEBUG SEGFAULT can be run on a master or a slave node, whereas CLUSTER FAILOVER can only be run on a slave node; otherwise it returns an error.

(b) After DEBUG SEGFAULT, the former master enters the FAIL state; after CLUSTER FAILOVER, it does not.

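(c) After DEBUG SEGFAULT, the former master is still listed as a master (flagged fail and disconnected) until it is restarted, at which point it rejoins as a slave of the new master; after CLUSTER FAILOVER, the former master becomes a slave immediately.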

 

 

3 masters and 3 slaves; each node holds roughly 8 GB.

3. Deleting a slave node.

Machine layout:

4. Deleting a master node.

All in the same rack:

5. Resharding (slot reassignment).

xx.x.xxx.199, xx.x.xxx.200, xx.x.xxx.201

 

redis-server process status:

1.1.1. Adding a slave node

How can a new node be added to a Redis Cluster as the slave of an existing node? There are at least the following two ways:

 

(1) Use the redis-trib.rb tool and let it choose the master.

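Again, use the redis-trib.rb tool. The following command adds node 7006 to the cluster as a slave, using node 7001 as the entry point into the cluster. As for which master it will replicate: the tool picks one of the masters that currently have the fewest slaves.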

./redis-trib.rb add-node --slave  192.168.197.101:7006 192.168.197.101:7001

>>> Adding node 192.168.197.101:7006 to cluster 192.168.197.101:7001

>>> Performing Cluster Check (using node 192.168.197.101:7001)

M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001

   slots:5461-10922 (5462 slots) master

   1 additional replica(s)

S: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000

   slots: (0 slots) slave

   replicates dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002

   slots:10923-16383 (5461 slots) master

   1 additional replica(s)

S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005

   slots: (0 slots) slave

   replicates b8be626d33d07cb10094ab9f1345d6436d18d27f

M: dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003

   slots:0-5460 (5461 slots) master

   1 additional replica(s)

S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004

   slots: (0 slots) slave

   replicates 37ccec5145b4e071687e671bda36789e124fc9ed

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

Automatically selected master 192.168.197.101:7001

>>> Send CLUSTER MEET to node 192.168.197.101:7006 to make it join the cluster.

Waiting for the cluster to join.

>>> Configure node as replica of 192.168.197.101:7001.

[OK] New node added correctly.
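The selection rule described above can be written down as follows. This is a sketch of the rule as this article describes it, not redis-trib.rb's actual code; which master wins when several have the same number of slaves is an implementation detail, and in this hypothetical helper ties simply go to the lowest address.

```python
def pick_master_for_new_slave(slaves_of):
    """Pick a master with the fewest slaves.

    `slaves_of` maps a master address to the list of its slave addresses.
    Ties go to the lowest address here; redis-trib.rb's own tie-break may differ.
    """
    if not slaves_of:
        raise ValueError("the cluster has no masters")
    return min(sorted(slaves_of), key=lambda master: len(slaves_of[master]))


topology = {
    "192.168.197.101:7001": ["192.168.197.101:7004"],
    "192.168.197.101:7002": ["192.168.197.101:7005"],
    "192.168.197.101:7003": ["192.168.197.101:7000"],
}
print(pick_master_for_new_slave(topology))  # 192.168.197.101:7001
```

With one slave per master, every master qualifies; the run above happened to select 7001 as well.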

 

(2) Use the redis-trib.rb tool and specify the master manually.

Use the --master-id option to specify the master's NODEID.

./redis-trib.rb add-node --slave  --master-id 'dbcdc9682acbd8c52dd6184fe01bf5f9500b2180' 192.168.197.101:7007 192.168.197.101:7001

>>> Adding node 192.168.197.101:7007 to cluster 192.168.197.101:7001

>>> Performing Cluster Check (using node 192.168.197.101:7001)

// Several lines omitted here for brevity.

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

>>> Send CLUSTER MEET to node 192.168.197.101:7007 to make it join the cluster.

Waiting for the cluster to join.

>>> Configure node as replica of 192.168.197.101:7003.

[OK] New node added correctly.

 

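From the earlier verification, we know that the slot of the key host is served by master 7003, and 7007 has now joined the cluster as a slave of 7003. So 7007 should hold a copy of host, but querying host through 7007 is redirected to its master, 7003.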
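Which node a key is redirected to depends only on its slot. In Redis Cluster a key's slot is CRC16(key) mod 16384, using the CCITT/XMODEM variant of CRC16, and if the key contains a non-empty {...} hash tag only the text inside the first one is hashed. The sketch below follows that rule from the Redis Cluster specification; the function names are our own. For the key host it yields 2130, the slot number reported in the redirect. Because 7007 is a slave, the request is redirected to the master serving slot 2130 even though 7007 holds a copy of the key.

```python
def crc16_xmodem(data):
    """CRC16-CCITT (XMODEM): polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key):
    """Slot of `key`: CRC16 of the key, or of its first non-empty {hash tag}, mod 16384."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # "{}" (empty tag) or no closing brace: hash the whole key
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_hash_slot("host"))  # 2130
```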

./redis-cli  -c -h 192.168.197.101 -p 7007

192.168.197.101:7007> keys *

1) "host"

192.168.197.101:7007> get host

-> Redirected to slot [2130] located at 192.168.197.101:7003

"redis.coe2coe.me"

 

Checking with the command ps -eo pid,lstart | grep $pid,

1.1.2. Adding a master node

The redis-trib.rb tool makes adding a master node easy.

./redis-trib.rb add-node 192.168.197.101:7008 192.168.197.101:7001

>>> Adding node 192.168.197.101:7008 to cluster 192.168.197.101:7001

>>> Performing Cluster Check (using node 192.168.197.101:7001)

M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001

   slots:5461-10922 (5462 slots) master

   2 additional replica(s)

// Several lines omitted here for brevity.

S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004

   slots: (0 slots) slave

   replicates 37ccec5145b4e071687e671bda36789e124fc9ed

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

>>> Send CLUSTER MEET to node 192.168.197.101:7008 to make it join the cluster.

[OK] New node added correctly.

 

Checking node 7008 further shows that it joined as a master.

./redis-trib.rb check 192.168.197.101:7008

>>> Performing Cluster Check (using node 192.168.197.101:7008)

M: 5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008

   slots: (0 slots) master

   0 additional replica(s)

// Several lines omitted here for brevity.

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.
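The "[OK] All 16384 slots covered" line corresponds to a simple invariant: every slot from 0 to 16383 is assigned to exactly one master. A simplified sketch of such a coverage check (check_slot_coverage is a hypothetical helper; redis-trib.rb additionally checks that all nodes agree on the slot configuration):

```python
def check_slot_coverage(ranges_by_master, total_slots=16384):
    """Return (missing, doubled): slots owned by no master / by more than one.

    `ranges_by_master` maps a master address to inclusive (first, last) ranges.
    """
    owners = [0] * total_slots
    for ranges in ranges_by_master.values():
        for first, last in ranges:
            for slot in range(first, last + 1):
                owners[slot] += 1
    missing = [slot for slot, n in enumerate(owners) if n == 0]
    doubled = [slot for slot, n in enumerate(owners) if n > 1]
    return missing, doubled


assignment = {
    "192.168.197.101:7001": [(5461, 10922)],
    "192.168.197.101:7002": [(10923, 16383)],
    "192.168.197.101:7003": [(0, 5460)],
    "192.168.197.101:7008": [],  # newly added master, no slots yet
}
missing, doubled = check_slot_coverage(assignment)
print(len(missing), len(doubled))  # 0 0
```

An empty master such as 7008 does not affect coverage, which is why the check still passes after it joins.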

 

The master node 7008 added by this command does not serve any slots yet, but it is already a member of the cluster.
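When a node does not serve the slot of a requested key, it replies with an error of the form -MOVED <slot> <host>:<port>. With -c, redis-cli hides this raw reply, follows it, and prints "-> Redirected to slot [...] located at ...". A minimal sketch of the parsing step a cluster-aware client performs before retrying (parse_moved is a hypothetical name; ASK redirects, used during slot migration, are deliberately rejected here):

```python
def parse_moved(reply):
    """Split a '-MOVED <slot> <host>:<port>' error reply into (slot, host, port)."""
    parts = reply.lstrip("-").split()
    if len(parts) != 3 or parts[0] != "MOVED":
        raise ValueError(f"not a MOVED redirect: {reply!r}")
    host, _, port = parts[2].rpartition(":")
    return int(parts[1]), host, int(port)


print(parse_moved("-MOVED 2130 192.168.197.101:7003"))
```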

./redis-cli -c -h 192.168.197.101 -p 7008

192.168.197.101:7008> keys *

(empty list or set)

192.168.197.101:7008> get host

-> Redirected to slot [2130] located at 192.168.197.101:7003

"redis.coe2coe.me"

 

 

we found that the process had been running continuously for five months.
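The uptime check above can be reproduced from the lstart column of ps, which prints the start time in a ctime-like format such as "Sun Jan  1 00:00:00 2017". A sketch assuming English (C locale) output; process_uptime_days is a hypothetical helper, and the timestamps in the usage line are made up for illustration, not taken from the incident.

```python
from datetime import datetime


def process_uptime_days(lstart, now):
    """Whole days elapsed since a `ps -o lstart` timestamp."""
    # strptime treats runs of spaces in the format as flexible whitespace,
    # so day numbers padded with a space ("Jan  1") parse as well.
    started = datetime.strptime(lstart.strip(), "%a %b %d %H:%M:%S %Y")
    return (now - started).days


print(process_uptime_days("Sun Jan  1 00:00:00 2017", datetime(2017, 6, 1)))  # 151
```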

1.1.3. Changing a node's master-slave relationship

Node 7008 is currently a newly added master that serves no slots.

./redis-cli -c -h 192.168.197.101 -p 7008

192.168.197.101:7008> cluster nodes

5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,master - 0 0 0 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500101360347 2 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500101359843 3 connected 10923-16383

dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500101360851 7 connected 0-5460

// Several lines omitted here for brevity.

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500101360851 7 connected


 

Now use Redis Cluster's cluster replicate command to turn master node 7008 into a slave of node 7003.

 

192.168.197.101:7008> cluster replicate dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

OK

The change has succeeded. The result can be checked with the cluster nodes command:

 

192.168.197.101:7008> cluster nodes

5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 0 0 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500101430401 2 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500101430905 3 connected 10923-16383

dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500101429897 7 connected 0-5460

// Several lines omitted here for brevity.

 

Further verify that replication has been established:

192.168.197.101:7008> keys *

1) "host"

This shows that the key host has been replicated from its new master.
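cluster replicate worked here because 7008 was an empty master. As we understand the Redis source, the command refuses a master that still owns slots or keys, refuses to make a node replicate itself, and requires the target to be a master, while a slave may switch to another master freely. The sketch below encodes those preconditions; replicate_error is a hypothetical helper and its reason strings are illustrative, not Redis's exact error messages.

```python
def replicate_error(is_master, num_slots, num_keys, target_is_master, target_is_self):
    """Return why CLUSTER REPLICATE would be refused, or None if it may proceed."""
    if target_is_self:
        return "a node cannot replicate itself"
    if not target_is_master:
        return "the target must be a master"
    if is_master and (num_slots != 0 or num_keys != 0):
        return "a master must be empty and own no slots before it can become a slave"
    return None  # slaves, and empty masters such as 7008, may proceed


# 7008: a freshly added master with no slots and no keys
print(replicate_error(is_master=True, num_slots=0, num_keys=0,
                      target_is_master=True, target_is_self=False))
```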

 

 

 

 

 

Node state of the cluster before the failure:

1.1.4. Deleting a slave node

First use redis-cli to look up the NODEID of the node to be deleted, then delete the node with the redis-trib.rb tool.

./redis-cli -c -h 192.168.197.101 -p 7008 cluster nodes |grep myself

5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 myself,slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 0 0 connected

 

[d@192.168.197.101:/opt/redis_cluster/7008]$./redis-trib.rb del-node 192.168.197.101:7008 5377470350bb3fec9165a24589d115ca4fc1a644

>>> Removing node 5377470350bb3fec9165a24589d115ca4fc1a644 from cluster 192.168.197.101:7008

>>> Sending CLUSTER FORGET messages to the cluster...

>>> SHUTDOWN the node.

 

Node 7008 has now been removed from the cluster, and its service port has been shut down as well.
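The two steps above (find the node's own ID, then call del-node) are easy to script. A sketch with hypothetical helper names that extracts the ID from the line flagged myself and builds the argument list for redis-trib.rb del-node; it does not execute anything, and the script path is an assumption.

```python
def myself_node_id(cluster_nodes_text):
    """Return the node ID from the CLUSTER NODES line flagged 'myself'."""
    for line in cluster_nodes_text.splitlines():
        parts = line.split()
        if len(parts) >= 3 and "myself" in parts[2].split(","):
            return parts[0]
    raise LookupError("no line flagged 'myself'")


def del_node_argv(entry_addr, node_id, script="./redis-trib.rb"):
    """Argument list for `redis-trib.rb del-node <host:port> <node-id>` (not executed here)."""
    return [script, "del-node", entry_addr, node_id]


line = ("5377470350bb3fec9165a24589d115ca4fc1a644 192.168.197.101:7008 "
        "myself,slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 0 0 connected")
print(del_node_argv("192.168.197.101:7008", myself_node_id(line)))
```

The resulting list could be passed to subprocess.run if the script is available on the machine.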

 

xx.x.xxx.200:8371(bedab2c537fe94f8c0363ac4ae97d56832316e65) master

xx.x.xxx.199:8373(792020fe66c00ae56e27cd7a048ba6bb2b67adb6) slave

xx.x.xxx.201:8375(5ab4f85306da6d633e4834b4d3327f45af02171b) master

xx.x.xxx.201:8372(826607654f5ec81c3756a4a21f357e644efe605a) slave

xx.x.xxx.199:8370(462cadcb41e635d460425430d318f2fe464665c5) master

xx.x.xxx.200:8374(1238085b578390f3c8efa30824fd9a4baba10ddf) slave

1.1.5. Deleting a master node

The current node state of the cluster is shown below. We are going to delete master node 7003. This master currently has two slaves, 7000 and 7007, serves slots 0 through 5460, and holds one key, host.

./redis-cli -c -h 192.168.197.101 -p 7001 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 myself,master - 0 0 2 connected 5461-10922

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500102709303 7 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500102708296 3 connected 10923-16383

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500102707288 5 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 0 1500102708296 7 connected

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500102708296 2 connected

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500102708799 6 connected

dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 192.168.197.101:7003 master - 0 1500102707792 7 connected 0-5460

 

./redis-cli -c -h 192.168.197.101 -p 7003

192.168.197.101:7003> keys *

1) "host"

 

 

Deleting it directly in this state fails with the error below, because only an empty master (one that serves no slots) can be deleted:

[d@192.168.197.101:/opt/redis_cluster/7008]$./redis-trib.rb del-node 192.168.197.101:7003 dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7003

[ERR] Node 192.168.197.101:7003 is not empty! Reshard data away and try again.
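The "is not empty" check is about slot ownership. Counting the slots in the range fields of cluster nodes also explains the numbers used in this article: 0-5460 is 5461 slots, 5461-10922 is 5462, and 10923-16383 is 5461. A sketch with hypothetical helper names:

```python
def count_slots(slot_fields):
    """Count the slots in CLUSTER NODES range fields such as '0-5460' or '42'."""
    total = 0
    for rng in slot_fields:
        if rng.startswith("["):
            continue  # migrating/importing markers are not owned ranges
        first, _, last = rng.partition("-")
        total += int(last or first) - int(first) + 1
    return total


def del_node_allowed(slot_fields):
    """redis-trib.rb del-node only proceeds for a node that owns no slots."""
    return count_slots(slot_fields) == 0


print(count_slots(["0-5460"]), count_slots(["5461-10922"]), count_slots(["10923-16383"]))
print(del_node_allowed(["0-5460"]))  # False: 7003 still serves slots 0-5460
```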

 

There are three ways to delete such a master:

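(1) Method 1: stop master 7003 so that one of its slaves is automatically promoted to master, then start 7003 again; it automatically rejoins as a slave and can then be deleted easily. No slot resharding is involved, and no data is lost as long as the promoted slave was fully in sync (Redis replication is asynchronous).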

 

Run the following commands in order:

(a) Stop the 7003 service.

./redis-cli -c -h 192.168.197.101 -p 7003 shutdown

While the service is stopped, the node cannot be deleted directly; attempting it produces the following error:

./redis-trib.rb del-node 192.168.197.101:7000  dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7000

[ERR] No such node ID dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

 

(b) Restart the 7003 service.

Once you have confirmed that one of 7003's slaves has been elected and promoted, restart the 7003 service; 7003 then becomes a slave of 7000.

[d@192.168.197.101:/opt/redis_cluster/7003]$./redis-server ./redis.conf

 

(c) Delete node 7003.

This time node 7003 is removed from the cluster successfully.

./redis-trib.rb del-node 192.168.197.101:7000  dbcdc9682acbd8c52dd6184fe01bf5f9500b2180

>>> Removing node dbcdc9682acbd8c52dd6184fe01bf5f9500b2180 from cluster 192.168.197.101:7000

>>> Sending CLUSTER FORGET messages to the cluster...

>>> SHUTDOWN the node.

 

Node deletion is now complete.

 

(2) Method 2: run CLUSTER FAILOVER on one of 7003's slaves to trigger a manual failover, which promotes that slave. The principle is very similar to method 1, so it is not covered here.

 

(3) Method 3: use Redis Cluster resharding to move the slots served by master 7003 to other masters, so that 7003 no longer serves any slots. 7003 then becomes an empty master and can be deleted.

This involves resharding, which is not covered here.

 

---------------------------------Log analysis below--------------------------------------

1.1.6. Resharding (slot reassignment)

Resharding is the process of moving some of a Redis Cluster's slots from one master to another, that is, reassigning slots.

 

For ease of description, first create an empty master node 7009, then move all 5461 slots on 7000 to node 7009.

./redis-trib.rb add-node 192.168.197.101:7009 192.168.197.101:7000

 

The current node state is as follows:

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500106989599 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 4314bb678cda2ba1550e3ec1081db5d5fae74c87 0 1500106990102 10 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500106991610 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500106991914 9 connected

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500106990908 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500106992014 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected 0-5460

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 4314bb678cda2ba1550e3ec1081db5d5fae74c87 0 1500106990605 10 connected

[d@192.168.197.101:/opt/redis_cluster/7009]$./redis-cli -c -h 192.168.197.101 -p 7000

192.168.197.101:7000> keys *

1) "host"

192.168.197.101:7000> get host

"redis.coe2coe.me"

 

Now the actual resharding begins.

The following command migrates the 5461 slots served by node 7000 (NODEID: 4314bb678cda2ba1550e3ec1081db5d5fae74c87) to 7009 (NODEID: 5d0632d76008ea3010878317d804b3c0ae50a13f):

./redis-trib.rb reshard --from 4314bb678cda2ba1550e3ec1081db5d5fae74c87 --to  5d0632d76008ea3010878317d804b3c0ae50a13f --slots 5461 --yes 192.168.197.101:7000

 

 

The output is as follows:

>>> Performing Cluster Check (using node 192.168.197.101:7000)

M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000

   slots:0-5460 (5461 slots) master

   2 additional replica(s)

M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001

   slots:5461-10922 (5462 slots) master

   1 additional replica(s)

S: 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006

   slots: (0 slots) slave

   replicates 4314bb678cda2ba1550e3ec1081db5d5fae74c87

S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004

   slots: (0 slots) slave

   replicates 37ccec5145b4e071687e671bda36789e124fc9ed

M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009

   slots: (0 slots) master

   0 additional replica(s)

M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002

   slots:10923-16383 (5461 slots) master

   1 additional replica(s)

S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005

   slots: (0 slots) slave

   replicates b8be626d33d07cb10094ab9f1345d6436d18d27f

S: f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007

   slots: (0 slots) slave

   replicates 4314bb678cda2ba1550e3ec1081db5d5fae74c87

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

 

Ready to move 5461 slots.

  Source nodes:

    M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000

   slots:0-5460 (5461 slots) master

   2 additional replica(s)

  Destination node:

    M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009

   slots: (0 slots) master

   0 additional replica(s)

  Resharding plan:

    Moving slot 0 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

    Moving slot 1 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

    Moving slot 2 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

    Moving slot 3 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

    Moving slot 4 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

    Moving slot 5 from 4314bb678cda2ba1550e3ec1081db5d5fae74c87

// Several lines omitted here for brevity.

Moving slot 5457 from 192.168.197.101:7000 to 192.168.197.101:7009:

Moving slot 5458 from 192.168.197.101:7000 to 192.168.197.101:7009:

Moving slot 5459 from 192.168.197.101:7000 to 192.168.197.101:7009:

Moving slot 5460 from 192.168.197.101:7000 to 192.168.197.101:7009:
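With a single source, the resharding plan printed above is simply the source's slots taken in ascending order. A sketch of that single-source case only (when several sources are given, redis-trib.rb splits the requested count across them); reshard_plan is a hypothetical name.

```python
def reshard_plan(source_slots, num_slots, source_id, target_id):
    """Plan moving `num_slots` slots from ONE source master to a target master.

    Slots are taken from the source in ascending order.
    Returns a list of (slot, source_id, target_id) tuples.
    """
    slots = sorted(source_slots)
    if num_slots > len(slots):
        raise ValueError(f"source owns only {len(slots)} slots, asked for {num_slots}")
    return [(slot, source_id, target_id) for slot in slots[:num_slots]]


plan = reshard_plan(range(0, 5461), 5461,
                    "4314bb678cda2ba1550e3ec1081db5d5fae74c87",
                    "5d0632d76008ea3010878317d804b3c0ae50a13f")
print(len(plan), plan[0][0], plan[-1][0])  # 5461 0 5460
```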

 

At this point all 5461 slots formerly served by 7000 are served by the new master 7009. The result of the resharding can be verified with the following command:

./redis-cli -c -h 192.168.197.101 -p 7000 cluster nodes

37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001 master - 0 1500107530823 2 connected 5461-10922

78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107529816 11 connected

c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004 slave 37ccec5145b4e071687e671bda36789e124fc9ed 0 1500107529816 5 connected

5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009 master - 0 1500107530823 11 connected 0-5460

b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002 master - 0 1500107531327 3 connected 10923-16383

38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005 slave b8be626d33d07cb10094ab9f1345d6436d18d27f 0 1500107531831 6 connected

4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000 myself,master - 0 0 10 connected

f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007 slave 5d0632d76008ea3010878317d804b3c0ae50a13f 0 1500107531831 11 connected

 

This shows that the 5461 slots from 0 to 5460 have been migrated from node 7000 to node 7009.

 

Query the key to further verify that the key data was migrated:

./redis-cli -c -h 192.168.197.101 -p 7000

192.168.197.101:7000> keys *

(empty list or set)

192.168.197.101:7000> get host

-> Redirected to slot [2130] located at 192.168.197.101:7009

"redis.coe2coe.me"

192.168.197.101:7009> keys *

1) "host"

 

The key host, which lives in slot 2130, is found on node 7009, showing that the key data was migrated successfully.

 

Now check the cluster state with the redis-trib.rb tool:

./redis-trib.rb check 192.168.197.101:7009

>>> Performing Cluster Check (using node 192.168.197.101:7009)

M: 5d0632d76008ea3010878317d804b3c0ae50a13f 192.168.197.101:7009

   slots:0-5460 (5461 slots) master

   2 additional replica(s)

M: 37ccec5145b4e071687e671bda36789e124fc9ed 192.168.197.101:7001

   slots:5461-10922 (5462 slots) master

   1 additional replica(s)

S: c48ead74999cf71f3f7446f6ae9771423de65890 192.168.197.101:7004

   slots: (0 slots) slave

   replicates 37ccec5145b4e071687e671bda36789e124fc9ed

S: 78ae31a28bcd62b87f93c932552b5f6c1fe3329c 192.168.197.101:7006

   slots: (0 slots) slave

   replicates 5d0632d76008ea3010878317d804b3c0ae50a13f

M: 4314bb678cda2ba1550e3ec1081db5d5fae74c87 192.168.197.101:7000

   slots: (0 slots) master

   0 additional replica(s)

S: 38f95bb38e691efdb45f926eb9157cdba7111d92 192.168.197.101:7005

   slots: (0 slots) slave

   replicates b8be626d33d07cb10094ab9f1345d6436d18d27f

S: f53441ccbe2c3bec2fb03f8180f723c7c5b735c7 192.168.197.101:7007

   slots: (0 slots) slave

   replicates 5d0632d76008ea3010878317d804b3c0ae50a13f

M: b8be626d33d07cb10094ab9f1345d6436d18d27f 192.168.197.101:7002

   slots:10923-16383 (5461 slots) master

   1 additional replica(s)

[OK] All nodes agree about slots configuration.

>>> Check for open slots...

>>> Check slots coverage...

[OK] All 16384 slots covered.

 

As shown, the two slaves of 7000 have become slaves of 7009.

 

Summary:

When run with the reshard argument, redis-trib.rb carried out the following three actions:

(1) Reassigned the slots served by the source master to the target master.

(2) Moved the keys stored on the source master to the target master.

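(3) Turned the source master's slaves into slaves of the target master. (Strictly, this step is performed by Redis Cluster itself: when a master loses its last slot, its slaves switch to replicating the new owner of that slot.)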
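Behind actions (1) and (2), each slot is moved with the sequence described in the Redis Cluster specification: mark the slot IMPORTING on the target and MIGRATING on the source, MIGRATE its keys, then assign the slot to the target with CLUSTER SETSLOT ... NODE. The sketch below only builds that command sequence for one slot; the node dictionaries and the 60000 ms timeout are assumptions, and a real tool fetches keys in batches with CLUSTER GETKEYSINSLOT and also informs the other masters.

```python
def slot_migration_commands(slot, source, target, keys, timeout_ms=60000):
    """Build (node, command) pairs that move one slot from `source` to `target`.

    `source` and `target` are dicts with 'id', 'host' and 'port' keys (a
    hypothetical shape). Nothing is sent to Redis; the caller would execute
    each command on the given node.
    """
    s = str(slot)
    cmds = [
        # 1. the target accepts the slot, the source starts handing it over
        (target, ["CLUSTER", "SETSLOT", s, "IMPORTING", source["id"]]),
        (source, ["CLUSTER", "SETSLOT", s, "MIGRATING", target["id"]]),
    ]
    # 2. move every key (single-key MIGRATE form, destination db 0)
    for key in keys:
        cmds.append((source, ["MIGRATE", target["host"], str(target["port"]),
                              key, "0", str(timeout_ms)]))
    # 3. record the new owner on both nodes
    for node in (target, source):
        cmds.append((node, ["CLUSTER", "SETSLOT", s, "NODE", target["id"]]))
    return cmds


SRC = {"id": "4314bb678cda2ba1550e3ec1081db5d5fae74c87", "host": "192.168.197.101", "port": 7000}
DST = {"id": "5d0632d76008ea3010878317d804b3c0ae50a13f", "host": "192.168.197.101", "port": 7009}
for node, cmd in slot_migration_commands(2130, SRC, DST, ["host"]):
    print(node["port"], " ".join(cmd))
```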

 

Step 1: master 8371 lost its connection to slave 8373: 46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.

Step 2: masters 8370/8375 judged 8371 to be unreachable: 42645:M 09 Sep 18:57:50.117 * Marking node bedab2c537fe94f8c0363ac4ae97d56832316e65 as failing (quorum reached).

Step 3: slaves 8372/8373/8374 received a message from master 8375 that 8371 had failed: 46986:S 09 Sep 18:57:50.120 * FAIL message received from 5ab4f85306da6d633e4834b4d3327f45af02171b about bedab2c537fe94f8c0363ac4ae97d56832316e65

Step 4: masters 8370/8375 authorized 8373 to fail over and become a master: 42645:M 09 Sep 18:57:51.055 # Failover auth granted to 792020fe66c00ae56e27cd7a048ba6bb2b67adb6 for epoch 16

Step 5: the original master 8371 updated its configuration and became a slave of 8373: 46590:M 09 Sep 18:57:51.488 # Configuration change detected. Reconfiguring myself as a replica of 792020fe66c00ae56e27cd7a048ba6bb2b67adb6
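The log lines quoted in these steps share the Redis 3.x format pid:role day month time level message (newer Redis releases also print the year, which this sketch does not handle). A sketch that parses them so that logs collected from several hosts can be merged and sorted; parse_log_line is a hypothetical helper, the year must be supplied because the log omits it (2017 is only a placeholder), and sorting across hosts assumes their clocks are reasonably in sync.

```python
import re
from datetime import datetime

# pid:role dd Mon HH:MM:SS.mmm level message  (role: M master, S slave, C child, X sentinel)
LOG_LINE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[MSCX]) "
    r"(?P<ts>\d{2} \w{3} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[-.*#]) (?P<msg>.*)$"
)


def parse_log_line(line, year=2017):
    """Return (timestamp, pid, role, level, message) for one Redis 3.x log line."""
    m = LOG_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    ts = datetime.strptime(f"{year} {m['ts']}", "%Y %d %b %H:%M:%S.%f")
    return ts, int(m["pid"]), m["role"], m["level"], m["msg"]


lines = [
    "46590:M 09 Sep 18:57:51.379 # Connection with slave xx.x.xxx.199:8373 lost.",
    "42645:M 09 Sep 18:57:50.117 * Marking node bedab2c537fe94f8c0363ac4ae97d56832316e65 as failing (quorum reached).",
]
for ts, pid, role, level, msg in sorted(parse_log_line(l) for l in lines):
    print(ts.time(), pid, role, level, msg)
```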
