TiKV transfer-region fails

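I ran into a problem when using operator add transfer-region to move a peer to a different store. These are the steps I took:
1 Check where the peers of region 17045 are located
{"id":17045,"peer_stores":[3001,14028,3002]}
2 Manually move the peer to another store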
» operator add transfer-region 17045 3001 14028 21001
3 Check the operator status
» operator check 17045
"status: SUCCESS, operator: admin-move-region {mv peer: store [3002] to [21001]}
(kind:admin,region, region:17045(233,86), createAt:2022-03-02 22:53:43.006584448 +0800 CST m=+79957.592366764,
startAt:2022-03-02 22:53:43.006820663 +0800 CST m=+79957.592602993, currentStep:4, steps:[add learner peer 26007
on store 21001, use joint consensus, promote learner peer 26007 on store 21001 to voter, demote voter peer 17047 on store 3002 to learner,
leave joint state, promote learner peer 26007 on store 21001 to voter, demote voter peer 17047 on store 3002 to learner, remove peer on store 3002])
finished"

4 Check the location of region 17045's peers
{"id":17045,"peer_stores":[3001,14028,3002]}

5 After a while I found
» operator check 17045
[500] "operator not found"

6 Check the location of region 17045's peers again
{"id":17045,"peer_stores":[3001,14028,3002]}

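Questions
1 Do the steps reported by operator check correspond to the comma-separated entries inside steps:[] in the output?
2 Given the steps above, why does operator check stay at step 4?
3 Why does operator check eventually return nothing, and why did the peer of region 17045 never move?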
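
On question 1, one hedged reading: the TiKV log in this thread records four admin commands per move (AddLearnerNode, EnterJoint, LeaveJoint, RemoveNode), which suggests that steps:[] holds four steps and that each joint-consensus step prints as several comma-separated fragments. Under that reading, currentStep:4 together with status SUCCESS would mean all four steps had run, not that the operator was stuck. The sketch below regroups the fragments; the grouping rule (a fragment starting with "promote" or "demote" belongs to the preceding step) is an assumption, not a documented PD output format.

```python
# Text copied from the `operator check 17045` output in this thread.
STEPS_TEXT = (
    "add learner peer 26007 on store 21001, "
    "use joint consensus, "
    "promote learner peer 26007 on store 21001 to voter, "
    "demote voter peer 17047 on store 3002 to learner, "
    "leave joint state, "
    "promote learner peer 26007 on store 21001 to voter, "
    "demote voter peer 17047 on store 3002 to learner, "
    "remove peer on store 3002"
)


def group_steps(steps_text):
    """Group comma-separated fragments into operator steps.

    Assumption: fragments starting with 'promote' or 'demote' are
    sub-changes of the preceding joint-consensus step.
    """
    steps = []
    for fragment in (f.strip() for f in steps_text.split(",")):
        if steps and fragment.startswith(("promote", "demote")):
            steps[-1].append(fragment)
        else:
            steps.append([fragment])
    return steps


steps = group_steps(STEPS_TEXT)
for number, parts in enumerate(steps, start=1):
    print(number, "; ".join(parts))
```

With this grouping the eight fragments collapse into four steps, which lines up with currentStep:4.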

The relevant logs are below:
[2022/03/02 23:10:00.915 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { change_type: AddLearnerNode peer { id: 26009 store_id: 21001 role: Learner } }”] [index=4656] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:00.932 +08:00] [INFO] [apply.rs:1755] [“exec ConfChange”] [epoch=“conf_ver: 98 version: 233”] [type=AddLearner] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:00.932 +08:00] [INFO] [apply.rs:1895] [“add learner successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 98 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26008 store_id: 3002 }”] [peer=“id: 26009 store_id: 21001 role: Learner”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:00.949 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26008, 17046} }, outgoing: Configuration { voters: {} } }, learners: {26009}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:03.912 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeerV2 change_peer_v2 { changes { peer { id: 26009 store_id: 21001 } } changes { change_type: AddLearnerNode peer { id: 26008 store_id: 3002 role: Learner } } }”] [index=4657] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.927 +08:00] [INFO] [apply.rs:1936] [“exec ConfChangeV2”] [epoch=“conf_ver: 99 version: 233”] [kind=EnterJoint] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.928 +08:00] [INFO] [apply.rs:2116] [“conf change successfully”] [“current region”=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 101 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26008 store_id: 3002 role: DemotingVoter } peers { id: 26009 store_id: 21001 role: IncomingVoter }”] [“original region”=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 99 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26008 store_id: 3002 } peers { id: 26009 store_id: 21001 role: Learner }”] [changes="[peer { id: 26009 store_id: 21001 }, change_type: AddLearnerNode peer { id: 26008 store_id: 3002 role: Learner }]"] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.939 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeerV2 change_peer_v2 {}”] [index=4658] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.939 +08:00] [INFO] [apply.rs:1936] [“exec ConfChangeV2”] [epoch=“conf_ver: 101 version: 233”] [kind=LeaveJoint] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.939 +08:00] [INFO] [apply.rs:2146] [“leave joint state successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 103 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26008 store_id: 3002 role: Learner } peers { id: 26009 store_id: 21001 }”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.939 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { change_type: RemoveNode peer { id: 26008 store_id: 3002 role: Learner } }”] [index=4659] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.939 +08:00] [INFO] [apply.rs:1755] [“exec ConfChange”] [epoch=“conf_ver: 103 version: 233”] [type=RemoveNode] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.940 +08:00] [INFO] [apply.rs:1863] [“remove peer successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 103 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26008 store_id: 3002 role: Learner } peers { id: 26009 store_id: 21001 }”] [peer=“id: 26008 store_id: 3002 role: Learner”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:03.950 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26009, 17046} }, outgoing: Configuration { voters: {17048, 26008, 17046} } }, learners: {}, learners_next: {26008}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:03.955 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26009, 17046} }, outgoing: Configuration { voters: {} } }, learners: {26008}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:03.955 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26009, 17046} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:04.554 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { change_type: AddLearnerNode peer { id: 26010 store_id: 3002 role: Learner } }”] [index=4660] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:04.554 +08:00] [INFO] [apply.rs:1755] [“exec ConfChange”] [epoch=“conf_ver: 104 version: 233”] [type=AddLearner] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:04.554 +08:00] [INFO] [apply.rs:1895] [“add learner successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 104 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26009 store_id: 21001 }”] [peer=“id: 26010 store_id: 3002 role: Learner”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:04.554 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26009, 17046} }, outgoing: Configuration { voters: {} } }, learners: {26010}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:06.924 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeerV2 change_peer_v2 { changes { peer { id: 26010 store_id: 3002 } } changes { change_type: AddLearnerNode peer { id: 26009 store_id: 21001 role: Learner } } }”] [index=4661] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.935 +08:00] [INFO] [apply.rs:1936] [“exec ConfChangeV2”] [epoch=“conf_ver: 105 version: 233”] [kind=EnterJoint] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.943 +08:00] [INFO] [apply.rs:2116] [“conf change successfully”] [“current region”=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 107 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26009 store_id: 21001 role: DemotingVoter } peers { id: 26010 store_id: 3002 role: IncomingVoter }”] [“original region”=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 105 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26009 store_id: 21001 } peers { id: 26010 store_id: 3002 role: Learner }”] [changes="[peer { id: 26010 store_id: 3002 }, change_type: AddLearnerNode peer { id: 26009 store_id: 21001 role: Learner }]"] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.947 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeerV2 change_peer_v2 {}”] [index=4662] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.948 +08:00] [INFO] [apply.rs:1936] [“exec ConfChangeV2”] [epoch=“conf_ver: 107 version: 233”] [kind=LeaveJoint] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.948 +08:00] [INFO] [apply.rs:2146] [“leave joint state successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 109 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26009 store_id: 21001 role: Learner } peers { id: 26010 store_id: 3002 }”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.955 +08:00] [INFO] [apply.rs:1383] [“execute admin command”] [command=“cmd_type: ChangePeer change_peer { change_type: RemoveNode peer { id: 26009 store_id: 21001 role: Learner } }”] [index=4663] [term=6] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.955 +08:00] [INFO] [apply.rs:1755] [“exec ConfChange”] [epoch=“conf_ver: 109 version: 233”] [type=RemoveNode] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.957 +08:00] [INFO] [apply.rs:1863] [“remove peer successfully”] [region=“id: 17045 start_key: 7480000000000000FFE85F728000000000FF1923130000000000FA end_key: 7480000000000000FFE85F728000000000FF1F72270000000000FA region_epoch { conf_ver: 109 version: 233 } peers { id: 17046 store_id: 3001 } peers { id: 17048 store_id: 14028 } peers { id: 26009 store_id: 21001 role: Learner } peers { id: 26010 store_id: 3002 }”] [peer=“id: 26009 store_id: 21001 role: Learner”] [peer_id=17046] [region_id=17045]
[2022/03/02 23:10:06.961 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26010, 17046} }, outgoing: Configuration { voters: {17048, 26009, 17046} } }, learners: {}, learners_next: {26009}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:06.964 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26010, 17046} }, outgoing: Configuration { voters: {} } }, learners: {26009}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:10:06.965 +08:00] [INFO] [raft.rs:2609] [“switched to configuration”] [config=“Configuration { voters: Configuration { incoming: Configuration { voters: {17048, 26010, 17046} }, outgoing: Configuration { voters: {} } }, learners: {}, learners_next: {}, auto_leave: false }”] [raft_id=17046] [region_id=17045]
[2022/03/02 23:12:09.331 +08:00] [INFO] [util.rs:544] [“connecting to PD endpoint”] [endpoints=http://192.168.135.134:2379]
[2022/03/02 23:12:09.363 +08:00] [INFO] [] [“New connected subchannel at 0x7f8bdd858830 for subchannel 0x7f8be7799c00”]
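
One way to read an excerpt like this is to pull out only the lines where a peer is added or removed. The sketch below does that with a regular expression; the four sample lines are abbreviated copies of lines from the log above (the long region= field is dropped), and the function name conf_change_events is made up for illustration. Judging by the timestamps and peer ids, the excerpt comes from a later attempt than the 22:53 operator shown earlier.

```python
import re

# Abbreviated copies of four lines from the log above.
LOG_LINES = [
    '[2022/03/02 23:10:00.932 +08:00] [INFO] [apply.rs:1895] ["add learner successfully"] [peer="id: 26009 store_id: 21001 role: Learner"]',
    '[2022/03/02 23:10:03.940 +08:00] [INFO] [apply.rs:1863] ["remove peer successfully"] [peer="id: 26008 store_id: 3002 role: Learner"]',
    '[2022/03/02 23:10:04.554 +08:00] [INFO] [apply.rs:1895] ["add learner successfully"] [peer="id: 26010 store_id: 3002 role: Learner"]',
    '[2022/03/02 23:10:06.957 +08:00] [INFO] [apply.rs:1863] ["remove peer successfully"] [peer="id: 26009 store_id: 21001 role: Learner"]',
]

# Accept straight or curly quotes, since the forum rendering changed them.
EVENT = re.compile(
    r'^\[(?P<ts>[^\]]+)\].*?'
    r'["“](?P<action>add learner|remove peer) successfully["”]'
    r'.*?\[peer=["“]id: (?P<peer>\d+) store_id: (?P<store>\d+)'
)


def conf_change_events(lines):
    """Return (time, action, peer_id, store_id) for each add/remove line."""
    events = []
    for line in lines:
        match = EVENT.match(line)
        if match:
            time_of_day = match.group("ts").split()[1]
            events.append((
                time_of_day,
                match.group("action"),
                int(match.group("peer")),
                int(match.group("store")),
            ))
    return events


events = conf_change_events(LOG_LINES)
for event in events:
    print(*event)
```

Read this way, the log seems to show a peer being added on store 21001 and the one on store 3002 being removed at about 23:10:03, followed roughly three seconds later by the reverse change. That would be consistent with peer_stores looking unchanged whenever it was checked.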

Please get the PD leader log and extract the entries from around the time the operation ran, so we can see why it did not succeed.

I did not find any obvious errors.

I could not see anything wrong.


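I just ran it again and waited until operator check returned nothing. I still did not see any obvious error, but the peer was not moved in the end.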

As I understand it, you want to manually move the peer of region 17045 on store 3002 to store 21001 while leaving the other peers where they are. In that case you should use operator add transfer-peer:

operator add transfer-peer 17045 3002 21001

This schedules the replica of region 17045 on store 3002 to store 21001.

For reference, see https://docs.pingcap.com/zh/tidb/v5.2/pd-control#operator-check--show--add--remove
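
To make the relationship between the two commands concrete, here is a small sketch that compares the current store list with the desired one and prints the transfer-peer command for the single store that differs. The helper name transfer_peer_command is made up for illustration; only the argument order (region id, source store, target store) is taken from the command shown in this reply.

```python
def transfer_peer_command(region_id, current_stores, target_stores):
    """Build a pd-ctl transfer-peer command when exactly one store changes."""
    removed = sorted(set(current_stores) - set(target_stores))
    added = sorted(set(target_stores) - set(current_stores))
    if len(removed) != 1 or len(added) != 1:
        raise ValueError("expected exactly one store to be replaced")
    return f"operator add transfer-peer {region_id} {removed[0]} {added[0]}"


# Store lists taken from the original question.
command = transfer_peer_command(
    17045,
    current_stores=[3001, 14028, 3002],
    target_stores=[3001, 14028, 21001],
)
print(command)
```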

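I tried it. I could see the peer being moved, but in the end it went back by itself. Below is the log I captured; there is a strange cancel action in it, and I don't know what caused it.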

These are the changes the region went through:
{"id":19001,"peer_stores":[3001,26071,3002]}
{"id":19001,"peer_stores":[3001,26071,3002,21001]}
{"id":19001,"peer_stores":[3001,26071,21001]}
{"id":19001,"peer_stores":[3001,26071,3002]}

operator="admin-move-peer
operator="balance-region
I saw that the operator at the moment the peer was moved back automatically was balance-region. Is that because TiDB's own scheduling algorithm balanced it back?

The error is canceled, which can have many causes. Take a look at the TiKV logs of store 3002 and store 21001 at that point in time.

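I tried a few more times. I think the cause is that the scores of the TiKV stores made PD trigger a balance automatically.
The figure below shows the additional information attached to the balance in the error from the screenshot above.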

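This figure is from another transfer-peer run. This time there was no cancel action, but the region was still balanced automatically.
The attached score information is visible here too: because there was a target whose score was lower than the source's, a balance was performed. I found no errors on the TiKV nodes either. I'm not sure whether my understanding is correct.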
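
As a rough illustration of that reasoning (a toy model only, not PD's actual scoring code), the sketch below proposes a balance move whenever a store that does not yet hold a peer of the region scores lower than the source. The function name and the score values are invented; the real values were in the screenshots, which are not reproduced here.

```python
def pick_balance_target(source, scores, stores_with_peer):
    """Return the lowest-scoring eligible store if it scores below `source`.

    Toy rule for illustration. A real scheduler also applies filters,
    tolerances, and limits that are not modelled here.
    """
    candidates = {
        store: score
        for store, score in scores.items()
        if store not in stores_with_peer
    }
    if not candidates:
        return None
    target = min(candidates, key=candidates.get)
    return target if candidates[target] < scores[source] else None


# Hypothetical scores: store 21001 just received a peer, store 3002 just lost one.
scores = {3001: 100.0, 26071: 100.0, 21001: 101.0, 3002: 99.0}
target = pick_balance_target(21001, scores, stores_with_peer={3001, 26071, 21001})
print(target)
```

In this toy setup the manually moved peer is an immediate candidate to go back to store 3002, which is the behaviour described in this post.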

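It should be the balance mechanism that moved the region back to its original store. What is the purpose of transferring the peer to another store? For hotspot problems, transfer leader or adjusting the store weight can achieve the goal. If it is about how the data should be distributed, take a look at the placement rule feature, which may help you reach the distribution you want.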
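
For a sense of what a placement rule looks like, here is a minimal sketch that builds one as JSON. The field names follow the Placement Rules documentation as far as I recall it; the rule id "example-rule", the label key "disk", and the value "ssd" are hypothetical, and your stores would need matching labels. Whether a rule adds to or replaces the default rule depends on its group_id and id, so check the documentation for your version before loading anything like this into PD (the docs describe doing so with pd-ctl config placement-rules save --in=rules.json).

```python
import json

# Hypothetical rule: keep 3 voter replicas on stores labelled disk=ssd.
rule = {
    "group_id": "pd",
    "id": "example-rule",      # hypothetical rule id
    "start_key": "",           # empty start/end keys cover the whole key range
    "end_key": "",
    "role": "voter",
    "count": 3,
    "label_constraints": [
        {"key": "disk", "op": "in", "values": ["ssd"]},  # hypothetical label
    ],
}

rules_json = json.dumps([rule], indent=2)
print(rules_json)
```

Unlike a manual transfer-peer, a rule of this kind expresses the desired placement declaratively, so the scheduler works toward it instead of undoing it.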


Thanks. I had not used this feature before and just wanted to test it.
Yes, I still have a lot to learn. Thank you very much for your help.
