Version:
v4.0.0-rc.2
Operation performed:
tiup cluster scale-in <cluster-name> --node 10.0.1.4:9000
Problem:
1. The TiKV node being scaled in has stayed in the Offline state, even with no workload running.
2. When new workload comes in, this Offline TiKV node still receives replicated data, and offline-peer-region-count in the PD monitoring keeps growing.
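(For reference, a minimal way to watch the node during scale-in from tiup; <cluster-name> is a placeholder:)
>> tiup cluster display <cluster-name>   // the node should show as Offline/Pending Offline until all its regions are moved away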
Hello,
You can troubleshoot by following that post. 4.0 GA has already been released; we recommend upgrading to the GA version.
Thanks for the reply.
1. It was because the TiKV disks had less than 20% space available, which blocked the cleanup.
2. offline-peer-region did go down, but in the end 10 offline-peer-regions remain and cannot be cleaned up; extra-peer-region and learner-peer-region are also at 10.
The TiKV node being scaled in is still in the Offline state; offline-peer-region in the PD monitoring is still 10, and extra and learner are the same.
Hello,
So do the cluster's TiKV nodes now have more than 20% disk space available?
Check the offline node's scale-in progress with pd-ctl store (a sketch below); you can use transfer region/leader to migrate quickly.
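(As a sketch, in pd-ctl the progress can be read from the offline store's region_count, which should reach 0 before the store becomes Tombstone; <store-id> is a placeholder:)
» store              // list all stores and find the one whose state_name is "Offline"
» store <store-id>   // region_count is the number of regions still waiting to be moved off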
1. Yes, it is above 20%.
2. operator show lists no operators migrating regions off the offline node; offline-peer-region-count is stuck at 9 regions that are not being migrated, and learner-peer-region-count and extra-peer-region-count are also 9.
Thanks for the feedback.
Try these two commands to transfer them quickly; a sketch for finding the region IDs follows.
>> operator add transfer-leader 1 2 // schedule the leader of Region 1 to store 2
>> operator add transfer-region 1 2 3 4 // schedule Region 1 to stores 2, 3, and 4
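(To pick region IDs for these commands, the regions that still have a peer on the offline store can be listed first; a sketch in pd-ctl, with <store-id> as a placeholder:)
» region store <store-id>    // list all regions that still keep a peer on this store
» region check offline-peer  // or list regions with peers on offline stores cluster-wide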
1. region check offline-peer
» region check offline-peer
{
"count": 10,
"regions": [
{
"id": 31889,
"start_key": "7480000000000005FF2F5F728000000006FF7F364E0000000000FA",
"end_key": "7480000000000005FF2F5F728000000006FF86888C0000000000FA",
"epoch": {
"conf_ver": 9,
"version": 920
},
"peers": [
{
"id": 31890,
"store_id": 1
},
{
"id": 31891,
"store_id": 4
},
{
"id": 123995,
"store_id": 116845
},
{
"id": 175128,
"store_id": 5,
"is_learner": true
}
],
"leader": {
"id": 31890,
"store_id": 1
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 94,
"approximate_keys": 454572
},
{
"id": 288028,
"start_key": "7480000000000005FF4D5F698000000000FF000006013132332EFF3233352EFF313731FF2E33310000FD03C8FF00000004E5593B00FE",
"end_key": "7480000000000005FF4D5F698000000000FF000006013132332EFF3233352EFF31392EFF3233330000FD03F8FF00000009D24D8200FE",
"epoch": {
"conf_ver": 9,
"version": 881
},
"peers": [
{
"id": 288029,
"store_id": 1
},
{
"id": 288030,
"store_id": 5
},
{
"id": 288031,
"store_id": 116845
},
{
"id": 288032,
"store_id": 4,
"is_learner": true
}
],
"leader": {
"id": 288029,
"store_id": 1
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 69,
"approximate_keys": 957290
},
{
"id": 125113,
"start_key": "7480000000000005FF375F698000000000FF0000020155655A46FF4A525273FF755346FF4135493943FF7162FF37636C664163FF42FF52445369484B44FFFF0000000000000000FFF7038000000007B2FF6C5B000000000000F9",
"end_key": "7480000000000005FF375F698000000000FF0000020155677A52FF75313455FF705969FF6141515570FF4F53FF334B4E627675FF44FF6D766C66363630FFFF0000000000000000FFF703800000003FE1FF18A0000000000000F9",
"epoch": {
"conf_ver": 15,
"version": 821
},
"peers": [
{
"id": 125114,
"store_id": 116845
},
{
"id": 125115,
"store_id": 1
},
{
"id": 125116,
"store_id": 5
},
{
"id": 125117,
"store_id": 4,
"is_learner": true
}
],
"leader": {
"id": 125116,
"store_id": 5
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 96,
"approximate_keys": 900118
},
{
"id": 126977,
"start_key": "7480000000000005FF4D5F698000000000FF000006013132332EFF3233352EFF31392EFF3233330000FD03F8FF00000009D24D8200FE",
"end_key": "7480000000000005FF4D5F698000000000FF000006013132332EFF3233352EFF313939FF2E34370000FD03F4FF000000011C577600FE",
"epoch": {
"conf_ver": 9,
"version": 881
},
"peers": [
{
"id": 126978,
"store_id": 1
},
{
"id": 126979,
"store_id": 5
},
{
"id": 126980,
"store_id": 116845
},
{
"id": 126981,
"store_id": 4,
"is_learner": true
}
],
"leader": {
"id": 126978,
"store_id": 1
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 36,
"approximate_keys": 500935
},
{
"id": 52577,
"start_key": "7480000000000005FF375F698000000000FF000006013132332EFF3233352EFF323132FF2E34390000FD0380FF000000279194A000FE",
"end_key": "7480000000000005FF375F698000000000FF000006013132332EFF3233352EFF323136FF2E32343100FE0380FF000000009FDA7B00FE",
"epoch": {
"conf_ver": 9,
"version": 820
},
"peers": [
{
"id": 52579,
"store_id": 4
},
{
"id": 52580,
"store_id": 5
},
{
"id": 118131,
"store_id": 116845
},
{
"id": 142658,
"store_id": 1,
"is_learner": true
}
],
"leader": {
"id": 52580,
"store_id": 5
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 40,
"approximate_keys": 564361
},
{
"id": 127722,
"start_key": "7480000000000005FF4D5F698000000000FF0000010132353037FF31333734FF633336FF3730653765FF3731FF393265613235FF63FF61663263376430FFFF0000000000000000FFF703B0000000124BFF624E000000000000F9",
"end_key": "7480000000000005FF4D5F698000000000FF0000010132353037FF31333734FF633336FF3730653765FF3731FF393265613235FF63FF61663263376430FFFF0000000000000000FFF703B00000001422FF0163000000000000F9",
"epoch": {
"conf_ver": 39,
"version": 888
},
"peers": [
{
"id": 127723,
"store_id": 116845
},
{
"id": 127724,
"store_id": 1
},
{
"id": 127725,
"store_id": 4
},
{
"id": 127726,
"store_id": 5,
"is_learner": true
}
],
"leader": {
"id": 127725,
"store_id": 4
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 96,
"approximate_keys": 898800
},
{
"id": 131043,
"start_key": "7480000000000005FF4D5F698000000000FF0000010132353037FF31333734FF633336FF3730653765FF3731FF393265613235FF63FF61663263376430FFFF0000000000000000FFF703D800000016C9FF03C3000000000000F9",
"end_key": "7480000000000005FF4D5F698000000000FF0000010132353037FF31333734FF633336FF3730653765FF3731FF393265613235FF63FF61663263376430FFFF0000000000000000FFF703D8000000188FFF99E7000000000000F9",
"epoch": {
"conf_ver": 90,
"version": 891
},
"peers": [
{
"id": 131044,
"store_id": 116845
},
{
"id": 131045,
"store_id": 5
},
{
"id": 131046,
"store_id": 4
},
{
"id": 131047,
"store_id": 1,
"is_learner": true
}
],
"leader": {
"id": 131046,
"store_id": 4
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 96,
"approximate_keys": 898800
},
{
"id": 125123,
"start_key": "7480000000000005FF375F698000000000FF0000060133362E35FF362E3139FF382E31FF3136000000FC0380FF0000000BB1D06D00FE",
"end_key": "7480000000000005FF375F698000000000FF0000060133362E35FF362E3230FF392E37FF3000000000FB0380FF0000002C9F64AB00FE",
"epoch": {
"conf_ver": 12,
"version": 819
},
"peers": [
{
"id": 125124,
"store_id": 1
},
{
"id": 125125,
"store_id": 116845
},
{
"id": 125126,
"store_id": 5
},
{
"id": 125127,
"store_id": 4,
"is_learner": true
}
],
"leader": {
"id": 125124,
"store_id": 1
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 70,
"approximate_keys": 970181
},
{
"id": 127387,
"start_key": "7480000000000005FF4D5F72E400000013FF1FD2490000000000FA",
"end_key": "7480000000000005FF4D5F72E400000013FFED84F80000000000FA",
"epoch": {
"conf_ver": 42,
"version": 900
},
"peers": [
{
"id": 127388,
"store_id": 116845
},
{
"id": 127389,
"store_id": 5
},
{
"id": 127390,
"store_id": 4
},
{
"id": 142497,
"store_id": 1,
"is_learner": true
}
],
"leader": {
"id": 127390,
"store_id": 4
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 113,
"approximate_keys": 459702
},
{
"id": 283457,
"start_key": "7480000000000005FF375F698000000000FF0000020144496758FF416B3144FF376271FF5650575A31FF7437FF447475717854FF4CFF6F6E704151765AFFFF0000000000000000FFF703800000001706FF53EC000000000000F9",
"end_key": "7480000000000005FF375F698000000000FF00000201444C3668FF71447466FF477166FF586A6D5663FF7375FF66556F357548FF63FF78584E41496158FFFF0000000000000000FFF703800000005044FF66D7000000000000F9",
"epoch": {
"conf_ver": 15,
"version": 819
},
"peers": [
{
"id": 283458,
"store_id": 116845
},
{
"id": 283459,
"store_id": 5
},
{
"id": 283460,
"store_id": 4
},
{
"id": 283461,
"store_id": 1,
"is_learner": true
}
],
"leader": {
"id": 283460,
"store_id": 4
},
"written_bytes": 0,
"read_bytes": 0,
"written_keys": 0,
"read_keys": 0,
"approximate_size": 99,
"approximate_keys": 928008
}
]
}
2. store id
» store 116845
{
"store": {
"id": 116845,
"address": "10.59.111.10:20160",
"state": 1,
"version": "4.0.0-rc.2",
"status_address": "10.59.111.10:20180",
"git_hash": "2fdb2804bf8ffaab4b18c4996970e19906296497",
"start_timestamp": 1591858871,
"deploy_path": "/data/tidb_deploy/tikv-20160/bin",
"last_heartbeat": 1591869043196335312,
"state_name": "Offline"
},
"status": {
"capacity": "200GiB",
"available": "180.8GiB",
"used_size": "344.8MiB",
"leader_count": 0,
"leader_weight": 1,
"leader_score": 0,
"leader_size": 0,
"region_count": 10,
"region_weight": 1,
"region_score": 809,
"region_size": 809,
"start_ts": "2020-06-11T15:01:11+08:00",
"last_heartbeat_ts": "2020-06-11T17:50:43.196335312+08:00",
"uptime": "2h49m32.196335312s"
}
}
3. Questions
1. The node I want to take offline is 116845, but I found that 116845 no longer has any leader regions.
2. Is it because learner regions exist that the replica migration does not succeed, leaving the TiKV node stuck in the Offline state?
3. How should this be resolved?
The leaders have already been migrated away; that is expected.
The node is still in the Offline state because its regions have not finished migrating.
If the TiKV nodes currently available have plenty of space and low CPU and memory load, you can consider increasing replica-schedule-limit and region-schedule-limit to allow more scheduling; they control how many replica scheduling tasks can run concurrently. A sketch follows below.
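(A sketch of the adjustment in pd-ctl; the values below are only illustrative, not recommendations:)
» config show                              // check the current limits first
» config set replica-schedule-limit 128    // illustrative value; allows more replica-scheduling tasks at once
» config set region-schedule-limit 4096    // illustrative value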
1. For manual migration, because placement rules exist, they must be disabled before the command can run; but because TiFlash nodes exist, placement rules cannot be disabled. Is there no solution?
2. operator add transfer-region
» operator add transfer-region 126977 1 4 5
Failed! [500] "transfer region is not supported when placement rules enabled"
»
3. config placement-rules disable
» config placement-rules disable
Failed to set config: [400] "cannot disable placement rules with TiFlash nodes"
»
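(The rules cannot be disabled here, but they can at least be inspected to see which ones TiFlash created; a sketch assuming pd-ctl on 4.0:)
» config placement-rules show   // list all placement rules, including the TiFlash replica rules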
We will pass the operator issue on to the team.
For the remaining 10 regions, you can adjust the parameters and keep waiting for them to transfer to other nodes,
or
manually shut down the Offline TiKV. Its state in pd-ctl will change to Disconnected, and if the node cannot be reached for 30 minutes, Raft will automatically replenish the replicas; once that completes, the node becomes Tombstone. A sketch of this follows.
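(A sketch of the manual path; the node address comes from the scale-in command above, and the 30-minute window corresponds to PD's max-store-down-time:)
>> tiup cluster stop <cluster-name> --node 10.0.1.4:9000   // stop only the offline TiKV instance
» store <store-id>          // once replicas are replenished elsewhere, state_name should become "Tombstone"
» store remove-tombstone    // then the tombstone store records can be removed from PD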
+1, roughly the same problem here. TiDB version v4.0.0.
In my case, though, I destroyed the TiKV node by force, and tikv-server on that node never came back up, so the node showed as "Down".
What I did was a scale-in followed by a scale-out.
After the scale-in, the TiKV node showed as Offline in pd-ctl. I could see operators trying to move regions, but the same few regions kept repeating, so presumably they could not be moved.
I will wait half an hour and see what happens...
A follow-up, and it is a bit awkward.
tiup display now shows that the destroyed TiKV node has disappeared, and its deploy directory was also deleted by tiup.
But pd-ctl shows the store id is still there, its state is still Offline, and region-count has not changed. Meanwhile, operators are still trying to move the corresponding regions.
The data is already gone... this should be a bug, right?
Do I now need to manually delete the corresponding store id & operators?
Hello,
That is how the normal scale-in process works. I am not sure what "already tried it" refers to; the thread only mentions adjusting the parameters and continuing to wait, or taking the node offline manually. If the manual offline failed, please describe it in detail.
1. One thing I still wonder: why can the learner never become a follower, and if it cannot become a follower, why is it not deleted automatically?
On the first point: because learner regions exist, that approach did not help.
If you mean the peers with is_learner: true, you can understand it as follows: such a peer does not take part in leader election or voting. It sits on the Offline node, so it will not become a follower unless the store's state is set back to Up via the API. Taking a node offline is driven by scheduling; the method and the ways to speed it up were covered above.
It is not on the Offline node. It was probably created earlier to receive replicas from the Offline TiKV, but it has remained in the learner state ever since.
116845 is the node being taken offline; the peer in the learner state is on store 1.
"peers": [
{
"id": 127388,
"store_id": 116845
},
{
"id": 127389,
"store_id": 5
},
{
"id": 127390,
"store_id": 4
},
{
"id": 142497,
"store_id": 1,
"is_learner": true
}
],
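(One way to watch whether PD is actually driving this learner forward, assuming pd-ctl; region 127387 is the region quoted above:)
» region 127387    // re-check the peer list; a healthy schedule eventually promotes the learner to a voter
» operator show    // any operator still working on this region would be listed here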