The IO of the HDD-based TiKV nodes has been constantly at 100%.

Could you upload the monitoring graphs for IO util and CPU / load so we can take a look?

Regarding the region count that still has not dropped to 0, please upload:

  1. The output of pd-ctl store (a command sketch follows this list).
  2. The tikv log of the offline TiKV node, so we can see what is causing this.
  3. A screenshot of the region health monitoring panel.
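
If it helps, a minimal sketch of how to pull the store output, assuming pd-ctl (or tiup ctl pd) is pointed at one of the PD endpoints; the address below is a placeholder:

# dump all store info from PD (placeholder PD address)
pd-ctl -u http://<pd-ip>:2379 store

# equivalent if pd-ctl is only available through TiUP
tiup ctl pd -u http://<pd-ip>:2379 store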

{
  "count": 5,
  "stores": [
    {
      "store": {
        "id": 90,
        "address": "172.25.10.72:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v4.0.0",
        "peer_address": "172.25.10.72:20170",
        "status_address": "172.25.10.72:20292",
        "git_hash": "c51c2c5c18860aaef3b5853f24f8e9cefea167eb",
        "start_timestamp": 1593651770,
        "deploy_path": "/data/tools/tiup/tiflash-9000/bin/tiflash",
        "last_heartbeat": 1595223356779173081,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1023GiB",
        "available": "989.7GiB",
        "used_size": "29.97KiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 0,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "start_ts": "2020-07-02T09:02:50+08:00",
        "last_heartbeat_ts": "2020-07-20T13:35:56.779173081+08:00",
        "uptime": "436h33m6.779173081s"
      }
    },
    {
      "store": {
        "id": 130237,
        "address": "172.25.10.68:20160",
        "state": 1,
        "version": "4.0.0",
        "status_address": "172.25.10.68:20180",
        "git_hash": "198a2cea01734ce8f46d55a29708f123f9133944",
        "start_timestamp": 1593765636,
        "deploy_path": "/data/tools/tiup/tikv-20160/bin",
        "last_heartbeat": 1595223358298474464,
        "state_name": "Offline"
      },
      "status": {
        "capacity": "1023GiB",
        "available": "1011GiB",
        "used_size": "493.7MiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 35,
        "region_weight": 1,
        "region_score": 2872,
        "region_size": 2872,
        "start_ts": "2020-07-03T16:40:36+08:00",
        "last_heartbeat_ts": "2020-07-20T13:35:58.298474464+08:00",
        "uptime": "404h55m22.298474464s"
      }
    },
    {
      "store": {
        "id": 2,
        "address": "172.25.10.71:20160",
        "version": "4.0.0",
        "status_address": "172.25.10.71:20180",
        "git_hash": "198a2cea01734ce8f46d55a29708f123f9133944",
        "start_timestamp": 1593651804,
        "deploy_path": "/data/tools/tiup/tikv-20160/bin",
        "last_heartbeat": 1595223354359962263,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.999TiB",
        "available": "1.973TiB",
        "used_size": "16.09GiB",
        "leader_count": 500,
        "leader_weight": 1,
        "leader_score": 500,
        "leader_size": 37214,
        "region_count": 1506,
        "region_weight": 1,
        "region_score": 111765,
        "region_size": 111765,
        "start_ts": "2020-07-02T09:03:24+08:00",
        "last_heartbeat_ts": "2020-07-20T13:35:54.359962263+08:00",
        "uptime": "436h32m30.359962263s"
      }
    },
    {
      "store": {
        "id": 7,
        "address": "172.25.10.69:20160",
        "version": "4.0.0",
        "status_address": "172.25.10.69:20180",
        "git_hash": "198a2cea01734ce8f46d55a29708f123f9133944",
        "start_timestamp": 1593651783,
        "deploy_path": "/data/tools/tiup/tikv-20160/bin",
        "last_heartbeat": 1595223349487525716,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.999TiB",
        "available": "1.973TiB",
        "used_size": "16.09GiB",
        "leader_count": 507,
        "leader_weight": 1,
        "leader_score": 507,
        "leader_size": 37694,
        "region_count": 1506,
        "region_weight": 1,
        "region_score": 111765,
        "region_size": 111765,
        "start_ts": "2020-07-02T09:03:03+08:00",
        "last_heartbeat_ts": "2020-07-20T13:35:49.487525716+08:00",
        "uptime": "436h32m46.487525716s"
      }
    },
    {
      "store": {
        "id": 1,
        "address": "172.25.10.70:20160",
        "version": "4.0.0",
        "status_address": "172.25.10.70:20180",
        "git_hash": "198a2cea01734ce8f46d55a29708f123f9133944",
        "start_timestamp": 1593651799,
        "deploy_path": "/data/tools/tiup/tikv-20160/bin",
        "last_heartbeat": 1595223350135330766,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.999TiB",
        "available": "1.973TiB",
        "used_size": "16.21GiB",
        "leader_count": 499,
        "leader_weight": 1,
        "leader_score": 499,
        "leader_size": 36857,
        "region_count": 1506,
        "region_weight": 1,
        "region_score": 111765,
        "region_size": 111765,
        "start_ts": "2020-07-02T09:03:19+08:00",
        "last_heartbeat_ts": "2020-07-20T13:35:50.135330766+08:00",
        "uptime": "436h32m31.135330766s"
      }
    }
  ]
}

hello, we would still like you to provide these two pieces of information.

Which path is the region health monitoring panel under? The logs will take a little longer; I'm still waiting for access permissions on the server.

overview - pd - region health

OK,

[2020/07/20 11:45:25.246 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418176287522160640]
[2020/07/20 11:45:25.246 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 11:55:25.261 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418176444808560640]
[2020/07/20 11:55:25.261 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:05:25.277 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418176602094960640]
[2020/07/20 12:05:25.277 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:15:25.291 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418176759381360640]
[2020/07/20 12:15:25.291 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:25:25.309 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418176916667760640]
[2020/07/20 12:25:25.309 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:35:25.325 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177073954160640]
[2020/07/20 12:35:25.325 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:45:25.340 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177231240560640]
[2020/07/20 12:45:25.340 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 12:55:25.354 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177388526960640]
[2020/07/20 12:55:25.354 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:05:25.368 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177545813360640]
[2020/07/20 13:05:25.368 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:15:25.382 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177703099760640]
[2020/07/20 13:15:25.382 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:25:25.398 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418177860386160640]
[2020/07/20 13:25:25.398 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:35:25.413 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418178017672560640]
[2020/07/20 13:35:25.413 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:45:25.429 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418178174958960640]
[2020/07/20 13:45:25.429 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 13:55:25.444 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418178332245360640]
[2020/07/20 13:55:25.445 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]
[2020/07/20 14:05:25.459 +08:00] [INFO] [gc_manager.rs:416] ["gc_worker: start auto gc"] [safe_point=418178489531760640]
[2020/07/20 14:05:25.459 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=0]

Hello,

The logs provided so far are all GC logs. Could you share the complete TiKV log? Entries other than the GC-related ones are also helpful; please upload them as an attachment.

On the scheduling side, we need the data from the pd - scheduler panel in the PD monitoring dashboards; please share that as well.

[2020/07/04 16:40:35.964 +08:00] [INFO] [pd.rs:795] ["try to transfer leader"] [to_peer="id: 262313 store_id: 2"] [from_peer="id: 262496 store_id: 130237"] [region_id=56638]
[2020/07/04 16:40:35.964 +08:00] [INFO] [peer.rs:1864] ["transfer leader"] [peer="id: 262313 store_id: 2"] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.964 +08:00] [INFO] [raft.rs:1376] ["[term 825] starts to transfer leadership to 262313"] [lead_transferee=262313] [term=825] [raft_id=262496] [region_id=56638]
[2020/07/04 16:40:35.964 +08:00] [INFO] [raft.rs:1389] ["sends MsgTimeoutNow to 262313 immediately as 262313 already has up-to-date log"] [lead_transferee=262313] [raft_id=262496] [region_id=56638]
[2020/07/04 16:40:35.965 +08:00] [INFO] [raft.rs:1003] ["received a message with higher term from 262313"] ["msg type"=MsgRequestVote] [message_term=826] [term=825] [from=262313] [raft_id=262496] [region_id=56638]
[2020/07/04 16:40:35.965 +08:00] [INFO] [raft.rs:783] ["became follower at term 826"] [term=826] [raft_id=262496] [region_id=56638]
[2020/07/04 16:40:35.965 +08:00] [INFO] [raft.rs:1192] ["[logterm: 825, index: 3991818, vote: 0] cast vote for 262313 [logterm: 825, index: 3991818] at term 826"] ["msg type"=MsgRequestVote] [term=826] [msg_index=3991818] [msg_term=825] [from=262313] [vote=0] [log_index=3991818] [log_term=825] [raft_id=262496] [region_id=56638]
[2020/07/04 16:40:35.969 +08:00] [INFO] [apply.rs:1142] ["execute admin command"] [command="cmd_type: ChangePeer change_peer { change_type: RemoveNode peer { id: 262496 store_id: 130237 } }"] [index=3991820] [term=826] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.969 +08:00] [INFO] [apply.rs:1467] ["exec ConfChange"] [epoch="conf_ver: 2581 version: 159"] [type=RemoveNode] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.969 +08:00] [INFO] [apply.rs:1575] ["remove peer successfully"] [region="id: 56638 start_key: 7480000000000001FF655F698000000000FF0000010380000000FF06150F2203800000FF0024B20A61000000FC end_key: 7480000000000001FF655F698000000000FF0000020130334843FF42303039FF303030FF3041474132FF3530FF303030353232FF00FF00000000000000F7FF038000000000ED57FF7400000000000000F8 region_epoch { conf_ver: 2581 version: 159 } peers { id: 262313 store_id: 2 } peers { id: 262496 store_id: 130237 } peers { id: 263042 store_id: 7 } peers { id: 263266 store_id: 1 }"] [peer="id: 262496 store_id: 130237"] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.969 +08:00] [INFO] [router.rs:256] ["[region 56638] shutdown mailbox"]
[2020/07/04 16:40:35.970 +08:00] [INFO] [peer.rs:1415] ["starts destroy"] [merged_by_target=false] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.970 +08:00] [INFO] [peer.rs:461] ["begin to destroy"] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:35.970 +08:00] [INFO] [pd.rs:868] ["remove peer statistic record in pd"] [region_id=56638]
[2020/07/04 16:40:36.111 +08:00] [INFO] [peer_storage.rs:1417] ["finish clear peer meta"] [takes=141.236643ms] [raft_logs=188] [raft_key=1] [apply_key=1] [meta_key=1] [region_id=56638]
[2020/07/04 16:40:36.408 +08:00] [INFO] [peer.rs:502] ["peer destroy itself"] [takes=438.212109ms] [peer_id=262496] [region_id=56638]
[2020/07/04 16:40:36.408 +08:00] [INFO] [router.rs:256] ["[region 56638] shutdown mailbox"]
[2020/07/04 16:40:36.408 +08:00] [INFO] [region.rs:473] ["register deleting data in range"] [end_key=7A7480000000000001FF655F698000000000FF0000020130334843FF42303039FF303030FF3041474132FF3530FF303030353232FF00FF00000000000000F7FF038000000000ED57FF7400000000000000F8] [start_key=7A7480000000000001FF655F698000000000FF0000010380000000FF06150F2203800000FF0024B20A61000000FC] [region_id=56638]
[2020/07/04 16:40:36.466 +08:00] [INFO] [peer.rs:181] ["replicate peer"] [peer_id=263275] [region_id=259590]

Hello, please share a screenshot of the region-related monitoring panels under Scheduler, and could you upload tikv.log as an attachment? We need more logs.

tikv.log (31.0 KB)

According to the monitoring, scheduling is running normally. Check the current region-schedule-limit value shown by pd-ctl config show, raise it moderately, and see whether that speeds up region scheduling. Observe for a while; if it still doesn't help, please share tikv.log again.
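
For reference, a hedged sketch of the pd-ctl commands involved; the PD address and the limit value 8 below are placeholders, not tuned recommendations for this cluster:

# check the current scheduling limits
pd-ctl -u http://<pd-ip>:2379 config show

# raise the number of concurrent region-scheduling operators (example value)
pd-ctl -u http://<pd-ip>:2379 config set region-schedule-limit 8

# see which operators are actually being generated
pd-ctl -u http://<pd-ip>:2379 operator show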

The problem now is that the region count keeps increasing.

Please provide the output of pd-ctl config show all and pd-ctl store.
Please also share tikv.log and pd.log so we can determine the cause.

If you want to take it offline quickly, you can simply shut down that TiKV server (kill -9 also works); after 30 min that TiKV will be kicked out of the cluster, and the meta information will be reported to PD.
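
As a side note, our understanding is that this 30-minute window corresponds to PD's max-store-down-time setting (default 30m); a sketch for checking or adjusting it, again with a placeholder PD address:

# inspect the schedule config, which includes max-store-down-time
pd-ctl -u http://<pd-ip>:2379 config show

# example: the window can be adjusted (30m is the default)
pd-ctl -u http://<pd-ip>:2379 config set max-store-down-time 30m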

hi,
Our R&D colleagues have confirmed that this issue was fixed in v4.0.2. The cause of the issue is described in the issue below.

Tool link:
https://docs.pingcap.com/zh/tidb/stable/pd-control

issue:

Related posts:
