TiKV CPU is very high and will not come down

  • [TiDB version]: 4.0.2
  • [Problem description]: TiKV CPU will not come down, and the leaders are concentrated on the first and second nodes; the third node has no leaders.

Please use pd-ctl to run config show all and store show all, and post the output.

[tidb@iot-feisuo-tidb-003 ~]$ tiup ctl pd -u 172.16.5.13:2379 config show all
Starting component ctl: /home/tidb/.tiup/components/ctl/v4.0.2/ctl pd -u 172.16.5.13:2379 config show all
{
  "client-urls": "http://0.0.0.0:2379",
  "peer-urls": "http://172.16.5.8:2380",
  "advertise-client-urls": "http://172.16.5.8:2379",
  "advertise-peer-urls": "http://172.16.5.8:2380",
  "name": "pd-172.16.5.8-2379",
  "data-dir": "/data/tidb/pd-2379",
  "force-new-cluster": false,
  "enable-grpc-gateway": true,
  "initial-cluster": "pd-172.16.5.7-2379=http://172.16.5.7:2380,pd-172.16.5.8-2379=http://172.16.5.8:2380,pd-172.16.5.13-2379=http://172.16.5.13:2380",
  "initial-cluster-state": "new",
  "join": "",
  "lease": 3,
  "log": {
    "level": "",
    "format": "text",
    "disable-timestamp": false,
    "file": {
      "filename": "/data/server/tidb/pd-2379/log/pd.log",
      "max-size": 300,
      "max-days": 0,
      "max-backups": 0
    },
    "development": false,
    "disable-caller": false,
    "disable-stacktrace": false,
    "disable-error-verbose": true,
    "sampling": null
  },
  "tso-save-interval": "3s",
  "metric": {
    "job": "pd-172.16.5.8-2379",
    "address": "",
    "interval": "15s"
  },
  "schedule": {
    "max-snapshot-count": 3,
    "max-pending-peer-count": 16,
    "max-merge-region-size": 20,
    "max-merge-region-keys": 200000,
    "split-merge-interval": "1h0m0s",
    "enable-one-way-merge": "false",
    "enable-cross-table-merge": "false",
    "patrol-region-interval": "100ms",
    "max-store-down-time": "30m0s",
    "leader-schedule-limit": 4,
    "leader-schedule-policy": "count",
    "region-schedule-limit": 2048,
    "replica-schedule-limit": 64,
    "merge-schedule-limit": 8,
    "hot-region-schedule-limit": 4,
    "hot-region-cache-hits-threshold": 3,
    "store-limit": {
      "1": {
        "add-peer": 15,
        "remove-peer": 15
      },
      "4": {
        "add-peer": 15,
        "remove-peer": 15
      },
      "5": {
        "add-peer": 15,
        "remove-peer": 15
      }
    },
    "tolerant-size-ratio": 0,
    "low-space-ratio": 0.8,
    "high-space-ratio": 0.7,
    "scheduler-max-waiting-operator": 5,
    "enable-remove-down-replica": "true",
    "enable-replace-offline-replica": "true",
    "enable-make-up-replica": "true",
    "enable-remove-extra-replica": "true",
    "enable-location-replacement": "true",
    "enable-debug-metrics": "false",
    "schedulers-v2": [
      {
        "type": "balance-region",
        "args": null,
        "disable": false,
        "args-payload": ""
      },
      {
        "type": "balance-leader",
        "args": null,
        "disable": false,
        "args-payload": ""
      },
      {
        "type": "hot-region",
        "args": null,
        "disable": false,
        "args-payload": ""
      },
      {
        "type": "label",
        "args": null,
        "disable": false,
        "args-payload": ""
      },
      {
        "type": "evict-leader",
        "args": [
          "4"
        ],
        "disable": false,
        "args-payload": ""
      }
    ],
    "schedulers-payload": {
      "balance-hot-region-scheduler": "null",
      "balance-leader-scheduler": "{\"name\":\"balance-leader-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
      "balance-region-scheduler": "{\"name\":\"balance-region-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
      "evict-leader-scheduler": "{\"store-id-ranges\":{\"4\":[{\"start-key\":\"\",\"end-key\":\"\"}]}}",
      "label-scheduler": "{\"name\":\"label-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}"
    },
    "store-limit-mode": "manual"
  },
  "replication": {
    "max-replicas": 3,
    "location-labels": "",
    "strictly-match-label": "false",
    "enable-placement-rules": "false"
  },
  "pd-server": {
    "use-region-storage": "true",
    "max-gap-reset-ts": "24h0m0s",
    "key-type": "table",
    "runtime-services": "",
    "metric-storage": "",
    "dashboard-address": "http://172.16.5.13:2379"
  },
  "cluster-version": "4.0.2",
  "quota-backend-bytes": "8GiB",
  "auto-compaction-mode": "periodic",
  "auto-compaction-retention-v2": "1h",
  "TickInterval": "500ms",
  "ElectionInterval": "3s",
  "PreVote": true,
  "security": {
    "cacert-path": "",
    "cert-path": "",
    "key-path": "",
    "cert-allowed-cn": null
  },
  "label-property": {},
  "WarningMsgs": null,
  "DisableStrictReconfigCheck": false,
  "HeartbeatStreamBindInterval": "1m0s",
  "LeaderPriorityCheckInterval": "1m0s",
  "dashboard": {
    "tidb_cacert_path": "",
    "tidb_cert_path": "",
    "tidb_key_path": "",
    "public_path_prefix": "",
    "internal_proxy": false,
    "disable_telemetry": false
  },
  "replication-mode": {
    "replication-mode": "majority",
    "dr-auto-sync": {
      "label-key": "",
      "primary": "",
      "dr": "",
      "primary-replicas": 0,
      "dr-replicas": 0,
      "wait-store-timeout": "1m0s",
      "wait-sync-timeout": "1m0s"
    }
  }
}

Starting component ctl: /home/tidb/.tiup/components/ctl/v4.0.2/ctl pd -u 172.16.5.13:2379 store show all
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "172.16.5.81:20160",
        "version": "4.0.2",
        "status_address": "172.16.5.81:20180",
        "git_hash": "98ee08c587ab47d9573628aba6da741433d8855c",
        "start_timestamp": 1600960152,
        "deploy_path": "/data/server/tidb/tikv-20160/bin",
        "last_heartbeat": 1601015160888427554,
        "state_name": "Up"
      },
      "status": {
        "capacity": "984.2GiB",
        "available": "548.6GiB",
        "used_size": "416.9GiB",
        "leader_count": 20903,
        "leader_weight": 1,
        "leader_score": 20903,
        "leader_size": 1949030,
        "region_count": 41852,
        "region_weight": 1,
        "region_score": 3896984,
        "region_size": 3896984,
        "start_ts": "2020-09-24T23:09:12+08:00",
        "last_heartbeat_ts": "2020-09-25T14:26:00.888427554+08:00",
        "uptime": "15h16m48.888427554s"
      }
    },
    {
      "store": {
        "id": 4,
        "address": "172.16.5.83:20160",
        "version": "4.0.2",
        "status_address": "172.16.5.83:20180",
        "git_hash": "98ee08c587ab47d9573628aba6da741433d8855c",
        "start_timestamp": 1595749794,
        "deploy_path": "/data/server/tidb/tikv-20160/bin",
        "last_heartbeat": 1601015155678579259,
        "state_name": "Up"
      },
      "status": {
        "capacity": "984.2GiB",
        "available": "547.1GiB",
        "used_size": "416.9GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 41852,
        "region_weight": 1,
        "region_score": 3896984,
        "region_size": 3896984,
        "start_ts": "2020-07-26T15:49:54+08:00",
        "last_heartbeat_ts": "2020-09-25T14:25:55.678579259+08:00",
        "uptime": "1462h36m1.678579259s"
      }
    },
    {
      "store": {
        "id": 5,
        "address": "172.16.5.82:20160",
        "version": "4.0.2",
        "status_address": "172.16.5.82:20180",
        "git_hash": "98ee08c587ab47d9573628aba6da741433d8855c",
        "start_timestamp": 1600960302,
        "deploy_path": "/data/server/tidb/tikv-20160/bin",
        "last_heartbeat": 1601015154106737264,
        "state_name": "Up"
      },
      "status": {
        "capacity": "984.2GiB",
        "available": "547.7GiB",
        "used_size": "417GiB",
        "leader_count": 20923,
        "leader_weight": 1,
        "leader_score": 20923,
        "leader_size": 1947954,
        "region_count": 41852,
        "region_weight": 1,
        "region_score": 3896984,
        "region_size": 3896984,
        "start_ts": "2020-09-24T23:11:42+08:00",
        "last_heartbeat_ts": "2020-09-25T14:25:54.106737264+08:00",
        "uptime": "15h14m12.106737264s"
      }
    }
  ]
}

Store 4 has an evict-leader-scheduler configured. This prevents leaders from being scheduled onto that node.

"schedulers-payload": {
  "balance-hot-region-scheduler": "null",
  "balance-leader-scheduler": "{\"name\":\"balance-leader-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
  "balance-region-scheduler": "{\"name\":\"balance-region-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}",
  "evict-leader-scheduler": "{\"store-id-ranges\":{\"4\":[{\"start-key\":\"\",\"end-key\":\"\"}]}}",
  "label-scheduler": "{\"name\":\"label-scheduler\",\"ranges\":[{\"start-key\":\"\",\"end-key\":\"\"}]}"

You can remove it with pd-ctl by running scheduler remove evict-leader-scheduler.
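
For readers who want to spot this condition without reading the full output by eye, here is a minimal Python sketch. It assumes the two pd-ctl outputs have been saved as JSON; the inline samples are trimmed copies of the output posted in this thread, and the helper names (evicted_store_ids, leaderless_stores) are hypothetical, not part of any PD tooling.

```python
import json

# Trimmed copies of the pd-ctl output from this thread (only the fields used here).
CONFIG_JSON = """
{"schedule": {"schedulers-v2": [
  {"type": "balance-leader", "args": null, "disable": false},
  {"type": "evict-leader", "args": ["4"], "disable": false}
]}}
"""
STORES_JSON = """
{"count": 3, "stores": [
  {"store": {"id": 1, "address": "172.16.5.81:20160"}, "status": {"leader_count": 20903}},
  {"store": {"id": 4, "address": "172.16.5.83:20160"}, "status": {"leader_count": 0}},
  {"store": {"id": 5, "address": "172.16.5.82:20160"}, "status": {"leader_count": 20923}}
]}
"""


def evicted_store_ids(config):
    """Return the store IDs targeted by an enabled evict-leader scheduler entry."""
    ids = set()
    for sched in config["schedule"]["schedulers-v2"]:
        if sched["type"] == "evict-leader" and not sched["disable"]:
            ids.update(int(arg) for arg in (sched["args"] or []))
    return ids


def leaderless_stores(stores):
    """Return the IDs of stores that currently hold zero leaders."""
    return [
        s["store"]["id"]
        for s in stores["stores"]
        if s["status"].get("leader_count", 0) == 0
    ]


config = json.loads(CONFIG_JSON)
stores = json.loads(STORES_JSON)
evicted = evicted_store_ids(config)

for store_id in leaderless_stores(stores):
    if store_id in evicted:
        reason = "evict-leader scheduler is active"
    else:
        reason = "no evict-leader scheduler found"
    print(f"store {store_id}: 0 leaders ({reason})")
```

With the sample data above, the only store reported is store 4, which matches the diagnosis in this reply.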

Could you give me the complete command? This is how I have been viewing the stores: tiup ctl pd -u 172.16.5.13:2379 store show all

You can refer to the official documentation:
https://docs.pingcap.com/zh/tidb/stable/pd-control#scheduler-show--add--remove--pause--resume--config

Run tiup ctl pd -u xxx -i to enter interactive mode,
then run scheduler remove evict-leader-scheduler-{store_id}

tiup ctl pd -u 172.16.5.13:2379 -i to enter interactive mode
scheduler remove evict-leader-scheduler-4
Is this right?

Is this what caused the high CPU on the first two TiKV nodes? After I restarted PD and TiDB, the CPU went back to normal.

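Earlier I ran one large query that UNIONed 200 tables at once. It never returned a result, so I terminated it. After that the CPU went up and stayed up. In monitoring, the async apply CPU and unified read pool CPU panels showed very heavy CPU consumption at that time. Doesn't TiDB release this on its own? Or is there a setting I should change, or a way to release it manually?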

tiup ctl pd -u 172.16.5.13:2379 -i to enter interactive mode
scheduler remove evict-leader-scheduler-4

Yes, that is right.
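
As a small illustration, the Python sketch below assembles the same command for a given PD address and store ID. It assumes the scheduler subcommand can be passed directly on the tiup ctl pd command line, the same way config show all and store show all are run earlier in this thread; the helper name remove_evict_leader_argv is hypothetical. The sketch only prints the command and does not execute it.

```python
import shlex


def remove_evict_leader_argv(pd_addr, store_id):
    """Build the argv list for removing the evict-leader scheduler of one store."""
    scheduler_name = f"evict-leader-scheduler-{int(store_id)}"
    return ["tiup", "ctl", "pd", "-u", pd_addr, "scheduler", "remove", scheduler_name]


argv = remove_evict_leader_argv("172.16.5.13:2379", 4)
# Print the command for review before running it by hand.
print(shlex.join(argv))
```

Running store show all again afterwards should show whether leader_count on store 4 starts to rise as balance-leader moves leaders back.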

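Queries over many tables can drive TiKV CPU up. When the client aborts the SQL, the coprocessor requests that TiDB has already sent to TiKV are not terminated immediately. Terminating the SQL only stops TiDB from sending further coprocessor requests to TiKV; the requests already sent to TiKV still have to run to completion.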
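
The behavior described above can be pictured with a toy Python model. This is not TiDB or TiKV code: the thread pool stands in for TiKV, the loop stands in for TiDB dispatching coprocessor requests, and every name (run_query, coprocessor_task, kill_after) is made up for illustration. Setting the kill flag stops new requests from being sent, while requests that were already sent still finish.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_query(num_tasks, kill_after):
    """Dispatch up to num_tasks tasks; abort the query after kill_after were sent."""
    killed = threading.Event()
    completed = []
    lock = threading.Lock()

    def coprocessor_task(task_id):
        time.sleep(0.01)  # simulated work on the storage side
        with lock:
            completed.append(task_id)

    sent = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        for task_id in range(num_tasks):
            if killed.is_set():
                break  # the aborted query sends no further requests
            pool.submit(coprocessor_task, task_id)
            sent += 1
            if sent == kill_after:
                killed.set()  # the client aborts the query here
    # Leaving the with-block waits for every task that was already submitted.
    return sent, sorted(completed)


sent, completed = run_query(num_tasks=200, kill_after=5)
print(f"sent={sent}, completed after abort={len(completed)}")
```

In this model the abort keeps 195 tasks from ever being sent, but the 5 that were in flight still consume worker time until they finish.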