TiKV 的 CPU 使用率飙升至 5000%,并且导致瓶颈。

【TiDB 使用环境】生产环境
【TiDB 版本】v8.5.0
【部署方式】
一个 TiDB 集群,包含 3 个 TiKV、3 个 TiDB 和 3 个 TiPD
【遇到的问题:问题现象及影响】
想请问一下大佬们,我有一个 TiDB 集群,包含 3 个 TiKV、3 个 TiDB 和 3 个 TiPD。在创建索引后,我为每个表创建了复合索引来优化查询,但 TiKV 的 CPU 使用率飙升至 5000%,并且成为瓶颈。有人能帮我看看发生了什么吗?就在创建索引后,我有大约 80+ 个表,包含数百万条记录。
在 TiKV 的日志中,我看到:
[2025/04/09 22:31:07.907 +02:00] [WARN] [tracker.rs:477] ["query deadline exceeded"] [tag=select] [table_id=5694] [txn_start_ts=457239352570281992] [total_suspend_time=59.758977247s] [total_process_time=201.754712ms] [handler_build_time=26.677827ms] [wait_time.snapshot=160.312µs] [wait_time.schedule=2.281756ms] [wait_time=2.442068ms] [total_lifetime=59.991388354s] [remote_host=ipv4:10.0.1.41:55140] [region_id=51473] [session_alias=] [connection_id=4034922050] [current_stage=Tracked] [thread_id=76]

[2025/04/09 22:32:13.027 +02:00] [WARN] [prewrite.rs:833] ["commit_ts is too large, fallback to normal 2PC"] [lock="Lock { lock_type: Put, primary_key: 7480000000000016425F72013137343432333036FF36305F3078326632FF6132353433623736FF6134313636353439FF6637616162326537FF3562656630616566FF6335623066000000FC, start_ts: TimeStamp(457239619760553993), ttl: 3009, short_value: 007D0180000100000008010000, for_update_ts: TimeStamp(457239619760553993), txn_size: 2, min_commit_ts: TimeStamp(457239619760553995), use_async_commit: true, secondaries: [], rollback_ts: [], last_change: Unknown, txn_source: 0, is_locked_with_conflict: false, generation: 0 }"] [max_commit_ts=457239620287201289] [min_commit_ts=457239621254512659] [start_ts=457239619760553993] [key=7480000000000016FF425F698000000000FF0000020380000000FF67F6D90401313734FF3432333036FF3630FF5F3078326632FF61FF32353433623736FFFF6134313636353439FFFF66376161623265FF37FF356265663061FF6566FF6335623066FF000000FC00000000FB] [thread_id=132]
[2025/04/09 22:32:13.285 +02:00] [INFO] [server.rs:1466] ["ignore updating disk status as no reserve space is set"] [thread_id=16]



在应用以下索引之前,一切运行正常,没有任何问题。

SELECT
 vid,
 link,
 position_side,
 price,
 event_timestamp
FROM
gmx_v1_trades t USE INDEX(_idx_token_status_vid_link)
WHERE
 t.index_token = '0x82af49447d8a07e3bd95bd0d56f35241523fbab1'
 AND t.status = 'OPEN'
 AND t.vid < 233191
 AND NOT EXISTS (
  SELECT
   1
  FROM
gmx_v1_trades t2 USE INDEX(_idx_token_status_vid_link)
  WHERE
   t2.index_token = '0x82af49447d8a07e3bd95bd0d56f35241523fbab1'
   AND t2.status IN ('CLOSE', 'LIQUIDATED')
   AND t2.vid < 233191
   AND t2.link = t.link
 );

在添加索引之前,之前的相同查询运行得非常顺利。


执行计划:

2025/04/09 23:20:20.551 +02:00] [INFO] [raft.rs:1354] ["[logterm: 124, index: 2232, vote: 11046] ignored vote from 11048 [logterm: 124, index: 2232]: lease is not expired"] ["msg type"=MsgRequestPreVote] ["remaining ticks"=10] [term=124] [msg_index=2232] [msg_term=124] [from=11048] [vote=11046] [log_index=2232] [log_term=124] [raft_id=11046] [region_id=11045] [thread_id=372]
[2025/04/09 23:20:20.552 +02:00] [ERROR] [kv.rs:1264] ["KvService response batch commands fail"] [err="\"SendError(..)\""] [thread_id=52]
[2025/04/09 23:20:20.797 +02:00] [INFO] [tracker.rs:288] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=20480] [perf_stats.block_read_byte=0] [perf_stats.block_read_count=0] [perf_stats.block_cache_hit_count=81920] [scan.range.first="Some(start: 74800000000000163E5F72013137303833383033FF30305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC end: 74800000000000163E5F72013137303833383033FF30305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC00)"] [scan.ranges=20480] [scan.total=40960] [scan.processed_size=5672960] [scan.processed=20480] [scan.is_desc=false] [tag=select] [table_id=5694] [txn_start_ts=457240294537822210] [total_suspend_time=48.994807302s] [total_process_time=1.558600783s] [handler_build_time=40.714364ms] [wait_time.snapshot=114.709µs] [wait_time.schedule=4.363315ms] [wait_time=4.478024ms] [total_lifetime=50.598635593s] [remote_host=ipv4:10.0.1.40:46118] [region_id=51473] [query_digest=dc6e1b76f82e659324fd1714a850d04f79cc09d7d55ba15cad0ce4c45ba8e0d0] [session_alias=] [connection_id=1679822608] [thread_id=72]
[2025/04/09 23:20:21.253 +02:00] [INFO] [server.rs:1466] ["ignore updating disk status as no reserve space is set"] [thread_id=17]
[2025/04/09 23:20:21.287 +02:00] [INFO] [tracker.rs:288] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=20480] [perf_stats.block_read_byte=0] [perf_stats.block_read_count=0] [perf_stats.block_cache_hit_count=81920] [scan.range.first="Some(start: 74800000000000163E5F72013137303932303633FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC end: 74800000000000163E5F72013137303932303633FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC00)"] [scan.ranges=20480] [scan.total=40960] [scan.processed_size=5672960] [scan.processed=20480] [scan.is_desc=false] [tag=select] [table_id=5694] [txn_start_ts=457240265164324879] [total_suspend_time=48.222402959s] [total_process_time=1.511320089s] [handler_build_time=11.573768ms] [wait_time.snapshot=93.446µs] [wait_time.schedule=1.965887ms] [wait_time=2.059333ms] [total_lifetime=49.747381319s] [remote_host=ipv4:10.0.1.41:47306] [region_id=51473] [query_digest=dc6e1b76f82e659324fd1714a850d04f79cc09d7d55ba15cad0ce4c45ba8e0d0] [session_alias=] [connection_id=1807748826] [thread_id=34]
[2025/04/09 23:20:21.292 +02:00] [ERROR] [kv.rs:1264] ["KvService response batch commands fail"] [err="\"SendError(..)\""] [thread_id=34]
[2025/04/09 23:20:22.120 +02:00] [INFO] [tracker.rs:288] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=20480] [perf_stats.block_read_byte=0] [perf_stats.block_read_count=0] [perf_stats.block_cache_hit_count=81920] [scan.range.first="Some(start: 74800000000000163E5F72013137303935363531FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC end: 74800000000000163E5F72013137303935363531FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC00)"] [scan.ranges=20480] [scan.total=40960] [scan.processed_size=5672960] [scan.processed=20480] [scan.is_desc=false] [tag=select] [table_id=5694] [txn_start_ts=457240265164324879] [total_suspend_time=48.93361844s] [total_process_time=1.583331462s] [handler_build_time=10.524522ms] [wait_time.snapshot=94.488µs] [wait_time.schedule=2.67309ms] [wait_time=2.767578ms] [total_lifetime=50.530278465s] [remote_host=ipv4:10.0.1.41:47306] [region_id=51473] [query_digest=dc6e1b76f82e659324fd1714a850d04f79cc09d7d55ba15cad0ce4c45ba8e0d0] [session_alias=] [connection_id=1807748826] [thread_id=82]
[2025/04/09 23:20:22.134 +02:00] [ERROR] [kv.rs:1264] ["KvService response batch commands fail"] [err="\"SendError(..)\""] [thread_id=82]
[2025/04/09 23:20:22.254 +02:00] [INFO] [server.rs:1466] ["ignore updating disk status as no reserve space is set"] [thread_id=17]
[2025/04/09 23:20:22.593 +02:00] [INFO] [tracker.rs:288] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=20448] [perf_stats.block_read_byte=0] [perf_stats.block_read_count=0] [perf_stats.block_cache_hit_count=81792] [scan.range.first="Some(start: 74800000000000163E5F72013137303931333333FF36305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC end: 74800000000000163E5F72013137303931333333FF36305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC00)"] [scan.ranges=20448] [scan.total=40896] [scan.processed_size=5664096] [scan.processed=20448] [scan.is_desc=false] [tag=select] [table_id=5694] [txn_start_ts=457240265164324879] [total_suspend_time=49.357758s] [total_process_time=1.532267802s] [handler_build_time=14.745113ms] [wait_time.snapshot=111.292µs] [wait_time.schedule=3.34951ms] [wait_time=3.460802ms] [total_lifetime=50.908280874s] [remote_host=ipv4:10.0.1.41:47306] [region_id=51473] [query_digest=dc6e1b76f82e659324fd1714a850d04f79cc09d7d55ba15cad0ce4c45ba8e0d0] [session_alias=] [connection_id=1807748826] [thread_id=33]
[2025/04/09 23:20:22.603 +02:00] [ERROR] [kv.rs:1264] ["KvService response batch commands fail"] [err="\"SendError(..)\""] [thread_id=33]
[2025/04/09 23:20:22.989 +02:00] [INFO] [tracker.rs:288] [slow-query] [perf_stats.internal_delete_skipped_count=0] [perf_stats.internal_key_skipped_count=20480] [perf_stats.block_read_byte=0] [perf_stats.block_read_count=0] [perf_stats.block_cache_hit_count=81920] [scan.range.first="Some(start: 74800000000000163E5F72013137303931303637FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC end: 74800000000000163E5F72013137303931303637FF32305F3078383261FF6634393434376438FF6130376533626439FF3562643064353666FF3335323431353233FF6662616231000000FC00)"] [scan.ranges=20480] [scan.total=40960] [scan.processed_size=5672960] [scan.processed=20480] [scan.is_desc=false] [tag=select] [table_id=5694] [txn_start_ts=457240265164324879] [total_suspend_time=49.62618207s] [total_process_time=1.563951383s] [handler_build_time=23.449178ms] [wait_time.snapshot=123.086µs] [wait_time.schedule=3.468849ms] [wait_time=3.591935ms] [total_lifetime=51.217213223s] [remote_host=ipv4:10.0.1.41:47306] [region_id=51473] [query_digest=dc6e1b76f82e659324fd1714a850d04f79cc09d7d55ba15cad0ce4c45ba8e0d0] [session_alias=] [connection_id=1807748826] [thread_id=34]
[2025/04/09 23:20:22.995 +02:00] [ERROR] [kv.rs:1264] ["KvService response batch commands fail"] [err="\"SendError(..)\""] [thread_id=34]

我的磁盘空间使用正常。感觉问题应该跟查询无关,因为 CPU 高峰仅持续了 2 到 3 个小时,之后又恢复正常,而我并没有更改任何查询。在 Grafana 中,我可以看到 600% 的 gRPC poll CPU 使用率。

发下sql的执行计划

1 个赞

索引?
执行计划?都发下吧

1 个赞

大概率是SQL执行变慢引起,需要查看下执行计划

1 个赞

看下sql的执行计划吧。

1 个赞

此话题已在最后回复的 7 天后被自动关闭。不再允许新回复。