Concurrent queries slow down other queries

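TiDBer_99x37Qii · January 13, 2025, 02:39

【TiDB environment】Production
【TiDB version】
【Steps to reproduce】Load test on a list endpoint (single-table query), about 3 million rows
【Problem: symptoms and impact】Other queries slow down from about 1s to about 50s
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page
【Attachments: screenshots/logs/monitoring】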

| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk |
| StreamAgg_10 | 1.00 | 91893354.59 | 1 | root | | time:44.9s, loops:2 | funcs:count(0)->Column#97 | 11 KB | N/A |
| └─IndexReader_22 | 0.00 | 91893304.69 | 2338928 | root | | time:44.8s, loops:2329, cop_task: {num: 86, max: 4.33s, min: 707.4µs, avg: 1.62s, p95: 4.11s, max_proc_keys: 405311, p95_proc_keys: 108512, tot_proc: 3.18s, tot_wait: 34.6ms, copr_cache_hit_ratio: 0.00, build_task_duration: 25.6µs, max_distsql_concurrency: 5}, rpc_info:{Cop:{num_rpc:86, total_time:2m19.4s}} | index:Selection_21 | 794.3 KB | N/A |
| └─Selection_21 | 0.00 | 1378399507.03 | 2338928 | cop[tikv] | | tikv_task:{proc max:4.33s, min:0s, avg: 1.62s, p80:2.89s, p95:4.1s, iters:3704, tasks:86}, scan_detail: {total_process_keys: 3443272, total_process_keys_size: 406306096, total_keys: 3443358, get_snapshot_time: 1.32ms, rocksdb: {key_skipped_count: 3443272, block: {cache_hit_count: 10560}}}, time_detail: {total_process_time: 3.18s, total_suspend_time: 2m16s, total_wait_time: 34.6ms, total_kv_read_wall_time: 2m18.9s, tikv_wall_time: 2m19.3s} | ge(haini-reseller.reseller_hjs_order_archive.order_time, 2024-05-01 00:00:00.000000), lt(haini-reseller.reseller_hjs_order_archive.order_time, 2024-09-08 23:59:59.000000) | N/A | N/A |
| └─IndexFullScan_20 | 3443272.00 | 1034760961.43 | 3443272 | cop[tikv] | table:h, index:tenant_id(tenant_id, order_time) | tikv_task:{proc max:4.33s, min:0s, avg: 1.62s, p80:2.88s, p95:4.1s, iters:3704, tasks:86} | keep order:false | N/A | N/A |

TiDBer_99x37Qii · January 13, 2025, 02:40

Has TiKV hit a bottleneck?

有你就好 · January 13, 2025, 04:55

If nothing else works, add more nodes.

BrianLiu · January 13, 2025, 05:57

Most likely the unified read pool is saturated and running tasks are piling up, which slows down queries across the board.

TiDBer_小杰 · January 13, 2025, 06:03

Scale out TiKV.

zhanggame1 · January 13, 2025, 06:04

In principle, 3 million rows is not a lot of data.

TiDBer_99x37Qii · January 13, 2025, 06:11

Increase the read pool?

residentevil · January 13, 2025, 06:41

I have run into this case a lot. Under high concurrency the load on TiKV goes up, and sometimes the SQL execution plan goes wrong.

h5n1 · January 13, 2025, 07:05

Check the Overview → TiKV leader panel to see whether there are any leader drops. The disk bandwidth looks quite high.

TiDBer_Lisjaper · January 13, 2025, 07:07

In Grafana, look under TiKV-Details → Thread CPU to see which TiKV instance is maxed out.

春风十里不如你 · January 13, 2025, 07:17

An IndexFullScan over 3 million+ rows will be slow under high concurrency. Check TiKV's CPU resources and its thread pool configuration.
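
To put rough numbers on that scan, here is a minimal sketch that only does arithmetic on figures copied from the execution plan posted above. The variable names are made up for illustration, and the "packed" estimate assumes cop tasks are spread evenly over the concurrency slots, which may not hold in practice.

```python
# Figures copied from the EXPLAIN ANALYZE output earlier in this thread.
scanned_keys = 3_443_272      # IndexFullScan_20 actRows (total_process_keys)
returned_rows = 2_338_928     # Selection_21 actRows
scanned_bytes = 406_306_096   # total_process_keys_size
cop_tasks = 86                # cop_task num
avg_task_seconds = 1.62       # cop_task avg
distsql_concurrency = 5       # max_distsql_concurrency

# Share of scanned index entries that survive the order_time filter.
selectivity = returned_rows / scanned_keys

# Average size of one scanned index entry, in bytes.
avg_key_bytes = scanned_bytes / scanned_keys

# Idealized lower bound on wall-clock time if the cop tasks were packed
# perfectly across the concurrency slots (an assumption, not a measurement).
packed_seconds = cop_tasks * avg_task_seconds / distsql_concurrency

print(f"selectivity: {selectivity:.2%}")
print(f"average entry size: {avg_key_bytes:.0f} bytes")
print(f"packed lower bound: {packed_seconds:.1f}s")
```

Each execution reads the whole index and keeps only about two thirds of it, so every additional concurrent query repeats the same full scan. The idealized bound is also well below the 44.8s reported by IndexReader_22, which is consistent with the cop tasks spending much of their time not being processed.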

BrianLiu · January 13, 2025, 07:20

Check the corresponding Grafana metrics first.

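有猫万事足 · January 13, 2025, 10:53

How many CPU cores do the TiKV machines have?

The unified read pool is probably saturated.

time_detail: {total_process_time: 3.18s, total_suspend_time: 2m16s, total_wait_time: 34.6ms, total_kv_read_wall_time: 2m18.9s, tikv_wall_time: 2m19.3s}

The actual processing time is short (total_process_time: 3.18s), but the time spent waiting is huge (total_suspend_time: 2m16s). If the machines still have spare cores, increase the unified read pool concurrency.

See

https://docs.pingcap.com/zh/tidb/stable/grafana-tikv-dashboard#thread-cpu

and in that panel look at

Unified read pool CPU: the CPU utilization of the unified read pool threads

If there is CPU that is not being used, tune

https://docs.pingcap.com/zh/tidb/stable/tikv-configuration-file#max-thread-count

this parameter.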
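
The comparison above can be reproduced with a small sketch that parses the time_detail line quoted in this post and computes how the TiKV wall time splits up. The helper names (parse_duration, parse_time_detail) are hypothetical, not part of any TiDB tooling, and the figures are sums over all 86 cop tasks, so the shares are only a rough indicator.

```python
import re

# Seconds per unit for the Go-style duration strings shown in the plan.
_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0,
                 "ms": 1e-3, "µs": 1e-6, "us": 1e-6, "ns": 1e-9}
# Two-letter units come first so "ms" is not read as "m" followed by "s".
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|µs|us|ns|h|m|s)")


def parse_duration(text: str) -> float:
    """Convert a string such as '2m18.9s' or '34.6ms' to seconds."""
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNIT_SECONDS[unit] for num, unit in parts)


def parse_time_detail(line: str) -> dict[str, float]:
    """Extract 'name: duration' pairs from a time_detail fragment."""
    pairs = re.findall(r"(\w+):\s*([0-9][0-9.a-zµ]*)", line)
    return {name: parse_duration(value) for name, value in pairs}


# Copied verbatim from the execution plan in this thread.
TIME_DETAIL = (
    "time_detail: {total_process_time: 3.18s, total_suspend_time: 2m16s, "
    "total_wait_time: 34.6ms, total_kv_read_wall_time: 2m18.9s, "
    "tikv_wall_time: 2m19.3s}"
)

detail = parse_time_detail(TIME_DETAIL)
wall = detail["tikv_wall_time"]
suspend_share = detail["total_suspend_time"] / wall
process_share = detail["total_process_time"] / wall

print(f"suspended: {suspend_share:.1%}, processing: {process_share:.1%}")
```

With these numbers, roughly 98% of the TiKV wall time is suspend time and only about 2% is processing, which is the pattern this post reads as a saturated unified read pool.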

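kang · January 13, 2025, 12:59

Change the table schema so that an auto-increment ID is not used as the primary key. Consider making the primary key a varchar and configuring SHARD_ROW_ID_BITS to scatter hotspots and improve concurrent write throughput.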

TiDBer_LwfCbcGm · January 14, 2025, 00:11

We often run into slow concurrent queries too. We deal with it by adding or tuning indexes and by partitioning.

TiDBer_99x37Qii · January 16, 2025, 08:49

It was probably the unified read pool being saturated.

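oceanzhang · January 16, 2025, 09:08

First figure out where the machine is slow: IO or CPU. If the CPU is the bottleneck, it is a logic problem. If IO is slow, dig further: is it a memory problem, or is a suitable index missing?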

oceanzhang · January 16, 2025, 09:08

One last thing: you can't look at this problem in isolation. 3 million rows is not a lot of data, unless the concurrency is sky-high.

system · January 24, 2025, 06:18

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.