【 TiDB 使用环境】生产环境
【 TiDB 版本】V6.5.9
【复现路径】对于一个大表,其中有一个列MicorSeconds,数据类型为bigint 类型,其建有索引。在短时间内客户端启动了大量的SQL查询,但是每次 MicorSeconds 范围不重复
【遇到的问题:问题现象及影响】部分请求耗时很长,但是数据量很小。
【资源配置】
慢SQL 总览
发现很多慢SQL的最大耗时tikv节点都是
执行计划如下:
| id | estRows | estCost | actRows | task | access object | execution info | operator info | memory | disk |
| Projection_7 | 110340.45 | 216893236.14 | 8192 | root | | time:9m7.7s, loops:9, Concurrency:5 | event_live_notnull.arkham.rtid, event_live_notnull.arkham.datetime, event_live_notnull.arkham.microsecondsinceepoch, event_live_notnull.arkham.date, event_live_notnull.arkham.s1, event_live_notnull.arkham.s2, event_live_notnull.arkham.co_mention_count | 471.7 KB | N/A |
| └─Limit_14 | 110340.45 | 216816152.30 | 8192 | root | | time:9m7.7s, loops:9 | offset:0, count:10000000 | N/A | N/A |
| └─Projection_49 | 110340.45 | 216816152.30 | 8192 | root | | time:9m7.7s, loops:9, Concurrency:5 | event_live_notnull.arkham.rtid, event_live_notnull.arkham.datetime, event_live_notnull.arkham.microsecondsinceepoch, event_live_notnull.arkham.date, event_live_notnull.arkham.co_mention_count, event_live_notnull.arkham.s1, event_live_notnull.arkham.s2 | 597.5 KB | N/A |
| └─IndexLookUp_48 | 110340.45 | 216739068.47 | 8768 | root | | time:9m7.7s, loops:9, index_task: {total_time: 3m0.2s, fetch_handle: 28.4ms, build: 8.33ms, wait: 3m0.1s}, table_task: {total_time: 33m14.9s, num: 7, concurrency: 5}, next: {wait_index: 3ms, wait_table_lookup_build: 395.6µs, wait_table_lookup_resp: 9m7.7s} | | 4.94 MB | N/A |
| ├─Selection_47(Build) | 110340.45 | 30587252.49 | 61531 | cop[tikv] | | time:25.6ms, loops:63, cop_task: {num: 9, max: 9.96ms, min: 659.1µs, avg: 3.56ms, p95: 9.96ms, max_proc_keys: 23131, p95_proc_keys: 23131, tot_proc: 21ms, rpc_num: 9, rpc_time: 31.9ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, tikv_task:{proc max:8ms, min:0s, avg: 2.67ms, p80:8ms, p95:8ms, iters:95, tasks:9}, scan_detail: {total_process_keys: 61531, total_process_keys_size: 3384205, total_keys: 61540, get_snapshot_time: 111.4µs, rocksdb: {key_skipped_count: 61531, block: {cache_hit_count: 100}}} | not(isnull(event_live_notnull.arkham.rtid)) | N/A | N/A |
| │ └─IndexRangeScan_45 | 110340.45 | 25081264.05 | 61531 | cop[tikv] | table:Arkham, index:MicroSecond-RTId(MicroSecondSinceEpoch, RTId) | tikv_task:{proc max:8ms, min:0s, avg: 2.67ms, p80:8ms, p95:8ms, iters:95, tasks:9} | range:[1692979200000000,1693238399999999], keep order:true | N/A | N/A |
| └─TableRowIDScan_46(Probe) | 110340.45 | 33765367.58 | 34971 | cop[tikv] | table:Arkham | time:33m14.8s, loops:42, cop_task: {num: 5, max: 3m0.2s, min: 48.4ms, avg: 1m24.1s, p95: 3m0.2s, max_proc_keys: 20448, p95_proc_keys: 20448, tot_proc: 3m59.8s, tot_wait: 116ms, rpc_num: 28, rpc_time: 24m59.9s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 15}, backoff{tikvRPC: 13.1s, regionMiss: 146ms}, tikv_task:{proc max:1m0s, min:48ms, avg: 48s, p80:1m0s, p95:1m0s, iters:57, tasks:5}, scan_detail: {total_process_keys: 34971, total_process_keys_size: 4756056, total_keys: 37981, get_snapshot_time: 94.9µs, rocksdb: {key_skipped_count: 32417, block: {cache_hit_count: 42320}}} | keep order:false | N/A | N/A |
发现有读热点
分析leader region 在各个节点上的分布不均匀。
但是对应时段的 qps 监控,比如 10.0.3.46这个结点,tikv coprocessor 高达21K, 但是其 只有5个leader 节点
leader分布最多的节点,并没有很高的QPS
对应的region key 的数量上也没有太集中
表结构
[2024-11-12 18:03:07] mysql> desc Arkham;
+-----------------------+-------------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------+-------------+------+------+---------+-------+
| RTId | int(11) | YES | MUL | NULL | |
| DateTime | varchar(40) | NO | | NULL | |
| MicroSecondSinceEpoch | bigint(20) | NO | MUL | NULL | |
| date | int(11) | YES | MUL | NULL | |
| updateTime | varchar(40) | YES | | NULL | |
| co_mention_count | int(11) | YES | | NULL | |
| s1 | varchar(20) | YES | | NULL | |
| s2 | varchar(20) | YES | | NULL | |
+-----------------------+-------------+------+------+---------+-------+
8 rows in set (0.00 sec)
mysql>
某个SQL, 其他SQL 的 MicorSeconds 的范围不一样:
[2024-11-12 18:04:50] SELECT
`RTId`,
`DateTime`,
`MicroSecondSinceEpoch`,
`date`,
`s1`,
`s2`,
`co_mention_count`
FROM
Arkham
WHERE
MicroSecondSinceEpoch >= 1663776000000000
AND MicroSecondSinceEpoch <= 1663862399999999
AND ! isnull(RTId)
ORDER BY
MicroSecondSinceEpoch
LIMIT
10000000;
【附件:截图/日志/监控】