Large gap between TiKV's disk-read statistics and the actual execution time when disk reads occur

【TiDB environment】PoC
【TiDB version】v7.0.0

For a TPC-H 100 query: select /*+ read_from_storage(tikv[a]) */ sql_no_cache count(*) from customer a where C_ACCTBAL < -10000;

When RocksDB performs disk reads the query takes about 10 seconds; when it does not, it takes about 5 seconds. Yet the disk-read time shown in the execution plan is only a few hundred milliseconds. The numbers don't add up, so where is the time actually spent?

--Disable TiDB's coprocessor cache
mysql> show config where name like 'tikv-client.copr-cache.capacity-mb';
+------+---------------------+------------------------------------+-------+
| Type | Instance            | Name                               | Value |
+------+---------------------+------------------------------------+-------+
| tidb | 192.168.31.201:4000 | tikv-client.copr-cache.capacity-mb | 0     |
+------+---------------------+------------------------------------+-------+
1 row in set (0.01 sec)

--Set the distsql scan concurrency to 1 to avoid a task backlog in TiKV's coprocessor.
mysql> set tidb_distsql_scan_concurrency=1;
Query OK, 0 rows affected (0.01 sec)

--readpool.coprocessor.low-concurrency=2, but since tidb_distsql_scan_concurrency=1 no backlog will build up.
mysql> show config where name like 'readpool.coprocessor%' and type='tikv';
+------+----------------------+--------------------------------------------------+-------+
| Type | Instance             | Name                                             | Value |
+------+----------------------+--------------------------------------------------+-------+
| tikv | 192.168.31.201:20160 | readpool.coprocessor.high-concurrency            | 9     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.low-concurrency             | 2     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-high   | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-low    | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.max-tasks-per-worker-normal | 2000  |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.normal-concurrency          | 2     |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.stack-size                  | 10MiB |
| tikv | 192.168.31.201:20160 | readpool.coprocessor.use-unified-pool            | false |
+------+----------------------+--------------------------------------------------+-------+
8 rows in set (0.01 sec)

--First run, with sql_no_cache to avoid loading the data into the block cache. Heavy physical reads occurred: read_byte: 1.13 GB, read_time: 282.8ms
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */ sql_no_cache count(*) from customer a where C_ACCTBAL <  - 10000;
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                        | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:10.2s, loops:2, RRU:49975.958141, WRU:0.000000                                                                                                                                                                                                                                                                                                                   | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:10.2s, loops:1, cop_task: {num: 33, max: 453.4ms, min: 255.6ms, avg: 309.6ms, p95: 356.1ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 10.2s, tot_wait: 2.61ms, rpc_num: 33, rpc_time: 10.2s, copr_cache: disabled, build_task_duration: 46.6µs, max_distsql_concurrency: 1}                                                                        | data:StreamAgg_9                       | 380 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:453ms, min:255ms, avg: 308.5ms, p80:316ms, p95:355ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 1.58ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 170, read_count: 99848, read_byte: 1.13 GB, read_time: 282.8ms}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:453ms, min:254ms, avg: 308ms, p80:316ms, p95:355ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                        | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:411ms, min:230ms, avg: 282.4ms, p80:291ms, p95:327ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (10.22 sec)
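A quick sanity check on the numbers in this plan (a rough back-of-the-envelope sketch using only figures copied from the output above): with tidb_distsql_scan_concurrency=1 the 33 cop tasks run one after another, so rpc_time should be close to the sum of the per-task times, and the reported disk read_time is only a tiny slice of it.

```python
# Back-of-the-envelope check of the first (cold) run.
# All figures are copied from the execution plan above.
tasks = 33
avg_cop_task_s = 0.3096   # cop_task avg: 309.6ms
rpc_time_s = 10.2         # rpc_time: 10.2s
read_time_s = 0.2828      # rocksdb block read_time: 282.8ms

# With tidb_distsql_scan_concurrency=1 the tasks run serially,
# so total RPC time should be roughly tasks * average task time.
serial_estimate_s = tasks * avg_cop_task_s
print(f"serial estimate: {serial_estimate_s:.2f}s vs rpc_time: {rpc_time_s}s")

# Disk reads account for only a small share of that time.
print(f"read_time share of rpc_time: {read_time_s / rpc_time_s:.1%}")
```

So the serial-execution model explains the 10.2s total, but read_time covers under 3% of it.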

--Second run, without sql_no_cache. Because the previous run used sql_no_cache, its data was not placed into the block cache (to avoid polluting it), so this run also incurred heavy physical reads: read_byte: 1.13 GB, read_time: 316.5ms.
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */  count(*) from customer a where C_ACCTBAL <  - 10000;
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                        | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:11.2s, loops:2, RRU:50306.220368, WRU:0.000000                                                                                                                                                                                                                                                                                                                   | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:11.2s, loops:1, cop_task: {num: 33, max: 480ms, min: 286.3ms, avg: 339.9ms, p95: 391.7ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 11.2s, tot_wait: 5.53ms, rpc_num: 33, rpc_time: 11.2s, copr_cache: disabled, build_task_duration: 55.8µs, max_distsql_concurrency: 1}                                                                          | data:StreamAgg_9                       | 381 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:479ms, min:284ms, avg: 338.8ms, p80:351ms, p95:389ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 4.08ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 171, read_count: 99847, read_byte: 1.13 GB, read_time: 316.5ms}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:478ms, min:284ms, avg: 338.2ms, p80:351ms, p95:388ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:441ms, min:268ms, avg: 311.9ms, p80:325ms, p95:335ms, iters:14656, tasks:33}                                                                                                                                                                                                                                                                      | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (11.22 sec)

--Third run: no physical reads at all, total time 5.79 seconds, about 5 seconds less than before. But the only earlier difference was the physical reads, everything else was identical, and the reported physical-read time was a mere 316ms, far less than 5 seconds.
mysql> explain analyze select /*+ read_from_storage(tikv[a]) */  count(*) from customer a where C_ACCTBAL <  - 10000;
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| id                           | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                | operator info                          | memory    | disk |
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
| StreamAgg_18                 | 1.00        | 1        | root      |               | time:5.78s, loops:2, RRU:48498.923087, WRU:0.000000                                                                                                                                                                                                                                                           | funcs:count(Column#11)->Column#9       | 388 Bytes | N/A  |
| └─TableReader_19             | 1.00        | 0        | root      |               | time:5.78s, loops:1, cop_task: {num: 33, max: 268ms, min: 126.8ms, avg: 175.2ms, p95: 196ms, max_proc_keys: 650432, p95_proc_keys: 448553, tot_proc: 5.75s, tot_wait: 2.9ms, rpc_num: 33, rpc_time: 5.78s, copr_cache: disabled, build_task_duration: 87.7µs, max_distsql_concurrency: 1}                     | data:StreamAgg_9                       | 357 Bytes | N/A  |
|   └─StreamAgg_9              | 1.00        | 0        | cop[tikv] |               | tikv_task:{proc max:266ms, min:126ms, avg: 174.2ms, p80:187ms, p95:195ms, iters:14656, tasks:33}, scan_detail: {total_process_keys: 15000000, total_process_keys_size: 3052270577, total_keys: 15000033, get_snapshot_time: 1.65ms, rocksdb: {key_skipped_count: 15000000, block: {cache_hit_count: 100018}}} | funcs:count(1)->Column#11              | N/A       | N/A  |
|     └─Selection_12           | 0.00        | 0        | cop[tikv] |               | tikv_task:{proc max:266ms, min:126ms, avg: 173.8ms, p80:187ms, p95:193ms, iters:14656, tasks:33}                                                                                                                                                                                                              | lt(tpch100.customer.c_acctbal, -10000) | N/A       | N/A  |
|       └─TableFullScan_11     | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:231ms, min:105ms, avg: 150.5ms, p80:161ms, p95:174ms, iters:14656, tasks:33}                                                                                                                                                                                                              | keep order:false                       | N/A       | N/A  |
+------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------+-----------+------+
5 rows in set (5.79 sec)

rpc_num, rpc_time: the total count and total duration of Cop-type RPC requests sent to TiKV.
It looks like your time is all spent in rpc_time.

The full table scan itself already takes twice as long, so the RPC time does too, since RPC time is just the total time spent on the TiKV side. The scan slowed down by a factor of two with the same number of keys scanned; the only change is that disk I/O occurred, so the slowdown appears to come from disk I/O. But the read_time in the execution plan is only a few hundred milliseconds, so the numbers don't match.
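The per-task arithmetic backs this up (a rough sketch using the plan figures above; where exactly the uncounted time goes is an assumption, not something the plans prove): the cold run's average task time is nearly double the warm run's, and that gap summed over all 33 tasks matches the overall ~4.4s difference, while read_time covers only a few hundred milliseconds of it.

```python
# Compare the cold (disk-read) run and the warm (fully cached) run above.
tasks = 33
cold_avg_task_s = 0.3085   # tikv_task avg in the cold run: 308.5ms
warm_avg_task_s = 0.1742   # tikv_task avg in the warm run: 174.2ms
read_time_s = 0.2828       # rocksdb read_time in the cold run: 282.8ms

gap_s = tasks * (cold_avg_task_s - warm_avg_task_s)
print(f"per-task gap summed over all tasks: {gap_s:.2f}s")  # close to 10.2s - 5.78s

# read_time explains only a fraction of that gap; the remainder is spent
# inside task processing when blocks are served from outside the block
# cache and is not attributed to the disk-read counter.
print(f"not covered by read_time: {gap_s - read_time_s:.2f}s")
```

In other words, roughly 4 seconds of the slowdown lives inside the cop tasks but is not counted as read_time, which is exactly the mismatch being asked about.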

The average time of each of your TiKV tasks dropped by about half; that is the time saved by having the data cached. The time spent filtering the data can't be saved either way.

I searched with an AI tool, and this is what it returned:

According to the TiKV documentation on troubleshooting read hotspots, you can use TiKV monitoring to determine whether heavy read traffic is driving up CPU on a given TiKV instance, specifically via Thread CPU -> Unified Read Pool and TiKV -> MBps in the TiKV-Details dashboard. If the Unified Read Pool shows a clear CPU anomaly on one TiKV instance while that instance's MBps shows read traffic peaking, the instance has a read hotspot.

In addition, TiKV provides some statistics related to disk reads, such as tikv_import_write_chunk_bytes, tikv_import_write_chunk_duration, tikv_import_upload_chunk_bytes, tikv_import_upload_chunk_duration, and tikv_import_range_delivery_duration, which can help you understand TiKV's disk-read behavior. However, if the disk-read statistics differ greatly from the actual execution time, you may need to further investigate read hotspots on the TiKV instance.


The AI's reply still lacks much of the judgment needed here.

We'll have to wait for more internal metrics and better observability to clear this up.