【TiDB Environment】PoC
【TiDB Version】6.6
【Reproduction Steps】Run a full table scan and observe TiKV copTask behavior
【Problem: Symptom and Impact】copTask scans data too slowly
【Resource Configuration】All tests run single-threaded with data fully in memory, so resource limits are not a factor
In general testing I found that TiKV scans data noticeably slower than MySQL or PostgreSQL. If the LSM-tree is to blame, note that some other LSM-tree-based databases scan no slower than MySQL.
In actual testing, a single TiKV copTask doing a full table scan is about 2.6x slower than single-threaded MySQL and about 5.5x slower than single-threaded PostgreSQL (detailed measurements below).
Why is the TiKV scan this slow, and is there room for optimization in future releases?
Since all reads here hit memory, the configured block sizes do not matter much, but they are listed for reference.
The test statement is: select sum(C_NATIONKEY) from customer; forced to run as a full table scan.
All databases use the same table structure:
mysql> show create table customer \G
*************************** 1. row ***************************
Table: customer
Create Table: CREATE TABLE `customer` (
`C_CUSTKEY` bigint NOT NULL,
`C_NAME` varchar(25) NOT NULL,
`C_ADDRESS` varchar(40) NOT NULL,
`C_NATIONKEY` bigint NOT NULL,
`C_PHONE` char(15) NOT NULL,
`C_ACCTBAL` decimal(15,2) NOT NULL,
`C_MKTSEGMENT` char(10) NOT NULL,
`C_COMMENT` varchar(117) NOT NULL,
PRIMARY KEY (`C_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci
1 row in set (0.00 sec)
The TiDB test:
mysql> select version();
+--------------------+
| version() |
+--------------------+
| 5.7.25-TiDB-v6.6.0 |
+--------------------+
1 row in set (0.00 sec)
mysql> show config where name like 'rocksdb%block-size%';
+------+----------------------+------------------------------+-------+
| Type | Instance | Name | Value |
+------+----------------------+------------------------------+-------+
| tikv | 192.168.31.201:20160 | rocksdb.defaultcf.block-size | 32KiB |
| tikv | 192.168.31.201:20160 | rocksdb.lockcf.block-size | 16KiB |
| tikv | 192.168.31.201:20160 | rocksdb.raftcf.block-size | 16KiB |
| tikv | 192.168.31.201:20160 | rocksdb.writecf.block-size | 32KiB |
+------+----------------------+------------------------------+-------+
4 rows in set (0.01 sec)
-- Setting this to 1 makes it easy to time a single copTask; while a copTask runs, it keeps one CPU core at 100%.
mysql> set tidb_distsql_scan_concurrency=1;
Query OK, 0 rows affected (0.00 sec)
mysql> explain analyze select sum(C_NATIONKEY) from customer;
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
| StreamAgg_16 | 1.00 | 1 | root | | time:754.7ms, loops:2 | funcs:sum(Column#11)->Column#9 | 1.45 KB | N/A |
| └─TableReader_17 | 1.00 | 3 | root | | time:754.7ms, loops:2, cop_task: {num: 3, max: 306.3ms, min: 169ms, avg: 251.4ms, p95: 306.3ms, max_proc_keys: 603265, p95_proc_keys: 603265, tot_proc: 751ms, rpc_num: 3, rpc_time: 754.2ms, copr_cache: disabled, build_task_duration: 12.4µs, max_distsql_concurrency: 1} | data:StreamAgg_8 | 442 Bytes | N/A |
| └─StreamAgg_8 | 1.00 | 3 | cop[tikv] | | tikv_task:{proc max:305ms, min:168ms, avg: 250.3ms, p80:305ms, p95:305ms, iters:1466, tasks:3}, scan_detail: {total_process_keys: 1500000, total_process_keys_size: 305225771, total_keys: 1500003, get_snapshot_time: 76.4µs, rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 10004}}} | funcs:sum(tpch.customer.c_nationkey)->Column#11 | N/A | N/A |
| └─TableFullScan_15 | 1500000.00 | 1500000 | cop[tikv] | table:customer | tikv_task:{proc max:246ms, min:132ms, avg: 204ms, p80:246ms, p95:246ms, iters:1466, tasks:3} | keep order:false | N/A | N/A |
+----------------------------+------------+---------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------+-----------+------+
4 rows in set (0.76 sec)
From total_process_keys: 1500000, total_process_keys_size: 305225771 and total_keys: 1500003 in the plan, almost none of the scanned keys are un-GCed MVCC versions; virtually every key read is live data. From rocksdb: {key_skipped_count: 1500000, block: {cache_hit_count: 10004}} we can see that all reads are logical (in-memory) reads, 10004 block-cache hits in total.
The table scan operator is TableFullScan_15. It ran as 3 cop tasks averaging avg: 204ms each, and with tidb_distsql_scan_concurrency=1 the tasks run serially, so the scan takes about 3 × 204 ms = 612 ms in total.
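As a sanity check on these numbers, a small Python sketch (all figures are copied by hand from the explain analyze output above, not queried programmatically):

```python
# Figures from the TiDB explain analyze / scan_detail output above.
total_process_keys = 1_500_000         # rows scanned
total_process_keys_size = 305_225_771  # bytes of key/value data processed
cache_hit_count = 10_004               # RocksDB block-cache hits (logical reads)
tasks, avg_task_ms = 3, 204            # TableFullScan_15 cop tasks

# With tidb_distsql_scan_concurrency=1 the cop tasks run one after another,
# so total scan time is roughly the sum of the per-task averages.
total_scan_ms = tasks * avg_task_ms
print(total_scan_ms)  # 612

bytes_per_row = total_process_keys_size / total_process_keys
rows_per_block = total_process_keys / cache_hit_count
print(round(bytes_per_row))   # ~203 bytes per row
print(round(rows_per_block))  # ~150 rows per 32 KiB block read
```

So each 32 KiB block-cache hit covers roughly 150 rows; the scan is not bottlenecked on the number of block lookups.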
The MySQL test:
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.32 |
+-----------+
1 row in set (0.00 sec)
mysql> show variables like '%innodb_page_size%';
+------------------+-------+
| Variable_name | Value |
+------------------+-------+
| innodb_page_size | 16384 |
+------------------+-------+
1 row in set (0.00 sec)
mysql> show status like 'Innodb_data_reads';
+-------------------+---------+
| Variable_name | Value |
+-------------------+---------+
| Innodb_data_reads | 3710851 |
+-------------------+---------+
1 row in set (0.01 sec)
mysql> show status like 'Innodb_buffer_pool_read_requests';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_read_requests | 414272227 |
+----------------------------------+-----------+
1 row in set (0.00 sec)
mysql> explain analyze select sum(C_NATIONKEY) from customer;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Aggregate: sum(customer.C_NATIONKEY) (cost=234307.26 rows=1) (actual time=334.921..334.922 rows=1 loops=1)
-> Table scan on customer (cost=120530.06 rows=1137772) (actual time=0.097..237.381 rows=1500000 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.33 sec)
mysql> show status like 'Innodb_data_reads';
+-------------------+---------+
| Variable_name | Value |
+-------------------+---------+
| Innodb_data_reads | 3710851 |
+-------------------+---------+
1 row in set (0.00 sec)
mysql> show status like 'Innodb_buffer_pool_read_requests';
+----------------------------------+-----------+
| Variable_name | Value |
+----------------------------------+-----------+
| Innodb_buffer_pool_read_requests | 414310612 |
+----------------------------------+-----------+
1 row in set (0.00 sec)
The Innodb_data_reads counter reads 3710851 both before and after the statement, so no physical reads occurred.
The Innodb_buffer_pool_read_requests counter before and after the statement shows 414310612 - 414272227 = 38385 logical reads.
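The delta calculation spelled out (counter values copied from the two status snapshots above):

```python
# Innodb_buffer_pool_read_requests captured before and after the statement.
before = 414_272_227
after = 414_310_612
logical_reads = after - before
print(logical_reads)  # 38385

# Innodb_data_reads is 3710851 at both points, so no physical reads happened.
physical_reads = 3_710_851 - 3_710_851

# With 16 KiB pages, this is roughly how many rows MySQL covers per page read.
rows_per_page = 1_500_000 / logical_reads
print(round(rows_per_page))  # ~39
```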
From the execution info Table scan on customer (cost=120530.06 rows=1137772) (actual time=0.097..237.381 rows=1500000 loops=1), the table scan took 237.381 ms in total.
The PostgreSQL test:
tpch=# select version();
version
---------------------------------------------------------------------------------------------------------
PostgreSQL 15.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)
tpch=# set max_parallel_workers_per_gather=0;
SET
tpch=# set track_io_timing=on;
SET
tpch=# SELECT current_setting('block_size');
current_setting
-----------------
8192
(1 row)
tpch=# explain (analyze true,buffers true,verbose true,timing) select sum(C_NATIONKEY) from customer;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=56484.90..56484.91 rows=1 width=32) (actual time=234.525..234.526 rows=1 loops=1)
Output: sum(c_nationkey)
Buffers: shared hit=37734
-> Seq Scan on public.customer (cost=0.00..52734.72 rows=1500072 width=8) (actual time=0.012..111.046 rows=1500000 loops=1)
Output: c_custkey, c_name, c_address, c_nationkey, c_phone, c_acctbal, c_mktsegment, c_comment
Buffers: shared hit=37734
Planning Time: 0.060 ms
Execution Time: 234.556 ms
(8 rows)
The execution info shows that all reads are logical (shared hit), 37734 in total.
From the scan node Seq Scan on public.customer (cost=0.00..52734.72 rows=1500072 width=8) (actual time=0.012..111.046 rows=1500000 loops=1), the single-threaded table scan took 111.046 ms.
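Putting the three single-threaded measurements side by side (a rough comparison script; the numbers are taken from the plans above, and "bytes read" is simply logical reads times the configured block/page size):

```python
# Single-threaded full-scan results collected above:
# (scan time in ms, logical reads, block/page size in bytes)
results = {
    "tikv":  (612.0,   10_004, 32 * 1024),
    "mysql": (237.381, 38_385, 16 * 1024),
    "pg":    (111.046, 37_734,  8 * 1024),
}

tikv_ms = results["tikv"][0]
for name, (ms, reads, blk) in results.items():
    slowdown = tikv_ms / ms
    mib = reads * blk / 2**20
    print(f"{name}: {ms:.1f} ms, {reads} logical reads (~{mib:.0f} MiB), "
          f"tikv takes {slowdown:.1f}x this time")
```

So even though TiKV touches the fewest block-cache blocks, a single copTask spends about 2.6x the CPU time of MySQL and 5.5x that of PostgreSQL on the same 1.5M rows.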
With parallelism enabled (e.g. TiDB's default tidb_distsql_scan_concurrency=15), the other databases also speed up well, roughly linearly when CPU is plentiful; some hybrid row/column-store databases do even better.
A long in-memory scan time means a lot of CPU time, so processing the same volume of data requires more resources.
So my question: for the same data volume, why does a TiKV copTask spend so much longer on logical reads during a scan? Is there room for optimization in future releases?