tidb index join 强行指定连接顺序后执行时间变化的问题

各位大佬,测试下tpch 库下面sql 语句的执行性能
涉及customer表(b表)和orders表(a表)

mysql> select count() from customer;预先格式化的文本
±---------+
| count(
) |
±---------+
| 3000000 |
±---------+
1 row in set (0.41 sec)

mysql> select count() from orders;
±---------+
| count(
) |
±---------+
| 29955968 |
±---------+
1 row in set (4.38 sec)

1.首先执行了语句1,index join的连接顺序由优化器自己负责,b表是外表,a表是内表,发现执行时间在52.99s

mysql> explain analyze select /*+ INL_JOIN(a,b) */ a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                          | estRows     | actRows  | task      | access object                             | execution info                                                                                                                                                                                                                                                                                               | operator info                                                                                                                                                       | memory  | disk |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_19                | 29955968.00 | 29955968 | root      |                                           | time:53s, loops:29255, inner:{total:4m16.6s, concurrency:5, task:126, construct:1.76s, fetch:4m11.4s, build:3.42s}, probe:4.25s                                                                                                                                                                              | inner join, inner:IndexReader_18, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 55.6 MB | N/A  |
| ├─TableReader_31(Build)     | 3000000.00  | 3000000  | root      |                                           | time:93.9ms, loops:2941, cop_task: {num: 116, max: 429.3ms, min: 3.11ms, avg: 88ms, p95: 266.5ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4.35s, tot_wait: 434ms, rpc_num: 116, rpc_time: 10.2s, copr_cache: disabled, distsql_concurrency: 15}                                                | data:TableFullScan_30                                                                                                                                               | 13.0 MB | N/A  |
| │ └─TableFullScan_30        | 3000000.00  | 3000000  | cop[tikv] | table:b                                   | tikv_task:{proc max:278ms, min:0s, avg: 35.5ms, p80:50ms, p95:155ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 41.6ms, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 10583}}}          | keep order:false                                                                                                                                                    | N/A     | N/A  |
| └─IndexReader_18(Probe)     | 29955968.00 | 29955968 | root      |                                           | time:4m7.8s, loops:29678, cop_task: {num: 2635, max: 906.4ms, min: 2.16ms, avg: 119.1ms, p95: 436.1ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 2m55.8s, tot_wait: 22.4s, rpc_num: 2635, rpc_time: 5m13.7s, copr_cache: disabled, distsql_concurrency: 15}                                      | index:IndexRangeScan_17                                                                                                                                             | 12.1 KB | N/A  |
|   └─IndexRangeScan_17       | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | tikv_task:{proc max:723ms, min:0s, avg: 66ms, p80:104ms, p95:352ms, iters:39564, tasks:2635}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958329, get_snapshot_time: 403.3ms, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 16464198}}} | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false                                                                            | N/A     | N/A  |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (52.99 sec)

2.执行语句2,语句2的强行指定了连接顺序,让a表当外表,b表当内表,发现执行时间缩短了很多为27.66s

mysql>  explain analyze select /*+ INL_JOIN(b) */   a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                          | estRows     | actRows  | task      | access object                             | execution info                                                                                                                                                                                                                                                                                           | operator info                                                                                                                                                      | memory  | disk |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_12                | 29955968.00 | 29955968 | root      |                                           | time:27.6s, loops:29255, inner:{total:2m9.3s, concurrency:5, task:1179, construct:11.6s, fetch:1m57.3s, build:398.2ms}, probe:7.04s                                                                                                                                                                      | inner join, inner:TableReader_9, outer key:tpch.orders.o_custkey, inner key:tpch.customer.c_custkey, equal cond:eq(tpch.orders.o_custkey, tpch.customer.c_custkey) | 15.2 MB | N/A  |
| ├─IndexReader_20(Build)     | 29955968.00 | 29955968 | root      |                                           | time:523.5ms, loops:29300, cop_task: {num: 784, max: 250.2ms, min: 2.26ms, avg: 81.5ms, p95: 168.1ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 21.1s, tot_wait: 866ms, rpc_num: 784, rpc_time: 1m3.9s, copr_cache: disabled, distsql_concurrency: 15}                                       | index:IndexFullScan_19                                                                                                                                             | 12.3 MB | N/A  |
| │ └─IndexFullScan_19        | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | tikv_task:{proc max:115ms, min:0s, avg: 25.9ms, p80:39ms, p95:66ms, iters:32383, tasks:784}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 29956752, get_snapshot_time: 133.9ms, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 23760}}} | keep order:false                                                                                                                                                   | N/A     | N/A  |
| └─TableReader_9(Probe)      | 29955968.00 | 2001764  | root      |                                           | time:1m56.3s, loops:3529, cop_task: {num: 5146, max: 140.6ms, min: 621.7µs, avg: 25.4ms, p95: 56.4ms, max_proc_keys: 992, p95_proc_keys: 992, tot_proc: 16.5s, tot_wait: 6.31s, rpc_num: 5146, rpc_time: 2m10.6s, copr_cache: disabled, distsql_concurrency: 15}                                         | data:TableRangeScan_8                                                                                                                                              | N/A     | N/A  |
|   └─TableRangeScan_8        | 29955968.00 | 2001764  | cop[tikv] | table:b                                   | tikv_task:{proc max:75ms, min:0s, avg: 3.59ms, p80:5ms, p95:10ms, iters:17550, tasks:5146}, scan_detail: {total_process_keys: 2001764, total_process_keys_size: 407321028, total_keys: 3005092, get_snapshot_time: 129.1ms, rocksdb: {key_skipped_count: 1999798, block: {cache_hit_count: 5439074}}}    | range: decided by [tpch.orders.o_custkey], keep order:false                                                                                                        | N/A     | N/A  |
+-----------------------------+-------------+----------+-----------+-------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (27.66 sec)

这点很令人困惑,我刚开始猜想是所以即使a表数据量大,但因为a表用上了索引,不需要回表,虽然有近3000万行,但是扫描速度很快,所以a表当外表 但是执行速度更快

我就试了下第三条语句
mysql> explain analyze select /*+ INL_JOIN(b) */ a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;

mysql>  explain analyze select /*+ INL_JOIN(b) */   a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                          | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                                                                                      | memory  | disk |
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_12                | 29955968.00 | 29955968 | root      |               | time:2m58.7s, loops:29255, inner:{total:14m50.6s, concurrency:5, task:1180, construct:46.3s, fetch:13m59.3s, build:4.91s}, probe:18.3s                                                                                                                                                                                                                              | inner join, inner:TableReader_9, outer key:tpch.orders.o_custkey, inner key:tpch.customer.c_custkey, equal cond:eq(tpch.orders.o_custkey, tpch.customer.c_custkey) | 21.9 MB | N/A  |
| ├─TableReader_18(Build)     | 29955968.00 | 29955968 | root      |               | time:1.59s, loops:29317, cop_task: {num: 1025, max: 608.4ms, min: 3.23ms, avg: 158.7ms, p95: 333.4ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 1m30.4s, tot_wait: 7.71s, rpc_num: 1025, rpc_time: 2m42.6s, copr_cache: disabled, distsql_concurrency: 15}                                                                                              | data:TableFullScan_17                                                                                                                                              | 18.4 MB | N/A  |
| │ └─TableFullScan_17        | 29955968.00 | 29955968 | cop[tikv] | table:a       | tikv_task:{proc max:515ms, min:0s, avg: 84.3ms, p80:154ms, p95:228ms, iters:33322, tasks:1025}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29956993, get_snapshot_time: 1.68s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 63232, read_count: 16425, read_byte: 282.6 MB, read_time: 11.8s}}} | keep order:false                                                                                                                                                   | N/A     | N/A  |
| └─TableReader_9(Probe)      | 29955968.00 | 29744299 | root      |               | time:13m43.2s, loops:34209, cop_task: {num: 39435, max: 595.3ms, min: 657.8µs, avg: 77.6ms, p95: 211.6ms, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 21m47s, tot_wait: 8m43s, rpc_num: 39435, rpc_time: 50m57.4s, copr_cache: disabled, distsql_concurrency: 15}                                                                                           | data:TableRangeScan_8                                                                                                                                              | N/A     | N/A  |
|   └─TableRangeScan_8        | 29955968.00 | 29744299 | cop[tikv] | table:b       | tikv_task:{proc max:556ms, min:0s, avg: 33.4ms, p80:60ms, p95:133ms, iters:162048, tasks:39435}, scan_detail: {total_process_keys: 29744299, total_process_keys_size: 6052480951, total_keys: 29943155, get_snapshot_time: 2.1s, rocksdb: {key_skipped_count: 335362, block: {cache_hit_count: 141357867, read_count: 6, read_byte: 616.2 KB, read_time: 72.8ms}}}  | range: decided by [tpch.orders.o_custkey], keep order:false                                                                                                        | N/A     | N/A  |
+-----------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
5 rows in set (2 min 58.70 sec)

在第三条语句里面,a表用不上索引,整个执行时间达到了2 min 58.70 sec,很令人困惑,并且在b表的probe阶段有29744299行,但是b表整个只有300万行,这点不知道怎么解释

求大佬解释下
1.为什么语句2会比语句1快很多
我的猜想
1.a表用上了索引,不需要回表,扫描速度快
2.在index join的probe阶段,语句1需要对a表进行近3000万行表的二级索引查找数据,还不如全表300万行全表扫描速度快

2.语句3里面的执行计划,b表的probe阶段有29744299行,但是b表整个只有300万行,这点不知道怎么解释

1 个赞

背景知识:IndexJoin的其中一个优化点是查找数据时候尽量将seek转化成next顺序扫描,seek需要二分法查找可能会涉及多个block的读取,next是链表查找,完全是下一个内存寻址,效率很高。
举例:假设查找key=x,y两个值
1、如果x,y完全连续,那么tikv查找x时候用seek定位,然后尝试next查找下一键值是否为y,因为连续则发生一次next即可找到;
2、如果x,y近似连续,那么next有限的几次后还是能找到y,在有限的next扫描次数之内效率还是远高于seek;
3、如果x,y完全不连续,距离很远,tikv中应该是默认next10次找不到那么就走seek查找y这个key,也就是发生两次seek操作,读取的block数可能会比较多。

那么对于IndexJoin来说左表(我痛恨tidb执行计划上的build和probe!!!)来一批数据,这批数据根据关联等值条件得到key,并对key进行排序去重!!!,排序后形成kvRange去tikv中读取右表数据,为何要对key排序?就是要尽量将seek转为next操作,如果大多数seek操作可以转换成next操作,那么效率就会大大提升,毕竟读取的block数会大大减少。
基于上面的理论知识来分析你这个问题:

  • 为什么语句2会比语句1快很多?

    1. 最浅显的原因是语句2读取的block数远比语句1的少,所以更快。

    2. 深入分析:语句1中左表(customer b表)o_custkey是主键经过排序去重后(你这里应该是聚簇索引表)并没有过滤数据,还是需要去右表(order a表)拿所有匹配的数据,因此对于任意一个左表的key,右表几乎都能找到几个匹配的key并返回数据,查找的记录较多,产生了较多的block读取;语句2中左表(order a表)通过索引index_o_custkey(O_CUSTKEY)获取数据,但是读取索引整体上大致是有序的(虽然keeporder=false未严格排序,且一个region中key数量足够多不考虑跨region并行无序),因此左表拿到一批数据后(全局大致有序这一批数据也是大概率有序)先对key进行排序去重处理,在这个过程中可以去重大量重复key(毕竟customer.c_custkey和order.o_custkey是一对多关系),在整体上只需要找到2001764个key,左表全表扫描(走next,读取的block少),右表需要查找的key较少(远少于语句1的右表查找),所以语句2更快!

  • 为什么语句3里面的执行计划,b表的probe阶段有29744299行,但是b表整个只有300万行?
    对于语句3,左表(order a)是全表扫描,全表扫描对于聚簇索引(主键)来说大致是有序的,但是对于里面的o_custkey完全无序!因此在左表取一批数据后,找到关联条件(o_custkey),并对其进行排序去重,但因为整体无序所以落在这批数据中的o_custkey大概率无法去重,因此排序去重大概率是无用功,针对所有的key(29955968个)几乎都要去遍历右表查找相关记录(需要查找的key数为29744299,去重仅为29955968-29744299=21万),另外因为左表每批拿到的key大多数不连续就导致右表查找时候很多seek无法转换成next,产生了大量的block读,因此效率上远比语句1、语句2更差。

另外,贴执行计划这种长文本时候要排个版,不要直接贴上去,太难看了!
下面这里进行格式化排版。

1 个赞

此处或许需要加个《提问的艺术》的链接

1 个赞

利用 并对key进行排序去重!!! 的原理优化右表查询效率,我们先对order a表按照o_custkey进行排序,然后和customer b进行关联,则对customer b产生较少的记录读取,测试如下:

类似于语句3的原始语句:
mysql> explain analyze select /*+ INL_JOIN(a,b) */ a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME from orders a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
+-------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                            | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                            | operator info                                                                                                                                                               | memory   | disk |
+-------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| Projection_9                  | 15000000.00 | 15000000 | root      |               | time:41.6s, loops:14650, RU:139703.470597, Concurrency:5                                                                                                                                                                                                                                                                                  | tpch10.orders.o_orderkey, tpch10.orders.o_custkey, tpch10.customer.c_name                                                                                                   | 515.4 KB | N/A  |
| └─IndexJoin_14                | 15000000.00 | 15000000 | root      |               | time:40.2s, loops:14650, inner:{total:3m22.3s, concurrency:5, task:595, construct:23.6s, fetch:2m54.4s, build:4.34s}, probe:12.7s                                                                                                                                                                                                         | inner join, inner:TableReader_11, outer key:tpch10.orders.o_custkey, inner key:tpch10.customer.c_custkey, equal cond:eq(tpch10.orders.o_custkey, tpch10.customer.c_custkey) | 19.5 MB  | N/A  |
|   ├─TableReader_20(Build)     | 15000000.00 | 15000000 | root      |               | time:634.3ms, loops:14700, cop_task: {num: 696, max: 276.3ms, min: 643.3µs, avg: 54.7ms, p95: 144ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 11.2s, tot_wait: 1.24s, rpc_num: 696, rpc_time: 37.9s, copr_cache_hit_ratio: 0.00, build_task_duration: 36.3µs, max_distsql_concurrency: 15}                                   | data:TableFullScan_19                                                                                                                                                       | 12.3 MB  | N/A  |
|   │ └─TableFullScan_19        | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:164ms, min:0s, avg: 19.4ms, p80:38ms, p95:69ms, iters:17387, tasks:696}, scan_detail: {total_process_keys: 14996992, total_process_keys_size: 2277281755, total_keys: 14997686, get_snapshot_time: 9.31ms, rocksdb: {key_skipped_count: 14996992, block: {cache_hit_count: 78435}}}                                   | keep order:false                                                                                                                                                            | N/A      | N/A  |
|   └─TableReader_11(Probe)     | 15000000.00 | 14789709 | root      |               | time:2m42.4s, loops:16536, cop_task: {num: 18511, max: 319.9ms, min: 304.8µs, avg: 48.6ms, p95: 117.3ms, max_proc_keys: 2016, p95_proc_keys: 2016, tot_proc: 2m18.6s, tot_wait: 27.8s, rpc_num: 18511, rpc_time: 14m58.9s, copr_cache_hit_ratio: 0.00, build_task_duration: 1.46s, max_distsql_concurrency: 7, max_extra_concurrency: 3}  | data:TableRangeScan_10                                                                                                                                                      | N/A      | N/A  |
|     └─TableRangeScan_10       | 15000000.00 | 14789709 | cop[tikv] | table:b       | tikv_task:{proc max:0s, min:0s, avg: 12.5ms, p80:22ms, p95:52ms, iters:78329, tasks:18511}, scan_detail: {total_process_keys: 14789529, total_process_keys_size: 3009488955, total_keys: 14970326, get_snapshot_time: 386.7ms, rocksdb: {key_skipped_count: 332902, block: {cache_hit_count: 5801406}}}                                   | range: decided by [tpch10.orders.o_custkey], keep order:false                                                                                                               | N/A      | N/A  |
+-------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
6 rows in set (41.59 sec)

--优化后的语句
--通过sort让order a表数据按照o_custkey进行排序,然后和customer b表关联,可以看到customer b表扫描记录数已经很少了,说明利用到了"对key进行排序去重!!!",即使带着sort算子(耗时较高)的执行时间都比上面SQL执行更快,产生的block读更少。
mysql> explain analyze select /*+ INL_JOIN(a,b) */ a.O_ORDERKEY, a.O_CUSTKEY,b.C_NAME from (select * from orders a order by o_custkey ) a join customer b on a.O_CUSTKEY = b.C_CUSTKEY;
+--------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                             | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                | operator info                                                                                                                                                               | memory   | disk    |
+--------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| Projection_11                  | 15000000.00 | 15000000 | root      |               | time:36.2s, loops:14650, RU:43678.919975, Concurrency:5                                                                                                                                                                                                                                                                       | tpch10.orders.o_orderkey, tpch10.orders.o_custkey, tpch10.customer.c_name                                                                                                   | 519.7 KB | N/A     |
| └─IndexJoin_17                 | 15000000.00 | 15000000 | root      |               | time:36.1s, loops:14650, inner:{total:10s, concurrency:5, task:594, construct:4.34s, fetch:5.52s, build:176.4ms}, probe:1.81s                                                                                                                                                                                                 | inner join, inner:TableReader_14, outer key:tpch10.orders.o_custkey, inner key:tpch10.customer.c_custkey, equal cond:eq(tpch10.orders.o_custkey, tpch10.customer.c_custkey) | 11.9 MB  | N/A     |
|   ├─Sort_22(Build)             | 15000000.00 | 15000000 | root      |               | time:35.5s, loops:14654                                                                                                                                                                                                                                                                                                       | tpch10.orders.o_custkey                                                                                                                                                     | 352.0 MB | 0 Bytes |
|   │ └─TableReader_25           | 15000000.00 | 15000000 | root      |               | time:1.42s, loops:14707, cop_task: {num: 696, max: 121.1ms, min: 371.9µs, avg: 29.6ms, p95: 71.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 10.3s, tot_wait: 258.5ms, rpc_num: 696, rpc_time: 20.6s, copr_cache_hit_ratio: 0.00, build_task_duration: 48.3µs, max_distsql_concurrency: 15}                      | data:TableFullScan_24                                                                                                                                                       | 4.80 MB  | N/A     |
|   │   └─TableFullScan_24       | 15000000.00 | 15000000 | cop[tikv] | table:a       | tikv_task:{proc max:80ms, min:0s, avg: 16.5ms, p80:35ms, p95:61ms, iters:17387, tasks:696}, scan_detail: {total_process_keys: 14999008, total_process_keys_size: 2277588077, total_keys: 14999703, get_snapshot_time: 7.91ms, rocksdb: {key_skipped_count: 14999008, block: {cache_hit_count: 78449}}}                        | keep order:false                                                                                                                                                            | N/A      | N/A     |
|   └─TableReader_14(Probe)      | 15000000.00 | 1000537  | root      |               | time:5.14s, loops:1774, cop_task: {num: 2282, max: 102.9ms, min: 280µs, avg: 2.18ms, p95: 4.83ms, max_proc_keys: 992, p95_proc_keys: 992, tot_proc: 2.89s, tot_wait: 198.8ms, rpc_num: 2282, rpc_time: 4.95s, copr_cache_hit_ratio: 0.00, build_task_duration: 24.2ms, max_distsql_concurrency: 2, max_extra_concurrency: 1}  | data:TableRangeScan_13                                                                                                                                                      | N/A      | N/A     |
|     └─TableRangeScan_13        | 15000000.00 | 1000537  | cop[tikv] | table:b       | tikv_task:{proc max:0s, min:0s, avg: 1.14ms, p80:2ms, p95:3ms, iters:7599, tasks:2282}, scan_detail: {total_process_keys: 1000537, total_process_keys_size: 203587915, total_keys: 1502181, get_snapshot_time: 31.7ms, rocksdb: {key_skipped_count: 999924, block: {cache_hit_count: 1023839}}}                               | range: decided by [tpch10.orders.o_custkey], keep order:false                                                                                                               | N/A      | N/A     |
+--------------------------------+-------------+----------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
7 rows in set (36.18 sec)

上面例子只是阐述一种优化思路,比如对少量数据先做排序再join可能会起到意想不到的效果。

1 个赞

感谢大佬
有2个想法
1.如果像语句3,外表扫出来一批数据无序的话,去内表进行匹配会导致时间长,那么可否在内存中开辟一块区域,从外表来一批数据就根据关联字段排个序,然后去内表匹配,这样是否性能会好一点

2.如果像语句1,a表利用索引被匹配到数据后,由于还需要a.O_ORDERKEY ,能否也在内存中对主键ID 排个序,这样回表的时候可以多利用next,这样效率是不是高一点?

不可以,因为我们是知道业务特性后才这么优化的,因为order的custkey都可以在customer中找到,所以效率并不低。假设order表数据绝大多数在customer中找不到,那么大表做左表本来效率就慢,再手工排序更慢,排序后去customer查找数据并不一定连续,所以优化器目前设计是最通用的没有问题,这种在执行阶段才能知道的数据分布情况需要了解业务特性的前提下进行优化。如果想进一步优化就需要让优化器知道两个表关联的特性,也就是需要支持多表关联统计信息视图,这样子才可能会有更大的优化空间。

1 个赞

其实,我后来才发现语句3与语句1和语句2的语义有点不同,语句3 多了个a.O_ORDERDATE 返回字段

explain analyze   a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;

我今天把语句3的index join 调换了下顺序,下文简称语句4,在语句4里面让a表当inner表,发现执行时间在8分钟左右,同语句3相比,我认为主要时间应该是花在在对inner表里面扫描的时候,需要回表,而a表有近3000万行,这个代价巨大,

mysql>  explain analyze select /*+ INL_JOIN(a) */   a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;
+----------------------------------+-------------+----------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                               | estRows     | actRows  | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | operator info                                                                                                                                                       | memory  | disk |
+----------------------------------+-------------+----------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| IndexJoin_20                     | 29955968.00 | 29955968 | root      |                                           | time:8m1.2s, loops:29255, inner:{total:39m42.8s, concurrency:5, task:125, construct:5.63s, fetch:39m26.7s, build:10.5s}, probe:11.3s                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | inner join, inner:IndexLookUp_19, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 70.9 MB | N/A  |
| ├─TableReader_29(Build)          | 3000000.00  | 3000000  | root      |                                           | time:367.5ms, loops:2942, cop_task: {num: 116, max: 1m44.6s, min: 4.91ms, avg: 3.3s, p95: 28.3s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4m52.7s, tot_wait: 13.3s, rpc_num: 117, rpc_time: 6m21.6s, copr_cache: disabled, distsql_concurrency: 15}, backoff{tikvRPC: 84ms, regionMiss: 2ms}                                                                                                                                                                                                                                                                                                                                                                                                               | data:TableFullScan_28                                                                                                                                               | 12.2 MB | N/A  |
| │ └─TableFullScan_28             | 3000000.00  | 3000000  | cop[tikv] | table:b                                   | tikv_task:{proc max:57.8s, min:4ms, avg: 2.52s, p80:624ms, p95:27.8s, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 6.26s, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 832, read_count: 9903, read_byte: 226.6 MB, read_time: 18.8s}}}                                                                                                                                                                                                                                                                                                                                                              | keep order:false                                                                                                                                                    | N/A     | N/A  |
| └─IndexLookUp_19(Probe)          | 29955968.00 | 29955968 | root      |                                           | time:39m15.7s, loops:29445, index_task: {total_time: 27m38.9s, fetch_handle: 22m33s, build: 4.71ms, wait: 5m5.9s}, table_task: {total_time: 1h50m6.6s, num: 1900, concurrency: 5}, next: {wait_index: 4m0.1s, wait_table_lookup_build: 6.27s, wait_table_lookup_resp: 34m41.6s}                                                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                                                                                                                     | 78.3 KB | N/A  |
|   ├─IndexRangeScan_17(Build)     | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | time:22m21.2s, loops:29913, cop_task: {num: 2610, max: 59.6s, min: 2.64ms, avg: 821ms, p95: 1.97s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 26m49.2s, tot_wait: 4m18.1s, rpc_num: 2692, rpc_time: 38m34.8s, copr_cache: disabled, distsql_concurrency: 15}, backoff{regionMiss: 82ms, pdRPC: 679ms, tikvRPC: 710ms}, tikv_task:{proc max:59.3s, min:0s, avg: 616.5ms, p80:585ms, p95:1.54s, iters:39469, tasks:2610}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958313, get_snapshot_time: 7.22s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 19160979, read_count: 19599, read_byte: 235.6 MB, read_time: 4m46.7s}}}             | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false                                                                            | N/A     | N/A  |
|   └─TableRowIDScan_18(Probe)     | 29955968.00 | 29955968 | cop[tikv] | table:a                                   | time:1h48m4.5s, loops:31426, cop_task: {num: 96899, max: 1m45s, min: 532.3µs, avg: 700.1ms, p95: 1.21s, max_proc_keys: 655, p95_proc_keys: 429, tot_proc: 13h47m4.2s, tot_wait: 2h26m40.7s, rpc_num: 98507, rpc_time: 20h9m0.5s, copr_cache: disabled, distsql_concurrency: 15}, backoff{tikvRPC: 2.47s, regionMiss: 2.19s, pdRPC: 267ms}, tikv_task:{proc max:59.7s, min:0s, avg: 512.4ms, p80:308ms, p95:938ms, iters:338924, tasks:96899}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29972690, get_snapshot_time: 1m29.3s, rocksdb: {key_skipped_count: 33458, block: {cache_hit_count: 115909056, read_count: 77952, read_byte: 1.34 GB, read_time: 27m18.6s}}}  | keep order:false                                                                                                                                                    | N/A     | N/A  |
+----------------------------------+-------------+----------+-----------+-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
6 rows in set (8 min 1.24 sec)

TiDB回表代价是很高的,我认为比传统数据库高不少。
看指标为了找到千万级别的记录需要扫描亿级别的Block。
total_keys: 29972690
cache_hit_count: 115909056

1、本来回表代价就相当高(不管分布式还是传统数据库)。

2、回表代价:

  1. 传统堆表回表代价最低,因为RID是物理的,且优化扫可以在一个BLOCK中找到多个RID(我记得可以,待确认,如此一来回表查找多个记录可能只需要远比RID个数少的BLOCK数)。

  2. 传统B+TREE(聚簇索引)结构表,RID是逻辑的(因为有序),但B+TREE对Key的查找代价相比LSMTree更低,需要扫描 的BLOCK数更少。

  3. LSMTree结构表,RID是逻辑的(因为有序),通过主键key回表查找行记录查找效率不如B+Tree,这里每次查找都要读取4个block,代价相当高(这里tidb回表是否会对key进行排序去重待确认)。

感谢大佬回复
1.有个想法不知道是否成熟,在tidb 里面,如果利用二级索引range 扫描回表的话,扫描范围越大,那么得到的row id 就越离散,那就是回表的时候,可能很多的用seek 而不是next,如果是否有可能回表之前对rowid 排个序,这样就可以回表的时候尽量next

2.至于语句4,如果只指定走index join ,tidb的表现是a表当inner表,不是很理解这一点(当然如果不指定hint,tidb走hash join,速度更快)

mysql> explain analyze select /*+ INL_JOIN(a,b) */   a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;
+----------------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                               | estRows     | actRows  | task      | access object                             | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | operator info                                                                                                                                                       | memory   | disk |
+----------------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexJoin_20                     | 29955968.00 | 29955968 | root      |                                           | time:5m11.9s, loops:29255, inner:{total:25m27.8s, concurrency:5, task:126, construct:2.48s, fetch:25m20.4s, build:4.88s}, probe:10.4s                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | inner join, inner:IndexLookUp_19, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 69.1 MB  | N/A  |
| ├─TableReader_31(Build)          | 3000000.00  | 3000000  | root      |                                           | time:168ms, loops:2942, cop_task: {num: 116, max: 6.03s, min: 2.46ms, avg: 312.7ms, p95: 1.23s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 23.5s, tot_wait: 5.25s, rpc_num: 116, rpc_time: 36.3s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                                                                                                                                                                                                                                                                                                                                          | data:TableFullScan_30                                                                                                                                               | 13.0 MB  | N/A  |
| │ └─TableFullScan_30             | 3000000.00  | 3000000  | cop[tikv] | table:b                                   | tikv_task:{proc max:5.75s, min:0s, avg: 199.1ms, p80:235ms, p95:717ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 2.4s, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 7844, read_count: 3203, read_byte: 71.6 MB, read_time: 2.59s}}}                                                                                                                                                                                                                                                                                                                           | keep order:false                                                                                                                                                    | N/A      | N/A  |
| └─IndexLookUp_19(Probe)          | 29955968.00 | 29955968 | root      |                                           | time:25m16.2s, loops:29445, index_task: {total_time: 21m0.6s, fetch_handle: 19m11.8s, build: 4.73ms, wait: 1m48.7s}, table_task: {total_time: 1h0m25.6s, num: 1902, concurrency: 5}, next: {wait_index: 3m21.6s, wait_table_lookup_build: 1.34s, wait_table_lookup_resp: 21m51.1s}                                                                                                                                                                                                                                                                                                                                                                                                   |                                                                                                                                                                     | 100.3 KB | N/A  |
|   ├─IndexRangeScan_17(Build)     | 29955968.00 | 29955968 | cop[tikv] | table:a, index:index_o_custkey(O_CUSTKEY) | time:19m10.5s, loops:29913, cop_task: {num: 2639, max: 5.57s, min: 2.9ms, avg: 603ms, p95: 1.99s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 18m52.7s, tot_wait: 4m20.6s, rpc_num: 2639, rpc_time: 26m31.3s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:5.06s, min:0s, avg: 428.6ms, p80:767ms, p95:1.67s, iters:39563, tasks:2639}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958335, get_snapshot_time: 5.7s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 28190101, read_count: 4508, read_byte: 40.3 MB, read_time: 33.7s}}}                                         | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false                                                                            | N/A      | N/A  |
|   └─TableRowIDScan_18(Probe)     | 29955968.00 | 29955968 | cop[tikv] | table:a                                   | time:1h0m6.3s, loops:31425, cop_task: {num: 97002, max: 4.45s, min: 529.9µs, avg: 364.6ms, p95: 1.54s, max_proc_keys: 656, p95_proc_keys: 429, tot_proc: 6h49m25.4s, tot_wait: 2h22m11.3s, rpc_num: 97032, rpc_time: 9h49m33.1s, copr_cache: disabled, distsql_concurrency: 15}, backoff{regionMiss: 56ms}, tikv_task:{proc max:4.03s, min:0s, avg: 253.6ms, p80:452ms, p95:1.26s, iters:339294, tasks:97002}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29972627, get_snapshot_time: 8.46s, rocksdb: {key_skipped_count: 33331, block: {cache_hit_count: 176784267, read_count: 15276, read_byte: 272.8 MB, read_time: 32.4s}}}  | keep order:false                                                                                                                                                    | N/A      | N/A  |
+----------------------------------+-------------+----------+-----------+-------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
6 rows in set (5 min 11.96 sec)

而MySQL的执行计划显示,a表是当outer表

mysql>  explain  analyze select   a.O_ORDERKEY, a.O_CUSTKEY,a.O_ORDERDATE,b.C_NAME  from orders a  join customer b on  a.O_CUSTKEY = b.C_CUSTKEY;

+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                         |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=18038574.80 rows=29529477) (actual time=0.093..388788.022 rows=30000000 loops=1)
    -> Table scan on a  (cost=3488289.09 rows=29529477) (actual time=0.075..270060.577 rows=30000000 loops=1)
    -> Single-row index lookup on b using PRIMARY (C_CUSTKEY=a.O_CUSTKEY)  (cost=0.39 rows=1) (actual time=0.004..0.004 rows=1 loops=30000000)
 |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (6 min 32.82 sec)

1点就是我上面说的第三点待确认的地方。
2点表的连接顺序join reorder在逻辑优化阶段已经确认,且是基于规则的。存在手工指定leading连表顺序后最终执行计划成本更低但是优化器不走的情况。为什么mysql没问题,tidb有问题,我猜测mysql的join reorder算法更牛掰一些吧。

这种有办法优化吗?

如果说的是这个问题,那么join reorder可以在产品上进一步优化,但是这是一个大工程。但是我们可以先退而求其次让用户自己指定想要的连表顺序,但是也是不太容易,参考帖子:Hint中表连接顺序LEADING应支持多表关联优先级和兼容物理优化阶段Join算法的选择

1 个赞

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。