tidb exists 语句优化器的执行效率问题

tidb version: 6.5.3
在研究exists 语句执行计划的时候,发现优化器在不添加SEMI_JOIN_REWRITE() hint 情况下,默认的执行计划相比较来说比较慢,详情如下
1.未添加SEMI_JOIN_REWRITE hint

mysql> explain analyze select * from customer where exists (select   1 from orders where orders.O_CUSTKEY = customer.C_CUSTKEY  and orders.O_TOTALPRICE >30);
+----------------------------------+-------------+----------+-----------+------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                               | estRows     | actRows  | task      | access object                                  | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | operator info                                                                                                                                                      | memory   | disk |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexHashJoin_16                 | 2400000.00  | 1999956  | root      |                                                | time:5m37.9s, loops:1957, inner:{total:27m57.4s, concurrency:5, task:126, construct:1.82s, fetch:27m47.9s, build:294.1ms, join:7.75s}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | semi join, inner:IndexLookUp_13, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 137.4 MB | N/A  |
| ├─TableReader_33(Build)          | 3000000.00  | 3000000  | root      |                                                | time:979.7ms, loops:2942, cop_task: {num: 116, max: 3.3s, min: 4.55ms, avg: 649.1ms, p95: 2.47s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 56.7s, tot_wait: 9.45s, rpc_num: 116, rpc_time: 1m15.3s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                                                                                                                                                                                                                                                                                                                                          | data:TableFullScan_32                                                                                                                                              | 90.1 MB  | N/A  |
| │ └─TableFullScan_32             | 3000000.00  | 3000000  | cop[tikv] | table:customer                                 | tikv_task:{proc max:2.94s, min:0s, avg: 467.9ms, p80:698ms, p95:2.01s, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 3.07s, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 1411, read_count: 9472, read_byte: 211.6 MB, read_time: 10.5s}}}                                                                                                                                                                                                                                                                                                                            | keep order:false                                                                                                                                                   | N/A      | N/A  |
| └─IndexLookUp_13(Probe)          | 29955968.00 | 29955968 | root      |                                                | time:27m45.1s, loops:29444, index_task: {total_time: 22m12.2s, fetch_handle: 18m45.8s, build: 3.17ms, wait: 3m26.4s}, table_task: {total_time: 1h15m55s, num: 1902, concurrency: 5}, next: {wait_index: 2m37.1s, wait_table_lookup_build: 1.22s, wait_table_lookup_resp: 25m5s}                                                                                                                                                                                                                                                                                                                                                                                                         |                                                                                                                                                                    | 109.6 KB | N/A  |
|   ├─IndexRangeScan_10(Build)     | 29955968.00 | 29955968 | cop[tikv] | table:orders, index:index_o_custkey(O_CUSTKEY) | time:18m44.8s, loops:29919, cop_task: {num: 2647, max: 4.39s, min: 2.48ms, avg: 639ms, p95: 2.24s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 21m49.5s, tot_wait: 4m13.4s, rpc_num: 2647, rpc_time: 28m11.3s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:4.16s, min:0s, avg: 494.9ms, p80:913ms, p95:2.01s, iters:39604, tasks:2647}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958343, get_snapshot_time: 6.19s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 24018903, read_count: 19551, read_byte: 175.0 MB, read_time: 3m20.4s}}}                                      | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false                                                                           | N/A      | N/A  |
|   └─Selection_12(Probe)          | 29955968.00 | 29955968 | cop[tikv] |                                                | time:1h15m40s, loops:31412, cop_task: {num: 97001, max: 4.74s, min: 434µs, avg: 537.9ms, p95: 2.74s, max_proc_keys: 653, p95_proc_keys: 429, tot_proc: 11h51m24.1s, tot_wait: 2h11m28.2s, rpc_num: 97081, rpc_time: 14h30m5.2s, copr_cache: disabled, distsql_concurrency: 15}, backoff{regionMiss: 150ms}, tikv_task:{proc max:4.13s, min:0s, avg: 440.4ms, p80:666ms, p95:2.57s, iters:339368, tasks:97001}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29972506, get_snapshot_time: 11.7s, rocksdb: {key_skipped_count: 33090, block: {cache_hit_count: 176283602, read_count: 671195, read_byte: 11.3 GB, read_time: 10m41.3s}}}  | gt(tpch.orders.o_totalprice, 30)                                                                                                                                   | N/A      | N/A  |
|     └─TableRowIDScan_11          | 29955968.00 | 29955968 | cop[tikv] | table:orders                                   | tikv_task:{proc max:4.13s, min:0s, avg: 440.3ms, p80:666ms, p95:2.57s, iters:339368, tasks:97001}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | keep order:false                                                                                                                                                   | N/A      | N/A  |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
7 rows in set (5 min 37.97 sec)

语句的执行时间大约在5min,从执行计划来看,应该是慢在
1.orders的回表过滤,貌似优化器在选择join 算法的时候,对于回表的代价评估不是特别完善
2.semi join 没有做优化 (即customer里面拿一条数据到orders里面匹配,如果只要一次匹配成功,customer里面的这条数据马上就返回,不需要重复匹配,这个点也应该在优化了 Issues · pingcap/tidb · GitHub

2.添加SEMI_JOIN_REWRITE hint

mysql> explain analyze select * from customer where exists (select  /*+ SEMI_JOIN_REWRITE() */  1 from orders where orders.O_CUSTKEY = customer.C_CUSTKEY  and orders.O_TOTALPRICE >30);
+----------------------------------+-------------+----------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+---------+
| id                               | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                | memory   | disk    |
+----------------------------------+-------------+----------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+---------+
| HashJoin_23                      | 1970688.00  | 1999956  | root      |                | time:27.5s, loops:1956, build_hash_table:{total:23.5s, fetch:23s, build:524.6ms}, probe:{concurrency:5, total:2m17.6s, max:27.5s, probe:5.34s, fetch:2m12.2s}                                                                                                                                                                                                       | inner join, equal:[eq(tpch.customer.c_custkey, tpch.orders.o_custkey)]                       | 117.6 MB | 0 Bytes |
| ├─HashAgg_41(Build)              | 1970688.00  | 1999956  | root      |                | time:23s, loops:1955                                                                                                                                                                                                                                                                                                                                                | group by:tpch.orders.o_custkey, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey | 411.8 MB | 0 Bytes |
| │ └─TableReader_42               | 1970688.00  | 29566408 | root      |                | time:873.1ms, loops:976, cop_task: {num: 975, max: 553.1ms, min: 4.54ms, avg: 144.1ms, p95: 326.5ms, max_proc_keys: 51200, p95_proc_keys: 51200, tot_proc: 54s, tot_wait: 2.88s, rpc_num: 975, rpc_time: 2m20.4s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                    | data:HashAgg_34                                                                              | 5.40 MB  | N/A     |
| │   └─HashAgg_34                 | 1970688.00  | 29566408 | cop[tikv] |                | tikv_task:{proc max:289ms, min:0s, avg: 55.4ms, p80:97ms, p95:166ms, iters:29273, tasks:975}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29956943, get_snapshot_time: 545.2ms, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 42321, read_count: 41120, read_byte: 718.0 MB, read_time: 20.5s}}} | group by:tpch.orders.o_custkey,                                                              | N/A      | N/A     |
| │     └─Selection_40             | 29955968.00 | 29955968 | cop[tikv] |                | tikv_task:{proc max:289ms, min:0s, avg: 53ms, p80:93ms, p95:164ms, iters:29273, tasks:975}                                                                                                                                                                                                                                                                          | gt(tpch.orders.o_totalprice, 30)                                                             | N/A      | N/A     |
| │       └─TableFullScan_39       | 29955968.00 | 29955968 | cop[tikv] | table:orders   | tikv_task:{proc max:286ms, min:0s, avg: 51.2ms, p80:90ms, p95:161ms, iters:29273, tasks:975}                                                                                                                                                                                                                                                                        | keep order:false                                                                             | N/A      | N/A     |
| └─TableReader_54(Probe)          | 3000000.00  | 3000000  | root      |                | time:1.81s, loops:2937, cop_task: {num: 116, max: 636.1ms, min: 5.49ms, avg: 222.5ms, p95: 507.8ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 8.74s, tot_wait: 221ms, rpc_num: 116, rpc_time: 25.8s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                     | data:TableFullScan_53                                                                        | 33.8 MB  | N/A     |
|   └─TableFullScan_53             | 3000000.00  | 3000000  | cop[tikv] | table:customer | tikv_task:{proc max:250ms, min:1ms, avg: 64ms, p80:100ms, p95:197ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 141.9ms, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 928, read_count: 9955, read_byte: 222.5 MB, read_time: 3.6s}}}          | keep order:false                                                                             | N/A      | N/A     |
+----------------------------------+-------------+----------+-----------+----------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------------------------+----------+---------+
8 rows in set (27.52 sec)

从语句2可以看出,走了hash join,速度快了很多,执行时间在30s左右,当然这里我这个场景有点凑巧,orders 表里面的重复数据较多,hashagg去重后,数量比较少,和customer 表走了hash join ,速度很快,如果orders表去重后重复数据不多的话,那么这里执行hashagg可能效果不太好

语句3 未添加SEMI_JOIN_REWRITE hint ,增加走hash join的hint

mysql>  explain analyze select /*+ hash_join(customer,orders) */  *  from customer where exists (select   1 from orders where orders.O_CUSTKEY = customer.C_CUSTKEY  and orders.O_TOTALPRICE >30);
+------------------------------+-------------+----------+-----------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+---------+
| id                           | estRows     | actRows  | task      | access object  | execution info                                                                                                                                                                                                                                                                                                                                                       | operator info                                                         | memory  | disk    |
+------------------------------+-------------+----------+-----------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+---------+
| HashJoin_24                  | 2400000.00  | 1999956  | root      |                | time:51.3s, loops:1956, build_hash_table:{total:15s, fetch:6.36s, build:8.67s}, probe:{concurrency:5, total:4m16.4s, max:51.3s, probe:2m9s, fetch:2m7.4s}                                                                                                                                                                                                            | semi join, equal:[eq(tpch.customer.c_custkey, tpch.orders.o_custkey)] | 1.87 GB | 0 Bytes |
| ├─TableReader_29(Build)      | 29955968.00 | 29955968 | root      |                | time:5.25s, loops:29345, cop_task: {num: 1025, max: 1.12s, min: 1.88ms, avg: 196.1ms, p95: 481.9ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 1m57.5s, tot_wait: 27.5s, rpc_num: 1025, rpc_time: 3m21s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                   | data:Selection_28                                                     | 10.1 MB | N/A     |
| │ └─Selection_28             | 29955968.00 | 29955968 | cop[tikv] |                | tikv_task:{proc max:719ms, min:0s, avg: 112.6ms, p80:202ms, p95:338ms, iters:33322, tasks:1025}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29956993, get_snapshot_time: 1.18s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 43753, read_count: 40025, read_byte: 688.8 MB, read_time: 33.6s}}} | gt(tpch.orders.o_totalprice, 30)                                      | N/A     | N/A     |
| │   └─TableFullScan_27       | 29955968.00 | 29955968 | cop[tikv] | table:orders   | tikv_task:{proc max:716ms, min:0s, avg: 107.4ms, p80:195ms, p95:330ms, iters:33322, tasks:1025}                                                                                                                                                                                                                                                                      | keep order:false                                                      | N/A     | N/A     |
| └─TableReader_26(Probe)      | 3000000.00  | 3000000  | root      |                | time:1.09s, loops:2936, cop_task: {num: 116, max: 4.21s, min: 2.02ms, avg: 754.2ms, p95: 1.95s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4.52s, tot_wait: 33ms, rpc_num: 116, rpc_time: 1m27.5s, copr_cache: disabled, distsql_concurrency: 15}                                                                                                         | data:TableFullScan_25                                                 | 78.9 MB | N/A     |
|   └─TableFullScan_25         | 3000000.00  | 3000000  | cop[tikv] | table:customer | tikv_task:{proc max:91ms, min:0s, avg: 31.1ms, p80:55ms, p95:64ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 18.8ms, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 1035, read_count: 9848, read_byte: 220.1 MB, read_time: 665.6ms}}}          | keep order:false                                                      | N/A     | N/A     |
+------------------------------+-------------+----------+-----------+----------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------+---------+---------+
6 rows in set, 1 warning (51.28 sec)

从语句3来看,走hash join的速度 确实会比走index hash join快很多,基于以上的结果
1.在遇到这种语句时,希望能够选择合适的join算法
2.最好不要依赖SEMI_JOIN_REWRITE() 这种hint的写法,能够做到自适应去寻找效率高的join 算法

有点没有看懂,要细想一下。

imageimage

对比下内存消耗…

  1. 回表代价估算的还可以啊,预估行数和实际行数出不多。

2.semi join 应该是做了优化的吧,你贴的那个链接点进去不知道为啥不是一个 issue 。

3.能不能自动选择合适的算法?不依赖 hint ?
现在还不能,需要https://docs.pingcap.com/zh/tidb/stable/system-variables#tidb_enable_cascades_planner
这个 GA 后才可以

1.我的个人想法是回表的代价很巨大,像tidb 这种这种lsm存储引擎数据库的代价比传统类mysql的数据库回表的代价更大,但是这里还是选择走index join ,似乎对于这部分成本的评估不是很精确
2.semi join 应该是没做优化的,可以看下这个帖子 子查询重复值多的情况下半连接执行太慢
感谢老师回复

1.在遇到这种语句时,希望能够选择合适的join算法—这个每个数据库都想做到,但是很难做到百分百准确,而且对统计信息的依赖很高
2.最好不要依赖SEMI_JOIN_REWRITE() 这种hint的写法,能够做到自适应去寻找效率高的join 算法—跟上面差不多,对统计信息的依赖很高
这些都是cost优化器一直需要提高的地方

是的,但是不知道在oracle 这种成熟的数据库上是否能够做到,请老师指教

oracle基于cost的优化器做的算很好的了,但是也不见得所有sql都能选到对的连接模式,而且tidb是分布式的数据库,跟oracle又不太一样。

1.优化器确实还不够先进 :joy_cat:
2.第二个我看 pr 合并到 master 了,老版本没合,这个主要说的是算子实现不够高效的问题,你纠结的不是执行计划的问题吗 :smiley:

第二个我看 pr 合并到 master 了,老版本没合,这个主要说的是算子实现不够高效的问题,你纠结的不是执行计划的问题吗 :smiley:
都有哈哈,感谢老师回复

用 tiflash 来算得了。

直接select后面指定tiflash方便很多

用tiflash吧

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。