The following test statement runs against the TPC-H tables (the query itself is meaningless; it only exercises the optimizer's ability to rewrite OR conditions):
select count(a.S_NAME),sum(b.s_acctbal) from supplier a,supplier b where a.S_ADDRESS=b.S_ADDRESS or a.S_PHONE=b.S_PHONE;
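For reference, a sketch of the supplier table as defined in the standard TPC-H specification (the DDL of the actual loaded dataset may differ slightly; the point is that S_ADDRESS and S_PHONE are NOT NULL and, in the standard TPC-H DDL, have no secondary index):
create table supplier (
    s_suppkey   integer       not null,
    s_name      char(25)      not null,
    s_address   varchar(40)   not null,
    s_nationkey integer       not null,
    s_phone     char(15)      not null,
    s_acctbal   decimal(15,2) not null,
    s_comment   varchar(101)  not null,
    primary key (s_suppkey)
);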
Its execution plan:
mysql> explain select count(a.S_NAME),sum(b.s_acctbal) from supplier a,supplier b where a.S_ADDRESS=b.S_ADDRESS or a.S_PHONE=b.S_PHONE;
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_10 | 1.00 | root | | funcs:count(tpch1.supplier.s_name)->Column#15, funcs:sum(tpch1.supplier.s_acctbal)->Column#16 |
| └─HashJoin_17 | 10000000000.00 | root | | CARTESIAN inner join, other cond:or(eq(tpch1.supplier.s_address, tpch1.supplier.s_address), eq(tpch1.supplier.s_phone, tpch1.supplier.s_phone)) |
| ├─TableReader_16(Build) | 100000.00 | root | | data:TableFullScan_15 |
| │ └─TableFullScan_15 | 100000.00 | cop[tikv] | table:b | keep order:false |
| └─TableReader_14(Probe) | 100000.00 | root | | data:TableFullScan_13 |
| └─TableFullScan_13 | 100000.00 | cop[tikv] | table:a | keep order:false |
+-------------------------------+----------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
To show how costly such a Cartesian hash join is in practice, here is an explain analyze of an explicit cross join on the much larger lineitem table:
mysql> explain analyze select a.l_partkey from lineitem a cross join lineitem b limit 10;
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
| Limit_12 | 10.00 | 10 | root | | time:2m46.8s, loops:2 | offset:0, count:10 | N/A | N/A |
| └─HashJoin_13 | 10.00 | 1024 | root | | time:2m46.8s, loops:1, build_hash_table:{total:17.4s, fetch:10.5s, build:6.87s}, probe:{concurrency:3, total:8m23.6s, max:2m48.7s, probe:7m31.4s, fetch:52.1s} | CARTESIAN inner join | 1023.6 MB | 1.34 GB |
| ├─TableReader_18(Build) | 59986052.00 | 59986052 | root | | time:10.3s, loops:58813, cop_task: {num: 2197, max: 281.3ms, min: 1.1ms, avg: 31.4ms, p95: 68.4ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 56.1s, tot_wait: 9.09s, rpc_num: 2197, rpc_time: 1m9s, copr_cache_hit_ratio: 0.00, distsql_concurrency: 5} | data:TableFullScan_17 | 1.86 MB | N/A |
| │ └─TableFullScan_17 | 59986052.00 | 59986052 | cop[tikv] | table:b | tikv_task:{proc max:271ms, min:0s, avg: 25.6ms, p80:42ms, p95:58ms, iters:67264, tasks:2197}, scan_detail: {total_process_keys: 59986052, total_process_keys_size: 2159497872, total_keys: 59988249, get_snapshot_time: 204.3ms, rocksdb: {key_skipped_count: 59986052, block: {cache_hit_count: 17464, read_count: 185633, read_byte: 3.77 GB, read_time: 15.7s}}} | keep order:false | N/A | N/A |
| └─TableReader_16(Probe) | 0.00 | 3072 | root | | time:10.2ms, loops:3, cop_task: {num: 9, max: 11.9ms, min: 956.6µs, avg: 6.62ms, p95: 11.9ms, max_proc_keys: 992, p95_proc_keys: 992, tot_proc: 22ms, tot_wait: 26ms, rpc_num: 9, rpc_time: 59.4ms, copr_cache_hit_ratio: 0.00, distsql_concurrency: 5} | data:TableFullScan_15 | 48.4 KB | N/A |
| └─TableFullScan_15 | 0.00 | 3552 | cop[tikv] | table:a | tikv_task:{proc max:9ms, min:0s, avg: 2.78ms, p80:9ms, p95:9ms, iters:32, tasks:9}, scan_detail: {total_process_keys: 3552, total_process_keys_size: 697625, total_keys: 3561, get_snapshot_time: 18.2ms, rocksdb: {key_skipped_count: 3552, block: {cache_hit_count: 71, read_count: 14, read_byte: 750.3 KB, read_time: 10.4ms}}} | keep order:false | N/A | N/A |
+-------------------------------+-------------+----------+-----------+---------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------------------+-----------+---------+
6 rows in set (2 min 48.75 sec)
As the plans show, the optimizer still chooses a HashJoin that is internally a Cartesian (cross) join, evaluating the OR predicate only as an "other condition", which makes the hash join extremely slow.
If the query could instead be rewritten into a UNION, this scenario would run much faster; a manual sketch of such a rewrite follows.
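A minimal hand-written sketch of the UNION rewrite (this assumes S_ADDRESS and S_PHONE are NOT NULL, as in the standard TPC-H schema; the filter on the second branch excludes pairs already matched by the first branch so that no pair is counted twice):
select count(t.s_name), sum(t.s_acctbal)
from (
    select a.S_NAME as s_name, b.s_acctbal
    from supplier a join supplier b on a.S_ADDRESS = b.S_ADDRESS
    union all
    select a.S_NAME, b.s_acctbal
    from supplier a join supplier b on a.S_PHONE = b.S_PHONE
    where a.S_ADDRESS <> b.S_ADDRESS  -- drop pairs already produced by the first branch
) t;
Each branch now has an equality join condition, so the optimizer can build the hash table on a real join key instead of filtering an estimated 10000000000-row Cartesian product with the OR predicate.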
So, should cost-based plan rewriting be enhanced to optimize this class of queries?