【 TiDB 使用环境】测试
【 TiDB 版本】6.5.3
各位老师好,对于子查询有个疑问
语句及执行计划如下,数据均来自tpch 数据
mysql> explain analyze select * from customer where C_CUSTKEY in (select O_CUSTKEY from orders where O_TOTALPRICE >30);
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id | estRows | actRows | task | access object | execution info | operator info | memory | disk |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| IndexHashJoin_16 | 2400000.00 | 1999956 | root | | time:5m52s, loops:1957, inner:{total:29m12.3s, concurrency:5, task:125, construct:1.96s, fetch:28m59.7s, build:311.8ms, join:10.7s} | semi join, inner:IndexLookUp_13, outer key:tpch.customer.c_custkey, inner key:tpch.orders.o_custkey, equal cond:eq(tpch.customer.c_custkey, tpch.orders.o_custkey) | 142.2 MB | N/A |
| ├─TableReader_33(Build) | 3000000.00 | 3000000 | root | | time:958.5ms, loops:2941, cop_task: {num: 116, max: 3.51s, min: 1.99ms, avg: 402.5ms, p95: 1.07s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 27.6s, tot_wait: 8.76s, rpc_num: 116, rpc_time: 46.7s, copr_cache: disabled, distsql_concurrency: 15} | data:TableFullScan_32 | 90.2 MB | N/A |
| │ └─TableFullScan_32 | 3000000.00 | 3000000 | cop[tikv] | table:customer | tikv_task:{proc max:3.22s, min:0s, avg: 219.1ms, p80:359ms, p95:785ms, iters:3387, tasks:116}, scan_detail: {total_process_keys: 3000000, total_process_keys_size: 610451426, total_keys: 3000116, get_snapshot_time: 1.56s, rocksdb: {key_skipped_count: 3000000, block: {cache_hit_count: 1731, read_count: 8688, read_byte: 194.0 MB, read_time: 8.81s}}} | keep order:false | N/A | N/A |
| └─IndexLookUp_13(Probe) | 29955968.00 | 29955968 | root | | time:28m56.7s, loops:29442, index_task: {total_time: 22m42.1s, fetch_handle: 17m4s, build: 3.35ms, wait: 5m38.1s}, table_task: {total_time: 1h29m2.8s, num: 1899, concurrency: 5}, next: {wait_index: 2m6.3s, wait_table_lookup_build: 977.4ms, wait_table_lookup_resp: 26m47.3s} | | 110.8 KB | N/A |
| ├─IndexRangeScan_10(Build) | 29955968.00 | 29955968 | cop[tikv] | table:orders, index:index_o_custkey(O_CUSTKEY) | time:17m3s, loops:29933, cop_task: {num: 2628, max: 5.1s, min: 2.2ms, avg: 595.4ms, p95: 2.22s, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 19m45.2s, tot_wait: 4m12.9s, rpc_num: 2628, rpc_time: 26m4.7s, copr_cache: disabled, distsql_concurrency: 15}, tikv_task:{proc max:4.16s, min:0s, avg: 451.1ms, p80:734ms, p95:2.02s, iters:39517, tasks:2628}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 1377974528, total_keys: 32958320, get_snapshot_time: 5.43s, rocksdb: {key_skipped_count: 29955968, block: {cache_hit_count: 12009292, read_count: 19647, read_byte: 235.1 MB, read_time: 3m41.6s}}} | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false | N/A | N/A |
| └─Selection_12(Probe) | 29955968.00 | 29955968 | cop[tikv] | | time:1h28m47.9s, loops:31426, cop_task: {num: 96848, max: 20.4s, min: 399.6µs, avg: 579.6ms, p95: 2.78s, max_proc_keys: 656, p95_proc_keys: 429, tot_proc: 12h22m51.3s, tot_wait: 2h42m30.7s, rpc_num: 96914, rpc_time: 15h36m8.7s, copr_cache: disabled, distsql_concurrency: 15}, backoff{regionMiss: 114ms, regionScheduling: 2ms}, tikv_task:{proc max:19.6s, min:0s, avg: 460.6ms, p80:557ms, p95:2.54s, iters:338860, tasks:96848}, scan_detail: {total_process_keys: 29955968, total_process_keys_size: 4549646548, total_keys: 29972508, get_snapshot_time: 13s, rocksdb: {key_skipped_count: 33090, block: {cache_hit_count: 115911778, read_count: 506722, read_byte: 8.53 GB, read_time: 14m51.1s}}} | gt(tpch.orders.o_totalprice, 30) | N/A | N/A |
| └─TableRowIDScan_11 | 29955968.00 | 29955968 | cop[tikv] | table:orders | tikv_task:{proc max:19.6s, min:0s, avg: 460.5ms, p80:557ms, p95:2.54s, iters:338860, tasks:96848} | keep order:false | N/A | N/A |
+----------------------------------+-------------+----------+-----------+------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
7 rows in set (5 min 51.98 sec)
mysql> select count(*) from orders;
+----------+
| count(*) |
+----------+
| 29955968 |
+----------+
1 row in set (6.62 sec)
mysql> select count(*) from customer;
+----------+
| count(*) |
+----------+
| 3000000 |
+----------+
1 row in set (1.18 sec)
我已经将tidb_opt_insubq_to_join_and_agg 设置为off,将子查询转成 join 和 aggregation 的功能关闭(我是考虑到假如子查询里面聚合去重后数据量特别巨大的场景)https://docs.pingcap.com/zh/tidb/v6.5/subquery-optimization#子查询相关的优化
在这样的半连接下,我认为应该只要从customer表里面拿一行数据去和orders 表里面去匹配O_CUSTKEY,如果匹配上并且O_TOTALPRICE大于30,那么就可以返回到结果集,那么多余重复的数据就不需要匹配了(customer和orders 是一对多的关系),从执行计划来看,orders 里面的每一行数据都被匹配到了,那么在这一点上是否可以进一步优化成类似firstmatch row的匹配,即customer里面拿一条数据到orders里面匹配,如果只要一次匹配成功,customer里面的这条数据马上就返回,然后再从customer里面拿下一条数据?