【需求涉及的问题场景】
tpch fs=1;
alter table customer add index idx1(c_phone);
alter table orders add index idx1(o_orderdate);
对于语句:
SELECT
o_orderstatus
FROM
orders x
WHERE
o_orderdate = '1996-01-10'
AND o_custkey IN (
SELECT
c_custkey
FROM
customer a
WHERE
a.c_phone IN ('10-100-301-2651')
)
AND o_custkey IN (
SELECT
c_custkey
FROM
customer b
JOIN nation c ON b.c_nationkey = c.n_nationkey
WHERE
c.n_name IN ('CHINA')
);
观察其执行计划:
mysql> set tidb_opt_insubq_to_join_and_agg=OFF;
Query OK, 0 rows affected (0.00 sec)
+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
| HashJoin_18 | 396.19 | root | | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)] |
| ├─HashJoin_52(Build) | 150.00 | root | | inner join, equal:[eq(tpch1.nation.n_nationkey, tpch1.customer.c_nationkey)] |
| │ ├─TableReader_57(Build) | 0.03 | root | | data:Selection_56 |
| │ │ └─Selection_56 | 0.03 | cop[tikv] | | eq(tpch1.nation.n_name, "CHINA") |
| │ │ └─TableFullScan_55 | 25.00 | cop[tikv] | table:c | keep order:false, stats:pseudo |
| │ └─TableReader_54(Probe) | 150000.00 | root | | data:TableFullScan_53 |
| │ └─TableFullScan_53 | 150000.00 | cop[tikv] | table:b | keep order:false |
| └─HashJoin_30(Probe) | 495.23 | root | | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)] |
| ├─IndexReader_38(Build) | 1.04 | root | | index:IndexRangeScan_37 |
| │ └─IndexRangeScan_37 | 1.04 | cop[tikv] | table:a, index:idx1(C_PHONE) | range:["10-100-301-2651","10-100-301-2651"], keep order:false |
| └─IndexLookUp_36(Probe) | 619.04 | root | | |
| ├─IndexRangeScan_34(Build) | 619.04 | cop[tikv] | table:x, index:idx1(O_ORDERDATE) | range:[1996-01-10,1996-01-10], keep order:false |
| └─TableRowIDScan_35(Probe) | 619.04 | cop[tikv] | table:x | keep order:false |
+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
13 rows in set (0.01 sec)
可以看到并没有将customer a这个子查询与customer b进行合并,导致对customer表的多次关联。
【期望的需求行为】
该语句的人工优化改写为:
SELECT
o_orderstatus
FROM
orders x
WHERE
o_orderdate = '1996-01-10'
and o_custkey IN (
SELECT
c_custkey
FROM
customer b
JOIN nation c ON b.c_nationkey = c.n_nationkey
WHERE
c.n_name IN ('CHINA') and b.c_phone IN ('10-100-301-2651')
);
观察其执行计划:
+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_14 | 495.23 | root | | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)] |
| ├─IndexJoin_28(Build) | 0.03 | root | | inner join, inner:TableReader_24, outer key:tpch1.customer.c_nationkey, inner key:tpch1.nation.n_nationkey, equal cond:eq(tpch1.customer.c_nationkey, tpch1.nation.n_nationkey) |
| │ ├─IndexLookUp_40(Build) | 1.04 | root | | |
| │ │ ├─IndexRangeScan_38(Build) | 1.04 | cop[tikv] | table:b, index:idx1(C_PHONE) | range:["10-100-301-2651","10-100-301-2651"], keep order:false |
| │ │ └─TableRowIDScan_39(Probe) | 1.04 | cop[tikv] | table:b | keep order:false |
| │ └─TableReader_24(Probe) | 0.00 | root | | data:Selection_23 |
| │ └─Selection_23 | 0.00 | cop[tikv] | | eq(tpch1.nation.n_name, "CHINA") |
| │ └─TableRangeScan_22 | 1.04 | cop[tikv] | table:c | range: decided by [tpch1.customer.c_nationkey], keep order:false, stats:pseudo |
| └─IndexLookUp_20(Probe) | 619.04 | root | | |
| ├─IndexRangeScan_18(Build) | 619.04 | cop[tikv] | table:x, index:idx1(O_ORDERDATE) | range:[1996-01-10,1996-01-10], keep order:false |
| └─TableRowIDScan_19(Probe) | 619.04 | cop[tikv] | table:x | keep order:false |
+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
可以看到减少了对customer表的多次关联。
因此能否增强优化器,让其自动改写优化SQL,减少对表的重复扫描?
类似的更复杂的还有:因优化器问题导致TPCH的Q2语句执行过慢 ,在处理and ps_supplycost
子查询的时候也是进行了合并,减少了对基表的重复扫描。