优化器未对多个来自同一张表的子查询条件做合并

【需求涉及的问题场景】
tpch fs=1;

alter table customer add index idx1(c_phone);
alter table orders add index idx1(o_orderdate);

对于语句:

SELECT
  o_orderstatus
FROM
  orders x
WHERE
  o_orderdate = '1996-01-10'
  AND o_custkey IN (
    SELECT
      c_custkey
    FROM
      customer a
    WHERE
      a.c_phone IN ('10-100-301-2651')
  )
  AND o_custkey IN (
    SELECT
      c_custkey
    FROM
      customer b
      JOIN nation c ON b.c_nationkey = c.n_nationkey
    WHERE
      c.n_name IN ('CHINA')
  );

观察其执行计划:

mysql> set tidb_opt_insubq_to_join_and_agg=OFF;
Query OK, 0 rows affected (0.00 sec)

+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
| id                                 | estRows   | task      | access object                    | operator info                                                                |
+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
| HashJoin_18                        | 396.19    | root      |                                  | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)]      |
| ├─HashJoin_52(Build)               | 150.00    | root      |                                  | inner join, equal:[eq(tpch1.nation.n_nationkey, tpch1.customer.c_nationkey)] |
| │ ├─TableReader_57(Build)          | 0.03      | root      |                                  | data:Selection_56                                                            |
| │ │ └─Selection_56                 | 0.03      | cop[tikv] |                                  | eq(tpch1.nation.n_name, "CHINA")                                             |
| │ │   └─TableFullScan_55           | 25.00     | cop[tikv] | table:c                          | keep order:false, stats:pseudo                                               |
| │ └─TableReader_54(Probe)          | 150000.00 | root      |                                  | data:TableFullScan_53                                                        |
| │   └─TableFullScan_53             | 150000.00 | cop[tikv] | table:b                          | keep order:false                                                             |
| └─HashJoin_30(Probe)               | 495.23    | root      |                                  | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)]      |
|   ├─IndexReader_38(Build)          | 1.04      | root      |                                  | index:IndexRangeScan_37                                                      |
|   │ └─IndexRangeScan_37            | 1.04      | cop[tikv] | table:a, index:idx1(C_PHONE)     | range:["10-100-301-2651","10-100-301-2651"], keep order:false                |
|   └─IndexLookUp_36(Probe)          | 619.04    | root      |                                  |                                                                              |
|     ├─IndexRangeScan_34(Build)     | 619.04    | cop[tikv] | table:x, index:idx1(O_ORDERDATE) | range:[1996-01-10,1996-01-10], keep order:false                              |
|     └─TableRowIDScan_35(Probe)     | 619.04    | cop[tikv] | table:x                          | keep order:false                                                             |
+------------------------------------+-----------+-----------+----------------------------------+------------------------------------------------------------------------------+
13 rows in set (0.01 sec)

可以看到并没有将customer a这个子查询与customer b进行合并,导致对customer表的多次关联。
【期望的需求行为】
该语句的人工优化改写为:

SELECT
  o_orderstatus
FROM
  orders x
WHERE
  o_orderdate = '1996-01-10'
  and o_custkey IN (
    SELECT
      c_custkey
    FROM
      customer b
      JOIN nation c ON b.c_nationkey = c.n_nationkey
    WHERE
      c.n_name IN ('CHINA') and b.c_phone IN ('10-100-301-2651')
  );

观察其执行计划:

+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                                     | estRows | task      | access object                    | operator info                                                                                                                                                                   |
+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_14                            | 495.23  | root      |                                  | semi join, equal:[eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)]                                                                                                         |
| ├─IndexJoin_28(Build)                  | 0.03    | root      |                                  | inner join, inner:TableReader_24, outer key:tpch1.customer.c_nationkey, inner key:tpch1.nation.n_nationkey, equal cond:eq(tpch1.customer.c_nationkey, tpch1.nation.n_nationkey) |
| │ ├─IndexLookUp_40(Build)              | 1.04    | root      |                                  |                                                                                                                                                                                 |
| │ │ ├─IndexRangeScan_38(Build)         | 1.04    | cop[tikv] | table:b, index:idx1(C_PHONE)     | range:["10-100-301-2651","10-100-301-2651"], keep order:false                                                                                                                   |
| │ │ └─TableRowIDScan_39(Probe)         | 1.04    | cop[tikv] | table:b                          | keep order:false                                                                                                                                                                |
| │ └─TableReader_24(Probe)              | 0.00    | root      |                                  | data:Selection_23                                                                                                                                                               |
| │   └─Selection_23                     | 0.00    | cop[tikv] |                                  | eq(tpch1.nation.n_name, "CHINA")                                                                                                                                                |
| │     └─TableRangeScan_22              | 1.04    | cop[tikv] | table:c                          | range: decided by [tpch1.customer.c_nationkey], keep order:false, stats:pseudo                                                                                                  |
| └─IndexLookUp_20(Probe)                | 619.04  | root      |                                  |                                                                                                                                                                                 |
|   ├─IndexRangeScan_18(Build)           | 619.04  | cop[tikv] | table:x, index:idx1(O_ORDERDATE) | range:[1996-01-10,1996-01-10], keep order:false                                                                                                                                 |
|   └─TableRowIDScan_19(Probe)           | 619.04  | cop[tikv] | table:x                          | keep order:false                                                                                                                                                                |
+----------------------------------------+---------+-----------+----------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

可以看到减少了对customer表的多次关联。

因此能否增强优化器,让其自动改写优化SQL,减少对表的重复扫描?
类似的更复杂的还有:因优化器问题导致TPCH的Q2语句执行过慢 ,在处理and ps_supplycost子查询的时候也是进行了合并,减少了对基表的重复扫描。

1 个赞

好建议支持下。

在实际场景中不少SQL是程序生成的,而且开发人员一般写SQL满足业务逻辑即可,不太会关心优化。所以数据库的改写优化也比较重要。

1 个赞

:sweat_smile:这是实实在在的故障转移呀,明明是程序员开发不规范,最后要DBA兜底~

+1,见过有些程序生成的sql很傻,但是开发说改不了,那就只能依赖数据库自身的优化了