指定INL_JOIN,但执行计划并不走

【 TiDB 使用环境】Poc
【 TiDB 版本】6.3.0
【遇到的问题】不走指定的INL_JOIN,执行效率太慢
【问题现象及影响】

表结构如下:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `customer_idx1` (`C_PHONE`),
  KEY `customer_idx2` (`C_NATIONKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> show create table orders \G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `O_ORDERKEY` bigint(20) NOT NULL,
  `O_CUSTKEY` bigint(20) NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` bigint(20) NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `orders_idx1` (`O_ORDERDATE`),
  KEY `o_custkey` (`O_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

如下SQL语句的执行计划:

mysql> explain select a.C_NAME,b.price from tpch.customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE='23-768-687-3665';
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                                 | estRows     | task      | access object                         | operator info                                                                                                                 |
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_13                        | 1.02        | root      |                                       | left outer join, equal:[eq(tpch.customer.c_custkey, tpch.orders.o_custkey)]                                                   |
| ├─IndexLookUp_31(Build)            | 1.02        | root      |                                       |                                                                                                                               |
| │ ├─IndexRangeScan_29(Build)       | 1.02        | cop[tikv] | table:a, index:customer_idx1(C_PHONE) | range:["23-768-687-3665","23-768-687-3665"], keep order:false                                                                 |
| │ └─TableRowIDScan_30(Probe)       | 1.02        | cop[tikv] | table:a                               | keep order:false                                                                                                              |
| └─HashAgg_38(Probe)                | 1009664.00  | root      |                                       | group by:tpch.orders.o_custkey, funcs:sum(Column#21)->Column#18, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_39                 | 1009664.00  | root      |                                       | data:HashAgg_32                                                                                                               |
|     └─HashAgg_32                   | 1009664.00  | cop[tikv] |                                       | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#21                                                |
|       └─TableFullScan_37           | 15000000.00 | cop[tikv] | table:orders                          | keep order:false                                                                                                              |
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

我理解最佳的执行计划路径为a表先通过customer_idx1索引过滤,然后通过INL_JOIN b表(b表的o_custkey索引),通过B表o_custkey索引,利用索引回表做sum(O_TOTALPRICE),这样子是最佳的执行路径,但是实际情况并没有这么走,那么我手工加上hint,强制让其走:

mysql> explain select /*+ INL_JOIN(a,b) */ a.C_NAME,b.price from tpch.customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE='23-768-687-3665';
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                                 | estRows     | task      | access object                         | operator info                                                                                                                 |
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
| HashJoin_12                        | 1.02        | root      |                                       | left outer join, equal:[eq(tpch.customer.c_custkey, tpch.orders.o_custkey)]                                                   |
| ├─IndexLookUp_30(Build)            | 1.02        | root      |                                       |                                                                                                                               |
| │ ├─IndexRangeScan_28(Build)       | 1.02        | cop[tikv] | table:a, index:customer_idx1(C_PHONE) | range:["23-768-687-3665","23-768-687-3665"], keep order:false                                                                 |
| │ └─TableRowIDScan_29(Probe)       | 1.02        | cop[tikv] | table:a                               | keep order:false                                                                                                              |
| └─HashAgg_37(Probe)                | 1009664.00  | root      |                                       | group by:tpch.orders.o_custkey, funcs:sum(Column#21)->Column#18, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_38                 | 1009664.00  | root      |                                       | data:HashAgg_31                                                                                                               |
|     └─HashAgg_31                   | 1009664.00  | cop[tikv] |                                       | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#21                                                |
|       └─TableFullScan_36           | 15000000.00 | cop[tikv] | table:orders                          | keep order:false                                                                                                              |
+------------------------------------+-------------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------------------------------+
8 rows in set, 2 warnings (0.00 sec)

mysql> show warnings;
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                                                             |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | There are no matching table names for (b) in optimizer hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */. Maybe you can use the table alias name |
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable                                                                      |
+---------+------+-----------------------------------------------------------------------------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

可以看到并没有和预想的结果一样,看warnings是不能这样指定执行计划。

接下来我们执行查看耗时情况(为避免hash_agg发生oom,这里采用非并行模式做hash_agg):

mysql> set tidb_hashagg_partial_concurrency=1;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> select a.C_NAME,b.price from tpch.customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE='23-768-687-3665';
+--------------------+-----------+
| C_NAME             | price     |
+--------------------+-----------+
| Customer#000000002 | 481044.54 |
+--------------------+-----------+
1 row in set (10.86 sec)

可以看到执行了10多秒。
同样的数据量我导入到DB2中进行执行观察其执行计划和执行耗时(因执行计划输出较多,这里只贴局部关键信息):

Optimizer Plan:

                        Rows   
                      Operator 
                        (ID)   
                        Cost   
                              
                         1    
                      RETURN  
                       ( 1)   
                      63.7538 
                        |     
                         1    
                       GRPBY  
                       ( 2)   
                      63.7531 
                        |     
                      9.95056 
                      NLJOIN  
                       ( 3)   
                      63.7507 
                  /--/       \---\
          0.995056                  10    
           TBSCAN                  FETCH  
            ( 4)                   (--)   
          20.4589                 43.2918 
             |                   /       \
          0.995056             10     1.5e+07 
            SORT             RIDSCN   Table:  
            ( 5)              ( 9)    TPCH    
          20.4575            13.6895  ORDERS  
             |                 |     
          0.995056             10    
           FETCH              SORT   
            ( 6)              (10)   
           20.455            13.6884 
         /        \            |    
    0.995056    1.5e+06        10   
     IXSCAN     Table:       IXSCAN 
      ( 7)      TPCH          (11)  
    13.6718     CUSTOMER     13.685 
      |                       |       
  1.49996e+06               999982    
 Index:                   Index:      
 TPCH                     TPCH        
 CUSTOMER_IDX1            ORDERS_IDX2 


[db2inst1@host1 ~]$ time db2 "select a.C_NAME,b.price from tpch.customer a left join (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders group by o_custkey) b on a.C_CUSTKEY=b.O_CUSTKEY where a.C_PHONE='23-768-687-3665'"                    

C_NAME                    PRICE                            
------------------------- ---------------------------------
Customer#000000002                                481044.54

  1 record(s) selected.


real    0m0.034s
user    0m0.008s
sys     0m0.018s

可以看到DB2是自动使用了正确的执行计划,并且效率很高。

所以这里的疑问是在存在索引的情况下,为何手工指定hint走INL_JOIN关联,但是实际并不走并显示这种关联不可用呢?

你可以这样改造试试。

mysql> explain with b as (select id,count(1) as num from sbtest2 t2 group by t2.id) select a.id ,b.num  from sbtest1 a, b where a.id=b.id;
+--------------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| id                             | estRows  | task      | access object          | operator info                                                                                                                                   |
+--------------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
| Projection_19                  | 44513.60 | root      |                        | sbtest.sbtest1.id, Column#11                                                                                                                    |
| └─IndexHashJoin_26             | 44513.60 | root      |                        | inner join, inner:TableReader_21, outer key:sbtest.sbtest2.id, inner key:sbtest.sbtest1.id, equal cond:eq(sbtest.sbtest2.id, sbtest.sbtest1.id) |
|   ├─Selection_31(Build)        | 44513.60 | root      |                        | not(isnull(sbtest.sbtest2.id))                                                                                                                  |
|   │ └─CTEFullScan_32           | 55642.00 | root      | CTE:b                  | data:CTE_0                                                                                                                                      |
|   └─TableReader_21(Probe)      | 1.00     | root      |                        | data:TableRangeScan_20                                                                                                                          |
|     └─TableRangeScan_20        | 1.00     | cop[tikv] | table:a                | range: decided by [sbtest.sbtest2.id], keep order:false                                                                                         |
| CTE_0                          | 55642.00 | root      |                        | Non-Recursive CTE                                                                                                                               |
| └─Projection_11(Seed Part)     | 55642.00 | root      |                        | sbtest.sbtest2.id, 1->Column#5                                                                                                                  |
|   └─IndexReader_15             | 55642.00 | root      |                        | index:IndexFullScan_14                                                                                                                          |
|     └─IndexFullScan_14         | 55642.00 | cop[tikv] | table:t2, index:k_2(k) | keep order:false                                                                                                                                |
+--------------------------------+----------+-----------+------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.01 sec)

8.0的语法也支持?

8.0 ? 我用的是 tidb 5.1.4 版本。

with是MySQL 8.0的语法

看执行计划CTE被先单独执行了,这样子效率还是很低的。
其实我这个原来的业务语句就是用了with结构,我剥离出来的。

再收集一次统计信息呢?

统计信息是准的,而且我还指定了hint。

目前看好像不行,cte 部分是要独立运行出来的。

mysql> explain with b as (select id,sum(k) as num from sbtest1 group by id), a as (select * from sbtest2) select a.id ,b.num  from a,b where a.id=b.id;
+--------------------------------+-------------+-----------+-----------------------------+---------------------------------------------------------------------------+
| id                             | estRows     | task      | access object               | operator info                                                             |
+--------------------------------+-------------+-----------+-----------------------------+---------------------------------------------------------------------------+
| HashJoin_28                    | 641.82      | root      |                             | inner join, equal:[eq(sbtest.sbtest2.id, sbtest.sbtest1.id)]              |
| ├─Selection_29(Build)          | 640.00      | root      |                             | not(isnull(sbtest.sbtest2.id))                                            |
| │ └─CTEFullScan_30             | 800.00      | root      | CTE:a                       | data:CTE_1                                                                |
| └─Selection_31(Probe)          | 8000000.00  | root      |                             | not(isnull(sbtest.sbtest1.id))                                            |
|   └─CTEFullScan_32             | 10000000.00 | root      | CTE:b                       | data:CTE_0                                                                |
| CTE_1                          | 800.00      | root      |                             | Non-Recursive CTE                                                         |
| └─Selection_14(Seed Part)      | 800.00      | root      |                             | not(isnull(sbtest.sbtest2.id))                                            |
|   └─TableReader_17             | 1000.00     | root      |                             | data:TableFullScan_16                                                     |
|     └─TableFullScan_16         | 1000.00     | cop[tikv] | table:sbtest2               | keep order:false                                                          |
| CTE_0                          | 10000000.00 | root      |                             | Non-Recursive CTE                                                         |
| └─Projection_20(Seed Part)     | 10000000.00 | root      |                             | sbtest.sbtest1.id, cast(sbtest.sbtest1.k, decimal(32,0) BINARY)->Column#5 |
|   └─IndexReader_24             | 10000000.00 | root      |                             | index:IndexFullScan_23                                                    |
|     └─IndexFullScan_23         | 10000000.00 | cop[tikv] | table:sbtest1, index:k_1(k) | keep order:false                                                          |
+--------------------------------+-------------+-----------+-----------------------------+---------------------------------------------------------------------------+
13 rows in set (0.00 sec)

从这里看感觉这种select a,sum(b) from table group by a形式的SQL语句就是不走索引,而且也没有任何warning。

mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows     | task      | access object | operator info                                                                                                                 |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                | 1009664.00  | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_14                 | 1009664.00  | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_15           | 1009664.00  | root      |               | data:HashAgg_16                                                                                                               |
|     └─HashAgg_16             | 1009664.00  | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─TableFullScan_13     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)


mysql> explain select  * from (select /*+ USE_INDEX(x,o_custkey) */ o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows     | task      | access object | operator info                                                                                                                 |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                | 1009664.00  | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_14                 | 1009664.00  | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_15           | 1009664.00  | root      |               | data:HashAgg_16                                                                                                               |
|     └─HashAgg_16             | 1009664.00  | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─TableFullScan_13     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey from tpch.orders x group by o_custkey) as b;
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object                       | operator info                                                                                |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| HashAgg_18                 | 1009664.00  | root      |                                     | group by:tpch.orders.o_custkey, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─IndexReader_19           | 1009664.00  | root      |                                     | index:HashAgg_20                                                                             |
|   └─HashAgg_20             | 1009664.00  | cop[tikv] |                                     | group by:tpch.orders.o_custkey,                                                              |
|     └─IndexFullScan_17     | 15000000.00 | cop[tikv] | table:x, index:o_custkey(O_CUSTKEY) | keep order:false                                                                             |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

o_custkey重复度比较低,这里也不走索引。

表总记录数1.5千万
mysql> select * from mysql.stats_meta where table_id = (select tidb_table_id from information_schema.tables a where a.table_name='orders' and a.table_schema='tpch');
+--------------------+----------+--------------+----------+--------------------+
| version            | table_id | modify_count | count    | snapshot           |
+--------------------+----------+--------------+----------+--------------------+
| 436371908666064898 |      129 |            0 | 15000000 | 436371895938449409 |
+--------------------+----------+--------------+----------+--------------------+
1 row in set (0.02 sec)

O_CUSTKEY字段基数1百万,选择度基本上为1/15
mysql> select distinct_count,null_count,modify_count from stats_histograms where (table_id,hist_id) in (select tidb_table_id,ORDINAL_POSITION from information_schema.tables a,information_schema.columns b  where a.table_name=b.table_name and b.table_name='orders' and b.COLUMN_NAME='O_CUSTKEY' and b.TABLE_SCHEMA='tpch');
+----------------+------------+--------------+
| distinct_count | null_count | modify_count |
+----------------+------------+--------------+
|        1009664 |          0 |            0 |
+----------------+------------+--------------+
1 row in set (0.06 sec)


其实这里如果走索引那么效率应该会很高,但是还是没有走。可能上面那个关联条件不走INL_JOIN的一个原因是b表无法利用索引,看如下SQL执行计划(不走索引):
mysql> explain select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x where o_custkey=369001 group by o_custkey;
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                             | estRows     | task      | access object | operator info                                                                                                                 |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                  | 807731.20   | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_15                   | 807731.20   | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_16             | 807731.20   | root      |               | data:HashAgg_17                                                                                                               |
|     └─HashAgg_17               | 807731.20   | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─Selection_13           | 12000000.00 | cop[tikv] |               | eq(tpch.orders.o_custkey, 369001)                                                                                             |
|         └─TableFullScan_14     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

这是否为优化器的缺陷呢?

所以这种没有太大意义,CTE解决不了这个问题。

这个查询即使加了 INL_JOIN 也不生效是因为目前 IndexJoin 只支持 inner side 是一个 DataSource,不支持 inner side 是 Agg->DataSource,而 b 是 Agg->DataSource 的形式。

一种 workaround 的方式是把这个查询的 left join 改写为子查询的形式且让优化器不对子查询进行解关联,这种情况下用 Apply 算子去执行(a 每取一行,然后去 b 里找匹配的行),就能利用上 orderso_custkey 索引。

mysql> explain select
    ->     a.C_NAME,
    ->     (
    ->         select
    ->             sum(O_TOTALPRICE)
    ->         from
    ->             tpch.orders b
    ->         where
    ->             b.O_CUSTKEY = a.C_CUSTKEY
    ->         group by
    ->             o_custkey
    ->     ) as price
    -> from
    ->     tpch.customer a use index(customer_idx1)
    -> where
    ->     a.C_PHONE = '23-768-687-3665';
+------------------------------------------+---------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------+
| id                                       | estRows | task      | access object                         | operator info                                                                                         |
+------------------------------------------+---------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------+
| Projection_11                            | 0.00    | root      |                                       | tpch.customer.c_name, Column#27                                                                       |
| └─Apply_13                               | 0.00    | root      |                                       | CARTESIAN left outer join                                                                             |
|   ├─IndexLookUp_16(Build)                | 1.00    | root      |                                       |                                                                                                       |
|   │ ├─IndexRangeScan_14(Build)           | 1.00    | cop[tikv] | table:a, index:customer_idx1(C_PHONE) | range:["23-768-687-3665","23-768-687-3665"], keep order:false, stats:pseudo                           |
|   │ └─TableRowIDScan_15(Probe)           | 1.00    | cop[tikv] | table:a                               | keep order:false, stats:pseudo                                                                        |
|   └─MaxOneRow_17(Probe)                  | 1.00    | root      |                                       |                                                                                                       |
|     └─StreamAgg_22                       | 2.00    | root      |                                       | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#27                        |
|       └─Projection_40                    | 2.50    | root      |                                       | tpch.orders.o_custkey, tpch.orders.o_totalprice                                                       |
|         └─IndexLookUp_39                 | 2.50    | root      |                                       |                                                                                                       |
|           ├─IndexRangeScan_37(Build)     | 2.50    | cop[tikv] | table:b, index:o_custkey(O_CUSTKEY)   | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:true, stats:pseudo |
|           └─TableRowIDScan_38(Probe)     | 2.50    | cop[tikv] | table:b                               | keep order:false, stats:pseudo                                                                        |
+------------------------------------------+---------+-----------+---------------------------------------+-------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)

子查询中是否加 group by o_custkey 语义都是一样的,这里加了 group by o_custkey 是为了让优化器不能进行解关联(把 Apply 转化成 Join)。

或者在 6.4 引入 NO_DECORRELATE hint 之后,也去掉 group by o_custkey 然后加这个 hint 避免解关联。

mysql> explain select
    ->     a.C_NAME,
    ->     (
    ->         select /*+ NO_DECORRELATE() */
    ->             sum(O_TOTALPRICE)
    ->         from
    ->             tpch.orders b
    ->         where
    ->             b.O_CUSTKEY = a.C_CUSTKEY
    ->     ) as price
    -> from
    ->     tpch.customer a use index(customer_idx1)
    -> where
    ->     a.C_PHONE = '23-768-687-3665';
+----------------------------------------+---------+-----------+---------------------------------------+--------------------------------------------------------------------------------------------------------+
| id                                     | estRows | task      | access object                         | operator info                                                                                          |
+----------------------------------------+---------+-----------+---------------------------------------+--------------------------------------------------------------------------------------------------------+
| Projection_11                          | 0.00    | root      |                                       | tpch.customer.c_name, Column#27                                                                        |
| └─Apply_13                             | 0.00    | root      |                                       | CARTESIAN left outer join                                                                              |
|   ├─IndexLookUp_16(Build)              | 1.00    | root      |                                       |                                                                                                        |
|   │ ├─IndexRangeScan_14(Build)         | 1.00    | cop[tikv] | table:a, index:customer_idx1(C_PHONE) | range:["23-768-687-3665","23-768-687-3665"], keep order:false, stats:pseudo                            |
|   │ └─TableRowIDScan_15(Probe)         | 1.00    | cop[tikv] | table:a                               | keep order:false, stats:pseudo                                                                         |
|   └─MaxOneRow_17(Probe)                | 1.00    | root      |                                       |                                                                                                        |
|     └─HashAgg_29                       | 1.00    | root      |                                       | funcs:sum(Column#30)->Column#27                                                                        |
|       └─IndexLookUp_30                 | 1.00    | root      |                                       |                                                                                                        |
|         ├─IndexRangeScan_27(Build)     | 10.00   | cop[tikv] | table:b, index:o_custkey(O_CUSTKEY)   | range: decided by [eq(tpch.orders.o_custkey, tpch.customer.c_custkey)], keep order:false, stats:pseudo |
|         └─HashAgg_19(Probe)            | 1.00    | cop[tikv] |                                       | funcs:sum(tpch.orders.o_totalprice)->Column#30                                                         |
|           └─TableRowIDScan_28          | 10.00   | cop[tikv] | table:b                               | keep order:false, stats:pseudo                                                                         |
+----------------------------------------+---------+-----------+---------------------------------------+--------------------------------------------------------------------------------------------------------+
11 rows in set (0.00 sec)

非常感谢大佬的解答,已经说的非常明白。追问下,这个indexjoin的inner side是否后续也会考虑对比如Agg->DataSource的支持?

其实 index join 和 apply 的执行方式是类似的,都是 nested loop 的形式,index join 的效率会更高一些。对于子查询是 Agg->DataSource 的形式用不上 index join 的话,用 NO_DECORRELATE hint 走 apply 算子基本能解决大部分问题。但像这里的查询本身就是 join 的话,需要改写成子查询形式才能用 apply,可能让 index join 支持 Agg->DataSource 形式的 inner side 会比较好。

2 个赞

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。

这里补充下,如下这个pr已经做了修复:https://github.com/pingcap/tidb/pull/51354
应该会在8.0以后版本中体现出来。

学习了