逻辑优化未能消除无用排序

对于tpch表中的orders表有如下查询:
select max(o_clerk),min(o_clerk) from (select o_clerk from orders order by o_clerk desc) a;
查看执行计划:

mysql> explain select max(o_clerk),min(o_clerk) from (select o_clerk from orders order by o_clerk desc) a;
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object | operator info                                                                          |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| StreamAgg_8                | 1.00        | root      |               | funcs:max(tpch1.orders.o_clerk)->Column#10, funcs:min(tpch1.orders.o_clerk)->Column#11 |
| └─Sort_13                  | 15000000.00 | root      |               | tpch1.orders.o_clerk:desc                                                              |
|   └─TableReader_12         | 15000000.00 | root      |               | data:TableFullScan_11                                                                  |
|     └─TableFullScan_11     | 15000000.00 | cop[tikv] | table:orders  | keep order:false                                                                       |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

可以看到还是做了Sort操作。
预期:优化器将语句进行改写为:select max(o_clerk),min(o_clerk) from orders;
预期执行计划为:

mysql> explain select max(o_clerk),min(o_clerk) from orders;
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object | operator info                                                                          |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
| StreamAgg_16               | 1.00        | root      |               | funcs:max(Column#14)->Column#10, funcs:min(Column#15)->Column#11                       |
| └─TableReader_17           | 1.00        | root      |               | data:StreamAgg_8                                                                       |
|   └─StreamAgg_8            | 1.00        | cop[tikv] |               | funcs:max(tpch1.orders.o_clerk)->Column#14, funcs:min(tpch1.orders.o_clerk)->Column#15 |
|     └─TableFullScan_15     | 15000000.00 | cop[tikv] | table:orders  | keep order:false                                                                       |
+----------------------------+-------------+-----------+---------------+----------------------------------------------------------------------------------------+
4 rows in set (0.01 sec)

该类问题是否可以进行优化增强?

还是没那么智能 :grinning:

这样写就是经典的子查询啊,它肯定优先查select o_clerk from orders order by o_clerk desc这一段。。。

无效关联的消除也需要加强,如下:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> show create table nation \G
*************************** 1. row ***************************
       Table: nation
Create Table: CREATE TABLE `nation` (
  `N_NATIONKEY` bigint(20) NOT NULL,
  `N_NAME` char(25) NOT NULL,
  `N_REGIONKEY` bigint(20) NOT NULL,
  `N_COMMENT` varchar(152) DEFAULT NULL,
  PRIMARY KEY (`N_NATIONKEY`) /*T![clustered_index] CLUSTERED */
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.00 sec)

mysql> explain select count(*) from customer a left join nation b on a.C_NATIONKEY=b.N_NATIONKEY;
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
| id                            | estRows    | task      | access object | operator info                                                                   |
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
| HashAgg_8                     | 1.00       | root      |               | funcs:count(1)->Column#13                                                       |
| └─HashJoin_19                 | 1500000.00 | root      |               | left outer join, equal:[eq(tpch.customer.c_nationkey, tpch.nation.n_nationkey)] |
|   ├─TableReader_24(Build)     | 25.00      | root      |               | data:TableFullScan_23                                                           |
|   │ └─TableFullScan_23        | 25.00      | cop[tikv] | table:b       | keep order:false, stats:pseudo                                                  |
|   └─TableReader_22(Probe)     | 1500000.00 | root      |               | data:TableFullScan_21                                                           |
|     └─TableFullScan_21        | 1500000.00 | cop[tikv] | table:a       | keep order:false                                                                |
+-------------------------------+------------+-----------+---------------+---------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

实际上因为nation表的N_NATIONKEY是主键,应该直接走customer主键的count才对,这块希望也能在逻辑优化层面进行关联消除。