指定INL_JOIN,但执行计划并不走

从这里看感觉这种select a,sum(b) from table group by a形式的SQL语句就是不走索引,而且也没有任何warning。

mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows     | task      | access object | operator info                                                                                                                 |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                | 1009664.00  | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_14                 | 1009664.00  | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_15           | 1009664.00  | root      |               | data:HashAgg_16                                                                                                               |
|     └─HashAgg_16             | 1009664.00  | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─TableFullScan_13     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)


mysql> explain select  * from (select /*+ USE_INDEX(x,o_custkey) */ o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows     | task      | access object | operator info                                                                                                                 |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                | 1009664.00  | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_14                 | 1009664.00  | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_15           | 1009664.00  | root      |               | data:HashAgg_16                                                                                                               |
|     └─HashAgg_16             | 1009664.00  | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─TableFullScan_13     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)

mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey from tpch.orders x group by o_custkey) as b;
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| id                         | estRows     | task      | access object                       | operator info                                                                                |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| HashAgg_18                 | 1009664.00  | root      |                                     | group by:tpch.orders.o_custkey, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─IndexReader_19           | 1009664.00  | root      |                                     | index:HashAgg_20                                                                             |
|   └─HashAgg_20             | 1009664.00  | cop[tikv] |                                     | group by:tpch.orders.o_custkey,                                                              |
|     └─IndexFullScan_17     | 15000000.00 | cop[tikv] | table:x, index:o_custkey(O_CUSTKEY) | keep order:false                                                                             |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

o_custkey重复度比较低,这里也不走索引。

表总记录数1.5千万
mysql> select * from mysql.stats_meta where table_id = (select tidb_table_id from information_schema.tables a where a.table_name='orders' and a.table_schema='tpch');
+--------------------+----------+--------------+----------+--------------------+
| version            | table_id | modify_count | count    | snapshot           |
+--------------------+----------+--------------+----------+--------------------+
| 436371908666064898 |      129 |            0 | 15000000 | 436371895938449409 |
+--------------------+----------+--------------+----------+--------------------+
1 row in set (0.02 sec)

O_CUSTKEY字段基数1百万,选择度基本上为1/15
mysql> select distinct_count,null_count,modify_count from stats_histograms where (table_id,hist_id) in (select tidb_table_id,ORDINAL_POSITION from information_schema.tables a,information_schema.columns b  where a.table_name=b.table_name and b.table_name='orders' and b.COLUMN_NAME='O_CUSTKEY' and b.TABLE_SCHEMA='tpch');
+----------------+------------+--------------+
| distinct_count | null_count | modify_count |
+----------------+------------+--------------+
|        1009664 |          0 |            0 |
+----------------+------------+--------------+
1 row in set (0.06 sec)


其实这里如果走索引那么效率应该会很高,但是还是没有走。可能上面那个关联条件不走INL_JOIN的一个原因是b表无法利用索引,看如下SQL执行计划(不走索引):
mysql> explain select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x where o_custkey=369001 group by o_custkey;
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id                             | estRows     | task      | access object | operator info                                                                                                                 |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10                  | 807731.20   | root      |               | tpch.orders.o_custkey, Column#10                                                                                              |
| └─HashAgg_15                   | 807731.20   | root      |               | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
|   └─TableReader_16             | 807731.20   | root      |               | data:HashAgg_17                                                                                                               |
|     └─HashAgg_17               | 807731.20   | cop[tikv] |               | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11                                                |
|       └─Selection_13           | 12000000.00 | cop[tikv] |               | eq(tpch.orders.o_custkey, 369001)                                                                                             |
|         └─TableFullScan_14     | 15000000.00 | cop[tikv] | table:x       | keep order:false                                                                                                              |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)

这是否为优化器的缺陷呢?