从这里看感觉这种select a,sum(b) from table group by a形式的SQL语句就是不走索引,而且也没有任何warning。
mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10 | 1009664.00 | root | | tpch.orders.o_custkey, Column#10 |
| └─HashAgg_14 | 1009664.00 | root | | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─TableReader_15 | 1009664.00 | root | | data:HashAgg_16 |
| └─HashAgg_16 | 1009664.00 | cop[tikv] | | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11 |
| └─TableFullScan_13 | 15000000.00 | cop[tikv] | table:x | keep order:false |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
mysql> explain select * from (select /*+ USE_INDEX(x,o_custkey) */ o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x group by o_custkey) as b;
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10 | 1009664.00 | root | | tpch.orders.o_custkey, Column#10 |
| └─HashAgg_14 | 1009664.00 | root | | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─TableReader_15 | 1009664.00 | root | | data:HashAgg_16 |
| └─HashAgg_16 | 1009664.00 | cop[tikv] | | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11 |
| └─TableFullScan_13 | 15000000.00 | cop[tikv] | table:x | keep order:false |
+------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
5 rows in set (0.00 sec)
mysql> explain select /*+ USE_INDEX(@sel_2 x@sel_2,o_custkey) */ * from (select o_custkey from tpch.orders x group by o_custkey) as b;
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
| HashAgg_18 | 1009664.00 | root | | group by:tpch.orders.o_custkey, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─IndexReader_19 | 1009664.00 | root | | index:HashAgg_20 |
| └─HashAgg_20 | 1009664.00 | cop[tikv] | | group by:tpch.orders.o_custkey, |
| └─IndexFullScan_17 | 15000000.00 | cop[tikv] | table:x, index:o_custkey(O_CUSTKEY) | keep order:false |
+----------------------------+-------------+-----------+-------------------------------------+----------------------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
o_custkey重复度比较低,这里也不走索引。
表总记录数1.5千万
mysql> select * from mysql.stats_meta where table_id = (select tidb_table_id from information_schema.tables a where a.table_name='orders' and a.table_schema='tpch');
+--------------------+----------+--------------+----------+--------------------+
| version | table_id | modify_count | count | snapshot |
+--------------------+----------+--------------+----------+--------------------+
| 436371908666064898 | 129 | 0 | 15000000 | 436371895938449409 |
+--------------------+----------+--------------+----------+--------------------+
1 row in set (0.02 sec)
O_CUSTKEY字段基数1百万,选择度基本上为1/15
mysql> select distinct_count,null_count,modify_count from stats_histograms where (table_id,hist_id) in (select tidb_table_id,ORDINAL_POSITION from information_schema.tables a,information_schema.columns b where a.table_name=b.table_name and b.table_name='orders' and b.COLUMN_NAME='O_CUSTKEY' and b.TABLE_SCHEMA='tpch');
+----------------+------------+--------------+
| distinct_count | null_count | modify_count |
+----------------+------------+--------------+
| 1009664 | 0 | 0 |
+----------------+------------+--------------+
1 row in set (0.06 sec)
其实这里如果走索引那么效率应该会很高,但是还是没有走。可能上面那个关联条件不走INL_JOIN的一个原因是b表无法利用索引,看如下SQL执行计划(不走索引):
mysql> explain select o_custkey,sum(O_TOTALPRICE) as price from tpch.orders x where o_custkey=369001 group by o_custkey;
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
| Projection_10 | 807731.20 | root | | tpch.orders.o_custkey, Column#10 |
| └─HashAgg_15 | 807731.20 | root | | group by:tpch.orders.o_custkey, funcs:sum(Column#11)->Column#10, funcs:firstrow(tpch.orders.o_custkey)->tpch.orders.o_custkey |
| └─TableReader_16 | 807731.20 | root | | data:HashAgg_17 |
| └─HashAgg_17 | 807731.20 | cop[tikv] | | group by:tpch.orders.o_custkey, funcs:sum(tpch.orders.o_totalprice)->Column#11 |
| └─Selection_13 | 12000000.00 | cop[tikv] | | eq(tpch.orders.o_custkey, 369001) |
| └─TableFullScan_14 | 15000000.00 | cop[tikv] | table:x | keep order:false |
+--------------------------------+-------------+-----------+---------------+-------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
这是否为优化器的缺陷呢?