【 TiDB 使用环境】测试
【 TiDB 版本】v7.0.0
在文档执行计划概览一章中有介绍build和probe含义的一些描述:
https://docs.pingcap.com/zh/tidb/stable/explain-overview
这里描述的没有问题,但是在执行计划中都有build和probe很容易让人和hashjoin等算子真正执行时候的build、probe混淆。
1、hash_join,总是选择一个小表作为buildSide,在执行计划中也是在build端,没有问题。
mysql> explain select count(*) from (select /*+ hash_join(a,b) */ * from customer a,orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| id                            | estRows      | task      | access object                  | operator info                                                                |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| StreamAgg_10                  | 1.00         | root      |                                | funcs:count(1)->Column#18                                                    |
| └─HashJoin_60                 | 150000000.00 | root      |                                | inner join, equal:[eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey)] |
|   ├─IndexReader_35(Build)     | 15000000.00  | root      |                                | index:IndexFullScan_34                                                       |
|   │ └─IndexFullScan_34        | 15000000.00  | cop[tikv] | table:a, index:idx1(C_PHONE)   | keep order:false                                                             |
|   └─IndexReader_39(Probe)     | 150000000.00 | root      |                                | index:IndexFullScan_38                                                       |
|     └─IndexFullScan_38        | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | keep order:false                                                             |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
2、inl_join,outer表取部分记录,查找inner表匹配的记录,并在inner表查找到的结果集上创建hash_table作为buildSide,outer的记录进行probe。
从inner:IndexReader_40可以看出IndexReader_40是inner表,在实际执行时候发生了build,这里和执行计划标记的正好相反,很容易让人混淆。
mysql> explain select count(*) from (select /*+ inl_join(a,b) */ * from customer a,orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                            | estRows      | task      | access object                  | operator info                                                                                                                                                                   |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_10                  | 1.00         | root      |                                | funcs:count(1)->Column#18                                                                                                                                                       |
| └─IndexJoin_41                | 150000000.00 | root      |                                | inner join, inner:IndexReader_40, outer key:tpch100.customer.c_custkey, inner key:tpch100.orders.o_custkey, equal cond:eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey) |
|   ├─IndexReader_33(Build)     | 15000000.00  | root      |                                | index:IndexFullScan_32                                                                                                                                                          |
|   │ └─IndexFullScan_32        | 15000000.00  | cop[tikv] | table:a, index:idx1(C_PHONE)   | keep order:false                                                                                                                                                                |
|   └─IndexReader_40(Probe)     | 150000000.00 | root      |                                | index:IndexRangeScan_39                                                                                                                                                         |
|     └─IndexRangeScan_39       | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | range: decided by [eq(tpch100.orders.o_custkey, tpch100.customer.c_custkey)], keep order:false                                                                                  |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
如果Build总是先于Probe执行,那么可以按照从右到左,从上到下的执行原则即可,不用要标记Build和Probe就可以吧?标记了反而和内部实现更容易混淆。或者直接标记(inner、outer)是否更合理?
