【 TiDB 使用环境】测试
【 TiDB 版本】v7.0.0
在文档执行计划概览一章中有介绍build和probe含义的一些描述:
https://docs.pingcap.com/zh/tidb/stable/explain-overview
这里描述的没有问题,但是在执行计划中都有build和probe很容易让人和hashjoin等算子真正执行时候的build、probe混淆。
1、hash_join,总是选择一个小表作为buildSide,在执行计划中也是在build端,没有问题。
mysql> explain select count(*) from (select /*+ hash_join(a,b) */ * from customer a,orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
| StreamAgg_10 | 1.00 | root | | funcs:count(1)->Column#18 |
| └─HashJoin_60 | 150000000.00 | root | | inner join, equal:[eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey)] |
| ├─IndexReader_35(Build) | 15000000.00 | root | | index:IndexFullScan_34 |
| │ └─IndexFullScan_34 | 15000000.00 | cop[tikv] | table:a, index:idx1(C_PHONE) | keep order:false |
| └─IndexReader_39(Probe) | 150000000.00 | root | | index:IndexFullScan_38 |
| └─IndexFullScan_38 | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | keep order:false |
+-------------------------------+--------------+-----------+--------------------------------+------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
2、inl_join,outer表取部分记录,查找inner表匹配的记录,并在inner表查找到的结果集上创建hash_table作为buildSide,outer的记录进行probe。
从inner:IndexReader_40可以看出IndexReader_40是inner表,在实际执行时候发生了build,这里和执行计划标记的正好相反,很容易让人混淆。
mysql> explain select count(*) from (select /*+ inl_join(a,b) */ * from customer a,orders b where a.c_custkey=b.o_custkey)a;
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id | estRows | task | access object | operator info |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| StreamAgg_10 | 1.00 | root | | funcs:count(1)->Column#18 |
| └─IndexJoin_41 | 150000000.00 | root | | inner join, inner:IndexReader_40, outer key:tpch100.customer.c_custkey, inner key:tpch100.orders.o_custkey, equal cond:eq(tpch100.customer.c_custkey, tpch100.orders.o_custkey) |
| ├─IndexReader_33(Build) | 15000000.00 | root | | index:IndexFullScan_32 |
| │ └─IndexFullScan_32 | 15000000.00 | cop[tikv] | table:a, index:idx1(C_PHONE) | keep order:false |
| └─IndexReader_40(Probe) | 150000000.00 | root | | index:IndexRangeScan_39 |
| └─IndexRangeScan_39 | 150000000.00 | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | range: decided by [eq(tpch100.orders.o_custkey, tpch100.customer.c_custkey)], keep order:false |
+-------------------------------+--------------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.00 sec)
如果Build总是先于Probe执行,那么可以按照从右到左,从上到下的执行原则即可,不用要标记Build和Probe就可以吧?标记了反而和内部实现更容易混淆。或者直接标记(inner、outer)是否更合理?