当JOIN的ON等值条件包含字符串函数时不能走IndexJoin

【需求涉及的问题场景】select * from a,b where a.id=b.id (走索引) and substr(a.name,1,10)=substr(b.name,1,10),在指定hint的情况下无法走IndexJoin

【期望的需求行为】可以强制走IndexJoin

表结构:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`C_PHONE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

mysql> show create table orders \G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `O_ORDERKEY` bigint(20) NOT NULL,
  `O_CUSTKEY` bigint(20) NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` bigint(20) NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`O_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

语句:explain analyze select /*+ INL_JOIN(a,b) / count() from customer a,orders b where a.c_phone=‘25-989-741-2988’ and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
b.o_custkey是索引字段,b.o_comment不是索引字段,可以正常走IndexJoin算法。

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                                   | estRows | actRows | task      | access object                  | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                                                                                                                                                 | memory  | disk |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| StreamAgg_10                         | 1.00    | 1       | root      |                                | time:16.8ms, loops:2, RU:2.519252                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | funcs:count(1)->Column#18                                                                                                                                                                                                     | 8 Bytes | N/A  |
| └─IndexJoin_43                       | 1.04    | 0       | root      |                                | time:16.8ms, loops:1, inner:{total:14.9ms, concurrency:5, task:1, construct:11.9ms, fetch:2.97ms, build:28.2µs}, probe:2.48µs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | inner join, inner:IndexLookUp_42, outer key:tpch1.customer.c_custkey, inner key:tpch1.orders.o_custkey, equal cond:eq(tpch1.customer.c_comment, tpch1.orders.o_comment), eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) | 51.1 KB | N/A  |
|   ├─IndexLookUp_37(Build)            | 1.04    | 1       | root      |                                | time:1.56ms, loops:3, index_task: {total_time: 750.8µs, fetch_handle: 748.9µs, build: 728ns, wait: 1.12µs}, table_task: {total_time: 708.2µs, num: 1, concurrency: 5}, next: {wait_index: 889.4µs, wait_table_lookup_build: 0s, wait_table_lookup_resp: 648.9µs}                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                                                                                               | 2.13 KB | N/A  |
|   │ ├─IndexRangeScan_35(Build)       | 1.04    | 1       | cop[tikv] | table:a, index:idx1(C_PHONE)   | time:747.4µs, loops:3, cop_task: {num: 1, max: 668.4µs, proc_keys: 1, tot_proc: 92.2µs, tot_wait: 93.2µs, rpc_num: 1, rpc_time: 646.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 17.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 24.2µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}                                                                                                                                                                              | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                                                                                                                                                 | N/A     | N/A  |
|   │ └─TableRowIDScan_36(Probe)       | 1.04    | 1       | cop[tikv] | table:a                        | time:630.3µs, loops:2, cop_task: {num: 1, max: 548µs, proc_keys: 1, tot_proc: 57µs, tot_wait: 71µs, rpc_num: 1, rpc_time: 533.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 24µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 14.4µs, rocksdb: {block: {cache_hit_count: 1}}}                                                                                                                                                                                 | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
|   └─IndexLookUp_42(Probe)            | 1.04    | 6       | root      |                                | time:2.81ms, loops:2, index_task: {total_time: 834.1µs, fetch_handle: 830.6µs, build: 804ns, wait: 2.64µs}, table_task: {total_time: 1.05ms, num: 1, concurrency: 5}, next: {wait_index: 1.71ms, wait_table_lookup_build: 98.5µs, wait_table_lookup_resp: 961.5µs}                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                               | 9.34 KB | N/A  |
|     ├─IndexRangeScan_40(Build)       | 1.04    | 6       | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | time:827.2µs, loops:3, cop_task: {num: 2, max: 713.5µs, min: 654µs, avg: 683.8µs, p95: 713.5µs, max_proc_keys: 6, p95_proc_keys: 6, tot_proc: 186µs, tot_wait: 255.8µs, rpc_num: 2, rpc_time: 1.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 814.7µs, max_distsql_concurrency: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 6, total_process_keys_size: 276, total_keys: 8, get_snapshot_time: 86.8µs, rocksdb: {key_skipped_count: 6, block: {cache_hit_count: 1, read_count: 1, read_byte: 3.81 KB, read_time: 12.4µs}}}           | range: decided by [eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)], keep order:false                                                                                                                                    | N/A     | N/A  |
|     └─TableRowIDScan_41(Probe)       | 1.04    | 6       | cop[tikv] | table:b                        | time:944.2µs, loops:2, cop_task: {num: 3, max: 825.8µs, min: 0s, avg: 275.3µs, p95: 825.8µs, max_proc_keys: 3, p95_proc_keys: 3, tot_proc: 190.9µs, tot_wait: 280.1µs, rpc_num: 1, rpc_time: 810.6µs, copr_cache_hit_ratio: 0.00, build_task_duration: 29.8µs, max_distsql_concurrency: 1, max_extra_concurrency: 1, store_batch_num: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:3, tasks:3}, scan_detail: {total_process_keys: 6, total_process_keys_size: 929, total_keys: 6, get_snapshot_time: 39.6µs, rocksdb: {block: {cache_hit_count: 6}}}                                          | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
8 rows in set (0.02 sec)

当在 a.C_COMMENT=b.O_COMMENT 关联条件字段上(b.O_COMMENT)加上函数后则不走IndexJoin:

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and substr(a.C_COMMENT,1,10)=substr(b.O_COMMENT,1,10);
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| id                                     | estRows    | actRows | task      | access object                | execution info                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                     | memory   | disk    |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| StreamAgg_13                           | 1.00       | 1       | root      |                              | time:345ms, loops:2, RU:2322.550974                                                                                                                                                                                                                                                                                                                                                                                                          | funcs:count(1)->Column#18                                                                         | 8 Bytes  | N/A     |
| └─HashJoin_27                          | 1.04       | 0       | root      |                              | time:345ms, loops:1, build_hash_table:{total:3.87ms, fetch:3.85ms, build:13.6µs}, probe:{concurrency:5, total:1.72s, max:344.8ms, probe:165.2ms, fetch:1.56s}                                                                                                                                                                                                                                                                                | inner join, equal:[eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) eq(Column#19, Column#20)] | 17.8 KB  | 0 Bytes |
|   ├─Projection_16(Build)               | 1.04       | 1       | root      |                              | time:3.84ms, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                        | tpch1.customer.c_custkey, substr(tpch1.customer.c_comment, 1, 10)->Column#19                      | 17.5 KB  | N/A     |
|   │ └─IndexLookUp_22                   | 1.04       | 1       | root      |                              | time:3.81ms, loops:2, index_task: {total_time: 1.06ms, fetch_handle: 1.05ms, build: 832ns, wait: 1.3µs}, table_task: {total_time: 2.55ms, num: 1, concurrency: 5}, next: {wait_index: 1.33ms, wait_table_lookup_build: 0s, wait_table_lookup_resp: 2.47ms}                                                                                                                                                                                   |                                                                                                   | 34.2 KB  | N/A     |
|   │   ├─IndexRangeScan_20(Build)       | 1.04       | 1       | cop[tikv] | table:a, index:idx1(C_PHONE) | time:1.05ms, loops:3, cop_task: {num: 1, max: 968.3µs, proc_keys: 1, tot_proc: 40.9µs, tot_wait: 398.6µs, rpc_num: 1, rpc_time: 955.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 11.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 11.1µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}       | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                     | N/A      | N/A     |
|   │   └─TableRowIDScan_21(Probe)       | 1.04       | 1       | cop[tikv] | table:a                      | time:2.46ms, loops:2, cop_task: {num: 1, max: 2.34ms, proc_keys: 1, tot_proc: 55.4µs, tot_wait: 88.8µs, rpc_num: 1, rpc_time: 2.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 32.7µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 19.3µs, rocksdb: {block: {cache_hit_count: 1}}}     | keep order:false                                                                                  | N/A      | N/A     |
|   └─Projection_23(Probe)               | 1500000.00 | 1500000 | root      |                              | time:331.8ms, loops:1471, Concurrency:5                                                                                                                                                                                                                                                                                                                                                                                                      | tpch1.orders.o_custkey, substr(tpch1.orders.o_comment, 1, 10)->Column#20                          | 495.7 KB | N/A     |
|     └─TableReader_25                   | 1500000.00 | 1500000 | root      |                              | time:92.3ms, loops:1471, cop_task: {num: 69, max: 94.7ms, min: 231.5µs, avg: 16.5ms, p95: 63ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 673.7ms, tot_wait: 23.6ms, rpc_num: 69, rpc_time: 1.13s, copr_cache_hit_ratio: 0.58, build_task_duration: 12µs, max_distsql_concurrency: 5}                                                                                                                                            | data:TableFullScan_24                                                                             | 15.4 MB  | N/A     |
|       └─TableFullScan_24               | 1500000.00 | 1500000 | cop[tikv] | table:b                      | tikv_task:{proc max:294ms, min:0s, avg: 61.3ms, p80:89ms, p95:259ms, iters:1737, tasks:69}, scan_detail: {total_process_keys: 893024, total_process_keys_size: 135281341, total_keys: 893053, get_snapshot_time: 1.02ms, rocksdb: {key_skipped_count: 893024, block: {cache_hit_count: 4574}}}                                                                                                                                               | keep order:false                                                                                  | N/A      | N/A     |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
9 rows in set, 2 warnings (0.35 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
+---------+------+--------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

另外,对于a表(很小数据量)join b表(超大数据量)on a.date between (b.date,b.date+1) 这种关联条件都是非等值情况也不支持IndexJoin,但是如果支持IndexJoin,有一些场景下IndexJoin效率要比HashJoin高很多。参考帖子:TIDB 索引失效

希望IndexJoin算法可以支持更多形式的的On Condition条件。

1、希望支持关联条件全是非等值情况下,支持可走IndexJoin。
2、希望支持存在部分关联条件等值on condition(索引)情况下,均支持可走IndexJoin。

IndexJoin 不支持 substr(...) ,所以如果把 substr(...) 放到 Join Key 里面,就没法生成 IndexJoin,而只能走 HashJoin 了
–这个知识点是从哪里获取的呢?

简单来说这是从他脑子里获取的。这位大佬是tidb优化器核心设计和实现者。