当JOIN的ON等值条件包含字符串函数时不能走IndexJoin

【需求涉及的问题场景】select * from a,b where a.id=b.id (走索引) and substr(a.name,1,10)=substr(b.name,1,10),在指定hint的情况下无法走IndexJoin

【期望的需求行为】可以强制走IndexJoin

表结构:

mysql> show create table customer \G
*************************** 1. row ***************************
       Table: customer
Create Table: CREATE TABLE `customer` (
  `C_CUSTKEY` bigint(20) NOT NULL,
  `C_NAME` varchar(25) NOT NULL,
  `C_ADDRESS` varchar(40) NOT NULL,
  `C_NATIONKEY` bigint(20) NOT NULL,
  `C_PHONE` char(15) NOT NULL,
  `C_ACCTBAL` decimal(15,2) NOT NULL,
  `C_MKTSEGMENT` char(10) NOT NULL,
  `C_COMMENT` varchar(117) NOT NULL,
  PRIMARY KEY (`C_CUSTKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`C_PHONE`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

mysql> show create table orders \G
*************************** 1. row ***************************
       Table: orders
Create Table: CREATE TABLE `orders` (
  `O_ORDERKEY` bigint(20) NOT NULL,
  `O_CUSTKEY` bigint(20) NOT NULL,
  `O_ORDERSTATUS` char(1) NOT NULL,
  `O_TOTALPRICE` decimal(15,2) NOT NULL,
  `O_ORDERDATE` date NOT NULL,
  `O_ORDERPRIORITY` char(15) NOT NULL,
  `O_CLERK` char(15) NOT NULL,
  `O_SHIPPRIORITY` bigint(20) NOT NULL,
  `O_COMMENT` varchar(79) NOT NULL,
  PRIMARY KEY (`O_ORDERKEY`) /*T![clustered_index] CLUSTERED */,
  KEY `idx1` (`O_CUSTKEY`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin
1 row in set (0.01 sec)

语句:explain analyze select /*+ INL_JOIN(a,b) / count() from customer a,orders b where a.c_phone=‘25-989-741-2988’ and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
b.o_custkey是索引字段,b.o_comment不是索引字段,可以正常走IndexJoin算法。

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and a.C_COMMENT=b.O_COMMENT;
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                                   | estRows | actRows | task      | access object                  | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | operator info                                                                                                                                                                                                                 | memory  | disk |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| StreamAgg_10                         | 1.00    | 1       | root      |                                | time:16.8ms, loops:2, RU:2.519252                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   | funcs:count(1)->Column#18                                                                                                                                                                                                     | 8 Bytes | N/A  |
| └─IndexJoin_43                       | 1.04    | 0       | root      |                                | time:16.8ms, loops:1, inner:{total:14.9ms, concurrency:5, task:1, construct:11.9ms, fetch:2.97ms, build:28.2µs}, probe:2.48µs                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | inner join, inner:IndexLookUp_42, outer key:tpch1.customer.c_custkey, inner key:tpch1.orders.o_custkey, equal cond:eq(tpch1.customer.c_comment, tpch1.orders.o_comment), eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) | 51.1 KB | N/A  |
|   ├─IndexLookUp_37(Build)            | 1.04    | 1       | root      |                                | time:1.56ms, loops:3, index_task: {total_time: 750.8µs, fetch_handle: 748.9µs, build: 728ns, wait: 1.12µs}, table_task: {total_time: 708.2µs, num: 1, concurrency: 5}, next: {wait_index: 889.4µs, wait_table_lookup_build: 0s, wait_table_lookup_resp: 648.9µs}                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                                                                                               | 2.13 KB | N/A  |
|   │ ├─IndexRangeScan_35(Build)       | 1.04    | 1       | cop[tikv] | table:a, index:idx1(C_PHONE)   | time:747.4µs, loops:3, cop_task: {num: 1, max: 668.4µs, proc_keys: 1, tot_proc: 92.2µs, tot_wait: 93.2µs, rpc_num: 1, rpc_time: 646.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 17.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 24.2µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}                                                                                                                                                                              | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                                                                                                                                                 | N/A     | N/A  |
|   │ └─TableRowIDScan_36(Probe)       | 1.04    | 1       | cop[tikv] | table:a                        | time:630.3µs, loops:2, cop_task: {num: 1, max: 548µs, proc_keys: 1, tot_proc: 57µs, tot_wait: 71µs, rpc_num: 1, rpc_time: 533.1µs, copr_cache_hit_ratio: 0.00, build_task_duration: 24µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 14.4µs, rocksdb: {block: {cache_hit_count: 1}}}                                                                                                                                                                                 | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
|   └─IndexLookUp_42(Probe)            | 1.04    | 6       | root      |                                | time:2.81ms, loops:2, index_task: {total_time: 834.1µs, fetch_handle: 830.6µs, build: 804ns, wait: 2.64µs}, table_task: {total_time: 1.05ms, num: 1, concurrency: 5}, next: {wait_index: 1.71ms, wait_table_lookup_build: 98.5µs, wait_table_lookup_resp: 961.5µs}                                                                                                                                                                                                                                                                                                                                                  |                                                                                                                                                                                                                               | 9.34 KB | N/A  |
|     ├─IndexRangeScan_40(Build)       | 1.04    | 6       | cop[tikv] | table:b, index:idx1(O_CUSTKEY) | time:827.2µs, loops:3, cop_task: {num: 2, max: 713.5µs, min: 654µs, avg: 683.8µs, p95: 713.5µs, max_proc_keys: 6, p95_proc_keys: 6, tot_proc: 186µs, tot_wait: 255.8µs, rpc_num: 2, rpc_time: 1.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 814.7µs, max_distsql_concurrency: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:2, tasks:2}, scan_detail: {total_process_keys: 6, total_process_keys_size: 276, total_keys: 8, get_snapshot_time: 86.8µs, rocksdb: {key_skipped_count: 6, block: {cache_hit_count: 1, read_count: 1, read_byte: 3.81 KB, read_time: 12.4µs}}}           | range: decided by [eq(tpch1.orders.o_custkey, tpch1.customer.c_custkey)], keep order:false                                                                                                                                    | N/A     | N/A  |
|     └─TableRowIDScan_41(Probe)       | 1.04    | 6       | cop[tikv] | table:b                        | time:944.2µs, loops:2, cop_task: {num: 3, max: 825.8µs, min: 0s, avg: 275.3µs, p95: 825.8µs, max_proc_keys: 3, p95_proc_keys: 3, tot_proc: 190.9µs, tot_wait: 280.1µs, rpc_num: 1, rpc_time: 810.6µs, copr_cache_hit_ratio: 0.00, build_task_duration: 29.8µs, max_distsql_concurrency: 1, max_extra_concurrency: 1, store_batch_num: 2}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:3, tasks:3}, scan_detail: {total_process_keys: 6, total_process_keys_size: 929, total_keys: 6, get_snapshot_time: 39.6µs, rocksdb: {block: {cache_hit_count: 6}}}                                          | keep order:false                                                                                                                                                                                                              | N/A     | N/A  |
+--------------------------------------+---------+---------+-----------+--------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
8 rows in set (0.02 sec)

当在 a.C_COMMENT=b.O_COMMENT 关联条件字段上(b.O_COMMENT)加上函数后则不走IndexJoin:

mysql> explain analyze select /*+ INL_JOIN(a,b) */ count(*) from customer a,orders b where a.c_phone='25-989-741-2988' and a.c_custkey=b.o_custkey and substr(a.C_COMMENT,1,10)=substr(b.O_COMMENT,1,10);
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| id                                     | estRows    | actRows | task      | access object                | execution info                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                     | memory   | disk    |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
| StreamAgg_13                           | 1.00       | 1       | root      |                              | time:345ms, loops:2, RU:2322.550974                                                                                                                                                                                                                                                                                                                                                                                                          | funcs:count(1)->Column#18                                                                         | 8 Bytes  | N/A     |
| └─HashJoin_27                          | 1.04       | 0       | root      |                              | time:345ms, loops:1, build_hash_table:{total:3.87ms, fetch:3.85ms, build:13.6µs}, probe:{concurrency:5, total:1.72s, max:344.8ms, probe:165.2ms, fetch:1.56s}                                                                                                                                                                                                                                                                                | inner join, equal:[eq(tpch1.customer.c_custkey, tpch1.orders.o_custkey) eq(Column#19, Column#20)] | 17.8 KB  | 0 Bytes |
|   ├─Projection_16(Build)               | 1.04       | 1       | root      |                              | time:3.84ms, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                        | tpch1.customer.c_custkey, substr(tpch1.customer.c_comment, 1, 10)->Column#19                      | 17.5 KB  | N/A     |
|   │ └─IndexLookUp_22                   | 1.04       | 1       | root      |                              | time:3.81ms, loops:2, index_task: {total_time: 1.06ms, fetch_handle: 1.05ms, build: 832ns, wait: 1.3µs}, table_task: {total_time: 2.55ms, num: 1, concurrency: 5}, next: {wait_index: 1.33ms, wait_table_lookup_build: 0s, wait_table_lookup_resp: 2.47ms}                                                                                                                                                                                   |                                                                                                   | 34.2 KB  | N/A     |
|   │   ├─IndexRangeScan_20(Build)       | 1.04       | 1       | cop[tikv] | table:a, index:idx1(C_PHONE) | time:1.05ms, loops:3, cop_task: {num: 1, max: 968.3µs, proc_keys: 1, tot_proc: 40.9µs, tot_wait: 398.6µs, rpc_num: 1, rpc_time: 955.2µs, copr_cache_hit_ratio: 0.00, build_task_duration: 11.9µs, max_distsql_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 55, total_keys: 2, get_snapshot_time: 11.1µs, rocksdb: {key_skipped_count: 1, block: {cache_hit_count: 2}}}       | range:["25-989-741-2988","25-989-741-2988"], keep order:false                                     | N/A      | N/A     |
|   │   └─TableRowIDScan_21(Probe)       | 1.04       | 1       | cop[tikv] | table:a                      | time:2.46ms, loops:2, cop_task: {num: 1, max: 2.34ms, proc_keys: 1, tot_proc: 55.4µs, tot_wait: 88.8µs, rpc_num: 1, rpc_time: 2.33ms, copr_cache_hit_ratio: 0.00, build_task_duration: 32.7µs, max_distsql_concurrency: 1, max_extra_concurrency: 1}, tikv_task:{time:0s, loops:1}, scan_detail: {total_process_keys: 1, total_process_keys_size: 184, total_keys: 1, get_snapshot_time: 19.3µs, rocksdb: {block: {cache_hit_count: 1}}}     | keep order:false                                                                                  | N/A      | N/A     |
|   └─Projection_23(Probe)               | 1500000.00 | 1500000 | root      |                              | time:331.8ms, loops:1471, Concurrency:5                                                                                                                                                                                                                                                                                                                                                                                                      | tpch1.orders.o_custkey, substr(tpch1.orders.o_comment, 1, 10)->Column#20                          | 495.7 KB | N/A     |
|     └─TableReader_25                   | 1500000.00 | 1500000 | root      |                              | time:92.3ms, loops:1471, cop_task: {num: 69, max: 94.7ms, min: 231.5µs, avg: 16.5ms, p95: 63ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 673.7ms, tot_wait: 23.6ms, rpc_num: 69, rpc_time: 1.13s, copr_cache_hit_ratio: 0.58, build_task_duration: 12µs, max_distsql_concurrency: 5}                                                                                                                                            | data:TableFullScan_24                                                                             | 15.4 MB  | N/A     |
|       └─TableFullScan_24               | 1500000.00 | 1500000 | cop[tikv] | table:b                      | tikv_task:{proc max:294ms, min:0s, avg: 61.3ms, p80:89ms, p95:259ms, iters:1737, tasks:69}, scan_detail: {total_process_keys: 893024, total_process_keys_size: 135281341, total_keys: 893053, get_snapshot_time: 1.02ms, rocksdb: {key_skipped_count: 893024, block: {cache_hit_count: 4574}}}                                                                                                                                               | keep order:false                                                                                  | N/A      | N/A     |
+----------------------------------------+------------+---------+-----------+------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------+----------+---------+
9 rows in set, 2 warnings (0.35 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
+---------+------+--------------------------------------------------------------------------------+
2 rows in set (0.00 sec)

另外,对于a表(很小数据量)join b表(超大数据量)on a.date between (b.date,b.date+1) 这种关联条件都是非等值情况也不支持IndexJoin,但是如果支持IndexJoin,有一些场景下IndexJoin效率要比HashJoin高很多。参考帖子:TIDB 索引失效

希望IndexJoin算法可以支持更多形式的的On Condition条件。

1、希望支持关联条件全是非等值情况下,支持可走IndexJoin。
2、希望支持存在部分关联条件等值on condition(索引)情况下,均支持可走IndexJoin。

简化测试:

--表结构:
create table a(id int,name varchar(100));
create table b(id int,name varchar(100));
alter table b add index(id);
insert into a values (1,'1'),(1,'11'),(2,'2');
insert into b values (1,'1'),(1,'22'),(2,'2');
--统计信息搜集
analyze table a;
analyze table b;

支持IndexJoin
explain select /*+ INL_JOIN(a,b) / * from a,b where a.id=b.id and a.name=b.name;
explain select /
+ INL_JOIN(a,b) / * from a,b where a.id=b.id and a.name<substr(b.name,1,10);
explain select /
+ INL_JOIN(a,b) / * from a,b where a.id=b.id and a.name!=substr(b.name,1,10);
不支持IndexJoin
explain select /
+ INL_JOIN(a,b) / * from a,b where a.id=b.id and a.name=substr(b.name,1,10);
explain select /
+ INL_JOIN(a,b) */ * from a,b where a.id < b.id;

奇怪的是a.name<substr(b.name,1,10) 支持IndexJoin,等值时候确不支持。
具体执行计划如下:

--支持IndexJoin情况
mysql> explain select /*+ INL_JOIN(a,b) */ * from a,b where a.id=b.id and a.name=b.name;
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| id                           | estRows | task      | access object         | operator info                                                                                                                                 |
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_13                 | 6.00    | root      |                       | inner join, inner:IndexLookUp_12, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), eq(test.a.name, test.b.name) |
| ├─TableReader_25(Build)      | 6.00    | root      |                       | data:Selection_24                                                                                                                             |
| │ └─Selection_24             | 6.00    | cop[tikv] |                       | not(isnull(test.a.id)), not(isnull(test.a.name))                                                                                              |
| │   └─TableFullScan_23       | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                              |
| └─IndexLookUp_12(Probe)      | 6.00    | root      |                       |                                                                                                                                               |
|   ├─Selection_10(Build)      | 6.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                        |
|   │ └─IndexRangeScan_8       | 6.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                |
|   └─Selection_11(Probe)      | 6.00    | cop[tikv] |                       | not(isnull(test.b.name))                                                                                                                      |
|     └─TableRowIDScan_9       | 6.00    | cop[tikv] | table:b               | keep order:false                                                                                                                              |
+------------------------------+---------+-----------+-----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------+
9 rows in set (0.06 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a,b where a.id=b.id and a.name<substr(b.name,1,10);
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                              | estRows | task      | access object         | operator info                                                                                                                                                           |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_12                    | 9.00    | root      |                       | inner join, inner:IndexLookUp_11, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), other cond:lt(test.a.name, substr(test.b.name, 1, 10)) |
| ├─TableReader_24(Build)         | 6.00    | root      |                       | data:Selection_23                                                                                                                                                       |
| │ └─Selection_23                | 6.00    | cop[tikv] |                       | not(isnull(test.a.id))                                                                                                                                                  |
| │   └─TableFullScan_22          | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                                                        |
| └─IndexLookUp_11(Probe)         | 9.00    | root      |                       |                                                                                                                                                                         |
|   ├─Selection_10(Build)         | 9.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                                                  |
|   │ └─IndexRangeScan_8          | 9.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                                          |
|   └─TableRowIDScan_9(Probe)     | 9.00    | cop[tikv] | table:b               | keep order:false                                                                                                                                                        |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.01 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a,b where a.id=b.id and a.name!=substr(b.name,1,10);
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| id                              | estRows | task      | access object         | operator info                                                                                                                                                           |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| IndexJoin_12                    | 9.00    | root      |                       | inner join, inner:IndexLookUp_11, outer key:test.a.id, inner key:test.b.id, equal cond:eq(test.a.id, test.b.id), other cond:ne(test.a.name, substr(test.b.name, 1, 10)) |
| ├─TableReader_24(Build)         | 6.00    | root      |                       | data:Selection_23                                                                                                                                                       |
| │ └─Selection_23                | 6.00    | cop[tikv] |                       | not(isnull(test.a.id))                                                                                                                                                  |
| │   └─TableFullScan_22          | 6.00    | cop[tikv] | table:a               | keep order:false                                                                                                                                                        |
| └─IndexLookUp_11(Probe)         | 9.00    | root      |                       |                                                                                                                                                                         |
|   ├─Selection_10(Build)         | 9.00    | cop[tikv] |                       | not(isnull(test.b.id))                                                                                                                                                  |
|   │ └─IndexRangeScan_8          | 9.00    | cop[tikv] | table:b, index:id(id) | range: decided by [eq(test.b.id, test.a.id)], keep order:false                                                                                                          |
|   └─TableRowIDScan_9(Probe)     | 9.00    | cop[tikv] | table:b               | keep order:false                                                                                                                                                        |
+---------------------------------+---------+-----------+-----------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

--不支持IndexJoin情况

mysql> explain select /*+ INL_JOIN(a,b) */ * from a,b where a.id=b.id and a.name=substr(b.name,1,10);
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
| id                           | estRows | task      | access object | operator info                                                          |
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
| HashJoin_12                  | 6.00    | root      |               | inner join, equal:[eq(test.a.id, test.b.id) eq(test.a.name, Column#7)] |
| ├─TableReader_15(Build)      | 6.00    | root      |               | data:Selection_14                                                      |
| │ └─Selection_14             | 6.00    | cop[tikv] |               | not(isnull(test.a.id))                                                 |
| │   └─TableFullScan_13       | 6.00    | cop[tikv] | table:a       | keep order:false                                                       |
| └─Projection_16(Probe)       | 6.00    | root      |               | test.b.id, test.b.name, substr(test.b.name, 1, 10)->Column#7           |
|   └─TableReader_19           | 6.00    | root      |               | data:Selection_18                                                      |
|     └─Selection_18           | 6.00    | cop[tikv] |               | not(isnull(test.b.id))                                                 |
|       └─TableFullScan_17     | 6.00    | cop[tikv] | table:b       | keep order:false                                                       |
+------------------------------+---------+-----------+---------------+------------------------------------------------------------------------+
8 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable |
+---------+------+--------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> explain select /*+ INL_JOIN(a,b) */ * from a,b where a.id < b.id;
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
| id                           | estRows | task      | access object | operator info                                             |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
| HashJoin_8                   | 36.00   | root      |               | CARTESIAN inner join, other cond:lt(test.a.id, test.b.id) |
| ├─TableReader_15(Build)      | 6.00    | root      |               | data:Selection_14                                         |
| │ └─Selection_14             | 6.00    | cop[tikv] |               | not(isnull(test.b.id))                                    |
| │   └─TableFullScan_13       | 6.00    | cop[tikv] | table:b       | keep order:false                                          |
| └─TableReader_12(Probe)      | 6.00    | root      |               | data:Selection_11                                         |
|   └─Selection_11             | 6.00    | cop[tikv] |               | not(isnull(test.a.id))                                    |
|     └─TableFullScan_10       | 6.00    | cop[tikv] | table:a       | keep order:false                                          |
+------------------------------+---------+-----------+---------------+-----------------------------------------------------------+
7 rows in set, 1 warning (0.00 sec)

mysql> show warnings;
+---------+------+------------------------------------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                                                          |
+---------+------+------------------------------------------------------------------------------------------------------------------+
| Warning | 1815 | Optimizer Hint /*+ INL_JOIN(a, b) */ or /*+ TIDB_INLJ(a, b) */ is inapplicable without column equal ON condition |
+---------+------+------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)

能否支持下禁止 SQL 改写的Hint语法?比如/*+ no_rewrite */ 。
在无法从IndexJoin时,想先形成一个子查询,然后对函数字段过关联过滤,改写如下:

select * from (select /*+ USE_NL(b,a) */ a.id a_id,a.name a_name,b.id b_id,b.name b_name from a,b where a.id=b.id)c where c.a_name=substr(c.b_name,1,10);

但是优化器总是给我将条件下推,如果可以用/*+ no_rewrite */ 禁止改写SQL那么可能就走到了我想要的结果了

select /*+ no_rewrite */ * from (select /*+ INL_JOIN(a,b)  */ a.id a_id,a.name a_name,b.id b_id,b.name b_name from a,b where a.id=b.id) c where c.a_name=substr(c.b_name,1,10);

能否支持下禁止SQL改写的Hint呢?

select * from a,b where a.id=b.id (走索引) and substr(a.name,1,10)=substr(b.name,1,10) 这种哪个数据库都走不了

Tidb中 IndexJoin有一些限制 官方建议预计需要连接的行数较少(一般小于 1 万行),推荐使用 Index Join 算法。Index Join 算法对内存消耗较小,但如果需要执行大量探查操作,运行速度可能会慢于其他 Join 算法如果 Build 端的数据量比 Probe 端小,且 Probe 端的数据已预先建立了索引,那么这种情况下 Index Join 算法效率更高

能举一个不能走的例子吗???

我举两个可以走的,MySQL和OceanBase社区版都可以走。

MySQL:

--表结构:
create table a(id int,name varchar(100));
create table b(id int,name varchar(100));
alter table b add index(id);
insert into a values (1,'1'),(1,'11'),(2,'2');
insert into b values (1,'1'),(1,'22'),(2,'2');
mysql> select version();
+-----------+
| version() |
+-----------+
| 8.0.30    |
+-----------+
1 row in set (0.00 sec)

--TiDB中不能走,MySQL中可以走:
mysql> explain analyze select /*+ NO_BNL(a,b) */ * from a,b where a.id=b.id and a.name=substr(b.name,1,10);
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join  (cost=2.12 rows=4) (actual time=0.036..0.046 rows=2 loops=1)
    -> Filter: (a.id is not null)  (cost=0.55 rows=3) (actual time=0.019..0.021 rows=3 loops=1)
        -> Table scan on a  (cost=0.55 rows=3) (actual time=0.018..0.020 rows=3 loops=1)
    -> Filter: (a.`name` = substr(b.`name`,1,10))  (cost=0.42 rows=2) (actual time=0.006..0.008 rows=1 loops=3)
        -> Index lookup on b using id (id=a.id)  (cost=0.42 rows=2) (actual time=0.005..0.006 rows=2 loops=3)
 |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

OceanBase

--表结构:
create table a(id int,name varchar(100));
create table b(id int,name varchar(100));
alter table b add index(id);
insert into a values (1,'1'),(1,'11'),(2,'2');
insert into b values (1,'1'),(1,'22'),(2,'2');
obclient [test]> select version();
+------------------------------+
| version()                    |
+------------------------------+
| 5.7.25-OceanBase_CE-v4.2.0.0 |
+------------------------------+
1 row in set (0.001 sec)
--TiDB中不能走,OceanBase中可以走:
obclient [test]> explain extended pretty_color select /*+ USE_NL(a,b) leading(a,b) */ * from a,b where a.id=b.id and a.name=substr(b.name,1,10);
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Query Plan                                                                                                                                                       |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| ===============================================================                                                                                                  |
| |ID|OPERATOR                      |NAME |EST.ROWS|EST.TIME(us)|                                                                                                  |
| ---------------------------------------------------------------                                                                                                  |
| |0 |NESTED-LOOP JOIN              |     |2       |71          |                                                                                                  |
| |1 |├─TABLE FULL SCAN             |a    |3       |4           |                                                                                         |
| |2 |└─DISTRIBUTED TABLE RANGE SCAN|b(id)|1       |22          |                                                                                         |
| ===============================================================                                                                                                  |
| Outputs & filters:                                                                                                                                               |
| -------------------------------------                                                                                                                            |
|   0 - output([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)], [b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), filter(nil), rowset=256                        |
|       conds(nil), nl_params_([a.id(0x7f4b8ce40f50)(:0)], [a.name(0x7f4b8ce431e0)(:1)]), use_batch=true                                                           |
|   1 - output([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)]), filter(nil), rowset=256                                                                          |
|       access([a.id(0x7f4b8ce40f50)], [a.name(0x7f4b8ce431e0)]), partitions(p0)                                                                                   |
|       is_index_back=false, is_global_index=false,                                                                                                                |
|       range_key([a.__pk_increment(0x7f4b8ce44470)]), range(MIN ; MAX)always true                                                                                 |
|   2 - output([b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), filter([:1 = substr(b.name(0x7f4b8ce434c0), 1, 10)(0x7f4b8ce41a90)(0x7f4b8cec23a0)]), rowset=256 |
|       access([GROUP_ID(0x7f4b8cef0150)], [b.__pk_increment(0x7f4b8ce44740)], [b.id(0x7f4b8ce41230)], [b.name(0x7f4b8ce434c0)]), partitions(p0)                   |
|       is_index_back=true, is_global_index=false, filter_before_indexback[false],                                                                                 |
|       range_key([b.id(0x7f4b8ce41230)], [b.__pk_increment(0x7f4b8ce44740)]), range(MIN ; MAX),                                                                   |
|       range_cond([:0 = b.id(0x7f4b8ce41230)(0x7f4b8cec18b0)])                                                                                                    |
| Used Hint:                                                                                                                                                       |
| -------------------------------------                                                                                                                            |
|   /*+                                                                                                                                                            |
|                                                                                                                                                                  |
|       LEADING(("a" "b"))                                                                                                                                         |
|       USE_NL("b")                                                                                                                                                |
|   */                                                                                                                                                             |
| Qb name trace:                                                                                                                                                   |
| -------------------------------------                                                                                                                            |
|   stmt_id:0, stmt_type:T_EXPLAIN                                                                                                                                 |
|   stmt_id:1, SEL$1                                                                                                                                               |
| Outline Data:                                                                                                                                                    |
| -------------------------------------                                                                                                                            |
|   /*+                                                                                                                                                            |
|       BEGIN_OUTLINE_DATA                                                                                                                                         |
|       LEADING(@"SEL$1" ("test"."a"@"SEL$1" "test"."b"@"SEL$1"))                                                                                                  |
|       USE_NL(@"SEL$1" "test"."b"@"SEL$1")                                                                                                                        |
|       FULL(@"SEL$1" "test"."a"@"SEL$1")                                                                                                                          |
|       INDEX(@"SEL$1" "test"."b"@"SEL$1" "id")                                                                                                                    |
|       USE_DAS(@"SEL$1" "test"."b"@"SEL$1")                                                                                                                       |
|       OPTIMIZER_FEATURES_ENABLE('4.0.0.0')                                                                                                                       |
|       END_OUTLINE_DATA                                                                                                                                           |
|   */                                                                                                                                                             |
| Optimization Info:                                                                                                                                               |
| -------------------------------------                                                                                                                            |
|   a:                                                                                                                                                             |
|       table_rows:3                                                                                                                                               |
|       physical_range_rows:3                                                                                                                                      |
|       logical_range_rows:3                                                                                                                                       |
|       index_back_rows:0                                                                                                                                          |
|       output_rows:3                                                                                                                                              |
|       table_dop:1                                                                                                                                                |
|       dop_method:Table DOP                                                                                                                                       |
|       avaiable_index_name:[a]                                                                                                                                    |
|       stats version:0                                                                                                                                            |
|       dynamic sampling level:1                                                                                                                                   |
|   b:                                                                                                                                                             |
|       table_rows:3                                                                                                                                               |
|       physical_range_rows:1                                                                                                                                      |
|       logical_range_rows:1                                                                                                                                       |
|       index_back_rows:1                                                                                                                                          |
|       output_rows:0                                                                                                                                              |
|       table_dop:1                                                                                                                                                |
|       dop_method:DAS DOP                                                                                                                                         |
|       avaiable_index_name:[id, b]                                                                                                                                |
|       unstable_index_name:[b]                                                                                                                                    |
|       stats version:0                                                                                                                                            |
|       dynamic sampling level:1                                                                                                                                   |
|   Plan Type:                                                                                                                                                     |
|       LOCAL                                                                                                                                                      |
|   Note:                                                                                                                                                          |
|       Degree of Parallelisim is 1 because of table property                                                                                                      |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------+
72 rows in set (0.004 sec)

我们这边看了下,不能走 IndexJoin 的原因是现在优化器会把所有的等值条件用做 Join Key,也包含这里的 substr(...),但是又因为 IndexJoin 不支持 substr(...),所以如果把 substr(...) 放到 Join Key 里面,就没法生成 IndexJoin,而只能走 HashJoin 了。
实际上一个更好的处理方式是把 substr(...) 不放到 Join Key 里面,放到 Other Key 里面,比如如下图这样,这样就能用上 IndexJoin 了:

我们这边记录了一个 issue,我们会在下一个 LTS v7.5 之前修复这个问题:https://github.com/pingcap/tidb/issues/47233 @人如其名

4 个赞

IndexJoin 不支持 substr(...) ,所以如果把 substr(...) 放到 Join Key 里面,就没法生成 IndexJoin,而只能走 HashJoin 了
–这个知识点是从哪里获取的呢?

简单来说这是从他脑子里获取的。这位大佬是tidb优化器核心设计和实现者。

:exploding_head: 好吧好吧。还想着去深入学习一下呢,目前可没那么多精力看源码

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。