一条 like 条件的慢 SQL 语句优化

【是否原创】是
【首发渠道】TiDB 社区

问题背景

遇到下面这条慢 sql,查询计划走了高效的唯一索引,满足条件的行数只有 10 行,但是耗时却达到了 10s。

mysql> select name, uid, size, etag from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name like 'dir_abc4999/%'  order by name asc limit 501;                                 
+----------------------------------+--------+-------+----------------------------------+
| name                             | uid    | size  | etag                             |
+----------------------------------+--------+-------+----------------------------------+
| dir_abc4999/obj_20211027_1_0.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_1.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_2.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_3.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_4.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_5.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_6.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_7.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_8.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_9.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
+----------------------------------+--------+-------+----------------------------------+
10 rows in set (10.08 sec)

表有一个唯一性的联合索引,可以完好地覆盖查询条件列 bucket_id 和 name,从表结构和 sql 语句第一印象是这条 sql 应该是非常快的,但是却出现了上面的结果。

mysql> show create table objects\G
*************************** 1. row ***************************
       Table: objects
Create Table: CREATE TABLE `objects` (
  `id` bigint(20) NOT NULL /*T![auto_rand] AUTO_RANDOM(5) */,
  `bucket_id` varchar(192) NOT NULL,
  `name` varchar(512) NOT NULL,
  `version_id` varchar(64) NOT NULL,
  ...
  ...
  PRIMARY KEY (`id`),
  UNIQUE KEY `idx` (`bucket_id`,`name`,`version_id`),
  KEY `cluster_id` (`cluster_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin /*T![auto_rand_base] AUTO_RANDOM_BASE=2313570001 */
1 row in set (0.00 sec)

问题排查

看查询计划好像没有太大问题,走了联合索引做范围查找,最后回表查询返回结果,按道理这个执行计划非常好,那问题在哪里呢。

mysql> explain select name, uid, size, etag from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name like 'dir_abc4999/%'  order by name asc limit 501\G
*************************** 1. row ***************************
           id: Projection_7
      estRows: 501.00
         task: root
access object: 
operator info: ogw.objects.name, ogw.objects.uid, ogw.objects.size, ogw.objects.etag
*************************** 2. row ***************************
           id: └─Projection_35
      estRows: 501.00
         task: root
access object: 
operator info: ogw.objects.bucket_id, ogw.objects.name, ogw.objects.uid, ogw.objects.size, ogw.objects.etag
*************************** 3. row ***************************
           id:   └─IndexLookUp_34
      estRows: 501.00
         task: root
access object: 
operator info: limit embedded(offset:0, count:501)
*************************** 4. row ***************************
           id:     ├─Limit_33(Build)
      estRows: 501.00
         task: cop[tikv]
access object: 
operator info: offset:0, count:501
*************************** 5. row ***************************
           id:     │ └─Selection_32
      estRows: 501.00
         task: cop[tikv]
access object: 
operator info: like(ogw.objects.name, "dir_abc4999/%", 92)
*************************** 6. row ***************************
           id:     │   └─IndexRangeScan_30
      estRows: 501.00
         task: cop[tikv]
access object: table:objects, index:idx(bucket_id, name, version_id)
operator info: range:(".bucket.meta.cmu-bucket01" "dir",".bucket.meta.cmu-bucket01" "dis"), keep order:true
*************************** 7. row ***************************
           id:     └─TableRowIDScan_31(Probe)
      estRows: 501.00
         task: cop[tikv]
access object: table:objects
operator info: keep order:false, stats:pseudo
7 rows in set (0.00 sec)

唯一奇怪的在于上面执行计划中的第 6 行 IndexRangeScan_30 中,like 算子做范围查询的范围,计划里选择的是 [“dir”, “dis”],按道理应该是 [“dir_abc4999/”, “dir_abc49990”] 才对呀(因为 ‘/’ 下一个字符是 ‘0’),满足 [“dir”, “dis”] 范围的行数竟然达到 2 亿+ 条,难怪会这么慢。

mysql> select count(1) from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name >= 'dir' and name < 'dis';
+-----------+
| count(1)  |
+-----------+
| 233596134 |
+-----------+
1 row in set (7.21 sec)

从这里不免怀疑是 tidb 执行计划这里出了问题,错误的选择了 index 的 range 范围,导致扫描了不必要的 keys,难道是 tidb 执行计划的 bug 问题?

改写 sql,将出问题的 sql 改写为它的等价 sql,耗时 0.00 秒!!

mysql> select name, uid, size, etag from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name >= 'dir_abc4999/' and name <=
'dir_abc49990'  order by name asc limit 501;  
+----------------------------------+--------+-------+----------------------------------+
| name                             | uid    | size  | etag                             |
+----------------------------------+--------+-------+----------------------------------+
| dir_abc4999/obj_20211027_1_0.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_1.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_2.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_3.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_4.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_5.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_6.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_7.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_8.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_9.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
+----------------------------------+--------+-------+----------------------------------+
10 rows in set (0.00 sec)

查看改写后 sql 的执行计划,执行计划相同,但是这里联合索引 name 列的范围变为了 [“dir_abc4999/”, “dir_abc49990”],这里扫描的 keys 数量紧紧为 10 个。

mysql> explain select name, uid, size, etag from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name >= 'dir_abc4999/' and
name <= 'dir_abc49990'  order by name asc limit 501\G
*************************** 1. row ***************************
           id: Projection_7
      estRows: 1.00
         task: root
access object: 
operator info: ogw.objects.name, ogw.objects.uid, ogw.objects.size, ogw.objects.etag
*************************** 2. row ***************************
           id: └─Projection_32
      estRows: 1.00
         task: root
access object: 
operator info: ogw.objects.bucket_id, ogw.objects.name, ogw.objects.uid, ogw.objects.size, ogw.objects.etag
*************************** 3. row ***************************
           id:   └─IndexLookUp_31
      estRows: 1.00
         task: root
access object: 
operator info: limit embedded(offset:0, count:501)
*************************** 4. row ***************************
           id:     ├─Limit_30(Build)
      estRows: 1.00
         task: cop[tikv]
access object: 
operator info: offset:0, count:501
*************************** 5. row ***************************
           id:     │ └─IndexRangeScan_28
      estRows: 1.00
         task: cop[tikv]
access object: table:objects, index:idx(bucket_id, name, version_id)
operator info: range:[".bucket.meta.cmu-bucket01" "dir_abc4999/",".bucket.meta.cmu-bucket01" "dir_abc49990"], keep order:true
*************************** 6. row ***************************
           id:     └─TableRowIDScan_29(Probe)
      estRows: 1.00
         task: cop[tikv]
access object: table:objects
operator info: keep order:false, stats:pseudo
6 rows in set (0.00 sec)

看到这里似乎找到了问题的原因,是 tidb 的 like 算子不准,导致执行计划扫描过多的数据,tidb 在这里存在 bug,通过改写这个 like sql 到后面那个来规避这个问题吧!!!!

真实原因

关于 sql 的 like 语句,’_’ 符号竟然是个通配符!!!!

With LIKE you can use the following two wildcard characters in the pattern:

  • % matches any number of characters, even zero characters.
  • _ matches exactly one character.

mysql 官方文档关于 like 语法的描述:链接

也就是说 name like ‘dir_abc4999/%’ 的条件中,’_‘ 可以为任何字符,这样的话匹配的前缀只有 “dir",tidb 选择的 range 范围恰好是正确的!!!

这里业务错误的拿 ’_‘ 符号来进行精确匹配,出现了误用 like 语法中特殊字符的问题。

这里将 ’_’ 进行转义,问题同样也就可以解决了:

mysql> select name, uid, size, etag from objects where bucket_id = '.bucket.meta.cmu-bucket01' and name like 'dir\_abc4999/%'  order by name asc limit 501;
+----------------------------------+--------+-------+----------------------------------+
| name                             | uid    | size  | etag                             |
+----------------------------------+--------+-------+----------------------------------+
| dir_abc4999/obj_20211027_1_0.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_1.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_2.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_3.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_4.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_5.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_6.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_7.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_8.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
| dir_abc4999/obj_20211027_1_9.txt | test01 | 61441 | 79702d022dafea0173dc9bcbce1b562e |
+----------------------------------+--------+-------+----------------------------------+
10 rows in set (0.00 sec)
1赞