【SOP 系列 23】创建了SPM,但是程序执行的SQL没有按照强制走索引

【是否原创】是
【首发渠道】TiDB 社区
【正文】

1、环境

TiDB版本:v5.0.4
本文来源于案例:https://asktug.com/t/topic/243043/26

2、场景

程序运行过程中,TiDB大量慢SQL,查看dashboard的慢SQL执行计划,发现走的错误索引,将该慢SQL复制出来到客户端运行正常,查询确实走了SPM。

总结:使用SPM bingding了SQL,但是出现了大量的SQL,查看dashboard的慢SQL执行计划,发现没有走SPM,但是从dashboard复制出现的SQL到客户端执行,又是可以走SMP的。

3、排查分析

3.1、基础信息

表Schema:

CREATE TABLE `xxxx_extra` (
  `id` varchar(50) NOT NULL COMMENT '主键',
  `merchant_id` varchar(50) DEFAULT NULL,
  `shop_id` varchar(50) DEFAULT NULL,
  `order_code` varchar(50) DEFAULT NULL,
  `field_name` varchar(50) DEFAULT NULL,
  `field_value` varchar(4000) DEFAULT NULL,
  `create_time` datetime DEFAULT NULL,
  PRIMARY KEY (`id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_order_extra_2` (`order_code`),
  KEY `idx_order_extra_1` (`shop_id`,`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

dashboard慢SQL的执行计划(执行计划1)

SELECT
        t.order_code,
        t.coin_num 
FROM
        (
SELECT
        order_code,
        ( sum( CASE field_name WHEN 'sale_quantity' THEN field_value ELSE 0 END ) - sum( CASE field_name WHEN 'give_quantity' THEN field_value ELSE 0 END ) ) coin_num 
FROM
        XXXXXTB 
WHERE
        shop_id = 'eb14a985-f788-48b2-a73b-738e08ef158e' 
        AND create_time >= "2021-10-04 00:00:00" 
        AND create_time < "2021-10-05 00:00:00" GROUP BY order_code ) t WHERE t.coin_num > 0;
        
        
        
        id                                       task             estRows         operator info                                                                                                                                                                                                                                                                                                                                                                         actRows           execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                memory          disk
        Projection_12                            root             1               XXXXXDB.XXXXXTB.order_code, minus(Column#9, Column#10)->Column#11                                                                                                                                                                                                                                                                                                        1                 time:20m17.4s, loops:2, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       1.34 KB         N/A
        └─Limit_15                               root             1               offset:0, count:1                                                                                                                                                                                                                                                                                                                                                                     1                 time:20m17.4s, loops:2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        N/A             N/A
          └─Selection_16                         root             1               gt(minus(Column#9, Column#10), 0)                                                                                                                                                                                                                                                                                                                                                     1                 time:20m17.4s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        41.4 KB         N/A
            └─StreamAgg_21                       root             1               group by:Column#26, funcs:sum(Column#23)->Column#9, funcs:sum(Column#24)->Column#10, funcs:firstrow(Column#25)->XXXXXDB.XXXXXTB.order_code                                                                                                                                                                                                                               771               time:20m17.4s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        84.0 KB         N/A
              └─Projection_41                    root             1.25            cast(case(eq(XXXXXDB.XXXXXTB.field_name, sale_quantity), XXXXXDB.XXXXXTB.field_value, 0), double BINARY)->Column#23, cast(case(eq(XXXXXDB.XXXXXTB.field_name, give_quantity), XXXXXDB.XXXXXTB.field_value, 0), double BINARY)->Column#24, XXXXXDB.XXXXXTB.order_code, XXXXXDB.XXXXXTB.order_code        18196             time:20m17.4s, loops:19, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      302.6 KB        N/A
                └─Projection_40                  root             1.25            XXXXXDB.XXXXXTB.shop_id, XXXXXDB.XXXXXTB.order_code, XXXXXDB.XXXXXTB.field_name, XXXXXDB.XXXXXTB.field_value, XXXXXDB.XXXXXTB.create_time                                                                                                                                                                            18196             time:20m17.4s, loops:19, Concurrency:OFF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      819.2 KB        N/A
                  └─IndexLookUp_39               root             1.25                                                                                                                                                                                                                                                                                                                                                                                                  18196             time:20m17.4s, loops:19, index_task: {total_time: 20m17.1s, fetch_handle: 2m54.6s, build: 8m55.9s, wait: 8m26.6s}, table_task: {total_time: 1h41m25.7s, num: 56723, concurrency: 5}                                                                                                                                                                                                                                                                                                                                                                                                                                           265.8 MB        N/A
                    ├─IndexFullScan_36           cop[tikv]        17452.40        table:XXXXXTB, index:idx_XXXXXTB_2(order_code), keep order:true                                                                                                                                                                                                                                                                                                               1161519342        time:1.87s, loops:1134751, cop_task: {num: 1125, max: 5.4s, min: 643.1μs, avg: 851.6ms, p95: 1.48s, max_proc_keys: 1448429, p95_proc_keys: 1334907, tot_proc: 13m0s, tot_wait: 297ms, rpc_num: 1133, rpc_time: 15m58s, copr_cache_hit_ratio: 0.00}, backoff{regionMiss: 10ms}, tikv_task:{proc max:2.48s, min:163ms, p80:757ms, p95:1.04s, iters:1139217, tasks:1125}, scan_detail: {total_process_keys: 1160620563, total_keys: 1160623468, rocksdb: {delete_skipped_count: 0, key_skipped_count: 1178721014, block: {cache_hit_count: 904535, read_count: 136, read_byte: 851.0 KB}}}                                      N/A             N/A
                    └─Selection_38               cop[tikv]        1.25            eq(XXXXXDB.XXXXXTB.shop_id, "eb14a985-f788-48b2-a73b-738e08ef158e"), ge(XXXXXDB.XXXXXTB.create_time, 2021-10-04 00:00:00.000000), lt(XXXXXDB.XXXXXTB.create_time, 2021-10-05 00:00:00.000000)                                                                                                                                                  18196             time:55m47.9s, loops:56963, cop_task: {num: 6648294, max: 4.33s, min: 219.3μs, avg: 2.99ms, p95: 9.1ms, max_proc_keys: 21022, p95_proc_keys: 63, tot_proc: 1h5m30.9s, tot_wait: 25m6.9s, rpc_num: 6648390, rpc_time: 5h29m59.6s, copr_cache_hit_ratio: 0.00}, backoff{regionMiss: 54ms, staleCommand: 2ms}, tikv_task:{proc max:386ms, min:0s, p80:1ms, p95:1ms, iters:9755705, tasks:6648294}, scan_detail: {total_process_keys: 1160441875, total_keys: 1185259460, rocksdb: {delete_skipped_count: 0, key_skipped_count: 1141620284, block: {cache_hit_count: 468441273, read_count: 21828, read_byte: 394.1 MB}}}        N/A             N/A
                      └─TableRowIDScan_37        cop[tikv]        17452.40        table:XXXXXTB, keep order:false                                                                                                                                                                                                                                                                                                                                                   1161519342        tikv_task:{proc max:384ms, min:0s, p80:1ms, p95:1ms, iters:9755705, tasks:6648294}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}                                                                                                                                                                                                                                                                                                                                                             N/A             N/A

把dashboard上的慢SQL拷贝到客户端,手动执行SQL,对应执行计划(执行计划2)

EXPLAIN ANALYZE SELECT
        t.order_code,
        t.coin_num 
FROM
        (
SELECT
        order_code,
        ( sum( CASE field_name WHEN 'sale_quantity' THEN field_value ELSE 0 END ) - sum( CASE field_name WHEN 'give_quantity' THEN field_value ELSE 0 END ) ) coin_num 
FROM
        XXXXXTB 
WHERE
        shop_id = 'eb14a985-f788-48b2-a73b-738e08ef158e' 
        AND create_time >= "2021-10-04 00:00:00" 
        AND create_time < "2021-10-05 00:00:00" GROUP BY order_code ) t WHERE t.coin_num > 0;
        
        
Projection_9        53242.86        218        root                time:85ms, loops:2, Concurrency:5        XXXXXDB.XXXXXTB.order_code, minus(Column#9, Column#10)->Column#11        21.3 KB        N/A
└─Selection_10        53242.86        218        root                time:84.6ms, loops:2        gt(minus(Column#9, Column#10), 0)        9.54 KB        N/A
  └─HashAgg_16        66553.58        771        root                time:84ms, loops:7, partial_worker:{wall_time:83.484628ms, concurrency:5, task_num:1, tot_wait:406.721842ms, tot_exec:1.878203ms, tot_time:408.795545ms, max:83.418615ms, p95:83.418615ms}, final_worker:{wall_time:84.332323ms, concurrency:5, task_num:5, tot_wait:417.21916ms, tot_exec:3.550493ms, tot_time:420.781142ms, max:84.297101ms, p95:84.297101ms}        group by:XXXXXDB.XXXXXTB.order_code, funcs:sum(Column#14)->Column#9, funcs:sum(Column#15)->Column#10, funcs:firstrow(XXXXXDB.XXXXXTB.order_code)->XXXXXDB.XXXXXTB.order_code        341.2 KB        N/A
    └─IndexLookUp_17        66553.58        776        root                time:81.3ms, loops:2, index_task: {total_time: 24ms, fetch_handle: 24ms, build: 4.63μs, wait: 27.1μs}, table_task: {total_time: 275.9ms, num: 5, concurrency: 5}                480.6 KB        N/A
      ├─IndexRangeScan_14(Build)        83191.98        18196        cop[tikv]        table:XXXXXTB, index:idx_XXXXXTB_1(shop_id, create_time)        time:22.2ms, loops:20, cop_task: {num: 1, max: 22.1ms, proc_keys: 18196, tot_proc: 20ms, rpc_num: 1, rpc_time: 22.1ms, copr_cache_hit_ratio: 0.00}, tikv_task:{time:20ms, loops:22}, scan_detail: {total_process_keys: 18196, total_keys: 18197, rocksdb: {delete_skipped_count: 0, key_skipped_count: 18196, block: {cache_hit_count: 23, read_count: 1, read_byte: 11.4 KB}}}        range:["eb14a985-f788-48b2-a73b-738e08ef158e" 2021-10-04 00:00:00,"eb14a985-f788-48b2-a73b-738e08ef158e" 2021-10-05 00:00:00), keep order:false        N/A        N/A
      └─HashAgg_12(Probe)        66553.58        776        cop[tikv]                time:113.2ms, loops:10, cop_task: {num: 56, max: 39.9ms, min: 1.34ms, avg: 9.37ms, p95: 23ms, max_proc_keys: 2072, p95_proc_keys: 909, tot_proc: 419ms, tot_wait: 16ms, rpc_num: 56, rpc_time: 522.9ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:36ms, min:0s, p80:14ms, p95:20ms, iters:58, tasks:56}, scan_detail: {total_process_keys: 18196, total_keys: 21419, rocksdb: {delete_skipped_count: 0, key_skipped_count: 9232, block: {cache_hit_count: 113866, read_count: 32, read_byte: 545.0 KB}}}        group by:XXXXXDB.XXXXXTB.order_code, funcs:sum(case(eq(XXXXXDB.XXXXXTB.field_name, "sale_quantity"), XXXXXDB.XXXXXTB.field_value, "0"))->Column#14, funcs:sum(case(eq(XXXXXDB.XXXXXTB.field_name, "give_quantity"), XXXXXDB.XXXXXTB.field_value, "0"))->Column#15        N/A        N/A
        └─TableRowIDScan_15        83191.98        18196        cop[tikv]        table:XXXXXTB        tikv_task:{proc max:34ms, min:0s, p80:13ms, p95:19ms, iters:58, tasks:56}, scan_detail: {total_process_keys: 0, total_keys: 0, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 0, read_count: 0, read_byte: 0 Bytes}}}        keep order:false        N/A        N/A

3.2、可能性分析

分析有两个可能性:

  • 1、用户是否设置 sql_select_limit ?如果设置了这个,会导致 SPM 不生效,这个是一个已知问题,目前已经在解决中了,issue见 https://github.com/pingcap/tidb/issues/27949
  • 2、用户运行 query 和 binding 的字符集是不是一致?如果不一致也可能导致 binding 匹配不上;
  • 3、从上面执行计划1与执行计划2对比来看,执行计划里面多了一个Limit_15的算子,说明由程序发过来的SQL中包含有limit的算子,而手动在客户端执行的SQL没有limit算子,所以猜想是由于程序端可能设置session变量sql_select_limit,或者是框架中初始化启动的时候设置了该参数。

3.3、给用户的排查建议

  • 1、使用抓包工具,开启抓包,然后重启一个应用程序,停止抓包,检索是否包含sql_select_limit的关键字。
  • 2、通过使用他们的驱动,写一个简单的程序,执行SQL:show variable like ‘%select_limit%’ ,查看返回的值是否不是默认值:。
  • 3、先binding一个简单的SQL,比如:select * from table where a=1 类似的SQL,然后使用连接数据库的驱动,比如jdbc等,写一个简单的程序,看是否可以命中前面的binding,如果不能命中,那说明不是本次案例复杂SQL的问题,因为需要排除SQL是否存在回车、换行等差异导致没有命中binding。

3.4、实锤

通过抓包工具的排查手段,的确找到相关关键字:
QQ%E6%88%AA%E5%9B%BE20211126231836
基础分析来源,最终在kettle自带的jdbc中找到配置项:
QQ%E6%88%AA%E5%9B%BE20211126234150

3.5、根因

  • 1、存在一个已知的BUG:设置了session的sql_select_limit变量,在发送的SQL就会相当于加了limit,导致即使binding了,但是不会执行binding。
  • 2、用户虽然没有在代码中设置session的sql_select_limit变量,但是在kettle自带的jdbc中设置了这个配置项,触发了上面的BUG。

4、解决方案

  • 1、用户通过Optimizer Hints方式修改程序代码中的 SQL语句,临时解决;
  • 2、找到问题根因是由于kettle自带的jdbc设置了sql_select_limit参数导致,去掉框架启动参数;
  • 3、等PR出来后升级版本解决这个BUG;issue 链接:https://github.com/pingcap/tidb/issues/27949
9 个赞