The same SQL gets a different execution plan in the production service than in a manually run MySQL client

【 TiDB Environment】Production
【 TiDB Version】
【Reproduction Path】

  1. The table information is as follows:
-- Row count: ~7.6 million
CREATE TABLE `mk_wx_all_clue` (
  `_id` varchar(72) NOT NULL,
  `company_id` varchar(24) NOT NULL,
  `country` varchar(24) DEFAULT NULL,
  `platform` varchar(8) DEFAULT NULL,
  `companyId` varchar(24) DEFAULT NULL,
  `secondScene` varchar(2) DEFAULT NULL,
  ... 
  PRIMARY KEY (`_id`,`company_id`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_company_id` (`company_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

-- Row count: ~9.5 million
CREATE TABLE `mk_wx_all_clue_userIds` (
  `_id` varchar(72) NOT NULL,
  `company_id` varchar(24) NOT NULL,
  `userIds_idx` bigint(20) NOT NULL,
  ...
  PRIMARY KEY (`_id`,`company_id`,`userIds_idx`) /*T![clustered_index] NONCLUSTERED */,
  KEY `idx_company_id` (`company_id`),
  KEY `cid_uid` (`company_id`,`userIds`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin;

The SQL being executed:

explain analyze
SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE (d.company_id = 'W00000000001' and d.userIds in ('56', '951', '1239', '3009', '2158', '2693', '2256'))
  AND a.company_id = 'W00000000001'
  AND (a.clueStatus is null or a.clueStatus = 0 or a.clueStatus = 1)

【Problem Encountered: Symptoms and Impact】

The client is a golang application with the following connection parameters:

?charset=utf8mb4&parseTime=True&loc=PRC

The execution plan of the client's slow SQL, as captured from the Dashboard (screenshot omitted):

It took over 3 s, and the plan shows a full table scan.

Running the same SQL manually, the execution plan (screenshot omitted):

It takes only a few dozen milliseconds.

Question:

Why does the same SQL get different execution plans in different environments? What causes this?


By the way, the golang app uses prepared statements; every parameter is passed as a ? placeholder.
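To reproduce that code path from a SQL client, here is a minimal sketch (shortened to two userIds placeholders for brevity; @@last_plan_from_cache is a TiDB session variable that reports whether the last EXECUTE reused a cached plan):

PREPARE stmt FROM 'SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE d.company_id = ? AND d.userIds IN (?, ?)
  AND a.company_id = ?
  AND (a.clueStatus IS NULL OR a.clueStatus = 0 OR a.clueStatus = 1)';
SET @c = 'W00000000001', @u1 = '56', @u2 = '951';
EXECUTE stmt USING @c, @u1, @u2, @c;  -- first run: plan is generated (and possibly cached)
EXECUTE stmt USING @c, @u1, @u2, @c;  -- second run: may reuse the cached plan
SELECT @@last_plan_from_cache;        -- 1 means the plan came from the plan cache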

First run ANALYZE TABLE on the tables, then check the execution plan again.
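Concretely (table names taken from this thread):

ANALYZE TABLE mk_wx_all_clue;
ANALYZE TABLE mk_wx_all_clue_userIds;
-- then re-run the EXPLAIN ANALYZE above and compare the chosen operators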

For the plan above, please don't post a screenshot; paste the complete text instead.

Is the company_id value the same in both SQLs? The one you ran manually looks like it has multiple values. Check the tables' health score for problems, collect the statistics again, and retry.
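TiDB exposes the health score via SHOW STATS_HEALTHY, for example:

SHOW STATS_HEALTHY WHERE Table_name = 'mk_wx_all_clue';
SHOW STATS_HEALTHY WHERE Table_name = 'mk_wx_all_clue_userIds';
-- a Healthy value well below 100 indicates stale statistics; re-run ANALYZE TABLE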

I just tried it without prepared statements, and it ran very fast.
Could it be that prepared statements prevent TiDB from accurately choosing the optimal execution plan?
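One way to check whether the prepared plan cache is in play at all (the variable below exists since TiDB v6.1; older versions control this through the prepared-plan-cache section of the TiDB config file):

SELECT @@global.tidb_enable_prepared_plan_cache;
-- to rule the cache out cluster-wide (v6.1+ only, use with care):
SET GLOBAL tidb_enable_prepared_plan_cache = OFF;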

id estRows actRows task access object execution info operator info memory disk

HashAgg_11 1.00 1 root time:63.9ms, loops:2, partial_worker:{wall_time:63.868606ms, concurrency:5, task_num:1, tot_wait:317.978675ms, tot_exec:11.882µs, tot_time:317.994614ms, max:63.602127ms, p95:63.602127ms}, final_worker:{wall_time:0s, concurrency:5, task_num:1, tot_wait:318.060098ms, tot_exec:15.879µs, tot_time:318.078621ms, max:63.637674ms, p95:63.637674ms} funcs:count(1)->Column#49 4.21 KB N/A
└─IndexJoin_17 222.58 10 root time:63.7ms, loops:2, inner:{total:40.8ms, concurrency:5, task:1, construct:67.3µs, fetch:40.7ms, build:8.66µs}, probe:17.4µs inner join, inner:IndexLookUp_16, outer key:mongo_tx.mk_wx_all_clue_userids._id, inner key:mongo_tx.mk_wx_all_clue._id, equal cond:eq(mongo_tx.mk_wx_all_clue_userids._id, mongo_tx.mk_wx_all_clue._id) 115.8 KB N/A
  ├─IndexLookUp_31(Build) 222.58 48 root time:22.2ms, loops:3, index_task: {total_time: 8.9ms, fetch_handle: 8.9ms, build: 681ns, wait: 1.58µs}, table_task: {total_time: 13ms, num: 1, concurrency: 5} 6.41 KB N/A
  │ ├─IndexRangeScan_29(Build) 222.58 48 cop[tikv] table:d, index:cid_uid(company_id, userIds) time:8.9ms, loops:3, cop_task: {num: 2, max: 7.24ms, min: 3.03ms, avg: 5.14ms, p95: 7.24ms, max_proc_keys: 39, p95_proc_keys: 39, rpc_num: 2, rpc_time: 10.3ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:0s, min:0s, p80:0s, p95:0s, iters:3, tasks:2}, scan_detail: {total_process_keys: 48, total_process_keys_size: 4789, total_keys: 55, rocksdb: {delete_skipped_count: 0, key_skipped_count: 48, block: {cache_hit_count: 93, read_count: 0, read_byte: 0 Bytes}}} range:["W00000000001" "1239","W00000000001" "1239"], ["W00000000001" "2158","W00000000001" "2158"], ["W00000000001" "2256","W00000000001" "2256"], ["W00000000001" "2693","W00000000001" "2693"], ["W00000000001" "3009","W00000000001" "3009"], ["W00000000001" "56","W00000000001" "56"], ["W00000000001" "951","W00000000001" "951"], keep order:false N/A N/A
  │ └─TableRowIDScan_30(Probe) 222.58 48 cop[tikv] table:d time:12.9ms, loops:2, cop_task: {num: 14, max: 6.68ms, min: 434.1µs, avg: 1.68ms, p95: 6.68ms, max_proc_keys: 7, p95_proc_keys: 7, tot_proc: 2ms, rpc_num: 14, rpc_time: 23.3ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:14, tasks:14}, scan_detail: {total_process_keys: 48, total_process_keys_size: 5509, total_keys: 50, rocksdb: {delete_skipped_count: 0, key_skipped_count: 4, block: {cache_hit_count: 478, read_count: 0, read_byte: 0 Bytes}}} keep order:false N/A N/A
  └─IndexLookUp_16(Probe) 1.00 10 root time:40.6ms, loops:2, index_task: {total_time: 17.3ms, fetch_handle: 17.3ms, build: 662ns, wait: 1.85µs}, table_task: {total_time: 23.2ms, num: 1, concurrency: 5} 10.5 KB N/A
    ├─IndexRangeScan_13(Build) 1.00 40 cop[tikv] table:a, index:PRIMARY(_id, company_id) time:17.3ms, loops:3, cop_task: {num: 11, max: 7.22ms, min: 497.9µs, avg: 3ms, p95: 7.22ms, max_proc_keys: 6, p95_proc_keys: 6, tot_proc: 3ms, tot_wait: 2ms, rpc_num: 11, rpc_time: 32.9ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:1ms, p95:1ms, iters:11, tasks:11}, scan_detail: {total_process_keys: 40, total_process_keys_size: 6720, total_keys: 80, rocksdb: {delete_skipped_count: 0, key_skipped_count: 40, block: {cache_hit_count: 548, read_count: 2, read_byte: 88.0 KB}}} range: decided by [eq(mongo_tx.mk_wx_all_clue._id, mongo_tx.mk_wx_all_clue_userids._id) eq(mongo_tx.mk_wx_all_clue.company_id, W00000000001)], keep order:false N/A N/A
    └─Selection_15(Probe) 1.00 10 cop[tikv] time:23.1ms, loops:2, cop_task: {num: 18, max: 5.89ms, min: 556.4µs, avg: 2.43ms, p95: 5.89ms, max_proc_keys: 7, p95_proc_keys: 7, tot_proc: 3ms, tot_wait: 3ms, rpc_num: 18, rpc_time: 43.6ms, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:18, tasks:18}, scan_detail: {total_process_keys: 40, total_process_keys_size: 19774, total_keys: 40, rocksdb: {delete_skipped_count: 0, key_skipped_count: 0, block: {cache_hit_count: 635, read_count: 0, read_byte: 0 Bytes}}} or(isnull(mongo_tx.mk_wx_all_clue.cluestatus), or(eq(mongo_tx.mk_wx_all_clue.cluestatus, 0), eq(mongo_tx.mk_wx_all_clue.cluestatus, 1))) N/A N/A
      └─TableRowIDScan_14 1.00 40 cop[tikv] table:a tikv_task:{proc max:1ms, min:0s, p80:0s, p95:1ms, iters:18, tasks:18} keep order:false N/A

Yes, it's the same. It's tenant-level; there are tens of thousands of tenants.

SELECT * FROM INFORMATION_SCHEMA.STATEMENTS_SUMMARY
Check how many execution plans this SQL has, and also check whether any plan binding is in effect.
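A more targeted version of that check might look like this (the LIKE filter is illustrative, and the available columns vary somewhat across TiDB versions):

SELECT DIGEST_TEXT, PLAN_DIGEST, EXEC_COUNT, SUM_LATENCY
FROM INFORMATION_SCHEMA.STATEMENTS_SUMMARY
WHERE DIGEST_TEXT LIKE '%mk_wx_all_clue_userids%';
-- one DIGEST_TEXT mapped to several PLAN_DIGEST values means the plan is flapping
SHOW GLOBAL BINDINGS;  -- check whether any plan binding exists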

Look here: a.company_id = d.company_id overlaps with d.company_id = 'W00000000001' and a.company_id = 'W00000000001'. Writing the condition twice may trip up the optimizer (this relates to the execution plan cache). Try writing it only once. Also, as your own test showed, just stop using the prepare interface for this SQL.
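For example, one possible de-duplicated rewrite (same result set, since the join condition already propagates the company_id equality to a):

SELECT count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE d.company_id = 'W00000000001'
  AND d.userIds IN ('56', '951', '1239', '3009', '2158', '2693', '2256')
  AND (a.clueStatus IS NULL OR a.clueStatus = 0 OR a.clueStatus = 1);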


Try running ANALYZE on the same table from every tidb node.

Collect the statistics.

Have you collected the statistics yet?


What we need is the execution plan of the other, slow SQL. This one shouldn't be slow, because it already reaches an IndexRangeScan, which means both of your filter conditions, company_id and userIds, were used.
But in your slow SQL above, the cid_uid index went through an IndexFullScan, which shouldn't happen; most likely the userIds condition wasn't applied.

I suspect two things:
1. Your tidb-server holds a Prepared Plan Cache entry for this SQL; the slow SQL from your application hit the plan cache and went straight to the bad plan that was generated the first time.
2. In
a.company_id = d.company_id and a._id = d._id
WHERE (d.company_id = 'W00000000001' and d.userIds in ('56', '951', '1239', '3009', '2158', '2693', '2256'))
AND a.company_id = 'W00000000001'
the company_id condition appears twice: once as (d.company_id = 'W00000000001' and d.userIds in (...)), and again as a.company_id = d.company_id plus a.company_id = 'W00000000001'. Sometimes the plan evaluates a.company_id = d.company_id plus a.company_id = 'W00000000001' first, so it cannot reach the IndexRangeScan on the cid_uid index and falls back to an IndexFullScan.
I suggest removing the AND a.company_id = 'W00000000001' condition.
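If the prepare interface has to stay, another option is TiDB's IGNORE_PLAN_CACHE() hint, which makes the optimizer skip the plan cache for this statement only, e.g.:

SELECT /*+ IGNORE_PLAN_CACHE() */ count(1)
FROM mk_wx_all_clue a
         inner join mk_wx_all_clue_userIds d FORCE INDEX (cid_uid) on a.company_id = d.company_id and a._id = d._id
WHERE d.company_id = 'W00000000001'
  AND d.userIds IN ('56', '951', '1239', '3009', '2158', '2693', '2256')
  AND (a.clueStatus IS NULL OR a.clueStatus = 0 OR a.clueStatus = 1);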

TiDB executes operators in a distributed fashion, while MySQL executes on a single machine, so differences are normal.

Makes sense.