关联子查询优化

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

  • 【TiDB 版本】:v3.0
  • 【问题描述】:现存在一个业务场景:对同一张表(量级一年有几十亿条记录),有一个找成对投入产出记录业务场景,先找产出,加了索引找产出大约有50万+条记录,很快就能找出,找投入需要以产出记录做条件查找,按照产出结果条件查找,一条产出可对应多条投入,只需要找投入时间小于产出时间,投入时间最大的一条记录,这个时候发现很慢,到了分钟级别,咨询下针对这种场景,是否有更好的优化方法。

若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出打印结果,请务必全选并复制粘贴上传。

麻烦上传一下完整的 SQL 语句、EXPLAIN ANALYZE 结果、以及表结构,我们看下

您好,信息如下: 查询sql: select a.factory_code, a.project_name, a.site_name, a.fsn, max(a.op_time) from workstation_pass_info a, (select factory_code, project_name site_name, fsn, max(op_time) op_time from workstation_pass_info where op_time >= ‘2019-12-26’ and op_time < ‘2019-12-27’ and factory_code = ‘101077’ and site_name = ‘SMT_SNO_OUTPUT’ and times >= 0 group by factory_code, project_name, site_name, fsn) b where a.fsn = b.fsn and a.op_time < b.op_time and a.site_name = ‘SMT_SNO_INPUT’ and a.times >= 0 group by a.factory_code, a.project_name, a.site_name, a.fsn;

explain结果: | id | count | task | operator info | ±-----------------------------------±------------±-----±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | Projection_12 | 74733.95 | root | supply_factory.a.factory_code, supply_factory.a.project_name, supply_factory.a.site_name, supply_factory.a.fsn, 8_col_0 | | └─HashAgg_15 | 74733.95 | root | group by:supply_factory.a.factory_code, supply_factory.a.fsn, supply_factory.a.project_name, supply_factory.a.site_name, funcs:max(supply_factory.a.op_time), firstrow(supply_factory.a.factory_code), firstrow(supply_factory.a.project_name), firstrow(supply_factory.a.fsn), firstrow(supply_factory.a.site_name) | | └─IndexJoin_21 | 74733.95 | root | inner join, inner:IndexLookUp_20, outer key:supply_factory.workstation_pass_info.fsn, inner key:supply_factory.a.fsn, other cond:lt(supply_factory.a.op_time, op_time) | | ├─Selection_24 | 1797.33 | root | not(isnull(4_col_0)) | | │ └─HashAgg_47 | 2246.67 | root | group by:col_2, col_3, col_4, col_5, funcs:max(col_0), firstrow(col_1) | | │ └─IndexLookUp_48 | 2246.67 | root | | | │ ├─Selection_45 | 97218.69 | cop | eq(supply_factory.workstation_pass_info.factory_code, “101077”), eq(supply_factory.workstation_pass_info.site_name, “SMT_SNO_OUTPUT”) | | │ │ └─IndexScan_43 | 19433134.97 | cop | table:workstation_pass_info, index:op_time, factory_code, project_name, site_name, range:[2019-12-26 00:00:00,2019-12-27 00:00:00), keep order:false | | │ └─HashAgg_26 | 2246.67 | cop | group by:supply_factory.workstation_pass_info.factory_code, supply_factory.workstation_pass_info.fsn, supply_factory.workstation_pass_info.project_name, supply_factory.workstation_pass_info.site_name, funcs:max(supply_factory.workstation_pass_info.op_time), firstrow(supply_factory.workstation_pass_info.fsn) | | │ └─Selection_46 | 93417.44 | cop | ge(supply_factory.workstation_pass_info.times, 0), not(isnull(supply_factory.workstation_pass_info.fsn)) | | │ └─TableScan_44 | 97218.69 | cop | table:workstation_pass_info, keep order:false, stats:pseudo | | └─IndexLookUp_20 | 0.91 | root | | | ├─Selection_18 | 44.85 | cop | not(isnull(supply_factory.a.fsn)) | | │ └─IndexScan_16 | 44.85 | cop | table:a, index:fsn, range: decided by [eq(supply_factory.a.fsn, supply_factory.workstation_pass_info.fsn)], keep order:false | | └─Selection_19 | 0.91 | cop | eq(supply_factory.a.site_name, “SMT_SNO_INPUT”), ge(supply_factory.a.times, 0), not(isnull(supply_factory.a.op_time)) | | └─TableScan_17 | 44.85 | cop | table:workstation_pass_info, keep order:false, stats:pseudo | ±-----------------------------------±------------±-----±---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

表结构: ----------------------±-------------±-----±-----±------------------±------+ | Field | Type | Null | Key | Default | Extra | ±---------------------±-------------±-----±-----±------------------±------+ | id | bigint(20) | NO | PRI | NULL | | | work_order | varchar(50) | YES | | NULL | | | line_id | varchar(30) | YES | | NULL | | | site_id | varchar(100) | YES | | NULL | | | factory_code | varchar(50) | YES | MUL | NULL | | | project_name | varchar(80) | YES | | NULL | | | fsn | varchar(50) | YES | MUL | NULL | | | result | varchar(1) | YES | | NULL | | | times | smallint(6) | YES | | NULL | | | machine_id | varchar(100) | YES | | NULL | | | machine_name | varchar(100) | YES | | NULL | | | imei2 | varchar(17) | YES | MUL | NULL | | | imei1 | varchar(17) | YES | MUL | NULL | | | operation | varchar(50) | YES | | NULL | | | site_name | varchar(30) | YES | | NULL | | | operator | varchar(100) | YES | | NULL | | | save_time | timestamp | YES | | NULL | | | create_time | timestamp | YES | MUL | CURRENT_TIMESTAMP | | | update_time | timestamp | YES | | CURRENT_TIMESTAMP | | | passworkstation_code | varchar(50) | YES | | NULL | | | op_time | timestamp | YES | MUL | NULL | | | idx | varchar(100) | NO | | NULL | | | computer_name | varchar(100) | YES | | NULL | | | pcb_fix | int(11) | YES | | NULL | | ±---------------------±-------------±-----±-----±------------------±------+

现有索引情况: | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | ±----------------------±-----------±---------------------------±-------------±-------------±----------±------------±---------±-------±-----±-----------±--------±--------------+ | workstation_pass_info | 0 | PRIMARY | 1 | id | A | 0 | NULL | NULL | | BTREE | | | | workstation_pass_info | 1 | index_imei1 | 1 | imei1 | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_create_time | 1 | create_time | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_imei2 | 1 | imei2 | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_fsn | 1 | fsn | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op | 1 | factory_code | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op | 2 | project_name | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op | 3 | op_time | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op_site | 1 | factory_code | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op_site | 2 | project_name | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op_site | 3 | op_time | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_code_project_op_site | 4 | site_name | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_op_code_project_site | 1 | op_time | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_op_code_project_site | 2 | factory_code | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_op_code_project_site | 3 | project_name | A | 0 | NULL | NULL | YES | BTREE | | | | workstation_pass_info | 1 | index_op_code_project_site | 4 | site_name | A | 0 | NULL | NULL | YES | BTREE | | | ±----------------------±-----------±---------------------------±-------------±-------------±----------±------------±---------±-------±-----±-----------±--------±--------------+

麻烦调整一下回复的格式,回复支持 markdown 语法的,可以将内容用代码块格式输出,或者截图的方式,现在这样不太容易看清楚

查询sql.txt (13.2 KB)

您好,需要的信息如上附件,可用notepad打开

  1. 能提供一下 explain analyze 的结果吗
  2. 查看一下表统计信息的健康度
SHOW STATS_HEALTHY [ShowLikeOrWhere];

explain_analyze.txt (10.5 KB)

explain analyze的结果

不好意思,这边可能还得在麻烦导出一下统计信息,我在我这边的环境运行一下,试下改写sql的效果

导出统计信息方式:
https://pingcap.com/docs-cn/stable/reference/performance/statistics/#导出统计信息

另外可以:

  1. 重新收集一下统计信息然后看下执行计划有没有变化
  2. 集群的拓扑结构是怎么样的(几个 tidb/pd/tikv)?具体版本是 3.0.几的?

分析表后的explain结果.txt (9.1 KB)

从收集统计信息之后的explain analyze 结果看,sql 在毫秒级别就能运行出来?