SQL上线引发的血案

背景

9.10号下午18点多,研发人员反馈有个服务的延时变高了,平时都是2s以下,突然变成了10s多

排查过程

  • 登录集群的dashboard,查看集群延时和慢查询确实都升高了,并且业务SQL慢的前后都有ANALYZE TABLEdip_saos_w.db_main_dw;操作,这个Tidb自动触发的收集更新表的统计信息的操作,如下图:
  • 咨询了开发人员,db_main_dw这个表很早就上了,并且今天也没有做过任何业务上的调整,我试着调整analyze执行的频率,并且手动执行了一下今天变慢的SQL,发现集群在没有做analyze的时候也很慢,如下图:
  • 接着在群里咨询大家上图变慢的这个SQL今天有没有做过调整,得知下午上过线,这个SQL去掉了一个排序条件,ORDER BY c.priority desc, a.start_time变成了 ORDER BY a.start_time,调整前的SQL如下:
SELECT   a.task_id,   a.task_type,   a.task_type_name,   a.task_owner,   a.task_owner_name,   a.task_title,   a.start_time,   b.customer_account_id,   b.customer_name,   c.owner,   c.owner_name,   c.intention_level,   c.oppty_status_category,   c.oppty_status,   c.last_follow_up_time,   c.last_follow_up_status,   c.last_follow_up_content,   c.priority FROM   saos_base.tsa_task a   LEFT JOIN saos_tur.tur_customer b ON a.customer_Account = b.customer_account_id   LEFT JOIN (     SELECT       oo.customer_account_id,       oo.OWNER,       oo.owner_name,       oo.intention_level,       oo.oppty_status_category,       oo.oppty_status,       oo.last_follow_up_time,       oo.last_follow_up_status,       oo.last_follow_up_content,       oo.priority     FROM       saos_tur.rb_opportunity oo     WHERE       oo.oppty_status_category = 10   ) c ON b.customer_account_id = c.customer_account_id WHERE   a.biz_app = 'RB'   AND a.task_owner = '5123'   AND a.task_type = 7134   AND a.task_state = 11   AND a.start_time < '2020-09-10 00:00:00' ORDER BY  c.priority desc,  a.start_time ASC LIMIT   0, 10;

  • 然后在PingCAP的技术支持群里咨询各位老师得知,原 SQL 语句使用了更为高效的索引。新的语句去掉了原先的排序列更换了为了低效的索引,所以执行变慢。(从最终的情况来看还是与执行计划不准确有直接关系)

TopN和Limit下推规则

  • 那么这个SQL还能再快嘛,8s业务肯定是无法接受的,我想到了用tiflash加速,然后我把这个SQL涉及到的表都开启了tiflash,explain 这个SQL发现查询竟然没有走tiflash,然后我手动设置了session 级别的set @@session.tidb_isolation_read_engines = "tidb,tiflash";把tikv禁止掉,再次explain 这个SQL都走tiflash了,手动执行时间从8s将到了500ms

tiflash的用法

  • 既然走tiflash查询效率提升比较明显,那就让开发人员改下SQL,让这个查询强制走tiflash,比较尴尬的是我们按官方文档把SQL调整了还是不能走tiflash,强制走tiflash的SQL如下:
SELECT /*+ read_from_storage(tiflash[a,b]) */   a.task_id,   a.task_type,   a.task_type_name,   a.task_owner,   a.task_owner_name,   a.task_title,   a.start_time,   b.customer_account_id,   b.customer_name,   c.owner,   c.owner_name,   c.intention_level,   c.oppty_status_category,   c.oppty_status,   c.last_follow_up_time,   c.last_follow_up_status,   c.last_follow_up_content,   c.priority FROM   saos_base.tsa_task a   LEFT JOIN saos_tur.tur_customer b ON a.customer_Account = b.customer_account_id   LEFT JOIN (     SELECT       oo.customer_account_id,       oo.OWNER,       oo.owner_name,       oo.intention_level,       oo.oppty_status_category,       oo.oppty_status,       oo.last_follow_up_time,       oo.last_follow_up_status,       oo.last_follow_up_content,       oo.priority     FROM       saos_tur.rb_opportunity oo     WHERE       oo.oppty_status_category = 10   ) c ON b.customer_account_id = c.customer_account_id WHERE   a.biz_app = 'RB'   AND a.task_owner = '5123'   AND a.task_type = 7134   AND a.task_state = 11   AND a.start_time < '2020-09-10 00:00:00' ORDER BY  c.priority desc,  a.start_time ASC LIMIT   0, 10;

tiflash的用法

  • 上面问题反馈给群里PingCAP的技术支持小伙伴们然后继续排查,收集了表的健康度、表的统计信息、统计信息直方图,信息如下:
#表健康度
MySQL [(none)]> SHOW STATS_HEALTHY where Table_name in ( 'tsa_task','tur_customer','rb_opportunity');
+-----------+----------------+----------------+---------+
| Db_name   | Table_name     | Partition_name | Healthy |
+-----------+----------------+----------------+---------+
| saos_base | tsa_task       |                |      51 |
| saos_tur  | tur_customer   |                |      80 |
| saos_tur  | rb_opportunity |                |      72 |
+-----------+----------------+----------------+---------+

#表统计信息
MySQL [(none)]> SHOW stats_meta where Table_name in ( 'tsa_task','tur_customer','rb_opportunity');
+-----------+----------------+----------------+---------------------+--------------+-----------+
| Db_name   | Table_name     | Partition_name | Update_time         | Modify_count | Row_count |
+-----------+----------------+----------------+---------------------+--------------+-----------+
| saos_tur  | tur_customer   |                | 2020-09-11 14:25:23 |       157738 |    795808 |
| saos_tur  | rb_opportunity |                | 2020-09-11 14:27:06 |       144563 |    517778 |
| saos_base | tsa_task       |                | 2020-09-11 14:27:51 |      1112555 |   2311207 |
+-----------+----------------+----------------+---------------------+--------------+-----------+
#统计信息直方图
MySQL [(none)]> SHOW STATS_HISTOGRAMS where Table_name in ( 'tsa_task','tur_customer','rb_opportunity');
+-----------+----------------+----------------+-------------------------------+----------+---------------------+----------------+------------+---------------------+-------------+
| Db_name   | Table_name     | Partition_name | Column_name                   | Is_index | Update_time         | Distinct_count | Null_count | Avg_col_size        | Correlation |
+-----------+----------------+----------------+-------------------------------+----------+---------------------+----------------+------------+---------------------+-------------+
| saos_base | tsa_task       |                | biz_code                      |        0 | 2020-08-13 08:51:18 |        1510400 |          0 |               22.83 |    0.198177 |
| saos_base | tsa_task       |                | task_type                     |        0 | 2020-08-13 08:51:18 |             93 |          0 |                7.37 |   -0.124592 |
| saos_base | tsa_task       |                | create_time                   |        0 | 2020-08-13 08:51:21 |        1309184 |          0 |                              
+-----------+----------------+----------------+-------------------------------+----------+---------------------+----------------+------------+---------------------+-------------+
65 rows in set (0.00 sec)


查看表的健康度信息
查看表的总行数以及修改的行数
查看统计信息直方图

  • 通过上面信息发现这3个表的健康度都不高,官方建议健康度最好在80以上,所以我们先重新收集下表的统计信息,命令如下:
analyze table tsa_task;
analyze table tur_customer;
analyze table rb_opportunity;

analyze 用法

  • 然后再次执行发现还是无法走tiflash,但是走tikv变快了,通过执行计划可以看到查询优化器用到了TopN和IndexMergeJoin这两个优化算子,执行计划如下:
explain analyze  SELECT  a.task_id,   a.task_type,   a.task_type_name,   a.task_owner,   a.task_owner_name,   a.task_title,   a.start_time,   b.customer_account_id,   b.customer_name,   c.owner,   c.owner_name,   c.intention_level,   c.oppty_status_category,   c.oppty_status,   c.last_follow_up_time,    c.last_follow_up_status,   c.last_follow_up_content,   c.priority 
FROM   saos_base.tsa_task a    
LEFT JOIN saos_tur.tur_customer b ON a.customer_Account = b.customer_account_id    
LEFT JOIN (     SELECT       oo.customer_account_id,       oo.OWNER,       oo.owner_name,       oo.intention_level,       oo.oppty_status_category,       oo.oppty_status,        oo.last_follow_up_time,       oo.last_follow_up_status,       oo.last_follow_up_content,       oo.priority      FROM       saos_tur.rb_opportunity oo     WHERE       oo.oppty_status_category = 10   ) c   
ON b.customer_account_id = c.customer_account_id   
WHERE   a.biz_app = 'RB'   AND a.task_owner = '5123'   AND a.task_type = 7134     AND a.task_state = 11   AND a.start_time < '2020-09-10 00:00:00' 
ORDER BY  a.start_time ASC LIMIT   0, 10;
+------------------------------------------------+----------+---------+-----------+--------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------+
| id                                             | estRows  | actRows | task      | access object                                                | execution info                                                                | operator info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | memory       | disk |
+------------------------------------------------+----------+---------+-----------+--------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------+
| Projection_15                                  | 10.00    | 0       | root      |                                                              | time:159.603696ms, loops:1, Concurrency:OFF                                   | saos_base.tsa_task.task_id, saos_base.tsa_task.task_type, saos_base.tsa_task.task_type_name, saos_base.tsa_task.task_owner, saos_base.tsa_task.task_owner_name, saos_base.tsa_task.task_title, saos_base.tsa_task.start_time, saos_tur.tur_customer.customer_account_id, saos_tur.tur_customer.customer_name, saos_tur.rb_opportunity.owner, saos_tur.rb_opportunity.owner_name, saos_tur.rb_opportunity.intention_level, saos_tur.rb_opportunity.oppty_status_category, saos_tur.rb_opportunity.oppty_status, saos_tur.rb_opportunity.last_follow_up_time, saos_tur.rb_opportunity.last_follow_up_status, saos_tur.rb_opportunity.last_follow_up_content, saos_tur.rb_opportunity.priority | 3.7265625 KB | N/A  |
| └─Limit_21                                     | 10.00    | 0       | root      |                                                              | time:159.601289ms, loops:1                                                    | offset:0, count:10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | N/A          | N/A  |
|   └─IndexJoin_160                              | 10.00    | 0       | root      |                                                              | time:159.584142ms, loops:1, Concurrency:4                                     | left outer join, inner:IndexLookUp_159, outer key:saos_tur.tur_customer.customer_account_id, inner key:saos_tur.rb_opportunity.customer_account_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | 0 Bytes      | N/A  |
|     ├─Limit_174(Build)                         | 10.00    | 0       | root      |                                                              | time:159.685046ms, loops:1                                                    | offset:0, count:10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | N/A          | N/A  |
|     │ └─IndexJoin_130                          | 10.00    | 0       | root      |                                                              | time:159.681804ms, loops:1, Concurrency:4                                     | left outer join, inner:IndexLookUp_129, outer key:saos_base.tsa_task.customer_account, inner key:saos_tur.tur_customer.customer_account_id                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | 0 Bytes      | N/A  |
|     │   ├─TopN_139(Build)                      | 10.00    | 0       | root      |                                                              | time:159.753396ms, loops:1                                                    | saos_base.tsa_task.start_time:asc, offset:0, count:10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | 0 Bytes      | N/A  |
|     │   │ └─IndexLookUp_147                    | 10.00    | 0       | root      |                                                              | time:159.720228ms, loops:2, rpc num: 1, rpc time:14.658549ms, proc keys:19366 |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | 369.71875 KB | N/A  |
|     │   │   ├─IndexRangeScan_70(Build)         | 19354.27 | 19366   | cop[tikv] | table:a, index:idx_owner(task_owner)                         | time:11ms, loops:23                                                           | range:["5123","5123"], keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | N/A          | N/A  |
|     │   │   └─TopN_146(Probe)                  | 10.00    | 0       | cop[tikv] |                                                              | proc max:73ms, min:0s, p80:64ms, p95:73ms, iters:31, tasks:17                 | saos_base.tsa_task.start_time:asc, offset:0, count:10                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | N/A          | N/A  |
|     │   │     └─Selection_72                   | 672.94   | 0       | cop[tikv] |                                                              | proc max:72ms, min:0s, p80:64ms, p95:72ms, iters:31, tasks:17                 | eq(saos_base.tsa_task.biz_app, "RB"), eq(saos_base.tsa_task.task_state, 11), eq(saos_base.tsa_task.task_type, 7134), lt(saos_base.tsa_task.start_time, 2020-09-10 00:00:00.000000)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | N/A          | N/A  |
|     │   │       └─TableRowIDScan_71            | 19354.27 | 19366   | cop[tikv] | table:a                                                      | proc max:72ms, min:0s, p80:64ms, p95:72ms, iters:31, tasks:17                 | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A          | N/A  |
|     │   └─IndexLookUp_129(Probe)               | 1.00     |         | root      |                                                              | time:0ns, loops:0                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | N/A          | N/A  |
|     │     ├─IndexRangeScan_127(Build)          | 1.00     |         | cop[tikv] | table:b, index:uniq_customeraccountid(customer_account_id)   | time:0ns, loops:0                                                             | range: decided by [eq(saos_tur.tur_customer.customer_account_id, saos_base.tsa_task.customer_account)], keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | N/A          | N/A  |
|     │     └─TableRowIDScan_128(Probe)          | 1.00     |         | cop[tikv] | table:b                                                      | time:0ns, loops:0                                                             | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A          | N/A  |
|     └─IndexLookUp_159(Probe)                   | 1.04     |         | root      |                                                              | time:0ns, loops:0                                                             |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | N/A          | N/A  |
|       ├─IndexRangeScan_156(Build)              | 2.54     |         | cop[tikv] | table:oo, index:idx_customer_account_id(customer_account_id) | time:0ns, loops:0                                                             | range: decided by [eq(saos_tur.rb_opportunity.customer_account_id, saos_tur.tur_customer.customer_account_id)], keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A          | N/A  |
|       └─Selection_158(Probe)                   | 1.04     |         | cop[tikv] |                                                              | time:0ns, loops:0                                                             | eq(saos_tur.rb_opportunity.oppty_status_category, 10)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | N/A          | N/A  |
|         └─TableRowIDScan_157                   | 2.54     |         | cop[tikv] | table:oo                                                     | time:0ns, loops:0                                                             | keep order:false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | N/A          | N/A  |
+------------------------------------------------+----------+---------+-----------+--------------------------------------------------------------+-------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------+------+
18 rows in set, 1 warning (0.17 sec)

  • Tidb虽然高度兼容Mysql,但是有些用法和Mysql还是不一样的,包括SQL优化,下面是SQL优化相关的原理文章,要想把Tidb用好,SQL的优化这块还需要从原理方面深入学习,SQL优化原理相关

下面这三篇原理相关的文章可以了解下
说存储
说计算
谈调度

写在最后

特别感谢PingCAP的小伙伴们的大力支持,每次遇到问题会帮忙积极主动跟进处理,再次感谢李仲舒、王贤净、苏丹、高振娇。

更多学习和交流可以关注我公众号:

%E5%85%AC%E4%BC%97%E5%8F%B7%E5%90%8D%E7%89%87

学习了