【紧急求助】tidb 升级到6.1.3 集群响应变慢

【 TiDB 使用环境】生产环境
【 TiDB 版本】6.1.3
【复现路径 从5.4.2 升级到6.1.3
【遇到的问题:问题现象及影响】
业务sql 全部响应特别慢

【资源配置】

【附件:截图/日志/监控】

看看有没有锁?

138 那个节点为什么region那么少?还是增长的趋势,像是在balance

看节点的CPU比较高,通过dashboard看看topsql,应该有比较吃cpu的大事务

  1. 看一下 Analyze 的 tidb_analyze_version 参数变量值多少?
  2. 提供一下 TiKV-details 和 PD 的监控看看 https://metricstool.pingcap.net/

看看pd statistics-balance 监控?

通过tidb-dashboard面板,打开topsql,你就能看到哪些SQL消耗CPU多了。另外你最好再介绍一下集群细节,只看监控16:50开始update和insert的语句开始变多了,duration也开始增加,tikv负载开始增高。
topsql开启方式,给你截图了,开启后选择某个tikv节点,就能显示当时的CPU消耗情况:

1 个赞

可以按照下面的方法把tikv-detail的grafana数据导出下,tikv CPU有点高,看下具体哪个部分引起的

可能是相关参数问题,或某些SQL的执行计划跑偏了

tikv-details 监控
tidb-kunpeng_dms-TiKV-Details_2023-01-12T10 29 12.492Z.json (31.1 MB)

可以先把gc life time调小点,2天时间有点太久了,看compaction flow越来越大,pending bytes也有点高;另外把auto analyze可以先关闭下,或者把start和end调整到UTC时间的00:00 +0800和07:00 +0800

当前的状态:给update 做了执行计划绑定

MySQL [kunpeng_dms_call]> show global bindings\G
*************************** 1. row ***************************
Original_sql: update `kunpeng_dms_call` . `tbl_device_send_cmd_log` set `device_code` = ? , `sign_no` = ? , `status` = ? where `uniq_id` = ? and `uniq_id` = ?
    Bind_sql: UPDATE `kunpeng_dms_call`.`tbl_device_send_cmd_log` USE INDEX (`PRIMARY`) SET `device_code`='ZYBSJ0037581', `sign_no`='4005', `status`=1 WHERE `uniq_id` = 'rpca5a64e9b529e4497b5198ac4abe406161673513647' AND `uniq_id` = 'rpca5a64e9b529e4497b5198ac4abe406161673513647'
  Default_db: kunpeng_dms_call
      Status: enabled
 Create_time: 2023-01-12 23:43:04.190
 Update_time: 2023-01-12 23:43:04.190
     Charset: utf8
   Collation: utf8_general_ci
      Source: manual
*************************** 2. row ***************************
Original_sql: select count ( ? ) from `kunpeng_dms_call` . `tbl_tel_call_log` where `encrypt_number` = ? and `create_time` >= ? and `call_status` in ( ... )
    Bind_sql: SELECT /*+ read_from_storage(tiflash[`tbl_tel_call_log`])*/ count(1) FROM `kunpeng_dms_call`.`tbl_tel_call_log` WHERE `encrypt_number` = '8a4f5a835373df4da2154f34cdfb4b790a4b8bc2ebf95d0c' AND `create_time` >= 1673452800 AND `call_status` IN (3,5,31)
  Default_db: kunpeng_dms_call
      Status: enabled
 Create_time: 2023-01-12 17:19:57.029
 Update_time: 2023-01-12 17:19:57.029
     Charset: utf8
   Collation: utf8_general_ci
      Source: manual

此时update 还是全表扫描的执行计划,慢日志中如下

*************************** 3. row ***************************
                     INSTANCE: 10.105.128.60:10094
                         Time: 2023-01-12 23:46:14.049842
                 Txn_start_ts: 438707998008803343
                         User: kunpengdm_app
                         Host: 10.110.0.232
                      Conn_ID: 1721487763422848565
             Exec_retry_count: 0
              Exec_retry_time: 0
                   Query_time: 173.620817781
                   Parse_time: 0
                 Compile_time: 0.000460205
                 Rewrite_time: 0.000290375
           Preproc_subqueries: 0
      Preproc_subqueries_time: 0
                Optimize_time: 0
                      Wait_TS: 0.000010595
                Prewrite_time: 0
    Wait_prewrite_binlog_time: 0
                  Commit_time: 0
           Get_commit_ts_time: 0
          Commit_backoff_time: 0
                Backoff_types: 
            Resolve_lock_time: 0
        Local_latch_wait_time: 0
                   Write_keys: 0
                   Write_size: 0
              Prewrite_region: 0
                    Txn_retry: 0
                     Cop_time: 173.250368907
                 Process_time: 2465.093
                    Wait_time: 9.785
                 Backoff_time: 0
                LockKeys_time: 0
                Request_count: 456
                   Total_keys: 478359688
                 Process_keys: 478358738
 Rocksdb_delete_skipped_count: 310
    Rocksdb_key_skipped_count: 478359522
Rocksdb_block_cache_hit_count: 518277
     Rocksdb_block_read_count: 101
      Rocksdb_block_read_byte: 5267193
                           DB: kunpeng_dms_call
                  Index_names: [tbl_device_send_cmd_log:PRIMARY]
                  Is_internal: 0
                       Digest: b6c811b9cb1881469e00ef9d5406f8d9e20d03aaba0c65fdc85b809b8b8906a4
                        Stats: tbl_device_send_cmd_log:438707801073123363
                 Cop_proc_avg: 5.405905701
                 Cop_proc_p90: 14.489
                 Cop_proc_max: 32.277
                Cop_proc_addr: 10.105.128.115:20175
                 Cop_wait_avg: 0.021458333
                 Cop_wait_p90: 0.053
                 Cop_wait_max: 0.122
                Cop_wait_addr: 10.105.128.115:20175
                      Mem_max: 7210
                     Disk_max: 0
                     KV_total: 2604.177794892
                     PD_total: 0.000009744
                Backoff_total: 0
     Write_sql_response_total: 0
                  Result_rows: 0
               Backoff_Detail: 
                     Prepared: 1
                         Succ: 0
                IsExplicitTxn: 0
            IsWriteCacheTable: 0
              Plan_from_cache: 0
            Plan_from_binding: 1
             Has_more_results: 0
                         Plan:  id                      task            estRows         operator info                                                                                                                                                                                                  actRows         execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                memory         disk
        Update_4                root            0               N/A                                                                                                                                                                                                            0               time:2m53.6s, loops:1,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0 Bytes         N/A
        └─Selection_10          root            1.01            eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200"), eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200")   0               time:2m53.6s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         6.70 KB         N/A
          └─IndexLookUp_9       root            1.01                                                                                                                                                                                                                           0               time:2m53.6s, loops:1,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        354 Bytes       N/A
            ├─Selection_8       cop[tikv]       1.01            eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200"), eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200")   0               time:2m53.6s, loops:1, cop_task: {num: 456, max: 32.3s, min: 228.5ms, avg: 5.43s, p95: 20.3s, max_proc_keys: 1305408, p95_proc_keys: 1212810, tot_proc: 41m5.1s, tot_wait: 9.79s, rpc_num: 456, rpc_time: 41m15.2s, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:973ms, min:226ms, p80:661ms, p95:744ms, iters:469212, tasks:456}, scan_detail: {total_process_keys: 478358738, total_process_keys_size: 33906243952, total_keys: 478359688, rocksdb: {delete_skipped_count: 310, key_skipped_count: 478359522, block: {cache_hit_count: 518277, read_count: 101, read_byte: 5.02 MB}}}   N/A             N/A
            │ └─IndexFullScan_6 cop[tikv]       1709241802      table:tbl_device_send_cmd_log, index:PRIMARY(uniq_id), keep order:false                                                                                                                                        478358738       tikv_task:{proc max:891ms, min:169ms, p80:568ms, p95:643ms, iters:469212, tasks:456}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          N/A             N/A
            └─TableRowIDScan_7  cop[tikv]       1.01            table:tbl_device_send_cmd_log, keep order:false                                                                                                                                                                0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             N/A             N/A
                  Plan_digest: 3cdd8c63b2c7ae87f199bc9169f173bd4fe22337c634f361182fc532a63fd9b6
                    Prev_stmt: 
                        Query: UPDATE `tbl_device_send_cmd_log` SET `device_code`=?,`sign_no`=?,`status`=? WHERE uniq_id = ? AND `uniq_id` = ? [arguments: ("ZYBSJ0129423", 4002, 1, "rpc7b57f49908294744a5f1309f39aeac111673538200", "rpc7b57f49908294744a5f1309f39aeac111673538200")];

当我手动执行update sql 的时候是非常快的,执行计划跟慢查询中也不一样 。执行速度是很快的

MySQL [kunpeng_dms_call]> explain  UPDATE `tbl_device_send_cmd_log` SET `device_code`='ZYBSJ0086840',`sign_no`=4002,`status`=1 WHERE uniq_id = 'rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670' AND `uniq_id` = 'rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670';

+-------------------+---------+------+-------------------------------------------------------+---------------+
| id                | estRows | task | access object                                         | operator info |
+-------------------+---------+------+-------------------------------------------------------+---------------+
| Update_4          | N/A     | root |                                                       | N/A           |
| └─Point_Get_6     | 1.00    | root | table:tbl_device_send_cmd_log, index:PRIMARY(uniq_id) |               |
+-------------------+---------+------+-------------------------------------------------------+---------------+
2 rows in set (0.00 sec)

MySQL [kunpeng_dms_call]> 
MySQL [kunpeng_dms_call]> 
MySQL [kunpeng_dms_call]> UPDATE `tbl_device_send_cmd_log` SET `device_code`='ZYBSJ0086840',`sign_no`=4002,`status`=1 WHERE uniq_id = 'rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670' AND `uniq_id` = 'rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670';
Query OK, 0 rows affected (0.01 sec)
Rows matched: 1  Changed: 0  Warnings: 0

update kunpeng_dms_call . tbl_device_send_cmd_log set device_code = ? , sign_no = ? , status = ? where uniq_id = ? and uniq_id = ?你这个sql我想不到什么场景会用到,你uniq_id判断两次,这两次的值一样不就相当于一次,要是不一样不就啥也更新不了?这上面业务场景?你写两次,执行计划就走到索引全扫描了,如果只写一次,就是点查啊?
explain UPDATE tbl_device_send_cmd_log SET device_code=‘ZYBSJ0086840’,sign_no=4002,status=1 WHERE uniq_id = ‘rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670’ AND uniq_id = ‘rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670’;你这样也是点查,但是你
explain UPDATE tbl_device_send_cmd_log SET device_code=‘ZYBSJ0086840’,sign_no=4002,status=1 WHERE uniq_id = ‘rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670’ AND uniq_id = ‘rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537671’;你这样他也是索引全扫描吧?

你下面的sql 是不是写重复了,没太看明白。
你看我上面贴的执行计划,我自己手动跑的话如果是两个uniq_id = xxx ,执行计划也是点查,速度很快。
但是如果是业务跑起来,慢查询里看到的就是索引全扫描了。

即使业务sql 写的不太对,但是两个uniq_id 也不应该扫描全索引吧 ?

我今天先让业务把冗余的uniq_id 去掉试试。

不是,我的意思是你传的两个 uniq_id是一致还是不一致,一致的话写一个就行,不一致的话,根本没必要更新,写两个然后传不一样的值,肯定只能走索引全扫描啊,比方你传rpc1be6f4d4e26a4306b1bf45b5c75b65f31673537670和112312312312312312312312123123123试试

传的值都是一样的,但是执行计划里看着也是走的全表。如下:

                         Plan:  id                      task            estRows         operator info                                                                                                                                                                                                  actRows         execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                memory         disk
        Update_4                root            0               N/A                                                                                                                                                                                                            0               time:2m53.6s, loops:1,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        0 Bytes         N/A
        └─Selection_10          root            1.01            eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200"), eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200")   0               time:2m53.6s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         6.70 KB         N/A
          └─IndexLookUp_9       root            1.01                                                                                                                                                                                                                           0               time:2m53.6s, loops:1,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        354 Bytes       N/A
            ├─Selection_8       cop[tikv]       1.01            eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200"), eq(kunpeng_dms_call.tbl_device_send_cmd_log.uniq_id, "rpc7b57f49908294744a5f1309f39aeac111673538200")   0               time:2m53.6s, loops:1, cop_task: {num: 456, max: 32.3s, min: 228.5ms, avg: 5.43s, p95: 20.3s, max_proc_keys: 1305408, p95_proc_keys: 1212810, tot_proc: 41m5.1s, tot_wait: 9.79s, rpc_num: 456, rpc_time: 41m15.2s, copr_cache_hit_ratio: 0.00}, tikv_task:{proc max:973ms, min:226ms, p80:661ms, p95:744ms, iters:469212, tasks:456}, scan_detail: {total_process_keys: 478358738, total_process_keys_size: 33906243952, total_keys: 478359688, rocksdb: {delete_skipped_count: 310, key_skipped_count: 478359522, block: {cache_hit_count: 518277, read_count: 101, read_byte: 5.02 MB}}}   N/A             N/A
            │ └─IndexFullScan_6 cop[tikv]       1709241802      table:tbl_device_send_cmd_log, index:PRIMARY(uniq_id), keep order:false                                                                                                                                        478358738       tikv_task:{proc max:891ms, min:169ms, p80:568ms, p95:643ms, iters:469212, tasks:456}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          N/A             N/A
            └─TableRowIDScan_7  cop[tikv]       1.01            table:tbl_device_send_cmd_log, keep order:false                                                                                                                                                                0                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             N/A             N/A
                  Plan_digest: 3cdd8c63b2c7ae87f199bc9169f173bd4fe22337c634f361182fc532a63fd9b6
                    Prev_stmt: 
                        Query: UPDATE `tbl_device_send_cmd_log` SET `device_code`=?,`sign_no`=?,`status`=? WHERE uniq_id = ? AND `uniq_id` = ? [arguments: ("ZYBSJ0129423", 4002, 1, "rpc7b57f49908294744a5f1309f39aeac111673538200", "rpc7b57f49908294744a5f1309f39aeac111673538200")];

问题已解决:
研发改写sql 去掉多余的uniq_id,重新上线后,未出现之前的问题。改写后的SQL如下:
UPDATE tbl_device_send_cmd_log SET device_code=‘xxxxxx’,sign_no=4002,status=1 WHERE uniq_id = ‘xxxxxx’ ;

在v4 v5版本 也是范围表扫 ,改成IN 才能点查

问题解决了 下次记得用积分送个键盘来