After upgrading to TiDB 5.2.3, TiKV_pending_task (type gc-worker) alerts fire continuously and the log shows range delete errors

【TiDB Environment】
3 TiDB + 3 TiKV + 3 PD

【Overview】
After upgrading from 5.2.2 to 5.2.3, the gc-worker alert fires continuously. It looks like GC cannot keep up. How should this be handled?

【Background】
We had been running 5.2.2. We recently saw that every release line had fixed the GCKeys bug, so we upgraded to 5.2.3.

【Symptoms】
Monitoring raises the TiKV_pending_task alert with type gc-worker.

  1. gc_delete_range keeps growing
MySQL [(none)]> select count(*) from mysql.gc_delete_range_done;
+----------+
| count(*) |
+----------+
|      814 |
+----------+
1 row in set (0.00 sec)

MySQL [(none)]> select count(*) from mysql.gc_delete_range;
+----------+
| count(*) |
+----------+
|     1990 |
+----------+
1 row in set (0.00 sec)
  2. tidb.log shows a large number of range delete errors: gc worker is too busy
[2022/01/28 11:18:21.036 +09:00] [ERROR] [gc_worker.go:705] ["[gc worker] delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=7480000000001200a2] [endKey=7480000000001200a3] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
[2022/01/28 11:18:21.050 +09:00] [ERROR] [gc_worker.go:705] ["[gc worker] delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=7480000000001200e9] [endKey=7480000000001200ea] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
[2022/01/28 11:18:21.050 +09:00] [INFO] [gc_worker.go:736] ["[gc worker] finish delete ranges"] [uuid=5fa57b4e89c000a] ["num of ranges"=1990] ["cost time"=1.132902063s]
[2022/01/28 11:18:21.078 +09:00] [INFO] [gc_worker.go:759] ["[gc worker] start redo-delete ranges"] [uuid=5fa57b4e89c000a] ["num of ranges"=814]
....
[2022/01/28 11:18:21.519 +09:00] [ERROR] [gc_worker.go:768] ["[gc worker] redo-delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=74800000000011fb58] [endKey=74800000000011fb59] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
[2022/01/28 11:18:21.519 +09:00] [INFO] [gc_worker.go:788] ["[gc worker] finish redo-delete ranges"] [uuid=5fa57b4e89c000a] ["num of ranges"=814] ["cost time"=441.427515ms]
[2022/01/28 11:18:21.523 +09:00] [INFO] [gc_worker.go:1548] ["[gc worker] sent safe point to PD"] [uuid=5fa57b4e89c000a] ["safe point"=430790567310131200]
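
To see which stores reject the requests and which tables the failing ranges belong to, the error lines can be tallied with a short script. This is only a rough sketch: the function names are made up for illustration, and the key decoding assumes the usual TiDB table-key layout (a `t` byte followed by an 8-byte big-endian table ID with the sign bit flipped), which is an assumption and not something this thread confirms.

```python
import re
from collections import Counter

# Three error lines copied from the tidb.log excerpt above.
SAMPLE_LOG = '''[2022/01/28 11:18:21.036 +09:00] [ERROR] [gc_worker.go:705] ["[gc worker] delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=7480000000001200a2] [endKey=7480000000001200a3] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
[2022/01/28 11:18:21.050 +09:00] [ERROR] [gc_worker.go:705] ["[gc worker] delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=7480000000001200e9] [endKey=7480000000001200ea] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
[2022/01/28 11:18:21.519 +09:00] [ERROR] [gc_worker.go:768] ["[gc worker] redo-delete range failed on range"] [uuid=5fa57b4e89c000a] [startKey=74800000000011fb58] [endKey=74800000000011fb59] [error="[gc worker] destroy range finished with errors: [unsafe destroy range failed on store 1: gc worker is too busy unsafe destroy range failed on store 4: gc worker is too busy]"]
'''

STORE_BUSY = re.compile(
    r"unsafe destroy range failed on store (\d+): gc worker is too busy"
)
START_KEY = re.compile(r"\[startKey=([0-9a-fA-F]+)\]")


def busy_stores(lines):
    """Count 'gc worker is too busy' failures per store ID."""
    counts = Counter()
    for line in lines:
        # Matches both the delete and the redo-delete error messages.
        if "delete range failed on range" not in line:
            continue
        for store_id in STORE_BUSY.findall(line):
            counts[int(store_id)] += 1
    return counts


def table_id_from_key(hex_key):
    """Decode the table ID from a hex key, assuming the 't' + int64 layout.

    Returns None when the key does not look like a table key.
    """
    raw = bytes.fromhex(hex_key)
    if len(raw) < 9 or raw[0:1] != b"t":
        return None
    # Assumed encoding: big-endian int64 with the sign bit flipped.
    return int.from_bytes(raw[1:9], "big") ^ (1 << 63)


def failed_table_ids(lines):
    """Collect the decoded table IDs of all failing ranges."""
    ids = []
    for line in lines:
        match = START_KEY.search(line)
        if match and "delete range failed on range" in line:
            ids.append(table_id_from_key(match.group(1)))
    return ids


lines = SAMPLE_LOG.splitlines()
print(dict(busy_stores(lines)))
print(failed_table_ids(lines))
```

In the sample lines, each endKey is the startKey plus one. Under the assumed layout, that means each failing range covers exactly one table ID.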

【Business Impact】
Applications are currently working normally.

【TiDB Version】
v5.2.3
The GC configuration is as follows:

bootstrapped                  True
tidb_server_version           72
system_tz                     Asia/Tokyo
new_collation_enabled         True
tikv_gc_leader_uuid           5fa57b4e89c000a
tikv_gc_leader_lease          20220128-11:19:39 +0900
tikv_gc_enable                true
tikv_gc_run_interval          10m0s
tikv_gc_life_time             10m0s
tikv_gc_last_run_time         20220128-11:16:39 +0900
tikv_gc_safe_point            20220128-11:06:39 +0900
tikv_gc_auto_concurrency      true
tikv_gc_scan_lock_mode        legacy
tikv_gc_mode                  distributed
gc.enable-compaction-filter   true
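
As a sanity check on these values, a little arithmetic suggests that the GC safe point itself is still advancing on schedule, and that only the range deletes are stuck. The sketch below makes two assumptions: that the safe point is computed as the last run time minus tikv_gc_life_time, and that a TSO stores the physical time in milliseconds above 18 logical bits. The helper names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Timestamp format used by the mysql.tidb values above.
TIDB_TIME_FORMAT = "%Y%m%d-%H:%M:%S %z"
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def parse_tidb_time(text):
    """Parse a value such as '20220128-11:16:39 +0900' into an aware datetime."""
    return datetime.strptime(text, TIDB_TIME_FORMAT)


def parse_go_duration(text):
    """Parse a simple Go-style duration such as '10m0s' (h, m, s units only)."""
    if not re.fullmatch(r"(\d+[hms])+", text):
        raise ValueError(f"unsupported duration: {text!r}")
    seconds = sum(
        int(number) * _UNIT_SECONDS[unit]
        for number, unit in re.findall(r"(\d+)([hms])", text)
    )
    return timedelta(seconds=seconds)


def tso_to_datetime(tso):
    """Convert a TSO to UTC, assuming physical milliseconds sit above 18 logical bits."""
    physical_ms = tso >> 18
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=physical_ms)


last_run = parse_tidb_time("20220128-11:16:39 +0900")
safe_point = parse_tidb_time("20220128-11:06:39 +0900")
life_time = parse_go_duration("10m0s")

# Safe point reported in the "sent safe point to PD" log line.
logged_safe_point = tso_to_datetime(430790567310131200)

print("safe point == last run - life time:", last_run - life_time == safe_point)
print("logged safe point (UTC):", logged_safe_point.isoformat())
print(
    "log matches tikv_gc_safe_point:",
    logged_safe_point.replace(microsecond=0) == safe_point,
)
```

If all three checks agree, the safe point is moving forward as configured, which is consistent with the problem being limited to the unsafe destroy range requests that TiKV rejects.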

Did you change the enable-compaction-filter parameter before or after the upgrade?

We never explicitly set this parameter before the upgrade, so it should have been at its default value all along.

Also, I saw this RocksDB setting in the documentation. Does it need to be set to true?
https://docs.pingcap.com/zh/tidb/v5.2/tikv-configuration-file#use-delete-range

use-delete-range

  • Switch that enables deleting data through the RocksDB delete_range interface.
  • Default value: false

This should be unrelated to the use-delete-range parameter. You can try turning off the GC compaction filter:

set config tikv gc.enable-compaction-filter = false;

Thanks, I'll give it a try.

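Thank you very much. After setting it to false, all the range deletes completed.
A follow-up question: should gc.enable-compaction-filter stay false, or should it be switched back to the default, true?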

If there are no heavy delete or update operations, you can keep it false.

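Sorry, just to confirm: keep it false when there are no heavy deletes/updates, or the other way around?
// Our GC failures appeared precisely because we run a large volume of updates and deletes every day.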

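If there are no heavy delete or update operations, you can keep it false.

With heavy updates and deletes, enabling the GC compaction filter can reduce the impact of GC on the workload. Earlier versions had some bugs, so enabling it was not recommended. v5.2.3 currently has no known bugs, so you can consider setting it back to true.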

OK, thanks a lot.

Hello, I ran into a similar problem on version 5.0.6. After I tried changing gc.enable-compaction-filter to false, GC stopped running.


So I changed it back right away. Now the log is also full of "gc worker is too busy" errors, and the pending tasks are all gc-worker as well.
