TiKV GC can not work

为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。

根据文档,做了排查:

1.执行如下命令, 找到 gc leader 对应的 tidb-server;

 select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME=“tikv_gc_leader_desc”

2.查看该 tidb-server 的日志,grep gc_worker tidb.log;

[2020/03/30 16:29:58.130 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:30:57.734 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:31:58.195 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:32:57.721 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:33:58.155 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:34:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:35:57.731 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:36:57.903 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:37:57.783 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:38:31.548 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=89478] ["failed regions"=0] ["total cost time"=1h10m0.204035371s]
[2020/03/30 16:38:57.793 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:39:58.047 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:40:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:41:57.724 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:42:57.918 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:43:57.799 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:44:57.797 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:45:58.157 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:46:57.805 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:47:58.090 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:48:32.538 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=113717] ["failed regions"=0] ["total cost time"=1h20m1.194416206s]
[2020/03/30 16:48:57.712 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:49:58.214 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:50:57.728 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:51:58.252 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:52:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:53:58.157 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:54:57.714 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:55:57.713 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:56:58.026 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:57:57.714 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:58:31.816 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=116964] ["failed regions"=0] ["total cost time"=1h30m0.472121066s]
[2020/03/30 16:58:58.094 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:59:57.762 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:00:57.984 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:01:57.728 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:02:58.189 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:03:57.727 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:04:58.056 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]

重复命令2,找了很久,都没有找到满足如下条件的现象:

如果发现这段时间一直在 resolve locks(最后一条日志是 start resolve locks ) 或者 delete ranges(最后一条日志是 start delete {number} ranges) 说明是正常现象

目前服务正常,想知道 这个报警正确的处理方式是啥?

您好:

 1. 请执行一下select VARIABLE_NAME, VARIABLE_VALUE from mysql.tidb;看下当前的信息. 
 2. [2020/03/30 16:58:31.816 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=116964] ["failed regions"=0] ["total cost time"=1h30m0.472121066s]  从日志看,当前一直有一个gc进程在运行,还没有结束. 
3.  请观察successful regions数量是否一直在增长,应该就是正常的

您好:

   1.  从规则看,gc life time 保留了100h,有可能会影响集群的性能

2. 或者是之前批量删除了很多数据,导致gc长时间没有结束 3. 3.0之后gc已经改为了多线程,建议升级到3.0以上版本.

您好:

  1. 请问gc成功了吗?   当前region总数有多少?
  2. 对于resolve查找,可以查找告警时间点,6小时前的日志信息,多谢

gc应该是正常的

好的,有问题,我们再继续查看,多谢