为提高效率,提问时请提供以下信息,问题描述清晰可优先响应。
- 【TiDB 版本】:V2.1.15
- 【问题描述】:
集群不常报此问题,下图:
根据文档,做了排查:
1.执行如下命令, 找到 gc leader 对应的 tidb-server;
select VARIABLE_VALUE from mysql.tidb where VARIABLE_NAME=“tikv_gc_leader_desc”
2.查看该 tidb-server 的日志,grep gc_worker tidb.log;
[2020/03/30 16:29:58.130 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:30:57.734 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:31:58.195 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:32:57.721 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:33:58.155 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:34:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:35:57.731 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:36:57.903 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:37:57.783 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:38:31.548 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=89478] ["failed regions"=0] ["total cost time"=1h10m0.204035371s]
[2020/03/30 16:38:57.793 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:39:58.047 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:40:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:41:57.724 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:42:57.918 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:43:57.799 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:44:57.797 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:45:58.157 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:46:57.805 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:47:58.090 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:48:32.538 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=113717] ["failed regions"=0] ["total cost time"=1h20m1.194416206s]
[2020/03/30 16:48:57.712 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:49:58.214 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:50:57.728 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:51:58.252 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:52:57.717 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:53:58.157 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:54:57.714 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:55:57.713 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:56:58.026 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:57:57.714 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:58:31.816 +08:00] [INFO] [gc_worker.go:832] ["[gc worker]"] ["gc in process"=5c39eb9fc240005] [safePoint=415548293214109696] ["successful regions"=116964] ["failed regions"=0] ["total cost time"=1h30m0.472121066s]
[2020/03/30 16:58:58.094 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 16:59:57.762 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:00:57.984 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:01:57.728 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:02:58.189 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:03:57.727 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
[2020/03/30 17:04:58.056 +08:00] [INFO] [gc_worker.go:226] ["[gc worker] there's already a gc job running, skipped"] ["leaderTick on"=5c39eb9fc240005]
重复命令2,找了很久,都没有找到满足如下条件的现象:
如果发现这段时间一直在 resolve locks(最后一条日志是 start resolve locks ) 或者 delete ranges(最后一条日志是 start delete {number} ranges) 说明是正常现象
目前服务正常,想知道 这个报警正确的处理方式是啥?