[FAQ] TiKV_GC_can_not_work alert: gc safepoint blocked by a running session

[Problem Clarification]

  1. Symptom: monitoring raises the TiKV_GC_can_not_work alert.

[Problem Analysis]

  1. tikv_gc_safe_point has not advanced for a long time.
  2. The tidb log contains "gc safepoint blocked by a running session".
  3. Checking information_schema.processlist and the mysql.tidb table shows that the safepoint is blocked by a session.

[Solution]

  1. Find the corresponding globalMinStartTS in the log:
    [2020/11/02 18:54:27.461 +08:00] [INFO] [gc_worker.go:359] ["[gc worker] gc safepoint blocked by a running session"] [uuid=5d5e4d5cc240021] [globalMinStartTS=420540966397804571] [safePoint=2020/11/01 21:14:34.601 +08:00]
  2. On each tidb-server, look up the session whose transaction start TS matches that value:
    SELECT * FROM information_schema.processlist WHERE txnstart LIKE '%420540966397804571%';
  3. If the cluster must recover quickly, confirm with the business owner, then kill that session. GC resumes once the session is gone.
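
The globalMinStartTS lookup above can be scripted. The following is a minimal sketch, not an official tool: the function names extract_min_start_ts and tso_to_unix_ms are made up for illustration. It relies on the TSO layout in which the low 18 bits are a logical counter and the remaining high bits are the physical time in milliseconds since the Unix epoch.

```shell
#!/bin/sh
# Print the globalMinStartTS value from a "gc safepoint blocked" log line.
# Prints nothing if the line does not contain that field.
extract_min_start_ts() {
    printf '%s\n' "$1" | sed -n 's/.*\[globalMinStartTS=\([0-9][0-9]*\)\].*/\1/p'
}

# Drop the 18-bit logical counter to get the physical part of a TSO,
# expressed as milliseconds since the Unix epoch.
tso_to_unix_ms() {
    echo $(( $1 >> 18 ))
}

# Sample line copied from the log shown above.
line='[2020/11/02 18:54:27.461 +08:00] [INFO] [gc_worker.go:359] ["[gc worker] gc safepoint blocked by a running session"] [uuid=5d5e4d5cc240021] [globalMinStartTS=420540966397804571] [safePoint=2020/11/01 21:14:34.601 +08:00]'

ts=$(extract_min_start_ts "$line")
ms=$(tso_to_unix_ms "$ts")
echo "globalMinStartTS=$ts unix_ms=$ms"
# prints: globalMinStartTS=420540966397804571 unix_ms=1604236474601
```

1604236474601 ms corresponds to 2020/11/01 21:14:34.601 +08:00, the same instant as the safePoint field in the sample line, which gives a quick idea of how long the blocking transaction has been open.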

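[Root Cause]

  1. In TiDB, an explicit transaction that is started and then never committed or rolled back blocks GC, because the safepoint cannot advance past the start TS of the oldest running transaction.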

[Reference Case]

https://asktug.com/t/topic/63313/9


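Have you posted an AskTUG thread about this? If so, please share the link. Thanks.

OK. If you run into a concrete case in the future, please post a thread so that others can find it. Thanks.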

One addition:
grep -r "safepoint" ./tidb.log | grep '2021/04'

[2021/04/08 10:28:38.964 +08:00] [INFO] [gc_worker.go:359] ["[gc worker] gc safepoint blocked by a running session"] [uuid=5dfd9ae296c0015] [globalMinStartTS=424086753288388611] [safePoint=2021/04/07 10:29:38.157 +08:00]

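SELECT * FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST WHERE txnstart LIKE '%424086753288388611%';

Be sure to query the cluster-wide processlist table, INFORMATION_SCHEMA.CLUSTER_PROCESSLIST! The per-instance processlist only shows sessions on the tidb-server you are connected to.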
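
To avoid retyping the query for each incident, the cluster-wide lookup can be generated from the start TS. This is only a sketch: build_lookup_sql is a hypothetical helper name, and the column list is one reasonable selection rather than a required one. It uses a LIKE pattern on the assumption that TXNSTART is displayed as a formatted time followed by the TS in parentheses, so an exact equality comparison may return nothing.

```shell
#!/bin/sh
# Print a cluster-wide lookup query for the session holding the given start TS.
# Rejects anything that is not a plain decimal number, since the value is
# pasted straight into the SQL text.
build_lookup_sql() {
    case "$1" in
        ''|*[!0-9]*)
            echo "start TS must be a decimal number" >&2
            return 1
            ;;
    esac
    printf "SELECT INSTANCE, ID, USER, HOST, DB, TIME, TXNSTART, INFO FROM INFORMATION_SCHEMA.CLUSTER_PROCESSLIST WHERE TXNSTART LIKE '%%%s%%';\n" "$1"
}

# Example: the globalMinStartTS from the 2021/04/08 log line above.
build_lookup_sql 424086753288388611
```

The output can be piped into a mysql client connected to any tidb-server in the cluster. The INSTANCE column of the result shows which tidb-server owns the session; after confirming with the business owner, connect to that instance and end the session there, for example with KILL TIDB followed by the reported ID.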
