V8.5.3 plan replayer handle collect tasks failed tikv:9006

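A good problem description helps community members locate your issue faster and resolve it efficiently.

【TiDB environment】Test environment
【TiDB version】V8.5.3
【Deployment method】Virtual machine
【OS / CPU architecture / chip details】
【Machine deployment details】CPU / memory / disk size
【Cluster data volume】
【Number of cluster nodes】1PD+1KV+1TIDB
【Reproduction path】What operations were performed before the problem appeared
【Problem encountered: symptoms and impact】
Every 10 seconds, the TiDB log reports a GC error.
["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
After restarting TiDB, the warning comes back after a while.
【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Copy and paste the ERROR log】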

[2026/01/14 11:52:42.847 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:52:52.770 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
[2026/01/14 11:52:52.848 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:53:02.845 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:53:04.769 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
[2026/01/14 11:53:12.847 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:53:16.769 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
[2026/01/14 11:53:22.848 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:53:28.769 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
[2026/01/14 11:53:32.847 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
[2026/01/14 11:53:40.769 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
[2026/01/14 11:53:42.846 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-13 17:18:21.451 +0800 CST, GC safe point is 2026-01-14 11:34:53.463 +0800 CST"]
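
To see how far the GC safe point has moved past the reported transaction start, the two timestamps in the tikv:9006 message can be subtracted. The following is only a rough sketch in Python: the function names are made up for illustration, the regular expression is derived from the log lines in this thread rather than from any documented format, and it assumes both timestamps are printed in the same time zone.

```python
import re
from datetime import datetime, timedelta

# Pattern inferred from the log lines in this thread (an assumption, not a
# documented format). Both timestamps are expected to share one time zone.
_TS = r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:\.\d+)?) [+-]\d{4} \w+"
_PATTERN = re.compile(
    r"transaction starts at " + _TS + r", GC safe point is " + _TS
)


def _parse(ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS[.fraction]'; the fraction may be short (e.g. '.39')."""
    base, _, frac = ts.partition(".")
    dt = datetime.strptime(base, "%Y-%m-%d %H:%M:%S")
    # Right-pad the fraction to microseconds, so '.39' becomes 390000 us.
    micros = int((frac + "000000")[:6]) if frac else 0
    return dt.replace(microsecond=micros)


def safe_point_lead(message: str) -> timedelta:
    """Return (GC safe point - transaction start) for one error message."""
    match = _PATTERN.search(message)
    if match is None:
        raise ValueError("no tikv:9006 timestamps found in message")
    start, safe_point = (_parse(g) for g in match.groups())
    return safe_point - start


msg = (
    "GC life time is shorter than transaction duration, "
    "transaction starts at 2026-01-13 17:18:21.451 +0800 CST, "
    "GC safe point is 2026-01-14 11:34:53.463 +0800 CST"
)
print(safe_point_lead(msg))  # 18:16:32.012000
```

A positive result means the safe point is already later than the transaction start, which is the condition the error message describes.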

【Other attachments: screenshots / logs / monitoring】
mysql> select *from information_schema.tidb_trx;
Empty set (0.00 sec)

mysql> SELECT * FROM mysql.plan_replayer_task;
Empty set (0.00 sec)

mysql> SELECT * FROM mysql.plan_replayer_status;
Empty set (0.01 sec)

mysql> select @@tidb_enable_plan_replayer_capture;
+-------------------------------------+
| @@tidb_enable_plan_replayer_capture |
+-------------------------------------+
|                                   0 |
+-------------------------------------+
1 row in set (0.01 sec)

mysql> select version();
+--------------------+
| version()          |
+--------------------+
| 8.0.11-TiDB-v8.5.3 |
+--------------------+
1 row in set (0.00 sec)

1PD+1KV+1TIDB

Only one KV node? How many data replicas are configured?

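Most likely the TiDB GC safe point has advanced faster than some transaction has stayed alive. You could temporarily adjust the GC life time and see whether that helps.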

It is the default; I have not changed it.
| pd | 192.168.222.129:2379 | replication.max-replicas | 3 |

Still getting the error after the adjustment.
| tidb_gc_life_time | 100h0m0s |

[2026/01/14 17:22:02.551 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-14 13:36:12.814 +0800 CST, GC safe point is 2026-01-14 16:53:02.39 +0800 CST"]
[2026/01/14 17:22:02.596 +08:00] [INFO] [gc_worker.go:705] ["last safe point is later than current one.No need to gc.This might be caused by manually enlarging gc lifetime"] [category="gc worker"] ["leaderTick on"=66eec03f3f40008] ["last safe point"=2026/01/14 16:53:02.390 +08:00] ["current safe point"=2026/01/10 13:22:02.839 +08:00]
[2026/01/14 17:22:12.550 +08:00] [WARN] [domain.go:2241] ["plan replayer handle collect tasks failed"] [error="[tikv:9006]GC life time is shorter than transaction duration, transaction starts at 2026-01-14 13:36:12.814 +0800 CST, GC safe point is 2026-01-14 16:53:02.39 +0800 CST"]
[2026/01/14 17:22:14.463 +08:00] [INFO] [advancer.go:668] ["No tasks yet, skipping advancing."]
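
The gc_worker line above is consistent with a candidate safe point of "now minus tidb_gc_life_time": with a 100h life time, the logged "current safe point" (2026/01/10 13:22:02.839) is exactly 100 hours before roughly 17:22:02.839 on 01/14. The arithmetic can be sketched in Python as below. This is a simplified model, not TiDB's actual code: the function names are hypothetical, and the value of `now` is an assumption obtained by adding 100h back to the logged safe point.

```python
from datetime import datetime, timedelta


def candidate_safe_point(now: datetime, gc_life_time: timedelta) -> datetime:
    """Newest point in time that GC could pick as its safe point."""
    return now - gc_life_time


def gc_would_run(last_safe_point: datetime, now: datetime,
                 gc_life_time: timedelta) -> bool:
    """Simplified model: GC only proceeds when the candidate moves forward."""
    return candidate_safe_point(now, gc_life_time) > last_safe_point


life = timedelta(hours=100)                      # tidb_gc_life_time = 100h0m0s
last = datetime(2026, 1, 14, 16, 53, 2, 390000)  # "last safe point" from the log
now = datetime(2026, 1, 14, 17, 22, 2, 839000)   # assumed, see the note above

print(candidate_safe_point(now, life))  # 2026-01-10 13:22:02.839000
print(gc_would_run(last, now, life))    # False
print(last + life)                      # 2026-01-18 20:53:02.390000
```

Under this model, enlarging the life time does not move the existing safe point backwards, so the last safe point (16:53:02.390) stays in effect and the 9006 warning for a transaction that started at 13:36 keeps appearing. The last printed value is only the earliest time at which the candidate would overtake the old safe point again.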

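Then how about trying to disable the plan replayer background task?

In that case there are not enough nodes; you need to scale out to at least 3 TiKV nodes first.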

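You have a very long-running transaction (started on January 13 at 17:18 and still running after at least 11:34 on January 14), while TiDB's tikv_gc_life_time is set too short (the default is 10 minutes, though production environments often set it to 10m ~ 1h, or even longer).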

Once the GC worker advances the safe point past the timestamp corresponding to the transaction's start_ts, that transaction becomes "invalid".
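
To make "the timestamp corresponding to start_ts" concrete: a TiDB TSO packs a physical time in milliseconds into the high bits and an 18-bit logical counter into the low bits. Below is a minimal Python sketch of converting between the two and comparing a start_ts with a safe point. The helper names are hypothetical, and the "strictly less than" comparison is an assumption about how the check behaves, not a quote from the TiDB source.

```python
from datetime import datetime, timedelta, timezone

PHYSICAL_SHIFT = 18  # low 18 bits of a TSO hold the logical counter
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CST = timezone(timedelta(hours=8))  # +08:00, as in the logs of this thread


def datetime_to_tso(dt: datetime, logical: int = 0) -> int:
    """Build a TSO from a timezone-aware datetime (millisecond precision)."""
    physical_ms = (dt - EPOCH) // timedelta(milliseconds=1)
    return (physical_ms << PHYSICAL_SHIFT) | logical


def tso_to_datetime(tso: int, tz: timezone = CST) -> datetime:
    """Extract the physical part of a TSO as a datetime in the given zone."""
    physical_ms = tso >> PHYSICAL_SHIFT
    return (EPOCH + timedelta(milliseconds=physical_ms)).astimezone(tz)


def is_behind_safe_point(start_ts: int, safe_point_ts: int) -> bool:
    """Assumed condition for the 9006 error: start_ts is older than the safe point."""
    return start_ts < safe_point_ts


start_ts = datetime_to_tso(datetime(2026, 1, 13, 17, 18, 21, 451000, tzinfo=CST))
safe_ts = datetime_to_tso(datetime(2026, 1, 14, 11, 34, 53, 463000, tzinfo=CST))

print(tso_to_datetime(start_ts))                # 2026-01-13 17:18:21.451000+08:00
print(is_behind_safe_point(start_ts, safe_ts))  # True
```

With the two timestamps from the first error message, the start_ts is behind the safe point, which matches the situation described in the reply above.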

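No plan replayer task is actually running, though.

I could not find any running transaction.
Moreover, after TiDB restarts, the reported transaction start time is simply the TiDB startup time, yet no running transaction can be found.

A single node can run normally too, can't it? Three nodes are not mandatory, are they?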

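TiDB's GC feature is very valuable in real projects, especially in real-time analytical query scenarios. This is just my personal experience, for reference only.

Are there too many large transactions?

Check it; this looks like it is caused by the TiKV GC (garbage collection) life time being configured too short.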
config set gc.life_time 1h

This is my personal test environment; nothing is actually running on it, and there are no transactions.
mysql> select *from information_schema.tidb_trx;

Empty set (0.01 sec)

mysql>
mysql> show processlist;
+------------+------+-----------------------+------+---------+------+-------+------------------+
| Id         | User | Host                  | db   | Command | Time | State | Info             |
+------------+------+-----------------------+------+---------+------+-------+------------------+
| 2522874010 | root | 192.168.222.129:47166 | NULL | Query   |    0 |       | show processlist |
+------------+------+-----------------------+------+---------+------+-------+------------------+

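Closing this thread.
After setting autocommit to off and then restarting TiDB, the warning disappeared. (The TiDB restart is required.)
[domain.go:2226] ["PlanReplayerTaskCollectHandle started"]: my guess is that this task opens a transaction and then never commits it, and that it is checked every 10 seconds?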