TiKV "failed to update max timestamp for region xx: get timestamp timeout" issue

Inkjade · April 1, 2024, 06:33

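【TiDB environment】Test environment
【TiDB version】6.5.2
【Steps to reproduce】The TiDB cluster ran out of disk space, and I then restarted the cluster.
【Problem: symptoms and impact】
tikv.log keeps printing the following lines, and the cluster is unavailable: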
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261033: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261035: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261031: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261029: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261025: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261027: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261023: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261021: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261019: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261017: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261015: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261013: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261011: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261009: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261007: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

First, I cleaned up the TiKV logs to free disk space.

Attempt 1: rebuild PD with pd-recover
-- Get the cluster ID
cat /data/tidb-deploy/pd-2379/log/pd.log | grep 'init cluster id'

[2024/03/31 13:25:37.607 +08:00] [INFO] [server.go:384] ["init cluster id"] [cluster-id=7166168149192488053]

-- Get the largest allocated ID
[webapp@lg-test-shuabao log]$ cat /data/tidb-deploy/pd-2379/log/pd.log | grep "idAllocator allocates a new id" | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
1417000
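
The grep/awk pipeline above can also be sketched in Python, which makes it easier to see what is being extracted. This is only a minimal sketch: it assumes that PD writes the allocated ID as an `[alloc-id=<number>]` field on the "idAllocator allocates a new id" lines, and the sample lines (timestamps, source positions, values) are made up for illustration. `max_alloc_id` is a hypothetical helper name, not part of any TiDB tool.

```python
import re

# Assumption: the allocated ID appears as "[alloc-id=<number>]" on lines
# that contain "idAllocator allocates a new id".
ALLOC_ID = re.compile(r"\[alloc-id=(\d+)\]")


def max_alloc_id(lines):
    """Return the largest alloc-id found in the given log lines, or None."""
    ids = []
    for line in lines:
        if "idAllocator allocates a new id" not in line:
            continue
        match = ALLOC_ID.search(line)
        if match:
            ids.append(int(match.group(1)))
    return max(ids) if ids else None


# Made-up sample lines, for illustration only.
sample = [
    '[2024/03/30 10:00:00.000 +08:00] [INFO] [id.go:122] ["idAllocator allocates a new id"] [alloc-id=1416000]',
    '[2024/03/31 13:25:37.607 +08:00] [INFO] [server.go:384] ["init cluster id"] [cluster-id=7166168149192488053]',
    '[2024/03/31 12:00:00.000 +08:00] [INFO] [id.go:122] ["idAllocator allocates a new id"] [alloc-id=1417000]',
]

print(max_alloc_id(sample))
```

Note that the PD Recover documentation asks for an `-alloc-id` value larger than the largest ID already allocated, so the number found this way is better treated as a lower bound than as the value to pass verbatim.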

Rebuild PD:
./pd-recover -endpoints http://127.0.0.1:2379 -cluster-id 7166168149192488053 -alloc-id 1417000

The error persists:
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261033: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

Fetching a TSO as described in the following document works without any errors:
https://docs.pingcap.com/zh/tidb/stable/tso#tidb-中的-timestamp-oracle-tso

Attempt 2
I tried a patch upgrade from 6.5.2 to 6.5.8 and restarted the cluster, but the problem remains.
After the cluster starts, the TiKV nodes flood the log with:
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261007: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

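【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page
【Attachments: screenshots/logs/monitoring】
TiDB, PD, and TiKV are co-deployed on the same virtual machine.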

Billmay表妹 · April 1, 2024, 06:39

Try scaling out by 2 TiKV nodes first and see whether the cluster returns to normal.

Billmay表妹 · April 1, 2024, 06:44

What resources does the virtual machine have?

Inkjade · April 1, 2024, 06:50

At peak, CPU usage reaches about 7 cores.

像风一样的男子 · April 1, 2024, 06:58

What are the symptoms of the cluster being unavailable?

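Inkjade · April 1, 2024, 07:03

Because everything is co-deployed, TiKV floods the log and peaks at 700% CPU usage on this 12-core test machine. IOPS is very high, and roughly 4 GB of log files are written per minute. Query latency is high: a simple SELECT statement takes nearly 2 s.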
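
To put the reported log volume in perspective, here is a rough back-of-the-envelope estimate. It is a sketch under two assumptions: that "4G" means 4 GiB, and that every log line is about as long as the warning line quoted in this thread. `estimate_rates` is a hypothetical helper name.

```python
# Figure taken from the post above: about 4 GB of log per minute.
LOG_BYTES_PER_MINUTE = 4 * 1024**3  # assumption: "4G" means 4 GiB

# One warning line copied from the thread, plus a trailing newline.
SAMPLE_LINE = (
    '[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] '
    '["failed to update max timestamp for region 732777: '
    'Pd(Other("[components/pd_client/src/client.rs:981]: '
    'get timestamp timeout"))"]\n'
)


def estimate_rates(bytes_per_minute, line):
    """Return (bytes per second, lines per second) for a given log volume."""
    bytes_per_second = bytes_per_minute / 60
    lines_per_second = bytes_per_second / len(line.encode("utf-8"))
    return bytes_per_second, lines_per_second


bps, lps = estimate_rates(LOG_BYTES_PER_MINUTE, SAMPLE_LINE)
print(f"{bps / 1024**2:.0f} MiB/s, about {lps:,.0f} lines/s")
```

Under these assumptions the log alone accounts for roughly 68 MiB/s of writes, or a few hundred thousand warning lines per second, which is consistent with the high IOPS described above.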

Screenshot of the log files:

Log content:
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 732777: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 747465: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 733273: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 741235: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 738057: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 741129: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 735223: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 719255: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 734127: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 734201: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

There is not much useful information in the logs.
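
Because the same warning repeats for many regions, grouping the lines by error reason can make a flood like this easier to read. The following is a minimal sketch, assuming the warning format shown in this thread (with the inner quotes either plain or backslash-escaped); `summarize` is a hypothetical helper, and the sample lines are copied from posts in this thread.

```python
import re
from collections import Counter

# Matches: region <id>: Pd(Other("[<source position>]: <reason>"))
# The optional backslashes allow for escaped inner quotes in a raw log file.
WARN = re.compile(
    r'failed to update max timestamp for region (\d+): '
    r'Pd\(Other\(\\?"\[([^\]]+)\]: ([^"\\]+)\\?"\)\)'
)


def summarize(lines):
    """Count warnings per error reason and collect the affected region IDs."""
    reasons = Counter()
    regions = set()
    for line in lines:
        match = WARN.search(line)
        if match:
            regions.add(int(match.group(1)))
            reasons[match.group(3)] += 1
    return reasons, regions


sample = [
    '[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 732777: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]',
    '[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 747465: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]',
    '[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]',
]

reasons, regions = summarize(sample)
for reason, count in reasons.most_common():
    print(f"{count:>6}  {reason}")
print(f"{len(regions)} distinct regions affected")
```

In practice the lines would come from iterating over the open tikv.log file rather than from a hard-coded list.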

tidb菜鸟一只 · April 1, 2024, 09:58

If this is only for testing, just use tiup playground.

TIDB-Learner · April 1, 2024, 10:33

Set the log level to error so that it stops producing such large logs, then try again. Could disk I/O have become the bottleneck?

changpeng75 · April 1, 2024, 14:54

Are the clocks on all nodes synchronized?

xiaoqiao · April 2, 2024, 00:03

Check whether the metrics in Grafana look normal, and where the anomalies are.

xiao · April 2, 2024, 11:11

Has this problem been solved? I ran into the same issue.

Inkjade · April 2, 2024, 14:48

For now, I have rebuilt the test database!

xiao · April 3, 2024, 08:46

OK, thanks~

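mono · June 19, 2024, 10:07

It is probably not a time synchronization problem. I run all the services on a single machine and hit a similar problem. Since it is a test environment, there is no monitoring. I suspect it was caused by a full disk. After I cleaned up the logs, the cluster services worked again.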

[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
[2024/06/12 17:51:10.428 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 15101: Pd(Other("[components/pd_client/src/tso.rs:97]: TimestampRequest channel is closed"))"]
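
Since a full disk is the suspected trigger in both reports and this kind of test environment has no monitoring, a small free-space check can stand in as an early warning. This is a minimal sketch: the 10% threshold is an arbitrary assumption, `is_nearly_full` and `check_path` are hypothetical helper names, and the path to check would be whichever mount holds the TiKV data and log directories.

```python
import shutil


def is_nearly_full(free_bytes, total_bytes, min_free_ratio=0.10):
    """Return True if the free share of the disk is below min_free_ratio."""
    if total_bytes <= 0:
        return True  # treat an unreadable or empty filesystem as a problem
    return free_bytes / total_bytes < min_free_ratio


def check_path(path, min_free_ratio=0.10):
    """Check the filesystem that contains `path` using shutil.disk_usage."""
    usage = shutil.disk_usage(path)
    return is_nearly_full(usage.free, usage.total, min_free_ratio)


if __name__ == "__main__":
    # "." is a placeholder; point this at the deployment's data directory.
    if check_path("."):
        print("WARNING: less than 10% disk space left")
    else:
        print("disk space OK")
```

Run from cron or a similar scheduler, a check like this would flag the condition before the cluster stops being able to write.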