TiKV "failed to update max timestamp for region xx: get timestamp timeout" issue

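【TiDB environment】Test environment
【TiDB version】6.5.2
【Reproduction path】The TiDB cluster ran out of disk space, then the cluster was restarted.
【Problem encountered: symptoms and impact】
tikv.log continuously prints the following warnings, and the cluster is unavailable: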
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261033: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261035: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261031: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261029: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261025: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261027: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261023: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261021: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261019: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261017: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261015: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261013: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261011: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261009: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261007: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
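
To gauge how widespread the warnings are before trying fixes, it can help to count them and the distinct regions they touch. The helper below is only a sketch: the function name is made up for illustration, and it assumes the message text looks exactly like the lines above.

```shell
# Summarize "failed to update max timestamp" warnings from a TiKV log on stdin.
# Prints two numbers: <total warning lines> <distinct region IDs>.
# summarize_max_ts_warnings is a hypothetical helper name, not a TiKV tool.
summarize_max_ts_warnings() {
  grep 'failed to update max timestamp for region' \
    | sed -E 's/.*for region ([0-9]+):.*/\1/' \
    | awk '{ total++; seen[$1] = 1 }
           END { n = 0; for (r in seen) n++; print total + 0, n }'
}

# Example (log path is an assumption; adjust to your deployment):
# summarize_max_ts_warnings < /data/tidb-deploy/tikv-20160/log/tikv.log
```

If the distinct-region count is close to the total number of regions on the store, the failure is probably store-wide (TiKV cannot get timestamps from PD at all) rather than tied to particular regions.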

First, I cleared the TiKV logs to free up disk space.

Attempt 1: rebuild PD with pd-recover
– Get the cluster ID
cat /data/tidb-deploy/pd-2379/log/pd.log | grep 'init cluster id'

[2024/03/31 13:25:37.607 +08:00] [INFO] [server.go:384] ["init cluster id"] [cluster-id=7166168149192488053]

– Get the largest allocated ID
[webapp@lg-test-shuabao log]$ cat /data/tidb-deploy/pd-2379/log/pd.log | grep "idAllocator allocates a new id" | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
1417000

Rebuild PD:
./pd-recover -endpoints http://127.0.0.1:2379 -cluster-id 7166168149192488053 -alloc-id 1417000
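
The two lookups above can be wrapped into small helpers. This is a sketch under assumptions, not the official procedure: the function names and the 100000 margin are made up for illustration. The margin reflects the PD Recover documentation's advice to pass an -alloc-id larger than the largest ID already allocated, whereas the command above passes the largest logged value itself.

```shell
# Read a PD log on stdin and print the cluster ID from the most recent
# "init cluster id" line.
extract_cluster_id() {
  grep 'init cluster id' \
    | sed -E 's/.*\[cluster-id=([0-9]+)\].*/\1/' \
    | tail -n 1
}

# Read a PD log on stdin and print the largest allocated ID.
# Like the awk pipeline above, it takes the number between the last '=' and ']'.
extract_max_alloc_id() {
  grep 'idAllocator allocates a new id' \
    | sed -E 's/.*=([0-9]+)\].*/\1/' \
    | sort -n | tail -n 1
}

# Add a safety margin so the new alloc-id is strictly larger than any ID
# already handed out. The default margin (100000) is an arbitrary assumption.
next_alloc_id() {
  echo $(( $1 + ${2:-100000} ))
}

# Example, reusing the paths and endpoint from this post:
# cid=$(extract_cluster_id < /data/tidb-deploy/pd-2379/log/pd.log)
# max=$(extract_max_alloc_id < /data/tidb-deploy/pd-2379/log/pd.log)
# ./pd-recover -endpoints http://127.0.0.1:2379 -cluster-id "$cid" -alloc-id "$(next_alloc_id "$max")"
```

The PD Recover documentation also describes restarting the whole cluster after a successful recovery, so a run that skips the restart may not show any effect.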

The error persists:
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261033: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

Fetching a TSO by following this document shows no anomaly:
https://docs.pingcap.com/zh/tidb/stable/tso#tidb-中的-timestamp-oracle-tso
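
One way to sanity-check a fetched TSO is to split it into its two parts and compare the physical part with the wall clock. A TSO is a 64-bit integer whose low 18 bits are a logical counter and whose high bits are a physical timestamp in milliseconds. The helper name below is hypothetical, and the sketch relies on 64-bit shell arithmetic.

```shell
# Split a TSO into "<physical part: Unix time in ms> <logical counter>".
# decode_tso is a hypothetical helper name, not a TiDB or PD tool.
decode_tso() {
  physical_ms=$(( $1 >> 18 ))
  logical=$(( $1 & 262143 ))   # 262143 = 2^18 - 1, the mask for the low 18 bits
  echo "$physical_ms $logical"
}

# Example:
# decode_tso 443852055297916932   # prints: 1693161221687 4
# date +%s                        # compare with the physical part divided by 1000
```

If the physical part tracks the current time closely, PD is handing out sensible timestamps, which suggests the timeout lies on the path between TiKV and PD rather than in TSO allocation itself. Inside TiDB, the built-in TIDB_PARSE_TSO() function does the same decoding.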

Attempt 2
Tried a patch-version upgrade from 6.5.2 to 6.5.8 and restarted the cluster. The problem persists.
After the cluster starts, the TiKV node floods its log with:
[2024/03/31 23:10:28.240 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 261007: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

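【Resource configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of that page
【Attachments: screenshots/logs/monitoring】
TiDB, PD, and TiKV are all deployed together on the same virtual machine.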

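Try scaling out by 2 TiKV nodes first and see whether the cluster returns to normal.

What resources does the virtual machine have?


At peak, CPU usage reaches 7 cores.

How does the cluster being unavailable actually show up?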

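Because everything is deployed on one machine and TiKV is flooding the log, CPU usage peaks at 700%. The test machine has 12 cores. IOPS is very high, and roughly 4 GB of log files are written per minute. Query latency is high: a simple SELECT statement takes nearly 2 s.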
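
To put the log volume in perspective, the figure can be converted into a sustained write rate. This is a back-of-the-envelope sketch: the helper name is made up, and it treats "4 GB" as 4 GiB.

```shell
# Convert "bytes written over N seconds" into MiB/s, rounded to two decimals.
# log_rate_mib_per_s is a hypothetical helper name.
log_rate_mib_per_s() {
  awk -v bytes="$1" -v secs="$2" \
    'BEGIN { printf "%.2f\n", bytes / secs / 1048576 }'
}

# 4 GiB written in 60 seconds:
# log_rate_mib_per_s 4294967296 60   # prints: 68.27
```

Roughly 68 MiB/s of log writes on a shared VM disk competes directly with TiKV's and PD's own I/O, which could plausibly account for part of the query latency.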

Screenshot of the log files:

Log contents:
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 732777: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 747465: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 733273: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 741235: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 738057: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 741129: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 735223: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 719255: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 734127: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]
[2024/04/01 00:48:04.289 +08:00] [WARN] [pd.rs:1707] ["failed to update max timestamp for region 734201: Pd(Other("[components/pd_client/src/client.rs:981]: get timestamp timeout"))"]

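There isn't much useful information in the logs.

If this is just for testing, use tiup playground directly.

Set the log level to error so it stops writing such huge logs, then try again. Could disk I/O have become the bottleneck?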
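
For a cluster managed by tiup, lowering the TiKV log level could look roughly like the sketch below. The cluster name is a placeholder and the helper function is made up for illustration; log.level is the TiKV log setting available since v5.4, so it should apply to 6.5.x.

```shell
# Placeholder cluster name; check `tiup cluster list` for the real one.
CLUSTER_NAME="tidb-test"

# Print the server_configs fragment to merge into the topology that
# `tiup cluster edit-config` opens. tikv_log_level_fragment is a
# hypothetical helper, used here only to show the fragment.
tikv_log_level_fragment() {
  cat <<'EOF'
server_configs:
  tikv:
    log.level: "error"
EOF
}

# Then, by hand:
# tiup cluster edit-config "$CLUSTER_NAME"      # merge in the fragment above
# tiup cluster reload "$CLUSTER_NAME" -R tikv   # restart only the TiKV role
```

This only hides the WARN lines and relieves disk pressure. It does not address why TiKV times out when requesting timestamps from PD.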

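Are the clocks on all nodes synchronized?

Check whether the Grafana metrics look normal, and where the anomaly is.

Has this problem been solved? I ran into the same issue.

For now, I have rebuilt the test database!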


OK, thanks~