After changing the PD machines' IPs and running pd-recover, data reads are abnormal

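[TiDB environment] Production / Test / PoC
[TiDB version]
[Operating system]
[Deployment] Cloud (which cloud) / physical machines (machine spec, disk type)
[Cluster data size] 30 GiB
[Number of cluster nodes] 3
[Reproduction steps] The machines were moved to a different rack, which changed the IPs of all three PD machines and the TiKV machines. After the cluster was recovered, reads and writes became slow.
[Problem: symptoms and impact]
Reads are now extremely slow. Before the PD IPs were changed, read and write speeds were normal.
[Resource configuration]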

Cluster type:       tidb
Cluster name:       hb72
Cluster version:    v6.5.11
Deploy user:        root
SSH type:           builtin
Dashboard URL:      http://10.30.1.121:2379/dashboard
Dashboard URLs:     http://10.30.1.121:2379/dashboard
ID                 Role  Host         Ports        OS/Arch       Status   Data Dir                        Deploy Dir
--                 ----  ----         -----        -------       ------   --------                        ----------
10.30.1.117:2379   pd    10.30.1.117  2379/2380    linux/x86_64  Up       /opt/tikv/tikv-data/pd-2379     /opt/tikv/tikv-deploy/pd-2379
10.30.1.121:2379   pd    10.30.1.121  2379/2380    linux/x86_64  Up|L|UI  /opt/tikv/tikv-data/pd-2379     /opt/tikv/tikv-deploy/pd-2379
10.30.1.122:2379   pd    10.30.1.122  2379/2380    linux/x86_64  Up       /opt/tikv/tikv-data/pd-2379     /opt/tikv/tikv-deploy/pd-2379
10.30.1.115:20160  tikv  10.30.1.115  20160/20180  linux/x86_64  Up       /opt/tikv/tikv-data/tikv-20160  /opt/tikv/tikv-deploy/tikv-20160
10.30.1.119:20160  tikv  10.30.1.119  20160/20180  linux/x86_64  Up       /opt/tikv/tikv-data/tikv-20160  /opt/tikv/tikv-deploy/tikv-20160
10.30.1.120:20160  tikv  10.30.1.120  20160/20180  linux/x86_64  Up       /opt/tikv/tikv-data/tikv-20160  /opt/tikv/tikv-deploy/tikv-20160
Total nodes: 6

[ERROR log (copy and paste)]
[Other attachments: screenshots / logs / monitoring]

ansible pd -m shell -a "tail /opt/tikv/tikv-deploy/pd-2379/log/pd.log" -f 3
10.30.1.117 | CHANGED | rc=0 >>
[2025/09/17 11:15:11.687 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=228.954811ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:15:11.687 +08:00] [INFO] [trace.go:152] ["trace[555164612] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74268; }"] [duration=229.082183ms] [start=2025/09/17 11:15:11.458 +08:00] [end=2025/09/17 11:15:11.687 +08:00] [steps="[\"trace[555164612] 'agreement among raft nodes before linearized reading'  (duration: 228.86383ms)\"]"]
[2025/09/17 11:16:08.582 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=124.611365ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:16:08.582 +08:00] [INFO] [trace.go:152] ["trace[144274117] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74292; }"] [duration=124.684481ms] [start=2025/09/17 11:16:08.457 +08:00] [end=2025/09/17 11:16:08.582 +08:00] [steps="[\"trace[144274117] 'range keys from in-memory index tree'  (duration: 124.082024ms)\"]"]
[2025/09/17 11:17:08.593 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=134.818779ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:17:08.593 +08:00] [INFO] [trace.go:152] ["trace[750406164] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74321; }"] [duration=134.892069ms] [start=2025/09/17 11:17:08.458 +08:00] [end=2025/09/17 11:17:08.593 +08:00] [steps="[\"trace[750406164] 'range keys from in-memory index tree'  (duration: 134.14526ms)\"]"]
[2025/09/17 11:18:08.607 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=148.748948ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:18:08.607 +08:00] [INFO] [trace.go:152] ["trace[573855812] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74347; }"] [duration=148.833323ms] [start=2025/09/17 11:18:08.458 +08:00] [end=2025/09/17 11:18:08.607 +08:00] [steps="[\"trace[573855812] 'range keys from in-memory index tree'  (duration: 148.113321ms)\"]"]
[2025/09/17 11:19:08.566 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=107.94327ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:19:08.566 +08:00] [INFO] [trace.go:152] ["trace[1310650280] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74373; }"] [duration=108.057741ms] [start=2025/09/17 11:19:08.458 +08:00] [end=2025/09/17 11:19:08.566 +08:00] [steps="[\"trace[1310650280] 'range keys from in-memory index tree'  (duration: 107.337244ms)\"]"]
10.30.1.122 | CHANGED | rc=0 >>
[2025/09/17 11:05:56.479 +08:00] [INFO] [client.go:170] ["server starts to synchronize with leader"] [server=pd-10.30.1.122-2379] [leader=pd-10.30.1.121-2379] [request-index=22000]
[2025/09/17 11:08:11.621 +08:00] [INFO] [trace.go:152] ["trace[1110149908] linearizableReadLoop"] [detail="{readStateIndex:74186; appliedIndex:74186; }"] [duration=162.89202ms] [start=2025/09/17 11:08:11.459 +08:00] [end=2025/09/17 11:08:11.621 +08:00] [steps="[\"trace[1110149908] 'read index received'  (duration: 162.887349ms)\",\"trace[1110149908] 'applied index is now lower than readState.Index'  (duration: 3.286µs)\"]"]
[2025/09/17 11:08:11.622 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=163.114652ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:08:11.622 +08:00] [INFO] [trace.go:152] ["trace[821505762] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74083; }"] [duration=163.331295ms] [start=2025/09/17 11:08:11.459 +08:00] [end=2025/09/17 11:08:11.622 +08:00] [steps="[\"trace[821505762] 'agreement among raft nodes before linearized reading'  (duration: 162.983769ms)\"]"]
[2025/09/17 11:13:11.603 +08:00] [INFO] [trace.go:152] ["trace[1278619771] linearizableReadLoop"] [detail="{readStateIndex:74319; appliedIndex:74319; }"] [duration=143.619443ms] [start=2025/09/17 11:13:11.459 +08:00] [end=2025/09/17 11:13:11.603 +08:00] [steps="[\"trace[1278619771] 'read index received'  (duration: 143.614453ms)\",\"trace[1278619771] 'applied index is now lower than readState.Index'  (duration: 3.787µs)\"]"]
[2025/09/17 11:13:11.603 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=143.89799ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:13:11.603 +08:00] [INFO] [trace.go:152] ["trace[834392297] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74216; }"] [duration=144.067544ms] [start=2025/09/17 11:13:11.459 +08:00] [end=2025/09/17 11:13:11.603 +08:00] [steps="[\"trace[834392297] 'agreement among raft nodes before linearized reading'  (duration: 143.772275ms)\"]"]
[2025/09/17 11:15:11.689 +08:00] [INFO] [trace.go:152] ["trace[1933690184] linearizableReadLoop"] [detail="{readStateIndex:74371; appliedIndex:74371; }"] [duration=230.130307ms] [start=2025/09/17 11:15:11.459 +08:00] [end=2025/09/17 11:15:11.689 +08:00] [steps="[\"trace[1933690184] 'read index received'  (duration: 230.1252ms)\",\"trace[1933690184] 'applied index is now lower than readState.Index'  (duration: 3.775µs)\"]"]
[2025/09/17 11:15:11.689 +08:00] [WARN] [util.go:167] ["apply request took too long"] [took=230.336659ms] [expected-duration=100ms] [prefix="read-only range "] [request="key:\"/pd/7535653128917855129/config\" "] [response="range_response_count:1 size:3777"] []
[2025/09/17 11:15:11.689 +08:00] [INFO] [trace.go:152] ["trace[1102451188] range"] [detail="{range_begin:/pd/7535653128917855129/config; range_end:; response_count:1; response_revision:74268; }"] [duration=230.517096ms] [start=2025/09/17 11:15:11.459 +08:00] [end=2025/09/17 11:15:11.689 +08:00] [steps="[\"trace[1102451188] 'agreement among raft nodes before linearized reading'  (duration: 230.22766ms)\"]"]
10.30.1.121 | CHANGED | rc=0 >>
[2025/09/17 11:07:13.790 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=283.491317ms] [prev-physical=2025/09/17 11:07:13.506 +08:00] [now=2025/09/17 11:07:13.790 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:08:11.157 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=650.817744ms] [prev-physical=2025/09/17 11:08:10.506 +08:00] [now=2025/09/17 11:08:11.157 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:08:11.622 +08:00] [WARN] [etcd_kv.go:160] ["txn runs too slow"] [response="{\"header\":{\"cluster_id\":15139412056096839125,\"member_id\":3765587707878827446,\"revision\":74083,\"raft_term\":14},\"succeeded\":true,\"responses\":[{\"Response\":{\"ResponsePut\":{\"header\":{\"revision\":74083}}}}]}"] [cost=1.050090234s] []
[2025/09/17 11:09:13.750 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=243.90189ms] [prev-physical=2025/09/17 11:09:13.506 +08:00] [now=2025/09/17 11:09:13.750 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:11:13.772 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=266.815801ms] [prev-physical=2025/09/17 11:11:13.506 +08:00] [now=2025/09/17 11:11:13.772 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:13:11.160 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=653.484581ms] [prev-physical=2025/09/17 11:13:10.506 +08:00] [now=2025/09/17 11:13:11.160 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:15:11.235 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=729.687957ms] [prev-physical=2025/09/17 11:15:10.506 +08:00] [now=2025/09/17 11:15:11.235 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:15:11.690 +08:00] [WARN] [etcd_kv.go:160] ["txn runs too slow"] [response="{\"header\":{\"cluster_id\":15139412056096839125,\"member_id\":3765587707878827446,\"revision\":74268,\"raft_term\":14},\"succeeded\":true,\"responses\":[{\"Response\":{\"ResponsePut\":{\"header\":{\"revision\":74268}}}}]}"] [cost=1.008247491s] []
[2025/09/17 11:17:13.757 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=251.350005ms] [prev-physical=2025/09/17 11:17:13.506 +08:00] [now=2025/09/17 11:17:13.757 +08:00] [update-physical-interval=50ms]
[2025/09/17 11:19:13.753 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=196.483825ms] [prev-physical=2025/09/17 11:19:13.556 +08:00] [now=2025/09/17 11:19:13.753 +08:00] [update-physical-interval=50ms]
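The `jet-lag` value in these "clock offset" warnings can be cross-checked against the two timestamps printed on the same line: it is essentially `now - prev-physical`. The sketch below parses one line copied from the 10.30.1.121 log and recomputes that gap. The regex only covers the field layout shown here, and the rule that PD warns once the gap exceeds three times `update-physical-interval` is an assumption (the `factor=3` default), not something the log itself states.

```python
import re
from datetime import datetime, timedelta

# Matches the fields of PD's "clock offset" WARN line as printed in this thread.
CLOCK_OFFSET = re.compile(
    r"\[jet-lag=(?P<lag>[\d.]+)ms\] "
    r"\[prev-physical=(?P<prev>[^\]]+)\] "
    r"\[now=(?P<now>[^\]]+)\] "
    r"\[update-physical-interval=(?P<interval>\d+)ms\]"
)
TS_FORMAT = "%Y/%m/%d %H:%M:%S.%f %z"  # e.g. 2025/09/17 11:08:10.506 +08:00


def parse_clock_offset(line):
    """Return (reported_ms, recomputed_ms, interval_ms), or None if no match."""
    m = CLOCK_OFFSET.search(line)
    if m is None:
        return None
    prev = datetime.strptime(m["prev"], TS_FORMAT)
    now = datetime.strptime(m["now"], TS_FORMAT)
    # Dividing two timedeltas gives the gap in milliseconds as a float.
    recomputed_ms = (now - prev) / timedelta(milliseconds=1)
    return float(m["lag"]), recomputed_ms, int(m["interval"])


def is_lagging(lag_ms, interval_ms, factor=3):
    """Assumed warning rule: gap larger than factor x update-physical-interval."""
    return lag_ms > factor * interval_ms


line = ('[2025/09/17 11:08:11.157 +08:00] [WARN] [tso.go:331] ["clock offset"] '
        '[jet-lag=650.817744ms] [prev-physical=2025/09/17 11:08:10.506 +08:00] '
        '[now=2025/09/17 11:08:11.157 +08:00] [update-physical-interval=50ms]')
reported, recomputed, interval = parse_clock_offset(line)
print(reported, recomputed, is_lagging(reported, interval))  # → 650.817744 651.0 True
```

The recomputed 651.0 ms matches the reported 650.8 ms up to the millisecond precision of the printed timestamps. Because both timestamps in each line are printed by a single host, it is worth comparing this gap with the cross-host clock differences reported by `chronyc tracking`.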

chronyc tracking output:

10.30.1.117 | CHANGED | rc=0 >>
Reference ID    : D21C8204 (time.nju.edu.cn)
Stratum         : 2
Ref time (UTC)  : Wed Sep 17 03:18:59 2025
System time     : 0.001623639 seconds fast of NTP time
Last offset     : +0.000622989 seconds
RMS offset      : 0.002910159 seconds
Frequency       : 13.326 ppm slow
Residual freq   : +0.015 ppm
Skew            : 0.212 ppm
Root delay      : 0.059497610 seconds
Root dispersion : 0.000563287 seconds
Update interval : 1041.7 seconds
Leap status     : Normal
10.30.1.122 | CHANGED | rc=0 >>
Reference ID    : 8BC7D6CA (139.199.214.202)
Stratum         : 3
Ref time (UTC)  : Wed Sep 17 03:13:41 2025
System time     : 0.000345686 seconds slow of NTP time
Last offset     : -0.000148230 seconds
RMS offset      : 0.001260231 seconds
Frequency       : 17.033 ppm slow
Residual freq   : +0.002 ppm
Skew            : 0.146 ppm
Root delay      : 0.054492999 seconds
Root dispersion : 0.017223055 seconds
Update interval : 1027.8 seconds
Leap status     : Normal
10.30.1.121 | CHANGED | rc=0 >>
Reference ID    : 8BC7D6CA (139.199.214.202)
Stratum         : 3
Ref time (UTC)  : Wed Sep 17 03:13:45 2025
System time     : 0.000279026 seconds slow of NTP time
Last offset     : +0.000215401 seconds
RMS offset      : 0.000708730 seconds
Frequency       : 20.181 ppm slow
Residual freq   : -0.016 ppm
Skew            : 0.288 ppm
Root delay      : 0.057069357 seconds
Root dispersion : 0.013686049 seconds
Update interval : 1041.3 seconds
Leap status     : Normal
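To compare the three hosts' clocks with each other, the `System time` lines above can be turned into signed offsets (fast = positive, slow = negative) and the largest pairwise difference taken. This is only a rough estimate: it assumes the two NTP references in use (time.nju.edu.cn on 10.30.1.117, 139.199.214.202 on the other two) agree closely with each other.

```python
import re
from itertools import combinations

# Parses the "System time" line of `chronyc tracking`,
# e.g. "System time     : 0.001623639 seconds fast of NTP time".
SYSTEM_TIME = re.compile(r"System time\s*:\s*([\d.]+) seconds (fast|slow) of NTP time")


def ntp_offset_seconds(tracking_output):
    """Signed offset from NTP time in seconds: positive = fast, negative = slow."""
    m = SYSTEM_TIME.search(tracking_output)
    if m is None:
        raise ValueError("no 'System time' line found")
    value = float(m.group(1))
    return value if m.group(2) == "fast" else -value


def max_pairwise_skew(offsets):
    """Largest clock difference between any two hosts, in seconds."""
    return max(abs(a - b) for a, b in combinations(offsets.values(), 2))


# Values copied from the chronyc tracking output above.
offsets = {
    "10.30.1.117": ntp_offset_seconds("System time     : 0.001623639 seconds fast of NTP time"),
    "10.30.1.122": ntp_offset_seconds("System time     : 0.000345686 seconds slow of NTP time"),
    "10.30.1.121": ntp_offset_seconds("System time     : 0.000279026 seconds slow of NTP time"),
}
print(f"max skew between PD hosts: {max_pairwise_skew(offsets) * 1000:.2f} ms")
```

With the values above this prints `max skew between PD hosts: 1.97 ms` (10.30.1.117 versus 10.30.1.122).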

Have you tried syncing the NTP clock?

That had no effect.

Check the Dashboard to compare latency before and after the migration.
Are TiDB and PD/TiKV still on the same network segment?

Could you capture packets and take a look?

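After the change everything else is normal and only reads/writes are slow, so start with disk I/O: is there a difference compared with before? Check the overall latency in the monitoring, and see whether anything shows up in the slow query log.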

Test the I/O on the TiKV machines. What kind of disks are they, and how are they attached to the servers?
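One quick way to spot a disk regression on the TiKV hosts, assuming `fio` is not at hand, is to measure synchronous small-write latency. The function below appends 4 KiB blocks to a scratch file and calls `os.fsync` after each one. It is a rough probe, not a replacement for a proper `fio` run, and the block size and write count are arbitrary choices.

```python
import os
import statistics
import tempfile
import time


def fsync_latencies_ms(directory, count=200, block_size=4096):
    """Append `count` blocks to a scratch file in `directory`, fsync after each,
    and return the per-write latency in milliseconds. The file is removed afterwards."""
    payload = os.urandom(block_size)
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory, prefix="fsync-probe-")
    try:
        for _ in range(count):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the block to stable storage before timing stops
            latencies.append((time.perf_counter() - start) * 1000)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def summarize(latencies):
    """Average, approximate p99 and max of a non-empty latency list (ms)."""
    ordered = sorted(latencies)
    p99 = ordered[min(len(ordered) - 1, int(len(ordered) * 0.99))]
    return {"avg_ms": statistics.mean(ordered), "p99_ms": p99, "max_ms": ordered[-1]}


# Usage on each TiKV host (data directory taken from the display output):
# print(summarize(fsync_latencies_ms("/opt/tikv/tikv-data")))
```

Comparing the summaries across the three TiKV hosts, or with any figures from before the move, is more informative than the absolute numbers.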

They are on the same network segment, with dual 10 GbE links.
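Whether the nodes share a segment can be checked directly against the host list in the `tiup cluster display` output. The sketch assumes the segment is `10.30.1.0/24`; the actual netmask is not given in the thread, and no TiDB server appears in the display, so its address would have to be added by hand.

```python
import ipaddress


def hosts_outside_subnet(hosts, subnet):
    """Return the hosts that do not belong to `subnet` (a CIDR string)."""
    network = ipaddress.ip_network(subnet)
    return [h for h in hosts if ipaddress.ip_address(h) not in network]


# PD and TiKV hosts from the display output; the /24 prefix is an assumption.
cluster_hosts = ["10.30.1.117", "10.30.1.121", "10.30.1.122",
                 "10.30.1.115", "10.30.1.119", "10.30.1.120"]
print(hosts_outside_subnet(cluster_hosts, "10.30.1.0/24"))  # → []
```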

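When recovering the cluster, we did not use the largest alloc-id, which left some data unqueryable. Queries did not return an error right away; presumably because of a timeout mechanism, the error only came back after 40 s.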
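Since the three PD nodes had reached different alloc-id values, the value passed to `pd-recover` has to be at least the largest one seen on any node. The helper below scans PD log lines for the ID-allocation message and returns the maximum. It assumes lines containing `idAllocator allocates a new id` followed by a `[name=number]` field (the message the pd-recover documentation suggests grepping for); check that against your own PD logs. It also only sees IDs in logs that have not been rotated away.

```python
import re
from pathlib import Path

# Assumed log layout: ... ["idAllocator allocates a new id"] [alloc-id=9000]
ALLOC_LINE = re.compile(r"idAllocator allocates a new id.*?=(\d+)\]")


def max_alloc_id(lines):
    """Largest allocated ID found in an iterable of PD log lines (0 if none)."""
    ids = [int(m.group(1)) for line in lines if (m := ALLOC_LINE.search(line))]
    return max(ids, default=0)


def max_alloc_id_across(log_files):
    """Maximum over several PD log files, e.g. one copied from each PD node."""
    best = 0
    for path in log_files:
        with Path(path).open(errors="replace") as f:
            best = max(best, max_alloc_id(f))
    return best
```

For example, `max_alloc_id_across(...)` over the `pd.log` files copied from 10.30.1.117, 10.30.1.121 and 10.30.1.122 gives a lower bound for `-alloc-id`.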

Did you use the original alloc-id?
A migration could have been done by scaling out and scaling in; why did you use recover?

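  1. There were originally three PDs, and the three PD machines had different alloc-ids; we did not use the largest one.
  2. The migration only moved the machines to a different rack: the machines stayed the same but their IPs changed, so we used the following command: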
tiup pd-recover -endpoints xxxxx -cluster-id xxxxx -alloc-id 10000
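Why a too-small `-alloc-id` can make data unreachable can be shown with a simplified model. PD hands out region, peer and store IDs from one counter, so restarting that counter below IDs already in use produces duplicates. The model below uses a plain consecutive counter and made-up per-node values. It is an illustration only: it does not reproduce PD's batch allocation, and the `margin` in `safe_alloc_id` is an arbitrary safety buffer, not a documented constant.

```python
def allocate_ids(start, count):
    """Simplified allocator: hand out consecutive IDs after `start` (hypothetical model)."""
    return list(range(start + 1, start + 1 + count))


def colliding_ids(existing_ids, alloc_start, count):
    """IDs a restarted allocator would hand out that are already in use."""
    in_use = set(existing_ids)
    return [i for i in allocate_ids(alloc_start, count) if i in in_use]


def safe_alloc_id(per_node_max, margin=100_000):
    """Pick an alloc-id above the largest value seen on ANY PD node.
    The margin is an arbitrary buffer (assumption)."""
    return max(per_node_max.values()) + margin


# Made-up numbers: each PD node last logged a different alloc-id.
per_node_max = {"10.30.1.117": 21_000, "10.30.1.121": 22_000, "10.30.1.122": 19_000}
existing = range(1, 22_001)  # pretend IDs 1..22000 are already used by regions/peers

print(colliding_ids(existing, 10_000, 3))                       # → [10001, 10002, 10003]
print(colliding_ids(existing, safe_alloc_id(per_node_max), 3))  # → []
```

In this model, restarting at 10000 (the value used above) immediately reissues IDs that already exist, while starting above the largest per-node value avoids any overlap.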

Why are these values so high? 729, 653, 650 ms...

It's caused by the clocks being out of sync; the log says so clearly.
[2025/09/17 11:07:13.790 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=283.491317ms]
[2025/09/17 11:08:11.157 +08:00] [WARN] [tso.go:331] ["clock offset"] [jet-lag=650.817744ms]
