tiup upgrade to v5.4.0 fails with PD errors

Hi, for historical reasons the cluster has 4 PD nodes: pd_db120, pd_db126, pd_db128, and pd_db112. While using tiup to upgrade from v4.0.14 to v5.4.0, I ran into the following errors:

Errors on node pd_db126:

[2022/05/23 20:07:00.672 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db120] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]
[2022/05/23 20:07:00.679 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db128] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]
[2022/05/23 20:07:00.682 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db112] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]
[2022/05/23 20:07:01.034 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db120] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]
[2022/05/23 20:07:01.035 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db128] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]
[2022/05/23 20:07:01.036 +08:00] [ERROR] [middleware.go:102] ["redirect but server is not leader"] [from=pd_db112] [server=pd_db126] [error="[PD:apiutil:ErrRedirect]redirect failed"]

Errors on node pd_db120:

[2022/05/23 20:07:00.097 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/05/23 20:07:00.174 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/05/23 20:07:00.497 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/05/23 20:07:00.622 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/05/23 20:07:00.764 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]
[2022/05/23 20:07:01.056 +08:00] [ERROR] [tso.go:302] ["invalid timestamp"] [timestamp={}]

pd_db128 shows no obvious errors:

[2022/05/23 20:02:28.087 +08:00] [WARN] [client.go:128] ["failed to load regions."] [error="context canceled"]
[2022/05/23 20:02:28.087 +08:00] [INFO] [server.go:1208] ["pd leader has changed, try to re-campaign a pd leader"]
[2022/05/23 20:02:28.087 +08:00] [INFO] [server.go:1214] ["skip campaigning of pd leader and check later"] [server-name=pd_db128] [etcd-leader-id=1201241185604364551] [member-id=14861365263914448536]
[2022/05/23 20:02:28.288 +08:00] [INFO] [server.go:1214] ["skip campaigning of pd leader and check later"] [server-name=pd_db128] [etcd-leader-id=1201241185604364551] [member-id=14861365263914448536]
[2022/05/23 20:02:28.491 +08:00] [INFO] [server.go:1352] ["server enable region storage"]
[2022/05/23 20:02:28.491 +08:00] [INFO] [server.go:1204] ["start to watch pd leader"] [pd-leader="name:\"pd_db126\" member_id:1201241185604364551 peer_urls:\"http://10.203.41.113:2380\" client_urls:\"http://10.203.41.113:2379\" "]
[2022/05/23 20:02:28.491 +08:00] [INFO] [client.go:123] ["region syncer start load region"]
[2022/05/23 20:05:52.748 +08:00] [INFO] [client.go:126] ["region syncer finished load region"] [time-cost=3m24.256911147s]
[2022/05/23 20:05:52.749 +08:00] [INFO] [client.go:167] ["server starts to synchronize with leader"] [server=pd_db128] [leader=pd_db126] [request-index=1335102200]
[2022/05/23 20:10:47.307 +08:00] [WARN] [client.go:179] ["server sync index not match the leader"] [server=pd_db128] [own=1335102200] [leader=1335102100] [records-length=0]

pd_db112 shows no obvious errors:

[2022/05/23 20:02:26.815 +08:00] [INFO] [serve.go:139] ["serving client traffic insecurely; this is strongly discouraged!"] [address=0.0.0.0:2379]
[2022/05/23 20:02:26.817 +08:00] [INFO] [server.go:358] ["init cluster id"] [cluster-id=6729800275828476637]
[2022/05/23 20:02:26.997 +08:00] [INFO] [history_buffer.go:147] ["start from history index"] [start-index=1335102394]
[2022/05/23 20:02:27.002 +08:00] [INFO] [server.go:1352] ["server enable region storage"]
[2022/05/23 20:02:27.002 +08:00] [INFO] [server.go:1204] ["start to watch pd leader"] [pd-leader="name:\"pd_db120\" member_id:13476713458720471704 peer_urls:\"http://10.203.41.50:2380\" client_urls:\"http://10.203.41.50:2379\" "]
[2022/05/23 20:02:27.002 +08:00] [INFO] [client.go:123] ["region syncer start load region"]
[2022/05/23 20:02:27.085 +08:00] [WARN] [leadership.go:194] ["required revision has been compacted, use the compact revision"] [required-revision=190894796] [compact-revision=191536147]
[2022/05/23 20:02:27.829 +08:00] [INFO] [raft.go:859] ["9a96e678a10991fb [term: 19] received a MsgVote message with higher term from 10abaa7e6d846907 [term: 20]"]
[2022/05/23 20:02:27.829 +08:00] [INFO] [raft.go:700] ["9a96e678a10991fb became follower at term 20"]
[2022/05/23 20:02:27.829 +08:00] [INFO] [raft.go:960] ["9a96e678a10991fb [logterm: 19, index: 374905615, vote: 0] cast MsgVote for 10abaa7e6d846907 [logterm: 19, index: 374905615] at term 20"]
[2022/05/23 20:02:27.829 +08:00] [INFO] [node.go:331] ["raft.node: 9a96e678a10991fb lost leader bb06e77ceec27a98 at term 20"]
[2022/05/23 20:02:27.830 +08:00] [INFO] [node.go:325] ["raft.node: 9a96e678a10991fb elected leader 10abaa7e6d846907 at term 20"]
[2022/05/23 20:02:27.875 +08:00] [INFO] [leadership.go:211] ["current leadership is deleted"] [leader-key=/pd/6729800275828476637/leader] [purpose="pd leader election"]
[2022/05/23 20:02:30.640 +08:00] [INFO] [client.go:126] ["region syncer finished load region"] [time-cost=3.637527061s]
[2022/05/23 20:02:30.640 +08:00] [WARN] [client.go:128] ["failed to load regions."] [error="context canceled"]
[2022/05/23 20:02:30.640 +08:00] [INFO] [server.go:1208] ["pd leader has changed, try to re-campaign a pd leader"]
[2022/05/23 20:02:30.640 +08:00] [INFO] [server.go:1214] ["skip campaigning of pd leader and check later"] [server-name=pd_db112] [etcd-leader-id=1201241185604364551] [member-id=11139344134119723515]
[2022/05/23 20:02:30.842 +08:00] [INFO] [server.go:1352] ["server enable region storage"]
[2022/05/23 20:02:30.842 +08:00] [INFO] [server.go:1204] ["start to watch pd leader"] [pd-leader="name:\"pd_db126\" member_id:1201241185604364551 peer_urls:\"http://10.203.41.113:2380\" client_urls:\"http://10.203.41.113:2379\" "]
[2022/05/23 20:02:30.843 +08:00] [INFO] [client.go:123] ["region syncer start load region"]
[2022/05/23 20:06:27.063 +08:00] [INFO] [client.go:126] ["region syncer finished load region"] [time-cost=3m56.220524753s]
[2022/05/23 20:06:27.064 +08:00] [INFO] [client.go:167] ["server starts to synchronize with leader"] [server=pd_db112] [leader=pd_db126] [request-index=1335102394]
[2022/05/23 20:10:47.307 +08:00] [WARN] [client.go:179] ["server sync index not match the leader"] [server=pd_db112] [own=1335102394] [leader=1335102100] [records-length=0]
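
For reference, the two index values in that last WARN line can be extracted and compared with a small shell sketch. It only reads the `own` and `leader` fields from the line copied above; what the gap means for the upgrade is not something this snippet establishes.

```shell
# Log line copied verbatim from the pd_db112 output above.
line='[2022/05/23 20:10:47.307 +08:00] [WARN] [client.go:179] ["server sync index not match the leader"] [server=pd_db112] [own=1335102394] [leader=1335102100] [records-length=0]'

# Extract the numeric values of the [own=...] and [leader=...] fields.
own=$(printf '%s\n' "$line" | sed -n 's/.*\[own=\([0-9]*\)\].*/\1/p')
leader=$(printf '%s\n' "$line" | sed -n 's/.*\[leader=\([0-9]*\)\].*/\1/p')

# Report how far this member's sync index is from the leader's.
echo "own=$own leader=$leader diff=$((own - leader))"
```

The same two `sed` expressions work on the pd_db128 line, since it uses the same field layout.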

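The current state is that PD is on v5.4.0 while TiKV, TiDB, and TiFlash are still on v4.0.14. Why do these errors occur, and how should I proceed to finish the upgrade to v5.4.0?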

This looks like a PD leader election problem. Try scaling in one of the PD nodes and then run the upgrade again.
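
A rough sketch of how that suggestion might be carried out with `tiup cluster` is below. The cluster name `my-cluster` and the `PD_NODE` value are hypothetical placeholders, and the thread does not say which PD node to remove, so that choice is left to you. The `run` wrapper and the `DRY_RUN` switch are my own additions so the commands can be reviewed before anything is executed.

```shell
# Hypothetical placeholders: replace with values from your own deployment.
CLUSTER="my-cluster"        # cluster name, as listed by `tiup cluster list`
PD_NODE="<pd-host>:2379"    # ID of the PD node to remove, as shown by `tiup cluster display`
DRY_RUN="${DRY_RUN:-1}"     # 1 = only print the commands; 0 = actually run them

# Print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

run tiup cluster display "$CLUSTER"                     # check current topology and node status
run tiup cluster scale-in "$CLUSTER" --node "$PD_NODE"  # remove one PD node
run tiup cluster display "$CLUSTER"                     # confirm the remaining PD members are Up
run tiup cluster upgrade "$CLUSTER" v5.4.0              # retry the upgrade
```

By default this only prints the four commands; setting `DRY_RUN=0` runs them in order.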


Right now some nodes are on v5.4.0 and others are on v4.0.14. Will the scale-in get stuck because of the version mismatch?

No, it won't.