UpdateTopology fails at startup with an rpc error (the network is reachable, verified with telnet); pd-ctl returns 503 when fetching cluster info

林快乐etc · July 26, 2021, 02:53

The log is as follows:
2021-07-26T10:39:51.207+0800 DEBUG TaskFinish {"task": "StartCluster"}
2021-07-26T10:39:51.207+0800 INFO + [ Serial ] - UpdateTopology: cluster=tidb-poc
2021-07-26T10:39:51.207+0800 DEBUG TaskBegin {"task": "UpdateTopology: cluster=tidb-poc"}
2021-07-26T10:40:01.209+0800 DEBUG TaskFinish {"task": "UpdateTopology: cluster=tidb-poc", "error": "context deadline exceeded"}
2021-07-26T10:40:01.209+0800 INFO Execute command finished {"code": 1, "error": "context deadline exceeded", "errorVerbose": "context deadline exceeded
github.com/pingcap/errors.AddStack
\tgithub.com/pingcap/errors@v0.11.4/errors.go:174
github.com/pingcap/errors.Trace
\tgithub.com/pingcap/errors@v0.11.4/juju_adaptor.go:15
github.com/pingcap/tiup/pkg/cluster/manager.(*Manager).StartCluster
\tgithub.com/pingcap/tiup/pkg/cluster/manager/basic.go:114
github.com/pingcap/tiup/components/cluster/command.newStartCmd.func1
\tgithub.com/pingcap/tiup/components/cluster/command/start.go:39
github.com/spf13/cobra.(*Command).execute
\tgithub.com/spf13/cobra@v1.1.3/command.go:852
github.com/spf13/cobra.(*Command).ExecuteC
\tgithub.com/spf13/cobra@v1.1.3/command.go:960
github.com/spf13/cobra.(*Command).Execute
\tgithub.com/spf13/cobra@v1.1.3/command.go:897
github.com/pingcap/tiup/components/cluster/command.Execute
\tgithub.com/pingcap/tiup/components/cluster/command/root.go:264
main.main
\tgithub.com/pingcap/tiup/components/cluster/main.go:23
runtime.main
\truntime/proc.go:225
runtime.goexit
\truntime/asm_amd64.s:1371"}

TiFlash did not come up either.
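
The TaskBegin and TaskFinish timestamps for UpdateTopology in the log above are almost exactly ten seconds apart, which is consistent with a deadline expiring while waiting for a reply rather than with an immediate connection refusal. A minimal Python sketch of that arithmetic, using only the two timestamps copied from the log (the helper name `elapsed_seconds` is made up for illustration):

```python
from datetime import datetime

# Timestamp layout of the log lines above, e.g. 2021-07-26T10:39:51.207+0800
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def elapsed_seconds(begin: str, finish: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(begin, LOG_TIME_FORMAT)
    end = datetime.strptime(finish, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


if __name__ == "__main__":
    # TaskBegin and TaskFinish of "UpdateTopology: cluster=tidb-poc"
    print(elapsed_seconds("2021-07-26T10:39:51.207+0800",
                          "2021-07-26T10:40:01.209+0800"))
```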

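xfworld · July 26, 2021, 04:34

Did you post the wrong screenshot?

Which version are you on? What is the current state? The problem isn't described clearly either: is it that the cluster didn't start, or something else?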

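林快乐etc · July 26, 2021, 04:35

Some of the cluster components started successfully. The version is v5.1.0. This error was reported at deploy time, and this is all the log shows. I don't know why it happens either.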

林快乐etc · July 26, 2021, 04:47

The error is reported while executing UpdateTopology.

xfworld · July 26, 2021, 05:19

PD is down; it can't be reached at all.

Refer to this procedure:

You can also refer to this:

林快乐etc · July 26, 2021, 05:26

From what I can see, PD is alive.

林快乐etc · July 26, 2021, 05:27

xfworld · July 26, 2021, 05:27

The log in the screenshot you posted says it is down… rpc error…

林快乐etc · July 26, 2021, 05:28

So rpc is down?

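xfworld · July 26, 2021, 05:41

The key point is that PD embeds its own etcd implementation and has to provide service as an etcd instance. What the log describes is a scheduling failure (is your network really reachable? It's best to check it against the links I posted).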
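
A successful telnet only shows that something accepts TCP connections on the port; it does not show that the service behind it is answering requests. A minimal Python sketch that checks the two layers separately, offered as a hedged illustration and not as what TiUP itself does. The helper names are made up, and the URL in the usage note assumes PD's default client port 2379 and a `/pd/api/v1/health` HTTP endpoint; adjust both to the actual deployment.

```python
import http.client
import socket
import urllib.error
import urllib.request
from typing import Optional


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Roughly what telnet verifies: the port accepts a TCP connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status(url: str, timeout: float = 3.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, or the peer is not speaking HTTP.
        return None


def diagnose(tcp_ok: bool, status: Optional[int]) -> str:
    """Turn the two probe results into a one-line summary."""
    if not tcp_ok:
        return "port unreachable: check network, firewall, or whether the process is running"
    if status is None:
        return "port open but no HTTP response: the service is not answering requests"
    if status >= 500:
        return f"service reachable but unhealthy (HTTP {status})"
    return f"service answered (HTTP {status})"
```

For example, `diagnose(tcp_reachable("127.0.0.1", 2379), http_status("http://127.0.0.1:2379/pd/api/v1/health"))` would separate "the port is closed" from "PD accepts connections but answers with a 5xx status", which are different problems even though both can surface as a timeout.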

林快乐etc · July 26, 2021, 06:30

The network is reachable. All the checks I ran before deploy passed.

xfworld · July 26, 2021, 06:33

That doesn't match your log; the state the log describes is a scheduling failure…

林快乐etc · July 26, 2021, 06:34

It is reachable.

xfworld · July 26, 2021, 06:37

Try this tool:

https://docs.pingcap.com/zh/tidb/v4.0/pd-control

Print the relevant status along with the cluster, store, and region information, and take a look.
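
pd-ctl gets its information from PD's HTTP API, so the same status can also be pulled directly when pd-ctl itself returns an error. A minimal Python sketch, assuming PD listens on http://127.0.0.1:2379 and exposes the `/pd/api/v1/...` paths listed in the code. The paths are quoted from memory and should be checked against the PD version in use; the function names are illustrative.

```python
import http.client
import urllib.error
import urllib.request

# Assumed PD HTTP API paths; verify them against your PD version.
PD_API_PATHS = {
    "health": "/pd/api/v1/health",
    "members": "/pd/api/v1/members",
    "stores": "/pd/api/v1/stores",
    "cluster": "/pd/api/v1/cluster",
}


def pd_api_urls(pd_addr: str) -> dict:
    """Build the full URL for each status endpoint from a PD address."""
    base = pd_addr.rstrip("/")
    return {name: base + path for name, path in PD_API_PATHS.items()}


def fetch(url: str, timeout: float = 5.0):
    """Return (status, body); status is None when no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        # Keep the body of error responses such as 503: it often says why.
        return err.code, err.read().decode("utf-8", "replace")
    except (urllib.error.URLError, http.client.HTTPException, OSError) as err:
        return None, str(err)


def report(pd_addr: str) -> None:
    """Print the status code and the start of the body for each endpoint."""
    for name, url in pd_api_urls(pd_addr).items():
        status, body = fetch(url)
        print(f"{name}: HTTP {status}")
        print(body[:500])
```

Calling `report("http://127.0.0.1:2379")` on the PD host prints one status code per endpoint, so a 503 can be narrowed down to a specific endpoint and read together with its response body.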

林快乐etc · July 26, 2021, 06:48

林快乐etc · July 26, 2021, 07:08

Hello, recover didn't help either.

林快乐etc · July 26, 2021, 08:09

I changed the topology again, destroyed the cluster and rebuilt it, and it still doesn't work.

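xfworld · July 26, 2021, 08:24

Only you can verify your own environment. If you deployed by following the steps and every step succeeded, then all that's left is to check at the end whether the cluster status is normal!

The log says the command could not be scheduled, so the network must be unreachable. I suggest you investigate further yourself…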

林快乐etc · July 26, 2021, 08:26

You saw the telnet screenshot I posted; the network is reachable. I've tried it many times. Thank you anyway.

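林快乐etc · July 27, 2021, 01:41

It's not a network connectivity problem. In this screenshot I'm accessing the local machine. The 503 points to downtime and capacity issues. How do I fix this?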