Errors reported in pd.log:

[2020/10/21 09:26:10.066 +08:00] [WARN] [proxy.go:181] ["fail to recv activity from remote, stay inactive and wait to next checking round"] [remote=0.0.0.0:4000] [interval=2s] [error="dial tcp 0.0.0.0:4000: connect: connection refused"]
[2020/10/21 09:26:10.066 +08:00] [WARN] [proxy.go:181] ["fail to recv activity from remote, stay inactive and wait to next checking round"] [remote=0.0.0.0:10080] [interval=2s] [error="dial tcp 0.0.0.0:10080: connect: connection refused"]

[2020/10/21 09:26:33.668 +08:00] [WARN] [cluster.go:427] ["store does not have enough disk space"] [store-id=1] [capacity=53660876800] [available=2030014464]
[2020/10/21 09:26:43.674 +08:00] [WARN] [cluster.go:427] ["store does not have enough disk space"] [store-id=1] [capacity=53660876800] [available=2029948928]
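The capacity/available fields in these warnings are in bytes, so a quick calculation (values taken from the log lines above) shows how little space is actually left:

```shell
# PD reports capacity/available in bytes; compute the free-space ratio
capacity=53660876800     # ~50 GiB total
available=2030014464     # ~2 GiB free
awk -v c="$capacity" -v a="$available" \
    'BEGIN { printf "free: %.1f%% of %.1f GiB\n", 100*a/c, c/1024/1024/1024 }'
# prints: free: 3.8% of 50.0 GiB
```

With under 4% free, the store is well past the point where PD considers it low on space and stops scheduling new data to it (governed by the PD `low-space-ratio` setting, 0.8 by default), which is consistent with the warnings above.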

After these errors appear, the logs grow rapidly; before long PD and TiKV disconnect on their own, tidb.log keeps growing, and the database becomes inaccessible. Please advise!

1. Please share your deployment method, architecture, and the current version.
2. Please provide the complete log files that contain the errors:

1) tidb log
2) tikv log
3) pd log

3. Since the environment reports insufficient disk space, please also capture the output of pd-ctl config show all and pd-ctl store.

pd.log (66.4 KB) tikv.log (353.6 KB)

The tidb log is 90 MB and failed to upload; below are the entries that tidb.log keeps writing:
[2020/10/22 12:54:10.476 +08:00] [WARN] [base_client.go:184] ["[pd] cannot update leader"] [address=http://127.0.0.1:2379] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2020/10/22 12:54:10.476 +08:00] [ERROR] [base_client.go:130] ["[pd] failed updateLeader"] [error="failed to get leader from [http://127.0.0.1:2379]"]
[2020/10/22 12:54:11.476 +08:00] [ERROR] [client.go:225] ["[pd] create tso stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:11.476 +08:00] [WARN] [base_client.go:184] ["[pd] cannot update leader"] [address=http://127.0.0.1:2379] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2020/10/22 12:54:11.476 +08:00] [ERROR] [base_client.go:130] ["[pd] failed updateLeader"] [error="failed to get leader from [http://127.0.0.1:2379]"]
[2020/10/22 12:54:11.677 +08:00] [INFO] [client_batch.go:309] ["batchRecvLoop re-create streaming fail"] [target=127.0.0.1:20160] [error="context deadline exceeded"]
[2020/10/22 12:54:12.477 +08:00] [ERROR] [client.go:225] ["[pd] create tso stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:12.477 +08:00] [ERROR] [pd.go:130] ["updateTS error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:12.477 +08:00] [WARN] [base_client.go:184] ["[pd] cannot update leader"] [address=http://127.0.0.1:2379] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2020/10/22 12:54:12.477 +08:00] [ERROR] [base_client.go:130] ["[pd] failed updateLeader"] [error="failed to get leader from [http://127.0.0.1:2379]"]
[2020/10/22 12:54:13.429 +08:00] [INFO] [client_batch.go:309] ["batchRecvLoop re-create streaming fail"] [target=127.0.0.1:20160] [error="context deadline exceeded"]
[2020/10/22 12:54:13.477 +08:00] [ERROR] [client.go:225] ["[pd] create tso stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:13.478 +08:00] [WARN] [base_client.go:184] ["[pd] cannot update leader"] [address=http://127.0.0.1:2379] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2020/10/22 12:54:13.478 +08:00] [ERROR] [base_client.go:130] ["[pd] failed updateLeader"] [error="failed to get leader from [http://127.0.0.1:2379]"]
[2020/10/22 12:54:14.403 +08:00] [ERROR] [kv.go:270] ["fail to load safepoint from pd"] [error="context deadline exceeded"]
[2020/10/22 12:54:14.478 +08:00] [ERROR] [client.go:225] ["[pd] create tso stream error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:14.478 +08:00] [ERROR] [pd.go:130] ["updateTS error"] [error="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\""]
[2020/10/22 12:54:14.479 +08:00] [WARN] [base_client.go:184] ["[pd] cannot update leader"] [address=http://127.0.0.1:2379] [error="error:rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused\" target:127.0.0.1:2379 status:TRANSIENT_FAILURE"]
[2020/10/22 12:54:14.479 +08:00] [ERROR] [base_client.go:130] ["[pd] failed updateLeader"] [error="failed to get leader from [http://127.0.0.1:2379]"]
[2020/10/22 12:54:14.892 +08:00] [INFO] [client_batch.go:309] ["batchRecvLoop re-create streaming fail"] [target=127.0.0.1:20160] [error="context deadline exceeded"]

This is currently a single-node deployment on CentOS, version tidb-v4.0.4-linux-amd64. Please take a look!

1. To confirm the deployment architecture: are tidb, tikv, and pd all deployed on a single server, and how many nodes of each?
2. If it was deployed with tiup, please run tiup cluster edit-config {cluster_name} and upload the current configuration.

I followed this installation guide, https://www.cnblogs.com/zgqbky/p/11919310.html, and installed TiDB exactly as described there, all on a single CentOS machine.

We recommend installing with the official deployment tool tiup; for details see:
https://docs.pingcap.com/zh/tidb/stable/production-deployment-using-tiup

Create a topology.yaml file and configure it based on the following template:
https://github.com/pingcap-incubator/tiup-cluster/blob/master/examples/topology.example.yaml

If I want a single-node deployment rather than a cluster, how should I proceed?

A single-machine deployment can still use the tiup tool; just keep one instance of each component in the topology.yaml file. In addition, set the number of TiKV Region replicas for the cluster to 1; the corresponding PD parameter is replication.max-replicas.

https://docs.pingcap.com/zh/tidb/stable/pd-configuration-file#max-replicas
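Putting the advice above together, a minimal single-machine topology might look like the sketch below; the host address, user, and directories are placeholder values to adapt to your environment:

```yaml
# minimal single-node topology sketch -- host/user/dirs are example values
global:
  user: "tidb"
  deploy_dir: "/tidb-deploy"
  data_dir: "/tidb-data"

server_configs:
  pd:
    replication.max-replicas: 1   # only one TiKV node, so one Region replica

pd_servers:
  - host: 192.168.1.10            # use a real NIC address, not 127.0.0.1

tidb_servers:
  - host: 192.168.1.10

tikv_servers:
  - host: 192.168.1.10
```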

Use a physical IP address; do not use the loopback address 127.0.0.1.
This deployment style is only suitable for test environments with no strong high-availability requirement; for production, we strongly recommend deploying according to the official recommendations.

Which directory should the topology.yaml file be created in?

In principle any directory will do, as long as you have read permission on it. For easier management, we recommend a dedicated directory for the topo file and any later scale-in scripts ~
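For example (the directory name below is just an illustration):

```shell
# create a dedicated directory for topology files and keep the file readable
mkdir -p "$HOME/tidb-topo"
touch "$HOME/tidb-topo/topology.yaml"
chmod 644 "$HOME/tidb-topo/topology.yaml"   # readable by the user running tiup
ls -l "$HOME/tidb-topo"
```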

tiup-cluster-debug-2020-10-29-10-25-52.log (26.0 KB)
It seems there is no execute permission. How do I set it?

Please check whether this post helps:
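In general, a missing execute bit can be added with chmod; the file below is a stand-in for illustration, not the actual tiup binary:

```shell
# demonstrate adding the execute bit ('demo-binary' is a stand-in file)
touch demo-binary
chmod u+x demo-binary
ls -l demo-binary    # owner permissions now include 'x'
```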

The previous problem is solved, but now a new error has appeared.

Suggest checking the specific error messages in the debug log first ~

Thank you! The installation succeeded, but on startup TiKV reports errors and keeps failing to start. Please advise!

tikv.log (156.8 KB)

1. What is the TiKV deployment directory?
2. Please also confirm the disk space usage of that directory.

[two screenshots: deployment location and disk usage output]

It is currently installed in the default location; the centos-root filesystem has 567 MB left. Is that insufficient space?

Usage is at 99%. Suggest cleaning up the directory and then trying to start again ~

I freed up 1.2 GB of space, but startup still fails with the same problem.

We recommend keeping at least 30% of the space free. If you cannot clean up any more, please expand the disk capacity and try again ~
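As a quick check (the paths here are examples, adjust to the actual TiKV data directory), df shows the usage percentage and du surfaces the largest cleanup candidates:

```shell
# show filesystem usage (the Use% column) for the root filesystem
df -h /

# list the ten largest items under a directory to find cleanup candidates
du -ah /var/log 2>/dev/null | sort -rh | head -n 10
```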