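To help us respond efficiently, please provide the following information when asking a question. Clearly described problems are answered first.
- [TiDB version]: 3.0.6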
- [Problem description]: A TiKV node went offline. pd-ctl reports Failed to get store: Get http://127.0.0.1:2379/pd/api/v1/stores: dial tcp 127.0.0.1:2379: connect: connection refused. pd.log reports [client.go:301] ["[pd] failed updateLeader"] [error="failed to get leader from xxx. tikv.log reports:
[2020/04/02 19:31:33.539 +08:00] [ERROR] [util.rs:287] ["request failed, retry"] [err="Other(SendError("…"))"]
[2020/04/02 19:32:10.918 +08:00] [ERROR] [kv.rs:731] ["KvService::batch_raft send response fail"] [err=RemoteStopped]
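If your question is about performance tuning or troubleshooting, please download and run the script. Be sure to select all of the terminal output, then copy, paste, and upload it.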
Hello,
pd-ctl needs to reach PD through the machine's physical IP; using 127.0.0.1 will fail. Please run the following:
./bin/pd-ctl -u http://172.16.51.169:22379 store
{
  "count": 3,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "192.168.1.138:20160",
        "version": "3.0.5",
        "state_name": "Down"
      },
      "status": {
        "leader_weight": 1,
        "region_weight": 1,
        "start_ts": "1970-01-01T08:00:00+08:00"
      }
    },
    {
      "store": {
        "id": 2,
        "address": "192.168.1.137:20160",
        "version": "3.0.5",
        "state_name": "Up"
      },
      "status": {
        "capacity": "4.96TiB",
        "available": "4.677TiB",
        "leader_count": 7168,
        "leader_weight": 1,
        "leader_score": 591474,
        "leader_size": 591474,
        "region_count": 14304,
        "region_weight": 1,
        "region_score": 1183477,
        "region_size": 1183477,
        "start_ts": "2020-04-03T03:59:24+08:00",
        "last_heartbeat_ts": "2020-04-03T17:21:14.391892834+08:00",
        "uptime": "13h21m50.391892834s"
      }
    },
    {
      "store": {
        "id": 7,
        "address": "192.168.1.139:20160",
        "version": "3.0.5",
        "state_name": "Up"
      },
      "status": {
        "capacity": "4.96TiB",
        "available": "4.676TiB",
        "leader_count": 7136,
        "leader_weight": 1,
        "leader_score": 592003,
        "leader_size": 592003,
        "region_count": 14304,
        "region_weight": 1,
        "region_score": 1183477,
        "region_size": 1183477,
        "start_ts": "2020-04-03T03:58:50+08:00",
        "last_heartbeat_ts": "2020-04-03T17:21:08.439567229+08:00",
        "uptime": "13h22m18.439567229s"
      }
    }
  ]
}
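With many stores, the Down entry is easy to miss when reading the output by eye. The following is a minimal sketch, not an official tool, that parses the JSON printed by the `store` command and lists every store whose `state_name` is not "Up". The function name `stores_not_up` and the trimmed `SAMPLE` string are assumptions made for illustration; the field names are taken from the output above.

```python
import json

# Trimmed copy of the `store` output above (only the fields used here).
SAMPLE = """
{
  "count": 3,
  "stores": [
    {"store": {"id": 1, "address": "192.168.1.138:20160",
               "version": "3.0.5", "state_name": "Down"},
     "status": {"leader_weight": 1, "region_weight": 1}},
    {"store": {"id": 2, "address": "192.168.1.137:20160",
               "version": "3.0.5", "state_name": "Up"},
     "status": {"leader_count": 7168, "region_count": 14304}},
    {"store": {"id": 7, "address": "192.168.1.139:20160",
               "version": "3.0.5", "state_name": "Up"},
     "status": {"leader_count": 7136, "region_count": 14304}}
  ]
}
"""


def stores_not_up(store_json):
    """Return (id, address, state_name) for each store that is not "Up".

    Hypothetical helper: it only assumes the JSON layout shown in this thread.
    """
    data = json.loads(store_json)
    return [
        (s["store"]["id"], s["store"]["address"], s["store"]["state_name"])
        for s in data.get("stores", [])
        if s["store"].get("state_name") != "Up"
    ]


if __name__ == "__main__":
    for store_id, address, state in stores_not_up(SAMPLE):
        print(f"store {store_id} at {address} is {state}")
```

In practice the JSON would come from saving the pd-ctl output to a file and reading it in place of `SAMPLE`.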
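1. Restarting works, but the node has gone offline several times already and I have never found the cause.
2. There are many logs and I am not sure which one is needed. What I uploaded earlier was tikv.log.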
1. It does not go down immediately after a restart.
2. These are all of the TiKV logs:
tikv.tar.gz (561.8 KB)
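When it is unclear which of several log files matters, one way to narrow things down is to pull out only the ERROR lines with their timestamps and source locations. The sketch below assumes every line starts with the bracketed timestamp, level, and source fields seen in the tikv.log excerpts quoted in this thread. `LINE_RE`, `error_lines`, and the INFO line in `SAMPLE` are hypothetical and exist only for illustration.

```python
import re

# Matches the leading fields of lines such as:
# [2020/04/02 19:32:10.918 +08:00] [ERROR] [kv.rs:731] ["..."] [err=RemoteStopped]
LINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$'
)


def error_lines(lines):
    """Return (timestamp, source, rest_of_line) for every ERROR-level line."""
    found = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group("level") == "ERROR":
            found.append(
                (match.group("ts"), match.group("source"), match.group("rest"))
            )
    return found


SAMPLE = [
    # Taken from the problem description in this thread.
    '[2020/04/02 19:32:10.918 +08:00] [ERROR] [kv.rs:731] '
    '["KvService::batch_raft send response fail"] [err=RemoteStopped]',
    # Hypothetical non-error line, included only to show the filtering.
    '[2020/04/02 19:32:11.000 +08:00] [INFO] [example.rs:1] ["hypothetical info line"]',
]

if __name__ == "__main__":
    for ts, source, rest in error_lines(SAMPLE):
        print(ts, source, rest)
```

Running the same function over each file in the archive and comparing the timestamps with the time the store went Down should show which log is worth reading in full.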
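1. The previous restart was also done because 138 had gone offline. The most recent restart was for this outage, and no problem has appeared since then.
2. I restarted it from the TiDB node with the command ansible-playbook start.yml -l 192.168.1.138. Is there any problem with doing it this way?
3. Only 138 went offline. I saw it on the monitoring page, and ./pd-ctl -u http://192.168.1.136:2379 store also reports 138 as Down.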