tiup cluster display shows a TiKV node as Down while the process is running

TiDB version: v4.0.5
TiUP version: v1.1.1

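Command executed: tiup-cluster display XXXXX
The output shows the TiKV node as Down.


However, telnet shows that the TiKV node in question is alive.

What might be causing this? Any help would be appreciated!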

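  1. "Down" here does not mean the whole server is down; it means the tikv-server process is down.
  2. A server outage is only one possible cause of the process being down. Log in to the host and investigate why the tikv-server process went down.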

My second screenshot is the telnet to the corresponding TiKV process. The process is alive and has not crashed, so what could be causing this?

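  1. Log in to the TiKV host and run ps -ef to check whether the tikv process has been running normally the whole time.
  2. Check the TiKV log for anything abnormal.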

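I ran into the same problem.
display shows Down, and tiup cluster start n-cluster reports that a tikv process already exists. I logged in and the tikv process is indeed there. There is no firewall between the machines. Has the original poster solved this?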

On the control machine, please run curl "{pdip}:{pd_port}/pd/api/v1/stores?state=0&state=1&state=2"
and check whether the request returns successfully.
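
For anyone who prefers to script this check, here is a minimal Python sketch of the same request. It is not from the thread: the function names are hypothetical, and it assumes PD is served over plain HTTP (no TLS) and that the control machine can reach the PD client port.

```python
import json
import urllib.request


def build_stores_url(pd_ip: str, pd_port: int) -> str:
    """Build the PD stores URL quoted in the thread (state=0, 1 and 2)."""
    # Assumption: plain HTTP; a TLS-enabled cluster would need https and certs.
    return f"http://{pd_ip}:{pd_port}/pd/api/v1/stores?state=0&state=1&state=2"


def fetch_stores(pd_ip: str, pd_port: int, timeout: float = 5.0) -> dict:
    """Send the GET request and decode the JSON body.

    Raises urllib.error.URLError if PD cannot be reached, which answers
    the "does the request succeed" question directly.
    """
    url = build_stores_url(pd_ip, pd_port)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example (replace with your own PD address and port):
# payload = fetch_stores("127.0.0.1", 2379)
# print(json.dumps(payload, indent=2))
```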

Not solved yet. I am still looking into it and do not know what is going on.

Please provide the information requested in the reply above. Thanks.

{
  "store": {
    "id": 1,
    "address": "xxx:20160",
    "version": "4.0.5",
    "status_address": "xxx:20180",
    "git_hash": "xxx",
    "start_timestamp": 1600158018,
    "deploy_path": "/data/deploy/tikv-20160/bin",
    "last_heartbeat": xx,
    "state_name": "Down"
  },
  "status": {
    "capacity": "0B",
    "available": "0B",
    "used_size": "0B",
    "leader_count": 2188,
    "leader_weight": 1,
    "leader_score": 2188,
    "leader_size": 0,
    "region_count": 6071,
    "region_weight": 1,
    "region_score": 380569,
    "region_size": 380569,
    "start_ts": "2020-09-15T16:20:18+08:00",
    "last_heartbeat_ts": "2020-09-18T16:05:54.689287451+08:00",
    "uptime": "71h45m36.689287451s"
  }
},

It shows Down, but telnet to ports 20160 and 20180 on that node both succeed. I do not know what causes this.
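
As a sanity check on the output above, here is a minimal Python sketch (not part of the original thread) that recomputes the uptime from start_ts and last_heartbeat_ts. The helper names are hypothetical; the two timestamps are copied from the pasted response.

```python
import re
from datetime import datetime, timedelta


def parse_pd_timestamp(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp as printed in the PD response."""
    # datetime keeps at most 6 fractional digits, while the response
    # has 9, so drop the extra nanosecond digits before parsing.
    trimmed = re.sub(r"(\.\d{6})\d+", r"\1", ts)
    return datetime.fromisoformat(trimmed)


def hours_minutes_seconds(delta: timedelta) -> tuple:
    """Split a timedelta into whole hours, minutes and seconds."""
    total = int(delta.total_seconds())
    hours, rem = divmod(total, 3600)
    minutes, seconds = divmod(rem, 60)
    return hours, minutes, seconds


start = parse_pd_timestamp("2020-09-15T16:20:18+08:00")
last_heartbeat = parse_pd_timestamp("2020-09-18T16:05:54.689287451+08:00")
uptime = last_heartbeat - start

print(hours_minutes_seconds(uptime))  # (71, 45, 36)
```

The difference is 71h45m36s, which matches the reported uptime. This suggests the uptime figure is measured up to the last heartbeat PD recorded (16:05:54) rather than up to the time of the query. That would be consistent with PD no longer receiving heartbeats from this store even though its ports are reachable, but this is an inference from the numbers, not a confirmed diagnosis.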

  1. Please upload the corresponding TiKV log.
  2. Is the business workload currently running normally?

[2020/09/18 16:04:58.099 +08:00] [INFO] [raft.rs:1192] ["[logterm: 5, index: 5, vote: 0] cast vote for 565975 [logterm: 5, index: 5] at term 6"] ["msg type"=MsgRequestVote] [term=6] [msg_index=5] [msg_term=5] [from=565975] [vote=0] [log_index=5] [log_term=5] [raft_id=565976] [region_id=565973]
[2020/09/18 16:05:18.987 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=111.518µs] [cf=default] [range_end=7A7480000000000000FF465F728000000000FF4C49EB0000000000FA] [range_start=7A7480000000000000FF465F728000000000FF152D9F0000000000FA]
[2020/09/18 16:06:26.037 +08:00] [INFO] [gc_manager.rs:456] ["gc_worker: finished auto gc"] [processed_regions=2189]
[2020/09/18 16:06:40.262 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=81.275336586s] [cf=write] [range_end=7A7480000000000000FF465F728000000000FF4C49EB0000000000FA] [range_start=7A7480000000000000FF465F728000000000FF152D9F0000000000FA]
[2020/09/18 16:06:40.262 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=58.154µs] [cf=default] [range_end=7A7480000000000000FF465F728000000001FF57CB4B0000000000FA] [range_start=7A7480000000000000FF465F728000000001FF49A6E00000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=83.228340517s] [cf=write] [range_end=7A7480000000000000FF465F728000000001FF57CB4B0000000000FA] [range_start=7A7480000000000000FF465F728000000001FF49A6E00000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=64.666µs] [cf=default] [range_end=7A7480000000000000FF465F728000000002FF9AB5AF0000000000FA] [range_start=7A7480000000000000FF465F728000000002FF9398FC0000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=72.065µs] [cf=write] [range_end=7A7480000000000000FF465F728000000002FF9AB5AF0000000000FA] [range_start=7A7480000000000000FF465F728000000002FF9398FC0000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=47.957µs] [cf=default] [range_end=7A7480000000000000FF465F728000000005FF25B90F0000000000FA] [range_start=7A7480000000000000FF465F728000000005FF1EDF680000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=62.509µs] [cf=write] [range_end=7A7480000000000000FF465F728000000005FF25B90F0000000000FA] [range_start=7A7480000000000000FF465F728000000005FF1EDF680000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=34.029µs] [cf=default] [range_end=7A7480000000000000FF465F728000000006FF5ECC0A0000000000FA] [range_start=7A7480000000000000FF465F728000000006FF57D9E20000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=21.631µs] [cf=write] [range_end=7A7480000000000000FF465F728000000006FF5ECC0A0000000000FA] [range_start=7A7480000000000000FF465F728000000006FF57D9E20000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=17.416µs] [cf=default] [range_end=7A7480000000000000FF465F728000000006FF6C78DA0000000000FA] [range_start=7A7480000000000000FF465F728000000006FF65B1770000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=18.399µs] [cf=write] [range_end=7A7480000000000000FF465F728000000006FF6C78DA0000000000FA] [range_start=7A7480000000000000FF465F728000000006FF65B1770000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=21.861µs] [cf=default] [range_end=7A7480000000000000FF465F728000000007FFAA78510000000000FA] [range_start=7A7480000000000000FF465F728000000007FFA390080000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=20.821µs] [cf=write] [range_end=7A7480000000000000FF465F728000000007FFAA78510000000000FA] [range_start=7A7480000000000000FF465F728000000007FFA390080000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=16.946µs] [cf=default] [range_end=7A7480000000000000FF465F728000000008FFE7E5430000000000FA] [range_start=7A7480000000000000FF465F728000000007FFB1546C0000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=19.411µs] [cf=write] [range_end=7A7480000000000000FF465F728000000008FFE7E5430000000000FA] [range_start=7A7480000000000000FF465F728000000007FFB1546C0000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=17.087µs] [cf=default] [range_end=7A7480000000000000FF465F72800000000BFF71D7E60000000000FA] [range_start=7A7480000000000000FF465F72800000000BFF6AC8820000000000FA]
[2020/09/18 16:08:03.491 +08:00] [INFO] [compact.rs:120] ["compact range finished"] [time_takes=19.232µs] [cf=write] [range_end=7A7480000000000000FF465F72800000000BFF71D7E60000000000FA] [range_start=7A7480000000000000FF465F72800000000BFF6AC8820000000000FA]

This is the log of the corresponding TiKV node around 2020-09-18T16:05:54. The service looks normal, and when I tried telnet to ports 20160 and 20180, both connected.

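  1. It looks like TiKV is running GC. Could you grep tikv.log for the keyword Welcome to see whether the node has restarted?
  2. Please upload the detail-tikv monitoring data. Thanks.
    [FAQ] Exporting and importing Grafana Metrics pages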
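
The Welcome search suggested above can also be scripted. Below is a minimal Python sketch, not from the thread, that collects the timestamp of every log line containing the keyword. The function name and the timestamp regex are assumptions based on the log format pasted earlier in this thread.

```python
import re

# Leading timestamp as it appears in the pasted log lines,
# e.g. [2020/09/18 16:04:58.099 +08:00]
LOG_TS = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]"
)


def find_restarts(lines, keyword="Welcome"):
    """Return the timestamp of each line that contains the keyword.

    A line that matches the keyword but has no parsable timestamp
    is reported as None so that it is not silently dropped.
    """
    hits = []
    for line in lines:
        if keyword in line:
            match = LOG_TS.match(line)
            hits.append(match.group(1) if match else None)
    return hits


# Example (the path is a placeholder for your own deploy directory):
# with open("tikv.log", encoding="utf-8") as f:
#     print(find_restarts(f))
```

Comparing the returned timestamps with the time the node was first reported as Down helps separate an unexpected restart from one triggered manually afterwards.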

There is a restart record, but it comes from the restart I triggered with tiup after I saw the TiKV node go Down. The monitoring data is attached.


Tidb-Online-TiKV-Details_2020-09-22T07_22_44.067Z.json (472.6 KB)

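Is the TiKV node still shown as Down in tiup cluster display?
If so, please run the following on the control machine and provide the complete result:
curl "{pdip}:{pd_port}/pd/api/v1/stores?state=0&state=1&state=2". You can mask the IPs before uploading.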

After I restarted the TiKV node with tiup, the status changed to Up. It is just very strange that this happened.

The status field that tiup cluster display shows for TiKV is currently derived by parsing the data returned by curl "{pdip}:{pd_port}/pd/api/v1/stores?state=0&state=1&state=2". If the problem occurs again, please save that response first.
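
To make saving and reading that response easier next time, here is a minimal Python sketch. It is an illustration only, not TiUP's actual implementation. It assumes the response wraps the entries in a top-level "stores" list and that each entry has the "store" object shown earlier in this thread; the function names are hypothetical.

```python
import json


def summarize_stores(payload: dict) -> dict:
    """Map each store address to the state_name reported by PD."""
    summary = {}
    # Assumption: entries live under a top-level "stores" key.
    for item in payload.get("stores", []):
        store = item.get("store", {})
        summary[store.get("address", "?")] = store.get("state_name", "?")
    return summary


def save_snapshot(payload: dict, path: str) -> None:
    """Write the raw response to disk so it can be shared later."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, indent=2, ensure_ascii=False)


# Example with a trimmed-down entry shaped like the one in this thread:
example = {"stores": [{"store": {"address": "xxx:20160", "state_name": "Down"}}]}
print(summarize_stores(example))  # {'xxx:20160': 'Down'}
```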