TiKV node 192.168.241.53 was scaled in, the OS reinstalled, and then scaled back out; errors are reported

【TiDB Version】
v4.0.9
【Problem Description】
TiKV node 192.168.241.53 was first scaled in; after the OS was reinstalled it was scaled back out, and errors are being reported. Does this have any impact?


Question 2: TiUP keeps prompting that a new version is available. Can it be upgraded? Also, can the cluster be upgraded directly online? What impact would that have?

Question 1:

  1. After running the scale-in operation, did you wait for the node's state to change to Tombstone?
  2. After the node's state changed to Tombstone, did you run tiup cluster prune?
  3. Please share the output of the store command from pd-ctl (a command sketch for the full workflow follows this list).
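
For reference, a minimal sketch of the expected workflow; the cluster name and PD address are placeholders:

  tiup cluster scale-in <cluster-name> --node 192.168.241.53:20160
  tiup cluster display <cluster-name>      # repeat until the TiKV node shows Tombstone
  tiup cluster prune <cluster-name>        # clean up the Tombstone store
  pd-ctl -u <pd-host>:<pd-port> store      # confirm the old store no longer appears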

Question 2:

  1. You can upgrade TiUP itself with tiup update --self and upgrade the cluster component with tiup update cluster.
  2. The cluster can be upgraded online; refer to the upgrade SOP document 【SOP 系列 11】3.0 线上集群升级 4.0 (SOP series 11: upgrading a live 3.0 cluster to 4.0). A command sketch follows this list.
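
A hedged sketch of the corresponding commands; the cluster name and target version are placeholders, and the SOP above should be reviewed before an online upgrade:

  tiup update --self                                     # upgrade the tiup binary itself
  tiup update cluster                                    # upgrade the tiup-cluster component
  tiup cluster upgrade <cluster-name> <target-version>   # rolling online upgrade of the cluster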

After scaling in 192.168.241.53, I waited for its state to change to Tombstone and then ran tiup cluster prune. After reinstalling the OS, I scaled 192.168.241.53 back into the cluster.
Below is the store output:

[tidb@b16 ~]$  pd-ctl -u 192.168.241.24:12379
» store
{
  "count": 7,
  "stores": [
    {
      "store": {
        "id": 3353181,
        "address": "192.168.241.59:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.59:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1614096585,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673956450655088,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "945.9GiB",
        "used_size": "695.7GiB",
        "leader_count": 13039,
        "leader_weight": 1,
        "leader_score": 13039,
        "leader_size": 778977,
        "region_count": 34957,
        "region_weight": 1,
        "region_score": 2065303,
        "region_size": 2065303,
        "start_ts": "2021-02-24T00:09:45+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:56.450655088+08:00",
        "uptime": "715h56m11.450655088s"
      }
    },
    {
      "store": {
        "id": 3353983,
        "address": "192.168.241.61:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.61:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1614096601,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673956418596891,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "977.3GiB",
        "used_size": "676.2GiB",
        "leader_count": 12989,
        "leader_weight": 1,
        "leader_score": 12989,
        "leader_size": 783162,
        "region_count": 34624,
        "region_weight": 1,
        "region_score": 2069845,
        "region_size": 2069845,
        "start_ts": "2021-02-24T00:10:01+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:56.418596891+08:00",
        "uptime": "715h55m55.418596891s"
      }
    },
    {
      "store": {
        "id": 3353984,
        "address": "192.168.241.60:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.60:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1614067793,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673962741807804,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "957.4GiB",
        "used_size": "695.2GiB",
        "leader_count": 12939,
        "leader_weight": 1,
        "leader_score": 12939,
        "leader_size": 773383,
        "region_count": 34840,
        "region_weight": 1,
        "region_score": 2063991,
        "region_size": 2063991,
        "start_ts": "2021-02-23T16:09:53+08:00",
        "last_heartbeat_ts": "2021-03-25T20:06:02.741807804+08:00",
        "uptime": "723h56m9.741807804s"
      }
    },
    {
      "store": {
        "id": 22457949,
        "address": "192.168.241.53:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.53:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1616553875,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673954595699163,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.791TiB",
        "available": "1.262TiB",
        "used_size": "535.5GiB",
        "leader_count": 12462,
        "leader_weight": 1,
        "leader_score": 12462,
        "leader_size": 741286,
        "region_count": 25723,
        "region_weight": 1,
        "region_score": 1538773,
        "region_size": 1538773,
        "start_ts": "2021-03-24T10:44:35+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:54.595699163+08:00",
        "uptime": "33h21m19.595699163s"
      }
    },
    {
      "store": {
        "id": 189492,
        "address": "192.168.241.56:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.56:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1612741285,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673954883251634,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.791TiB",
        "available": "1.086TiB",
        "used_size": "662.3GiB",
        "leader_count": 217,
        "leader_weight": 1,
        "leader_score": 217,
        "leader_size": 21889,
        "region_count": 33570,
        "region_weight": 1,
        "region_score": 2122243,
        "region_size": 2122243,
        "start_ts": "2021-02-08T07:41:25+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:54.883251634+08:00",
        "uptime": "1092h24m29.883251634s"
      }
    },
    {
      "store": {
        "id": 326765,
        "address": "192.168.241.58:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.58:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1610968293,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673954290521985,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.471TiB",
        "available": "837.5GiB",
        "used_size": "641.8GiB",
        "leader_count": 12825,
        "leader_weight": 1,
        "leader_score": 12825,
        "leader_size": 757871,
        "region_count": 35186,
        "region_weight": 1,
        "region_score": 2058779,
        "region_size": 2058779,
        "start_ts": "2021-01-18T19:11:33+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:54.290521985+08:00",
        "uptime": "1584h54m21.290521985s"
      }
    },
    {
      "store": {
        "id": 503256,
        "address": "192.168.241.11:20160",
        "version": "4.0.9",
        "status_address": "192.168.241.11:20180",
        "git_hash": "18dec72b12eafdc40a463eee8f6c32594ee4a9ff",
        "start_timestamp": 1612784076,
        "deploy_path": "/disk1/tikv/bin",
        "last_heartbeat": 1616673957830294073,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.791TiB",
        "available": "1.15TiB",
        "used_size": "629.4GiB",
        "leader_count": 13318,
        "leader_weight": 1,
        "leader_score": 13318,
        "leader_size": 795938,
        "region_count": 34469,
        "region_weight": 1,
        "region_score": 2038769,
        "region_size": 2038769,
        "start_ts": "2021-02-08T19:34:36+08:00",
        "last_heartbeat_ts": "2021-03-25T20:05:57.830294073+08:00",
        "uptime": "1080h31m21.830294073s"
      }
    }
  ]
}

»
  1. Is this error still being reported continuously in the TiKV logs? (A log-check sketch follows this list.)
  2. Is there any impact on the application side?
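
If it helps, a rough way to check the latest errors on a TiKV node; the log path is an assumption based on the deploy_path /disk1/tikv/bin shown above and should be adjusted to your topology:

  grep -i error /disk1/tikv/log/tikv.log | tail -n 20    # assumed default log location under the deploy dir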

I just took a look; some TiKV logs are still continuously printing errors.


A colleague reported that the process was killed while importing data into TiDB, and others reported timeouts and dropped connections. Is that related to this?

What error is returned when the connection times out and is dropped?
It may be an effect of the max_execution_time setting.

The developers did not record the error message; after re-importing, they said it succeeded.
Where does the max_execution_time parameter need to be configured?

https://docs.pingcap.com/zh/tidb/stable/system-variables#max_execution_time
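
For a quick check of the current value from any MySQL client (the TiDB host and port are placeholders):

  mysql -h <tidb-host> -P <tidb-port> -u root -p -e "SHOW VARIABLES LIKE 'max_execution_time'"   # 0 (the default) means no statement timeout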

I went into tiup cluster edit-config xxxx-test and filtered for the max_execution_time parameter but could not find it. Doesn't that mean it is at its default, and since the default is unlimited, this isn't the problem? Are there any other possibilities?
I just filtered the logs again, and errors are still being printed.

Could you check the PD monitoring dashboard -> Region Health panel?
See whether there are any abnormal Regions (a pd-ctl alternative is sketched below).
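
Besides the Grafana panel, abnormal Regions can also be listed directly from pd-ctl; a sketch using the same PD address as the session above (available subcommands may vary slightly by version):

  pd-ctl -u 192.168.241.24:12379 region check miss-peer      # Regions missing a replica
  pd-ctl -u 192.168.241.24:12379 region check down-peer      # Regions with a down replica
  pd-ctl -u 192.168.241.24:12379 region check pending-peer   # Regions with a pending replica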
