tikv raft cannot elect a leader

[TiDB Environment] Test
[TiDB Version] 6.5.0
[Reproduction Steps]
While changing a configuration item, I noticed that only two TiKV nodes showed up.
(show config where type = 'tikv' and name like '%enable-compaction-filter%';)


Afterwards I also checked with SHOW CONFIG LIKE 'tikv'; and it likewise only returned two nodes.
I then checked the cluster status, and the cluster status was normal.

I also logged into the machines and checked the TiKV logs, and found a large number of election messages in them.
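For reference, a quick way to gauge how frequent those elections are directly on the node (a rough sketch only; the log path is taken from the deploy_dir shown later in this thread, and the exact raft-rs message wording may differ between versions):

# Assumed log path based on the deploy_dir in this thread; adjust to your topology.
LOG=/home/tidb/tidb-deploy/tikv-20160/log/tikv.log

# Count raft election-related messages.
grep -cE "became candidate|starting a new election" "$LOG"

# Show the most recent election lines (they carry the region/peer context).
grep -E "became candidate|starting a new election" "$LOG" | tail -n 20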

Could anyone take a look and tell me what the problem is?

Also, my monitoring keeps alerting that 5 services are offline. How should I troubleshoot that?

Check the cluster status through the Dashboard.

Did you do anything before this that could have led to the current state?

Run pd-ctl store and check the store status. Post a screenshot, for example whether there is anything like is_busy.
Then go to the down TiKV node and check its log to see what the last entries are.
Also, check whether the network between this TiKV and the other nodes is reachable.
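Roughly, those checks look like this (a sketch only; the PD address and node IPs are the ones that appear later in this thread, and the tikv log path assumes the default deploy layout):

# Store status as seen by PD (check state_name and any flags such as is_busy).
tiup ctl:v6.5.0 pd -u http://10.18.104.156:2379 store

# Last log entries on the suspect TiKV node.
tail -n 100 /home/tidb/tidb-deploy/tikv-20160/log/tikv.log

# Basic connectivity from the suspect node to the other TiKV peers.
ping -c 3 10.18.104.161
ping -c 3 10.18.104.154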

I previously migrated the cluster's Deploy Dir, using the method from 专栏 - 使用TiUP 修改集群目录实践 | TiDB 社区. The cluster check and cluster status were both fine afterwards.




tiup cluster check zdww-tidb --cluster also reported no problems with the cluster.

The network is reachable.
There are no ERROR messages in the log.

There is no "down" message in the log; it just keeps holding elections.
The node log is attached, please take a look.
tikv.rar (3.2 MB)

Could anyone help take a look?

Did you check whether the network is reachable?
Run pd-ctl store to get this TiKV's information and post it here.

The network is reachable, ping works fine. Here is the store information:

Starting component `ctl`: /home/tidb/.tiup/components/ctl/v6.5.0/ctl pd -u http://10.18.104.156:2379 -i
» store 
{
  "count": 4,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "10.18.104.161:20160",
        "version": "6.5.0",
        "peer_address": "10.18.104.161:20160",
        "status_address": "10.18.104.161:20180",
        "git_hash": "47b81680f75adc4b7200480cea5dbe46ae07c4b5",
        "start_timestamp": 1685072490,
        "deploy_path": "/home/tidb/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1685357940322646958,
        "state_name": "Up"
      },
      "status": {
        "capacity": "116.9GiB",
        "available": "52.15GiB",
        "used_size": "10.91GiB",
        "leader_count": 118,
        "leader_weight": 1,
        "leader_score": 118,
        "leader_size": 385,
        "region_count": 332,
        "region_weight": 1,
        "region_score": 83519.07268055658,
        "region_size": 9142,
        "witness_count": 0,
        "slow_score": 1,
        "start_ts": "2023-05-26T11:41:30+08:00",
        "last_heartbeat_ts": "2023-05-29T18:59:00.322646958+08:00",
        "uptime": "79h17m30.322646958s"
      }
    },
    {
      "store": {
        "id": 2,
        "address": "10.18.104.163:20160",
        "version": "6.5.0",
        "peer_address": "10.18.104.163:20160",
        "status_address": "10.18.104.163:20180",
        "git_hash": "47b81680f75adc4b7200480cea5dbe46ae07c4b5",
        "start_timestamp": 1685342282,
        "deploy_path": "/home/tidb/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1685357943424142105,
        "state_name": "Up"
      },
      "status": {
        "capacity": "116.9GiB",
        "available": "47.55GiB",
        "used_size": "11.25GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 332,
        "region_weight": 1,
        "region_score": 491046795.2541935,
        "region_size": 9142,
        "witness_count": 0,
        "slow_score": 1,
        "start_ts": "2023-05-29T14:38:02+08:00",
        "last_heartbeat_ts": "2023-05-29T18:59:03.424142105+08:00",
        "uptime": "4h21m1.424142105s"
      }
    },
    {
      "store": {
        "id": 179,
        "address": "10.18.104.165:3930",
        "labels": [
          {
            "key": "engine",
            "value": "tiflash"
          }
        ],
        "version": "v6.5.0",
        "peer_address": "10.18.104.165:20170",
        "status_address": "10.18.104.165:20292",
        "git_hash": "41c08dbe20901f6cfd28ce642b39ce53f35ef48a",
        "start_timestamp": 1684824077,
        "deploy_path": "/home/tidb/tidb-deploy/tiflash-9000/bin/tiflash",
        "last_heartbeat": 1685357943309588754,
        "state_name": "Up"
      },
      "status": {
        "capacity": "116.9GiB",
        "available": "69.35GiB",
        "used_size": "1B",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 0,
        "region_weight": 1,
        "region_score": 0,
        "region_size": 0,
        "witness_count": 0,
        "slow_score": 1,
        "start_ts": "2023-05-23T14:41:17+08:00",
        "last_heartbeat_ts": "2023-05-29T18:59:03.309588754+08:00",
        "uptime": "148h17m46.309588754s"
      }
    },
    {
      "store": {
        "id": 31001,
        "address": "10.18.104.154:20160",
        "version": "6.5.0",
        "peer_address": "10.18.104.154:20160",
        "status_address": "10.18.104.154:20180",
        "git_hash": "47b81680f75adc4b7200480cea5dbe46ae07c4b5",
        "start_timestamp": 1685072396,
        "deploy_path": "/home/tidb/tidb-deploy/tikv-20160/bin",
        "last_heartbeat": 1685357936559721705,
        "state_name": "Up"
      },
      "status": {
        "capacity": "145GiB",
        "available": "110.3GiB",
        "used_size": "9.073GiB",
        "leader_count": 214,
        "leader_weight": 1,
        "leader_score": 214,
        "leader_size": 8757,
        "region_count": 332,
        "region_weight": 1,
        "region_score": 33211.80239468139,
        "region_size": 9142,
        "witness_count": 0,
        "slow_score": 1,
        "start_ts": "2023-05-26T11:39:56+08:00",
        "last_heartbeat_ts": "2023-05-29T18:58:56.559721705+08:00",
        "uptime": "79h19m0.559721705s"
      }
    }
  ]
}

This TiKV is still sending heartbeats. Is there an evict-leader scheduler?
Run pd-ctl scheduler show and check.
Other than that, I can't think of anything else.

Please take a look:

Starting component `ctl`: /home/tidb/.tiup/components/ctl/v6.5.0/ctl pd -u http://10.18.104.156:2379 -i
» scheduler show
[
  "balance-hot-region-scheduler",
  "balance-leader-scheduler",
  "balance-region-scheduler",
  "split-bucket-scheduler"
]

» 

This is the startup log from when I restarted TiKV.
tikv.log (367.3 KB)

Bump.

The region_score of your three TiKV nodes differs far too much. In particular 10.18.104.163 is much higher than the other two nodes,
and its leader_score is still 0.

Check what operations were performed on this node recently.


The score is indeed too high. I haven't looked into how the score is calculated, so I can't offer any suggestions for now.

I only migrated the database directory; there were no other operations.
Is there any way to troubleshoot this?

Back up the data and reinstall…

  1. Hard to say what actually happened during your directory migration; there isn't much information to go on.

  2. If you've confirmed there is no leader-eviction action, check whether the data disk on 163 has any problems.

  3. Failing that, scale out a new node and then scale in 163 (see the sketch after this list); and if that doesn't work either, back up the data and reinstall.
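If it comes to option 3, the rough tiup flow would be something like the following (a sketch; scale-out.yaml is a placeholder topology file describing the new host, and you should wait for regions to migrate off the old store before cleaning up):

# Add a new TiKV node described in a topology file (placeholder name).
tiup cluster scale-out zdww-tidb scale-out.yaml

# Once the new store is Up and regions have rebalanced, remove the problematic node.
tiup cluster scale-in zdww-tidb --node 10.18.104.163:20160

# Watch the old store go Offline and then Tombstone.
tiup ctl:v6.5.0 pd -u http://10.18.104.156:2379 store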


Let me ask everyone one more question:

How should I troubleshoot the monitoring alerts shown here?

Try restarting the monitoring components: grafana, prometheus, and alertmanager.
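With tiup that would be roughly (a sketch, using the cluster name from earlier in this thread):

# Restart only the monitoring components.
tiup cluster restart zdww-tidb -R prometheus,grafana,alertmanager

# Confirm they come back Up.
tiup cluster display zdww-tidb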

I restarted them, but that didn't help.

Check the logs of the corresponding nodes.

monitored:
node_exporter_port: 9100
blackbox_exporter_port: 9115
deploy_dir: /home/tidb/tidb-deploy/monitored
data_dir: /home/tidb/tidb-data/monitored
log_dir: /home/tidb/tidb-deploy/monitored/log
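Given that monitored section, the "services offline" alerts usually point at node_exporter / blackbox_exporter on the flagged hosts, so a first pass on one of those hosts could look like this (a sketch; the systemd service names assume tiup's usual component-port naming, and the log file names under deploy_dir/log may differ slightly):

# Are the exporters answering on their configured ports?
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9100/metrics
curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:9115/metrics

# Service status and recent exporter logs.
systemctl status node_exporter-9100 blackbox_exporter-9115
tail -n 50 /home/tidb/tidb-deploy/monitored/log/node_exporter.log
tail -n 50 /home/tidb/tidb-deploy/monitored/log/blackbox_exporter.log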