On machines with unevenly sized disks, the TiKV node with the smaller disk was written completely full and the machine went down

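【TiDB environment】
【Overview】: scenario + problem summary
I have some 3 TB disks (SSD) and some 1 TB disks (SSD). When storing data, TiKV did not take the remaining capacity into account and kept writing until the disk was full, which killed the TiKV node.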

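【Background】: operations performed
Scaled out the cluster by adding TiKV nodes with 3 TB disks
【Symptoms】: application and database behavior
TiKV is down and cannot start
【Problem】: the current issue
TiKV is down and cannot start
【Business impact】:
Only one TiKV node is affected, so there is no impact on the business
【TiDB version】: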
Cluster version: v5.2.1
【Attachments】:

  • Relevant logs
    [2021/12/27 10:01:51.232 +08:00] [INFO] [mod.rs:118] ["encryption: none of key dictionary and file dictionary are found."]
    [2021/12/27 10:01:51.232 +08:00] [INFO] [mod.rs:479] ["encryption is disabled."]
    [2021/12/27 10:01:51.232 +08:00] [ERROR] [server.rs:1030] ["failed to init io snooper"] [err_code=KV:Unknown] [err="\"IO snooper is not started due to not compiling with BCC\""]
    [2021/12/27 10:01:51.517 +08:00] [FATAL] [server.rs:1231] ["failed to create raft engine: Storage Engine IO error: No space left on deviceWhile appending to file: /tidb_ssd_data/tikv-20160/raft/471853.sst: No space left on device"]
  • Configuration
    Starting component cluster: /home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster display tidb_prod
    Cluster type: tidb
    Cluster name: tidb_prod
    Cluster version: v5.2.1
    Deploy user: tidb
    SSH type: builtin
    Dashboard URL: http://192.168.0.66:2379/dashboard
    ID                   Role          Host            Ports        OS/Arch       Status     Data Dir                                Deploy Dir
    192.168.0.44:9093    alertmanager  192.168.0.44    9093/9094    linux/x86_64  Up         /home/tidb/tidb-data/alertmanager-9093  /home/tidb/tidb-deploy/alertmanager-9093
    116.57.100.238:3000  grafana       116.57.100.238  3000         linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/grafana-3000
    192.168.0.55:3000    grafana       192.168.0.55    3000         linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/grafana-3000
    192.168.0.44:2379    pd            192.168.0.44    2379/2380    linux/x86_64  Up         /home/tidb/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
    192.168.0.55:2379    pd            192.168.0.55    2379/2380    linux/x86_64  Up|L       /home/tidb/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
    192.168.0.66:2379    pd            192.168.0.66    2379/2380    linux/x86_64  Up|UI      /home/tidb/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
    192.168.0.77:2379    pd            192.168.0.77    2379/2380    linux/x86_64  Up         /home/tidb/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
    192.168.0.88:2379    pd            192.168.0.88    2379/2380    linux/x86_64  Up         /home/tidb/tidb-data/pd-2379            /home/tidb/tidb-deploy/pd-2379
    192.168.0.44:9090    prometheus    192.168.0.44    9090         linux/x86_64  Up         /home/tidb/tidb-data/prometheus-9090    /home/tidb/tidb-deploy/prometheus-9090
    192.168.0.44:4000    tidb          192.168.0.44    4000/10080   linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/tidb-4000
    192.168.0.55:4000    tidb          192.168.0.55    4000/10080   linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/tidb-4000
    192.168.0.66:4000    tidb          192.168.0.66    4000/10080   linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/tidb-4000
    192.168.0.77:4000    tidb          192.168.0.77    4000/10080   linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/tidb-4000
    192.168.0.88:4000    tidb          192.168.0.88    4000/10080   linux/x86_64  Up         -                                       /home/tidb/tidb-deploy/tidb-4000
    192.168.0.11:20160   tikv          192.168.0.11    20160/20180  linux/x86_64  Tombstone  /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.33:20160   tikv          192.168.0.33    20160/20180  linux/x86_64  Down       /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.44:20160   tikv          192.168.0.44    20160/20180  linux/x86_64  Up         /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.55:20160   tikv          192.168.0.55    20160/20180  linux/x86_64  Up         /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.66:20160   tikv          192.168.0.66    20160/20180  linux/x86_64  Up         /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.77:20160   tikv          192.168.0.77    20160/20180  linux/x86_64  Up         /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    192.168.0.88:20160   tikv          192.168.0.88    20160/20180  linux/x86_64  Up         /tidb_ssd_data/tikv-20160               /home/tidb/tidb-deploy/tikv-20160
    Total nodes: 21
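
The FATAL line in the log reports "No space left on device" for a file under /tidb_ssd_data/tikv-20160, which is the Data Dir of every TiKV node in the listing. As a rough sketch (not part of TiDB or tiup), the check below could be run on each TiKV host to see how full that filesystem is before a node reaches this state. The function names and the 0.8 threshold are assumptions made for illustration.

```python
import shutil


def usage_ratio(path: str) -> float:
    """Return the used fraction (0.0 to 1.0) of the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero capacity; treat them as empty.
        return 0.0
    return usage.used / usage.total


def is_low_space(ratio: float, threshold: float = 0.8) -> bool:
    """True once the used fraction has reached the (assumed) alert threshold."""
    return ratio >= threshold
```

For example, `is_low_space(usage_ratio("/tidb_ssd_data/tikv-20160"))` on a TiKV host tells you whether that data disk has crossed the threshold. The same fraction fills up three times faster in absolute terms on a 1 TB disk than on a 3 TB one, so the small disks are the ones worth watching.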


You can delete the placeholder file in the data directory to free up some space.
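
A minimal sketch of that emergency step, assuming the placeholder sits directly in the TiKV data directory and is named `space_placeholder_file`. The file name is my assumption and is not confirmed in this thread, so check the directory listing on your node first. The helper names are hypothetical, and the default is a dry run so nothing is deleted by accident.

```python
from pathlib import Path

# Assumed placeholder file name; verify it on your own node before deleting anything.
PLACEHOLDER_NAME = "space_placeholder_file"


def find_placeholder(data_dir: str, name: str = PLACEHOLDER_NAME):
    """Return (path, apparent_size_in_bytes) for the placeholder, or None if absent."""
    candidate = Path(data_dir) / name
    if candidate.is_file():
        return candidate, candidate.stat().st_size
    return None


def release_placeholder(data_dir: str, name: str = PLACEHOLDER_NAME,
                        dry_run: bool = True) -> int:
    """Return the bytes that are (or, in dry-run mode, would be) freed.

    The file is only deleted when dry_run is False. Returns 0 if no
    placeholder file is found.
    """
    found = find_placeholder(data_dir, name)
    if found is None:
        return 0
    path, size = found
    if not dry_run:
        path.unlink()
    return size
```

Calling `release_placeholder("/tidb_ssd_data/tikv-20160")` reports how much space the file holds. Passing `dry_run=False` removes it, after which the TiKV node can be started again and data moved off the full disk.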


That works and can recover some space, but it is not a real solution. This problem should be prevented at the code level.
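
To make concrete what "prevented at the code level" could mean, here is a rough sketch of capacity-aware placement: skip any store that would cross a fullness threshold after the write, then prefer the store with the most free space. This is an illustration only and not PD's actual scheduling algorithm. The `Store` type, `pick_target`, the 0.8 threshold, and the numbers in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Store:
    """Hypothetical view of one TiKV store's disk."""
    address: str
    capacity_gb: float
    used_gb: float

    @property
    def available_gb(self) -> float:
        return self.capacity_gb - self.used_gb


def pick_target(stores: List[Store], incoming_gb: float,
                low_space_ratio: float = 0.8) -> Optional[Store]:
    """Pick the store with the most free space among those that would stay
    below low_space_ratio after receiving incoming_gb. Return None if no
    store qualifies."""
    candidates = [
        s for s in stores
        if (s.used_gb + incoming_gb) / s.capacity_gb < low_space_ratio
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s.available_gb)
```

With a 1 TB store at 790 GB used and a 3 TB store at 1500 GB used, a 20 GB write would go to the 3 TB store, because the 1 TB store would end up at 81% and is filtered out. If every store is over the threshold the function returns `None`, which is the point where writes should be rejected rather than allowed to fill the disk.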


This is only an emergency measure.

