TiFlash shuts down automatically

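[TiDB environment] Production
[TiDB version]
[Reproduction path] What operations led to the problem
[Problem encountered: symptoms and impact]
The cluster finished syncing data a few days ago, and today we switched traffic over to it. We just found that one TiFlash node went down abnormally. It cannot be brought back up: every time it is started, it stops again by itself.
The log is as follows: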
[2023/11/22 20:11:51.331 +08:00] [DEBUG] [Segment.cpp:1923] [“Finish segment getReadInfo”] [source=“table_id=1267 segment_id=26930 epoch=57”] [thread_id=24]

[2023/11/22 20:11:51.331 +08:00] [DEBUG] [Segment.cpp:1930] [“Segment updated delta index”] [source=“table_id=1267 segment_id=26930 epoch=57”] [thread_id=24]

[2023/11/22 20:11:51.336 +08:00] [DEBUG] [Segment.cpp:2098] ["Finish segment ensurePlace, read_ranges={[-9223372036854775808,9223372036854775807)} placed_items=1 shared_delta_index=<placed_rows=0 placed_deletes=0 tree_entries=0 tree_inserts=0 tree_deletes=0> my_delta_index=<placed_rows=63805 placed_deletes=0 tree_entries=63713 tree_inserts=63713 tree_deletes=0>"] [source=“table_id=1267 segment_id=12422 epoch=68”] [thread_id=30]

[2023/11/22 20:11:51.337 +08:00] [DEBUG] [Segment.cpp:1923] [“Finish segment getReadInfo”] [source=“table_id=1267 segment_id=12422 epoch=68”] [thread_id=30]

[2023/11/22 20:11:51.337 +08:00] [DEBUG] [Segment.cpp:1930] [“Segment updated delta index”] [source=“table_id=1267 segment_id=12422 epoch=68”] [thread_id=30]

[2023/11/22 20:11:51.339 +08:00] [DEBUG] [Segment.cpp:2098] ["Finish segment ensurePlace, read_ranges={[-9223372036854775808,9223372036854775807)} placed_items=1 shared_delta_index=<placed_rows=0 placed_deletes=0 tree_entries=0 tree_inserts=0 tree_deletes=0> my_delta_index=<placed_rows=55684 placed_deletes=0 tree_entries=55574 tree_inserts=55574 tree_deletes=0>"] [source=“table_id=1267 segment_id=26951 epoch=12”] [thread_id=38]

[2023/11/22 20:11:51.339 +08:00] [DEBUG] [Segment.cpp:1923] [“Finish segment getReadInfo”] [source=“table_id=1267 segment_id=26951 epoch=12”] [thread_id=38]

[2023/11/22 20:11:51.339 +08:00] [DEBUG] [Segment.cpp:1930] [“Segment updated delta index”] [source=“table_id=1267 segment_id=26951 epoch=12”] [thread_id=38]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=3]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=6]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=9]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=11]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=14]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=16]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=8]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=4]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=10]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=2]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=12]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=13]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=7]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=15]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=5]

[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] [“Pop fail, stop=true”] [source=SegmentReader] [thread_id=17]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.299 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:45] [“Stop begin”] [source=SegmentReader] [thread_id=37]

[2023/11/22 20:11:52.300 +08:00] [DEBUG] [SegmentReader.cpp:47] [“Stop end”] [source=SegmentReader] [thread_id=37]

tiflash_error.log contains a large number of entries like this:
[2023/11/22 20:21:50.686 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:22:08.426 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:22:26.172 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:22:43.925 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:23:01.676 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:23:19.423 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:23:37.171 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:23:54.673 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:24:12.450 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:24:29.919 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:24:47.442 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]

[2023/11/22 20:25:04.924 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
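A side note on the timing of these warnings: they all come from thread_id=1, so each one probably marks a fresh start attempt (that is an assumption, not something the log states). Under that assumption, the gap between consecutive lines is the length of one crash-and-restart cycle. A minimal Python sketch that computes the gaps from the first four timestamps quoted above:

```python
from datetime import datetime

# Timestamps copied from the first four WARN lines quoted above.
stamps = [
    "2023/11/22 20:21:50.686",
    "2023/11/22 20:22:08.426",
    "2023/11/22 20:22:26.172",
    "2023/11/22 20:22:43.925",
]

def restart_intervals(timestamps):
    """Return the gaps in seconds between consecutive log timestamps."""
    parsed = [datetime.strptime(t, "%Y/%m/%d %H:%M:%S.%f") for t in timestamps]
    return [round((b - a).total_seconds(), 3) for a, b in zip(parsed, parsed[1:])]

# Each gap comes out a little under 18 seconds.
print(restart_intervals(stamps))
```

A steady gap of roughly 17.7 seconds looks more like a supervisor restarting the process on a fixed schedule than like random failures.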

How many TiFlash nodes are there?

There are no ERROR-level entries in these logs. What does the system log show?
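To double-check a whole log file rather than an excerpt, something like the following Python sketch could be used. The line shape and the severity order are assumptions based on the excerpts in this thread, and `lines_at_or_above` is a hypothetical helper name.

```python
import re

# Assumed line shape, based on the excerpts in this thread:
# [timestamp] [LEVEL] [file:line] ["message"] ...
LEVEL_RE = re.compile(r"^\[[^\]]*\] \[([A-Z]+)\]")

# Assumed severity order, lowest to highest.
SEVERITY = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

def lines_at_or_above(lines, min_level="ERROR"):
    """Yield log lines whose level is at least `min_level`."""
    threshold = SEVERITY.index(min_level)
    for line in lines:
        match = LEVEL_RE.match(line)
        if match and match.group(1) in SEVERITY:
            if SEVERITY.index(match.group(1)) >= threshold:
                yield line

sample = [
    '[2023/11/22 20:11:52.299 +08:00] [INFO] [SegmentReader.cpp:94] ["Pop fail, stop=true"] [source=SegmentReader] [thread_id=3]',
    '[2023/11/22 20:21:50.686 +08:00] [WARN] [StorageConfigParser.cpp:241] ["The configuration "path" is deprecated. Check [storage] section for new style."] [thread_id=1]',
]

print(list(lines_at_or_above(sample)))               # [] : nothing at ERROR or above
print(len(list(lines_at_or_above(sample, "WARN"))))  # 1
```

In practice the `sample` list would be replaced by an open file handle, for example iterating over the lines of tiflash_error.log.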

The configuration “path” is deprecated. Check [storage] section for new style.
Read this message carefully. It says:
the configuration item "path" has been deprecated; check the [storage] section and use the new style.

What does the new format look like? The documentation doesn't say.
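For what it's worth, the new style in the TiFlash configuration file puts the data directories in a list under `[storage.main]` (key `dir`) instead of the single top-level `path` string. A rough Python sketch of the mapping, assuming the legacy value is a comma-separated list of directories; `storage_section_from_legacy_path` is a hypothetical helper, and the directory is the data_dir from this thread:

```python
def storage_section_from_legacy_path(path_value):
    """Render a new-style [storage] fragment from a legacy `path` value.

    Assumes `path_value` is a comma-separated list of directories.
    """
    dirs = [p.strip() for p in path_value.split(",") if p.strip()]
    quoted = ", ".join(f'"{d}"' for d in dirs)
    return f"[storage]\n  [storage.main]\n  dir = [{quoted}]\n"

print(storage_section_from_legacy_path("/data/tidb-data/tiflash-9000"))
# [storage]
#   [storage.main]
#   dir = ["/data/tidb-data/tiflash-9000"]
```

On a TiUP-managed cluster the tiflash.toml file is generated from the topology, so any change would normally go through the cluster configuration rather than a hand edit. The warning by itself is probably not what stops the process.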

Nov 22 20:57:12 tiflash-5 systemd: tiflash-9000.service holdoff time over, scheduling restart.

Nov 22 20:57:12 tiflash-5 systemd: Stopped tiflash service.

Nov 22 20:57:12 tiflash-5 systemd: Started tiflash service.

Nov 22 20:57:12 tiflash-5 bash: sync …

Nov 22 20:57:12 tiflash-5 bash: real#0110m0.061s

Nov 22 20:57:12 tiflash-5 bash: user#0110m0.000s

Nov 22 20:57:12 tiflash-5 bash: sys#0110m0.038s

Nov 22 20:57:12 tiflash-5 bash: ok

Nov 22 20:57:15 tiflash-5 systemd: tiflash-9000.service: main process exited, code=exited, status=1/FAILURE

Nov 22 20:57:15 tiflash-5 systemd: Unit tiflash-9000.service entered failed state.

Nov 22 20:57:15 tiflash-5 systemd: tiflash-9000.service failed.

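There are 5 TiFlash nodes.
Three of them were deployed at installation time. The 4th and 5th were added later with scale-out, and it is the 5th one that went down today for no apparent reason.
The OS is CentOS 7.4.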
3.10.0-1160.83.1.el7.x86_64 #1 SMP Wed Jan 25 16:41:43 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

Which TiFlash version is this? Is this the only node with the problem? Is there anything special about table 1267?

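I searched for similar cases, and all of them were caused by upgrade-type operations. Please post the TiFlash configuration file so we can take a look. It really could be a configuration problem.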

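Version v6.5. This table is merged by the upstream otter from 64 shard tables. It was synced the same way on the previous version 5.
If anything is different, I noticed that the pk_type of this table differs: several tables are NONCLUSTERED.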

I don't think this log would cause TiFlash to go down. Are there any other logs? Did TiFlash hit an OOM?


  - host: (omitted)
    ssh_port: 22
    tcp_port: 9000
    http_port: 8123
    flash_service_port: 3930
    flash_proxy_port: 20170
    flash_proxy_status_port: 20292
    metrics_port: 8234
    deploy_dir: /opt/tidb-deploy/tiflash-9000
    data_dir: /data/tidb-data/tiflash-9000
    log_dir: /opt/tidb-deploy/tiflash-9000/log
    arch: amd64
    os: linux

There are many core dump files.
Running gdb ./core.31493 shows the following:

Failed to read a valid object file image from memory.

Core was generated by `bin/tiflash/tiflash server --config-file conf/tiflash.toml'.

Program terminated with signal 6, Aborted.

#0 0x00007f1c166b4387 in ?? ()
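For reference, signal 6 in the gdb output is SIGABRT on Linux. It is normally raised when the process calls abort() itself, for example on a failed assertion or an uncaught C++ exception, whereas the kernel OOM killer sends SIGKILL (signal 9). A minimal Python sketch, assuming a Linux host, for turning a signal number into its name:

```python
import signal

def signal_name(number):
    """Map a numeric signal to its symbolic name on the current platform."""
    return signal.Signals(number).name

print(signal_name(6))  # SIGABRT on Linux
print(signal_name(9))  # SIGKILL on Linux
```

The `??` frame usually means gdb had no symbols to work with. Passing the executable together with the core file (gdb's standard `gdb <program> <core>` form) may give a readable backtrace, assuming the binary matches the one that crashed.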

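It feels like the memory in this machine has gone bad. Give it a rub with an eraser, then take the node offline and add it back in, and it should be fine.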

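The machine has been rebooted many times. An eraser?
Do you mean with the scale-in command? I ran it, and the status has stayed at Pending Offline ever since. The remote TiFlash never even started, so it probably cannot be taken offline this way, right?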

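Is the node a virtual machine? If so, spin up another one with the same spec, save the logs, and remove this faulty TiFlash node first. If it is a physical machine, take it offline as well since it won't start anyway, clean it up, and then add it back to the cluster. Also assess whether the four remaining nodes can carry production tomorrow.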

What is the minor version? Is it 6.5.5 too? If TiDB is 6.5.5, is TiFlash also 6.5.5?

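The remaining 4 nodes should be able to handle the load.
The machines are Tencent S6 instances, Intel Ice Lake (2.7GHz/3.3GHz), with 16C/32GB of memory.
It's just that the other nodes also have a few core files, and I'm very worried that another machine might suddenly go down during the day, which would be a disaster.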

The OS version is too old.