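【TiDB Environment】Production
【TiDB Version】6.5.1
【Steps to Reproduce】Enabled TiFlash for some tables
【Problem: Symptoms and Impact】Restarting TiFlash on one of the nodes exhausts CPU and memory, the system hangs, and TiFlash fails to start
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】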
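There are two TiFlash nodes, each allocated 47G. One of them recovered after a restart, and its available memory returned to normal.
On the other node, every restart exhausts CPU and memory, after which the system hangs and TiFlash fails to start.
The tiflash_error log on the faulty TiFlash node prints the following entries, and then the system hangs: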
[2024/06/13 18:29:31.326 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2024/06/13 18:29:47.022 +08:00] [WARN] [DMFile.cpp:732] [“Existing temporary or dropped dmfile, removed: /data1/tidb-data/tiflash-9000/data/t_21836383/stable/.tmp.dmf_10142635”] [source=DMFile] [thread_id=51]
[2024/06/13 18:30:52.892 +08:00] [WARN] [SchemaGetter.cpp:208] [“The schema diff for version 11163789, key Diff:11163789 is empty.”] [source=SchemaGetter] [thread_id=54]
[2024/06/13 18:39:21.252 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2024/06/13 18:39:37.601 +08:00] [WARN] [DMFile.cpp:732] [“Existing temporary or dropped dmfile, removed: /data1/tidb-data/tiflash-9000/data/t_21836383/stable/.tmp.dmf_10142639”] [source=DMFile] [thread_id=51]
[2024/06/13 18:50:22.504 +08:00] [WARN] [StorageConfigParser.cpp:241] [“The configuration "path" is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2024/06/13 18:50:39.733 +08:00] [WARN] [DMFile.cpp:732] [“Existing temporary or dropped dmfile, removed: /data1/tidb-data/tiflash-9000/data/t_21836383/stable/.tmp.dmf_10142643”] [source=DMFile] [thread_id=51]
[2024/06/13 19:05:44.285 +08:00] [ERROR] [] [“get member failed: 4: Deadline Exceeded”] [source=pingcap.pd] [thread_id=99]
[2024/06/13 19:05:44.302 +08:00] [WARN] [PageDirectory.cpp:1519] [“Meet a stale snapshot [thread id=64] [tracing id=write] [seq=91675760] [alive time(s)=819.093752026]”] [source=global.meta] [thread_id=69]
[2024/06/13 19:05:44.302 +08:00] [WARN] [PageDirectory.cpp:1519] [“Meet a stale snapshot [thread id=97] [tracing id=write] [seq=91675776] [alive time(s)=818.760227131]”] [source=global.meta] [thread_id=69]
[2024/06/13 19:05:44.312 +08:00] [WARN] [] [“failed to get cluster id by :http://xxxx:2379”] [source=pingcap.pd] [thread_id=99]
[2024/06/13 19:05:44.363 +08:00] [ERROR] [] [“Send TsoRequest failed”] [source=pingcap.pd] [thread_id=102]
[2024/06/13 19:05:44.364 +08:00] [WARN] [PageDirectory.cpp:1519] [“Meet a stale snapshot [thread id=64] [tracing id=write] [seq=21915395] [alive time(s)=819.180260066]”] [source=global.data] [thread_id=69]
[2024/06/13 19:05:44.365 +08:00] [WARN] [PageDirectory.cpp:1519] [“Meet a stale snapshot [thread id=97] [tracing id=write] [seq=21915397] [alive time(s)=818.823370803]”] [source=global.data] [thread_id=69]
[2024/06/13 19:05:44.403 +08:00] [WARN] [] [“update ts error: Exception: Send TsoRequest failed”] [source=pd/oracle] [thread_id=102]
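For anyone triaging a similar excerpt, here is a minimal Python sketch (not a TiFlash tool) that tallies log lines by level and origin and lists the timestamps that look like start attempts. The function name `summarize` is hypothetical. The idea that each `StorageConfigParser.cpp` warning marks one start attempt is an assumption, based only on it logging from `thread_id=1` before the other entries in each run above.

```python
import re
from collections import Counter

# Matches the leading "[timestamp] [LEVEL] [file:line]" fields of a log line.
# The third field may be empty, as in the pingcap.pd entries above.
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<origin>[^\]]*)\]"
)


def summarize(lines):
    """Count lines per (level, origin) and collect likely start-attempt timestamps."""
    counts = Counter()
    restarts = []
    for line in lines:
        m = HEADER.match(line.strip())
        if m is None:
            continue  # skip anything that is not a structured log line
        counts[(m["level"], m["origin"])] += 1
        # Assumption: the config parser runs once per process start,
        # so each of its warnings is treated as one start attempt.
        if m["origin"].startswith("StorageConfigParser.cpp"):
            restarts.append(m["ts"])
    return counts, restarts


if __name__ == "__main__":
    import sys

    counts, restarts = summarize(sys.stdin)
    for (level, origin), n in counts.most_common():
        print(f"{n:5d}  {level:5s}  {origin or '(no origin)'}")
    print("likely start attempts:", restarts)
```

Saved under any file name and fed the log on standard input, it would report the `StorageConfigParser.cpp` lines at 18:29:31, 18:39:21 and 18:50:22 in the excerpt above as three start attempts, which fits the node being restarted repeatedly before the PD errors at 19:05:44.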
Has anyone run into this problem, and how did you resolve it?