TiFlash exits abnormally after deploying a cluster with tiup playground

Problem description

After deploying a cluster with tiup playground, TiFlash exits abnormally.

Starting the cluster with tiup playground:

[tidb@ansible ~]$ tiup playground v4.0.8 --db 2 --kv 2 --host 10.0.17.58
Found playground newer version:

The latest version:         v1.3.0
Local installed version:    v1.2.3
Update current component:   tiup update playground
Update all components:      tiup update --all

Starting component playground: /home/tidb/.tiup/components/playground/v1.2.3/tiup-playground v4.0.8 --db 2 --kv 2 --host 10.0.17.58
Playground Bootstrapping…
Start pd instance
Start tikv instance
Start tikv instance
Start tidb instance
Start tidb instance
Waiting for tidb 10.0.17.58:4000 ready … ?
Waiting for tidb 10.0.17.58:4000 ready … Done
Waiting for tidb 10.0.17.58:4001 ready … Done
Waiting for tikv 10.0.17.58:20160 ready … Done
Waiting for tikv 10.0.17.58:20161 ready … Done
Start tiflash instance
Waiting for tiflash 10.0.17.58:3930 ready … Done
CLUSTER START SUCCESSFULLY, Enjoy it ^-^
To connect TiDB: mysql --host 10.0.17.58 --port 4000 -u root
To connect TiDB: mysql --host 10.0.17.58 --port 4001 -u root
To view the dashboard: http://10.0.17.58:2379/dashboard
To view the Prometheus: http://10.0.17.58:9090
To view the Grafana: http://10.0.17.58:3000
tiflash quit: signal: segmentation fault (core dumped) <------------------ TiFlash exits abnormally

Logging debug to /home/tidb/.tiup/data/SK02eyp/tiflash-0/log/tiflash.log
Logging errors to /home/tidb/.tiup/data/SK02eyp/tiflash-0/log/tiflash_error.log

check detail log from: /home/tidb/.tiup/data/SK02eyp/tiflash-0/tiflash.log

The TiFlash error log is as follows:

2020.12.23 21:26:12.191519 [ 5 ] BaseDaemon: ########################################
2020.12.23 21:26:12.191693 [ 5 ] BaseDaemon: (from thread 4) Received signal Segmentation fault (11).
2020.12.23 21:26:12.191739 [ 5 ] BaseDaemon: Address: NULL pointer.
2020.12.23 21:26:12.191774 [ 5 ] BaseDaemon: Access: read.
2020.12.23 21:26:12.191809 [ 5 ] BaseDaemon: Unknown si_code.
2020.12.23 21:26:12.226731 [ 5 ] BaseDaemon: 0. /lib64/libc.so.6() [0x3267c36ba2]
2020.12.23 21:26:12.226784 [ 5 ] BaseDaemon: 1. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash() [0x6f9ae82]
2020.12.23 21:26:12.226876 [ 5 ] BaseDaemon: 2. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(DB::DiagnosticsService::LinuxCpuTime::current()+0x391) [0x6fac7a1]
2020.12.23 21:26:12.226952 [ 5 ] BaseDaemon: 3. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(DB::DiagnosticsService::server_info(grpc_impl::ServerContext*, diagnosticspb::ServerInfoRequest const*, diagnosticspb::ServerInfoResponse*)+0x5bc) [0x6fa6cdc]
2020.12.23 21:26:12.227095 [ 5 ] BaseDaemon: 4. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(std::_Function_handler<grpc::Status (diagnosticspb::Diagnostics::Service*, grpc_impl::ServerContext*, diagnosticspb::ServerInfoRequest const*, diagnosticspb::ServerInfoResponse*), std::_Mem_fn<grpc::Status (diagnosticspb::Diagnostics::Service::*)(grpc_impl::ServerContext*, diagnosticspb::ServerInfoRequest const*, diagnosticspb::ServerInfoResponse*)> >::_M_invoke(std::_Any_data const&, diagnosticspb::Diagnostics::Service*&&, grpc_impl::ServerContext*&&, diagnosticspb::ServerInfoRequest const*&&, diagnosticspb::ServerInfoResponse*&&)+0x39) [0x7560de9]
2020.12.23 21:26:12.227170 [ 5 ] BaseDaemon: 5. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc::Status grpc::internal::CatchingFunctionHandler<grpc::internal::RpcMethodHandler<diagnosticspb::Diagnostics::Service, diagnosticspb::ServerInfoRequest, diagnosticspb::ServerInfoResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}>(grpc::internal::RpcMethodHandler<diagnosticspb::Diagnostics::Service, diagnosticspb::ServerInfoRequest, diagnosticspb::ServerInfoResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)::{lambda()#1}&&)+0x54) [0x756ca34]
2020.12.23 21:26:12.227229 [ 5 ] BaseDaemon: 6. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc::internal::RpcMethodHandler<diagnosticspb::Diagnostics::Service, diagnosticspb::ServerInfoRequest, diagnosticspb::ServerInfoResponse>::RunHandler(grpc::internal::MethodHandler::HandlerParameter const&)+0x4e6) [0x756f7c6]
2020.12.23 21:26:12.227276 [ 5 ] BaseDaemon: 7. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc_impl::Server::SyncRequest::CallData::ContinueRunAfterInterception()+0x161) [0x782d891]
2020.12.23 21:26:12.227330 [ 5 ] BaseDaemon: 8. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc_impl::Server::SyncRequestThreadManager::DoWork(void*, bool, bool)+0x430) [0x782ee50]
2020.12.23 21:26:12.227392 [ 5 ] BaseDaemon: 9. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc::ThreadManager::MainWorkLoop()+0x9b) [0x7834a8b]
2020.12.23 21:26:12.227442 [ 5 ] BaseDaemon: 10. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash(grpc::ThreadManager::WorkerThread::Run()+0xc) [0x7834b6c]
2020.12.23 21:26:12.227479 [ 5 ] BaseDaemon: 11. /home/tidb/.tiup/components/tiflash/v4.0.8/tiflash/tiflash() [0x7babc33]
2020.12.23 21:26:12.227515 [ 5 ] BaseDaemon: 12. /lib64/libpthread.so.0() [0x3268406d14]

Please provide the server OS version and hardware configuration.

Logging debug to /home/tidb/.tiup/data/SK02eyp/tiflash-0/log/tiflash.log
Logging errors to /home/tidb/.tiup/data/SK02eyp/tiflash-0/log/tiflash_error.log
Please also upload these two log files.

RHEL 6.7, 64-bit
8 CPU cores, 12 GB memory

The log shown above is the entire content of tiflash_error.log. The cluster has since been restarted, so tiflash.log is no longer available.

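  1. Does this error reproduce consistently?
  2. Please share the output of cat /proc/stat.
  3. Please upgrade to Red Hat 7.3 or later and try again (and also show the result of step 2; we look forward to your feedback). See https://docs.pingcap.com/zh/tidb/stable/hardware-and-software-requirements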

1. TiFlash fails with this error every time it is started; I have tried multiple times.
2. The output is as follows:

$ cat /proc/stat
cpu 4013169 2230 1311676 390406288 2011224 21 41037 0 0
cpu0 325202 77 139511 49111871 103172 5 5344 0 0
cpu1 692394 422 175098 48509488 339401 2 5196 0 0
cpu2 359391 295 172647 48812739 392621 1 5778 0 0
cpu3 692070 375 182189 48769184 88521 2 4954 0 0
cpu4 322296 228 146490 48812942 437348 1 5194 0 0
cpu5 639233 212 167886 48788269 118298 4 4744 0 0
cpu6 326342 253 155133 48822916 423761 1 5175 0 0
cpu7 656237 364 172718 48778876 108100 0 4650 0 0
intr 1130596650 1918 9 0 0 0 0 0 0 1 0 0 0 110 0 0 90 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1478431 2410963 222213 531679 150584 412722 155239 632040 515797 0 71166 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
ctxt 2069805227
btime 1608301889
processes 378986
procs_running 1
procs_blocked 0
softirq 280059122 0 109336149 99823 25277156 1456265 0 71169 30925868 1569240 111323452

3. I will try RHEL 7 later, thanks.

For the TiFlash logs, see the attachments:
tiflash.log (4.7 KB) tiflash_error.log (194 bytes) tiflash_tikv.log (19.5 KB)

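There may be two causes. First, deploying a TiDB cluster on Red Hat 6 is not supported.

Second, it depends on how many columns the cpu line in /proc/stat has. As long as it has 11 or more columns there is no problem; older Red Hat versions have fewer columns, and that is when this error occurs. We will fix this issue in 4.0.10. However, we still do not recommend deploying a TiDB cluster on Red Hat 6.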
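
The column check described above can be sketched in a few lines of Python. This is only an illustration, not TiFlash code: the function names are made up for this example, and it assumes the 11-column threshold counts the leading "cpu" label as a column (so the label plus 10 counters). Under that assumption, the cpu line posted earlier in this thread has 10 columns and falls short.

```python
# Threshold quoted in the reply above. Assumption: the count includes the
# leading "cpu" label, so 11 columns means the label plus 10 counters.
REQUIRED_COLUMNS = 11


def cpu_column_count(stat_text: str) -> int:
    """Count whitespace-separated columns on the aggregate "cpu" line."""
    for line in stat_text.splitlines():
        columns = line.split()
        # Match only the aggregate line, not per-core lines such as "cpu0".
        if columns and columns[0] == "cpu":
            return len(columns)
    raise ValueError('no aggregate "cpu" line found')


def has_enough_columns(stat_text: str, required: int = REQUIRED_COLUMNS) -> bool:
    """Return True when the "cpu" line has at least `required` columns."""
    return cpu_column_count(stat_text) >= required


def read_proc_stat(path: str = "/proc/stat") -> str:
    """Read the stat file on a live Linux host (not called in this sketch)."""
    with open(path) as f:
        return f.read()


# The aggregate cpu line posted in this thread (RHEL 6.7).
SAMPLE = "cpu 4013169 2230 1311676 390406288 2011224 21 41037 0 0\n"

print(cpu_column_count(SAMPLE))    # 10
print(has_enough_columns(SAMPLE))  # False
```

On a real host, passing `read_proc_stat()` to `has_enough_columns` would run the same check against the live file.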

After upgrading to 7, you can first check the output of cat /proc/stat.
