TiFlash fails to start on Hygon x86 with UnionTech OS (UOS), reporting "Cannot cpu_identify: Unsupported processor"


[TiDB version]
v4.0.11

[Problem description]
On Hygon x86 with UOS Server Enterprise-C 20, TiFlash fails to start and reports the error "Cannot cpu_identify: Unsupported processor".

Process logs
log.tar.gz (19.3 KB)

Environment information

$ uname -a
Linux HG-S-06 4.19.0-91.77.18.uelc20.x86_64 #1 SMP Thu Mar 4 09:11:24 CST 2021 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/uos-release 
UOS Server Enterprise-C 20


$ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                64
On-line CPU(s) list:   0-63
Thread(s) per core:    2
Core(s) per socket:    16
Socket(s):             2
NUMA node(s):          8
Vendor ID:             HygonGenuine
CPU family:            24
Model:                 0
Model name:            Hygon C86 7151 16-core Processor
Stepping:              1
CPU MHz:             2482.247
CPU max MHz:           2000.0000
CPU min MHz:           1200.0000
BogoMIPS:            3999.66
Virtualization:        AMD-V
L1d cache:             32K
L1i cache:             64K
L2 cache:              512K
L3 cache:              4096K
NUMA node0 CPU(s):     0-3,32-35
NUMA node1 CPU(s):     4-7,36-39
NUMA node2 CPU(s):     8-11,40-43
NUMA node3 CPU(s):     12-15,44-47
NUMA node4 CPU(s):     16-19,48-51
NUMA node5 CPU(s):     20-23,52-55
NUMA node6 CPU(s):     24-27,56-59
NUMA node7 CPU(s):     28-31,60-63
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid amd_dcm aperfmperf pni monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb hw_pstate ssbd ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt xsaveopt xsavec xgetbv1 xsaves clzero irperf xsaveerptr arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif overflow_recov succor smca
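
The `lscpu` output above reports the vendor ID `HygonGenuine`, which is a plausible reason for CPU identification to fail. A minimal sketch for checking a host before deploying TiFlash, assuming the vendor string is what decides whether the CPU is recognised. The function names and the accepted vendor list (`GenuineIntel`, `AuthenticAMD`) are illustrative assumptions, not taken from the TiFlash source:

```shell
#!/usr/bin/env bash

# Print the first vendor_id value from a cpuinfo-style file.
# Defaults to /proc/cpuinfo when no file is given.
cpu_vendor() {
    local file="${1:-/proc/cpuinfo}"
    awk -F': *' '/^vendor_id/ { print $2; exit }' "$file"
}

# Return 0 if the vendor string is in the assumed "recognised" list.
# ASSUMPTION: this list is for illustration only.
is_known_vendor() {
    case "$1" in
        GenuineIntel|AuthenticAMD) return 0 ;;
        *) return 1 ;;
    esac
}

# Report whether the CPU vendor in the given file is in the assumed list.
report_cpu_vendor() {
    local vendor
    vendor="$(cpu_vendor "$1")"
    if is_known_vendor "$vendor"; then
        echo "vendor '$vendor' is in the assumed recognised list"
    else
        echo "vendor '$vendor' is NOT in the assumed recognised list"
    fi
}
```

Calling `report_cpu_vendor` with no argument reads `/proc/cpuinfo` on the current host; passing a saved copy of another host's `cpuinfo` file checks that host instead.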


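  1. I don't see this error in the logs. Where is it reported?
  2. The logs do show an out-of-space error:
    ["logger encountered error"] [err="No space left on device (os error 28)"]
  3. Was the cluster installed with tiup? Please upload the YAML file used for the installation.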

1. It is in the error log tiflash_error.log:
2021.05.24 17:38:06.435651 [ 3 ] BaseDaemon: (from thread 1) Terminate called after throwing an instance of DB::Exception
Code: 268, e.displayText() = DB::Exception: Cannot cpu_identify: Unsupported processor, e.what() = DB::Exception
Stack trace:

  1. bin/tiflash/tiflash(StackTrace::StackTrace()+0x15) [0x35a7aa5]
  2. bin/tiflash/tiflash() [0x3659eae]
  3. bin/tiflash/tiflash(__cxxabiv1::__terminate(void (*)())+0x5) [0x846c955]
  4. bin/tiflash/tiflash(__cxa_call_terminate+0x38) [0x846d728]
  5. bin/tiflash/tiflash(__gxx_personality_v0+0x2e7) [0x846c117]
  6. bin/tiflash/tiflash() [0x85024d2]
  7. bin/tiflash/tiflash() [0x850300d]
  8. bin/tiflash/tiflash(DB::Settings::Settings()+0xead) [0x6e6972d]
  9. bin/tiflash/tiflash(DB::Context::Context()+0x1ee) [0x6e56cfe]
  10. bin/tiflash/tiflash(DB::Context::createGlobal(std::shared_ptr<DB::IRuntimeComponentsFactory>)+0x19) [0x6e63bb9]
  11. bin/tiflash/tiflash(DB::Context::createGlobal()+0x71) [0x6e63e61]
  12. bin/tiflash/tiflash(DB::Server::main(std::vector<std::__cxx11::basic_string<char, std::char_traits, std::allocator<
    2021.05.24 17:38:06.440155 [ 3 ] BaseDaemon: ########################################
    2021.05.24 17:38:06.440189 [ 3 ] BaseDaemon: (from thread 1) Received signal Aborted (6).

2. A large number of core dumps were generated, which filled the disk:
$ gdb bin/tiflash/tiflash core.42531
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
http://www.gnu.org/software/gdb/bugs/
Reading symbols from /flash/services/tidb/tiflash-9000/bin/tiflash/tiflash…bt
done.

warning: core file may not match specified executable file.
[New LWP 42531]
[New LWP 42574]
[New LWP 42576]
[New LWP 42578]
[New LWP 42581]
[New LWP 42575]
[New LWP 42577]
[New LWP 42587]
[New LWP 42579]
[New LWP 42584]
[New LWP 42585]
[New LWP 42586]
[New LWP 42582]
[New LWP 42580]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `bin/tiflash/tiflash server --config-file conf/tiflash.toml'.
Program terminated with signal 6, Aborted.
#0  0x00007f7adc6d6377 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-292.uelc20.05.x86_64 libgcc-4.8.5-39.uelc20.04.x86_64
(gdb) bt
#0  0x00007f7adc6d6377 in raise () from /lib64/libc.so.6
#1  0x00007f7adc6d7ba8 in abort () from /lib64/libc.so.6
#2  0x0000000003659d32 in terminate_handler () at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/libs/libdaemon/src/BaseDaemon.cpp:548
#3  0x000000000846c956 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x000000000846d729 in __cxa_call_terminate (ue_header=ue_header@entry=0x7f7adc027760) at ../../../../libstdc++-v3/libsupc++/eh_call.cc:54
#5  0x000000000846c118 in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=6, exception_class=5138137972254386944, ue_header=0x7f7adc027760, 
    context=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:676
#6  0x00000000085024d3 in _Unwind_RaiseException_Phase2 (exc=exc@entry=0x7f7adc027760, context=context@entry=0x7ffd1f8fa710) at ../../../libgcc/unwind.inc:62
#7  0x000000000850300e in _Unwind_Resume (exc=0x7f7adc027760) at ../../../libgcc/unwind.inc:230
#8  0x0000000006e6972e in deallocate (this=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.4.0/ext/new_allocator.h:125
#9  deallocate (__a=..., __n=<optimized out>, __p=<optimized out>) at /usr/local/include/c++/7.4.0/bits/alloc_traits.h:462
#10 _M_destroy (__size=<optimized out>, this=<optimized out>) at /usr/local/include/c++/7.4.0/bits/basic_string.h:226
#11 _M_dispose (this=<optimized out>) at /usr/local/include/c++/7.4.0/bits/basic_string.h:221
#12 ~basic_string (this=<optimized out>, __in_chrg=<optimized out>) at /usr/local/include/c++/7.4.0/bits/basic_string.h:647
#13 ~SettingString (this=<optimized out>, __in_chrg=<optimized out>)
    at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/SettingsCommon.h:670
#14 DB::Settings::Settings (this=0x7ffd1f8fb5c0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/Settings.h:18
#15 0x0000000006e56cff in DB::Context::Context (this=0x7ffd1f8fb3a0) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/Context.h:118
#16 0x0000000006e63bba in DB::Context::createGlobal (runtime_components_factory=...)
    at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/Context.cpp:286
#17 0x0000000006e63e62 in DB::Context::createGlobal () at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Interpreters/Context.cpp:296
#18 0x00000000032441d8 in DB::Server::main (this=0x7ffd1f8fc300) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Server/Server.cpp:472
#19 0x00000000078cb356 in Poco::Util::Application::run (this=0x7ffd1f8fc300)
    at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/contrib/poco/Util/src/Application.cpp:335
#20 0x00000000035c674c in mainEntryClickHouseServer (argc=3, argv=0x7f7adbbbe680)
    at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Server/Server.cpp:1258
#21 0x0000000003235951 in main (argc_=<optimized out>, argv_=<optimized out>) at /home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/tics/dbms/src/Server/main.cpp:194

3. The deploy topology file is as follows:
test.yaml (2.3 KB)
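
Since the repeated aborts left enough core files to fill the disk, it can help to measure how much space they take before deleting anything. A rough sketch, assuming the core files are named `core.<pid>` as in the gdb session above and sit directly in the directory you pass in; the function name `core_dump_usage` is hypothetical:

```shell
#!/usr/bin/env bash

# Count core.* files directly under a directory and sum their sizes in bytes.
# Defaults to the current directory when no argument is given.
core_dump_usage() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -name 'core.*' -exec ls -l {} + |
        awk '{ total += $5; n++ } END { printf "%d files, %d bytes\n", n, total }'
}
```

For example, `core_dump_usage /flash/services/tidb/tiflash-9000` would summarise the deploy directory seen in the gdb output, if that is where the cores were written.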

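TiFlash cannot yet identify this type of CPU, so it exits abnormally during initialization and generates a large number of core files. For now, we recommend scaling in all TiFlash nodes.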
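
Scaling in the TiFlash nodes with tiup could look like the sketch below. The cluster name and node address are placeholders (the real values come from your topology file), and the helper `build_scale_in_cmd` is hypothetical. It only prints the `tiup cluster scale-in` command so it can be reviewed before being run by hand:

```shell
#!/usr/bin/env bash

# Placeholder values; replace with the cluster name and the TiFlash
# host:port entries from your own topology file.
cluster_name="${CLUSTER_NAME:-my-cluster}"
tiflash_nodes="${TIFLASH_NODES:-10.0.1.5:9000}"

# Build the scale-in command for review.
# $1 = cluster name, $2 = comma-separated list of host:port node IDs
build_scale_in_cmd() {
    printf 'tiup cluster scale-in %s --node %s\n' "$1" "$2"
}

# Print the command instead of executing it.
build_scale_in_cmd "$cluster_name" "$tiflash_nodes"
```

If any tables already have TiFlash replicas configured, the TiDB scale-in documentation asks that those replicas be removed before the nodes are scaled in.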

When will this be supported?

@lizhenda There is already a PR for this: https://github.com/pingcap/tics/pull/1999. For the timeline, please follow the PR. Thanks.
