tug_twf
(Hacker Fqg5 Vi Rn)
1
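【TiDB Environment】Production
【TiDB Version】7.5.4
【Reproduction Path】What operations were performed before the problem appeared
【Problem Encountered: Symptoms and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page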
[2025/02/13 20:31:15.888 +08:00] [ERROR] [BaseDaemon.cpp:370] [########################################] [source=BaseDaemon] [thread_id=2]
[2025/02/13 20:31:15.888 +08:00] [ERROR] [BaseDaemon.cpp:371] [“(from thread 1) Received signal Floating point exception(8).”] [source=BaseDaemon] [thread_id=2]
[2025/02/13 20:31:15.888 +08:00] [ERROR] [BaseDaemon.cpp:498] [“Integer divide by zero.”] [source=BaseDaemon] [thread_id=2]
[2025/02/13 20:31:19.643 +08:00] [ERROR] [BaseDaemon.cpp:563] [“\n 0x7772a31\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+125250097]\n \tlibs/libdaemon/src/BaseDaemon.cpp:214\n 0x7f4548779050\t [libc.so.6+245840]\n 0x1fb749c\tcomputeAndSetNumberOfPhysicalCPUCores(unsigned short, unsigned short) [tiflash+33256604]\n \tdbms/src/Common/getNumberOfCPUCores.cpp:66\n 0x1f8c09d\tDB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&) [tiflash+33079453]\n \tdbms/src/Server/Server.cpp:1086\n 0x915056a\tPoco::Util::Application::run() [tiflash+152372586]\n \tcontrib/poco/Util/src/Application.cpp:335\n 0x1f85e6a\tDB::Server::run() [tiflash+33054314]\n \tdbms/src/Server/Server.cpp:262\n 0x915b844\tPoco::Util::ServerApplication::run(int, char**) [tiflash+152418372]\n \tcontrib/poco/Util/src/ServerApplication.cpp:618\n 0x1f9e5a9\tmainEntryClickHouseServer(int, char**) [tiflash+33154473]\n \tdbms/src/Server/Server.cpp:1833\n 0x1ec1edc\tmain [tiflash+32251612]\n \tdbms/src/Server/main.cpp:173\n 0x7f454876424a\t [libc.so.6+160330]\n 0x7f4548764305\t__libc_start_main [libc.so.6+160517]”] [source=BaseDaemon] [thread_id=2]
^C
【Other Attachments: Screenshots/Logs/Monitoring】
TiFlash log
[2025/02/14 11:13:54.292 +08:00] [INFO] [BaseDaemon.cpp:1178] [“Welcome to TiFlash”] [thread_id=1]
[2025/02/14 11:13:54.292 +08:00] [INFO] [BaseDaemon.cpp:1179] [“Starting daemon with revision 54381”] [thread_id=1]
[2025/02/14 11:13:54.292 +08:00] [INFO] [BaseDaemon.cpp:1182] [“TiFlash build info: TiFlash\nRelease Version: v7.5.4\nEdition: Community\nGit Commit Hash: 85341773736131ac06d7644b47d66b8f00d36739\nGit Branch: HEAD\nUTC Build Time: 2024-10-10 15:04:00\nEnable Features: jemalloc sm4(GmSSL) avx2 avx512 unwind thinlto\nProfile: RELWITHDEBINFO\n”] [thread_id=1]
[2025/02/14 11:13:54.292 +08:00] [INFO] [] [“starting up”] [source=Application] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:432] [“Got jemalloc version: 5.3-RC”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:441] [“Not found environment variable MALLOC_CONF”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:447] [“Got jemalloc config: opt.background_thread false, opt.max_background_threads 4”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:451] [“Try to use background_thread of jemalloc to handle purging asynchronously”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:454] [“Set jemalloc.max_background_threads 1”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:457] [“Set jemalloc.background_thread true”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [ScanContext.cpp:235] [“flash_server_addr=7.39.108.126:3930, current_instance_id=7.39.108.126:3930”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [WARN] [StorageConfigParser.cpp:287] [“The configuration path is deprecated. Check [storage] section for new style.”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [StorageConfigParser.cpp:345] [“Main data candidate path: /home/tidb/data/”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [StorageConfigParser.cpp:347] [“Latest data candidate path: /home/tidb/data/”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [StorageConfigParser.cpp:349] [“Raft data candidate path: /home/tidb/data/kvstore/”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:1005] [“Using format_version=5 (default settings).”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:1021] [“Using api_version=1”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:1038] [“UniPS is not enabled for proxy, page_version=3”] [thread_id=1]
[2025/02/14 11:13:54.293 +08:00] [INFO] [Server.cpp:521] [“start raft store proxy”] [thread_id=1]
[2025/02/14 11:13:54.294 +08:00] [INFO] [Server.cpp:1047] [“wait for tiflash proxy initializing”] [thread_id=1]
[2025/02/14 11:13:54.494 +08:00] [INFO] [Server.cpp:1050] [“tiflash proxy is initialized”] [thread_id=1]
[2025/02/14 11:13:54.494 +08:00] [INFO] [Server.cpp:1057] [“encryption is disabled”] [thread_id=1]
[2025/02/14 11:13:54.503 +08:00] [ERROR] [BaseDaemon.cpp:370] [########################################] [source=BaseDaemon] [thread_id=2]
[2025/02/14 11:13:54.503 +08:00] [ERROR] [BaseDaemon.cpp:371] [“(from thread 1) Received signal Floating point exception(8).”] [source=BaseDaemon] [thread_id=2]
[2025/02/14 11:13:54.503 +08:00] [ERROR] [BaseDaemon.cpp:498] [“Integer divide by zero.”] [source=BaseDaemon] [thread_id=2]
[2025/02/14 11:13:58.508 +08:00] [ERROR] [BaseDaemon.cpp:563] [“\n 0x7772a31\tfaultSignalHandler(int, siginfo_t*, void*) [tiflash+125250097]\n \tlibs/libdaemon/src/BaseDaemon.cpp:214\n 0x7ff1f8d5c050\t [libc.so.6+245840]\n 0x1fb749c\tcomputeAndSetNumberOfPhysicalCPUCores(unsigned short, unsigned short) [tiflash+33256604]\n \tdbms/src/Common/getNumberOfCPUCores.cpp:66\n 0x1f8c09d\tDB::Server::main(std::__1::vector<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits, std::__1::allocator > > > const&) [tiflash+33079453]\n \tdbms/src/Server/Server.cpp:1086\n 0x915056a\tPoco::Util::Application::run() [tiflash+152372586]\n \tcontrib/poco/Util/src/Application.cpp:335\n 0x1f85e6a\tDB::Server::run() [tiflash+33054314]\n \tdbms/src/Server/Server.cpp:262\n 0x915b844\tPoco::Util::ServerApplication::run(int, char**) [tiflash+152418372]\n \tcontrib/poco/Util/src/ServerApplication.cpp:618\n 0x1f9e5a9\tmainEntryClickHouseServer(int, char**) [tiflash+33154473]\n \tdbms/src/Server/Server.cpp:1833\n 0x1ec1edc\tmain [tiflash+32251612]\n \tdbms/src/Server/main.cpp:173\n 0x7ff1f8d4724a\t [libc.so.6+160330]\n 0x7ff1f8d47305\t__libc_start_main [libc.so.6+160517]”] [source=BaseDaemon] [thread_id=2]
有猫万事足
2
有猫万事足
3
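I suspect this is the spot where the "Integer divide by zero" occurs.
If so, the code at this spot is written differently in 7.5.5.
You can see that std::max is used to guarantee that, even if the value cannot be obtained, it is never 0; it is at least 1.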
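To illustrate that kind of guard, here is a minimal sketch. It is not the TiFlash source: the function name estimatePhysicalCores and its parameters are hypothetical, and the arithmetic is only an assumed stand-in for whatever the real function computes. The point is that clamping each divisor with std::max keeps it at 1 or more, so a zero reported by the platform cannot raise SIGFPE.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the TiFlash implementation): estimate the number of
// physical cores available to a process from its logical-core budget and the
// hardware's logical/physical core counts.
// Every divisor is clamped to at least 1 with std::max, so a platform that
// reports 0 for a count cannot trigger an integer divide-by-zero.
inline std::uint16_t estimatePhysicalCores(std::uint16_t logical_cores,
                                           std::uint16_t hw_logical_cores,
                                           std::uint16_t hw_physical_cores)
{
    const std::uint16_t physical = std::max<std::uint16_t>(hw_physical_cores, 1);
    // Hardware threads per physical core (2 with hyper-threading, otherwise 1).
    const std::uint16_t threads_per_core = std::max<std::uint16_t>(
        static_cast<std::uint16_t>(hw_logical_cores / physical), 1);
    // The result is also clamped, so callers never see 0 cores.
    return std::max<std::uint16_t>(
        static_cast<std::uint16_t>(logical_cores / threads_per_core), 1);
}
```

With this shape, a call such as estimatePhysicalCores(48, 48, 0) returns 1 instead of crashing, which matches the "at least 1" behaviour described above but says nothing about whether the detected count is right.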
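If you are willing to make do on Debian 12, I think you can upgrade to 7.5.5, and you will no longer be blocked by this error.
However, I am not sure whether that also fixes the underlying problem of the CPU core count being detected incorrectly.
Note that line 73 in 7.5.5 has an info-level log statement. I suggest enabling info-level logging and checking the output to confirm whether the CPU core counts are correct on Debian 12.
"logical cpu cores: {}, hardware logical cpu cores: {}, hardware physical cpu cores: {}, physical cpu cores: "
"{}, number_of_physical_cpu_cores: {}",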
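3 likes
Please also provide the system CPU information so we can check whether it is correct. You can view it with lshw -C cpu
or cat /proc/cpuinfo
As mentioned above, there may be an incompatibility between the versions of the system's dependency packages and TiFlash; we will look into this further.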
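For a quick cross-check of that output, the sketch below counts logical processors and distinct physical cores from /proc/cpuinfo-style text. The names CpuCounts, countCpuCores and countCpuCoresFromFile are made up for this example, and it assumes the x86 Linux layout with "processor", "physical id" and "core id" fields; on platforms that omit those fields the physical count comes out as 0.

```cpp
#include <cstddef>
#include <fstream>
#include <set>
#include <sstream>
#include <string>
#include <utility>

struct CpuCounts
{
    std::size_t logical = 0;  // number of "processor" entries
    std::size_t physical = 0; // distinct (physical id, core id) pairs
};

// Strip leading and trailing spaces, tabs and carriage returns.
inline std::string trimBlank(const std::string & s)
{
    const auto first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos)
        return "";
    const auto last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Count cores in text laid out like /proc/cpuinfo on x86 Linux.
inline CpuCounts countCpuCores(const std::string & cpuinfo_text)
{
    CpuCounts counts;
    std::set<std::pair<std::string, std::string>> cores;
    std::string physical_id, core_id;
    bool in_entry = false;

    // Record the (physical id, core id) pair of the entry that just ended.
    auto flush = [&] {
        if (in_entry && !physical_id.empty() && !core_id.empty())
            cores.emplace(physical_id, core_id);
        physical_id.clear();
        core_id.clear();
    };

    std::istringstream in(cpuinfo_text);
    std::string line;
    while (std::getline(in, line))
    {
        const auto colon = line.find(':');
        if (colon == std::string::npos)
            continue; // blank separator or a line without "key : value"
        const std::string key = trimBlank(line.substr(0, colon));
        const std::string value = trimBlank(line.substr(colon + 1));
        if (key == "processor")
        {
            flush(); // a new entry starts, close the previous one
            in_entry = true;
            ++counts.logical;
        }
        else if (key == "physical id")
            physical_id = value;
        else if (key == "core id")
            core_id = value;
    }
    flush();
    counts.physical = cores.size();
    return counts;
}

// Convenience wrapper: an unreadable file yields zero counts.
inline CpuCounts countCpuCoresFromFile(const std::string & path)
{
    std::ifstream file(path);
    std::stringstream buffer;
    buffer << file.rdbuf();
    return countCpuCores(buffer.str());
}
```

Calling countCpuCoresFromFile("/proc/cpuinfo") on the affected host gives two numbers to compare against what TiFlash reports; with hyper-threading enabled, the logical count is typically twice the physical one.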
1 like
Please provide a more complete TiFlash log; you will need to scroll further up.
1 like
nobody
(不定时出现)
7
1 like
tug_twf
(Hacker Fqg5 Vi Rn)
9
cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
stepping : 7
microcode : 0x5003707
cpu MHz : 2700.074
cache size : 16896 KB
physical id : 1
siblings : 24
core id : 12
cpu cores : 12
apicid : 56
initial apicid : 56
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 4401.36
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
stepping : 7
microcode : 0x5003707
cpu MHz : 2699.773
cache size : 16896 KB
physical id : 0
siblings : 24
core id : 10
cpu cores : 12
apicid : 20
initial apicid : 20
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 4400.00
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 2
vendor_id : GenuineIntel
cpu family : 6
model : 85
model name : Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz
stepping : 7
microcode : 0x5003707
cpu MHz : 2699.956
cache size : 16896 KB
physical id : 1
siblings : 24
core id : 10
cpu cores : 12
apicid : 52
initial apicid : 52
fpu : yes
fpu_exception : yes
cpuid level : 22
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single intel_ppin ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke avx512_vnni md_clear flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs taa itlb_multihit mmio_stale_data retbleed eibrs_pbrsb gds bhi
bogomips : 4401.36
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
tug_twf
(Hacker Fqg5 Vi Rn)
10
After upgrading to 7.5.5, TiFlash is indeed no longer blocked, and the process starts up normally.
1 like
有猫万事足
11
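grep the logs to check whether the detected number of CPU cores is correct. If it is not, I am worried that TiFlash merely starts and is usable, but may not be able to take on real load. After all, when no value is obtained, it is fixed at 1.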
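If you would rather script that check than grep by hand, the sketch below pulls the five numbers out of the first matching log line. It assumes the rendered line follows the format string quoted earlier in this thread with plain integers in place of each {}; the names CpuCoreLog and findCpuCoreLog are hypothetical.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Values in the order they appear in the format string quoted in the thread.
struct CpuCoreLog
{
    unsigned long logical = 0;
    unsigned long hardware_logical = 0;
    unsigned long hardware_physical = 0;
    unsigned long physical = 0;
    unsigned long number_of_physical = 0;
};

// Scan log text line by line; on the first line that matches the assumed
// format, fill `out` and return true. Return false if no line matches.
inline bool findCpuCoreLog(const std::string & log_text, CpuCoreLog & out)
{
    static const std::regex pattern(
        R"(logical cpu cores: (\d+), hardware logical cpu cores: (\d+), )"
        R"(hardware physical cpu cores: (\d+), physical cpu cores: (\d+), )"
        R"(number_of_physical_cpu_cores: (\d+))");

    std::istringstream in(log_text);
    std::string line;
    while (std::getline(in, line))
    {
        std::smatch m;
        if (std::regex_search(line, m, pattern))
        {
            out.logical = std::stoul(m[1].str());
            out.hardware_logical = std::stoul(m[2].str());
            out.hardware_physical = std::stoul(m[3].str());
            out.physical = std::stoul(m[4].str());
            out.number_of_physical = std::stoul(m[5].str());
            return true;
        }
    }
    return false;
}
```

A hardware physical count of 0, or a final value of 1 on a machine that /proc/cpuinfo shows as multi-core, would suggest the fallback was hit rather than a real detection.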
system
(system)
Closed
14
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.