tiup: deploying multiple TiKV instances on a single host fails with "libnuma: Warning: node argument 3 is out of range"

Error during scale-out:

  • [ Serial ] - ClusterOperate: operation=StartOperation, options={Roles:[] Nodes:[] Force:false Timeout:0}
Starting component tikv
        Starting instance tikv 10.10.110.31:20163
        tikv 10.10.110.31:20163 failed to start: timed out waiting for port 20163 to be started after 1m0s, please check the log of the instance

Error: failed to start: failed to start tikv: tikv 10.10.110.31:20163 failed to start: timed out waiting for port 20163 to be started after 1m0s, please check the log of the instance: timed out waiting for port 20163 to be started after 1m0s

Verbose debug logs has been written to /home/tidb/logs/tiup-cluster-debug-2020-05-18-13-27-33.log.
Error: run /home/tidb/.tiup/components/cluster/v0.6.0/cluster (wd:/home/tidb/.tiup/data/RzJZ4R4) failed: exit status 1

Error in tikv_stderr.log:

libnuma: Warning: node argument 3 is out of range

usage: numactl [--all | -a] [--interleave= | -i <nodes>] [--preferred= | -p <node>]
               [--physcpubind= | -C <cpus>] [--cpunodebind= | -N <nodes>]
               [--membind= | -m <nodes>] [--localalloc | -l] command args ...
       numactl [--show | -s]
       numactl [--hardware | -H]
       numactl [--length | -l <length>] [--offset | -o <offset>] [--shmmode | -M <shmmode>]
               [--strict | -t]
               [--shmid | -I <id>] --shm | -S <shmkeyfile>
               [--shmid | -I <id>] --file | -f <tmpfsfile>
               [--huge | -u] [--touch | -T]
               memory policy | --dump | -d | --dump-nodes | -D

memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
<nodes> is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
  netdev:DEV the node connected to network device DEV
  file:PATH  the node the block device of path is connected to
  ip:HOST    the node of the network device host routes through
  block:PATH the node of block device path
  pci:[seg:]bus:dev[:func] The node of a PCI device
<cpus> is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
<length> can have g (GB), m (MB) or k (KB) suffixes
libnuma: Warning: node argument 3 is out of range

The run_tikv.sh script:

[root@tikv03 log]# cat …/scripts/run_tikv.sh
#!/bin/bash
set -e

# WARNING: This file was auto-generated. Do not edit!
#          All your edit might be overwritten!

cd "/export/tikv4/tidb-deploy/tikv-20163" || exit 1

echo -n 'sync ... '
stat=$(time sync || sync)
echo ok
echo $stat
exec numactl --cpunodebind=3 --membind=3 bin/tikv-server \
    --addr "0.0.0.0:20163" \
    --advertise-addr "10.10.110.31:20163" \
    --status-addr "10.10.110.31:20183" \
    --pd "10.10.110.45:2379,10.10.110.47:2379,10.10.110.49:2379" \
    --data-dir "/export/tikv4/tidb-data/tikv-20163" \
    --config conf/tikv.toml \
    --log-file "/export/tikv4/tidb-deploy/tikv-20163/log/tikv.log" 2>> "/export/tikv4/tidb-deploy/tikv-20163/log/tikv_stderr.log"

lscpu output:

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                56
On-line CPU(s) list:   0-55
Thread(s) per core:    2
Core(s) per socket:    14
Socket(s):             2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel® Xeon® Gold 5120 CPU @ 2.20GHz
Stepping:              4
CPU MHz:               2200.000
BogoMIPS:              4400.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              19712K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch epb cat_l3 cdp_l3 intel_ppin intel_pt ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts pku ospke spec_ctrl intel_stibp flush_l1d
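
The lscpu output above reports two NUMA nodes, so the only valid node ids on this host are 0 and 1, while the generated script passes 3. A minimal sketch of that check, assuming a Linux host that exposes its nodes under /sys/devices/system/node. The function names and the optional directory argument are made up for illustration and are not part of tiup or numactl:

```shell
#!/bin/bash
# Hypothetical helpers, not part of tiup or numactl.

# Print the number of NUMA nodes found under a sysfs-style directory.
# The directory argument is optional; it exists only to make testing easy.
count_numa_nodes() {
    local dir="${1:-/sys/devices/system/node}"
    local count=0 entry
    for entry in "$dir"/node[0-9]*; do
        if [ -d "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Succeed if node id $1 is valid on a host with $2 NUMA nodes (ids 0..$2-1).
numa_node_in_range() {
    local node="$1" count="$2"
    [[ "$node" =~ ^[0-9]+$ ]] && [ "$node" -lt "$count" ]
}

# Usage: numa_node_in_range 3 "$(count_numa_nodes)" || echo "node 3 is out of range"
```

With two nodes, numa_node_in_range 3 2 fails while 0 and 1 pass, which matches the libnuma warning in tikv_stderr.log.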

My deployment puts four TiKV instances on this machine.

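  1. Which version did you install? Was it installed with tiup?
  2. How many servers are there in total, and what is the topology?
  3. Are you using core binding?
  4. Try installing just 2 TiKV instances first.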

tiup cluster display tidb-test
Starting component cluster: /home/tidb/.tiup/components/cluster/v0.6.0/cluster display tidb-test
TiDB Cluster: tidb-test
TiDB Version: v3.1.0
ID                  Role          Host          Ports        Status     Data Dir                             Deploy Dir
--                  ----          ----          -----        ------     --------                             ----------
10.10.110.47:9093   alertmanager  10.10.110.47  9093/9094    Up         /export/tidb-data/alertmanager-9093  /export/tidb-deploy/alertmanager-9093
10.10.110.47:3000   grafana       10.10.110.47  3000         Up         -                                    /export/tidb-deploy/grafana-3000
10.10.110.45:2379   pd            10.10.110.45  2379/2380    Healthy    /export/tidb-data/pd-2379            /export/tidb-deploy/pd-2379
10.10.110.47:2379   pd            10.10.110.47  2379/2380    Healthy    /export/tidb-data/pd-2379            /export/tidb-deploy/pd-2379
10.10.110.49:2379   pd            10.10.110.49  2379/2380    Healthy|L  /export/tidb-data/pd-2379            /export/tidb-deploy/pd-2379
10.10.110.47:9090   prometheus    10.10.110.47  9090         Up         /export/tidb-data/prometheus-9090    /export/tidb-deploy/prometheus-9090
10.10.110.45:4000   tidb          10.10.110.45  4000/10080   Up         -                                    /export/tidb-deploy/tidb-4000
10.10.110.47:4000   tidb          10.10.110.47  4000/10080   Up         -                                    /export/tidb-deploy/tidb-4000
10.10.110.49:4000   tidb          10.10.110.49  4000/10080   Up         -                                    /export/tidb-deploy/tidb-4000
10.10.109.21:20160  tikv          10.10.109.21  20160/20180  Up         /export/tikv1/tidb-data/tikv-20160   /export/tikv1/tidb-deploy/tikv-20160
10.10.109.21:20161  tikv          10.10.109.21  20161/20181  Up         /export/tikv2/tidb-data/tikv-20161   /export/tikv2/tidb-deploy/tikv-20161
10.10.109.21:20162  tikv          10.10.109.21  20162/20182  Up         /export/tikv3/tidb-data/tikv-20162   /export/tikv3/tidb-deploy/tikv-20162
10.10.109.21:20163  tikv          10.10.109.21  20163/20183  Up         /export/tikv4/tidb-data/tikv-20163   /export/tikv4/tidb-deploy/tikv-20163
10.10.110.28:20160  tikv          10.10.110.28  20160/20180  Up         /export/tikv1/tidb-data/tikv-20160   /export/tikv1/tidb-deploy/tikv-20160
10.10.110.28:20161  tikv          10.10.110.28  20161/20181  Up         /export/tikv2/tidb-data/tikv-20161   /export/tikv2/tidb-deploy/tikv-20161
10.10.110.28:20162  tikv          10.10.110.28  20162/20182  Up         /export/tikv3/tidb-data/tikv-20162   /export/tikv3/tidb-deploy/tikv-20162
10.10.110.28:20163  tikv          10.10.110.28  20163/20183  Up         /export/tikv4/tidb-data/tikv-20163   /export/tikv4/tidb-deploy/tikv-20163
10.10.110.31:20160  tikv          10.10.110.31  20160/20180  Up         /export/tikv1/tidb-data/tikv-20160   /export/tikv1/tidb-deploy/tikv-20160
10.10.110.31:20161  tikv          10.10.110.31  20161/20181  Up         /export/tikv2/tidb-data/tikv-20161   /export/tikv2/tidb-deploy/tikv-20161
10.10.110.31:20162  tikv          10.10.110.31  20162/20182  Up         /export/tikv3/tidb-data/tikv-20162   /export/tikv3/tidb-deploy/tikv-20162
10.10.110.31:20163  tikv          10.10.110.31  20163/20183  Up         /export/tikv4/tidb-data/tikv-20163   /export/tikv4/tidb-deploy/tikv-20163

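  1. Which version did you install? Was it installed with tiup?

v3.1.0, installed with tiup.

  2. How many servers are there in total, and what is the topology?

The topology is as shown above.

  3. Are you using core binding?

I don't know how to check that.

  4. Try installing just 2 TiKV instances first.

Two TiKV instances work fine; the problem appears as soon as there are more than two. Running only two feels like a waste of this machine's resources, so I plan to run four.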

Does the script generation really not check the CPU information of the physical machine? That seems unreasonable.

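I searched for this error and found a few hits. The Red Hat article cannot be opened without an account; if you have one, please take a look. It may be a hardware problem, such as damaged memory or insufficient CPU. What is different between the servers whose four TiKV instances are already Up and the one that reports the error? Please compare them. Thanks.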

libnuma: Warning: node argument 3 is out of range
https://github.com/perfsonar/pscheduler/issues/834

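What I mean is that my machine only has cpubind values 0 and 1, but the generated scripts contain 2 and 3. If I change them to 0 and 1, the instances start. After I stop everything and start again, the scripts are reverted to 2 and 3. That seems unreasonable. I would like tiup to read this value from the machine when it generates the binding script, rather than incrementing it with the number of TiKV instances. Can this be improved? Otherwise, once I shut the cluster down, it cannot start again.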

Does this count as a bug? With 2 CPUs and 56 cores, can I only start 2 TiKV instances?

The values 2 and 3 are configured by the user. There is currently no validation of this setting, so the user has to supply valid values.
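
On a host with two NUMA nodes, four instances can still be bound as long as each one uses node 0 or 1; several processes may share a node, at the cost of sharing that node's CPUs and memory. A sketch of one possible assignment, round-robin by instance order. This is an illustrative choice, not something tiup does, and the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print "<port> <numa_node>" for each port,
# cycling through the node ids 0..node_count-1.
assign_numa_nodes() {
    local node_count="$1"
    shift
    # Refuse a zero or negative node count to avoid a modulo by zero.
    [ "$node_count" -gt 0 ] || return 1
    local i=0 port
    for port in "$@"; do
        echo "$port $((i % node_count))"
        i=$((i + 1))
    done
}

# The four TiKV ports from this thread on a host with 2 NUMA nodes.
assign_numa_nodes 2 20160 20161 20162 20163
```

For these inputs the ports 20160 and 20162 land on node 0, and 20161 and 20163 on node 1.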

I edited the script myself. Does that have any side effects? Will tiup validate this in the future?

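You don't need to modify the startup script; you only need to put the correct values in the topology file. Did you set the numa option to 2 and 3 for the corresponding TiKV instances in the topology file? You can change it with tiup cluster edit-config and then apply it with tiup cluster reload.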
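
The two steps from this reply, as a sketch. The cluster name tidb-test comes from the display output earlier in the thread. The wrapper function name is hypothetical, and the numa_node field shown in the comment is the per-instance topology setting as I understand it, so check it against the tiup version in use:

```shell
#!/bin/bash
# Hypothetical wrapper around the two tiup commands mentioned in the reply.
fix_numa_binding() {
    local cluster="$1"
    # Opens the stored topology in an editor. For each TiKV instance on a
    # host with two NUMA nodes, set a valid node id (0 or 1), for example:
    #   - host: 10.10.110.31
    #     port: 20163
    #     numa_node: "1"
    tiup cluster edit-config "$cluster" || return 1
    # Pushes the edited configuration to the nodes and restarts the services.
    tiup cluster reload "$cluster"
}

# Usage: fix_numa_binding tidb-test
```

Keeping the value in the topology avoids hand edits to run_tikv.sh, which is marked as auto-generated and may be overwritten.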

I did not set it in the config file. Let me take another look.

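The deployment script does not generate this value automatically. Could you provide the topology file you used for the deployment?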