Error: failed to start pd: failed to start: 10.96.129.222 pd-2379.service, please check the instance’s log(/data/tidb/tidb-deploy/pd-2379/log) for more detail.: timed out waiting for port 2379 to be started after 2m0s
Could you post the detailed logs?
libnuma: Warning: node argument 1 is out of range
usage: numactl [--all | -a] [--interleave= | -i ] [--preferred= | -p ]
               [--physcpubind= | -C ] [--cpunodebind= | -N ]
               [--membind= | -m ] [--localalloc | -l] command args ...
       numactl [--show | -s]
       numactl [--hardware | -H]
       numactl [--length | -l ] [--offset | -o ] [--shmmode | -M ]
               [--strict | -t]
               [--shmid | -I ] --shm | -S
               [--shmid | -I ] --file | -f
               [--huge | -u] [--touch | -T]
               memory policy | --dump | -d | --dump-nodes | -D
memory policy is --interleave | -i, --preferred | -p, --membind | -m, --localalloc | -l
is a comma delimited list of node numbers or A-B ranges or all.
Instead of a number a node can also be:
  netdev:DEV the node connected to network device DEV
  file:PATH the node the block device of path is connected to
  ip:HOST the node of the network device host routes through
  block:PATH the node of block device path
  pci:[seg:]bus:dev[:func] The node of a PCI device
is a comma delimited list of cpu numbers or A-B ranges or all
all ranges can be inverted with !
all numbers and ranges can be made cpuset-relative with +
the old --cpubind argument is deprecated.
use --cpunodebind or --physcpubind instead
can have g (GB), m (MB) or k (KB) suffixes
2024-01-15T14:35:21.530+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2024-01-15T14:35:21.763+0800 INFO SSHCommand {"host": "10.96.129.221", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 *:111 *:* \nLISTEN 0 128 *:22 *:* \nLISTEN 0 100 127.0.0.1:25 *:* \nLISTEN 0 128 [::]:111 [::]:* \nLISTEN 0 128 [::]:22 [::]:* \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": ""}
2024-01-15T14:35:21.763+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2024-01-15T14:35:22.383+0800 INFO SSHCommand {"host": "10.96.129.223", "port": "22", "cmd": "export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin ss -ltn", "stdout": "State Recv-Q Send-Q Local Address:Port Peer Address:Port \nLISTEN 0 128 *:111 *:* \nLISTEN 0 128 *:22 *:* \nLISTEN 0 100 127.0.0.1:25 *:* \nLISTEN 0 128 [::]:111 [::]:* \nLISTEN 0 128 [::]:22 [::]:* \nLISTEN 0 100 [::1]:25 [::]:* \n", "stderr": ""}
2024-01-15T14:35:22.383+0800 DEBUG retry error {"error": "operation timed out after 2m0s"}
2024-01-15T14:35:22.383+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start pd: failed to start: 10.96.129.222 pd-2379.service, please check the instance's log(/data/tidb/tidb-deploy/pd-2379/log) for more detail.: timed out waiting for port 2379 to be started after 2m0s", "errorVerbose": "timed out waiting for port 2379 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:92\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:129\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:167\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start: 10.96.129.222 pd-2379.service, please check the instance's log(/data/tidb/tidb-deploy/pd-2379/log) for more detail.\nfailed to start pd"}
2024-01-15T14:35:22.383+0800 INFO Execute command finished {"code": 1, "error": "failed to start pd: failed to start: 10.96.129.222 pd-2379.service, please check the instance's log(/data/tidb/tidb-deploy/pd-2379/log) for more detail.: timed out waiting for port 2379 to be started after 2m0s", "errorVerbose": "timed out waiting for port 2379 to be started after 2m0s\ngithub.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\n\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:92\ngithub.com/pingcap/tiup/pkg/cluster/spec.PortStarted\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:129\ngithub.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\n\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:167\ngithub.com/pingcap/tiup/pkg/cluster/operation.startInstance\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:405\ngithub.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\n\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:534\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\tgolang.org/x/sync@v0.1.0/errgroup/errgroup.go:75\nruntime.goexit\n\truntime/asm_amd64.s:1650\nfailed to start: 10.96.129.222 pd-2379.service, please check the instance's log(/data/tidb/tidb-deploy/pd-2379/log) for more detail.\nfailed to start pd"}
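Both SSHCommand entries show tiup-cluster running ss -ltn on the PD hosts while it waits, and neither listing (10.96.129.221, 10.96.129.223) contains a 2379 listener, so pd-server apparently never started listening. The reason itself should be in the PD log under /data/tidb/tidb-deploy/pd-2379/log. As a rough sketch of the port check (not tiup's actual code), the Python below parses ss -ltn output and reports whether a port is in LISTEN state. It assumes the column layout shown in the log; listening_ports, is_listening and SAMPLE are hypothetical names, and the sample is based on the 10.96.129.221 output above.

```python
def listening_ports(ss_output: str) -> set[int]:
    """Collect local ports in LISTEN state from `ss -ltn` output (hypothetical helper)."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed data-row layout: State Recv-Q Send-Q Local-Address:Port Peer-Address:Port
        if len(fields) >= 4 and fields[0] == "LISTEN":
            # rpartition handles *:22, 127.0.0.1:25 and [::]:111 alike
            _, _, port = fields[3].rpartition(":")
            if port.isdigit():
                ports.add(int(port))
    return ports


def is_listening(ss_output: str, port: int) -> bool:
    """True if `port` appears as a listening local port."""
    return port in listening_ports(ss_output)


# Based on the SSHCommand stdout for 10.96.129.221 in the log above
SAMPLE = (
    "State Recv-Q Send-Q Local Address:Port Peer Address:Port \n"
    "LISTEN 0 128 *:111 *:* \n"
    "LISTEN 0 128 *:22 *:* \n"
    "LISTEN 0 100 127.0.0.1:25 *:* \n"
    "LISTEN 0 128 [::]:111 [::]:* \n"
    "LISTEN 0 128 [::]:22 [::]:* \n"
    "LISTEN 0 100 [::1]:25 [::]:* \n"
)

print(sorted(listening_ports(SAMPLE)))  # [22, 25, 111]
print(is_listening(SAMPLE, 2379))       # False
```

Running ss -ltn on 10.96.129.222 by hand and looking for :2379 gives the same answer without tiup.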
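This looks like a problem with the NUMA/core binding. Judging from the log, the memory settings seem to exceed the resources of the node you bound to. How about removing the binding?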
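The warning "libnuma: Warning: node argument 1 is out of range" suggests numactl was asked to bind to NUMA node 1, which this host does not appear to have; presumably the numa_node value from the topology is what reaches numactl. numactl --hardware lists the nodes a host really has. The sketch below does the same comparison in Python, assuming the value uses numactl's node-list syntax (comma-separated numbers or A-B ranges, as in the usage text above) and that the Linux kernel exposes nodes as nodeN directories under /sys/devices/system/node. All function names are hypothetical.

```python
import os
import re

SYSFS_NODE_DIR = "/sys/devices/system/node"  # Linux sysfs location of NUMA nodes


def available_numa_nodes(sysfs: str = SYSFS_NODE_DIR) -> set[int]:
    """NUMA node IDs the kernel exposes as nodeN directories."""
    try:
        names = os.listdir(sysfs)
    except OSError:
        return {0}  # assumption: no NUMA info in sysfs means a single node 0
    nodes = set()
    for name in names:
        match = re.fullmatch(r"node(\d+)", name)
        if match:
            nodes.add(int(match.group(1)))
    return nodes or {0}


def parse_node_list(spec: str) -> set[int]:
    """Expand "0,1" or "0-1" style lists; "all", "!" and "+" are NOT handled."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            nodes.update(range(lo, hi + 1))
        elif part:
            nodes.add(int(part))
    return nodes


def out_of_range(spec: str, available: set[int]) -> set[int]:
    """Requested nodes this host does not have (likely rejected by numactl)."""
    return parse_node_list(spec) - available
```

For example, out_of_range("1", available_numa_nodes()) returning {1} on 10.96.129.222 would match the warning in the log.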
I've already removed numa_node. (Is that the setting you mean?)
OK. So it still won't start even after removing it?
Yes!!!
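Why is it that once numa_node was added, startup broke, and even after changing the parameter back it still fails? Do I really have to rebuild the database?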
Please post your YAML config file.
global:
  user: "tidb"
  group: "tidb"
  ssh_port: 22
  deploy_dir: "/data/tidb/tidb-deploy"
  data_dir: "/data/tidb/tidb-data"
  listen_host: 0.0.0.0
  arch: "amd64"
monitored:
  node_exporter_port: 9100
  blackbox_exporter_port: 9115
  deploy_dir: "/data/tidb/tidb-deploy/monitored-9100"
  data_dir: "/data/tidb/tidb-data/monitored-9100"
  log_dir: "/data/tidb/tidb-deploy/monitored-9100/log"
pd_servers:
  - host: 10.96.129.221
    ssh_port: 22
    name: "pd-129-221"
    client_port: 2379
    peer_port: 2380
    deploy_dir: "/data/tidb/tidb-deploy/pd-2379"
    data_dir: "/data/tidb/tidb-data/pd-2379"
    log_dir: "/data/tidb/tidb-deploy/pd-2379/log"
  - host: 10.96.129.222
    ssh_port: 22
    name: "pd-129.222"
    client_port: 2379
    peer_port: 2380
    deploy_dir: "/data/tidb/tidb-deploy/pd-2379"
    data_dir: "/data/tidb/tidb-data/pd-2379"
    log_dir: "/data/tidb/tidb-deploy/pd-2379/log"
  - host: 10.96.129.223
    ssh_port: 22
    name: "pd-129-223"
    client_port: 2379
    peer_port: 2380
    deploy_dir: "/data/tidb/tidb-deploy/pd-2379"
    data_dir: "/data/tidb/tidb-data/pd-2379"
    log_dir: "/data/tidb/tidb-deploy/pd-2379/log"
tidb_servers:
  - host: 10.96.129.224
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: "/data/tidb/tidb-deploy/tidb-4000"
    log_dir: "/data/tidb/tidb-deploy/tidb-4000/log"
  - host: 10.96.129.225
    ssh_port: 22
    port: 4000
    status_port: 10081
    deploy_dir: "/data/tidb/tidb-deploy/tidb-4000"
    log_dir: "/data/tidb/tidb-deploy/tidb-4000/log"
  - host: 10.96.129.226
    ssh_port: 22
    port: 4000
    status_port: 10080
    deploy_dir: "/data/tidb/tidb-deploy/tidb-4000"
    log_dir: "/data/tidb/tidb-deploy/tidb-4000/log"
tikv_servers:
  - host: 10.96.129.227
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: "/data/tidb/tidb-deploy/tikv-20160"
    data_dir: "/data/tidb/tidb-data/tikv-20160"
    log_dir: "/data/tidb/tidb-deploy/tikv-20160/log"
  - host: 10.96.129.228
    ssh_port: 22
    port: 20161
    status_port: 20181
    deploy_dir: "/data/tidb/tidb-deploy/tikv-20161"
    data_dir: "/data/tidb/tidb-data/tikv-20161"
    log_dir: "/data/tidb/tidb-deploy/tikv-20161/log"
  - host: 10.96.129.229
    ssh_port: 22
    port: 20160
    status_port: 20180
    deploy_dir: "/data/tidb/tidb-deploy/tikv-20160"
    data_dir: "/data/tidb/tidb-data/tikv-20160"
    log_dir: "/data/tidb/tidb-deploy/tikv-20160/log"
tidb_dashboard_servers:
  - host: 10.96.129.230
    ssh_port: 22
    port: 12333
    deploy_dir: "/data/tidb/tidb-deploy/tidb-dashboard-12333"
    data_dir: "/data/tidb/tidb-data/tidb-dashboard-12333"
    log_dir: "/data/tidb/tidb-deploy/tidb-dashboard-12333/log"
monitoring_servers:
  - host: 10.96.129.230
    ssh_port: 22
    port: 9090
    ng_port: 12020
    deploy_dir: "/data/tidb/tidb-deploy/prometheus-8249"
    data_dir: "/data/tidb/tidb-data/prometheus-8249"
    log_dir: "/data/tidb/tidb-deploy/prometheus-8249/log"
    #rule_dir: /data/tidb/prometheus_rule
    scrape_interval: 15s
    scrape_timeout: 10s
grafana_servers:
  - host: 10.96.129.230
    port: 3000
    deploy_dir: "/data/tidb/tidb-deploy/grafana-3000"
    #dashboard_dir: /data/tidb/dashboards
alertmanager_servers:
  - host: 10.96.129.230
    ssh_port: 22
    listen_host: 0.0.0.0
    web_port: 9093
    cluster_port: 9094
    deploy_dir: "/data/tidb/tidb-deploy/alertmanager-9093"
    data_dir: "/data/tidb/tidb-data/alertmanager-9093"
    log_dir: "/data/tidb/tidb-deploy/alertmanager-9093/log"
    #config_file: "/data/tidb/tidb-deploy/alertmanager-9093/bin/alertmanager"
Check the network connectivity.
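One way to check connectivity between the PD nodes is a plain TCP connect to the client and peer ports from the topology (2379 and 2380). This is a sketch: probe and report are hypothetical names, and the 3-second timeout is an arbitrary choice. While PD is down, "refused" is the expected result and actually shows the network path works; a "timeout" is more suspicious, since it may mean a firewall is silently dropping packets.

```python
import socket

# PD hosts and ports copied from the topology posted in this thread
PD_HOSTS = ["10.96.129.221", "10.96.129.222", "10.96.129.223"]
PD_PORTS = [2379, 2380]  # client_port, peer_port


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt to host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered with a reset: reachable, nothing listening
    except socket.timeout:
        return "timeout"  # no answer at all: possibly filtered by a firewall
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host


def report(hosts=PD_HOSTS, ports=PD_PORTS) -> None:
    """Print one line per host:port with the probe result."""
    for host in hosts:
        for port in ports:
            print(f"{host}:{port} {probe(host, port)}")
```

Calling report() on each PD host in turn (for example on 10.96.129.222) shows whether it can reach the other two.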
The config file looks fine to me. Does it still fail to start now?
Have you checked whether the port is already in use?
Run lsof -i:2379 to see whether the port is occupied. If a restart is possible, restart and try again.
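If lsof is not installed, a rough alternative is to try binding the port yourself: if bind() fails with EADDRINUSE, some other socket already owns it. The sketch below assumes it runs on the PD host while pd-2379.service is stopped (otherwise PD itself would hold the port); port_in_use is a hypothetical helper. Unlike lsof it does not say which process holds the port, and on Linux leftover TIME_WAIT connections on the port can also make the bind fail, so treat True as a hint to follow up with lsof -i:2379 or ss -ltnp.

```python
import errno
import socket


def port_in_use(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if binding host:port fails because the address is already in use."""
    # No SO_REUSEADDR on purpose: any socket still bound to the port makes bind() fail
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return True
            raise  # other failures (e.g. EADDRNOTAVAIL) say nothing about the port
    return False
```

On 10.96.129.222, port_in_use(2379) returning True would point to a conflicting process.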
Try restarting.