Background
A TiDB cluster has three tidb nodes, and one of them went down. I'm not sure, but it may have been caused by a DELETE: nobody checked beforehand how much data the statement would touch, and it later turned out that a single DELETE covered far too many rows, which apparently brought that tidb node down.
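For background, the usual way to avoid this failure mode is to split a big DELETE into small batches so that no single transaction has to buffer a huge number of rows. A minimal sketch in shell, assuming a table t with an indexed created_at column (the table, column, and cutoff date are made up for illustration; host and port are this cluster's tidb endpoint):

# Repeat small DELETEs until ROW_COUNT() reports nothing left to delete.
# Add -p / credentials as appropriate for your cluster.
while true; do
  affected=$(mysql -h 10.20.70.39 -P 14000 -u root -N -e \
    "DELETE FROM t WHERE created_at < '2022-01-01' LIMIT 10000; SELECT ROW_COUNT();")
  [ "${affected:-0}" -eq 0 ] && break
done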
After finding that the tidb node was down, I ran tiup cluster restart <cluster-name> --node <ip:port> to bring it back up, and it printed the following:
Error: failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s
Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-22-06-23.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster` (wd:/home/tidb/.tiup/data/TFLI7w1) failed: exit status 1
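The 2m0s in the message is just tiup's default wait timeout for the port to open, so before digging deeper it is worth ruling out a node that is merely slow to start. A sketch (substitute the real cluster name; --wait-timeout is in seconds):

tiup cluster restart <cluster-name> --node 10.20.70.39:14000 --wait-timeout 600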
Key information from /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-22-06-23.log:
2022-08-23T22:06:23.264+0800 DEBUG retry error: operation timed out after 2m0s
2022-08-23T22:06:23.264+0800 DEBUG TaskFinish {"task": "StartCluster", "error": "failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 14000 to be started after 2m0s\
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:115\
github.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:147\
github.com/pingcap/tiup/pkg/cluster/operation.startInstance\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:359\
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:485\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\
runtime.goexit\
\truntime/asm_amd64.s:1581\
failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.\
failed to start tidb"}
2022-08-23T22:06:23.265+0800 INFO Execute command finished {"code": 1, "error": "failed to start tidb: failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.: timed out waiting for port 14000 to be started after 2m0s", "errorVerbose": "timed out waiting for port 14000 to be started after 2m0s\
github.com/pingcap/tiup/pkg/cluster/module.(*WaitFor).Execute\
\tgithub.com/pingcap/tiup/pkg/cluster/module/wait_for.go:91\
github.com/pingcap/tiup/pkg/cluster/spec.PortStarted\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:115\
github.com/pingcap/tiup/pkg/cluster/spec.(*BaseInstance).Ready\
\tgithub.com/pingcap/tiup/pkg/cluster/spec/instance.go:147\
github.com/pingcap/tiup/pkg/cluster/operation.startInstance\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:359\
github.com/pingcap/tiup/pkg/cluster/operation.StartComponent.func1\
\tgithub.com/pingcap/tiup/pkg/cluster/operation/action.go:485\
golang.org/x/sync/errgroup.(*Group).Go.func1\
\tgolang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57\
runtime.goexit\
\truntime/asm_amd64.s:1581\
failed to start: 10.20.70.39 tidb-14000.service, please check the instance's log(/ssd/tidb-deploy/tidb-14000/log) for more detail.\
failed to start tidb"}
Looking at the tidb log itself, it reads as follows:
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:506] ["globalConfigSyncerKeeper exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1187] ["PlanReplayerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:258] ["break campaign loop, context is done"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:933] ["loadPrivilegeInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1165] ["TelemetryRotateSubWindowLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:345] ["watcher is closed, no owner"] ["owner info"="[stats] ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53 watch owner key /tidb/stats/owner/161581faba041325"]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:560] ["loadSchemaInLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [domain.go:1061] ["globalBindHandleWorkerLoop exited."]
[2022/08/23 19:27:32.719 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:32.720 +08:00] [WARN] [manager.go:291] ["is not the owner"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:32.720 +08:00] [INFO] [domain.go:452] ["topNSlowQueryLoop exited."]
[2022/08/23 19:27:33.826 +08:00] [INFO] [manager.go:277] ["failed to campaign"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:249] ["etcd session is done, creates a new one"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"]
[2022/08/23 19:27:33.827 +08:00] [INFO] [manager.go:253] ["break campaign loop, NewSession failed"] ["owner info"="[stats] /tidb/stats/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="context canceled"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[telemetry] /tidb/telemetry/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.953 +08:00] [INFO] [domain.go:1135] ["TelemetryReportLoop exited."]
[2022/08/23 19:27:33.971 +08:00] [INFO] [manager.go:302] ["revoke session"] ["owner info"="[bindinfo] /tidb/bindinfo/owner ownerManager 4c6db34a-8534-43bb-aa8c-45d8a40f0a53"] [error="rpc error: code = Canceled desc = grpc: the client connection is closing"]
[2022/08/23 19:27:33.971 +08:00
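The excerpt is cut off above, but every visible line is part of an orderly shutdown (owner managers stepping down, background loops exiting); nothing explains why port 14000 never opens again. A few checks worth running on 10.20.70.39 itself (a sketch; these were not in the original session):

dmesg -T | grep -iE 'oom|killed process'     # was tidb-server killed by the OOM killer?
ss -lntp | grep 14000                        # is a stale process still holding the port?
sudo journalctl -u tidb-14000.service -n 50  # systemd's view of the failed starts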
Since the restart problem could not be solved, the next idea was more brute-force: scale in the Down tidb node directly, then scale out a fresh tidb node in its place. But the scale-out ran into trouble as well.
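For reference, the two steps were roughly the following (a sketch reconstructed from the description above; the cluster name and topology file are placeholders, not copied from the original session):

tiup cluster scale-in <cluster-name> --node 10.20.70.39:14000
tiup cluster scale-out <cluster-name> scale-out.yaml

The scale-in completed, but the scale-out failed with the following Error: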
Error: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb@10.20.70.38:22' {ssh_stderr: , ssh_stdout: [2022/08/23 21:56:26.080 +08:00] [FATAL] [terror.go:292]["unexpected error"] [error="toml: cannot load TOML value of type string into a Go integer"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:292\
github.com/pingcap/tidb/config.InitializeConfig\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/config/config.go:796\
main.main\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/tidb-server/main.go:177\
runtime.main\
\t/usr/local/go/src/runtime/proc.go:225"] [stack="github.com/pingcap/tidb/parser/terror.MustNil\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:292\
github.com/pingcap/tidb/config.InitializeConfig\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/config/config.go:796\
main.main\
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/tidb-server/main.go:177\
runtime.main\
\t/usr/local/go/src/runtime/proc.go:225"]
, ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /ssd/tidb-deploy/tidb-14000/bin/tidb-server --config-check --config=/ssd/tidb-deploy/tidb-14000/conf/tidb.toml }, cause: Process exited with status 1: check config failed
Verbose debug logs has been written to /home/tidb/.tiup/logs/tiup-cluster-debug-2022-08-23-21-56-35.log.
Error: run `/home/tidb/.tiup/components/cluster/v1.7.0/tiup-cluster` (wd:/home/tidb/.tiup/data/TFLG7Ha) failed: exit status 1
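The FATAL above is tidb-server's own config check failing: somewhere in the rendered tidb.toml, an option that must be an integer is written as a quoted string. The check can be re-run by hand and the file searched for quoted numbers; a sketch (the config-check command is copied from the ssh_command above, while the grep pattern and the token-limit example are only illustrative, since the real offending key may differ):

# Re-run the exact check tiup performs over SSH:
/ssd/tidb-deploy/tidb-14000/bin/tidb-server --config-check \
    --config=/ssd/tidb-deploy/tidb-14000/conf/tidb.toml
# Look for integer options accidentally written as strings:
grep -nE '= *"[0-9]+"' /ssd/tidb-deploy/tidb-14000/conf/tidb.toml
# e.g.  token-limit = "1000"   is rejected (string)
#       token-limit = 1000     is accepted (integer)

Since tiup renders tidb.toml from the topology it stores, the durable fix usually belongs in tiup cluster edit-config <cluster-name> (under server_configs.tidb or the affected tidb_servers entry) rather than in the generated file on disk.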
So the problem now is twofold: the tidb node that is already Down cannot be restarted, and a new tidb node cannot be scaled out either.