TiDB node fails to start


[Overview] Scenario and problem summary

Environment: 4.0.0-beta2
Single TiDB node, which cannot start (status Down)
3 TiKV nodes, one of which is Disconnected

[Background] Operations performed

The cluster was originally managed with Ansible and was later imported into TiUP. Some directories have problems.

[Symptoms] Application and database symptoms

[Business impact]

[TiDB version]
4.0.0-beta2
[Attachments]

  1. TiUP Cluster Display output

Cluster type: tidb
Cluster name: test-cluster
Cluster version: v4.0.0-beta.2
SSH type: builtin
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir


10.10.24.61:9093 alertmanager 10.10.24.61 9093/9094 linux/x86_64 inactive /data/tidb/deploy/data.alertmanager /data/tidb/deploy
10.10.24.65:8249 drainer 10.10.24.65 8249 linux/x86_64 Down /data/tidb/data/drainer-8249 /data/tidb/deploy/drainer-8249
10.10.24.61:3000 grafana 10.10.24.61 3000 linux/x86_64 inactive - /data/tidb/deploy
10.10.24.67:2379 pd 10.10.24.67 2379/2380 linux/x86_64 Up /data/tidb/deploy/data.pd /data/tidb/deploy
10.10.24.68:2379 pd 10.10.24.68 2379/2380 linux/x86_64 Up /data/tidb/deploy/data.pd /data/tidb/deploy
10.10.24.61:9090 prometheus 10.10.24.61 9090 linux/x86_64 inactive /data/tidb/deploy/prometheus2.0.0.data.metrics /data/tidb/deploy
10.10.24.65:8250 pump 10.10.24.65 8250 linux/x86_64 Up /data/tidb/data/pump-8250 /data/tidb/deploy/pump-8250
10.10.24.61:4000 tidb 10.10.24.61 4000/10080 linux/x86_64 Down - /data/deploy
10.10.24.61:20160 tikv 10.10.24.61 20160/20180 linux/x86_64 Up /data/deploy/data /data/deploy
10.10.24.62:20160 tikv 10.10.24.62 20160/20180 linux/x86_64 Up /data/deploy/data /data/deploy
10.10.24.63:20160 tikv 10.10.24.63 20160/20180 linux/x86_64 Disconnected /data/deploy/data /data/deploy

  2. TiUP Cluster Edit Config output

  3. TiDB Overview monitoring

  • Logs of the relevant module (including 1 hour before and after the problem)
    The node restarts in a loop and reports errors:

[2021/07/08 16:23:14.241 +08:00] [INFO] [printer.go:41] ["Welcome to TiDB."] ["Release Version"=v4.0.0-beta-446-g5268094af] ["Git Commit Hash"=5268094afe05c7efef0d91d2deeec428cc85abe6] ["Git Branch"=master] ["UTC Build Time"="2020-03-17 02:22:07"] [GoVersion=go1.13] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=v3.0.0-60965b006877ca7234adaced7890d7b029ed1306]
[2021/07/08 16:23:14.243 +08:00] [INFO] [printer.go:54] ["loaded config"] [config="{"host":"0.0.0.0","advertise-address":"10.10.24.61","port":4000,"cors":"","store":"tikv","path":"10.10.24.67:2379,10.10.24.68:2379","socket":"","lease":"45s","run-ddl":true,"split-table":true,"token-limit":1000,"oom-use-tmp-storage":true,"tmp-storage-path":"/tmp/tidb/tmp-storage","oom-action":"log","mem-quota-query":1073741824,"enable-streaming":false,"enable-batch-dml":false,"txn-local-latches":{"enabled":false,"capacity":2048000},"lower-case-table-names":2,"server-version":"","log":{"level":"info","format":"text","disable-timestamp":null,"enable-timestamp":null,"disable-error-stack":null,"enable-error-stack":null,"file":{"filename":"/data/deploy/log/tidb.log","max-size":300,"max-days":0,"max-backups":0},"enable-slow-log":true,"slow-query-file":"log/tidb_slow_query.log","slow-threshold":300,"expensive-threshold":10000,"query-log-max-len":4096,"record-plan-in-slow-log":1},"security":{"skip-grant-table":false,"ssl-ca":"","ssl-cert":"","ssl-key":"","require-secure-transport":false,"cluster-ssl-ca":"","cluster-ssl-cert":"","cluster-ssl-key":"","cluster-verify-cn":null},"status":{"status-host":"0.0.0.0","metrics-addr":"","status-port":10080,"metrics-interval":15,"report-status":true,"record-db-qps":false},"performance":{"max-procs":0,"max-memory":0,"stats-lease":"3s","stmt-count-limit":5000,"feedback-probability":0.05,"query-feedback-limit":1024,"pseudo-estimate-ratio":0.8,"force-priority":"NO_PRIORITY","bind-info-lease":"3s","txn-total-size-limit":104857600,"tcp-keep-alive":true,"cross-join":true,"run-auto-analyze":true},"prepared-plan-cache":{"enabled":false,"capacity":100,"memory-guard-ratio":0.1},"opentracing":{"enable":false,"rpc-metrics":false,"sampler":{"type":"const","param":1,"sampling-server-url":"","max-operations":0,"sampling-refresh-interval":0},"reporter":{"queue-size":0,"buffer-flush-interval":0,"log-spans":false,"local-agent-host-port":""}},"proxy-protocol":{"networks":"","header-timeout":5},"tikv-client":{"grpc-connection-count":4,"grpc-keepalive-time":10,"grpc-keepalive-timeout":3,"commit-timeout":"41s","max-batch-size":128,"overload-threshold":200,"max-batch-wait-time":0,"batch-wait-size":8,"enable-chunk-rpc":true,"region-cache-ttl":600,"store-limit":0,"copr-cache":{"enabled":false,"capacity-mb":0,"admission-max-result-mb":0,"admission-min-process-ms":0}},"binlog":{"enable":true,"ignore-error":true,"write-timeout":"15s","binlog-socket":"","strategy":"range"},"compatible-kill-query":false,"plugin":{"dir":"","load":""},"pessimistic-txn":{"enable":true,"max-retry-count":256},"check-mb4-value-in-utf8":true,"max-index-length":3072,"alter-primary-key":false,"treat-old-version-utf8-as-utf8mb4":true,"enable-table-lock":false,"delay-clean-table-lock":0,"split-region-max-num":1000,"stmt-summary":{"enable":true,"max-stmt-count":200,"max-sql-length":4096,"refresh-interval":1800,"history-size":24},"repair-mode":false,"repair-table-list":[],"isolation-read":{"engines":["tikv","tiflash","tidb"]},"max-server-connections":4096,"new_collations_enabled_on_first_bootstrap":false,"experimental":{"allow-auto-random":false},"enable-dynamic-config":true}"]
[2021/07/08 16:23:14.243 +08:00] [INFO] [client.go:135] ["[pd] create pd client with endpoints"] [pd-address="[10.10.24.67:2379,10.10.24.68:2379]"]
[2021/07/08 16:23:14.245 +08:00] [INFO] [base_client.go:226] ["[pd] update member urls"] [old-urls="[http://10.10.24.67:2379,http://10.10.24.68:2379]"] [new-urls="[http://10.10.24.66:2379,http://10.10.24.67:2379,http://10.10.24.68:2379]"]
[2021/07/08 16:23:14.245 +08:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://10.10.24.66:2379] [old-leader=]
[2021/07/08 16:23:14.245 +08:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6830592625788170975]
[2021/07/08 16:23:14.248 +08:00] [INFO] [main.go:271] [tidb-server] ["create pumps client success, ignore binlog error"=true]
[2021/07/08 16:23:14.248 +08:00] [INFO] [main.go:280] ["disable Prometheus push client"]
[2021/07/08 16:23:14.248 +08:00] [INFO] [store.go:68] ["new store"] [path=tikv://10.10.24.67:2379,10.10.24.68:2379]
[2021/07/08 16:23:14.248 +08:00] [INFO] [client.go:135] ["[pd] create pd client with endpoints"] [pd-address="[10.10.24.67:2379,10.10.24.68:2379]"]
[2021/07/08 16:23:14.248 +08:00] [INFO] [systime_mon.go:25] ["start system time monitor"]
[2021/07/08 16:23:14.250 +08:00] [INFO] [base_client.go:226] ["[pd] update member urls"] [old-urls="[http://10.10.24.67:2379,http://10.10.24.68:2379]"] [new-urls="[http://10.10.24.66:2379,http://10.10.24.67:2379,http://10.10.24.68:2379]"]
[2021/07/08 16:23:14.250 +08:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://10.10.24.66:2379] [old-leader=]
[2021/07/08 16:23:14.250 +08:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6830592625788170975]
[2021/07/08 16:23:14.251 +08:00] [INFO] [store.go:74] ["new store with retry success"]
[2021/07/08 16:23:14.252 +08:00] [INFO] [config_handler.go:212] ["PDConfHandler updates config successfully"] [new_version=] [accepted_conf_items="[]"] [rejected_conf_items="[Path,Log.SlowQueryFile,Binlog.Enable,Binlog.IgnoreError]"]
[2021/07/08 16:23:14.252 +08:00] [INFO] [config_handler.go:178] ["PDConfHandler register successfully"]
[2021/07/08 16:23:14.253 +08:00] [WARN] [config_handler.go:189] ["write config to disk error"] [error="rename /tmp/tmp_conf_20210708162314.toml conf/tidb.toml: invalid cross-device link"]
[2021/07/08 16:23:34.315 +08:00] [WARN] [backoff.go:309] ["regionMiss backoffer.maxSleep 20000ms is exceeded, errors:\nmessage:"region 529285 is missing" region_not_found:<region_id:529285 > at 2021-07-08T16:23:33.313710647+08:00\nmessage:"region 529285 is missing" region_not_found:<region_id:529285 > at 2021-07-08T16:23:33.814861389+08:00\nmessage:"region 529285 is missing" region_not_found:<region_id:529285 > at 2021-07-08T16:23:34.315972889+08:00"]
[2021/07/08 16:23:34.316 +08:00] [FATAL] [session.go:1835] ["check bootstrapped failed"] [error="[tikv:9005]Region is unavailable"] [stack="github.com/pingcap/tidb/session.getStoreBootstrapVersion\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1835\ngithub.com/pingcap/tidb/session.BootstrapSession\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/session/session.go:1642\nmain.createStoreAndDomain\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:234\nmain.main\n\t/home/jenkins/agent/workspace/tidb_master/go/src/github.com/pingcap/tidb/tidb-server/main.go:171\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:203"]
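
When the node restarts in a loop, the same INFO lines repeat and the few WARN/FATAL entries are easy to miss. A minimal Python sketch for pulling them out is shown here. It assumes each line starts with `[timestamp] [LEVEL] [file:line]`, as in the log above. The names `LINE_RE`, `important_entries` and `SAMPLE` are made up for illustration, and the sample lines are copied from this log with the stack field left out.

```python
import re

# Leading fields of a log line: [timestamp] [LEVEL] [source file:line], then the rest.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] (?P<rest>.*)$"
)


def important_entries(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, source, rest) for lines at the given levels."""
    found = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group("level") in levels:
            found.append((m.group("ts"), m.group("level"), m.group("src"), m.group("rest")))
    return found


# Three lines taken from the log above (stack field omitted from the FATAL line).
SAMPLE = [
    '[2021/07/08 16:23:14.251 +08:00] [INFO] [store.go:74] ["new store with retry success"]',
    '[2021/07/08 16:23:14.253 +08:00] [WARN] [config_handler.go:189] ["write config to disk error"] '
    '[error="rename /tmp/tmp_conf_20210708162314.toml conf/tidb.toml: invalid cross-device link"]',
    '[2021/07/08 16:23:34.316 +08:00] [FATAL] [session.go:1835] ["check bootstrapped failed"] '
    '[error="[tikv:9005]Region is unavailable"]',
]

if __name__ == "__main__":
    for ts, level, src, rest in important_entries(SAMPLE):
        print(ts, level, src, rest)
```

To run it against the real file, pass an open file handle instead of `SAMPLE`, for example `important_entries(open("/data/deploy/log/tidb.log"))`. That path is the `filename` value from the loaded config above.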



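Judging from the log, TiDB failed while loading its bootstrap (initialization) data. One possible cause is a network problem that prevents the data from being synchronized. Check the log of each component to see which errors it reports. Also, importing a cluster from Ansible into TiUP does not by itself affect connectivity between cluster nodes, so confirm whether the ports or the startup configuration files were modified. The corresponding logs need to be examined.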
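
As a first pass at the network check, a plain TCP probe from the tidb host to the PD and TiKV ports can rule out basic reachability problems. The following is a minimal Python sketch using only the standard library. The function names `can_connect` and `report` are made up for illustration. The TiKV addresses come from the display output, and the PD addresses come from the `new-urls` field in the tidb log. Note that the log lists 10.10.24.66:2379, which does not appear in the display output. A successful connect only shows that the port accepts connections, not that the service behind it is healthy.

```python
import socket

# PD endpoints: from "new-urls" in the tidb log.
# TiKV endpoints: from the display output.
ENDPOINTS = [
    ("10.10.24.66", 2379),
    ("10.10.24.67", 2379),
    ("10.10.24.68", 2379),
    ("10.10.24.61", 20160),
    ("10.10.24.62", 20160),
    ("10.10.24.63", 20160),
]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(endpoints, timeout=3.0):
    """Return one 'host:port state' line per endpoint."""
    lines = []
    for host, port in endpoints:
        state = "reachable" if can_connect(host, port, timeout) else "UNREACHABLE"
        lines.append(f"{host}:{port} {state}")
    return lines
```

Run on 10.10.24.61, `print("\n".join(report(ENDPOINTS)))` prints one line per endpoint. Any endpoint reported as UNREACHABLE is a good place to start reading that component's own log.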
