TiDB Server fails to start

[TiDB environment]

  • Test

[TiDB version]

  • v5.1.4

[Problem encountered]

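  • 3 servers,
    3 TiDB + 3 PD + 3 TiKV
  • Background:
    The data disk on one of the servers was wiped, and it happened to be the TiUP control machine (that machine went down completely).
  • Recovery process: I first rebuilt topology.yaml by hand from the cluster information, then ran deploy manually without start. I found that a PD node would not start and recovered it with the pd-recover tool, then scaled out TiKV (from the 2 healthy nodes to 3). After that I tried to bring TiDB back, but it would not start and reported the following error: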
[2022/10/28 17:45:02.538 +08:00] [FATAL] [terror.go:276] ["unexpected error"] [error="[privilege:8049]mysql.user"] [stack="github.com/pingcap/parser/terror.MustNil\n\t/root/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210618053735-57843e8185c4/terror/terror.go:276\nmain.createStoreAndDomain\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:276\nmain.main\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:182\nruntime.main\n\t/usr/local/go1.16.4/src/runtime/proc.go:225"] [stack="github.com/pingcap/parser/terror.MustNil\n\t/root/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210618053735-57843e8185c4/terror/terror.go:276\nmain.createStoreAndDomain\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:276\nmain.main\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:182\nruntime.main\n\t/usr/local/go1.16.4/src/runtime/proc.go:225"]

Could someone please take a look?

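It looks like the metadata is lost. You might as well redeploy a fresh cluster.

Is there any other way?

It is a lot of work: you have to check the nodes one by one and see whether each one is damaged. If any are, the only option is a lossy recovery.

You can give it a try, but I would suggest just reinstalling.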

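I checked the regions; apart from missing replicas they all look normal. A lossy recovery is acceptable too. Is there any documentation?

Yes. Recover PD first, then check TiKV. If a TiDB node is down you can simply drop it and deploy a new one.

Good luck~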

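PD is already recovered and so are all the TiKV nodes. The last remaining problem is that TiDB will not start.

Try scaling out another TiDB node.

I scaled the old one in and added a new one, and it still does not work.

What is the error? Let's see the log.

tidb.log: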

[2022/10/29 14:33:41.324 +08:00] [INFO] [trackerRecorder.go:28] ["Mem Profile Tracker started"]
[2022/10/29 14:33:41.325 +08:00] [INFO] [printer.go:47] ["loaded config"] [config="{\"host\":\"0.0.0.0\",\"advertise-address\":\"10.246.177.103\",\"port\":4122,\"cors\":\"\",\"store\":\"tikv\",\"path\":\"10.246.177.103:2400,10.246.250.135:2400,10.246.177.102:2400\",\"socket\":\"\",\"lease\":\"45s\",\"run-ddl\":true,\"split-table\":true,\"token-limit\":1000,\"oom-use-tmp-storage\":true,\"tmp-storage-path\":\"/tmp/1000_tidb/MC4wLjAuMDo0MTIyLzAuMC4wLjA6MTAxMDI=/tmp-storage\",\"oom-action\":\"cancel\",\"mem-quota-query\":1073741824,\"tmp-storage-quota\":-1,\"enable-streaming\":false,\"enable-batch-dml\":false,\"lower-case-table-names\":2,\"server-version\":\"\",\"log\":{\"level\":\"info\",\"format\":\"text\",\"disable-timestamp\":null,\"enable-timestamp\":null,\"disable-error-stack\":null,\"enable-error-stack\":null,\"file\":{\"filename\":\"/home/tidb/tidb_deploy/tidb1/log/tidb.log\",\"max-size\":300,\"max-days\":0,\"max-backups\":0},\"enable-slow-log\":true,\"slow-query-file\":\"/home/tidb/tidb_deploy/tidb1/log/tidb_slow_query.log\",\"slow-threshold\":300,\"expensive-threshold\":10000,\"query-log-max-len\":4096,\"record-plan-in-slow-log\":1},\"security\":{\"skip-grant-table\":false,\"ssl-ca\":\"\",\"ssl-cert\":\"\",\"ssl-key\":\"\",\"require-secure-transport\":false,\"cluster-ssl-ca\":\"\",\"cluster-ssl-cert\":\"\",\"cluster-ssl-key\":\"\",\"cluster-verify-cn\":null,\"spilled-file-encryption-method\":\"plaintext\",\"enable-sem\":false},\"status\":{\"status-host\":\"0.0.0.0\",\"metrics-addr\":\"\",\"status-port\":10102,\"metrics-interval\":15,\"report-status\":true,\"record-db-qps\":false},\"performance\":{\"max-procs\":0,\"max-memory\":0,\"server-memory-quota\":0,\"memory-usage-alarm-ratio\":0.8,\"stats-lease\":\"3s\",\"stmt-count-limit\":5000,\"feedback-probability\":0,\"query-feedback-limit\":512,\"pseudo-estimate-ratio\":0.8,\"force-priority\":\"NO_PRIORITY\",\"bind-info-lease\":\"3s\",\"txn-entry-size-limit\":6291456,\"txn-total-size-limit\":104857600,\"tcp-keep-al
ive\":true,\"tcp-no-delay\":true,\"cross-join\":true,\"run-auto-analyze\":true,\"distinct-agg-push-down\":false,\"committer-concurrency\":128,\"max-txn-ttl\":3600000,\"mem-profile-interval\":\"1m\",\"index-usage-sync-lease\":\"0s\",\"gogc\":100,\"enforce-mpp\":false},\"prepared-plan-cache\":{\"enabled\":false,\"capacity\":100,\"memory-guard-ratio\":0.1},\"opentracing\":{\"enable\":false,\"rpc-metrics\":false,\"sampler\":{\"type\":\"const\",\"param\":1,\"sampling-server-url\":\"\",\"max-operations\":0,\"sampling-refresh-interval\":0},\"reporter\":{\"queue-size\":0,\"buffer-flush-interval\":0,\"log-spans\":false,\"local-agent-host-port\":\"\"}},\"proxy-protocol\":{\"networks\":\"\",\"header-timeout\":5},\"pd-client\":{\"pd-server-timeout\":3},\"tikv-client\":{\"grpc-connection-count\":4,\"grpc-keepalive-time\":10,\"grpc-keepalive-timeout\":3,\"grpc-compression-type\":\"none\",\"commit-timeout\":\"41s\",\"async-commit\":{\"keys-limit\":256,\"total-key-size-limit\":4096,\"safe-window\":2000000000,\"allowed-clock-drift\":500000000},\"max-batch-size\":128,\"overload-threshold\":200,\"max-batch-wait-time\":0,\"batch-wait-size\":8,\"enable-chunk-rpc\":true,\"region-cache-ttl\":600,\"store-limit\":0,\"store-liveness-timeout\":\"1s\",\"copr-cache\":{\"capacity-mb\":1000},\"ttl-refreshed-txn-size\":33554432},\"binlog\":{\"enable\":false,\"ignore-error\":false,\"write-timeout\":\"15s\",\"binlog-socket\":\"\",\"strategy\":\"range\"},\"compatible-kill-query\":false,\"plugin\":{\"dir\":\"\",\"load\":\"\"},\"pessimistic-txn\":{\"max-retry-count\":256,\"deadlock-history-capacity\":10},\"check-mb4-value-in-utf8\":true,\"max-index-length\":3072,\"index-limit\":64,\"table-column-count-limit\":1017,\"graceful-wait-before-shutdown\":0,\"alter-primary-key\":false,\"treat-old-version-utf8-as-utf8mb4\":true,\"enable-table-lock\":false,\"delay-clean-table-lock\":0,\"split-region-max-num\":1000,\"stmt-summary\":{\"enable\":true,\"enable-internal-query\":false,\"max-stmt-count\":3000,\"max-sql
-length\":4096,\"refresh-interval\":1800,\"history-size\":24},\"repair-mode\":false,\"repair-table-list\":[],\"isolation-read\":{\"engines\":[\"tikv\",\"tiflash\",\"tidb\"]},\"max-server-connections\":0,\"new_collations_enabled_on_first_bootstrap\":false,\"experimental\":{\"allow-expression-index\":false},\"enable-collect-execution-info\":true,\"skip-register-to-dashboard\":false,\"enable-telemetry\":true,\"labels\":{},\"enable-global-index\":false,\"deprecate-integer-display-length\":false,\"enable-enum-length-limit\":true,\"stores-refresh-interval\":60,\"enable-tcp4-only\":false,\"enable-forwarding\":false}"]
[2022/10/29 14:33:41.325 +08:00] [INFO] [main.go:322] ["disable Prometheus push client"]
[2022/10/29 14:33:41.325 +08:00] [INFO] [store.go:68] ["new store"] [path=tikv://10.246.177.103:2400,10.246.250.135:2400,10.246.177.102:2400]
[2022/10/29 14:33:41.325 +08:00] [INFO] [client.go:214] ["[pd] create pd client with endpoints"] [pd-address="[10.246.177.103:2400,10.246.250.135:2400,10.246.177.102:2400]"]
[2022/10/29 14:33:41.325 +08:00] [INFO] [systime_mon.go:25] ["start system time monitor"]
[2022/10/29 14:33:41.330 +08:00] [INFO] [base_client.go:334] ["[pd] update member urls"] [old-urls="[http://10.246.177.103:2400,http://10.246.250.135:2400,http://10.246.177.102:2400]"] [new-urls="[http://10.246.177.102:2400,http://10.246.177.103:2400,http://10.246.250.135:2400]"]
[2022/10/29 14:33:41.330 +08:00] [INFO] [base_client.go:346] ["[pd] switch leader"] [new-leader=http://10.246.177.102:2400] [old-leader=]
[2022/10/29 14:33:41.330 +08:00] [INFO] [base_client.go:126] ["[pd] init cluster id"] [cluster-id=7137515427758526124]
[2022/10/29 14:33:41.330 +08:00] [INFO] [client.go:238] ["[pd] create tso dispatcher"] [dc-location=global]
[2022/10/29 14:33:41.333 +08:00] [INFO] [store.go:74] ["new store with retry success"]
[2022/10/29 14:33:41.340 +08:00] [INFO] [tidb.go:70] ["new domain"] [store=tikv-7137515427758526124] ["ddl lease"=45s] ["stats lease"=3s] ["index usage sync lease"=0s]
[2022/10/29 14:33:41.349 +08:00] [INFO] [ddl.go:342] ["[ddl] start DDL"] [ID=7d713e88-0239-40e4-a584-6277590defa0] [runWorker=true]
[2022/10/29 14:33:41.349 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
[2022/10/29 14:33:41.353 +08:00] [INFO] [ddl.go:331] ["[ddl] start delRangeManager OK"] ["is a emulator"=false]
[2022/10/29 14:33:41.353 +08:00] [INFO] [ddl_worker.go:134] ["[ddl] start DDL worker"] [worker="worker 1, tp general"]
[2022/10/29 14:33:41.354 +08:00] [INFO] [ddl_worker.go:134] ["[ddl] start DDL worker"] [worker="worker 2, tp add index"]
[2022/10/29 14:33:41.785 +08:00] [INFO] [domain.go:155] ["full load InfoSchema success"] [currentSchemaVersion=0] [neededSchemaVersion=5293] ["start time"=416.187081ms]
[2022/10/29 14:33:41.788 +08:00] [INFO] [domain.go:370] ["full load and reset schema validator"]
[2022/10/29 14:33:41.798 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[bindinfo] /tidb/bindinfo/owner"]
[2022/10/29 14:33:41.798 +08:00] [WARN] [sysvar_cache.go:52] ["sysvar cache is empty, triggering rebuild"]
[2022/10/29 14:33:41.803 +08:00] [WARN] [cache.go:309] ["load mysql.user fail"] [error="[planner:1054]Unknown column 'create_role_priv' in 'field list'"]
[2022/10/29 14:33:41.803 +08:00] [FATAL] [terror.go:276] ["unexpected error"] [error="[privilege:8049]mysql.user"] [stack="github.com/pingcap/parser/terror.MustNil\n\t/root/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210618053735-57843e8185c4/terror/terror.go:276\nmain.createStoreAndDomain\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:276\nmain.main\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:182\nruntime.main\n\t/usr/local/go1.16.4/src/runtime/proc.go:225"] [stack="github.com/pingcap/parser/terror.MustNil\n\t/root/go/pkg/mod/github.com/pingcap/parser@v0.0.0-20210618053735-57843e8185c4/terror/terror.go:276\nmain.createStoreAndDomain\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:276\nmain.main\n\t/var/lib/docker/jenkins/workspace/build-common@4/go/src/github.com/pingcap/tidb/tidb-server/main.go:182\nruntime.main\n\t/usr/local/go1.16.4/src/runtime/proc.go:225"]
[2022/10/29 14:33:56.876 +08:00] [INFO] [printer.go:33] ["Welcome to TiDB."] ["Release Version"=v5.1.4] [Edition=Community] ["Git Commit Hash"=094b3e5e69d0921e2abe6907d217478bb5a7082d] ["Git Branch"=heads/refs/tags/v5.1.4] ["UTC Build Time"="2022-02-10 10:09:15"] [GoVersion=go1.16.4] ["Race Enabled"=false] ["Check Table Before Drop"=false] ["TiKV Min Version"=v3.0.0-60965b006877ca7234adaced7890d7b029ed1306]

["load mysql.user fail"] [error="[planner:1054]Unknown column 'create_role_priv' in 'field list'"]
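
For anyone digging through a long tidb.log, a small filter helps surface lines like the one quoted above. This is only a rough sketch: it assumes every entry starts with the `[time] [LEVEL] [file:line]` prefix seen in the log in this thread, and the name `find_problems` and the sample list are made up for illustration.

```python
import re

# Matches the "[time] [LEVEL] [file:line] rest" prefix used in the log above.
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$"
)

def find_problems(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (time, level, source, rest) tuples for lines at the given levels."""
    hits = []
    for line in lines:
        m = LOG_LINE.match(line.rstrip("\n"))
        if m and m.group("level") in levels:
            hits.append((m.group("time"), m.group("level"),
                         m.group("source"), m.group("rest")))
    return hits

# Three lines copied from the log in this thread (the FATAL stack is trimmed).
sample = [
    '[2022/10/29 14:33:41.798 +08:00] [INFO] [manager.go:188] ["start campaign owner"] [ownerInfo="[bindinfo] /tidb/bindinfo/owner"]',
    '[2022/10/29 14:33:41.803 +08:00] [WARN] [cache.go:309] ["load mysql.user fail"] [error="[planner:1054]Unknown column \'create_role_priv\' in \'field list\'"]',
    '[2022/10/29 14:33:41.803 +08:00] [FATAL] [terror.go:276] ["unexpected error"] [error="[privilege:8049]mysql.user"]',
]

for time, level, source, rest in find_problems(sample):
    print(level, source, rest)
```

To run it against a real file, pass an open file object instead of `sample`, for example `find_problems(open("tidb.log", encoding="utf-8"))`.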

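Still the metadata that is lost~ Rebuild it… :joy:

The data is fairly important, otherwise I would have rebuilt already :joy:

This is system-level metadata that is missing, so the service basically cannot start normally :joy:

:joy: I'll keep looking.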

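If you use new TiKV nodes it will probably start, but the new nodes will not have the previous data, and you will not be able to get that data back.

For testing, try to use a few more VMs or physical machines so that things are at least isolated; then if one node dies you can still scale out to recover…

It is three physical machines, each running one instance of every component, and the one that died happened to be the TiUP control machine.

Three machines is a bit few. It would be better to build a VM cluster and run TiDB on VMs, as long as the hardware and the network speed are sufficient.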

I took a different approach:

  1. Created a new cluster
  2. PD and TiKV were both healthy, so I used BR to take a full backup and restore it into the new cluster
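
The two steps above roughly correspond to a `br backup full` against the old PD followed by a `br restore full` against the new one. The sketch below only builds and prints the command lines. The old PD address is taken from the log earlier in the thread; the new PD address and the storage path are hypothetical placeholders, and `br_command` is a helper name made up for this example. As far as I know, BR should match the cluster version, and with `local://` storage the backup files end up on each TiKV node and have to be made available to every TiKV node of the new cluster before restoring.

```python
import shlex

def br_command(action, pd_addr, storage):
    """Build the argument list for `br backup full` or `br restore full`."""
    if action not in ("backup", "restore"):
        raise ValueError("action must be 'backup' or 'restore'")
    return ["br", action, "full", "--pd", pd_addr, "--storage", storage]

old_pd = "10.246.177.103:2400"        # PD endpoint of the old cluster, from the log
new_pd = "127.0.0.1:2379"             # hypothetical PD endpoint of the new cluster
storage = "local:///data/br_backup"   # hypothetical backup location

for cmd in (br_command("backup", old_pd, storage),
            br_command("restore", new_pd, storage)):
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Once the printed commands look right, each list can be handed to `subprocess.run(cmd, check=True)` on a machine that has the `br` binary installed.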