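After one node restarted, the whole TiDB cluster failed

[TiDB Environment] Kylin V10 SP1 (ARM), three-node deployment: 3 PD, 3 TiKV, 3 TiDB
[Overview] TiDB 6.5.0 is deployed on Kubernetes; the failure of a single PD pod left TiDB unable to serve requests
[Background] One physical node was restarted
[Symptom] All microservices failed to connect to the database
[Problem] TiDB cannot provide service
[Business Impact] System outage
[TiDB Version] 6.5.0
[Application Software and Version]
[TiDB Operator] 1.4.3
[K8s] 1.20.7
[Attachments]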
/pd-server --data-dir=/var/lib/pd --name=basic-pd-2 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379 --config=/etc/pd/pd.toml

[root@pf-test-2 ~]# kubectl -n my-namespace logs basic-pd-2 pd -f
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name: basic-pd-2.basic-pd-peer.my-namespace.svc
Address 1: 100.87.244.178 basic-pd-2.basic-pd-peer.my-namespace.svc.cluster.local
nslookup domain basic-pd-2.basic-pd-peer.my-namespace.svc.svc success
starting pd-server ...
/pd-server --data-dir=/var/lib/pd --name=basic-pd-2 --peer-urls=http://0.0.0.0:2380 --advertise-peer-urls=http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380 --client-urls=http://0.0.0.0:2379 --advertise-client-urls=http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379 --config=/etc/pd/pd.toml --join=http://basic-pd-1.basic-pd-peer.my-namespace.svc:2380,http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380,http://basic-pd-0.basic-pd-peer.my-namespace.svc:2380
[2023/07/22 07:30:03.600 +00:00] [INFO] [util.go:41] ["Welcome to Placement Driver (PD)"]
[2023/07/22 07:30:03.601 +00:00] [INFO] [util.go:42] [PD] [release-version=v6.5.0]
[2023/07/22 07:30:03.601 +00:00] [INFO] [util.go:43] [PD] [edition=Community]
[2023/07/22 07:30:03.601 +00:00] [INFO] [util.go:44] [PD] [git-hash=d1a4433c3126c77fb2d5bb5720eefa0f2e05c166]
[2023/07/22 07:30:03.601 +00:00] [INFO] [util.go:45] [PD] [git-branch=heads/refs/tags/v6.5.0]
[2023/07/22 07:30:03.601 +00:00] [INFO] [util.go:46] [PD] [utc-build-time="2022-12-05 01:43:11"]
[2023/07/22 07:30:03.601 +00:00] [INFO] [metricutil.go:83] ["disable Prometheus push client"]
[2023/07/22 07:30:03.601 +00:00] [INFO] [server.go:247] ["PD Config"] [config="{"client-urls":"http://0.0.0.0:2379","peer-urls":"http://0.0.0.0:2380","advertise-client-urls":"http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379","advertise-peer-urls":"http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380","name":"basic-pd-2","data-dir":"/var/lib/pd","force-new-cluster":false,"enable-grpc-gateway":true,"initial-cluster":"basic-pd-1=http://basic-pd-1.basic-pd-peer.my-namespace.svc:2380,basic-pd-2=http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380,basic-pd-0=http://basic-pd-0.basic-pd-peer.my-namespace.svc:2380","initial-cluster-state":"existing","initial-cluster-token":"pd-cluster","join":"http://basic-pd-1.basic-pd-peer.my-namespace.svc:2380,http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380,http://basic-pd-0.basic-pd-peer.my-namespace.svc:2380","lease":3,"log":{"level":"info","format":"text","disable-timestamp":false,"file":{"filename":"","max-size":0,"max-days":0,"max-backups":0},"development":false,"disable-caller":false,"disable-stacktrace":false,"disable-error-verbose":true,"sampling":null,"error-output-path":""},"tso-save-interval":"3s","tso-update-physical-interval":"50ms","enable-local-tso":false,"metric":{"job":"basic-pd-2","address":"","interval":"15s"},"schedule":{"max-snapshot-count":64,"max-pending-peer-count":64,"max-merge-region-size":20,"max-merge-region-keys":0,"split-merge-interval":"1h0m0s","swtich-witness-interval":"1h0m0s","enable-one-way-merge":"false","enable-cross-table-merge":"true","patrol-region-interval":"10ms","max-store-down-time":"30m0s","max-store-preparing-time":"48h0m0s","leader-schedule-limit":4,"leader-schedule-policy":"count","region-schedule-limit":2048,"replica-schedule-limit":64,"merge-schedule-limit":8,"hot-region-schedule-limit":4,"hot-region-cache-hits-threshold":3,"store-limit":{},"tolerant-size-ratio":0,"low-space-ratio":0.8,"high-space-ratio":0.7,"region-score-formula-version":"v2","scheduler-max-waiting-operator":5,"enable-remove-down-replica":"true","enable-replace-offline-replica":"true","enable-make-up-replica":"true","enable-remove-extra-replica":"true","enable-location-replacement":"true","enable-debug-metrics":"false","enable-joint-consensus":"true","enable-tikv-split-region":"true","schedulers-v2":[{"type":"balance-region","args":null,"disable":false,"args-payload":""},{"type":"balance-leader","args":null,"disable":false,"args-payload":""},{"type":"hot-region","args":null,"disable":false,"args-payload":""},{"type":"split-bucket","args":null,"disable":false,"args-payload":""}],"schedulers-payload":null,"store-limit-mode":"manual","hot-regions-write-interval":"10m0s","hot-regions-reserved-days":7,"enable-diagnostic":"false","enable-witness":"false"},"replication":{"max-replicas":3,"location-labels":"","strictly-match-label":"false","enable-placement-rules":"true","enable-placement-rules-cache":"false","isolation-level":""},"pd-server":{"use-region-storage":"true","max-gap-reset-ts":"24h0m0s","key-type":"table","runtime-services":"","metric-storage":"","dashboard-address":"auto","trace-region-flow":"true","flow-round-by-digit":3,"min-resolved-ts-persistence-interval":"1s"},"cluster-version":"0.0.0","labels":{},"quota-backend-bytes":"8GiB","auto-compaction-mode":"periodic","auto-compaction-retention-v2":"1h","TickInterval":"500ms","ElectionInterval":"3s","PreVote":true,"max-request-bytes":157286400,"security":{"cacert-path":"","cert-path":"","key-path":"","cert-allowed-cn":null,"SSLCABytes":null,"SSLCertBytes":null,"SSLKEYBytes":null,"redact-info-log":false,"encryption":{"data-encryption-method":"plaintext","data-key-rotation-period":"168h0m0s","master-key":{"type":"plaintext","key-id":"","region":"","endpoint":"","path":""}}},"label-property":null,"WarningMsgs":null,"DisableStrictReconfigCheck":false,"HeartbeatStreamBindInterval":"1m0s","LeaderPriorityCheckInterval":"1m0s","dashboard":{"tidb-cacert-path":"","tidb-cert-path":"","tidb-key-path":"","public-path-prefix":"","internal-proxy":false,"enable-telemetry":true,"enable-experimental":false},"replication-mode":{"replication-mode":"majority","dr-auto-sync":{"label-key":"","primary":"","dr":"","primary-replicas":0,"dr-replicas":0,"wait-store-timeout":"1m0s","pause-region-split":"false"}}}"]
[2023/07/22 07:30:03.608 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/pd/api/v1]
[2023/07/22 07:30:03.608 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/pd/api/v2/]
[2023/07/22 07:30:03.608 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/swagger/]
[2023/07/22 07:30:03.608 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/autoscaling]
[2023/07/22 07:30:03.608 +00:00] [INFO] [distro.go:51] ["Using distribution strings"] [strings={}]
[2023/07/22 07:30:03.609 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/dashboard/api/]
[2023/07/22 07:30:03.609 +00:00] [INFO] [server.go:222] ["register REST path"] [path=/dashboard/]
[2023/07/22 07:30:03.610 +00:00] [INFO] [etcd.go:117] ["configuring peer listeners"] [listen-peer-urls="[http://0.0.0.0:2380]"]
[2023/07/22 07:30:03.610 +00:00] [INFO] [systimemon.go:28] ["start system time monitor"]
[2023/07/22 07:30:03.610 +00:00] [INFO] [etcd.go:127] ["configuring client listeners"] [listen-client-urls="[http://0.0.0.0:2379]"]
[2023/07/22 07:30:03.610 +00:00] [INFO] [etcd.go:611] ["pprof is enabled"] [path=/debug/pprof]
[2023/07/22 07:30:03.610 +00:00] [INFO] [etcd.go:305] ["starting an etcd server"] [etcd-version=3.4.21] [git-sha="Not provided (use ./build instead of go build)"] [go-version=go1.19.3] [go-os=linux] [go-arch=arm64] [max-cpu-set=64] [max-cpu-available=64] [member-initialized=true] [name=basic-pd-2] [data-dir=/var/lib/pd] [wal-dir=] [wal-dir-dedicated=] [member-dir=/var/lib/pd/member] [force-new-cluster=false] [heartbeat-interval=500ms] [election-timeout=3s] [initial-election-tick-advance=true] [snapshot-count=100000] [snapshot-catchup-entries=5000] [initial-advertise-peer-urls="[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380]"] [listen-peer-urls="[http://0.0.0.0:2380]"] [advertise-client-urls="[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379]"] [listen-client-urls="[http://0.0.0.0:2379]"] [listen-metrics-urls="[]"] [cors="[]"] [host-whitelist="[]"] [initial-cluster=] [initial-cluster-state=existing] [initial-cluster-token=] [quota-backend-bytes=8589934592] [max-request-bytes=157286400] [max-concurrent-streams=4294967295] [pre-vote=true] [initial-corrupt-check=false] [corrupt-check-time-interval=0s] [auto-compaction-mode=periodic] [auto-compaction-retention=1h0m0s] [auto-compaction-interval=1h0m0s] [discovery-url=] [discovery-proxy=]
[2023/07/22 07:30:03.610 +00:00] [WARN] [server.go:297] ["exceeded recommended request limit"] [max-request-bytes=157286400] [max-request-size="157 MB"] [recommended-request-bytes=10485760] [recommended-request-size="10 MB"]
2023-07-22 07:30:03.610906 W | pkg/fileutil: check file permission: directory "/var/lib/pd" exist, but the permission is "drwxr-xr-x". The recommended permission is "-rwx------" to prevent possible unprivileged access to the data.
[2023/07/22 07:30:03.633 +00:00] [INFO] [backend.go:80] ["opened backend db"] [path=/var/lib/pd/member/snap/db] [took=22.401135ms]
[2023/07/22 07:30:04.558 +00:00] [INFO] [server.go:462] ["recovered v2 store from snapshot"] [snapshot-index=500005] [snapshot-size="9.8 kB"]
[2023/07/22 07:30:04.559 +00:00] [INFO] [kvstore.go:388] ["restored last compact revision"] [meta-bucket-name=meta] [meta-bucket-name-key=finishedCompactRev] [restored-compact-revision=472722]
[2023/07/22 07:30:04.603 +00:00] [INFO] [server.go:480] ["recovered v3 backend from snapshot"] [backend-size-bytes=9961472] [backend-size="10 MB"] [backend-size-in-use-bytes=6422528] [backend-size-in-use="6.4 MB"]
[2023/07/22 07:30:04.706 +00:00] [INFO] [raft.go:586] ["restarting local member"] [cluster-id=95d1ea70524eb4dc] [local-member-id=97f207d1729a6d18] [commit-index=500214]
[2023/07/22 07:30:04.706 +00:00] [INFO] [raft.go:1523] ["97f207d1729a6d18 switched to configuration voters=(6347619308011026086 10948822240243379480 12637012018295520321)"]
[2023/07/22 07:30:04.706 +00:00] [INFO] [raft.go:706] ["97f207d1729a6d18 became follower at term 4"]
[2023/07/22 07:30:04.706 +00:00] [INFO] [raft.go:389] ["newRaft 97f207d1729a6d18 [peers: [58174621276876a6,97f207d1729a6d18,af5faf7a14d8bc41], term: 4, commit: 500214, applied: 500005, lastindex: 500215, lastterm: 4]"]
[2023/07/22 07:30:04.707 +00:00] [INFO] [capability.go:76] ["enabled capabilities for version"] [cluster-version=3.4]
[2023/07/22 07:30:04.707 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=95d1ea70524eb4dc] [local-member-id=97f207d1729a6d18] [recovered-remote-peer-id=58174621276876a6] [recovered-remote-peer-urls="[http://basic-pd-1.basic-pd-peer.my-namespace.svc:2380]"]
[2023/07/22 07:30:04.707 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=95d1ea70524eb4dc] [local-member-id=97f207d1729a6d18] [recovered-remote-peer-id=97f207d1729a6d18] [recovered-remote-peer-urls="[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380]"]
[2023/07/22 07:30:04.707 +00:00] [INFO] [cluster.go:256] ["recovered/added member from store"] [cluster-id=95d1ea70524eb4dc] [local-member-id=97f207d1729a6d18] [recovered-remote-peer-id=af5faf7a14d8bc41] [recovered-remote-peer-urls="[http://basic-pd-0.basic-pd-peer.my-namespace.svc:2380]"]
[2023/07/22 07:30:04.707 +00:00] [INFO] [cluster.go:269] ["set cluster version from store"] [cluster-version=3.4]
[2023/07/22 07:30:04.707 +00:00] [WARN] [store.go:1379] ["simple token is not cryptographically signed"]
[2023/07/22 07:30:04.708 +00:00] [INFO] [kvstore.go:388] ["restored last compact revision"] [meta-bucket-name=meta] [meta-bucket-name-key=finishedCompactRev] [restored-compact-revision=472722]
[2023/07/22 07:30:04.752 +00:00] [INFO] [quota.go:126] ["enabled backend quota"] [quota-name=v3-applier] [quota-size-bytes=8589934592] [quota-size="8.6 GB"]
[2023/07/22 07:30:04.752 +00:00] [INFO] [peer.go:128] ["starting remote peer"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.752 +00:00] [INFO] [pipeline.go:71] ["started HTTP pipelining with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.753 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.753 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.754 +00:00] [INFO] [peer.go:134] ["started remote peer"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.754 +00:00] [INFO] [transport.go:327] ["added remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6] [remote-peer-urls="[http://basic-pd-1.basic-pd-peer.my-namespace.svc:2380]"]
[2023/07/22 07:30:04.754 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.754 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.754 +00:00] [INFO] [peer.go:128] ["starting remote peer"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.754 +00:00] [INFO] [pipeline.go:71] ["started HTTP pipelining with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.754 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.754 +00:00] [INFO] [stream.go:166] ["started stream writer with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.755 +00:00] [INFO] [peer.go:134] ["started remote peer"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.755 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.755 +00:00] [INFO] [transport.go:327] ["added remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41] [remote-peer-urls="[http://basic-pd-0.basic-pd-peer.my-namespace.svc:2380]"]
[2023/07/22 07:30:04.755 +00:00] [INFO] [server.go:803] ["starting etcd server"] [local-member-id=97f207d1729a6d18] [local-server-version=3.4.21] [cluster-id=95d1ea70524eb4dc] [cluster-version=3.4]
[2023/07/22 07:30:04.755 +00:00] [INFO] [stream.go:406] ["started stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.755 +00:00] [INFO] [server.go:704] ["starting initial election tick advance"] [election-ticks=6]
[2023/07/22 07:30:04.757 +00:00] [WARN] [server.go:1078] ["server error"] [error="the member has been permanently removed from the cluster"]
[2023/07/22 07:30:04.757 +00:00] [WARN] [server.go:1079] ["data-dir used by this member must be removed"]
[2023/07/22 07:30:04.757 +00:00] [WARN] [server.go:2084] ["stopped publish because server is stopped"] [local-member-id=97f207d1729a6d18] [local-member-attributes="{Name:basic-pd-2 ClientURLs:[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379]}"] [publish-timeout=11s] [error="etcdserver: server stopped"]
[2023/07/22 07:30:04.758 +00:00] [INFO] [peer.go:333] ["stopping remote peer"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [WARN] [stream.go:301] ["stopped TCP streaming connection with remote peer"] [stream-writer-type="unknown stream"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [WARN] [stream.go:301] ["stopped TCP streaming connection with remote peer"] [stream-writer-type="unknown stream"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [INFO] [pipeline.go:86] ["stopped HTTP pipelining with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [INFO] [stream.go:459] ["stopped stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [INFO] [stream.go:459] ["stopped stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=97f207d1729a6d18] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [INFO] [peer.go:340] ["stopped remote peer"] [remote-peer-id=58174621276876a6]
[2023/07/22 07:30:04.758 +00:00] [INFO] [peer.go:333] ["stopping remote peer"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [WARN] [stream.go:301] ["stopped TCP streaming connection with remote peer"] [stream-writer-type="unknown stream"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [WARN] [stream.go:301] ["stopped TCP streaming connection with remote peer"] [stream-writer-type="unknown stream"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [INFO] [pipeline.go:86] ["stopped HTTP pipelining with remote peer"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [INFO] [stream.go:459] ["stopped stream reader with remote peer"] [stream-reader-type="stream MsgApp v2"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [INFO] [etcd.go:585] ["serving peer traffic"] [address="[::]:2380"]
[2023/07/22 07:30:04.758 +00:00] [INFO] [stream.go:459] ["stopped stream reader with remote peer"] [stream-reader-type="stream Message"] [local-member-id=97f207d1729a6d18] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [INFO] [peer.go:340] ["stopped remote peer"] [remote-peer-id=af5faf7a14d8bc41]
[2023/07/22 07:30:04.758 +00:00] [INFO] [etcd.go:247] ["now serving peer/client/metrics"] [local-member-id=97f207d1729a6d18] [initial-advertise-peer-urls="[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2380]"] [listen-peer-urls="[http://0.0.0.0:2380]"] [advertise-client-urls="[http://basic-pd-2.basic-pd-peer.my-namespace.svc:2379]"] [listen-client-urls="[http://0.0.0.0:2379]"] [listen-metrics-urls="[]"]
[2023/07/22 07:35:03.610 +00:00] [FATAL] [main.go:117] ["run server failed"] [error="[PD:server:ErrCancelStartEtcd]etcd start canceled"] [stack="main.main\n\t/var/lib/docker/jenkins/workspace/build-common/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:117\nruntime.main\n\t/usr/local/go1.19.3/src/runtime/proc.go:250"]

Is only one PD abnormal, or are all of them?

Only one PD is abnormal.

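Based on the logs provided, the PD server basic-pd-2 appears to have run into a problem from which it cannot recover on its own. It reports that this member has been permanently removed from the cluster and that the data directory it uses must be removed. This suggests that the PD cluster could not maintain consensus after losing a member, which most likely made the TiDB cluster unavailable.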
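
PD embeds an etcd server (the log reports etcd-version=3.4.21), and etcd only commits while a majority of its voting members is available; the "switched to configuration voters=(...)" line lists three voters. A minimal sketch of that majority arithmetic, with illustrative helper names that are not part of pd-ctl or TiDB Operator:

```shell
#!/usr/bin/env bash
# Raft/etcd majority arithmetic for a cluster of n voting members.
# Helper names are hypothetical, for illustration only.

# Majority size: floor(n/2) + 1.
pd_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many members may fail while a majority still remains.
pd_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Exit status 0 if `alive` out of `n` members still form a majority.
pd_has_quorum() {
  local n=$1 alive=$2
  (( alive >= n / 2 + 1 ))
}
```

With three voters, `pd_quorum 3` prints 2 and `pd_tolerated_failures 3` prints 1. So losing only basic-pd-2 should still leave basic-pd-0 and basic-pd-1 with a majority, provided both are healthy and can reach each other; if TiDB stopped serving anyway, the state of those two members deserves a close look.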

Here are some steps you can take to investigate and resolve the problem:

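Check the PD cluster status: verify whether all PD members are running and healthy. Because basic-pd-2 keeps exiting, run pd-ctl from a PD pod that is still up, for example:

kubectl -n my-namespace exec basic-pd-0 -c pd -- ./pd-ctl -u http://127.0.0.1:2379 member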

Check the logs: inspect the logs of all PD pods (basic-pd-0, basic-pd-1 and basic-pd-2) for more detail about errors or failures. You can view them with the following commands:

kubectl -n my-namespace logs basic-pd-0 pd
kubectl -n my-namespace logs basic-pd-1 pd
kubectl -n my-namespace logs basic-pd-2 pd
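
Because basic-pd-2 exits with FATAL and is restarted, `kubectl logs --previous` shows the output of the previous (crashed) container instance. A minimal sketch that collects both the previous and current logs of each PD pod and keeps only the WARN/ERROR/FATAL entries; the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Keep pod headers and WARN/ERROR/FATAL entries from PD log text on stdin.
pd_log_problems() {
  grep -E '^===|\[(WARN|ERROR|FATAL)\]'
}

# Dump the previous (crashed) and current logs of the pd container of each pod.
collect_pd_logs() {
  local ns=$1; shift
  local pod
  for pod in "$@"; do
    echo "=== $pod (previous) ==="
    kubectl -n "$ns" logs "$pod" -c pd --previous 2>/dev/null || echo "(no previous instance)"
    echo "=== $pod (current) ==="
    kubectl -n "$ns" logs "$pod" -c pd
  done
}

# Usage: collect_pd_logs my-namespace basic-pd-0 basic-pd-1 basic-pd-2 | pd_log_problems
```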

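Investigate the node restart: review the events and logs related to the restarted physical node. Look for signs of hardware problems, and check whether the PD pod on that node hit any errors during the restart.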

Check Kubernetes events: use the following command to view events related to the PD pods and TiDB Operator:

kubectl -n my-namespace get events --sort-by='.metadata.creationTimestamp'
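
To narrow the output to the failing PD pod, `kubectl get events` accepts a field selector on `involvedObject.name`. A minimal sketch with a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# List events for one object (for example a PD pod), oldest first.
pod_events() {
  local ns=$1 name=$2
  kubectl -n "$ns" get events \
    --field-selector "involvedObject.name=$name" \
    --sort-by='.metadata.creationTimestamp'
}

# Usage: pod_events my-namespace basic-pd-2
```

Events are only retained for a limited time (kube-apiserver `--event-ttl`, one hour by default), so they are best collected soon after an incident.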

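Handle the missing member: if a member is indeed missing and cannot recover automatically, you may need to manually remove the unavailable member from the PD cluster and restore quorum. This involves removing the unavailable member and starting a new cluster from the remaining healthy members. Do this with care, and back up your data before proceeding.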
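
For the situation in the log above (basic-pd-2 has been removed from the member list but still has its old data directory), the "remove, wipe, re-join" flow can be sketched as below. This is a sketch under assumptions: pd-ctl lives at `./pd-ctl` in the `pd` container; the PVC name is passed in explicitly (with a StatefulSet it is usually `<claim-template>-<pod>`, for example pd-basic-pd-2, but check `kubectl get pvc`); and deleting the PVC discards that member's PD data, so confirm the other two members are healthy first. Helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run only after confirming the remaining PD members are healthy.

# Remove a PD member by name, via pd-ctl in a PD pod that is still running.
pd_remove_member() {
  local ns=$1 healthy_pod=$2 member=$3
  kubectl -n "$ns" exec "$healthy_pod" -c pd -- \
    ./pd-ctl -u http://127.0.0.1:2379 member delete name "$member"
}

# Discard the removed member's data: delete its PVC without waiting (the pod
# still holds it), then the pod, so the StatefulSet recreates both empty.
pd_reset_member() {
  local ns=$1 pod=$2 pvc=$3
  kubectl -n "$ns" delete pvc "$pvc" --wait=false
  kubectl -n "$ns" delete pod "$pod"
}

# Usage (names are examples):
#   pd_remove_member my-namespace basic-pd-0 basic-pd-2
#   pd_reset_member my-namespace basic-pd-2 pd-basic-pd-2
```

Since this log already says the member was permanently removed, the `member delete` step may simply report that the member does not exist. With local persistent volumes, the underlying PV may also need manual cleanup.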

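Verify the configuration: make sure the PD cluster is configured correctly and that every PD instance points to the correct endpoints of the others. Pay particular attention to the join parameter and make sure it points to the correct URLs.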
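
One way to check the join parameter is to extract the URLs from the pd-server command line printed at startup and confirm that each peer host resolves. A minimal sketch; it assumes `getent` is available where it runs, and because the `.svc` names resolve only through cluster DNS, it has to run from a pod inside the cluster. Helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Print each URL from the --join= flag of a pd-server command line on stdin.
pd_join_urls() {
  grep -Eo -- '--join=[^[:space:]]+' | sed 's/^--join=//' | tr ',' '\n'
}

# Strip the scheme and port from a URL, leaving the host name.
url_host() {
  local u=${1#*://}
  echo "${u%%:*}"
}

# For each URL on stdin, report whether its host resolves.
check_join_hosts() {
  local url host
  while read -r url; do
    [ -z "$url" ] && continue
    host=$(url_host "$url")
    if getent hosts "$host" >/dev/null; then
      printf '%s %s\n' ok "$host"
    else
      printf '%s %s\n' FAILED "$host"
    fi
  done
}
```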

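Monitor hardware resources: check the CPU, memory, disk and other resources on the physical nodes that host the PD pods, and verify that no resource-related problem could have caused the failure.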

Remove it from the cluster and add it back.

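This is the PD log from another node.
The original failure scene is gone: we recovered by deleting the failed PD's data and restarting it.
But a cluster must keep running reliably when any single node fails; that is the most basic requirement. So we still need to find the root cause.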
lgg_pd_log (5.1 MB)
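
The requirement above (surviving the loss of any single node) holds for a three-member PD only if no physical node hosts two or more PD pods; otherwise restarting that node removes a majority at once. The post does not show how the PD pods were placed, so this is a check rather than a diagnosis. A minimal sketch that reads "POD NODE" pairs and flags any node whose loss would break the majority; the label selector in the usage comment (app.kubernetes.io/component=pd, app.kubernetes.io/instance=basic) is an assumption based on TiDB Operator's usual labels, and the helper name is hypothetical:

```shell
#!/usr/bin/env bash
# Read "POD NODE" pairs on stdin (one PD pod per line) and print, per node,
# how many PD pods it hosts and whether losing that node keeps a Raft majority.
pd_node_risk() {
  awk '
    NF >= 2 { total++; per_node[$2]++ }
    END {
      quorum = int(total / 2) + 1
      for (node in per_node) {
        status = (total - per_node[node] >= quorum) ? "ok" : "LOSES-QUORUM"
        printf "%s %d %s\n", node, per_node[node], status
      }
    }'
}

# Usage:
#   kubectl -n my-namespace get pod \
#     -l app.kubernetes.io/component=pd,app.kubernetes.io/instance=basic \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName --no-headers | pd_node_risk
```

If any node shows LOSES-QUORUM, pod anti-affinity on the PD pods would be the usual way to spread them across nodes.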