TiKV in a cluster on K8s cannot be taken offline automatically.

  • [TiDB version]:
  • [Problem description]:

The configuration of the whole cluster is as follows:

[root@dcn-tidb-k8s-p-l-11:/home/appdeploy/test-tidb1]#kubectl get tc -n test-namespace1
NAME     READY   PD                                        STORAGE   READY   DESIRE   TIKV                                        STORAGE   READY   DESIRE   TIDB                                        READY   DESIRE   AGE
basic1   False   harbor.fcbox.com/tidb/pingcap/pd:v4.0.6   1Gi       3       3        harbor.fcbox.com/tidb/pingcap/tikv:v4.0.6   1Gi       4       3        harbor.fcbox.com/tidb/pingcap/tidb:v4.0.6   3       3        104d

The deployment is shown below. Two TiKV pods have failed and cannot be taken offline and released automatically.

[root@dcn-tidb-k8s-p-l-11:/home/appdeploy/test-tidb1]#kubectl get po -n test-namespace1
NAME                                READY   STATUS             RESTARTS   AGE
basic1-discovery-56bd576c8b-v8hk9   1/1     Running            0          96d
basic1-pd-0                         1/1     Running            0          22h
basic1-pd-1                         1/1     Running            0          22h
basic1-pd-2                         1/1     Running            0          22h
basic1-pump-0                       1/1     Running            0          22h
basic1-pump-1                       1/1     Running            0          22h
basic1-pump-2                       1/1     Running            2          22h
basic1-tidb-0                       2/2     Running            0          22h
basic1-tidb-1                       2/2     Running            0          22h
basic1-tidb-2                       2/2     Running            2          22h
basic1-tikv-0                       1/1     Running            0          22h
basic1-tikv-1                       0/1     CrashLoopBackOff   16         22h
basic1-tikv-2                       0/1     CrashLoopBackOff   244        22h
basic1-tikv-3                       1/1     Running            0          19h
basic1-tikv-4                       1/1     Running            0          26m
meta1-monitor-776d4fbf4c-mz42j      3/3     Running            0          95d
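
To pull only the failing pods out of a listing like the one above, a small filter may help. This is a minimal sketch: the function name `failing_pods` is hypothetical, and it assumes the default `kubectl get po` layout in which STATUS is the third whitespace-separated column.

```shell
# failing_pods: read `kubectl get po` output on stdin and print the names
# of pods whose STATUS is neither Running nor Completed.
# Hypothetical helper; assumes STATUS is the 3rd column and skips the header row.
failing_pods() {
  awk '$1 != "NAME" && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Possible usage against the namespace from this post:
#   kubectl get po -n test-namespace1 | failing_pods
```

For the listing above this would report basic1-tikv-1 and basic1-tikv-2.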

The reported error appears to be a disk mount problem, probably caused by pods sharing the same disk.

[root@dcn-tidb-k8s-p-l-11:/home/appdeploy/test-tidb1]#kubectl logs basic1-tikv-1 -n test-namespace1
starting tikv-server ...
/tikv-server --pd=http://basic1-pd:2379 --advertise-addr=basic1-tikv-1.basic1-tikv-peer.test-namespace1.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml

[2020/12/30 16:30:51.027 +08:00] [INFO] [lib.rs:92] ["Welcome to TiKV"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] []
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Release Version:   4.0.6"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Edition:           Community"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Git Commit Hash:   ca2475bfbcb49a7c34cf783596acb3edd05fc88f"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Git Commit Branch: release-4.0"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["UTC Build Time:    2020-09-15 10:51:45"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Rust Version:      rustc 1.42.0-nightly (0de96d37f 2019-12-19)"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Enable Features:   jemalloc portable sse protobuf-codec"]
[2020/12/30 16:30:51.028 +08:00] [INFO] [lib.rs:94] ["Profile:           dist_release"]
[2020/12/30 16:30:51.040 +08:00] [INFO] [mod.rs:46] ["memory limit in bytes: 269966053376, cpu cores quota: 64"]
[2020/12/30 16:30:51.041 +08:00] [INFO] [config.rs:565] ["kernel parameters"] [value=0] [param=vm.swappiness]
[2020/12/30 16:30:51.041 +08:00] [WARN] [server.rs:852] ["check: kernel"] [err="kernel parameters net.core.somaxconn got 128, expect 32768"]
[2020/12/30 16:30:51.042 +08:00] [WARN] [server.rs:852] ["check: kernel"] [err="kernel parameters net.ipv4.tcp_syncookies got 1, expect 0"]
[2020/12/30 16:30:51.044 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://basic1-pd:2379]
[2020/12/30 16:30:51.044 +08:00] [INFO] [<unknown>] ["Disabling AF_INET6 sockets because ::1 is not available."]
[2020/12/30 16:30:51.046 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7f235ce182a0 for subchannel 0x7f235ce39000"]
[2020/12/30 16:30:51.047 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://basic1-pd-0.basic1-pd-peer.test-namespace1.svc:2379]
[2020/12/30 16:30:51.049 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7f235ce18450 for subchannel 0x7f235ce391c0"]
[2020/12/30 16:30:51.050 +08:00] [INFO] [util.rs:419] ["connecting to PD endpoint"] [endpoints=http://basic1-pd-2.basic1-pd-peer.test-namespace1.svc:2379]
[2020/12/30 16:30:51.052 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7f235ce18600 for subchannel 0x7f235ce39380"]
[2020/12/30 16:30:51.053 +08:00] [INFO] [util.rs:484] ["connected to PD leader"] [endpoints=http://basic1-pd-2.basic1-pd-peer.test-namespace1.svc:2379]
[2020/12/30 16:30:51.053 +08:00] [INFO] [util.rs:407] ["all PD endpoints are consistent"] [endpoints="[\"http://basic1-pd:2379\"]"]
[2020/12/30 16:30:51.054 +08:00] [INFO] [server.rs:242] ["connect to PD cluster"] [cluster_id=6873013461546630258]
[2020/12/30 16:30:51.054 +08:00] [INFO] [config.rs:1674] ["readpool.storage.use-unified-pool is not set, set to false by default"]
[2020/12/30 16:30:51.054 +08:00] [INFO] [config.rs:1717] ["readpool.coprocessor.use-unified-pool is not set, set to true by default"]
[2020/12/30 16:30:51.055 +08:00] [INFO] [config.rs:211] ["no advertise-status-addr is specified, falling back to status-addr"] [status-addr=0.0.0.0:20180]
[2020/12/30 16:30:51.296 +08:00] [INFO] [server.rs:860] ["beginning system configuration check"]
[2020/12/30 16:30:51.296 +08:00] [INFO] [config.rs:745] ["data dir"] [mount_fs="FsInfo { tp: \"ext4\", opts: \"rw,noatime,nodelalloc,stripe=64,data=ordered\", mnt_dir: \"/var/lib/tikv\", fsname: \"/dev/sdd\" }"] [data_path=/var/lib/tikv]
[2020/12/30 16:30:51.296 +08:00] [WARN] [config.rs:748] ["not on SSD device"] [data_path=/var/lib/tikv]
[2020/12/30 16:30:51.296 +08:00] [INFO] [config.rs:745] ["data dir"] [mount_fs="FsInfo { tp: \"ext4\", opts: \"rw,noatime,nodelalloc,stripe=64,data=ordered\", mnt_dir: \"/var/lib/tikv\", fsname: \"/dev/sdd\" }"] [data_path=/var/lib/tikv/raft]
[2020/12/30 16:30:51.296 +08:00] [WARN] [config.rs:748] ["not on SSD device"] [data_path=/var/lib/tikv/raft]
[2020/12/30 16:30:51.297 +08:00] [INFO] [server.rs:212] ["using config"] [config="{\"log-level\":\"info\",\"log-file\":\"\",\"slow-log-file\":\"\",\"slow-log-threshold\":\"1s\",\"log-rotation-timespan\":\"1d\",\"log-rotation-size\":\"300MiB\",\"panic-when-unexpected-key-or-data\":false,\"readpool\":{\"unified\":{\"min-thread-count\":1,\"max-thread-count\":51,\"stack-size\":\"10MiB\",\"max-tasks-per-worker\":2000},\"storage\":{\"use-unified-pool\":false,\"high-concurrency\":8,\"normal-concurrency\":8,\"low-concurrency\":8,\"max-tasks-per-worker-high\":2000,\"max-tasks-per-worker-normal\":2000,\"max-tasks-per-worker-low\":2000,\"stack-size\":\"10MiB\"},\"coprocessor\":{\"use-unified-pool\":true,\"high-concurrency\":51,\"normal-concurrency\":51,\"low-concurrency\":51,\"max-tasks-per-worker-high\":2000,\"max-tasks-per-worker-normal\":2000,\"max-tasks-per-worker-low\":2000,\"stack-size\":\"10MiB\"}},\"server\":{\"addr\":\"0.0.0.0:20160\",\"advertise-addr\":\"basic1-tikv-1.basic1-tikv-peer.test-namespace1.svc:20160\",\"status-addr\":\"0.0.0.0:20180\",\"advertise-status-addr\":\"0.0.0.0:20180\",\"status-thread-pool-size\":1,\"max-grpc-send-msg-len\":10485760,\"grpc-compression-type\":\"none\",\"grpc-concurrency\":4,\"grpc-concurrent-stream\":1024,\"grpc-raft-conn-num\":1,\"grpc-memory-pool-quota\":9223372036854775807,\"grpc-stream-initial-window-size\":\"2MiB\",\"grpc-keepalive-time\":\"10s\",\"grpc-keepalive-timeout\":\"3s\",\"concurrent-send-snap-limit\":32,\"concurrent-recv-snap-limit\":32,\"end-point-recursion-limit\":1000,\"end-point-stream-channel-size\":8,\"end-point-batch-row-limit\":64,\"end-point-stream-batch-row-limit\":128,\"end-point-enable-batch-if-possible\":true,\"end-point-request-max-handle-duration\":\"1m\",\"end-point-max-concurrency\":64,\"snap-max-write-bytes-per-sec\":\"100MiB\",\"snap-max-total-size\":\"0KiB\",\"stats-concurrency\":1,\"heavy-load-threshold\":300,\"heavy-load-wait-duration\":\"1ms\",\"enable-request-batch\":true,\"request-batch-enabl
e-cross-command\":false,\"request-batch-wait-duration\":\"1ms\",\"labels\":{}},\"storage\":{\"data-dir\":\"/var/lib/tikv\",\"gc-ratio-threshold\":1.1,\"max-key-size\":4096,\"scheduler-concurrency\":524288,\"scheduler-worker-pool-size\":8,\"scheduler-pending-write-threshold\":\"100MiB\",\"reserve-space\":\"0KiB\",\"block-cache\":{\"shared\":true,\"capacity\":\"106054MiB\",\"num-shard-bits\":6,\"strict-capacity-limit\":false,\"high-pri-pool-ratio\":0.8,\"memory-allocator\":\"nodump\"}},\"pd\":{\"endpoints\":[\"http://basic1-pd:2379\"],\"retry-interval\":\"300ms\",\"retry-max-count\":9223372036854775807,\"retry-log-every\":10,\"update-interval\":\"10m\"},\"metric\":{\"interval\":\"15s\",\"address\":\"\",\"job\":\"tikv\"},\"raftstore\":{\"sync-log\":true,\"prevote\":true,\"raftdb-path\":\"/var/lib/tikv/raft\",\"capacity\":\"0KiB\",\"raft-base-tick-interval\":\"1s\",\"raft-heartbeat-ticks\":2,\"raft-election-timeout-ticks\":10,\"raft-min-election-timeout-ticks\":10,\"raft-max-election-timeout-ticks\":20,\"raft-max-size-per-msg\":\"1MiB\",\"raft-max-inflight-msgs\":256,\"raft-entry-max-size\":\"8MiB\",\"raft-log-gc-tick-interval\":\"10s\",\"raft-log-gc-threshold\":50,\"raft-log-gc-count-limit\":73728,\"raft-log-gc-size-limit\":\"72MiB\",\"raft-entry-cache-life-time\":\"30s\",\"raft-reject-transfer-leader-duration\":\"3s\",\"split-region-check-tick-interval\":\"10s\",\"region-split-check-diff\":\"6MiB\",\"region-compact-check-interval\":\"5m\",\"clean-stale-peer-delay\":\"11m\",\"region-compact-check-step\":100,\"region-compact-min-tombstones\":10000,\"region-compact-tombstones-percent\":30,\"pd-heartbeat-tick-interval\":\"1m\",\"pd-store-heartbeat-tick-interval\":\"10s\",\"snap-mgr-gc-tick-interval\":\"1m\",\"snap-gc-timeout\":\"4h\",\"lock-cf-compact-interval\":\"10m\",\"lock-cf-compact-bytes-threshold\":\"256MiB\",\"notify-capacity\":40960,\"messages-per-tick\":4096,\"max-peer-down-duration\":\"5m\",\"max-leader-missing-duration\":\"2h\",\"abnormal-leader-missing-durati
on\":\"10m\",\"peer-stale-state-check-interval\":\"5m\",\"leader-transfer-max-log-lag\":10,\"snap-apply-batch-size\":\"10MiB\",\"consistency-check-interval\":\"0s\",\"report-region-flow-interval\":\"1m\",\"raft-store-max-leader-lease\":\"9s\",\"right-derive-when-split\":true,\"allow-remove-leader\":false,\"merge-max-log-gap\":10,\"merge-check-tick-interval\":\"10s\",\"use-delete-range\":false,\"cleanup-import-sst-interval\":\"10m\",\"local-read-batch-size\":1024,\"apply-max-batch-size\":256,\"apply-pool-size\":2,\"apply-reschedule-duration\":\"5s\",\"store-max-batch-size\":256,\"store-pool-size\":2,\"store-reschedule-duration\":\"5s\",\"future-poll-size\":1,\"hibernate-regions\":false,\"hibernate-timeout\":\"10m\",\"early-apply\":true,\"dev-assert\":false,\"apply-yield-duration\":\"500ms\",\"perf-level\":1},\"coprocessor\":{\"split-region-on-table\":false,\"batch-split-limit\":10,\"region-max-size\":\"144MiB\",\"region-split-size\":\"96MiB\",\"region-max-keys\":1440000,\"region-split-keys\":960000},\"rocksdb\":{\"wal-recovery-mode\":2,\"wal-dir\":\"\",\"wal-ttl-seconds\":0,\"wal-size-limit\":\"0KiB\",\"max-total-wal-size\":\"4GiB\",\"max-background-jobs\":8,\"max-manifest-file-size\":\"128MiB\",\"create-if-missing\":true,\"max-open-files\":40960,\"enable-statistics\":true,\"stats-dump-period\":\"10m\",\"compaction-readahead-size\":\"0KiB\",\"info-log-max-size\":\"1GiB\",\"info-log-roll-time\":\"0s\",\"info-log-keep-log-file-num\":10,\"info-log-dir\":\"\",\"rate-bytes-per-sec\":\"0KiB\",\"rate-limiter-mode\":2,\"auto-tuned\":false,\"bytes-per-sync\":\"1MiB\",\"wal-bytes-per-sync\":\"512KiB\",\"max-sub-compactions\":3,\"writable-file-max-buffer-size\":\"1MiB\",\"use-direct-io-for-flush-and-compaction\":false,\"enable-pipelined-write\":true,\"enable-multi-batch-write\":true,\"enable-unordered-write\":false,\"defaultcf\":{\"block-size\":\"64KiB\",\"block-cache-size\":\"64364MiB\",\"disable-block-cache\":false,\"cache-index-and-filter-blocks\":true,\"pin-l0-filter-and-in
dex-blocks\":true,\"use-bloom-filter\":true,\"optimize-filters-for-hits\":true,\"whole-key-filtering\":true,\"bloom-filter-bits-per-key\":10,\"block-based-bloom-filter\":false,\"read-amp-bytes-per-bit\":0,\"compression-per-level\":[\"no\",\"no\",\"lz4\",\"lz4\",\"lz4\",\"zstd\",\"zstd\"],\"write-buffer-size\":\"128MiB\",\"max-write-buffer-number\":5,\"min-write-buffer-number-to-merge\":1,\"max-bytes-for-level-base\":\"512MiB\",\"target-file-size-base\":\"8MiB\",\"level0-file-num-compaction-trigger\":4,\"level0-slowdown-writes-trigger\":20,\"level0-stop-writes-trigger\":36,\"max-compaction-bytes\":\"2GiB\",\"compaction-pri\":3,\"dynamic-level-bytes\":true,\"num-levels\":7,\"max-bytes-for-level-multiplier\":10,\"compaction-style\":0,\"disable-auto-compactions\":false,\"soft-pending-compaction-bytes-limit\":\"64GiB\",\"hard-pending-compaction-bytes-limit\":\"256GiB\",\"force-consistency-checks\":true,\"prop-size-index-distance\":4194304,\"prop-keys-index-distance\":40960,\"enable-doubly-skiplist\":true,\"titan\":{\"min-blob-size\":\"1KiB\",\"blob-file-compression\":\"lz4\",\"blob-cache-size\":\"0KiB\",\"min-gc-batch-size\":\"16MiB\",\"max-gc-batch-size\":\"64MiB\",\"discardable-ratio\":0.5,\"sample-ratio\":0.1,\"merge-small-file-threshold\":\"8MiB\",\"blob-run-mode\":\"normal\",\"level-merge\":false,\"range-merge\":true,\"max-sorted-runs\":20,\"gc-merge-rewrite\":false}},\"writecf\":{\"block-size\":\"64KiB\",\"block-cache-size\":\"38618MiB\",\"disable-block-cache\":false,\"cache-index-and-filter-blocks\":true,\"pin-l0-filter-and-index-blocks\":true,\"use-bloom-filter\":true,\"optimize-filters-for-hits\":false,\"whole-key-filtering\":false,\"bloom-filter-bits-per-key\":10,\"block-based-bloom-filter\":false,\"read-amp-bytes-per-bit\":0,\"compression-per-level\":[\"no\",\"no\",\"lz4\",\"lz4\",\"lz4\",\"zstd\",\"zstd\"],\"write-buffer-size\":\"128MiB\",\"max-write-buffer-number\":5,\"min-write-buffer-number-to-merge\":1,\"max-bytes-for-level-base\":\"512MiB\",\"target-file
-size-base\":\"8MiB\",\"level0-file-num-compaction-trigger\":4,\"level0-slowdown-writes-trigger\":20,\"level0-stop-writes-trigger\":36,\"max-compaction-bytes\":\"2GiB\",\"compaction-pri\":3,\"dynamic-level-bytes\":true,\"num-levels\":7,\"max-bytes-for-level-multiplier\":10,\"compaction-style\":0,\"disable-auto-compactions\":false,\"soft-pending-compaction-bytes-limit\":\"64GiB\",\"hard-pending-compaction-bytes-limit\":\"256GiB\",\"force-consistency-checks\":true,\"prop-size-index-distance\":4194304,\"prop-keys-index-distance\":40960,\"enable-doubly-skiplist\":true,\"titan\":{\"min-blob-size\":\"1KiB\",\"blob-file-compression\":\"lz4\",\"blob-cache-size\":\"0KiB\",\"min-gc-batch-size\":\"16MiB\",\"max-gc-batch-size\":\"64MiB\",\"discardable-ratio\":0.5,\"sample-ratio\":0.1,\"merge-small-file-threshold\":\"8MiB\",\"blob-run-mode\":\"read-only\",\"level-merge\":false,\"range-merge\":true,\"max-sorted-runs\":20,\"gc-merge-rewrite\":false}},\"lockcf\":{\"block-size\":\"16KiB\",\"block-cache-size\":\"1GiB\",\"disable-block-cache\":false,\"cache-index-and-filter-blocks\":true,\"pin-l0-filter-and-index-blocks\":true,\"use-bloom-filter\":true,\"optimize-filters-for-hits\":false,\"whole-key-filtering\":true,\"bloom-filter-bits-per-key\":10,\"block-based-bloom-filter\":false,\"read-amp-bytes-per-bit\":0,\"compression-per-level\":[\"no\",\"no\",\"no\",\"no\",\"no\",\"no\",\"no\"],\"write-buffer-size\":\"32MiB\",\"max-write-buffer-number\":5,\"min-write-buffer-number-to-merge\":1,\"max-bytes-for-level-base\":\"128MiB\",\"target-file-size-base\":\"8MiB\",\"level0-file-num-compaction-trigger\":1,\"level0-slowdown-writes-trigger\":20,\"level0-stop-writes-trigger\":36,\"max-compaction-bytes\":\"2GiB\",\"compaction-pri\":0,\"dynamic-level-bytes\":true,\"num-levels\":7,\"max-bytes-for-level-multiplier\":10,\"compaction-style\":0,\"disable-auto-compactions\":false,\"soft-pending-compaction-bytes-limit\":\"64GiB\",\"hard-pending-compaction-bytes-limit\":\"256GiB\",\"force-consistency-ch
ecks\":true,\"prop-size-index-distance\":4194304,\"prop-keys-index-distance\":40960,\"enable-doubly-skiplist\":true,\"titan\":{\"min-blob-size\":\"1KiB\",\"blob-file-compression\":\"lz4\",\"blob-cache-size\":\"0KiB\",\"min-gc-batch-size\":\"16MiB\",\"max-gc-batch-size\":\"64MiB\",\"discardable-ratio\":0.5,\"sample-ratio\":0.1,\"merge-small-file-threshold\":\"8MiB\",\"blob-run-mode\":\"read-only\",\"level-merge\":false,\"range-merge\":true,\"max-sorted-runs\":20,\"gc-merge-rewrite\":false}},\"raftcf\":{\"block-size\":\"16KiB\",\"block-cache-size\":\"128MiB\",\"disable-block-cache\":false,\"cache-index-and-filter-blocks\":true,\"pin-l0-filter-and-index-blocks\":true,\"use-bloom-filter\":true,\"optimize-filters-for-hits\":true,\"whole-key-filtering\":true,\"bloom-filter-bits-per-key\":10,\"block-based-bloom-filter\":false,\"read-amp-bytes-per-bit\":0,\"compression-per-level\":[\"no\",\"no\",\"no\",\"no\",\"no\",\"no\",\"no\"],\"write-buffer-size\":\"128MiB\",\"max-write-buffer-number\":5,\"min-write-buffer-number-to-merge\":1,\"max-bytes-for-level-base\":\"128MiB\",\"target-file-size-base\":\"8MiB\",\"level0-file-num-compaction-trigger\":1,\"level0-slowdown-writes-trigger\":20,\"level0-stop-writes-trigger\":36,\"max-compaction-bytes\":\"2GiB\",\"compaction-pri\":0,\"dynamic-level-bytes\":true,\"num-levels\":7,\"max-bytes-for-level-multiplier\":10,\"compaction-style\":0,\"disable-auto-compactions\":false,\"soft-pending-compaction-bytes-limit\":\"64GiB\",\"hard-pending-compaction-bytes-limit\":\"256GiB\",\"force-consistency-checks\":true,\"prop-size-index-distance\":4194304,\"prop-keys-index-distance\":40960,\"enable-doubly-skiplist\":true,\"titan\":{\"min-blob-size\":\"1KiB\",\"blob-file-compression\":\"lz4\",\"blob-cache-size\":\"0KiB\",\"min-gc-batch-size\":\"16MiB\",\"max-gc-batch-size\":\"64MiB\",\"discardable-ratio\":0.5,\"sample-ratio\":0.1,\"merge-small-file-threshold\":\"8MiB\",\"blob-run-mode\":\"read-only\",\"level-merge\":false,\"range-merge\":true,\"max-sort
ed-runs\":20,\"gc-merge-rewrite\":false}},\"titan\":{\"enabled\":false,\"dirname\":\"\",\"disable-gc\":false,\"max-background-gc\":4,\"purge-obsolete-files-period\":\"10s\"}},\"raftdb\":{\"wal-recovery-mode\":2,\"wal-dir\":\"\",\"wal-ttl-seconds\":0,\"wal-size-limit\":\"0KiB\",\"max-total-wal-size\":\"4GiB\",\"max-background-jobs\":4,\"max-manifest-file-size\":\"20MiB\",\"create-if-missing\":true,\"max-open-files\":40960,\"enable-statistics\":true,\"stats-dump-period\":\"10m\",\"compaction-readahead-size\":\"0KiB\",\"info-log-max-size\":\"1GiB\",\"info-log-roll-time\":\"0s\",\"info-log-keep-log-file-num\":10,\"info-log-dir\":\"\",\"max-sub-compactions\":2,\"writable-file-max-buffer-size\":\"1MiB\",\"use-direct-io-for-flush-and-compaction\":false,\"enable-pipelined-write\":true,\"enable-unordered-write\":false,\"allow-concurrent-memtable-write\":true,\"bytes-per-sync\":\"1MiB\",\"wal-bytes-per-sync\":\"512KiB\",\"defaultcf\":{\"block-size\":\"64KiB\",\"block-cache-size\":\"2GiB\",\"disable-block-cache\":false,\"cache-index-and-filter-blocks\":true,\"pin-l0-filter-and-index-blocks\":true,\"use-bloom-filter\":false,\"optimize-filters-for-hits\":true,\"whole-key-filtering\":true,\"bloom-filter-bits-per-key\":10,\"block-based-bloom-filter\":false,\"read-amp-bytes-per-bit\":0,\"compression-per-level\":[\"no\",\"no\",\"lz4\",\"lz4\",\"lz4\",\"zstd\",\"zstd\"],\"write-buffer-size\":\"128MiB\",\"max-write-buffer-number\":5,\"min-write-buffer-number-to-merge\":1,\"max-bytes-for-level-base\":\"512MiB\",\"target-file-size-base\":\"8MiB\",\"level0-file-num-compaction-trigger\":4,\"level0-slowdown-writes-trigger\":20,\"level0-stop-writes-trigger\":36,\"max-compaction-bytes\":\"2GiB\",\"compaction-pri\":0,\"dynamic-level-bytes\":true,\"num-levels\":7,\"max-bytes-for-level-multiplier\":10,\"compaction-style\":0,\"disable-auto-compactions\":false,\"soft-pending-compaction-bytes-limit\":\"64GiB\",\"hard-pending-compaction-bytes-limit\":\"256GiB\",\"force-consistency-checks\":true,\"p
rop-size-index-distance\":4194304,\"prop-keys-index-distance\":40960,\"enable-doubly-skiplist\":true,\"titan\":{\"min-blob-size\":\"1KiB\",\"blob-file-compression\":\"lz4\",\"blob-cache-size\":\"0KiB\",\"min-gc-batch-size\":\"16MiB\",\"max-gc-batch-size\":\"64MiB\",\"discardable-ratio\":0.5,\"sample-ratio\":0.1,\"merge-small-file-threshold\":\"8MiB\",\"blob-run-mode\":\"normal\",\"level-merge\":false,\"range-merge\":true,\"max-sorted-runs\":20,\"gc-merge-rewrite\":false}},\"titan\":{\"enabled\":false,\"dirname\":\"\",\"disable-gc\":false,\"max-background-gc\":4,\"purge-obsolete-files-period\":\"10s\"}},\"security\":{\"ca-path\":\"\",\"cert-path\":\"\",\"key-path\":\"\",\"cert-allowed-cn\":[],\"encryption\":{\"data-encryption-method\":\"plaintext\",\"data-key-rotation-period\":\"7d\",\"master-key\":{\"type\":\"plaintext\"},\"previous-master-key\":{\"type\":\"plaintext\"}}},\"import\":{\"num-threads\":8,\"stream-channel-window\":128},\"backup\":{\"num-threads\":32},\"pessimistic-txn\":{\"enabled\":true,\"wait-for-lock-timeout\":\"1s\",\"wake-up-delay-duration\":\"20ms\",\"pipelined\":false},\"gc\":{\"ratio-threshold\":1.1,\"batch-keys\":512,\"max-write-bytes-per-sec\":\"0KiB\"},\"split\":{\"qps-threshold\":3000,\"split-balance-score\":0.25,\"split-contained-score\":0.5,\"detect-times\":10,\"sample-num\":20,\"sample-threshold\":100},\"cdc\":{\"min-ts-interval\":\"1s\"}}"]
[2020/12/30 16:30:51.304 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=addr-resolver]
[2020/12/30 16:30:51.304 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]
[2020/12/30 16:30:51.304 +08:00] [FATAL] [server.rs:295] ["lock /var/lib/tikv failed, maybe another instance is using this directory."]
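
One way to test the shared-disk suspicion is to look for PersistentVolumes that resolve to the same path on the same node. The sketch below is only illustrative: `shared_paths` is a hypothetical helper, and the `kubectl` command in the comment assumes local PVs (so that `.spec.local.path` is populated), which this post does not confirm.

```shell
# shared_paths: read lines of the form "<pv-name> <node> <path>" on stdin and
# print every node/path pair that is backed by more than one PV.
# Hypothetical helper; the input format is an assumption of this sketch.
shared_paths() {
  awk '{ key = $2 " " $3; n[key]++; pvs[key] = pvs[key] " " $1 }
       END { for (k in n) if (n[k] > 1) print k ":" pvs[k] }'
}

# Possible input, assuming local PVs with a single node-affinity term:
#   kubectl get pv --no-headers \
#     -o 'custom-columns=PV:.metadata.name,NODE:.spec.nodeAffinity.required.nodeSelectorTerms[0].matchExpressions[0].values[0],PATH:.spec.local.path' \
#     | shared_paths
```

Any line printed would name a directory that two PVs (and therefore possibly two TiKV pods) could both mount, which would be consistent with the lock failure above.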

It looks like a permission problem is preventing TiKV from starting.

[2020/12/30 16:29:28.338 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=addr-resolver]
[2020/12/30 16:29:28.338 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=region-collector-worker]
[2020/12/30 16:29:28.338 +08:00] [INFO] [mod.rs:85] ["encryption: none of key dictionary and file dictionary are found."]
[2020/12/30 16:29:28.338 +08:00] [INFO] [mod.rs:374] ["encryption is disabled."]
[2020/12/30 16:29:28.389 +08:00] [INFO] [future.rs:136] ["starting working thread"] [worker=gc-worker]
[2020/12/30 16:29:28.389 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=lock-collector]
[2020/12/30 16:29:28.432 +08:00] [INFO] [mod.rs:170] ["Storage started."]
[2020/12/30 16:29:28.434 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=split-check]
[2020/12/30 16:29:28.435 +08:00] [INFO] [node.rs:353] ["start raft store thread"] [store_id=19121]
[2020/12/30 16:29:28.435 +08:00] [INFO] [store.rs:941] ["start store"] [takes=28.246µs] [merge_count=0] [applying_count=0] [tombstone_count=0] [region_count=0] [store_id=19121]
[2020/12/30 16:29:28.435 +08:00] [INFO] [store.rs:992] ["cleans up garbage data"] [takes=20.59µs] [garbage_range_count=1] [store_id=19121]
[2020/12/30 16:29:28.435 +08:00] [INFO] [snap.rs:1121] ["Initializing SnapManager, encryption is enabled: false"]
[2020/12/30 16:29:28.436 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=snapshot-worker]
[2020/12/30 16:29:28.436 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=raft-gc-worker]
[2020/12/30 16:29:28.437 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=cleanup-worker]
[2020/12/30 16:29:28.437 +08:00] [INFO] [future.rs:136] ["starting working thread"] [worker=pd-worker]
[2020/12/30 16:29:28.437 +08:00] [INFO] [mod.rs:335] ["starting working thread"] [worker=consistency-check]
[2020/12/30 16:29:28.437 +08:00] [WARN] [store.rs:1275] ["set thread priority for raftstore failed"] [error="Os { code: 13, kind: PermissionDenied, message: \"Permission denied\" }"]
[2020/12/30 16:29:28.437 +08:00] [INFO] [node.rs:174] ["put store to PD"] [store="id: 19121 address: \"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\" version: \"4.0.6\" status_address: \"0.0.0.0:20180\" git_hash: \"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\" start_timestamp: 1609316968 deploy_path: \"/\""]
[2020/12/30 16:29:28.439 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.441 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.442 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.443 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.445 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.447 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.448 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.449 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.450 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.451 +08:00] [ERROR] [util.rs:347] ["request failed"] [err_code=KV-PD-gRPC] [err="Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]
[2020/12/30 16:29:28.452 +08:00] [FATAL] [server.rs:591] ["failed to start node: Grpc(RpcFailure(RpcStatus { status: 2-UNKNOWN, details: Some(\"duplicated store address: id:19121 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609316968 deploy_path:\\\"/\\\" , already registered by id:10514 address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" version:\\\"4.0.6\\\" status_address:\\\"0.0.0.0:20180\\\" git_hash:\\\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\\\" start_timestamp:1609232735 deploy_path:\\\"/\\\" last_heartbeat:1609241306659466673 \") }))"]

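Checking PD shows that the stores are indeed Down, but the operator has not taken the TiKV instances offline. This has persisted for more than half an hour.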

[root@dcn-tidb-k8s-p-l-11:/home/appdeploy/tidb-v4.0.6-linux-amd64/bin]#./pd-ctl  store
{
  "count": 5,
  "stores": [
    {
      "store": {
        "id": 1,
        "address": "basic1-tikv-0.basic1-tikv-peer.test-namespace1.svc:20160",
        "version": "4.0.6",
        "status_address": "0.0.0.0:20180",
        "git_hash": "ca2475bfbcb49a7c34cf783596acb3edd05fc88f",
        "start_timestamp": 1609232833,
        "deploy_path": "/",
        "last_heartbeat": 1609320053033919196,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "1.602TiB",
        "used_size": "116.6GiB",
        "leader_count": 1294,
        "leader_weight": 1,
        "leader_score": 1294,
        "leader_size": 115999,
        "region_count": 2581,
        "region_weight": 1,
        "region_score": 232721,
        "region_size": 232721,
        "start_ts": "2020-12-29T17:07:13+08:00",
        "last_heartbeat_ts": "2020-12-30T17:20:53.033919196+08:00",
        "uptime": "24h13m40.033919196s"
      }
    },
    {
      "store": {
        "id": 44,
        "address": "basic1-tikv-1.basic1-tikv-peer.test-namespace1.svc:20160",
        "version": "4.0.6",
        "status_address": "0.0.0.0:20180",
        "git_hash": "ca2475bfbcb49a7c34cf783596acb3edd05fc88f",
        "start_timestamp": 1609232786,
        "deploy_path": "/",
        "last_heartbeat": 1609311195666488737,
        "state_name": "Down"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "1.604TiB",
        "used_size": "117GiB",
        "leader_count": 1286,
        "leader_weight": 1,
        "leader_score": 1286,
        "leader_size": 116627,
        "region_count": 2580,
        "region_weight": 1,
        "region_score": 232626,
        "region_size": 232626,
        "start_ts": "2020-12-29T17:06:26+08:00",
        "last_heartbeat_ts": "2020-12-30T14:53:15.666488737+08:00",
        "uptime": "21h46m49.666488737s"
      }
    },
    {
      "store": {
        "id": 10514,
        "address": "basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160",
        "version": "4.0.6",
        "status_address": "0.0.0.0:20180",
        "git_hash": "ca2475bfbcb49a7c34cf783596acb3edd05fc88f",
        "start_timestamp": 1609232735,
        "deploy_path": "/",
        "last_heartbeat": 1609241306659466673,
        "state_name": "Down"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "1.609TiB",
        "used_size": "111.7GiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 2580,
        "region_weight": 1,
        "region_score": 232626,
        "region_size": 232626,
        "start_ts": "2020-12-29T17:05:35+08:00",
        "last_heartbeat_ts": "2020-12-29T19:28:26.659466673+08:00",
        "uptime": "2h22m51.659466673s"
      }
    },
    {
      "store": {
        "id": 16029,
        "address": "basic1-tikv-3.basic1-tikv-peer.test-namespace1.svc:20160",
        "version": "4.0.6",
        "status_address": "0.0.0.0:20180",
        "git_hash": "ca2475bfbcb49a7c34cf783596acb3edd05fc88f",
        "start_timestamp": 1609311179,
        "deploy_path": "/",
        "last_heartbeat": 1609320050403388921,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "1.718TiB",
        "used_size": "56.04MiB",
        "leader_count": 1,
        "leader_weight": 1,
        "leader_score": 1,
        "leader_size": 95,
        "region_count": 4,
        "region_weight": 1,
        "region_score": 405,
        "region_size": 405,
        "start_ts": "2020-12-30T14:52:59+08:00",
        "last_heartbeat_ts": "2020-12-30T17:20:50.403388921+08:00",
        "uptime": "2h27m51.403388921s"
      }
    },
    {
      "store": {
        "id": 20283,
        "address": "basic1-tikv-4.basic1-tikv-peer.test-namespace1.svc:20160",
        "version": "4.0.6",
        "status_address": "0.0.0.0:20180",
        "git_hash": "ca2475bfbcb49a7c34cf783596acb3edd05fc88f",
        "start_timestamp": 1609313305,
        "deploy_path": "/",
        "last_heartbeat": 1609320046551549272,
        "state_name": "Up"
      },
      "status": {
        "capacity": "1.719TiB",
        "available": "1.718TiB",
        "used_size": "56.04MiB",
        "leader_count": 0,
        "leader_weight": 1,
        "leader_score": 0,
        "leader_size": 0,
        "region_count": 1,
        "region_weight": 1,
        "region_score": 95,
        "region_size": 95,
        "start_ts": "2020-12-30T15:28:25+08:00",
        "last_heartbeat_ts": "2020-12-30T17:20:46.551549272+08:00",
        "uptime": "1h52m21.551549272s"
      }
    }
  ]
}
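
For reference, the Down stores and the time since their last heartbeat can be pulled out of the `pd-ctl store` JSON with a short script. This is a minimal sketch, not part of pd-ctl or TiDB Operator: the function name `down_stores` and the trimmed `SAMPLE` are made up for illustration, and the sample keeps only the `id`, `address`, `last_heartbeat`, and `state_name` fields from the output above. It uses the latest heartbeat of the Up store (id 1) as "now".

```python
import json


def down_stores(pd_store_json, now_ns):
    """Return (id, address, seconds since last heartbeat) for every store that is not Up."""
    result = []
    for item in json.loads(pd_store_json).get("stores", []):
        store = item["store"]
        if store.get("state_name") != "Up":
            # last_heartbeat is a Unix timestamp in nanoseconds.
            silent_s = (now_ns - store["last_heartbeat"]) // 1_000_000_000
            result.append((store["id"], store["address"], silent_s))
    return result


# Trimmed copy of three stores from the pd-ctl output (illustration only).
SAMPLE = json.dumps({
    "count": 3,
    "stores": [
        {"store": {"id": 1,
                   "address": "basic1-tikv-0.basic1-tikv-peer.test-namespace1.svc:20160",
                   "last_heartbeat": 1609320053033919196,
                   "state_name": "Up"}},
        {"store": {"id": 44,
                   "address": "basic1-tikv-1.basic1-tikv-peer.test-namespace1.svc:20160",
                   "last_heartbeat": 1609311195666488737,
                   "state_name": "Down"}},
        {"store": {"id": 10514,
                   "address": "basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160",
                   "last_heartbeat": 1609241306659466673,
                   "state_name": "Down"}},
    ],
})
NOW_NS = 1609320053033919196  # heartbeat of store 1, used as the reference time

for store_id, address, silent_s in down_stores(SAMPLE, NOW_NS):
    print(f"store {store_id} ({address}) silent for {silent_s}s")
```

On the data above, store 44 had been silent for roughly 2.5 hours and store 10514 for roughly 22 hours relative to store 1's heartbeat.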
  1. Please share the output of the following commands:
    kubectl get pod -n test-namespace1 -o wide
    kubectl get pv
    kubectl get pvc -n test-namespace1

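  2. Was any operation performed that could have caused the problems on tikv-1 and tikv-2? Did these two TiKV instances fail at the same time? Please also share the output of: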
    kubectl describe pod basic1-tikv-1 -n test-namespace1

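I am not sure whether they failed at the same time. Before the TiKV instances started restarting, the whole cluster was already unable to accept writes, and the error said that TiKV could not serve requests. While debugging a problem in another cluster, I added PVs that were meant for that other cluster, but this cluster scaled out and claimed them directly. The disk behind the PV it claimed had been remounted: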
mount /data1 /mnt/kv1
mount /data2 /mnt/kv1

[root@dcn-tidb-k8s-p-l-11:/home/appdeploy]#kubectl get pod -n test-namespace1 -o wide
NAME                                READY   STATUS             RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
basic1-discovery-56bd576c8b-v8hk9   1/1     Running            0          102d    172.32.92.11   10.204.11.92   <none>           <none>
basic1-pd-0                         1/1     Running            0          6d16h   172.32.90.6    10.204.11.90   <none>           <none>
basic1-pd-1                         1/1     Running            0          6d16h   172.32.91.4    10.204.11.91   <none>           <none>
basic1-pd-2                         1/1     Running            0          6d16h   172.32.92.3    10.204.11.92   <none>           <none>
basic1-pump-0                       1/1     Running            0          6d16h   172.32.90.8    10.204.11.90   <none>           <none>
basic1-pump-1                       1/1     Running            0          6d16h   172.32.92.9    10.204.11.92   <none>           <none>
basic1-pump-2                       1/1     Running            2          6d16h   172.32.91.10   10.204.11.91   <none>           <none>
basic1-tidb-0                       2/2     Running            0          6d16h   172.32.85.3    10.204.11.85   <none>           <none>
basic1-tidb-1                       2/2     Running            0          6d16h   172.32.90.15   10.204.11.90   <none>           <none>
basic1-tidb-2                       2/2     Running            2          6d16h   172.32.86.3    10.204.11.86   <none>           <none>
basic1-tikv-0                       1/1     Running            0          6d16h   172.32.90.4    10.204.11.90   <none>           <none>
basic1-tikv-1                       0/1     CrashLoopBackOff   1592       5d15h   172.32.91.7    10.204.11.91   <none>           <none>
basic1-tikv-2                       0/1     CrashLoopBackOff   1854       6d16h   172.32.92.10   10.204.11.92   <none>           <none>
basic1-tikv-3                       1/1     Running            0          6d13h   172.32.91.13   10.204.11.91   <none>           <none>
basic1-tikv-4                       1/1     Running            0          5d17h   172.32.92.5    10.204.11.92   <none>           <none>
meta1-monitor-776d4fbf4c-mz42j      3/3     Running            0          101d    172.32.91.8    10.204.11.91   <none>           <none>
[root@dcn-tidb-k8s-p-l-11:/home/appdeploy]#kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM                                           STORAGECLASS         REASON   AGE
local-pv-1014f390                          1759Gi     RWO            Delete           Bound       test-namespace1/tikv-basic1-tikv-0              kv-storage                    110d
local-pv-166b56da                          1759Gi     RWO            Retain           Bound       push-namespace/tikv-push-tidb-tikv-0            kv-storage                    103d
local-pv-2d007021                          1759Gi     RWO            Delete           Bound       test-namespace1/tikv-basic1-tikv-3              kv-storage                    5d18h
local-pv-3564e783                          98Gi       RWO            Retain           Bound       pay-back/pd-pay-bk-pd-2                         pd-storage                    5d18h
local-pv-377fafa9                          738Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-4                     pump-storage                  2d1h
local-pv-3783673a                          393Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-0                     pump-storage                  5d18h
local-pv-3e8f2ea3                          98Gi       RWO            Retain           Bound       push-namespace/pd-push-tidb-pd-1                pd-storage                    103d
local-pv-4fb09c33                          393Gi      RWO            Delete           Bound       test-namespace1/data-basic1-pump-0              pump-storage                  110d
local-pv-50486d5b                          738Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-5                     pump-storage                  2d1h
local-pv-538b7c4f                          98Gi       RWO            Retain           Bound       push-namespace/pd-push-tidb-pd-0                pd-storage                    103d
local-pv-55fab000                          98Gi       RWO            Delete           Bound       test-namespace1/pd-basic1-pd-0                  pd-storage                    110d
local-pv-5683598c                          1759Gi     RWO            Delete           Bound       test-namespace1/tikv-basic1-tikv-4              kv-storage                    5d18h
local-pv-576d1e5d                          98Gi       RWO            Retain           Bound       push-namespace/pd-push-tidb-pd-2                pd-storage                    103d
local-pv-6178076f                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-0                     kv-storage                    5d18h
local-pv-7127a8f1                          98Gi       RWO            Retain           Available                                                   tidb-storage                  110d
local-pv-78c0236a                          98Gi       RWO            Delete           Bound       test-namespace1/pd-basic1-pd-1                  pd-storage                    110d
local-pv-795e63ba                          1759Gi     RWO            Retain           Bound       push-namespace/tikv-push-tidb-tikv-2            kv-storage                    103d
local-pv-811dfc90                          393Gi      RWO            Retain           Bound       push-namespace/data-push-tidb-pump-0            pump-storage                  103d
local-pv-824a39f7                          393Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-1                     pump-storage                  5d18h
local-pv-8d7d5e06                          98Gi       RWO            Retain           Bound       pay-back/pay-monitor-meta-monitor               tidb-storage                  110d
local-pv-958dd699                          688Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-3                     pump-storage                  2d21h
local-pv-a02e03d0                          98Gi       RWO            Retain           Bound       pay-back/pd-pay-bk-pd-1                         pd-storage                    5d18h
local-pv-a0a40224                          393Gi      RWO            Retain           Bound       push-namespace/data-push-tidb-pump-1            pump-storage                  103d
local-pv-a874c0b4                          1759Gi     RWO            Retain           Bound       push-namespace/tikv-push-tidb-tikv-1            kv-storage                    103d
local-pv-ac9dc8f6                          1759Gi     RWO            Delete           Bound       test-namespace1/tikv-basic1-tikv-1              kv-storage                    110d
local-pv-b0f4074e                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-6                     kv-storage                    4d22h
local-pv-b4568991                          98Gi       RWO            Retain           Available                                                   tidb-storage                  110d
local-pv-c205a0c7                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-5                     kv-storage                    4d22h
local-pv-c415be68                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-2                     kv-storage                    5d18h
local-pv-ca8be8b3                          98Gi       RWO            Delete           Bound       test-namespace1/pd-basic1-pd-2                  pd-storage                    103d
local-pv-cf79cf33                          393Gi      RWO            Delete           Bound       test-namespace1/data-basic1-pump-1              pump-storage                  110d
local-pv-d36fe7ea                          98Gi       RWO            Retain           Bound       pay-back/pd-pay-bk-pd-0                         pd-storage                    5d18h
local-pv-e71e0cdd                          393Gi      RWO            Retain           Bound       push-namespace/data-push-tidb-pump-2            pump-storage                  103d
local-pv-ea4283b4                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-4                     kv-storage                    5d18h
local-pv-ed467f97                          590Gi      RWO            Retain           Bound       pay-back/data-pay-bk-pump-2                     pump-storage                  5d18h
local-pv-ef12c2e2                          1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-1                     kv-storage                    5d18h
local-pv-f22a494                           1759Gi     RWO            Retain           Bound       pay-back/tikv-pay-bk-tikv-3                     kv-storage                    5d18h
local-pv-fbb9467c                          1759Gi     RWO            Delete           Bound       test-namespace1/tikv-basic1-tikv-2              kv-storage                    110d
local-pv-fe7bd904                          393Gi      RWO            Delete           Bound       test-namespace1/data-basic1-pump-2              pump-storage                  110d
pvc-751fed69-5289-47a2-8795-3f1fbc534395   10Gi       RWO            Delete           Bound       monitoring/prometheus-k8s-db-prometheus-k8s-0   prometheus-data-db            105d
pvc-febff3c7-521e-475b-9bac-d631151475fa   10Gi       RWO            Delete           Bound       monitoring/prometheus-k8s-db-prometheus-k8s-1   prometheus-data-db            105d
[root@dcn-tidb-k8s-p-l-11:/home/appdeploy]#kubectl get pvc -n test-namespace1
NAME                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-basic1-pump-0   Bound    local-pv-4fb09c33   393Gi      RWO            pump-storage   105d
data-basic1-pump-1   Bound    local-pv-cf79cf33   393Gi      RWO            pump-storage   105d
data-basic1-pump-2   Bound    local-pv-fe7bd904   393Gi      RWO            pump-storage   103d
pd-basic1-pd-0       Bound    local-pv-55fab000   98Gi       RWO            pd-storage     110d
pd-basic1-pd-1       Bound    local-pv-78c0236a   98Gi       RWO            pd-storage     110d
pd-basic1-pd-2       Bound    local-pv-ca8be8b3   98Gi       RWO            pd-storage     103d
tikv-basic1-tikv-0   Bound    local-pv-1014f390   1759Gi     RWO            kv-storage     110d
tikv-basic1-tikv-1   Bound    local-pv-ac9dc8f6   1759Gi     RWO            kv-storage     110d
tikv-basic1-tikv-2   Bound    local-pv-fbb9467c   1759Gi     RWO            kv-storage     103d
tikv-basic1-tikv-3   Bound    local-pv-2d007021   1759Gi     RWO            kv-storage     6d13h
tikv-basic1-tikv-4   Bound    local-pv-5683598c   1759Gi     RWO            kv-storage     5d17h
[root@dcn-tidb-k8s-p-l-11:/home/appdeploy]#kubectl describe pod basic1-tikv-1 -n test-namespace1
Name:         basic1-tikv-1
Namespace:    test-namespace1
Priority:     0
Node:         10.204.11.91/10.204.11.91
Start Time:   Wed, 30 Dec 2020 17:48:19 +0800
Labels:       app.kubernetes.io/component=tikv
              app.kubernetes.io/instance=basic1
              app.kubernetes.io/managed-by=tidb-operator
              app.kubernetes.io/name=tidb-cluster
              controller-revision-hash=basic1-tikv-55f86c96cd
              statefulset.kubernetes.io/pod-name=basic1-tikv-1
              tidb.pingcap.com/cluster-id=6873013461546630258
              tidb.pingcap.com/store-id=44
Annotations:  prometheus.io/path: /metrics
              prometheus.io/port: 20180
              prometheus.io/scrape: true
              tidb.pingcap.com/restartedAt: 2020-12-28T02:45
Status:       Running
IP:           172.32.91.7
IPs:
  IP:           172.32.91.7
Controlled By:  StatefulSet/basic1-tikv
Containers:
  tikv:
    Container ID:  docker://bb2e809ad3ad1d512008c7011a6dac704b622004631422a7ab8c940e0dc9b791
    Image:         harbor.fcbox.com/tidb/pingcap/tikv:v4.0.6
    Image ID:      docker-pullable://harbor.fcbox.com/tidb/pingcap/tikv@sha256:572b9beb33d94a9d1eea5be1080e4f4920f4ae614c96b5ca2f97e877d7e55456
    Port:          20160/TCP
    Host Port:     0/TCP
    Command:
      /bin/sh
      /usr/local/bin/tikv_start_script.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Tue, 05 Jan 2021 09:18:00 +0800
      Finished:     Tue, 05 Jan 2021 09:18:01 +0800
    Ready:          False
    Restart Count:  1593
    Environment:
      NAMESPACE:              test-namespace1 (v1:metadata.namespace)
      CLUSTER_NAME:           basic1
      HEADLESS_SERVICE_NAME:  basic1-tikv-peer
      CAPACITY:               0
      TZ:                     Asia/Shanghai
    Mounts:
      /etc/podinfo from annotations (ro)
      /etc/tikv from config (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/tikv from tikv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ft9mr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tikv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tikv-basic1-tikv-1
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      basic1-tikv
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      basic1-tikv
    Optional:  false
  default-token-ft9mr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ft9mr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 180s
                 node.kubernetes.io/unreachable:NoExecute for 180s
Events:
  Type     Reason   Age                       From                   Message
  ----     ------   ----                      ----                   -------
  Normal   Pulled   3h25m (x1554 over 5d15h)  kubelet, 10.204.11.91  Container image "harbor.fcbox.com/tidb/pingcap/tikv:v4.0.6" already present on machine
  Warning  BackOff  21s (x37387 over 5d15h)   kubelet, 10.204.11.91  Back-off restarting failed container
```
  1. From the output, basic1-tikv-1 runs on node 10.204.11.91 and uses the kv-storage PV local-pv-ac9dc8f6;
    basic1-tikv-2 runs on node 10.204.11.92 and uses the kv-storage PV local-pv-fbb9467c.

  2. How is kv-storage configured? Could you share local-volume-provisioner.yaml?

  3. On which nodes were mount /data1 /mnt/kv1 and mount /data2 /mnt/kv1 executed?

  4. Please also share the output of df -h and cat /etc/fstab. Thanks.

  5. The store reports the error { status: 2-UNKNOWN, details: Some("duplicated store address: id:19121 address:\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\" version:\"4.0.6\" status_address:\"0.0.0.0:20180\" git_hash:\"ca2475bfbcb49a7c34cf783596acb3edd05fc88f\" start_timestamp:1609316968 deploy_path:\"/\" , already registered by id:10514 address:\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160. Apart from mounting the disks, were any other operations performed before the problem occurred?
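
The Pod, node, and PV mapping read off in item 1 can also be derived mechanically from the two kubectl listings. The following is a rough sketch under stated assumptions: `tikv_volume_map` is a hypothetical helper, it parses columns by position (so it only fits the plain `kubectl get pod -o wide` and `kubectl get pvc` layouts shown in this thread), and it relies on the PVC naming seen here, where the claim for Pod `basic1-tikv-1` is `tikv-basic1-tikv-1`. The sample text is a trimmed copy of the output above.

```python
def tikv_volume_map(pod_text, pvc_text):
    """Return (pod, status, node, pv) for every Pod that has a 'tikv-<pod>' PVC."""
    pvc_to_pv = {}
    for line in pvc_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        pvc_to_pv[fields[0]] = fields[2]  # NAME, STATUS, VOLUME, ...

    rows = []
    for line in pod_text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME READY STATUS RESTARTS AGE IP NODE ...
        name, status, node = fields[0], fields[2], fields[6]
        pv = pvc_to_pv.get("tikv-" + name)
        if pv is not None:
            rows.append((name, status, node, pv))
    return rows


PODS = """
NAME            READY   STATUS             RESTARTS   AGE     IP             NODE           NOMINATED NODE   READINESS GATES
basic1-pd-0     1/1     Running            0          6d16h   172.32.90.6    10.204.11.90   <none>           <none>
basic1-tikv-1   0/1     CrashLoopBackOff   1592       5d15h   172.32.91.7    10.204.11.91   <none>           <none>
basic1-tikv-2   0/1     CrashLoopBackOff   1854       6d16h   172.32.92.10   10.204.11.92   <none>           <none>
"""

PVCS = """
NAME                 STATUS   VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
pd-basic1-pd-0       Bound    local-pv-55fab000   98Gi       RWO            pd-storage     110d
tikv-basic1-tikv-1   Bound    local-pv-ac9dc8f6   1759Gi     RWO            kv-storage     110d
tikv-basic1-tikv-2   Bound    local-pv-fbb9467c   1759Gi     RWO            kv-storage     103d
"""

for pod, status, node, pv in tikv_volume_map(PODS, PVCS):
    print(f"{pod} [{status}] on {node} uses {pv}")
```

Knowing which node hosts each failing Pod narrows down where the `df -h` and `/etc/fstab` output requested above needs to be collected.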

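Judging from the logs, the problem is caused by a mismatch between the TiKV configuration and the stores registered in PD. You can try to find the store that corresponds to each Pod's IP address, and then mount and bind the correct PVC.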
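
To see the mismatch concretely, the two store IDs can be extracted from the "duplicated store address" message: the ID the starting TiKV tries to register, and the ID PD already holds for the same address. This is only an illustrative sketch; the regular expression and the name `parse_duplicated_store` are assumptions based on the log lines quoted in this thread, not on a documented log format, and the sample line is shortened.

```python
import re

# Tolerates the escaped quotes (\\\") that appear around values in the TiKV log.
_DUP = re.compile(
    r'duplicated store address: id:(\d+) address:[\\"]+([^\\"]+)[\\"]+'
    r'.*?already registered by id:(\d+)'
)


def parse_duplicated_store(log_line):
    """Return the new store ID, the address, and the already registered store ID, or None."""
    match = _DUP.search(log_line)
    if match is None:
        return None
    return {
        "new_store_id": int(match.group(1)),
        "address": match.group(2),
        "registered_store_id": int(match.group(3)),
    }


# Shortened copy of the FATAL line from basic1-tikv-2 (illustration only).
LINE = (
    r'details: Some(\"duplicated store address: id:19121 '
    r'address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" '
    r'version:\\\"4.0.6\\\" , already registered by id:10514 '
    r'address:\\\"basic1-tikv-2.basic1-tikv-peer.test-namespace1.svc:20160\\\" \")'
)

print(parse_duplicated_store(LINE))
```

Here the Pod tries to register as store 19121 while PD still lists store 10514 under the same address, which suggests that the data directory mounted into the Pod is not the one store 10514 was created on.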

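Another approach: a majority of the TiKV instances are still healthy, so you can take the failed TiKV instances offline by following the document below. This way you do not need to worry about whether the PVCs are mounted correctly. Follow steps 3 through 7 to delete the abnormal PVCs, and skip the node drain and cordon operations.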
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/maintain-a-kubernetes-node#维护短期内不可恢复的节点