Deploying with tidb-operator: PD deploys successfully, but TiKV stays in CrashLoopBackOff

  • tidb:v2.1.14
  • tidb-operator:v1.0.0-beta.3
  • k8s:v1.14.1
  • os: centos7.5
  • kernel: 5.0.7-1.el7.elrepo.x86_64

The same deployment works fine in another environment, but in this one TiKV stays in CrashLoopBackOff.

tidb-controller-manager logs

tidb-scheduler/kube-scheduler logs

tidb-scheduler/tidb-scheduler logs

tikv logs

PV & PVC information


What resources are configured for TiKV in the values.yaml file?

tikv:
  ...
  resources:
    limits: 
    #   cpu: 16000m
    #   memory: 32Gi
    #   storage: 300Gi
    requests:
      # cpu: 12000m
      # memory: 24Gi
      # storage: 10Gi

resources is empty; no limits have been set.

spec:
  affinity: {}
  containers:
  - command:
    - /bin/sh
    - /usr/local/bin/tikv_start_script.sh
    env:
    - name: NAMESPACE
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.namespace
    - name: CLUSTER_NAME
      value: lcm-db
    - name: HEADLESS_SERVICE_NAME
      value: lcm-db-tikv-peer
    - name: CAPACITY
      value: "0"
    - name: TZ
      value: UTC
    image: registry.umstor.io:5050/vendor/tikv:v2.1.14
    imagePullPolicy: IfNotPresent
    name: tikv
    ports:
    - containerPort: 20160
      name: server
      protocol: TCP
    resources: {}
    securityContext:
      privileged: false
      procMount: Default
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /etc/podinfo
      name: annotations
      readOnly: true
    - mountPath: /var/lib/tikv
      name: tikv
    - mountPath: /etc/tikv
      name: config
      readOnly: true
    - mountPath: /usr/local/bin
      name: startup-script
      readOnly: true
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-jfrc5
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  hostname: lcm-db-tikv-0
  nodeName: umstor004
  priority: 0

System resources

What type of PV is this, and is its capacity really only 1G? Also, please check the dmesg log for any OOM events.

Local PV.

No OOM.
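For reference, the OOM check suggested above could be done roughly as follows. This is a minimal sketch: the helper name `find_oom_lines` is made up for illustration, and the patterns are typical kernel OOM-killer messages rather than an exhaustive list.

```shell
#!/bin/sh
# Print lines from stdin that look like kernel OOM-killer activity.
# The patterns below are common kernel messages, not a complete list.
find_oom_lines() {
    # "|| true" keeps the exit status at 0 when nothing matches.
    grep -i -E 'out of memory|oom-killer|oom_reaper' || true
}

# Usage, on the node that runs the pod (umstor004 in this thread):
#   dmesg -T | find_oom_lines
```

Empty output suggests the kernel did not kill the container for memory reasons, which matches the answer given here.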

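Following this document, put one of the TiKV Pods into diagnostic mode: https://pingcap.com/docs-cn/v3.0/tidb-in-kubernetes/troubleshoot/#诊断模式

Then run /tikv-server --pd=… manually (see the image below for the exact parameters) and check the error message.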
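As a rough sketch of the steps above: the linked troubleshooting page describes diagnostic mode as a `runmode=debug` annotation on the Pod, followed by `kubectl exec` once the container has restarted. The function names and the `KUBECTL` variable below are illustrative only, and the namespace `storage-system` is inferred from the advertise address in this thread; check the linked page for the authoritative procedure.

```shell
#!/bin/sh
# Assumption: the operator honors the runmode=debug Pod annotation,
# as described on the linked troubleshooting page.
# KUBECTL can be overridden, for example with "echo" for a dry run.
KUBECTL="${KUBECTL:-kubectl}"

# Annotate the Pod so that its container waits instead of starting TiKV.
enter_debug_mode() {
    ns="$1"; pod="$2"
    "$KUBECTL" annotate pod "$pod" -n "$ns" runmode=debug
}

# Open a shell in the Pod after the container has restarted.
open_debug_shell() {
    ns="$1"; pod="$2"
    "$KUBECTL" exec -it "$pod" -n "$ns" -- /bin/sh
}

# Example (names taken from this thread):
#   enter_debug_mode storage-system lcm-db-tikv-0
#   open_debug_shell storage-system lcm-db-tikv-0
```

Inside that shell, tikv-server can be started by hand so that its startup error is printed directly to the terminal.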

The error message is as follows:

2019/08/21 05:40:18.417 ERRO tikv-server.rs:84: Limit("the maximum number of open file descriptors is too small, got 65536, expect greater or equal to 82920")


Diagnostic run:

/ # /tikv-server --pd=lcm-db-pd:2379 --advertise-addr=lcm-db-tikv-0.lcm-db-tikv-peer.storage-system.svc:20160 --addr=0.0.0.0:20160 --data-dir=/var/lib/tikv --capacity=0 --config=/etc/tikv/tikv.toml
2019/08/21 05:40:18.416 INFO mod.rs:26: Welcome to TiKV.
Release Version:   2.1.14
Git Commit Hash:   32ca82bc067f7529dc07bf8ddb594b8c060b7f49
Git Commit Branch: HEAD
UTC Build Time:    2019-07-04 10:32:40
Rust Version:      rustc 1.29.0-nightly (4f3c7a472 2018-07-17)
2019/08/21 05:40:18.416 INFO tikv-server.rs:443: using config: {
  "log-level": "info",
  "log-file": "",
  "log-rotation-timespan": "24h",
  "panic-when-unexpected-key-or-data": false,
  "readpool": {
    "storage": {
      "high-concurrency": 4,
      "normal-concurrency": 4,
      "low-concurrency": 4,
      "max-tasks-per-worker-high": 2000,
      "max-tasks-per-worker-normal": 2000,
      "max-tasks-per-worker-low": 2000,
      "stack-size": "10MB"
    },
    "coprocessor": {
      "high-concurrency": 38,
      "normal-concurrency": 38,
      "low-concurrency": 38,
      "max-tasks-per-worker-high": 2000,
      "max-tasks-per-worker-normal": 2000,
      "max-tasks-per-worker-low": 2000,
      "stack-size": "10MB"
    }
  },
  "server": {
    "addr": "0.0.0.0:20160",
    "advertise-addr": "lcm-db-tikv-0.lcm-db-tikv-peer.storage-system.svc:20160",
    "status-addr": "0.0.0.0:20180",
    "status-thread-pool-size": 1,
    "grpc-compression-type": "none",
    "grpc-concurrency": 4,
    "grpc-concurrent-stream": 1024,
    "grpc-raft-conn-num": 10,
    "grpc-stream-initial-window-size": "2MB",
    "grpc-keepalive-time": "10s",
    "grpc-keepalive-timeout": "3s",
    "concurrent-send-snap-limit": 32,
    "concurrent-recv-snap-limit": 32,
    "end-point-recursion-limit": 1000,
    "end-point-stream-channel-size": 8,
    "end-point-batch-row-limit": 64,
    "end-point-stream-batch-row-limit": 128,
    "end-point-request-max-handle-duration": "1m",
    "snap-max-write-bytes-per-sec": "100MB",
    "snap-max-total-size": "0KB",
    "labels": {}
  },
  "storage": {
    "data-dir": "/var/lib/tikv",
    "gc-ratio-threshold": 1.1,
    "max-key-size": 4096,
    "scheduler-notify-capacity": 10240,
    "scheduler-concurrency": 2048000,
    "scheduler-worker-pool-size": 8,
    "scheduler-pending-write-threshold": "100MB"
  },
  "pd": {
    "endpoints": [
      "lcm-db-pd:2379"
    ]
  },
  "metric": {
    "interval": "15s",
    "address": "",
    "job": "tikv"
  },
  "raftstore": {
    "sync-log": true,
    "prevote": true,
    "raftdb-path": "/var/lib/tikv/raft",
    "capacity": "0KB",
    "raft-base-tick-interval": "1s",
    "raft-heartbeat-ticks": 2,
    "raft-election-timeout-ticks": 10,
    "raft-min-election-timeout-ticks": 10,
    "raft-max-election-timeout-ticks": 20,
    "raft-max-size-per-msg": "1MB",
    "raft-max-inflight-msgs": 256,
    "raft-entry-max-size": "8MB",
    "raft-log-gc-tick-interval": "10s",
    "raft-log-gc-threshold": 50,
    "raft-log-gc-count-limit": 73728,
    "raft-log-gc-size-limit": "72MB",
    "raft-entry-cache-life-time": "30s",
    "raft-reject-transfer-leader-duration": "3s",
    "split-region-check-tick-interval": "10s",
    "region-split-check-diff": "6MB",
    "region-compact-check-interval": "5m",
    "clean-stale-peer-delay": "11m",
    "region-compact-check-step": 100,
    "region-compact-min-tombstones": 10000,
    "region-compact-tombstones-percent": 30,
    "pd-heartbeat-tick-interval": "1m",
    "pd-store-heartbeat-tick-interval": "10s",
    "snap-mgr-gc-tick-interval": "1m",
    "snap-gc-timeout": "4h",
    "lock-cf-compact-interval": "10m",
    "lock-cf-compact-bytes-threshold": "256MB",
    "notify-capacity": 40960,
    "messages-per-tick": 4096,
    "max-peer-down-duration": "5m",
    "max-leader-missing-duration": "2h",
    "abnormal-leader-missing-duration": "10m",
    "peer-stale-state-check-interval": "5m",
    "leader-transfer-max-log-lag": 10,
    "snap-apply-batch-size": "10MB",
    "consistency-check-interval": "0s",
    "report-region-flow-interval": "1m",
    "raft-store-max-leader-lease": "9s",
    "right-derive-when-split": true,
    "allow-remove-leader": false,
    "merge-max-log-gap": 10,
    "merge-check-tick-interval": "10s",
    "use-delete-range": false,
    "cleanup-import-sst-interval": "10m",
    "local-read-batch-size": 1024
  },
  "coprocessor": {
    "split-region-on-table": true,
    "batch-split-limit": 10,
    "region-max-size": "144MB",
    "region-split-size": "96MB",
    "region-max-keys": 1440000,
    "region-split-keys": 960000
  },
  "rocksdb": {
    "wal-recovery-mode": 2,
    "wal-dir": "",
    "wal-ttl-seconds": 0,
    "wal-size-limit": "0KB",
    "max-total-wal-size": "4GB",
    "max-background-jobs": 6,
    "max-manifest-file-size": "128MB",
    "create-if-missing": true,
    "max-open-files": 40960,
    "enable-statistics": true,
    "stats-dump-period": "10m",
    "compaction-readahead-size": "0KB",
    "info-log-max-size": "1GB",
    "info-log-roll-time": "0s",
    "info-log-keep-log-file-num": 10,
    "info-log-dir": "",
    "rate-bytes-per-sec": "0KB",
    "bytes-per-sync": "1MB",
    "wal-bytes-per-sync": "512KB",
    "max-sub-compactions": 1,
    "writable-file-max-buffer-size": "1MB",
    "use-direct-io-for-flush-and-compaction": false,
    "enable-pipelined-write": true,
    "defaultcf": {
      "block-size": "64KB",
      "block-cache-size": "48329MB",
      "disable-block-cache": false,
      "cache-index-and-filter-blocks": true,
      "pin-l0-filter-and-index-blocks": true,
      "use-bloom-filter": true,
      "whole-key-filtering": true,
      "bloom-filter-bits-per-key": 10,
      "block-based-bloom-filter": false,
      "read-amp-bytes-per-bit": 0,
      "compression-per-level": [
        "no",
        "no",
        "lz4",
        "lz4",
        "lz4",
        "zstd",
        "zstd"
      ],
      "write-buffer-size": "128MB",
      "max-write-buffer-number": 5,
      "min-write-buffer-number-to-merge": 1,
      "max-bytes-for-level-base": "512MB",
      "target-file-size-base": "8MB",
      "level0-file-num-compaction-trigger": 4,
      "level0-slowdown-writes-trigger": 20,
      "level0-stop-writes-trigger": 36,
      "max-compaction-bytes": "2GB",
      "compaction-pri": 3,
      "dynamic-level-bytes": true,
      "num-levels": 7,
      "max-bytes-for-level-multiplier": 10,
      "compaction-style": 0,
      "disable-auto-compactions": false,
      "soft-pending-compaction-bytes-limit": "64GB",
      "hard-pending-compaction-bytes-limit": "256GB"
    },
    "writecf": {
      "block-size": "64KB",
      "block-cache-size": "28997MB",
      "disable-block-cache": false,
      "cache-index-and-filter-blocks": true,
      "pin-l0-filter-and-index-blocks": true,
      "use-bloom-filter": true,
      "whole-key-filtering": false,
      "bloom-filter-bits-per-key": 10,
      "block-based-bloom-filter": false,
      "read-amp-bytes-per-bit": 0,
      "compression-per-level": [
        "no",
        "no",
        "lz4",
        "lz4",
        "lz4",
        "zstd",
        "zstd"
      ],
      "write-buffer-size": "128MB",
      "max-write-buffer-number": 5,
      "min-write-buffer-number-to-merge": 1,
      "max-bytes-for-level-base": "512MB",
      "target-file-size-base": "8MB",
      "level0-file-num-compaction-trigger": 4,
      "level0-slowdown-writes-trigger": 20,
      "level0-stop-writes-trigger": 36,
      "max-compaction-bytes": "2GB",
      "compaction-pri": 3,
      "dynamic-level-bytes": true,
      "num-levels": 7,
      "max-bytes-for-level-multiplier": 10,
      "compaction-style": 0,
      "disable-auto-compactions": false,
      "soft-pending-compaction-bytes-limit": "64GB",
      "hard-pending-compaction-bytes-limit": "256GB"
    },
    "lockcf": {
      "block-size": "16KB",
      "block-cache-size": "1GB",
      "disable-block-cache": false,
      "cache-index-and-filter-blocks": true,
      "pin-l0-filter-and-index-blocks": true,
      "use-bloom-filter": true,
      "whole-key-filtering": true,
      "bloom-filter-bits-per-key": 10,
      "block-based-bloom-filter": false,
      "read-amp-bytes-per-bit": 0,
      "compression-per-level": [
        "no",
        "no",
        "no",
        "no",
        "no",
        "no",
        "no"
      ],
      "write-buffer-size": "128MB",
      "max-write-buffer-number": 5,
      "min-write-buffer-number-to-merge": 1,
      "max-bytes-for-level-base": "128MB",
      "target-file-size-base": "8MB",
      "level0-file-num-compaction-trigger": 1,
      "level0-slowdown-writes-trigger": 20,
      "level0-stop-writes-trigger": 36,
      "max-compaction-bytes": "2GB",
      "compaction-pri": 0,
      "dynamic-level-bytes": true,
      "num-levels": 7,
      "max-bytes-for-level-multiplier": 10,
      "compaction-style": 0,
      "disable-auto-compactions": false,
      "soft-pending-compaction-bytes-limit": "64GB",
      "hard-pending-compaction-bytes-limit": "256GB"
    },
    "raftcf": {
      "block-size": "16KB",
      "block-cache-size": "128MB",
      "disable-block-cache": false,
      "cache-index-and-filter-blocks": true,
      "pin-l0-filter-and-index-blocks": true,
      "use-bloom-filter": true,
      "whole-key-filtering": true,
      "bloom-filter-bits-per-key": 10,
      "block-based-bloom-filter": false,
      "read-amp-bytes-per-bit": 0,
      "compression-per-level": [
        "no",
        "no",
        "no",
        "no",
        "no",
        "no",
        "no"
      ],
      "write-buffer-size": "128MB",
      "max-write-buffer-number": 5,
      "min-write-buffer-number-to-merge": 1,
      "max-bytes-for-level-base": "128MB",
      "target-file-size-base": "8MB",
      "level0-file-num-compaction-trigger": 1,
      "level0-slowdown-writes-trigger": 20,
      "level0-stop-writes-trigger": 36,
      "max-compaction-bytes": "2GB",
      "compaction-pri": 0,
      "dynamic-level-bytes": true,
      "num-levels": 7,
      "max-bytes-for-level-multiplier": 10,
      "compaction-style": 0,
      "disable-auto-compactions": false,
      "soft-pending-compaction-bytes-limit": "64GB",
      "hard-pending-compaction-bytes-limit": "256GB"
    }
  },
  "raftdb": {
    "wal-recovery-mode": 2,
    "wal-dir": "",
    "wal-ttl-seconds": 0,
    "wal-size-limit": "0KB",
    "max-total-wal-size": "4GB",
    "max-manifest-file-size": "20MB",
    "create-if-missing": true,
    "max-open-files": 40960,
    "enable-statistics": true,
    "stats-dump-period": "10m",
    "compaction-readahead-size": "0KB",
    "info-log-max-size": "1GB",
    "info-log-roll-time": "0s",
    "info-log-keep-log-file-num": 10,
    "info-log-dir": "",
    "max-sub-compactions": 1,
    "writable-file-max-buffer-size": "1MB",
    "use-direct-io-for-flush-and-compaction": false,
    "enable-pipelined-write": true,
    "allow-concurrent-memtable-write": false,
    "bytes-per-sync": "1MB",
    "wal-bytes-per-sync": "512KB",
    "defaultcf": {
      "block-size": "64KB",
      "block-cache-size": "2GB",
      "disable-block-cache": false,
      "cache-index-and-filter-blocks": true,
      "pin-l0-filter-and-index-blocks": true,
      "use-bloom-filter": false,
      "whole-key-filtering": true,
      "bloom-filter-bits-per-key": 10,
      "block-based-bloom-filter": false,
      "read-amp-bytes-per-bit": 0,
      "compression-per-level": [
        "no",
        "no",
        "lz4",
        "lz4",
        "lz4",
        "zstd",
        "zstd"
      ],
      "write-buffer-size": "128MB",
      "max-write-buffer-number": 5,
      "min-write-buffer-number-to-merge": 1,
      "max-bytes-for-level-base": "512MB",
      "target-file-size-base": "8MB",
      "level0-file-num-compaction-trigger": 4,
      "level0-slowdown-writes-trigger": 20,
      "level0-stop-writes-trigger": 36,
      "max-compaction-bytes": "2GB",
      "compaction-pri": 0,
      "dynamic-level-bytes": true,
      "num-levels": 7,
      "max-bytes-for-level-multiplier": 10,
      "compaction-style": 0,
      "disable-auto-compactions": false,
      "soft-pending-compaction-bytes-limit": "64GB",
      "hard-pending-compaction-bytes-limit": "256GB"
    }
  },
  "security": {
    "ca-path": "",
    "cert-path": "",
    "key-path": ""
  },
  "import": {
    "import-dir": "/tmp/tikv/import",
    "num-threads": 8,
    "num-import-jobs": 8,
    "num-import-sst-jobs": 2,
    "max-prepare-duration": "5m",
    "region-split-size": "512MB",
    "stream-channel-window": 128,
    "max-open-engines": 8,
    "upload-speed-limit": "512MB",
    "min-available-ratio": 0.05
  }
}
2019/08/21 05:40:18.417 ERRO tikv-server.rs:84: Limit("the maximum number of open file descriptors is too small, got 65536, expect greater or equal to 82920")
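The 82920 in this error appears to follow from the config dump above: `rocksdb.max-open-files` and `raftdb.max-open-files` are both 40960, and adding a reserve of 1000 descriptors gives exactly 82920. The reserve of 1000 is an assumption, chosen because it makes the sum match; the sketch below redoes the arithmetic and compares it with the current shell's limit.

```shell
#!/bin/sh
# Values copied from the "using config" dump above.
rocksdb_max_open_files=40960
raftdb_max_open_files=40960
# Assumption: about 1000 descriptors are reserved for everything else;
# picked so that the sum matches the 82920 in the error message.
reserved_fds=1000

required_fds=$((rocksdb_max_open_files + raftdb_max_open_files + reserved_fds))
echo "required: $required_fds"

# Succeeds when the given limit satisfies the requirement.
fd_limit_ok() {
    limit="$1"
    [ "$limit" = "unlimited" ] || [ "$limit" -ge "$required_fds" ]
}

# Compare against the limit of the current shell (run this in the container).
current=$(ulimit -n)
if fd_limit_ok "$current"; then
    echo "ulimit -n = $current: enough"
else
    echo "ulimit -n = $current: too small"
fi
```

With the 65536 reported in the error, the check fails, so the limit has to be raised for the container or `max-open-files` lowered.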

I found a similar issue.

Thanks for helping debug, @weekface-PingCAP.

I'll try the method above.

Edit /usr/lib/systemd/system/docker.service, changing

LimitNOFILE=infinity
LimitNPROC=infinity

to

LimitNOFILE=1048576
LimitNPROC=1048576

After restarting docker, the TiDB cluster came up.
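An alternative to editing the unit file under /usr/lib directly is a systemd drop-in, which is not overwritten by package upgrades. The following is only a sketch of that variant, not what was done in this thread: the function name and the file name `limits.conf` are arbitrary, while the default directory is the standard drop-in location for docker.service.

```shell
#!/bin/sh
# Write a systemd drop-in that overrides the limits for docker.service.
# The target directory can be passed as $1 (useful for a trial run).
write_docker_limits() {
    dir="${1:-/etc/systemd/system/docker.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/limits.conf" <<'EOF'
[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
EOF
}

# Usage (as root), then reload systemd and restart Docker:
#   write_docker_limits
#   systemctl daemon-reload
#   systemctl restart docker
```

After the restart, running `ulimit -n` inside a new container should show the raised limit.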

@weekface-PingCAP thanks.