TiDB Pod is not created, while PD and TiKV are created normally

When deploying the tidb-cluster, the PD, TiKV, and Discovery Pods are all created normally, but the TiDB Pod is never created. There are no error logs, and the events contain no relevant information either, so I do not know what the cause is.

Below is the relevant command output:

  • kubectl get pod
    NAME                                       READY   STATUS    RESTARTS   AGE
    advanced-tidb-discovery-6c65bf49fb-lwgqg   1/1     Running   0          8m38s
    advanced-tidb-pd-0                         1/1     Running   0          8m38s
    advanced-tidb-pd-1                         1/1     Running   0          8m38s
    advanced-tidb-tikv-0                       1/1     Running   0          8m13s
    advanced-tidb-tikv-1                       1/1     Running   0          8m13s
    tidb-controller-manager-75859464b-vb8x8    1/1     Running   0          131m

  • kubectl get deployment
    NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
    advanced-tidb-discovery   1/1     1            1           8m38s
    tidb-controller-manager   1/1     1            1           114m

  • kubectl get tidbclusters -n frs-dev
    NAME READY PD STORAGE READY DESIRE TIKV STORAGE READY DESIRE TIDB READY DESIRE AGE
    advanced-tidb False pingcap/pd:v6.1.0 10Gi 2 2 50Gi 2 2 2 8m39s

  • kubectl get statefulsets -n frs-dev
    NAME                 READY   AGE
    advanced-tidb-pd     2/2     8m39s
    advanced-tidb-tikv   2/2     8m14s
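
Since the StatefulSet list above contains only PD and TiKV, the next place to look is the operator itself. A minimal sketch of how to dig further, assuming the operator runs in the namespace shown by the kubectl get pod output above (adjust -n as needed):

    # Status and events that tidb-operator records for this cluster
    kubectl -n frs-dev describe tidbcluster advanced-tidb

    # Reconcile log of the operator (deployment name taken from the pod list above)
    kubectl logs deploy/tidb-controller-manager --tail=100 -f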

Below is the YAML configuration of the TidbCluster:


apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: advanced-tidb
  namespace: frs-dev

spec:
  #######################
  # Basic Configuration #
  #######################

  ## TiDB cluster version
  version: "v6.1.0"

  ## Time zone of TiDB cluster Pods
  timezone: UTC

  configUpdateStrategy: RollingUpdate

  helper:
    image: alpine:3.16.0
  pvReclaimPolicy: Retain

  nodeSelector:
    project: RECONPLFM

  ## Tolerations are applied to TiDB cluster pods, allowing (but not requiring) pods to be scheduled onto nodes with matching taints.
  ## This cluster-level `tolerations` only takes effect when no component-level `tolerations` are set.
  ## e.g. if `pd.tolerations` is not empty, `tolerations` here will be ignored.
  ## Ref: https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/
  tolerations:
    - effect: NoSchedule
      key: RECONPLFM
      operator: Equal
    # value: RECONPLFM

  enableDynamicConfiguration: true

  pd:
    ##########################
    # Basic PD Configuration #
    ##########################

    ## Base image of the component
    baseImage: pingcap/pd

    ## pd-server configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/pd-configuration-file
    config: |
      [dashboard]
        internal-proxy = true

    ## The desired replicas
    replicas: 2

    ## max inprogress failover PD pod counts
    maxFailoverCount: 0

    ## describes the compute resource requirements and limits.
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
    #   cpu: 1000m
    #   memory: 1Gi
      storage: 10Gi
    # limits:
    #   cpu: 2000m
    #   memory: 2Gi

    mountClusterClientSecret: true

    ## The storageClassName of the persistent volume for PD data storage.
    storageClassName: "cna-reconplfm-dev-nas"
  tidb:
    ############################
    # Basic TiDB Configuration #
    ############################

    ## Base image of the component
    baseImage: pingcap/tidb

    ## tidb-server Configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/tidb-configuration-file
    config: |
      [performance]
        tcp-keep-alive = true

    ## The desired replicas
    replicas: 2

    ## max inprogress failover TiDB pod counts
    maxFailoverCount: 0

    service:
      type: ClusterIP
    storageClassName: "cna-reconplfm-dev-nas"

  tikv:
    ############################
    # Basic TiKV Configuration #
    ############################

    ## Base image of the component
    baseImage: pingcap/tikv

    ## tikv-server configuration
    ## Ref: https://docs.pingcap.com/tidb/stable/tikv-configuration-file
    config: |
      log-level = "info"

    ## The desired replicas
    replicas: 2

    ## max inprogress failover TiKV pod counts
    maxFailoverCount: 0

    ## describes the compute resource requirements.
    ## Ref: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
    requests:
    #   cpu: 1000m
    #   memory: 1Gi
      storage: 50Gi
    mountClusterClientSecret: true
    storageClassName: "cna-reconplfm-dev-nas"
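
One thing worth keeping in mind when reading this spec: tidb-operator reconciles the components in order (PD, then TiKV, then TiDB), so the TiDB StatefulSet typically does not appear until the earlier components are considered healthy, which matches the symptom above of PD and TiKV existing while TiDB is missing. A small sketch for watching this ordering, assuming the standard app.kubernetes.io/instance label that tidb-operator applies to managed resources:

    # Watch the operator create the StatefulSets and Pods component by component
    kubectl -n frs-dev get statefulsets,pods -l app.kubernetes.io/instance=advanced-tidb -w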

I have roughly located the problem: the tidb-controller-manager log shows that it is waiting for the PD leader election, and by inspecting the cluster details I found that one of the tidb-pd members has health set to false.

The corresponding tidb-pd Pod log shows the following error:
[ERROR] [etcdutil.go:126] ["load from etcd meet error"] [key=/pd/7160589068699979798/config] [error="[PD:etcd:ErrEtcdKVGet]context deadline exceeded: context deadline exceeded"]

But I am not clear on what specifically causes this error. What could be the reason?
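
For reference, a sketch of how the per-member PD health can be checked from the TidbCluster status (status.pd.members is where the operator records it; treat the exact field layout as an assumption to verify against your CRD version):

    # Dump the PD member records, including the health flag, from the cluster status
    kubectl -n frs-dev get tidbcluster advanced-tidb -o jsonpath='{.status.pd.members}'

    # Logs of the individual PD Pods, for the member whose health is false
    kubectl -n frs-dev logs advanced-tidb-pd-0
    kubectl -n frs-dev logs advanced-tidb-pd-1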

etcd, the core component embedded in PD, is not working properly.
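
PD embeds etcd, and the "load from etcd meet error ... context deadline exceeded" message means a request from PD to its own embedded etcd timed out, which usually comes down to the etcd members being unable to elect a leader or to reach each other. A sketch of how this can be verified from inside a PD Pod (the /pd-ctl path inside the pingcap/pd image and the Pod name used here are assumptions; adjust to your environment):

    # Health of every PD member as seen through the embedded etcd
    kubectl -n frs-dev exec -it advanced-tidb-pd-0 -- /pd-ctl -u http://127.0.0.1:2379 health

    # Current member list and leader; a missing leader points at the election problem
    kubectl -n frs-dev exec -it advanced-tidb-pd-0 -- /pd-ctl -u http://127.0.0.1:2379 member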
