Missing TiDB instance pod when deploying a TiDB cluster in kind

I deployed a TiDB cluster in kind following the official documentation. The Operator deployed successfully, but when deploying the TidbCluster I found that the TiDB instance pod was missing. The steps follow the official document below; some images had been preloaded into the K8s cluster beforehand:
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/get-started#第-3-步部署-tidb-集群和监控
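
(For reference, images are preloaded into kind with kind load docker-image. A minimal sketch, assuming the default cluster name "kind" and that the images were already pulled locally with docker pull:)

kind load docker-image pingcap/pd:v6.5.0
kind load docker-image pingcap/tikv:v6.5.0
kind load docker-image pingcap/tidb:v6.5.0
kind load docker-image pingcap/tidb-operator:v1.4.0
kind load docker-image alpine:3.16.0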

[root@centos7 ~]# kubectl get po -A
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE
kube-system          coredns-565d847f94-cpx9s                     1/1     Running   0          39m
kube-system          coredns-565d847f94-mjfps                     1/1     Running   0          39m
kube-system          etcd-kind-control-plane                      1/1     Running   0          39m
kube-system          kindnet-kjhzx                                1/1     Running   0          39m
kube-system          kube-apiserver-kind-control-plane            1/1     Running   0          39m
kube-system          kube-controller-manager-kind-control-plane   1/1     Running   0          39m
kube-system          kube-proxy-f9rdv                             1/1     Running   0          39m
kube-system          kube-scheduler-kind-control-plane            1/1     Running   0          39m
local-path-storage   local-path-provisioner-684f458cdd-rb8jv      1/1     Running   0          39m
tidb-admin           tidb-controller-manager-86f4899f-xrxwg       1/1     Running   0          37m
tidb-admin           tidb-scheduler-7957d5b4d6-7fj7j              2/2     Running   0          37m
[root@centos7 ~]# docker exec -it adecb14994c2 crictl images
IMAGE                                                    TAG                  IMAGE ID            SIZE
docker.io/grafana/grafana                                7.5.11               6cfe8ab94353f       206MB
docker.io/kindest/kindnetd                               v20221004-44d545d1   d6e3e26021b60       25.8MB
docker.io/kindest/local-path-helper                      v20220607-9a4d8d2a   d2f902e939cc3       2.86MB
docker.io/kindest/local-path-provisioner                 v0.0.22-kind.0       4c1e997385b8f       17.4MB
docker.io/library/alpine                                 3.16.0               e66264b98777e       5.81MB
docker.io/pingcap/advanced-statefulset                   v0.4.0               70c265c22e08e       49.4MB
docker.io/pingcap/pd                                     v6.5.0               69c043a19b5d9       165MB
docker.io/pingcap/tidb-backup-manager                    v1.4.0               fd4dcce8769e5       579MB
docker.io/pingcap/tidb-dashboard                         v6.5.0               e269b8cd23749       267MB
docker.io/pingcap/tidb-monitor-initializer               v6.5.0               dc26054ae594b       6.49MB
docker.io/pingcap/tidb-monitor-reloader                  v1.0.1               912ff2b5e6562       20.7MB
docker.io/pingcap/tidb-operator                          v1.4.0               30563eeb9ca04       298MB
docker.io/pingcap/tidb                                   v6.5.0               500953de794e2       200MB
docker.io/pingcap/tikv                                   v6.5.0               9621b51b12826       551MB
docker.io/prom/prometheus                                v2.27.1              86ea6f86fc575       187MB
k8s.gcr.io/kube-scheduler                                v1.25.3              6d23ec0e8b87e       51.9MB
quay.io/prometheus-operator/prometheus-config-reloader   v0.49.0              ae8e4c9feb781       13.8MB
registry.k8s.io/coredns/coredns                          v1.9.3               5185b96f0becf       14.8MB
registry.k8s.io/etcd                                     3.5.4-0              a8a176a5d5d69       102MB
registry.k8s.io/kube-apiserver                           v1.25.3              4bc1b1e750e34       76.5MB
registry.k8s.io/kube-controller-manager                  v1.25.3              580dca99efc3b       64.5MB
registry.k8s.io/kube-proxy                               v1.25.3              86063cd68dfc9       63.3MB
registry.k8s.io/kube-scheduler                           v1.25.3              5225724a11400       51.9MB
registry.k8s.io/pause                                    3.7                  221177c6082a8       311kB
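
(The container ID adecb14994c2 above is the kind node container; with the default cluster name it can also be addressed by name, matching the node name seen in the pod list:)

docker exec -it kind-control-plane crictl images
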
[root@centos7 ~]# kubectl create namespace tidb-cluster && kubectl -n tidb-cluster apply -f tidb-cluster.yaml 
namespace/tidb-cluster created
tidbcluster.pingcap.com/basic created
[root@centos7 ~]# 
[root@centos7 ~]# kubectl get po -n tidb-cluster
NAME                               READY   STATUS    RESTARTS   AGE
basic-discovery-5db6c75657-wrz6l   1/1     Running   0          36s
basic-pd-0                         1/1     Running   0          36s
basic-tikv-0                       1/1     Running   0          28s
[root@centos7 ~]# 
[root@centos7 ~]# kubectl get po -n tidb-cluster
NAME                               READY   STATUS    RESTARTS   AGE
basic-discovery-5db6c75657-wrz6l   1/1     Running   0          43s
basic-pd-0                         1/1     Running   0          43s
basic-tikv-0                       1/1     Running   0          35s
[root@centos7 ~]# kubectl get TidbCluster -n tidb-cluster basic
NAME    READY   PD                  STORAGE   READY   DESIRE   TIKV   STORAGE   READY   DESIRE   TIDB   READY   DESIRE   AGE
basic   False   pingcap/pd:v6.5.0   1Gi       1       1               1Gi       1       1                       1        62s
[root@centos7 ~]# 

The content of tidb-cluster.yaml is as follows:

# IT IS NOT SUITABLE FOR PRODUCTION USE.
# This YAML describes a basic TiDB cluster with minimum resource requirements,
# which should be able to run in any Kubernetes cluster with storage support.
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v6.5.0
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
    maxFailoverCount: 0
    replicas: 1
    # if storageClassName is not set, the default Storage Class of the Kubernetes cluster will be used
    # storageClassName: local-storage
    requests:
      storage: "1Gi"
    config: {}
  tikv:
    baseImage: pingcap/tikv
    maxFailoverCount: 0
    # If only 1 TiKV is deployed, the TiKV region leader 
    # cannot be transferred during upgrade, so we have
    # to configure a short timeout
    evictLeaderTimeout: 1m
    replicas: 1
    # if storageClassName is not set, the default Storage Class of the Kubernetes cluster will be used
    # storageClassName: local-storage
    requests:
      storage: "1Gi"
    config:
      storage:
        # In basic examples, we set this to avoid using too much storage.
        reserve-space: "0MB"
      rocksdb:
        # In basic examples, we set this to avoid the following error in some Kubernetes clusters:
        # "the maximum number of open file descriptors is too small, got 1024, expect greater or equal to 82920"
        max-open-files: 256
      raftdb:
        max-open-files: 256
  tidb:
    baseImage: pingcap/tidb
    maxFailoverCount: 0
    replicas: 1
    service:
      type: ClusterIP
    config: {}

After deploying the cluster, only PD and TiKV pods exist (see the outputs above); there is no TiDB pod. How should I troubleshoot this?

Take a look at the output of kubectl describe TidbCluster -n tidb-cluster basic

OK, the describe output is as follows:

[root@centos7 ~]# kubectl describe TidbCluster -n tidb-cluster basic
Name:         basic
Namespace:    tidb-cluster
Labels:       <none>
Annotations:  <none>
API Version:  pingcap.com/v1alpha1
Kind:         TidbCluster
Metadata:
  Creation Timestamp:  2023-01-11T01:55:29Z
  Generation:          8
  Managed Fields:
    API Version:  pingcap.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:configUpdateStrategy:
        f:discovery:
        f:enableDynamicConfiguration:
        f:helper:
          .:
          f:image:
        f:imagePullPolicy:
        f:pd:
          .:
          f:baseImage:
          f:maxFailoverCount:
          f:replicas:
          f:requests:
            .:
            f:storage:
        f:pvReclaimPolicy:
        f:tidb:
          .:
          f:baseImage:
          f:maxFailoverCount:
          f:replicas:
          f:service:
            .:
            f:type:
        f:tikv:
          .:
          f:baseImage:
          f:evictLeaderTimeout:
          f:maxFailoverCount:
          f:replicas:
          f:requests:
            .:
            f:storage:
        f:timezone:
        f:version:
    Manager:      kubectl-client-side-apply
    Operation:    Update
    Time:         2023-01-11T01:55:29Z
    API Version:  pingcap.com/v1alpha1
    Fields Type:  FieldsV1
    fieldsV1:
      f:spec:
        f:enablePVReclaim:
        f:pd:
          f:config:
        f:tidb:
          f:config:
        f:tikv:
          f:config:
          f:scalePolicy:
            .:
            f:scaleInParallelism:
            f:scaleOutParallelism:
        f:tlsCluster:
      f:status:
        .:
        f:clusterID:
        f:conditions:
        f:pd:
          .:
          f:image:
          f:leader:
            .:
            f:clientURL:
            f:health:
            f:id:
            f:lastTransitionTime:
            f:name:
          f:members:
            .:
            f:basic-pd-0:
              .:
              f:clientURL:
              f:health:
              f:id:
              f:lastTransitionTime:
              f:name:
          f:phase:
          f:statefulSet:
            .:
            f:collisionCount:
            f:currentReplicas:
            f:currentRevision:
            f:observedGeneration:
            f:readyReplicas:
            f:replicas:
            f:updateRevision:
            f:updatedReplicas:
          f:synced:
          f:volumes:
            .:
            f:pd:
              .:
              f:boundCount:
              f:currentCapacity:
              f:currentCount:
              f:currentStorageClass:
              f:modifiedCapacity:
              f:modifiedCount:
              f:modifiedStorageClass:
              f:name:
              f:resizedCapacity:
              f:resizedCount:
        f:pump:
        f:ticdc:
        f:tidb:
        f:tiflash:
        f:tikv:
          .:
          f:phase:
          f:statefulSet:
            .:
            f:collisionCount:
            f:currentReplicas:
            f:currentRevision:
            f:observedGeneration:
            f:readyReplicas:
            f:replicas:
            f:updateRevision:
            f:updatedReplicas:
          f:synced:
        f:tiproxy:
          .:
          f:proxy:
    Manager:         tidb-controller-manager
    Operation:       Update
    Time:            2023-01-11T01:55:45Z
  Resource Version:  6854
  UID:               6b85a5be-142e-40d5-bde4-415bc5dd02ef
Spec:
  Config Update Strategy:  RollingUpdate
  Discovery:
  Enable Dynamic Configuration:  true
  Enable PV Reclaim:             false
  Helper:
    Image:            alpine:3.16.0
  Image Pull Policy:  IfNotPresent
  Pd:
    Base Image:          pingcap/pd
    Config:              
    Max Failover Count:  0
    Replicas:            1
    Requests:
      Storage:        1Gi
  Pv Reclaim Policy:  Retain
  Tidb:
    Base Image:  pingcap/tidb
    Config:      [log]
  [log.file]
    max-backups = 3

    Max Failover Count:  0
    Replicas:            1
    Service:
      Type:  ClusterIP
  Tikv:
    Base Image:  pingcap/tikv
    Config:      [raftdb]
  max-open-files = 256

[rocksdb]
  max-open-files = 256

[storage]
  reserve-space = "0MB"

    Evict Leader Timeout:  1m
    Max Failover Count:    0
    Replicas:              1
    Requests:
      Storage:  1Gi
    Scale Policy:
      Scale In Parallelism:   1
      Scale Out Parallelism:  1
  Timezone:                   UTC
  Tls Cluster:
  Version:  v6.5.0
Status:
  Cluster ID:  7187207458780878196
  Conditions:
    Last Transition Time:  2023-01-11T01:55:30Z
    Last Update Time:      2023-01-11T01:55:38Z
    Message:               TiKV store(s) are not up
    Reason:                TiKVStoreNotUp
    Status:                False
    Type:                  Ready
  Pd:
    Image:  pingcap/pd:v6.5.0
    Leader:
      Client URL:            http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379
      Health:                true
      Id:                    7441053368211532809
      Last Transition Time:  2023-01-11T01:55:38Z
      Name:                  basic-pd-0
    Members:
      basic-pd-0:
        Client URL:            http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379
        Health:                true
        Id:                    7441053368211532809
        Last Transition Time:  2023-01-11T01:55:38Z
        Name:                  basic-pd-0
    Phase:                     Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     1
      Current Revision:     basic-pd-67fbbc98cf
      Observed Generation:  1
      Ready Replicas:       1
      Replicas:             1
      Update Revision:      basic-pd-67fbbc98cf
      Updated Replicas:     1
    Synced:                 true
    Volumes:
      Pd:
        Bound Count:             1
        Current Capacity:        1Gi
        Current Count:           1
        Current Storage Class:   standard
        Modified Capacity:       1Gi
        Modified Count:          0
        Modified Storage Class:  
        Name:                    pd
        Resized Capacity:        1Gi
        Resized Count:           0
  Pump:
  Ticdc:
  Tidb:
  Tiflash:
  Tikv:
    Phase:  Normal
    Stateful Set:
      Collision Count:      0
      Current Replicas:     1
      Current Revision:     basic-tikv-888479b96
      Observed Generation:  1
      Ready Replicas:       1
      Replicas:             1
      Update Revision:      basic-tikv-888479b96
      Updated Replicas:     1
    Synced:                 true
  Tiproxy:
    Proxy:
Events:
  Type    Reason               Age   From                     Message
  ----    ------               ----  ----                     -------
  Normal  Successfully Create  46m   tidb-controller-manager  create Role/basic-discovery for controller TidbCluster/basic successfully
  Normal  Successfully Create  46m   tidb-controller-manager  create ServiceAccount/basic-discovery for controller TidbCluster/basic successfully
  Normal  Successfully Create  46m   tidb-controller-manager  create RoleBinding/basic-discovery for controller TidbCluster/basic successfully
  Normal  Successfully Create  46m   tidb-controller-manager  create Deployment/basic-discovery for controller TidbCluster/basic successfully
  Normal  Successfully Create  46m   tidb-controller-manager  create Service/basic-discovery for controller TidbCluster/basic successfully
  Normal  SuccessfulCreate     46m   tidb-controller-manager  create Service basic-pd in  basic successful
  Normal  SuccessfulCreate     46m   tidb-controller-manager  create Service basic-pd-peer in  basic successful
  Normal  Successfully Create  46m   tidb-controller-manager  create ConfigMap/basic-pd-6130373 for controller TidbCluster/basic successfully
  Normal  SuccessfulCreate     46m   tidb-controller-manager  create StatefulSet basic-pd in  basic successful
  Normal  SuccessfulPatch      46m   tidb-controller-manager  patch PV pvc-191424dc-d7e0-4d78-bbee-cebd661dfd01 in TidbCluster basic successful
  Normal  SuccessfulCreate     46m   tidb-controller-manager  create Service basic-tikv-peer in  basic successful
  Normal  Successfully Create  46m   tidb-controller-manager  create ConfigMap/basic-tikv-6565313 for controller TidbCluster/basic successfully
  Normal  SuccessfulCreate     46m   tidb-controller-manager  create StatefulSet basic-tikv in  basic successful
  Normal  SuccessfulPatch      46m   tidb-controller-manager  patch PV pvc-8c38b174-c3a9-4e9f-ae31-316a3739b7fd in TidbCluster basic successful

Run kubectl -n tidb-admin logs -f tidb-controller-manager-xxxx to check the operator logs for errors.

The operator logs are below. It looks like the TiDB pod was not created because TiKV is not running normally:

I0111 05:21:09.747911       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:21:09.748348       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:21:39.746429       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:21:39.756434       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:21:39.762929       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:21:39.764531       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:22:09.756889       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:22:09.766395       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:22:09.773004       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:22:09.773385       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:22:33.624267       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:22:33.633341       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:22:33.639904       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:22:33.640331       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:22:39.754556       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:22:39.762877       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:22:39.768924       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:22:39.769322       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:23:09.752607       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:23:09.763979       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:23:09.773034       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:23:09.773447       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing
W0111 05:23:39.749425       1 phase.go:69] volume tidb-cluster/pd-basic-pd-0 modification is not allowed: can't change storage class to the default one
I0111 05:23:39.757056       1 tikv_member_manager.go:834] TiKV of Cluster tidb-cluster/basic not bootstrapped yet
I0111 05:23:39.763204       1 tikv_member_manager.go:938] TiKV of Cluster tidb-cluster/basic is not bootstrapped yet, no need to set store labels
I0111 05:23:39.764064       1 tidb_cluster_controller.go:131] TidbCluster: tidb-cluster/basic, still need sync: TidbCluster: [tidb-cluster/basic], waiting for TiKV cluster running, requeuing

Also, the kubectl describe TidbCluster -n tidb-cluster basic output shows TiKVStoreNotUp, so it looks like a TiKV problem:

Status:
  Cluster ID:  7187207458780878196
  Conditions:
    Last Transition Time:  2023-01-11T01:55:30Z
    Last Update Time:      2023-01-11T01:55:38Z
    Message:               TiKV store(s) are not up
    Reason:                TiKVStoreNotUp
    Status:                False
    Type:                  Ready
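
(Since the condition says the TiKV store is not up, another way to check is to ask PD directly which stores it has registered. A sketch, assuming the official pingcap/pd image ships pd-ctl at /pd-ctl:)

# Query the store list from inside the PD pod
kubectl -n tidb-cluster exec basic-pd-0 -- /pd-ctl -u http://127.0.0.1:2379 store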

The TiKV log keeps printing the following:

[2023/01/11 05:26:56.864 +00:00] [WARN] [client.rs:163] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:593]: PD cluster failed to respond\")"]
[2023/01/11 05:26:57.165 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:26:59.166 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:26:59.468 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:01.469 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:01.771 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:03.772 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:04.081 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:06.081 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:06.383 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:08.388 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:08.691 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:10.693 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:10.993 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:12.995 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:13.296 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:15.297 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:15.599 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:17.600 +00:00] [INFO] [util.rs:560] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://basic-pd:2379]
[2023/01/11 05:27:17.904 +00:00] [INFO] [util.rs:598] ["connecting to PD endpoint"] [endpoints=http://basic-pd:2379]

But testing from inside the TiKV container, http://basic-pd:2379 is reachable:

[root@centos7 ~]# kubectl exec -it basic-tikv-0 -n tidb-cluster sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # wget http://basic-pd:2379
Connecting to basic-pd:2379 (10.96.252.132:2379)
wget: server returned error: HTTP/1.1 404 Not Found
/ # 
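
(Note that a 404 from a plain wget to the root path only proves TCP/HTTP reachability; PD serves nothing at /, and TiKV talks to PD over gRPC, which is where the DEADLINE_EXCEEDED errors above come from. A slightly more meaningful HTTP-level check from inside the pod is to hit PD's API endpoints:)

wget -qO- http://basic-pd:2379/pd/api/v1/health
wget -qO- http://basic-pd:2379/pd/api/v1/members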

The status of the PD services is as follows:

[root@centos7 ~]# kubectl get svc -A
NAMESPACE      NAME              TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
default        kubernetes        ClusterIP   10.96.0.1       <none>        443/TCP                  4h23m
kube-system    kube-dns          ClusterIP   10.96.0.10      <none>        53/UDP,53/TCP,9153/TCP   4h23m
tidb-cluster   basic-discovery   ClusterIP   10.96.23.151    <none>        10261/TCP,10262/TCP      3h43m
tidb-cluster   basic-pd          ClusterIP   10.96.252.132   <none>        2379/TCP                 3h43m
tidb-cluster   basic-pd-peer     ClusterIP   None            <none>        2380/TCP,2379/TCP        3h43m
tidb-cluster   basic-tikv-peer   ClusterIP   None            <none>        20160/TCP                3h43m
[root@centos7 ~]# 
[root@centos7 ~]# 
[root@centos7 ~]# 
[root@centos7 ~]# kubectl describe svc -n tidb-cluster basic-pd
Name:              basic-pd
Namespace:         tidb-cluster
Labels:            app.kubernetes.io/component=pd
                   app.kubernetes.io/instance=basic
                   app.kubernetes.io/managed-by=tidb-operator
                   app.kubernetes.io/name=tidb-cluster
                   app.kubernetes.io/used-by=end-user
Annotations:       pingcap.com/last-applied-configuration:
                     {"ports":[{"name":"client","protocol":"TCP","port":2379,"targetPort":2379}],"selector":{"app.kubernetes.io/component":"pd","app.kubernetes...
Selector:          app.kubernetes.io/component=pd,app.kubernetes.io/instance=basic,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                10.96.252.132
IPs:               10.96.252.132
Port:              client  2379/TCP
TargetPort:        2379/TCP
Endpoints:         10.244.0.23:2379
Session Affinity:  None
Events:            <none>
[root@centos7 ~]# 
[root@centos7 ~]# 
[root@centos7 ~]# 
[root@centos7 ~]# kubectl describe svc -n tidb-cluster basic-pd-peer
Name:              basic-pd-peer
Namespace:         tidb-cluster
Labels:            app.kubernetes.io/component=pd
                   app.kubernetes.io/instance=basic
                   app.kubernetes.io/managed-by=tidb-operator
                   app.kubernetes.io/name=tidb-cluster
                   app.kubernetes.io/used-by=peer
Annotations:       pingcap.com/last-applied-configuration:
                     {"ports":[{"name":"tcp-peer-2380","protocol":"TCP","port":2380,"targetPort":2380},{"name":"tcp-peer-2379","protocol":"TCP","port":2379,"ta...
Selector:          app.kubernetes.io/component=pd,app.kubernetes.io/instance=basic,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                None
IPs:               None
Port:              tcp-peer-2380  2380/TCP
TargetPort:        2380/TCP
Endpoints:         10.244.0.23:2380
Port:              tcp-peer-2379  2379/TCP
TargetPort:        2379/TCP
Endpoints:         10.244.0.23:2379
Session Affinity:  None
Events:            <none>


Isn't this the cause of the problem?

It looks like a communication problem between TiKV and PD, but the connection test from the TiKV pod to PD on port 2379 succeeds.

Take a look at the TiKV logs: kubectl -n ${namespace} logs -f ${pod_name}
Common Deployment Failures of TiDB on Kubernetes | PingCAP Docs
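
(With the names used in this thread, that is:)

kubectl -n tidb-cluster logs -f basic-tikv-0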

How did you set up your storage?

First try correcting the tc (TidbCluster) spec. If you run only 1 TiKV, configure max-replicas: https://docs.pingcap.com/zh/tidb/dev/pd-configuration-file#max-replicas
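
(In the TidbCluster spec, that would look roughly like this under spec.pd; a sketch based on the linked PD configuration doc, where max-replicas lives under the replication section:)

pd:
  config:
    replication:
      # max-replicas defaults to 3; with a single TiKV store,
      # PD cannot place 3 Region replicas, so keep only 1
      max-replicas: 1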

Solved. It seems to have been an environment problem. I rebooted the operating system, recreated the K8s cluster in kind, and after following the documentation again everything works. The exact root cause is unknown. Thanks!
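
(For anyone hitting the same thing, recreating the kind cluster is roughly, assuming the default cluster name:)

# Tear down and recreate the kind cluster, then redo the steps from the docs
kind delete cluster
kind create cluster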
