PD stuck in Pending when deploying TiDB with tidb-operator

To help us respond quickly, please provide the following information when asking a question; clearly described problems get priority.

  • [TiDB version]: v3.0.5
  • [Problem description]: Kubernetes is v1.17.3 and the OS is CentOS 7.5. There are two worker nodes, so when deploying TiDB I set the PD and TiKV replica counts to 1 each. The problem is that the PD pod stays in the Pending state.

```
[root@master tidb-cluster]# kubectl describe pod tidb-cluster-pd-0 -n tidb-cluster
Name:           tidb-cluster-pd-0
Namespace:      tidb-cluster
Priority:       0
Node:
Labels:         app.kubernetes.io/component=pd
                app.kubernetes.io/instance=tidb-cluster
                app.kubernetes.io/managed-by=tidb-operator
                app.kubernetes.io/name=tidb-cluster
                controller-revision-hash=tidb-cluster-pd-5d57f87b9d
                statefulset.kubernetes.io/pod-name=tidb-cluster-pd-0
Annotations:    pingcap.com/last-applied-configuration:
                  {"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name...
                prometheus.io/path: /metrics
                prometheus.io/port: 2379
                prometheus.io/scrape: true
                runmode: debug
Status:         Pending
IP:
IPs:
Controlled By:  StatefulSet/tidb-cluster-pd
Containers:
  pd:
    Image:       pingcap/pd:v3.0.5
    Ports:       2380/TCP, 2379/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /bin/sh
      /usr/local/bin/pd_start_script.sh
    Environment:
      NAMESPACE:          tidb-cluster (v1:metadata.namespace)
      PEER_SERVICE_NAME:  tidb-cluster-pd-peer
      SERVICE_NAME:       tidb-cluster-pd
      SET_NAME:           tidb-cluster-pd
      TZ:                 UTC
    Mounts:
      /etc/pd from config (ro)
      /etc/podinfo from annotations (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/pd from pd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cv85k (ro)
Volumes:
  pd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pd-tidb-cluster-pd-0
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-pd-aa6df71f
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-pd-aa6df71f
    Optional:  false
  default-token-cv85k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cv85k
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
```

For performance tuning or troubleshooting questions, please download and run the diagnostic script, then select all of the terminal output and paste it into your post.

You can run kubectl describe nodes <node-name> on both nodes to check their status and look for errors.
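Besides the node status, a few other commands usually reveal why a Pod is stuck in Pending. A sketch using the namespace and Pod names from this thread (the scheduler container names below are assumptions; confirm them with `kubectl get pods`):

```shell
# The Events section at the bottom of the output names the scheduling
# failure, e.g. unbound PersistentVolumeClaims or failed node predicates.
kubectl describe pod tidb-cluster-pd-0 -n tidb-cluster

# Check whether the PD PVC is bound and whether local PVs are available.
kubectl get pvc -n tidb-cluster
kubectl get pv

# The tidb-scheduler Deployment runs two containers; errors (e.g. RBAC
# "forbidden" messages) usually show up in their logs.
kubectl logs -n tidb-admin tidb-scheduler-86d9dbf948-9v94g -c tidb-scheduler
kubectl logs -n tidb-admin tidb-scheduler-86d9dbf948-9v94g -c kube-scheduler
```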

```
[root@master tidb-cluster]# kubectl describe nodes node1
Name:               node1
Roles:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node1
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"be:3b:19:d9:ac:ba"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 172.30.0.154
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 01 Mar 2020 12:00:41 +0800
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  node1
  AcquireTime:
  RenewTime:       Tue, 17 Mar 2020 11:30:38 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  MemoryPressure   False   Tue, 17 Mar 2020 11:26:46 +0800   Sun, 01 Mar 2020 12:00:41 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 17 Mar 2020 11:26:46 +0800   Sun, 01 Mar 2020 12:00:41 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 17 Mar 2020 11:26:46 +0800   Sun, 01 Mar 2020 12:00:41 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 17 Mar 2020 11:26:46 +0800   Sun, 01 Mar 2020 12:01:42 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.7.10.154
  Hostname:    node1
Capacity:
  cpu:                8
  ephemeral-storage:  51175Mi
  hugepages-2Mi:      0
  memory:             20607492Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  48294789041
  hugepages-2Mi:      0
  memory:             20505092Ki
  pods:               110
System Info:
  Machine ID:                 b2d71c4f4af44ca09bbd58f3a38cb0ae
  System UUID:                564DB120-B5BF-C553-92B9-9CDE4A16197C
  Boot ID:                    1442ebad-b76b-44c5-b973-1df392753eb7
  Kernel Version:             3.10.0-862.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.17.3
  Kube-Proxy Version:         v1.17.3
PodCIDR:                      10.244.1.0/24
PodCIDRs:                     10.244.1.0/24
Non-terminated Pods:          (7 in total)
  Namespace             Name                                     CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  default               itswk-deployment-f96f5f7b4-k7lbt         0 (0%)        0 (0%)      0 (0%)           0 (0%)         3d13h
  kube-system           kube-flannel-ds-amd64-nxwfd              100m (1%)     100m (1%)   50Mi (0%)        50Mi (0%)      15d
  kube-system           kube-proxy-vwtsh                         0 (0%)        0 (0%)      0 (0%)           0 (0%)         15d
  kube-system           local-volume-provisioner-rnjz8           100m (1%)     100m (1%)   100Mi (0%)      100Mi (0%)     115m
  kube-system           tiller-deploy-6d8dfbb696-89bvl           0 (0%)        0 (0%)      0 (0%)           0 (0%)         5d17h
  kubernetes-dashboard  kubernetes-dashboard-866f987876-l9s7d    0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  tidb-cluster          tidb-cluster-discovery-77d9b8d8b9-8gszb  80m (1%)      250m (3%)   50Mi (0%)       150Mi (0%)     70m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  cpu                280m (3%)   450m (5%)
  memory             200Mi (0%)  300Mi (1%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:
```

```
[root@master tidb-cluster]# kubectl describe nodes node2
Name:               node2
Roles:
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/os=linux
                    kubernetes.io/arch=amd64
                    kubernetes.io/hostname=node2
                    kubernetes.io/os=linux
Annotations:        flannel.alpha.coreos.com/backend-data: {"VtepMAC":"82:2b:bd:c0:de:12"}
                    flannel.alpha.coreos.com/backend-type: vxlan
                    flannel.alpha.coreos.com/kube-subnet-manager: true
                    flannel.alpha.coreos.com/public-ip: 172.30.0.153
                    kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
                    node.alpha.kubernetes.io/ttl: 0
                    volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp:  Sun, 01 Mar 2020 12:05:11 +0800
Taints:
Unschedulable:      false
Lease:
  HolderIdentity:  node2
  AcquireTime:
  RenewTime:       Tue, 17 Mar 2020 11:32:18 +0800
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  MemoryPressure   False   Tue, 17 Mar 2020 11:29:43 +0800   Sun, 01 Mar 2020 12:05:11 +0800   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 17 Mar 2020 11:29:43 +0800   Sun, 01 Mar 2020 12:05:11 +0800   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 17 Mar 2020 11:29:43 +0800   Sun, 01 Mar 2020 12:05:11 +0800   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 17 Mar 2020 11:29:43 +0800   Sun, 01 Mar 2020 12:06:21 +0800   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.7.10.153
  Hostname:    node2
Capacity:
  cpu:                8
  ephemeral-storage:  51175Mi
  hugepages-2Mi:      0
  memory:             21023224Ki
  pods:               110
Allocatable:
  cpu:                8
  ephemeral-storage:  48294789041
  hugepages-2Mi:      0
  memory:             20920824Ki
  pods:               110
System Info:
  Machine ID:                 dfca0fed1f434c8f84b13a9f7bc1d192
  System UUID:                564D0CDA-810D-8049-1A0C-9F1A2579A638
  Boot ID:                    4d2a0d98-8c3b-48ab-bf69-9d85906844f2
  Kernel Version:             3.10.0-862.el7.x86_64
  OS Image:                   CentOS Linux 7 (Core)
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://19.3.6
  Kubelet Version:            v1.17.3
  Kube-Proxy Version:         v1.17.3
PodCIDR:                      10.244.2.0/24
PodCIDRs:                     10.244.2.0/24
Non-terminated Pods:          (7 in total)
  Namespace             Name                                        CPU Requests  CPU Limits  Memory Requests  Memory Limits  AGE
  kube-system           kube-flannel-ds-amd64-t5285                 100m (1%)     100m (1%)   50Mi (0%)        50Mi (0%)      15d
  kube-system           kube-proxy-82ncw                            0 (0%)        0 (0%)      0 (0%)           0 (0%)         15d
  kube-system           local-volume-provisioner-wlkb2              100m (1%)     100m (1%)   100Mi (0%)      100Mi (0%)     117m
  kubernetes-dashboard  dashboard-metrics-scraper-7b8b58dc8b-g58tj  0 (0%)        0 (0%)      0 (0%)           0 (0%)         12d
  tidb-admin            tidb-controller-manager-5574fbbfb9-zlcmc    80m (1%)      250m (3%)   50Mi (0%)       150Mi (0%)     88m
  tidb-admin            tidb-scheduler-86d9dbf948-9v94g             160m (2%)     500m (6%)   100Mi (0%)      300Mi (1%)     88m
  tidb-cluster          tidb-cluster-monitor-5d5fc8d8c6-fnwbv       0 (0%)        0 (0%)      0 (0%)           0 (0%)         72m
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  Resource           Requests    Limits
  cpu                440m (5%)   950m (11%)
  memory             300Mi (1%)  600Mi (2%)
  ephemeral-storage  0 (0%)      0 (0%)
Events:
```

@kimi Which version of TiDB Operator are you using?

Could you also provide the following:

  • the values.yaml used to create the cluster
  • the output of kubectl get pv
  • the output of kubectl get pvc
  • the logs of the two containers in the tidb-scheduler Pod

You can also follow this document to diagnose the issue: https://pingcap.com/docs-cn/stable/tidb-in-kubernetes/troubleshoot/#pod-处于-pending-状态

The TiDB Operator version is v1.0.3.

kube-schedule.rar (77.8 KB)

Hello, we are analyzing your issue and will get back to you as soon as possible. Thanks.

@kimi

Kubernetes v1.16 and later require some extra RBAC rules; for the rules that need to be added, see https://github.com/pingcap/tidb-operator/issues/1281#issuecomment-561520818.
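For reference, on pre-v1.0.6 Operator versions the workaround amounts to granting the tidb-scheduler ServiceAccount the same volume-scheduling permissions that the upstream kube-scheduler has. A minimal sketch, assuming the ServiceAccount is named `tidb-scheduler` in the `tidb-admin` namespace (as in this thread); check the linked issue for the exact rules:

```yaml
# Hypothetical manual workaround for pre-v1.0.6 tidb-operator on K8s >= 1.16:
# bind the scheduler's ServiceAccount to the built-in system:volume-scheduler
# ClusterRole. The ServiceAccount name and namespace below are assumptions
# based on this thread's deployment.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tidb-scheduler-volume-scheduler
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:volume-scheduler
subjects:
- kind: ServiceAccount
  name: tidb-scheduler
  namespace: tidb-admin
```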

We have fixed this in v1.0.6: https://github.com/pingcap/tidb-operator/pull/1282. Upgrading to v1.0.6 will resolve it. I've also updated the answer above, @kimi.
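If the operator was installed with Helm (as the tiller-deploy pod in the node output suggests), the upgrade is roughly the following; the release and chart names here are the common defaults and may differ in your setup:

```shell
helm repo update
# Upgrade the operator release to the fixed version
helm upgrade tidb-operator pingcap/tidb-operator --version=v1.0.6
# Confirm the controller-manager and scheduler pods come back on v1.0.6
kubectl get pods -n tidb-admin
```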

OK, I'm reinstalling with v1.0.6 now, in progress...

After switching to v1.0.6 it works now.

Yes, this was fixed in v1.0.6: tidb-scheduler now binds directly to the system:volume-scheduler ClusterRole, which resolves the issue.

Thanks! :rose::rose::rose:

:+1: