PD pod stuck in Pending when deploying TiDB with tidb-operator

To help us respond faster, please provide the information below when asking a question; clearly described issues are prioritized.

  • [TiDB version]: v3.0.5
  • [Problem description]: Kubernetes is v1.17.3 and the OS is CentOS 7.5. There are two worker nodes, so when deploying TiDB I set both the PD and TiKV replica counts to 1 (a sketch of the corresponding settings follows the pasted output below). The problem is that the PD pod stays in Pending.
[root@master tidb-cluster]# kubectl describe pod tidb-cluster-pd-0 -n tidb-cluster
Name:           tidb-cluster-pd-0
Namespace:      tidb-cluster
Priority:       0
Node:
Labels:         app.kubernetes.io/component=pd
                app.kubernetes.io/instance=tidb-cluster
                app.kubernetes.io/managed-by=tidb-operator
                app.kubernetes.io/name=tidb-cluster
                controller-revision-hash=tidb-cluster-pd-5d57f87b9d
                statefulset.kubernetes.io/pod-name=tidb-cluster-pd-0
Annotations:    pingcap.com/last-applied-configuration:
                  {"volumes":[{"name":"annotations","downwardAPI":{"items":[{"path":"annotations","fieldRef":{"fieldPath":"metadata.annotations"}}]}},{"name…
                prometheus.io/path: /metrics
                prometheus.io/port: 2379
                prometheus.io/scrape: true
                runmode: debug
Status:         Pending
IP:
IPs:
Controlled By:  StatefulSet/tidb-cluster-pd
Containers:
  pd:
    Image:       pingcap/pd:v3.0.5
    Ports:       2380/TCP, 2379/TCP
    Host Ports:  0/TCP, 0/TCP
    Command:
      /bin/sh
      /usr/local/bin/pd_start_script.sh
    Environment:
      NAMESPACE:          tidb-cluster (v1:metadata.namespace)
      PEER_SERVICE_NAME:  tidb-cluster-pd-peer
      SERVICE_NAME:       tidb-cluster-pd
      SET_NAME:           tidb-cluster-pd
      TZ:                 UTC
    Mounts:
      /etc/pd from config (ro)
      /etc/podinfo from annotations (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/pd from pd (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-cv85k (ro)
Volumes:
  pd:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  pd-tidb-cluster-pd-0
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations → annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-pd-aa6df71f
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-pd-aa6df71f
    Optional:  false
  default-token-cv85k:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-cv85k
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:

If the question is about performance tuning or troubleshooting, please download and run the diagnostic script, then select all of the terminal output and copy-paste it here.
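For reference, this is roughly how the replica counts described above would be expressed with the tidb-cluster Helm chart. A minimal sketch only; the release name, namespace, chart version, and --set values below are assumptions, not taken from this thread:

# Hypothetical install of the tidb-cluster chart with 1 PD and 1 TiKV replica
# (equivalently, set pd.replicas / tikv.replicas in values.yaml)
helm install pingcap/tidb-cluster \
  --name=tidb-cluster \
  --namespace=tidb-cluster \
  --version=v1.0.3 \
  --set pd.replicas=1,tikv.replicas=1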

You can run kubectl describe nodes <node-name> to check the status of the two nodes and see whether there are any errors.

[root@master tidb-cluster]# kubectl describe nodes node1
Name: node1
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node1
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"be:3b:19:d9:ac:ba"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 172.30.0.154
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 01 Mar 2020 12:00:41 +0800
Taints:
Unschedulable: false
Lease:
HolderIdentity: node1
AcquireTime:
RenewTime: Tue, 17 Mar 2020 11:30:38 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message


MemoryPressure False Tue, 17 Mar 2020 11:26:46 +0800 Sun, 01 Mar 2020 12:00:41 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 17 Mar 2020 11:26:46 +0800 Sun, 01 Mar 2020 12:00:41 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 17 Mar 2020 11:26:46 +0800 Sun, 01 Mar 2020 12:00:41 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 17 Mar 2020 11:26:46 +0800 Sun, 01 Mar 2020 12:01:42 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.7.10.154
Hostname: node1
Capacity:
cpu: 8
ephemeral-storage: 51175Mi
hugepages-2Mi: 0
memory: 20607492Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 48294789041
hugepages-2Mi: 0
memory: 20505092Ki
pods: 110
System Info:
Machine ID: b2d71c4f4af44ca09bbd58f3a38cb0ae
System UUID: 564DB120-B5BF-C553-92B9-9CDE4A16197C
Boot ID: 1442ebad-b76b-44c5-b973-1df392753eb7
Kernel Version: 3.10.0-862.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.6
Kubelet Version: v1.17.3
Kube-Proxy Version: v1.17.3
PodCIDR: 10.244.1.0/24
PodCIDRs: 10.244.1.0/24
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE


default itswk-deployment-f96f5f7b4-k7lbt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d13h
kube-system kube-flannel-ds-amd64-nxwfd 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 15d
kube-system kube-proxy-vwtsh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15d
kube-system local-volume-provisioner-rnjz8 100m (1%) 100m (1%) 100Mi (0%) 100Mi (0%) 115m
kube-system tiller-deploy-6d8dfbb696-89bvl 0 (0%) 0 (0%) 0 (0%) 0 (0%) 5d17h
kubernetes-dashboard kubernetes-dashboard-866f987876-l9s7d 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12d
tidb-cluster tidb-cluster-discovery-77d9b8d8b9-8gszb 80m (1%) 250m (3%) 50Mi (0%) 150Mi (0%) 70m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits


cpu 280m (3%) 450m (5%)
memory 200Mi (0%) 300Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
Events:
[root@master tidb-cluster]# kubectl describe nodes node2
Name: node2
Roles:
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=node2
kubernetes.io/os=linux
Annotations: flannel.alpha.coreos.com/backend-data: {"VtepMAC":"82:2b:bd:c0:de:12"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 172.30.0.153
kubeadm.alpha.kubernetes.io/cri-socket: /var/run/dockershim.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Sun, 01 Mar 2020 12:05:11 +0800
Taints:
Unschedulable: false
Lease:
HolderIdentity: node2
AcquireTime:
RenewTime: Tue, 17 Mar 2020 11:32:18 +0800
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message


MemoryPressure False Tue, 17 Mar 2020 11:29:43 +0800 Sun, 01 Mar 2020 12:05:11 +0800 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 17 Mar 2020 11:29:43 +0800 Sun, 01 Mar 2020 12:05:11 +0800 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 17 Mar 2020 11:29:43 +0800 Sun, 01 Mar 2020 12:05:11 +0800 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 17 Mar 2020 11:29:43 +0800 Sun, 01 Mar 2020 12:06:21 +0800 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.7.10.153
Hostname: node2
Capacity:
cpu: 8
ephemeral-storage: 51175Mi
hugepages-2Mi: 0
memory: 21023224Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 48294789041
hugepages-2Mi: 0
memory: 20920824Ki
pods: 110
System Info:
Machine ID: dfca0fed1f434c8f84b13a9f7bc1d192
System UUID: 564D0CDA-810D-8049-1A0C-9F1A2579A638
Boot ID: 4d2a0d98-8c3b-48ab-bf69-9d85906844f2
Kernel Version: 3.10.0-862.el7.x86_64
OS Image: CentOS Linux 7 (Core)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://19.3.6
Kubelet Version: v1.17.3
Kube-Proxy Version: v1.17.3
PodCIDR: 10.244.2.0/24
PodCIDRs: 10.244.2.0/24
Non-terminated Pods: (7 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE


kube-system kube-flannel-ds-amd64-t5285 100m (1%) 100m (1%) 50Mi (0%) 50Mi (0%) 15d
kube-system kube-proxy-82ncw 0 (0%) 0 (0%) 0 (0%) 0 (0%) 15d
kube-system local-volume-provisioner-wlkb2 100m (1%) 100m (1%) 100Mi (0%) 100Mi (0%) 117m
kubernetes-dashboard dashboard-metrics-scraper-7b8b58dc8b-g58tj 0 (0%) 0 (0%) 0 (0%) 0 (0%) 12d
tidb-admin tidb-controller-manager-5574fbbfb9-zlcmc 80m (1%) 250m (3%) 50Mi (0%) 150Mi (0%) 88m
tidb-admin tidb-scheduler-86d9dbf948-9v94g 160m (2%) 500m (6%) 100Mi (0%) 300Mi (1%) 88m
tidb-cluster tidb-cluster-monitor-5d5fc8d8c6-fnwbv 0 (0%) 0 (0%) 0 (0%) 0 (0%) 72m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits


cpu 440m (5%) 950m (11%)
memory 300Mi (1%) 600Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
Events:

@kimi Which version of TiDB Operator are you using?

Please also provide the following information (one way to collect it is sketched at the end of this reply):

  • The values.yaml used to create the cluster
  • Output of kubectl get pv
  • Output of kubectl get pvc
  • Logs of both containers in the tidb-scheduler Pod

You can also diagnose it by following this doc: https://pingcap.com/docs-cn/stable/tidb-in-kubernetes/troubleshoot/#pod-处于-pending-状态
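A sketch of one way to collect the requested information; the tidb-scheduler pod name is taken from the node2 output above, while the container names are assumptions based on a default tidb-operator deployment:

# Gather the requested diagnostics
kubectl get pv
kubectl get pvc -n tidb-cluster
# The tidb-scheduler Pod runs two containers; dump both logs
kubectl logs -n tidb-admin tidb-scheduler-86d9dbf948-9v94g -c tidb-scheduler
kubectl logs -n tidb-admin tidb-scheduler-86d9dbf948-9v94g -c kube-scheduler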





The TiDB Operator version is v1.0.3.

kube-schedule.rar (77.8 KB)

Hello, we are analyzing your issue and will get back to you as soon as possible. Thanks.

@kimi

Kubernetes v1.16 and later require some additional RBAC rules; for details of the rules that need to be added, see: https://github.com/pingcap/tidb-operator/issues/1281#issuecomment-561520818.

We have fixed this in v1.0.6: https://github.com/pingcap/tidb-operator/pull/1282. Upgrading to v1.0.6 resolves it (a sketch of the upgrade is below). I've updated the answer accordingly, @kimi.
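A sketch of the upgrade, assuming the operator was installed with Helm 2 under the release name tidb-operator (the release name and chart repo alias are assumptions):

# Refresh the chart repo and upgrade the operator release to v1.0.6
helm repo update
helm upgrade tidb-operator pingcap/tidb-operator --version=v1.0.6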

OK, I'm reinstalling with v1.0.6 now; it's in progress…

Switching to v1.0.6 fixed it.

Yes, this issue is fixed in v1.0.6: tidb-scheduler now binds directly to the system:volume-scheduler ClusterRole, which resolves it (an equivalent manual binding is sketched below).
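For reference, an equivalent manual binding would look roughly like this; the ServiceAccount namespace and name (tidb-admin/tidb-scheduler) are assumptions based on a default install, and the v1.0.6 chart does the equivalent for you:

# Hypothetical manual workaround: bind the scheduler's ServiceAccount to system:volume-scheduler
kubectl create clusterrolebinding tidb-scheduler-volume-scheduler \
  --clusterrole=system:volume-scheduler \
  --serviceaccount=tidb-admin:tidb-scheduler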

Thanks! :rose::rose::rose:

:+1:

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.