I hit a strange problem deploying TiDB on Kubernetes: after creating tidb-cluster.yaml, only PD started.


Checking the running pods shows only PD; neither TiKV nor TiDB is running.
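For reference, the pod state was checked with a plain listing along these lines (command only, output omitted):

kubectl get pods -n tidb-cluster -o wide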

The manifest is the basic example, which should be able to run in any Kubernetes cluster with storage support:

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v4.0.9
  timezone: UTC
  pvReclaimPolicy: Retain
  enableDynamicConfiguration: true
  configUpdateStrategy: RollingUpdate
  discovery: {}
  pd:
    baseImage: registry.cn-beijing.aliyuncs.com/tidb/pd
    replicas: 3
    # if storageClassName is not set, the default Storage Class of the Kubernetes cluster will be used
    storageClassName: local-storage
    requests:
      storage: "10Gi"
    config: {}
  tikv:
    baseImage: registry.cn-beijing.aliyuncs.com/tidb/tikv
    replicas: 3
    # if storageClassName is not set, the default Storage Class of the Kubernetes cluster will be used
    storageClassName: local-storage
    requests:
      storage: "10Gi"
    config:
      storage:
        # In basic examples, we set this to avoid using too much storage.
        reserve-space: "0MB"
      rocksdb:
        # In basic examples, we set this to avoid the following error in some Kubernetes clusters:
        # "the maximum number of open file descriptors is too small, got 1024, expect greater or equal to 82920"
        max-open-files: 256
      raftdb:
        max-open-files: 256
  tidb:
    baseImage: registry.cn-beijing.aliyuncs.com/tidb/tidb
    replicas: 2
    service:
      type: ClusterIP
    config: {}
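Note that missing TiKV/TiDB pods do not necessarily mean those specs are wrong: tidb-operator starts the components in order, creating the TiKV StatefulSet only after the PD cluster is up and healthy, and TiDB only after TiKV. So if PD never becomes ready, the TiKV and TiDB pods will never be created at all. Assuming the manifest above was applied to the tidb-cluster namespace, the rollout can be watched with:

kubectl get pods -n tidb-cluster -w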

Please run kubectl describe po -n {namespace} and check whether there are any related error messages.

[root@k8s-master ~]# kubectl describe po -n tidb-cluster
Name: basic-discovery-58c68bc54f-gvgsg
Namespace: tidb-cluster
Priority: 0
Node: k8s-node3/192.168.1.204
Start Time: Fri, 15 Jan 2021 14:15:56 +0800
Labels: app.kubernetes.io/component=discovery
app.kubernetes.io/instance=basic
app.kubernetes.io/managed-by=tidb-operator
app.kubernetes.io/name=tidb-cluster
pod-template-hash=58c68bc54f
Annotations: <none>
Status: Running
IP: 10.244.3.128
IPs:
IP: 10.244.3.128
Controlled By: ReplicaSet/basic-discovery-58c68bc54f
Containers:
discovery:
Container ID: docker://63df83633637bac984e3444a73b9258caeb6e0d98e5205ad09a3be6e1495b59b
Image: registry.cn-beijing.aliyuncs.com/tidb/tidb-operator:v1.1.9
Image ID: docker-pullable://pingcap/tidb-operator@sha256:8d3b536ec067ae250ef3095cf8cd229db4d9a033a924b2a9fe6bc33a1727d71d
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/tidb-discovery
State: Running
Started: Fri, 15 Jan 2021 14:15:58 +0800
Ready: True
Restart Count: 0
Environment:
MY_POD_NAMESPACE: tidb-cluster (v1:metadata.namespace)
TZ: UTC
TC_NAME: basic
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from basic-discovery-token-4c8ms (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
basic-discovery-token-4c8ms:
Type: Secret (a volume populated by a Secret)
SecretName: basic-discovery-token-4c8ms
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 15m default-scheduler Successfully assigned tidb-cluster/basic-discovery-58c68bc54f-gvgsg to k8s-node3
Normal Pulled 15m kubelet, k8s-node3 Container image "registry.cn-beijing.aliyuncs.com/tidb/tidb-operator:v1.1.9" already present on machine
Normal Created 15m kubelet, k8s-node3 Created container discovery
Normal Started 15m kubelet, k8s-node3 Started container discovery

Name: basic-monitor-566f9bf47f-klscg
Namespace: tidb-cluster
Priority: 0
Node: k8s-node1/192.168.1.203
Start Time: Fri, 15 Jan 2021 11:33:39 +0800
Labels: app.kubernetes.io/component=monitor
app.kubernetes.io/instance=basic
app.kubernetes.io/managed-by=tidb-operator
app.kubernetes.io/name=tidb-cluster
pod-template-hash=566f9bf47f
Annotations: <none>
Status: Running
IP: 10.244.2.101
IPs:
IP: 10.244.2.101
Controlled By: ReplicaSet/basic-monitor-566f9bf47f
Init Containers:
monitor-initializer:
Container ID: docker://395626e03fc75d9a2b3aa8fca5a8eb09d7b13e4789f5b025981a8c8f2524f5cc
Image: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-initializer:v4.0.9
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-initializer@sha256:73c8d3bfab30f7d9906a9da7647ea0146b966b05d216edec23dc2e241ec39673
Port: <none>
Host Port: <none>
Command:
/bin/sh
-c
mkdir -p /data/prometheus /data/grafana
chmod 777 /data/prometheus /data/grafana
/usr/bin/init.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 15 Jan 2021 11:33:40 +0800
Finished: Fri, 15 Jan 2021 11:33:41 +0800
Ready: True
Restart Count: 0
Environment:
TIDB_CLUSTER_NAME: basic
TIDB_ENABLE_BINLOG: false
PROM_CONFIG_PATH: /prometheus-rules
PROM_PERSISTENT_DIR: /data
TIDB_VERSION: registry.cn-beijing.aliyuncs.com/tidb/tidb:v4.0.9
GF_TIDB_PROMETHEUS_URL: http://127.0.0.1:9090
TIDB_CLUSTER_NAMESPACE: tidb-cluster
TZ: UTC
GF_PROVISIONING_PATH: /grafana-dashboard-definitions/tidb
GF_DATASOURCE_PATH: /etc/grafana/provisioning/datasources
Mounts:
/data from monitor-data (rw)
/etc/grafana/provisioning/datasources from datasource (rw)
/grafana-dashboard-definitions/tidb from grafana-dashboard (rw)
/prometheus-rules from prometheus-rules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from basic-monitor-token-rnd5c (ro)
Containers:
prometheus:
Container ID: docker://090827a5ca1aed8a533e6d43e3982beed69c80bf7276d5665613a73f692aea00
Image: prom/prometheus:v2.18.1
Image ID: docker-pullable://prom/prometheus@sha256:5880ec936055fad18ccee798d2a63f64ed85bd28e8e0af17c6923a090b686c3d
Port: 9090/TCP
Host Port: 0/TCP
Command:
/bin/prometheus
--web.enable-admin-api
--web.enable-lifecycle
--config.file=/etc/prometheus/prometheus.yml
--storage.tsdb.path=/data/prometheus
--storage.tsdb.retention=0d
State: Running
Started: Fri, 15 Jan 2021 11:33:42 +0800
Ready: True
Restart Count: 0
Environment:
TZ: UTC
Mounts:
/data from monitor-data (rw)
/etc/prometheus from prometheus-config (ro)
/prometheus-rules from prometheus-rules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from basic-monitor-token-rnd5c (ro)
reloader:
Container ID: docker://4df95ebe51548b0c3f458f6dc6b881f54c0a34c1cc4d3794b3ffee2272161c4e
Image: registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-reloader:v1.0.1
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/tidb/tidb-monitor-reloader@sha256:b562ddeaa1be9e64af226fd866e90ee63479a301d559000686827780d5c4c520
Port: 9089/TCP
Host Port: 0/TCP
Command:
/bin/reload
--root-store-path=/data
--sub-store-path=registry.cn-beijing.aliyuncs.com/tidb/tidb:v4.0.9
--watch-path=/prometheus-rules/rules
--prometheus-url=http://127.0.0.1:9090
State: Running
Started: Fri, 15 Jan 2021 11:33:42 +0800
Ready: True
Restart Count: 0
Environment:
TZ: UTC
Mounts:
/data from monitor-data (rw)
/prometheus-rules from prometheus-rules (rw)
/var/run/secrets/kubernetes.io/serviceaccount from basic-monitor-token-rnd5c (ro)
grafana:
Container ID: docker://b229f93dd099ebab0af28a168d899aaf65df6d7afe53a0411dcd59bb3ba548ee
Image: grafana/grafana:6.1.6
Image ID: docker-pullable://grafana/grafana@sha256:d66b41cf7e0586274ca3e15e03299e4cfde48019fd756bb97cc9db57da9b0c86
Port: 3000/TCP
Host Port: 0/TCP
State: Running
Started: Fri, 15 Jan 2021 11:33:42 +0800
Ready: True
Restart Count: 0
Environment:
GF_PATHS_DATA: /data/grafana
GF_SECURITY_ADMIN_PASSWORD: <set to the key 'password' in secret 'basic-monitor'> Optional: false
GF_SECURITY_ADMIN_USER: <set to the key 'username' in secret 'basic-monitor'> Optional: false
TZ: UTC
Mounts:
/data from monitor-data (rw)
/etc/grafana/provisioning/dashboards from dashboards-provisioning (rw)
/etc/grafana/provisioning/datasources from datasource (rw)
/grafana-dashboard-definitions/tidb from grafana-dashboard (rw)
/var/run/secrets/kubernetes.io/serviceaccount from basic-monitor-token-rnd5c (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
monitor-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
prometheus-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-monitor
Optional: false
datasource:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
dashboards-provisioning:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-monitor
Optional: false
grafana-dashboard:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
prometheus-rules:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit:
basic-monitor-token-rnd5c:
Type: Secret (a volume populated by a Secret)
SecretName: basic-monitor-token-rnd5c
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events: <none>

Name: basic-pd-0
Namespace: tidb-cluster
Priority: 0
Node: k8s-node2/192.168.1.202
Start Time: Fri, 15 Jan 2021 14:15:57 +0800
Labels: app.kubernetes.io/component=pd
app.kubernetes.io/instance=basic
app.kubernetes.io/managed-by=tidb-operator
app.kubernetes.io/name=tidb-cluster
controller-revision-hash=basic-pd-864fd94cb5
statefulset.kubernetes.io/pod-name=basic-pd-0
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 2379
prometheus.io/scrape: true
Status: Running
IP: 10.244.1.78
IPs:
IP: 10.244.1.78
Controlled By: StatefulSet/basic-pd
Containers:
pd:
Container ID: docker://ac501777db50174e9c406d98d696fe595ecebc1ef9f536ddd769e6231e20d634
Image: registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/tidb/pd@sha256:e13b2a438f7bf0cc7b287463d07e60ff58801eb4d5dbb1e862e3a0c302bf8437
Ports: 2380/TCP, 2379/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/sh
/usr/local/bin/pd_start_script.sh
State: Running
Started: Fri, 15 Jan 2021 14:15:58 +0800
Ready: True
Restart Count: 0
Environment:
NAMESPACE: tidb-cluster (v1:metadata.namespace)
PEER_SERVICE_NAME: basic-pd-peer
SERVICE_NAME: basic-pd
SET_NAME: basic-pd
TZ: UTC
Mounts:
/etc/pd from config (ro)
/etc/podinfo from annotations (ro)
/usr/local/bin from startup-script (ro)
/var/lib/pd from pd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j2mmv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
pd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pd-basic-pd-0
ReadOnly: false
annotations:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
startup-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
default-token-j2mmv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j2mmv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 15m default-scheduler Successfully assigned tidb-cluster/basic-pd-0 to k8s-node2
Normal Pulled 15m kubelet, k8s-node2 Container image "registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9" already present on machine
Normal Created 15m kubelet, k8s-node2 Created container pd
Normal Started 15m kubelet, k8s-node2 Started container pd

Name: basic-pd-1
Namespace: tidb-cluster
Priority: 0
Node: k8s-node3/192.168.1.204
Start Time: Fri, 15 Jan 2021 14:15:57 +0800
Labels: app.kubernetes.io/component=pd
app.kubernetes.io/instance=basic
app.kubernetes.io/managed-by=tidb-operator
app.kubernetes.io/name=tidb-cluster
controller-revision-hash=basic-pd-864fd94cb5
statefulset.kubernetes.io/pod-name=basic-pd-1
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 2379
prometheus.io/scrape: true
Status: Running
IP: 10.244.3.129
IPs:
IP: 10.244.3.129
Controlled By: StatefulSet/basic-pd
Containers:
pd:
Container ID: docker://43453bba53e82405b37f2c3e9369c9136cd97c587ba37a9abe02ae3f51b555c7
Image: registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/tidb/pd@sha256:e13b2a438f7bf0cc7b287463d07e60ff58801eb4d5dbb1e862e3a0c302bf8437
Ports: 2380/TCP, 2379/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/sh
/usr/local/bin/pd_start_script.sh
State: Running
Started: Fri, 15 Jan 2021 14:29:10 +0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 15 Jan 2021 14:25:33 +0800
Finished: Fri, 15 Jan 2021 14:28:28 +0800
Ready: True
Restart Count: 4
Environment:
NAMESPACE: tidb-cluster (v1:metadata.namespace)
PEER_SERVICE_NAME: basic-pd-peer
SERVICE_NAME: basic-pd
SET_NAME: basic-pd
TZ: UTC
Mounts:
/etc/pd from config (ro)
/etc/podinfo from annotations (ro)
/usr/local/bin from startup-script (ro)
/var/lib/pd from pd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j2mmv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
pd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pd-basic-pd-1
ReadOnly: false
annotations:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
startup-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
default-token-j2mmv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j2mmv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 15m default-scheduler Successfully assigned tidb-cluster/basic-pd-1 to k8s-node3
Warning BackOff 2m45s (x6 over 9m46s) kubelet, k8s-node3 Back-off restarting failed container
Normal Pulled 2m30s (x5 over 15m) kubelet, k8s-node3 Container image "registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9" already present on machine
Normal Created 2m30s (x5 over 15m) kubelet, k8s-node3 Created container pd
Normal Started 2m29s (x5 over 15m) kubelet, k8s-node3 Started container pd

Name: basic-pd-2
Namespace: tidb-cluster
Priority: 0
Node: k8s-node1/192.168.1.203
Start Time: Fri, 15 Jan 2021 14:15:57 +0800
Labels: app.kubernetes.io/component=pd
app.kubernetes.io/instance=basic
app.kubernetes.io/managed-by=tidb-operator
app.kubernetes.io/name=tidb-cluster
controller-revision-hash=basic-pd-864fd94cb5
statefulset.kubernetes.io/pod-name=basic-pd-2
Annotations: prometheus.io/path: /metrics
prometheus.io/port: 2379
prometheus.io/scrape: true
Status: Running
IP: 10.244.2.104
IPs:
IP: 10.244.2.104
Controlled By: StatefulSet/basic-pd
Containers:
pd:
Container ID: docker://1f4c908c162fe93cc50cd3aab361fc507305d3bf4f139d55990d7e45b200240c
Image: registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9
Image ID: docker-pullable://registry.cn-beijing.aliyuncs.com/tidb/pd@sha256:e13b2a438f7bf0cc7b287463d07e60ff58801eb4d5dbb1e862e3a0c302bf8437
Ports: 2380/TCP, 2379/TCP
Host Ports: 0/TCP, 0/TCP
Command:
/bin/sh
/usr/local/bin/pd_start_script.sh
State: Running
Started: Fri, 15 Jan 2021 14:29:10 +0800
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Fri, 15 Jan 2021 14:25:20 +0800
Finished: Fri, 15 Jan 2021 14:28:15 +0800
Ready: True
Restart Count: 4
Environment:
NAMESPACE: tidb-cluster (v1:metadata.namespace)
PEER_SERVICE_NAME: basic-pd-peer
SERVICE_NAME: basic-pd
SET_NAME: basic-pd
TZ: UTC
Mounts:
/etc/pd from config (ro)
/etc/podinfo from annotations (ro)
/usr/local/bin from startup-script (ro)
/var/lib/pd from pd (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-j2mmv (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
pd:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: pd-basic-pd-2
ReadOnly: false
annotations:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.annotations -> annotations
config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
startup-script:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: basic-pd-6130373
Optional: false
default-token-j2mmv:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-j2mmv
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message


Normal Scheduled 15m default-scheduler Successfully assigned tidb-cluster/basic-pd-2 to k8s-node1
Warning BackOff 2m44s (x7 over 9m49s) kubelet, k8s-node1 Back-off restarting failed container
Normal Pulled 2m29s (x5 over 15m) kubelet, k8s-node1 Container image "registry.cn-beijing.aliyuncs.com/tidb/pd:v4.0.9" already present on machine
Normal Created 2m29s (x5 over 15m) kubelet, k8s-node1 Created container pd
Normal Started 2m29s (x5 over 15m) kubelet, k8s-node1 Started container pd

No obvious problem shows up in these events. Please refer to the following document for further troubleshooting:
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/deploy-failures
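As a first step from that guide, it is worth pulling the logs of the crash-looping PD members, along these lines (using the pod names from the describe output above):

kubectl logs basic-pd-1 -n tidb-cluster --previous
kubectl logs basic-pd-2 -n tidb-cluster --previous

The --previous flag prints the output of the last terminated container, which is where the Exit Code 1 failures above will have logged their cause.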

The pod-to-pod network is broken. So I'd suggest first checking the Kubernetes network and verifying that the network components are working properly.
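A minimal sketch of that check, assuming a CNI such as Flannel or Calico and reusing the basic-pd-peer headless service name visible in the describe output (the busybox image tag is just an example):

kubectl get pods -n kube-system -o wide
kubectl run net-test -n tidb-cluster --rm -it --restart=Never --image=busybox:1.33 -- nslookup basic-pd-0.basic-pd-peer.tidb-cluster.svc

If cross-node lookups or connections fail, fix the network plugin first; PD members cannot form a quorum without working pod-to-pod networking.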

:+1:

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.