tidb pod not created during k8s deployment

Following up on the previous thread "k8s部署tidb pvc不能绑定 - #4" (k8s deploying tidb: PVC cannot bind), a question from h5n1:
After applying the cluster configuration file, the pd and tikv pods are Running, but no tidb pod has been created. How should this be analyzed?
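A quick first triage is to list what the operator has actually created so far (a minimal sketch; the label value assumes the cluster name from the manifest below):

kubectl -n default get statefulsets,pods -l app.kubernetes.io/instance=tidb-test-cluster

If the pd and tikv StatefulSets exist but no tidb StatefulSet does, the operator is deliberately holding it back. The TidbCluster manifest: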

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: tidb-test-cluster
  namespace: default
spec:
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  imagePullPolicy: Always
  helper:
    image: alpine:3.16.0
  pvReclaimPolicy: Retain
  discovery: {}
  enableDynamicConfiguration: true
  pd:
    baseImage: 10.172.49.246/zongbu-sre/pd-arm64:v6.1.2
    config: |
      [dashboard]
        internal-proxy = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 12Gi
      storage: 100Gi
    limits:
      cpu: 2000m
      memory: 12Gi
      storage: 100Gi
    storageClassName: "pd-ssd-storage"
  tidb:
    baseImage: 10.172.49.246/zongbu-sre/tidb-arm64:v6.1.2
    config: |
      [performance]
        tcp-keep-alive = true
    replicas: 3
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 16Gi
      storage: 100Gi
    limits:
      cpu: 2000m
      memory: 16Gi
      storage: 100Gi
    service:
      type: "ClusterIP"
    storageClassName: "tidb-storage"
  tikv:
    baseImage: 10.172.49.246/zongbu-sre/tikv-arm64:v6.1.2
    config: |
      log-level = "info"
    replicas: 6
    maxFailoverCount: 0
    requests:
      cpu: 2000m
      memory: 32Gi
      storage: 500Gi
    limits:
      cpu: 2000m
      memory: 32Gi
      storage: 500Gi
    storageClassName: "tikv-ssd-storage" 
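The operator logs explain why: it is still waiting on TiKV. They can be pulled from the controller manager (a sketch; the tidb-admin namespace and deployment name assume a default tidb-operator install):

kubectl -n tidb-admin logs deploy/tidb-controller-manager | grep tidb-test-cluster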

I0110 13:49:42.667328       1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing
I0110 13:49:49.794007       1 tikv_member_manager.go:834] TiKV of Cluster default/tidb-test-cluster not bootstrapped yet
I0110 13:49:49.799414       1 tikv_member_manager.go:938] TiKV of Cluster default/tidb-test-cluster is not bootstrapped yet, no need to set store labels
I0110 13:49:49.800033       1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing
I0110 13:50:19.811670       1 tikv_member_manager.go:834] TiKV of Cluster default/tidb-test-cluster not bootstrapped yet
I0110 13:50:19.818649       1 tikv_member_manager.go:938] TiKV of Cluster default/tidb-test-cluster is not bootstrapped yet, no need to set store labels
I0110 13:50:19.819283       1 tidb_cluster_controller.go:131] TidbCluster: default/tidb-test-cluster, still need sync: TidbCluster: [default/tidb-test-cluster], waiting for TiKV cluster running, requeuing

Checking the operator logs turns up the error, and the official site also provides a troubleshooting guide. tidb-operator starts the components in order (PD → TiKV → TiDB) and does not create the TiDB StatefulSet until the TiKV cluster is running; the "waiting for TiKV cluster running, requeuing" lines above show exactly that, so the missing tidb pods are a symptom of TiKV failing to bootstrap.

Reference:
https://docs.pingcap.com/zh/tidb-in-kubernetes/stable/deploy-failures

kubectl describe tidbclusters -n ${namespace} ${cluster_name}
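Besides describe, the namespace events often surface scheduling, image-pull, or probe failures directly (a sketch using the names from this thread):

kubectl -n default get events --sort-by=.lastTimestamp | grep -i -e tikv -e tidb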

It looks like TiKV cannot reach PD.
TiKV pod log:

[2023/01/11 01:16:20.611 +00:00] [INFO] [util.rs:587] ["connecting to PD endpoint"] [endpoints=http://tidb-test-cluster-pd:2379]
[2023/01/11 01:16:22.611 +00:00] [INFO] [util.rs:549] ["PD failed to respond"] [err="Grpc(RpcFailure(RpcStatus { code: 4-DEADLINE_EXCEEDED, message: \"Deadline Exceeded\", details: [] }))"] [endpoints=http://tidb-test-cluster-pd:2379]
[2023/01/11 01:16:22.611 +00:00] [WARN] [client.rs:163] ["validate PD endpoints failed"] [err="Other(\"[components/pd_client/src/util.rs:582]: PD cluster failed to respond\")"]
[2023/01/11 01:16:22.912 +00:00] [INFO] [util.rs:587] ["connecting to PD endpoint"] [endpoints=http://tidb-test-cluster-pd:2379]
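To separate a DNS failure from a connectivity failure, probe the PD Service from a throwaway pod (a sketch; busybox:1.36 is an arbitrary debug image, and in this offline-registry setup you would substitute a locally mirrored one):

kubectl run net-debug --rm -it --restart=Never --image=busybox:1.36 -- sh -c 'nslookup tidb-test-cluster-pd.default.svc && wget -qO- http://tidb-test-cluster-pd.default.svc:2379/pd/api/v1/health'

If the name resolves but the HTTP call hangs, the problem sits below DNS, in the Service/kube-proxy/CNI path.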

./pd-ctl store

Failed to get store: [500] "[PD:cluster:ErrNotBootstrapped]TiKV cluster not bootstrapped, please start TiKV first"
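pd-ctl reporting "not bootstrapped" means PD itself is up and answering; a PD cluster only counts as bootstrapped once the first TiKV store registers. PD health can be double-checked from inside a PD pod (a sketch; assumes the mirrored image matches the official pd image, which ships pd-ctl at /pd-ctl):

kubectl -n default exec tidb-test-cluster-pd-0 -- /pd-ctl -u http://127.0.0.1:2379 health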

ps -ef|grep tikv

root           1       0  0 01:13 ?        00:00:00 /tikv-server --pd=http://tidb-test-cluster-pd:2379 --advertise-addr=tidb-test-cluster-tikv-5.tidb-test-cluster-tikv-peer.default.svc:20160 --addr=0.0.0.0:20160 --status-addr=0.0.0.0:20180 --advertise-status-addr=tidb-test-cluster-tikv-5.tidb-test-cluster-tikv-peer.default.svc:20180 --data-dir=/var/lib/tikv --capacity=500GB --config=/etc/tikv/tikv.toml
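The process is clearly pointed at the tidb-test-cluster-pd Service, so the same path can be tested from this pod's own network namespace (a sketch; kubectl debug needs a cluster with ephemeral-container support, and busybox:1.36 is again an arbitrary debug image):

kubectl -n default debug tidb-test-cluster-tikv-5 -it --image=busybox:1.36 -- wget -qO- http://tidb-test-cluster-pd:2379/pd/api/v1/version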

kubectl get svc -A

NAMESPACE     NAME                          TYPE        CLUSTER-IP        EXTERNAL-IP   PORT(S)                  AGE
default       kubernetes                    ClusterIP   192.168.0.1       <none>        443/TCP                  2d17h
default       tidb-test-cluster-discovery   ClusterIP   192.168.104.90    <none>        10261/TCP,10262/TCP      11h
default       tidb-test-cluster-pd          ClusterIP   192.168.153.231   <none>        2379/TCP                 11h
default       tidb-test-cluster-pd-peer     ClusterIP   None              <none>        2380/TCP,2379/TCP        11h
default       tidb-test-cluster-tikv-peer   ClusterIP   None              <none>        20160/TCP                11h
kube-system   kube-dns                      ClusterIP   192.168.0.222     <none>        53/UDP,53/TCP,9153/TCP   5d13h
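The PD Service exists; whether anything is actually behind it shows up in its endpoints (a sketch):

kubectl -n default get endpoints tidb-test-cluster-pd tidb-test-cluster-pd-peer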

kubectl describe svc tidb-test-cluster-pd

Name:              tidb-test-cluster-pd
Namespace:         default
Labels:            app.kubernetes.io/component=pd
                   app.kubernetes.io/instance=tidb-test-cluster
                   app.kubernetes.io/managed-by=tidb-operator
                   app.kubernetes.io/name=tidb-cluster
                   app.kubernetes.io/used-by=end-user
Annotations:       pingcap.com/last-applied-configuration:
                     {"ports":[{"name":"client","protocol":"TCP","port":2379,"targetPort":2379}],"selector":{"app.kubernetes.io/component":"pd","app.kubernetes...
Selector:          app.kubernetes.io/component=pd,app.kubernetes.io/instance=tidb-test-cluster,app.kubernetes.io/managed-by=tidb-operator,app.kubernetes.io/name=tidb-cluster
Type:              ClusterIP
IP Family Policy:  SingleStack
IP Families:       IPv4
IP:                192.168.153.231
IPs:               192.168.153.231
Port:              client  2379/TCP
TargetPort:        2379/TCP
Endpoints:         172.16.112.142:2379,172.16.228.152:2379,172.16.252.53:2379
Session Affinity:  None
Events:            <none>
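The Service looks healthy: the selector matches and three PD endpoints are registered, so DNS and the endpoint controller are fine. That narrows the break to the data path, i.e. kube-proxy rules or the CNI. One way to confirm is to bypass the ClusterIP and hit one of the PD pod IPs from the output above directly (a sketch):

kubectl run net-debug2 --rm -it --restart=Never --image=busybox:1.36 -- wget -qO- http://172.16.112.142:2379/pd/api/v1/health

If the pod IP answers while the ClusterIP times out, the kube-proxy/iptables layer is the suspect.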

Have a look at the Kubernetes networking configuration and troubleshooting material; that topic is plenty complex on its own… :rofl:

Solved, though it took drastic measures: flushed iptables, rebooted the hosts, deleted the old PVs and the cluster, and redeployed from scratch, after which everything came up. It was most likely a network problem after all.
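For anyone hitting the same symptom, a less drastic first step than wiping iptables and rebooting is usually to let kube-proxy rebuild its rules (a sketch; assumes kube-proxy runs as a DaemonSet in kube-system, as in a kubeadm install):

kubectl -n kube-system rollout restart daemonset kube-proxy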

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.