TiDB cluster deployed on K8s with TiDB Operator reports an UnknownHost error when PD is accessed externally via NodePort

【TiDB Environment】Test environment
【TiDB Version】5.4.0
【Problem】A TiDB cluster deployed on K8s with TiDB Operator exposes PD externally through a NodePort Service (mainly for debugging the flink tidb cdc connector), but the client fails with an UnknownHost error: java.net.UnknownHostException: tidb-cluster-pd-0.tidb-cluster-pd-peer.tidb-cluster.svc. For scenarios that need external access to PD (such as debugging with flink tidb cdc), how should PD be exposed?
【Steps to Reproduce】
【Symptoms and Impact】
【TiDB Operator Version】1.3.3
【K8s Version】1.20.11

Since you are exposing PD with a NodePort, you should indeed be able to reach it via IP:port. Judging from the error, what fails is the resolution of tidb-cluster-pd-0.tidb-cluster-pd-peer.tidb-cluster.svc. That hostname is internal to K8s, so only workloads running inside the same K8s cluster can use it to reach the corresponding service.
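One quick way to see this is to resolve that name from inside the cluster and compare with the machine outside. A minimal sketch of a throwaway debug Pod (the Pod name, the image and the default cluster domain cluster.local are my assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: pd-dns-check              # illustrative name
  namespace: tidb-cluster
spec:
  restartPolicy: Never
  containers:
  - name: dns-check
    image: busybox:1.36           # any image that ships nslookup works
    command:
    - nslookup
    # fully qualified form of the failing hostname, assuming the default
    # cluster domain "cluster.local"
    - tidb-cluster-pd-0.tidb-cluster-pd-peer.tidb-cluster.svc.cluster.local

Run inside the cluster, this lookup returns the Pod IP of tidb-cluster-pd-0; the same lookup from the machine running Flink fails, which is exactly the UnknownHostException above.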

The program does connect with IP:Port, but when it errors out it ends up resolving this internal DNS record. Could it be that this is simply how the flink tidb cdc connector behaves?
Related log output:

19:54:23,967 INFO  org.tikv.common.PDClient                                     [] - init host mapping: start
19:54:23,968 INFO  org.tikv.common.PDClient                                     [] - init host mapping: end
19:54:23,968 INFO  org.tikv.common.PDClient                                     [] - get members with pd http://IP:PORT: start
19:54:24,578 INFO  org.tikv.common.PDClient                                     [] - get members with pd http://IP:PORT: end
19:54:24,583 INFO  org.tikv.common.PDClient                                     [] - init cluster with address: [http://tidb-cluster-pd-2.tidb-cluster-pd-peer.tidb-cluster.svc:2379, http://tidb-cluster-pd-1.tidb-cluster-pd-peer.tidb-cluster.svc:2379, http://tidb-cluster-pd-0.tidb-cluster-pd-peer.tidb-cluster.svc:2379]
19:54:24,584 INFO  org.tikv.common.PDClient                                     [] - createLeaderClientWrapper with leader tidb-cluster-pd-1.tidb-cluster-pd-peer.tidb-cluster.svc:2379: start
19:54:24,586 INFO  org.tikv.common.PDClient                                     [] - Switched to new leader: [leaderInfo: tidb-cluster-pd-1.tidb-cluster-pd-peer.tidb-cluster.svc:2379, storeAddress: tidb-cluster-pd-1.tidb-cluster-pd-peer.tidb-cluster.svc:2379]
19:54:24,586 INFO  org.tikv.common.PDClient                                     [] - createLeaderClientWrapper with leader tidb-cluster-pd-1.tidb-cluster-pd-peer.tidb-cluster.svc:2379: end
19:54:24,587 INFO  org.tikv.common.PDClient                                     [] - init cluster: finish
19:54:24,587 INFO  org.tikv.common.TiSession                                    [] - enable grpc forward for high available
19:54:24,590 INFO  org.tikv.common.TiSession                                    [] - TiSession initialized in TXN mode

The NodePort Service used for the exposure is as follows:

apiVersion: v1
kind: Service
metadata:
  annotations:
    field.cattle.io/creatorId: user-xn7fw
    field.cattle.io/ipAddresses: "null"
    field.cattle.io/publicEndpoints: '[{"addresses":["IP"],"port":30537,"protocol":"TCP","serviceName":"tidb-cluster:tidb-cluster-pd-svc","allNodes":true}]'
    field.cattle.io/targetDnsRecordIds: "null"
    field.cattle.io/targetWorkloadIds: '["statefulset:tidb-cluster:tidb-cluster-pd"]'
  creationTimestamp: "2022-05-31T11:44:14Z"
  labels:
    app.kubernetes.io/component: pd
    app.kubernetes.io/instance: tidb-cluster
    app.kubernetes.io/managed-by: tidb-operator
    app.kubernetes.io/name: tidb-cluster
    cattle.io/creator: norman
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:spec:
        f:selector:
          .: {}
          f:workloadID_tidb-cluster-pd-svc: {}
    manager: agent
    operation: Update
    time: "2022-05-31T11:44:20Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .: {}
          f:field.cattle.io/creatorId: {}
          f:field.cattle.io/ipAddresses: {}
          f:field.cattle.io/publicEndpoints: {}
          f:field.cattle.io/targetDnsRecordIds: {}
          f:field.cattle.io/targetWorkloadIds: {}
        f:labels:
          .: {}
          f:app.kubernetes.io/component: {}
          f:app.kubernetes.io/instance: {}
          f:app.kubernetes.io/managed-by: {}
          f:app.kubernetes.io/name: {}
          f:cattle.io/creator: {}
      f:spec:
        f:externalTrafficPolicy: {}
        f:ports:
          .: {}
          k:{"port":2379,"protocol":"TCP"}:
            .: {}
            f:name: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: rancher
    operation: Update
    time: "2022-05-31T11:54:03Z"
  name: tidb-cluster-pd-svc
  namespace: tidb-cluster
  resourceVersion: "7567662"
  uid: a4adcf8b-c5a5-4935-adea-3a8900e3da81
spec:
  clusterIP: 10.43.200.64
  clusterIPs:
  - 10.43.200.64
  externalTrafficPolicy: Cluster
  ports:
  - name: myclient
    nodePort: 30537
    port: 2379
    protocol: TCP
    targetPort: 2379
  selector:
    workloadID_tidb-cluster-pd-svc: "true"
  sessionAffinity: None
  type: NodePort
status:
  loadBalancer: {}

Are you using Rancher?
Your NodePort fronts the PD StatefulSet, and there are three PD members (pods) behind it, right? So the program first uses ip:port to fetch the basic information of all three PD members, and is then redirected to the leader, at which point it switches to the K8s-internal hostname. Does that make sense?

Yes, it is deployed with Rancher, and the Service points to the tidb-cluster-pd workload. Could you tell me the correct way to expose PD via NodePort for external access? :rofl:

The official docs used to state that for a TiDB cluster deployed by TiDB Operator, accessing it with TiSpark requires Spark to run in the same K8s cluster as TiDB. I suspect the flink tidb cdc connector has the same requirement, i.e. the Flink cluster must be in the same K8s cluster as TiDB (judging from the log, it always ends up on the K8s-internal DNS records). If that is the case, no NodePort setup will ever work.

They are not in the same cluster? No wonder it fails. I am not very familiar with all the configuration options of TiDB on K8s, but as far as I can tell, the only way to avoid this problem for now is to deploy the two in the same cluster.
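For reference, a minimal sketch of what that could look like: a standalone Flink session cluster deployed into the same K8s cluster (same namespace as TiDB here), so that the tidb-cluster-pd-peer hostnames resolve through cluster DNS. All names, the namespace, the image tag and the settings below are illustrative assumptions, not taken from the environment above:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-jobmanager          # illustrative
  namespace: tidb-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: jobmanager
  template:
    metadata:
      labels:
        app: flink
        component: jobmanager
    spec:
      containers:
      - name: jobmanager
        image: flink:1.14.4-scala_2.11   # illustrative tag
        args: ["jobmanager"]
        env:
        - name: FLINK_PROPERTIES         # appended to flink-conf.yaml by the image
          value: |
            jobmanager.rpc.address: flink-jobmanager
            blob.server.port: 6124
        ports:
        - containerPort: 6123            # RPC
        - containerPort: 6124            # blob server
        - containerPort: 8081            # REST / Web UI
---
apiVersion: v1
kind: Service
metadata:
  name: flink-jobmanager
  namespace: tidb-cluster
spec:
  selector:
    app: flink
    component: jobmanager
  ports:
  - name: rpc
    port: 6123
  - name: blob-server
    port: 6124
  - name: rest
    port: 8081
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: flink-taskmanager
  namespace: tidb-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: flink
      component: taskmanager
  template:
    metadata:
      labels:
        app: flink
        component: taskmanager
    spec:
      containers:
      - name: taskmanager
        image: flink:1.14.4-scala_2.11
        args: ["taskmanager"]
        env:
        - name: FLINK_PROPERTIES
          value: |
            jobmanager.rpc.address: flink-jobmanager
            blob.server.port: 6124

A job submitted to this session cluster (for example through the JobManager REST port 8081) can then point the flink tidb cdc connector at the in-cluster PD service (tidb-cluster-pd:2379 in this namespace) and follow the leader redirect to the internal hostnames without any NodePort.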

Probably, yes. It just makes local development and debugging more cumbersome. Thanks a lot.
