[FATAL] [server.rs:698] ["failed to start node: Other(\"[components/pd_client/src/util.rs:756]: version should compatible with version 7.5.0, got 5.0.1\")"]

【TiDB Environment】Production
【TiDB Version】TiDB deployed on Kubernetes with KubeSphere + TiDB Operator
【Reproduction Steps】Scaled out by adding TiKV pod nodes
【Problem: Symptoms and Impact】[FATAL] [server.rs:698] ["failed to start node: Other(\"[components/pd_client/src/util.rs:756]: version should compatible with version 7.5.0, got 5.0.1\")"]
【Resource Configuration】
【Attachments: Screenshots/Logs/Monitoring】

Could it be that the component versions aren't aligned?

Hi, where can I configure the TiKV version? I deployed with the Operator (v1.6.0-alpha.8) and everything is at the default configuration.


Try switching all of the images to the 7.5 versions.

TiDB version          Applicable TiDB Operator versions
dev                   dev
TiDB >= 7.1           1.5 (recommended), 1.4
6.5 <= TiDB < 7.1     1.5, 1.4 (recommended), 1.3
5.4 <= TiDB < 6.5     1.4, 1.3 (recommended)
5.1 <= TiDB < 5.4     1.4, 1.3 (recommended), 1.2
3.0 <= TiDB < 5.1     1.4, 1.3 (recommended), 1.2, 1.1
2.1 <= TiDB < 3.0     1.0 (end of maintenance)

OK 👌 Thanks for the reply 🙏 I'll give it a try.

Change the 5.x version to 7.x.

Hi, thanks for the reply 🙏 Is it safe to change image versions at will in a production environment? Could data be lost, or anything like that?

The YAML config file shows the TiKV, TiDB, and PD images all at v6.5.0, but after the pod starts, TiKV somehow ends up as 5.0.1.

Run kubectl describe pod and check whether the TiKV image really is 5.0.1.
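For example, with the pod name and namespace from this thread (adjust if yours differ):

  # show the image and image digest the tikv container is running
  kubectl describe pod tidb-cluster-tikv-4 -n tidb-cluster | grep -A 1 'Image'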

Name:             tidb-cluster-tikv-4
Namespace:        tidb-cluster
Priority:         0
Node:             node3/192.168.0.7
Start Time:       Tue, 23 Jan 2024 18:04:29 +0800
Labels:           app.kubernetes.io/component=tikv
                  app.kubernetes.io/instance=tidb-cluster
                  app.kubernetes.io/managed-by=tidb-operator
                  app.kubernetes.io/name=tidb-cluster
                  controller-revision-hash=tidb-cluster-tikv-5cc88d8bbf
                  statefulset.kubernetes.io/pod-name=tidb-cluster-tikv-4
                  tidb.pingcap.com/cluster-id=7310155728820563327
Annotations:      cni.projectcalico.org/containerID: 908b4d126bc04b62d689a52c3ede5b03fb4e46b076c851a86365b3c9ea505fd3
                  cni.projectcalico.org/podIP: 10.233.92.49/32
                  cni.projectcalico.org/podIPs: 10.233.92.49/32
                  prometheus.io/path: /metrics
                  prometheus.io/port: 20180
                  prometheus.io/scrape: true
Status:           Running
IP:               10.233.92.49
IPs:
  IP:             10.233.92.49
Controlled By:    StatefulSet/tidb-cluster-tikv
Containers:
  tikv:
    Container ID:   containerd://5533c31def1bedbeee0ff9862cdbc5e66a3ef1f59804a98ca21d9e309e9fd345
    Image:          pingcap/tikv
    Image ID:       docker.io/pingcap/tikv@sha256:2b0992519eb2cabdf22291a7066c0ab5cb93373825366c5b6cf97b273eb2cb53
    Port:           20160/TCP
    Host Port:      0/TCP
    Command:
      /bin/sh
      /usr/local/bin/tikv_start_script.sh
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    1
      Started:      Wed, 24 Jan 2024 10:59:13 +0800
      Finished:     Wed, 24 Jan 2024 10:59:13 +0800
    Ready:          False
    Restart Count:  202
    Environment:
      NAMESPACE:              tidb-cluster (v1:metadata.namespace)
      CLUSTER_NAME:           tidb-cluster
      HEADLESS_SERVICE_NAME:  tidb-cluster-tikv-peer
      CAPACITY:               0
      TZ:                     UTC
    Mounts:
      /etc/podinfo from annotations (ro)
      /etc/tikv from config (ro)
      /usr/local/bin from startup-script (ro)
      /var/lib/tikv from tikv (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-2hhv5 (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  tikv:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  tikv-tidb-cluster-tikv-4
    ReadOnly:   false
  annotations:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.annotations -> annotations
  config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-tikv-1c8d5543
    Optional:  false
  startup-script:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      tidb-cluster-tikv-1c8d5543
    Optional:  false
  kube-api-access-2hhv5:
    Type:                     Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:   3607
    ConfigMapName:            kube-root-ca.crt
    ConfigMapOptional:
    DownwardAPI:              true
QoS Class:        BestEffort
Node-Selectors:
Tolerations:      node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                  node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                     From     Message
  ----     ------   ----                    ----     -------
  Warning  BackOff  2m39s (x4655 over 16h)  kubelet  Back-off restarting failed container

It doesn't seem to show a concrete image version, only a sha256 digest.

It really is 5.0.1, the sha256 matches.
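For reference, one way to cross-check a digest against a tag locally (a sketch, assuming Docker is available wherever you run it):

  # pull the expected tag and print its repo digest, then compare it with the pod's imageID
  docker pull pingcap/tikv:v5.0.1
  docker inspect --format '{{index .RepoDigests 0}}' pingcap/tikv:v5.0.1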

In the TidbCluster spec, don't put a version number in the image paths; instead specify a single version in spec.version. Something like the example below (I stripped out everything else and kept only the images and the version), so all components end up on the same version. The example comes from: https://github.com/pingcap/tidb-operator/blob/master/examples/basic-random-password/tidb-cluster.yaml

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
spec:
  version: v7.1.1
  helper:
    image: alpine:3.16.0
  pd:
    baseImage: pingcap/pd
  tikv:
    baseImage: pingcap/tikv
  tidb:
    baseImage: pingcap/tidb

Great, thanks for the reply! I'll try it.

Hi, yesterday I followed that config and moved the version out to spec.version, but it still hasn't taken effect; it's still version 5.0.1 :joy:

That's a bit odd.

  1. Check the sts: kubectl get sts xxx-tikv -n xxx -oyaml and see whether the TiKV image is the correct version. If it is correct, go on and check the pod.
  2. Check the pod: kubectl get pod xxx-tikv-x -n xxx -oyaml and look at the image. If that is also correct, go look at the node.
  3. Log in to the node this pod is running on and run docker image list to see whether the TiKV tag was mislabeled; delete that image directly:
    docker rmi xxx
  4. Then recreate the pod (a consolidated sketch of these steps follows below).
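A consolidated sketch of those checks, using the names from this thread (cluster tidb-cluster, namespace tidb-cluster, pod tidb-cluster-tikv-4). Your describe output shows a containerd runtime, so on the node crictl is likely the tool rather than docker:

  # image recorded in the StatefulSet template
  kubectl get sts tidb-cluster-tikv -n tidb-cluster \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

  # image and digest the pod is actually running
  kubectl get pod tidb-cluster-tikv-4 -n tidb-cluster \
    -o jsonpath='{.status.containerStatuses[0].image}{"\n"}{.status.containerStatuses[0].imageID}{"\n"}'

  # on the affected node: list the cached tikv images, remove the stale one,
  # then delete the pod so it is recreated and the image is pulled again
  crictl images | grep tikv
  crictl rmi docker.io/pingcap/tikv:latest    # adjust to whatever reference crictl lists
  kubectl delete pod tidb-cluster-tikv-4 -n tidb-cluster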

I checked as you suggested; in kubectl get pod the image fields are:
image: docker.io/pingcap/tikv:latest
imageID: docker.io/pingcap/tikv@sha256:2b0992519eb2cabdf22291a7066c0ab5cb93373825366c5b6cf97b273eb2cb53

I compared this sha256 on Docker Hub and it is the one for version 5.0.1.
But the other TiKV pods look like this:
image: docker.io/pingcap/tikv:latest
imageID: docker.io/pingcap/tikv@sha256:d2adb67c75e9d25dda8c8c367c1db269e079dfd2f8427c8aff0ff44ec1c1be09

The TiKV pods on the other nodes run fine and don't have this problem; only these two nodes do.

I then deleted and recreated the problematic TiKV pods, but they come back with the same sha256 and the same version (5.0.1) :joy:

  1. latest is definitely wrong; the latest on Docker Hub isn't 5.0.1 either. Could the Docker registry on these two nodes be mirrored to a third-party address?
  2. Check whether the version in the sts is correct. If even that is wrong, then the Operator needs a look; a running image should not use a tag like latest. (Example commands for both checks are sketched below.)
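Two more sketches: what the TidbCluster spec currently says, and which Operator image is actually running. The tc short name for TidbCluster and the tidb-controller-manager Deployment in the tidb-admin namespace are assumptions; adjust to how the Operator was installed:

  # version and image fields the TidbCluster spec currently carries
  kubectl get tc tidb-cluster -n tidb-cluster \
    -o jsonpath='{.spec.version}{"\n"}{.spec.tikv.baseImage}{"\n"}{.spec.tikv.image}{"\n"}'

  # image of the operator that is reconciling the cluster
  kubectl get deploy tidb-controller-manager -n tidb-admin \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'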

I just checked the sts status, as shown in the screenshot below:

(screenshot of the StatefulSet status)

The image shown in the sts is indeed wrong as well.

My Operator version is v1.6.0-alpha.9, deployed from an application template in KubeSphere.


Thanks for the reply!

Here is the full content of my tidb-cluster custom resource (TidbCluster):
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >
      {"apiVersion":"pingcap.com/v1alpha1","kind":"TidbCluster","metadata":{"annotations":{"kubesphere.io/creator":"admin","meta.helm.sh/release-name":"tidb-cluster","meta.helm.sh/release-namespace":"tidb-cluster","pingcap.com/ha-topology-key":"kubernetes.io/hostname","pingcap.com/pd.tidb-cluster-pd.sha":"cfa0d77a","pingcap.com/tidb.tidb-cluster-tidb.sha":"866b9771","pingcap.com/tikv.tidb-cluster-tikv.sha":"1c8d5543"},"labels":{"app.kubernetes.io/component":"tidb-cluster","app.kubernetes.io/instance":"tidb-cluster","app.kubernetes.io/managed-by":"Helm","app.kubernetes.io/name":"tidb-cluster","app.kubesphere.io/instance":"tidb-cluster","helm.sh/chart":"tidb-cluster-v1.6.0-alpha.8"},"name":"tidb-cluster","namespace":"tidb-cluster"},"spec":{"discovery":{},"enablePVReclaim":false,"helper":{"image":"busybox:1.34.1"},"imagePullPolicy":"IfNotPresent","pd":{"affinity":{},"baseImage":"pingcap/pd","enableDashboardInternalProxy":true,"hostNetwork":false,"image":"pingcap/pd:v6.5.0","imagePullPolicy":"IfNotPresent","maxFailoverCount":3,"replicas":3,"requests":{"storage":"1Gi"},"startTimeout":30,"storageClassName":"nfs-client"},"pvReclaimPolicy":"Retain","schedulerName":"tidb-scheduler","services":[{"name":"pd","type":"ClusterIP"}],"tidb":{"affinity":{},"baseImage":"pingcap/tidb","binlogEnabled":false,"hostNetwork":false,"image":"pingcap/tidb:v6.5.0","imagePullPolicy":"IfNotPresent","maxFailoverCount":3,"replicas":2,"separateSlowLog":true,"slowLogTailer":{"image":"busybox:1.33.0","imagePullPolicy":"IfNotPresent","limits":{"cpu":"100m","memory":"50Mi"},"requests":{"cpu":"20m","memory":"5Mi"}},"tlsClient":{}},"tikv":{"affinity":{},"baseImage":"pingcap/tikv","hostNetwork":false,"image":"pingcap/tikv:v6.5.0","imagePullPolicy":"IfNotPresent","maxFailoverCount":3,"replicas":3,"requests":{"storage":"10Gi"},"scalePolicy":{"scaleInParallelism":1,"scaleOutParallelism":1},"storageClassName":"nfs-client"},"timezone":"UTC","tiproxy":{"baseImage":"pingcap/tiproxy","imagePullPolicy":"IfNotPresent","replicas":0,"requests":{"storage":"1Gi"},"storageClassName":"nfs-client","version":"v6.5.0"},"tlsCluster":{},"version":""}}
    kubesphere.io/creator: admin
    meta.helm.sh/release-name: tidb-cluster
    meta.helm.sh/release-namespace: tidb-cluster
    pingcap.com/ha-topology-key: kubernetes.io/hostname
    pingcap.com/pd.tidb-cluster-pd.sha: cfa0d77a
    pingcap.com/tidb.tidb-cluster-tidb.sha: 866b9771
    pingcap.com/tikv.tidb-cluster-tikv.sha: 1c8d5543
  labels:
    app.kubernetes.io/component: tidb-cluster
    app.kubernetes.io/instance: tidb-cluster
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: tidb-cluster
    app.kubesphere.io/instance: tidb-cluster
    helm.sh/chart: tidb-cluster-v1.6.0-alpha.8
  name: tidb-cluster
  namespace: tidb-cluster
spec:
  discovery: {}
  enablePVReclaim: false
  version: v7.5.0
  helper:
    image: 'busybox:1.34.1'
  imagePullPolicy: IfNotPresent
  pd:
    affinity: {}
    baseImage: pingcap/pd
    enableDashboardInternalProxy: true
    hostNetwork: false
    imagePullPolicy: IfNotPresent
    maxFailoverCount: 3
    replicas: 3
    requests:
      storage: 1Gi
    startTimeout: 30
    storageClassName: nfs-client
  pvReclaimPolicy: Retain
  schedulerName: tidb-scheduler
  services:
    - name: pd
      type: ClusterIP
  tidb:
    affinity: {}
    baseImage: pingcap/tidb
    binlogEnabled: false
    hostNetwork: false
    imagePullPolicy: IfNotPresent
    maxFailoverCount: 3
    replicas: 2
    separateSlowLog: true
    slowLogTailer:
      image: 'busybox:1.33.0'
      imagePullPolicy: IfNotPresent
      limits:
        cpu: 100m
        memory: 50Mi
      requests:
        cpu: 20m
        memory: 5Mi
    tlsClient: {}
  tikv:
    affinity: {}
    baseImage: pingcap/tikv
    image: 'pingcap/tikv'
    hostNetwork: false
    imagePullPolicy: IfNotPresent
    maxFailoverCount: 3
    replicas: 3
    requests:
      storage: 20Gi
    scalePolicy:
      scaleInParallelism: 1
      scaleOutParallelism: 1
    storageClassName: nfs-client
  timezone: Asia/Shanghai
  tiproxy:
    baseImage: pingcap/tiproxy
    imagePullPolicy: IfNotPresent
    replicas: 0
    requests:
      storage: 1Gi
    storageClassName: nfs-client
    version: v6.5.0
  tlsCluster: {}
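One hedged observation on this spec, a guess rather than a confirmed fix: spec.tikv still carries an explicit image: 'pingcap/tikv' with no tag, while the example above and the pd/tidb sections here only set baseImage. An untagged reference resolves to :latest, which matches the docker.io/pingcap/tikv:latest seen on the broken pods, and with imagePullPolicy: IfNotPresent a node that already has an old :latest cached (here 5.0.1) will keep using it. I'm not sure how this Operator version arbitrates between image and baseImage + spec.version, but removing the stray field matches the earlier advice and is easy to try:

  # sketch: drop the untagged spec.tikv.image so baseImage + spec.version (v7.5.0) can take effect
  kubectl patch tc tidb-cluster -n tidb-cluster --type=json \
    -p '[{"op":"remove","path":"/spec/tikv/image"}]'

  # then check what the StatefulSet template renders
  kubectl get sts tidb-cluster-tikv -n tidb-cluster \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'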