Error when enabling TiDB scheduled backup

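luyuandeng · October 28, 2019, 03:28

To help us respond efficiently, please provide as much background as possible when asking a question; clearly described questions get priority. Please try to include the following:

[OS version & kernel version] CentOS 7.5
[TiDB version] 3.0.1
[Disk model]
[Cluster node distribution]
[Data volume & region count & replica count]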

kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password=dexx1234 -n tidb-admin

## The scheduledBackup section of values.yaml was modified as follows:

scheduledBackup:
  create: true
  # https://github.com/pingcap/tidb-cloud-backup
  mydumperImage: pingcap/tidb-cloud-backup:20190610
  mydumperImagePullPolicy: IfNotPresent
  # storageClassName is a StorageClass provides a way for administrators to describe the "classes" of storage they offer.
  # different classes might map to quality-of-service levels, or to backup policies,
  # or to arbitrary policies determined by the cluster administrators.
  # refer to https://kubernetes.io/docs/concepts/storage/storage-classes
  storageClassName: local-storage
  storage: 100Gi
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#schedule
  schedule: "0 0 * * *"
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#suspend
  suspend: false
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#jobs-history-limits
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  # https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#starting-deadline
  startingDeadlineSeconds: 3600
  # https://github.com/maxbube/mydumper/blob/master/docs/mydumper_usage.rst#options
  options: "--verbose=3"
  # secretName is the name of the secret which stores user and password used for backup
  # Note: you must give the user enough privilege to do the backup
  # you can create the secret by:
  # kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password=<password>
  secretName: backup-secret
  # backup to gcp
  gcp: {}
  # bucket: ""
  # secretName is the name of the secret which stores the gcp service account credentials json file
  # The service account must have read/write permission to the above bucket.
  # Read the following document to create the service account and download the credentials file as credentials.json:
  # https://cloud.google.com/docs/authentication/production#obtaining_and_providing_service_account_credentials_manually
  # And then create the secret by: kubectl create secret generic gcp-backup-secret --from-file=./credentials.json
  # secretName: gcp-backup-secret

  # backup to ceph object storage
  ceph: {}
  # endpoint: ""
  # bucket: ""
  # secretName is the name of the secret which stores ceph object store access key and secret key
  # You can create the secret by:
  # kubectl create secret generic ceph-backup-secret --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
  # secretName: ceph-backup-secret

  # backup to s3
  s3: {}
  # region: ""
  # bucket: ""
  # secretName is the name of the secret which stores s3 object store access key and secret key
  # You can create the secret by:
  # kubectl create secret generic s3-backup-secret --from-literal=access_key=<access-key> --from-literal=secret_key=<secret-key>
  # secretName: s3-backup-secret

metaInstance: "{{ $labels.instance }}"
metaType: "{{ $labels.type }}"
metaValue: "{{ $value }}"

## Command used to enable scheduled backup, following the official documentation:

helm upgrade tidb-cluster /home/grg/k8s-kubeadm-v1.14.1/denali-tidb/tidb-cluster/tidb-cluster-v1.0.0.tgz -f /home/grg/k8s-kubeadm-v1.14.1/denali-tidb/tidb-cluster/values-denali-tidb.yaml --version=1.0.0 --namespace=tidb-admin

Error:

onlymellb-PingCAP · October 28, 2019, 03:46

Did you create the backup-secret secret? It looks like the error is caused by it not having been created.

luyuandeng · October 28, 2019, 03:57

Of course it has been created.

The command kubectl create secret generic backup-secret --from-literal=user=root --from-literal=password=dexx1234 -n tidb-admin has already been run.

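onlymellb-PingCAP · October 28, 2019, 04:13

I misread. The problem is with your upgrade. Are you upgrading with the same tidb-cluster chart that was used to create the cluster? You changed some values in values.yaml when you first deployed tidb-cluster. If you had only turned on the backup schedule in that original values.yaml, the upgrade would create just the scheduled backup job. The error here, however, comes from tidb-cluster-tidb-initializer, the job that initializes the TiDB cluster password, which means the configuration of that job was changed as well. Please check that.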
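
One way to check which values the deployed release was actually created with is to ask Helm for them. The following is a minimal sketch, assuming the release is named tidb-cluster as in the upgrade command above. The helper name dump_release_values and the output path are hypothetical. With Helm 3, the namespace flag would have to be passed as an extra argument.

```shell
# Hypothetical helper: write the user-supplied values of a deployed release to a file.
# $1 = release name, $2 = output file, remaining arguments are passed to helm unchanged.
dump_release_values() {
  release=$1
  out=$2
  shift 2
  helm get values "$release" "$@" > "$out"
}

# Usage (commented out so that sourcing this file has no side effects):
# dump_release_values tidb-cluster /tmp/deployed-values.yaml
# grep -n 'passwordSecretName' /tmp/deployed-values.yaml
```

Comparing the dumped file with the values file passed to helm upgrade shows whether anything besides scheduledBackup differs.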

luyuandeng · October 28, 2019, 05:32

Yes. I am upgrading the existing cluster to enable scheduled backup.

[root@master1 tidb-cluster]# kubectl get job -A
NAMESPACE    NAME                            COMPLETIONS   DURATION   AGE
tidb-admin   tidb-cluster-tidb-initializer   1/1           4m         3h29m

The job you mentioned does exist. Do I need to delete this job before upgrading, or do something else? What exactly should I check?

luyuandeng · October 28, 2019, 05:37

[root@master1 tidb-cluster]# helm upgrade tidb-cluster /home/grg/k8s-kubeadm-v1.14.1/denali-tidb/tidb-cluster/tidb-cluster-v1.0.0.tgz   -f  /home/grg/k8s-kubeadm-v1.14.1/denali-tidb/tidb-cluster/values-denali-tidb.yaml  --version=1.0.0 --namespace=tidb-admin
UPGRADE FAILED
Error: failed to create resource: Job.batch "tidb-cluster-tidb-initializer" is invalid: [spec.template.spec.volumes[0].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[0].name: Not found: "password"]
Error: UPGRADE FAILED: failed to create resource: Job.batch "tidb-cluster-tidb-initializer" is invalid: [spec.template.spec.volumes[0].secret.secretName: Required value, spec.template.spec.containers[0].volumeMounts[0].name: Not found: "password"]
[root@master1 tidb-cluster]# 
[root@master1 tidb-cluster]#

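As shown above, I deleted that job and ran the upgrade again, and then got the error above. What causes it, and how can I fix it?

luyuandeng · October 28, 2019, 05:39

The previous post was a reply to you!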

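onlymellb-PingCAP · October 28, 2019, 05:47

What I mean is this: suppose you edited the tidb-cluster values.yaml and deployed TiDB cluster A with it. To enable scheduled backup on that cluster, you would normally just turn on scheduledBackup in that same values.yaml and upgrade. What I see instead is that you changed scheduledBackup, yet the helm upgrade also tries to update the tidb-cluster-tidb-initializer job. That should only happen if the values.yaml you are editing now is not the one you used to deploy cluster A. The tidb-cluster-tidb-initializer job is failing because it cannot find the secret it needs. That secret is named by the passwordSecretName field in the tidb section of values.yaml, that is, .Values.tidb.passwordSecretName. If you can no longer find the old values.yaml, then create a new secret for the tidb-cluster-tidb-initializer job to fix it.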
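
Whether the secret named by .Values.tidb.passwordSecretName exists can be checked before upgrading. The following is a sketch, assuming the secret is called tidb-secret (a placeholder; use whatever name your values.yaml sets) and lives in the tidb-admin namespace. The helper names are hypothetical.

```shell
# Hypothetical helper: succeed if the named secret exists in the namespace.
# $1 = secret name, $2 = namespace
secret_exists() {
  kubectl get secret "$1" -n "$2" >/dev/null 2>&1
}

# Hypothetical helper: print a one-line report and return non-zero if the secret is missing.
report_secret() {
  if secret_exists "$1" "$2"; then
    echo "secret $1 found in namespace $2"
  else
    echo "secret $1 missing in namespace $2"
    return 1
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# report_secret tidb-secret tidb-admin
```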

luyuandeng · October 28, 2019, 05:51

OK, thanks. Now I understand what you meant!

luyuandeng · October 28, 2019, 06:06

[root@master1 tidb-cluster]# kubectl logs tidb-cluster-tidb-initializer-qnrd8 -n tidb-admin
Traceback (most recent call last):
  File "<string>", line 5, in <module>
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/__init__.py", line 84, in Connect
    return Connection(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/MySQLdb/connections.py", line 164, in __init__
    super(Connection, self).__init__(*args, **kwargs2)
MySQLdb._exceptions.OperationalError: (1045, "Access denied for user 'root'@'10.244.5.94' (using password: NO)")

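The helm upgrade command now runs without errors, but afterwards tidb-cluster-tidb-initializer fails on startup with the error above. What could be causing this?

luyuandeng · October 28, 2019, 06:21

I added --set tidb.passwordSecretName=tidb-secret to the earlier upgrade command, and then this exception appeared.

onlymellb-PingCAP · October 28, 2019, 06:22

The error is pretty clear: the password is wrong, so it cannot access TiDB.

luyuandeng · October 28, 2019, 06:23

But which configured password does this refer to?

onlymellb-PingCAP · October 28, 2019, 06:29

This initialization job cannot be run a second time. It can only run right after the cluster is first created, while there is no password yet. Comment out the fields .Values.tidb.passwordSecretName, .Values.tidb.permitHost, .Values.tidb.initSql, and .Values.tidb.initSqlConfigMapName in values.yaml; if any of them are not present, ignore them.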
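
To see which of these fields are still active in a values file, a plain text search is often enough. The following is a rough sketch with a hypothetical helper name. It matches uncommented lines by key name only and does not parse YAML, so it can also report same-named keys outside the tidb section.

```shell
# Hypothetical helper: list uncommented initializer-related keys with their line numbers.
# $1 = path to the values file. Returns non-zero when none of the keys is set.
list_initializer_keys() {
  grep -nE '^[[:space:]]*(passwordSecretName|permitHost|initSql|initSqlConfigMapName):' "$1"
}

# Usage (commented out so that sourcing this file has no side effects):
# list_initializer_keys /home/grg/k8s-kubeadm-v1.14.1/denali-tidb/tidb-cluster/values-denali-tidb.yaml
```

Any line it prints is a candidate for commenting out before running helm upgrade again.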

luyuandeng · October 28, 2019, 06:50

I made the change and ran the upgrade.
Nothing seems to have happened.
What could be going on now?

onlymellb-PingCAP · October 28, 2019, 07:00

Was the scheduledBackup cronjob not created? Run kubectl get cronjob -n tidb-admin and take a look.

luyuandeng · October 28, 2019, 07:08

denali       mysql-pvc                              Bound     mysql-pv                               8Gi        RWX            nfs                       77d
denali       zk-data-denali-zookeeper-0             Bound     zookeeper-local-pv-0                   10Gi       RWO            local-storage-zookeeper   10d
denali       zk-data-denali-zookeeper-1             Bound     zookeeper-local-pv-1                   10Gi       RWO            local-storage-zookeeper   10d
denali       zk-data-denali-zookeeper-2             Pending                                                                    local-storage-zookeeper   10d
tidb-admin   pd-tidb-cluster-pd-0                   Bound     local-pv-bebb50a                       7381Gi     RWO            tipd-storage              4d3h
tidb-admin   pd-tidb-cluster-pd-1                   Bound     local-pv-ca10fc34                      7381Gi     RWO            tipd-storage              4d3h
tidb-admin   pd-tidb-cluster-pd-2                   Bound     local-pv-3b1bb339                      7381Gi     RWO            tipd-storage              4d3h
**tidb-admin   tidb-cluster-scheduled-backup          Pending                                                                    local-storage             6m52s**
tidb-admin   tikv-tidb-cluster-tikv-0               Bound     local-pv-31358b0b                      7381Gi     RWO            tikv-storage              4d3h
tidb-admin   tikv-tidb-cluster-tikv-1               Bound     local-pv-dd70accd                      7381Gi     RWO            tikv-storage              4d3h
tidb-admin   tikv-tidb-cluster-tikv-2               Bound     local-pv-90474fc4                      7381Gi     RWO            tikv-storage              4d3h
[root@master1 tidb-cluster]# 
[root@master1 tidb-cluster]# 
[root@master1 tidb-cluster]# 
[root@master1 tidb-cluster]#  kubectl get cronjob -n tidb-admin
NAME                            SCHEDULE    SUSPEND   ACTIVE   LAST SCHEDULE   AGE
tidb-cluster-scheduled-backup   0 0 * * *   False     0        <none>          17m
[root@master1 tidb-cluster]#

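The cronjob exists, but my PVC does not seem to be bound, and the image has not been pulled either. Do the binding and the image pull only happen once the scheduled time arrives?

onlymellb-PingCAP · October 28, 2019, 07:09

Yes, scheduling only happens when the scheduled time arrives. And since your PVC is not bound, the backup job that starts at the scheduled time will also stay Pending.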
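
A Pending claim is not always a fault. If the StorageClass uses volumeBindingMode: WaitForFirstConsumer, Kubernetes delays binding until a pod that uses the claim is scheduled. The sketch below distinguishes the two cases. It assumes the PVC and StorageClass names shown in the output above, and the helper names are hypothetical.

```shell
# Hypothetical helpers: read the PVC phase and the StorageClass binding mode.
pvc_phase() {
  kubectl get pvc "$1" -n "$2" -o jsonpath='{.status.phase}'
}
sc_binding_mode() {
  kubectl get storageclass "$1" -o jsonpath='{.volumeBindingMode}'
}

# $1 = PVC name, $2 = namespace, $3 = StorageClass name
explain_pending_pvc() {
  phase=$(pvc_phase "$1" "$2")
  mode=$(sc_binding_mode "$3")
  if [ "$phase" = "Pending" ] && [ "$mode" = "WaitForFirstConsumer" ]; then
    echo "Pending is expected until a pod uses the claim"
  elif [ "$phase" = "Pending" ]; then
    echo "Pending with immediate binding: check for a matching PersistentVolume"
  else
    echo "phase: $phase"
  fi
}

# Usage (commented out so that sourcing this file has no side effects):
# explain_pending_pvc tidb-cluster-scheduled-backup tidb-admin local-storage
# To test without waiting for the schedule ("manual-backup" is a placeholder job name):
# kubectl create job manual-backup --from=cronjob/tidb-cluster-scheduled-backup -n tidb-admin
```

Even with delayed binding, the backup pod stays Pending if no PersistentVolume in that class can satisfy the requested 100Gi, so kubectl describe pvc is still worth checking.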

system · October 31, 2022, 19:11

This topic was automatically closed 1 minute after the last reply. New replies are no longer allowed.