Restore没有改状态

  • 【TiDB 版本】:
    tidb-backup-manager:v1.1.3
    tidb-operator:v1.1.3
    tidb:v4.0.3

  • 【问题描述】:
    执行restore,restore的job complete后,restore没有改状态。

kubectl get restore -n tidb-oshkk20jx3 -o yaml
apiVersion: v1
items:
- apiVersion: pingcap.com/v1alpha1
  kind: Restore
  metadata:
...
  spec:
...
  status:
    commitTs: ""
    conditions:
    - lastTransitionTime: 2020-10-12T07:27:12Z
      message: ""
      reason: ""
      status: "True"
      type: Scheduled
    timeCompleted: null
    timeStarted: null
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

kubectl get job restore-restore-1602487630099848503 -n tidb-oshkk20jx3 -o yaml

apiVersion: batch/v1
kind: Job
metadata:
...
spec:
...
status:
  completionTime: 2020-10-12T07:27:13Z
  conditions:
  - lastProbeTime: 2020-10-12T07:27:13Z
    lastTransitionTime: 2020-10-12T07:27:13Z
    status: "True"
    type: Complete
  startTime: 2020-10-12T07:27:10Z
  succeeded: 1

kubectl logs restore-restore-1602487630099848503-gmmxs -n tidb-oshkk20jx3

Create rclone.conf file.
/tidb-backup-manager restore --namespace=tidb-oshkk20jx3 --restoreName=restore-1602487630099848503 --tikvVersion=v4.0.3
I1012 07:27:11.979027       1 restore.go:73] start to process restore tidb-oshkk20jx3/restore-1602487630099848503
I1012 07:27:11.998696       1 restore_status_updater.go:66] Restore: [tidb-oshkk20jx3/restore-1602487630099848503] updated successfully
I1012 07:27:12.134786       1 manager.go:167] cluster tidb-oshkk20jx3/restore-1602487630099848503 tikv_gc_life_time is 10m0s
I1012 07:27:12.143938       1 manager.go:231] set cluster tidb-oshkk20jx3/restore-1602487630099848503 tikv_gc_life_time to 72h success
I1012 07:27:12.143994       1 restore.go:63] Running br command with args: [restore full --storage=s3://s3addr/tidb-81t4ndbc0e/manaul-backup/1602486850165/ --s3.provider=ceph --s3.endpoint=s3addr --pd=tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]
I1012 07:27:12.170848       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:33] ["Welcome to Backup & Restore (BR)"]
I1012 07:27:12.170876       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:34] [BR] [release-version=v4.0.3]
I1012 07:27:12.170887       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:35] [BR] [git-hash=c91c79a8431805dcbfda2c7b2612dde6985e564c]
I1012 07:27:12.170896       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:36] [BR] [git-branch=heads/refs/tags/v4.0.3]
I1012 07:27:12.170908       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:37] [BR] [go-version=go1.13]
I1012 07:27:12.170917       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:38] [BR] [utc-build-time="2020-07-24 01:13:58"]
I1012 07:27:12.170927       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [version.go:39] [BR] [race-enabled=false]
I1012 07:27:12.170939       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [common.go:381] [arguments] [pd="[tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]"] [s3.endpoint=s3addr] [s3.provider=ceph] [storage=s3://s3addr/tidb-81t4ndbc0e/manaul-backup/1602486850165/]
I1012 07:27:12.171055       1 restore.go:86] [2020/10/12 07:27:12.170 +00:00] [INFO] [client.go:149] ["[pd] create pd client with endpoints"] [pd-address="[tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]"]
I1012 07:27:12.176176       1 restore.go:86] [2020/10/12 07:27:12.176 +00:00] [INFO] [base_client.go:226] ["[pd] update member urls"] [old-urls="[http://tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]"] [new-urls="[http://tidb-oshkk20jx3-pd-0.tidb-oshkk20jx3-pd-peer.tidb-oshkk20jx3.svc:2379]"]
I1012 07:27:12.176194       1 restore.go:86] [2020/10/12 07:27:12.176 +00:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://tidb-oshkk20jx3-pd-0.tidb-oshkk20jx3-pd-peer.tidb-oshkk20jx3.svc:2379] [old-leader=]
I1012 07:27:12.176245       1 restore.go:86] [2020/10/12 07:27:12.176 +00:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6882631627490935648]
I1012 07:27:12.183399       1 restore.go:86] [2020/10/12 07:27:12.183 +00:00] [INFO] [client.go:149] ["[pd] create pd client with endpoints"] [pd-address="[tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]"]
I1012 07:27:12.186796       1 restore.go:86] [2020/10/12 07:27:12.186 +00:00] [INFO] [base_client.go:226] ["[pd] update member urls"] [old-urls="[http://tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]"] [new-urls="[http://tidb-oshkk20jx3-pd-0.tidb-oshkk20jx3-pd-peer.tidb-oshkk20jx3.svc:2379]"]
I1012 07:27:12.186826       1 restore.go:86] [2020/10/12 07:27:12.186 +00:00] [INFO] [base_client.go:242] ["[pd] switch leader"] [new-leader=http://tidb-oshkk20jx3-pd-0.tidb-oshkk20jx3-pd-peer.tidb-oshkk20jx3.svc:2379] [old-leader=]
I1012 07:27:12.186851       1 restore.go:86] [2020/10/12 07:27:12.186 +00:00] [INFO] [base_client.go:92] ["[pd] init cluster id"] [cluster-id=6882631627490935648]
I1012 07:27:12.190011       1 restore.go:86] [2020/10/12 07:27:12.189 +00:00] [INFO] [conn.go:206] ["new mgr"] [pdAddrs=tidb-oshkk20jx3-pd.tidb-oshkk20jx3:2379]
I1012 07:27:12.190453       1 restore.go:86] [2020/10/12 07:27:12.190 +00:00] [INFO] [tidb.go:71] ["new domain"] [store=tikv-6882631627490935648] ["ddl lease"=1s] ["stats lease"=-1ns]
I1012 07:27:12.193516       1 restore.go:86] [2020/10/12 07:27:12.193 +00:00] [INFO] [ddl.go:316] ["[ddl] start DDL"] [ID=76926add-408b-4c61-844f-f40cc3ac27cb] [runWorker=true]
I1012 07:27:12.193531       1 restore.go:86] [2020/10/12 07:27:12.193 +00:00] [INFO] [manager.go:185] ["start campaign owner"] [ownerInfo="[ddl] /tidb/ddl/fg/owner"]
I1012 07:27:12.196425       1 restore.go:86] [2020/10/12 07:27:12.196 +00:00] [INFO] [ddl.go:305] ["[ddl] start delRangeManager OK"] ["is a emulator"=false]
I1012 07:27:12.196457       1 restore.go:86] [2020/10/12 07:27:12.196 +00:00] [INFO] [ddl_worker.go:130] ["[ddl] start DDL worker"] [worker="worker 1, tp general"]
I1012 07:27:12.196582       1 restore.go:86] [2020/10/12 07:27:12.196 +00:00] [INFO] [ddl_worker.go:130] ["[ddl] start DDL worker"] [worker="worker 2, tp add index"]
I1012 07:27:12.216213       1 restore.go:86] [2020/10/12 07:27:12.216 +00:00] [INFO] [domain.go:145] ["full load InfoSchema success"] [usedSchemaVersion=0] [neededSchemaVersion=22] ["start time"=8.782378ms]
I1012 07:27:12.216996       1 restore.go:86] [2020/10/12 07:27:12.216 +00:00] [INFO] [domain.go:369] ["full load and reset schema validator"]
I1012 07:27:12.221501       1 restore.go:86] [2020/10/12 07:27:12.221 +00:00] [INFO] [set.go:207] ["set session var"] [conn=0] [name=sql_mode] [val=]
I1012 07:27:12.295204       1 restore.go:86] [2020/10/12 07:27:12.295 +00:00] [INFO] [client.go:179] ["load backupmeta"] [databases=0] [jobs=0]
I1012 07:27:12.295226       1 restore.go:86] [2020/10/12 07:27:12.295 +00:00] [WARN] [restore.go:548] ["set max-index-length to max(3072*4) to skip check index length in DDL"]
I1012 07:27:12.295233       1 restore.go:86] [2020/10/12 07:27:12.295 +00:00] [INFO] [restore.go:178] ["nothing to restore, all databases and tables are filtered out"]
I1012 07:27:12.295245       1 restore.go:86] [2020/10/12 07:27:12.295 +00:00] [INFO] [client.go:159] ["Restore client closed"]
I1012 07:27:12.296172       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [manager.go:271] ["failed to campaign"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 76926add-408b-4c61-844f-f40cc3ac27cb"] [error="context canceled"]
I1012 07:27:12.296193       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [manager.go:252] ["break campaign loop, context is done"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 76926add-408b-4c61-844f-f40cc3ac27cb"]
I1012 07:27:12.296496       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [ddl_worker.go:124] ["[ddl] DDL worker closed"] [worker="worker 1, tp general"] ["take time"=27.386µs]
I1012 07:27:12.296511       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [ddl_worker.go:124] ["[ddl] DDL worker closed"] [worker="worker 2, tp add index"] ["take time"=3.768µs]
I1012 07:27:12.296521       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [delete_range.go:130] ["[ddl] closing delRange"]
I1012 07:27:12.296529       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [session_pool.go:85] ["[ddl] closing sessionPool"]
I1012 07:27:12.296535       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [ddl.go:384] ["[ddl] DDL closed"] [ID=76926add-408b-4c61-844f-f40cc3ac27cb] ["take time"=1.337621ms]
I1012 07:27:12.296541       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [ddl.go:297] ["[ddl] stop DDL"] [ID=76926add-408b-4c61-844f-f40cc3ac27cb]
I1012 07:27:12.296868       1 restore.go:86] [2020/10/12 07:27:12.296 +00:00] [INFO] [manager.go:296] ["revoke session"] ["owner info"="[ddl] /tidb/ddl/fg/owner ownerManager 76926add-408b-4c61-844f-f40cc3ac27cb"] []
I1012 07:27:12.297790       1 restore.go:86] [2020/10/12 07:27:12.297 +00:00] [INFO] [domain.go:442] ["infoSyncerKeeper exited."]
I1012 07:27:12.298903       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [WARN] [domain.go:511] ["reload schema in loop, schema syncer need rewatch"]
I1012 07:27:12.298923       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [INFO] [domain.go:517] ["reload schema in loop, schema syncer need restart"]
I1012 07:27:12.298930       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [INFO] [schema_validator.go:96] ["the schema validator stops"]
I1012 07:27:12.298939       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [INFO] [domain.go:586] ["domain is closed"]
I1012 07:27:12.299044       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [INFO] [syncer.go:255] ["[ddl] syncer watch global schema finished"]
I1012 07:27:12.299054       1 restore.go:86] [2020/10/12 07:27:12.298 +00:00] [ERROR] [domain.go:527] ["reload schema in loop, schema syncer restart failed"] [error="context canceled"] [errorVerbose="context canceled\
github.com/pingcap/errors.AddStack\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/errors.go:174\
github.com/pingcap/errors.Trace\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:15\
github.com/pingcap/tidb/owner.contextDone\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/owner/manager.go:375\
github.com/pingcap/tidb/owner.NewSession\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/owner/manager.go:145\
github.com/pingcap/tidb/ddl/util.(*schemaVersionSyncer).Restart\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/ddl/util/syncer.go:217\
github.com/pingcap/tidb/domain.(*Domain).mustRestartSyncer\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:551\
github.com/pingcap/tidb/domain.(*Domain).loadSchemaInLoop\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:525\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"] [stack="github.com/pingcap/tidb/domain.(*Domain).loadSchemaInLoop\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:527"]
I1012 07:27:12.299065       1 restore.go:86] [2020/10/12 07:27:12.299 +00:00] [INFO] [domain.go:517] ["reload schema in loop, schema syncer need restart"]
I1012 07:27:12.299074       1 restore.go:86] [2020/10/12 07:27:12.299 +00:00] [INFO] [schema_validator.go:96] ["the schema validator stops"]
I1012 07:27:12.299083       1 restore.go:86] [2020/10/12 07:27:12.299 +00:00] [INFO] [domain.go:586] ["domain is closed"]
I1012 07:27:12.299099       1 restore.go:86] [2020/10/12 07:27:12.299 +00:00] [ERROR] [domain.go:527] ["reload schema in loop, schema syncer restart failed"] [error="context canceled"] [errorVerbose="context canceled\
github.com/pingcap/errors.AddStack\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/errors.go:174\
github.com/pingcap/errors.Trace\
\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20190809092503-95897b64e011/juju_adaptor.go:15\
github.com/pingcap/tidb/owner.contextDone\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/owner/manager.go:375\
github.com/pingcap/tidb/owner.NewSession\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/owner/manager.go:145\
github.com/pingcap/tidb/ddl/util.(*schemaVersionSyncer).Restart\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/ddl/util/syncer.go:217\
github.com/pingcap/tidb/domain.(*Domain).mustRestartSyncer\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:551\
github.com/pingcap/tidb/domain.(*Domain).loadSchemaInLoop\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:525\
runtime.goexit\
\t/usr/local/go/src/runtime/asm_amd64.s:1357"] [stack="github.com/pingcap/tidb/domain.(*Domain).loadSchemaInLoop\
\t/go/pkg/mod/github.com/pingcap/tidb@v1.1.0-beta.0.20200716023258-b10faca6ff89/domain/domain.go:527"]
I1012 07:27:12.300262       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [INFO] [schema_validator.go:127] ["the schema validator stopped before updating"]
I1012 07:27:12.300293       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [WARN] [domain.go:511] ["reload schema in loop, schema syncer need rewatch"]
I1012 07:27:12.300299       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [INFO] [domain.go:612] ["domain closed"] ["take time"=5.027467ms]
I1012 07:27:12.300305       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [INFO] [syncer.go:255] ["[ddl] syncer watch global schema finished"]
I1012 07:27:12.300514       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [INFO] [collector.go:208] ["Full restore Success summary: total restore files: 0, total success: 0, total failed: 0, total take(s): 0.00"] [Size=16]
I1012 07:27:12.302631       1 restore.go:86] 
I1012 07:27:12.302717       1 restore.go:101] Restore data for cluster tidb-oshkk20jx3/restore-1602487630099848503 successfully
I1012 07:27:12.311623       1 manager.go:252] reset cluster tidb-oshkk20jx3/restore-1602487630099848503 tikv_gc_life_time to 10m0s success
I1012 07:27:12.311647       1 manager.go:267] restore cluster tidb-oshkk20jx3/restore-1602487630099848503 from  succeed
I1012 07:27:12.338841       1 restore_status_updater.go:66] Restore: [tidb-oshkk20jx3/restore-1602487630099848503] updated successfully

kubectl get pods -n tidb-oshkk20jx3 看下 restore 的 pod 状态呢

restore-restore-1602487630099848503-gmmxs    0/1     Completed   0          25h
tidb-oshkk20jx3-discovery-759b85cf57-nn7kv   1/1     Running     0          25h
tidb-oshkk20jx3-monitor-7b58584f46-cjjhn     3/3     Running     0          25h
tidb-oshkk20jx3-pd-0                         1/1     Running     0          25h
tidb-oshkk20jx3-tidb-0                       2/2     Running     0          25h
tidb-oshkk20jx3-tidb-initializer-sm57j       0/1     Completed   0          25h
tidb-oshkk20jx3-tikv-0                       1/1     Running     0          25h
tidb-oshkk20jx3-tikv-1                       1/1     Running     0          25h
tidb-oshkk20jx3-tikv-2                       1/1     Running     0          25h

看这一行日志,对应源代码,是把restore的状态改了的,但是为什么kubectl看到的没有改成功呢?

请问能麻烦帮忙拿一下 kube-apiserver 的日志看一下吗?感谢

看了下日志,大部分是

{"log":"I1012 04:55:11.136370       1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\
","stream":"stderr","time":"2020-10-12T04:55:11.13648199Z"}
{"log":"I1012 04:55:15.566805       1 log.go:172] http: TLS handshake error from  ip:25699: EOF\
","stream":"stderr","time":"2020-10-12T04:55:15.566903111Z"}

貌似没什么用。

这次恢复的文件里有数据吗?看这条日志 BR 没有恢复什么数据

I1012 07:27:12.300514       1 restore.go:86] [2020/10/12 07:27:12.300 +00:00] [INFO] [collector.go:208] ["Full restore Success summary: total restore files: 0, total success: 0, total failed: 0, total take(s): 0.00"] [Size=16]

没有数据,但是之前用这个备份也恢复成功过

这个是稳定出现吗?我本地无法复现,我建了个 issue trace 这个问题
https://github.com/pingcap/tidb-operator/issues/3385

不是稳定出现的。

好的,后续再出现时可以把前后操作步骤和 kube-apiserver 等日志更新到 issue 或者这里

Fixed in https://github.com/pingcap/tidb-operator/pull/3648, and will be released in v1.1.10.

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。