BR restore fails several times in a row

[TiDB environment] Production
[TiDB version] v6.5.5
[Command executed]:

export AWS_ACCESS_KEY_ID=xxxxx
export AWS_SECRET_ACCESS_KEY=xxxxx
tiup br:v6.5.5 restore full \
--pd "10.xx.xx.xx:2379" \
--filter 'npd_xxx.ads_tab1' \
--filter 'npd_xxx.ads_tab2' \
--filter 'npd_xxx.ads_tab3' \
--s3.region cn-northwest-1 \
--storage "s3://db-buket/xxx/prod/filter/20231107143530/" \
--log-file "./rs1108.log"

[Error output]:

[2023/11/08 13:08:17.764 +00:00] [INFO] [base_client.go:143] ["[pd] exit member loop due to context canceled"]
[2023/11/08 13:08:17.764 +00:00] [INFO] [client.go:719] ["[pd] exit tso dispatcher"] [dc-location=global]
[2023/11/08 13:08:17.764 +00:00] [INFO] [pd.go:209] ["closed pd http client"]
[2023/11/08 13:08:17.765 +00:00] [INFO] [base_client.go:143] ["[pd] exit member loop due to context canceled"]
[2023/11/08 13:08:17.766 +00:00] [INFO] [collector.go:220] ["units canceled"] [cancel-unit=0]
[2023/11/08 13:08:17.766 +00:00] [INFO] [collector.go:74] ["Full Restore failed summary"] [total-ranges=91964] [ranges-succeed=91964] [ranges-failed=0] [split-region=3m11.786085203s] [restore-ranges=54357]
[2023/11/08 13:08:17.766 +00:00] [INFO] [client.go:783] ["[pd] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2023/11/08 13:08:17.766 +00:00] [INFO] [client.go:719] ["[pd] exit tso dispatcher"] [dc-location=global]
[2023/11/08 13:08:17.766 +00:00] [ERROR] [restore.go:59] ["failed to restore"] [error="other error: Coprocessor task terminated due to exceeding the deadline"] [errorVerbose="other error: Coprocessor task terminated due to exceeding the deadline\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:1200\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:1076\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:945\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:655\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] [stack="main.runRestoreCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/restore.go:59\nmain.newFullRestoreCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/restore.go:143\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:916\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:1044\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.6.1/command.go:968\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:58\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2023/11/08 13:08:17.766 +00:00] [ERROR] [main.go:60] ["br failed"] [error="other error: Coprocessor task terminated due to exceeding the deadline"] [errorVerbose="other error: Coprocessor task terminated due to exceeding the deadline\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleCopResponse\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:1200\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTaskOnce\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:1076\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).handleTask\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:945\ngithub.com/pingcap/tidb/store/copr.(*copIteratorWorker).run\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/store/copr/coprocessor.go:655\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:60\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

Has anyone run into this problem, and how did you solve it?

Is the cluster itself healthy?

 Coprocessor task terminated due to exceeding the deadline

This error is pretty strange…

"other error: Coprocessor task terminated due to exceeding the deadline": I really haven't run into this error before. Hoping someone can take a look. Following this thread.

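You probably didn't set a rate limit. The restore saturated the bandwidth, so some services became unreachable over the network. Add the rate-limit parameter and try again.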

No rate limit was set. Is the parameter you mean --ratelimit?
What limit is generally reasonable, 50M?

Can the S3 storage be written to normally?

That depends on your bandwidth. Anything is fine as long as it doesn't saturate the link.
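For illustration, here is a minimal sketch of the original restore command with a rate limit added. It assumes the 50 figure floated above, which is only a starting point, and that --ratelimit is interpreted as MiB/s per TiKV node; check `tiup br:v6.5.5 restore full --help` to confirm. Nothing in this thread confirms that rate limiting resolves the coprocessor deadline error. The script only prints the command so it can be reviewed first, and it expects the AWS credentials to be exported as in the original post.

```shell
# Assumption: 50 MiB/s per TiKV node, taken from the figure suggested in the thread.
RATELIMIT_MIB=50

# Assemble the restore command as positional parameters so quoting is preserved.
set -- tiup br:v6.5.5 restore full \
  --pd "10.xx.xx.xx:2379" \
  --filter 'npd_xxx.ads_tab1' \
  --filter 'npd_xxx.ads_tab2' \
  --filter 'npd_xxx.ads_tab3' \
  --s3.region cn-northwest-1 \
  --storage "s3://db-buket/xxx/prod/filter/20231107143530/" \
  --ratelimit "$RATELIMIT_MIB" \
  --log-file "./rs1108.log"

# Dry run: print the assembled command for review.
echo "$*"
# Once the values look right, execute it with:  "$@"
```

If the restore then completes, the limit can be raised step by step while watching network utilization on the TiKV nodes.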

Take a look at this for reference.


That doesn't work. This flag no longer exists:
[ERROR] [main.go:60] ["br failed"] [error="unknown flag: --timeout"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:60\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

:pleading_face: Then let's look at other options.


It looks like something went wrong on the TiKV side. Focus on checking the TiKV logs.

Check the TiDB cluster status and the logs of each component. It isn't necessarily a problem with BR itself.
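As a rough aid for that kind of log review, the helper below is a sketch that keeps only WARN, ERROR, and FATAL entries and truncates each line so the long stack traces stay readable. The function name scan_log is hypothetical, and it assumes the bracketed "[time] [LEVEL] ..." layout seen in the BR log above. TiKV log locations depend on the deployment, so no path is assumed here.

```shell
# scan_log is a hypothetical helper, not part of BR or TiUP.
# $1 = log file path, $2 = max characters to keep per line (default 200)
scan_log() {
  # Keep lines whose second bracketed field is WARN, ERROR, or FATAL,
  # then cut each line to the requested width.
  grep -E '^\[[^]]+\] \[(WARN|ERROR|FATAL)\] ' "$1" | cut -c "1-${2:-200}"
}

# Example call against the BR log from the original post:
#   scan_log ./rs1108.log 160
```

Running the same filter over the TiKV logs for the minutes around 13:08 UTC on 2023/11/08 would show whether any store reported problems when the restore was canceled.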

Could it be that the storage setting is missing the endpoint parameter?
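If the endpoint does turn out to be the issue, one way to set it explicitly is the --s3.endpoint flag. The sketch below assumes the standard S3 endpoint for the cn-northwest-1 region. That URL is an assumption and should be verified against your account and network setup, and the thread does not confirm that a missing endpoint caused the failure. As written, the script only prints the command.

```shell
# Assumption: regional S3 endpoint for cn-northwest-1; verify before use.
S3_ENDPOINT="https://s3.cn-northwest-1.amazonaws.com.cn"

# Same restore as in the original post, with the endpoint given explicitly.
set -- tiup br:v6.5.5 restore full \
  --pd "10.xx.xx.xx:2379" \
  --filter 'npd_xxx.ads_tab1' \
  --filter 'npd_xxx.ads_tab2' \
  --filter 'npd_xxx.ads_tab3' \
  --s3.region cn-northwest-1 \
  --s3.endpoint "$S3_ENDPOINT" \
  --storage "s3://db-buket/xxx/prod/filter/20231107143530/" \
  --log-file "./rs1108.log"

# Dry run: print the assembled command for review, then execute with "$@".
echo "$*"
```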
