【TiDB environment】Production
【TiDB version】v5.4.0
【Problem】Backing up with tiup br fails with: Error: error happen in store 5 at 10.3.8.199:20160: Io(Os { code: 22, kind: InvalidInput, message: "Invalid argument" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error
【Reproduction steps】The logs of all 3 TiKV nodes contain a large number of messages like the following:
[2022/05/05 09:59:26.438 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fbffcef4590 for subchannel 0x7fc073210c80"]
[2022/05/05 09:59:26.538 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
[2022/05/05 09:59:27.849 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
[2022/05/05 09:59:27.891 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fc041562a30 for subchannel 0x7fc072eee300"]
[2022/05/05 09:59:27.892 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fc03ad381c0 for subchannel 0x7fc072eef9c0"]
[2022/05/05 09:59:29.442 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fbffcdddd10 for subchannel 0x7fc073224540"]
[2022/05/05 09:59:29.850 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
[2022/05/05 09:59:29.893 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fbfda50b300 for subchannel 0x7fc072eeef40"]
[2022/05/05 09:59:31.895 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fc03adb9060 for subchannel 0x7fc072eef9c0"]
[2022/05/05 09:59:32.545 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
[2022/05/05 09:59:33.896 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fbf81e36b90 for subchannel 0x7fc072eefb80"]
[2022/05/05 09:59:34.447 +08:00] [INFO] [<unknown>] ["New connected subchannel at 0x7fbfdb85f460 for subchannel 0x7fc073225340"]
[2022/05/05 09:59:34.547 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
[2022/05/05 09:59:35.195 +08:00] [WARN] [kv.rs:1092] ["call CheckLeader failed"] [err=Grpc(RemoteStopped)]
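To show how often the warning above occurs, and whether it lines up with the backup attempts, the occurrences can be counted per minute. This is only a rough sketch: the function name count_check_leader_failures is made up for illustration, and the log path passed to it is an assumption that depends on how the cluster was deployed.

```shell
# Count "call CheckLeader failed" warnings per minute in a TiKV log file.
# Assumes each log line starts with "[YYYY/MM/DD HH:MM:SS.mmm +ZZ:ZZ]",
# as in the excerpt above.
# Usage: count_check_leader_failures <path-to-tikv.log>   (path is deployment-specific)
count_check_leader_failures() {
    grep -F 'call CheckLeader failed' "$1" \
        | awk '{ minute = substr($0, 2, 16); n[minute]++ }
               END { for (m in n) print m, n[m] }' \
        | sort
}
```

Each output line is a minute (date, hour, minute) followed by the number of warnings logged in that minute.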
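【Symptoms and impact】I found a similar thread on the forum, which seems to say this is a TiDB bug.
On my v5.4.0 cluster, the logs of all 3 TiKV nodes show the errors described in that thread. Is this a TiDB bug, or something else?
So far, backing up to a local path with the br command works, but backing up to a shared storage disk exported via Samba fails:
1. Backup to a local path (/tmp/tikv_car_news_2022-05-04_bk; ownership of this directory, both user and group, has been granted to tidb):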
tiup br backup db --pd "10.3.8.196:2379" --db car_news --storage "local:///tmp/tikv_car_news_2022-05-04_bk" --ratelimit 128 --log-file backuptable.log
tiup is checking updates for component br …
Starting component br: /root/.tiup/components/br/v5.4.0/br /root/.tiup/components/br/v5.4.0/br backup db --pd 10.3.8.196:2379 --db car_news --storage local:///tmp/tikv_car_news_2022-05-04_bk --ratelimit 128 --log-file backuptable.log
Detail BR log in backuptable.log
Database backup <-------------------------------------------------------------------------------------------> 100.00%
Checksum <--------------------------------------------------------------------------------------------------> 100.00%
[2022/05/05 10:06:43.533 +08:00] [INFO] [collector.go:67] ["Database backup success summary"] [total-ranges=31] [ranges-succeed=31] [ranges-failed=0] [backup-checksum=299.112179ms] [backup-fast-checksum=3.672789ms] [backup-total-regions=23] [backup-total-ranges=22] [total-take=5.056491896s] [BackupTS=432987543561568258] [total-kv=1467304] [total-kv-size=449.8MB] [average-speed=88.96MB/s] [backup-data-size(after-compressed)=35.89MB] [Size=35890697]
2. Backup to the shared disk fails:
tiup br backup db --pd "10.3.8.196:2379" --db car_news --storage "local:///tidb_backup_data/nfs/backup/tikv_car_news_2022-05-04_bk" --ratelimit 128 --log-file backuptable.log
[error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled
github.com/tikv/pd/client.(*client).GetAllStores
\t/nfs/cache/mod/github.com/tikv/pd@v1.1.0-beta.0.20211118054146-02848d2660ee/client/client.go:1523
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:142
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:179
github.com/pingcap/tidb/br/pkg/utils.WithRetry
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:58
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:176
github.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:511
github.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:471
github.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73
golang.org/x/sync/errgroup.(*Group).Go.func1
\t/nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1371"] [unit-name="range start:74800000000000094e5f69800000000000000100 end:74800000000000094e5f698000000000000001fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled
github.com/tikv/pd/client.(*client).GetAllStores
\t/nfs/cache/mod/github.com/tikv/pd@v1.1.0-beta.0.20211118054146-02848d2660ee/client/client.go:1523
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:142
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:179
github.com/pingcap/tidb/br/pkg/utils.WithRetry
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:58
github.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:176
github.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:511
github.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:471
github.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1
\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73
golang.org/x/sync/errgroup.(*Group).Go.func1
\t/nfs/cache/mod/golang.org/x/sync@v0.0.0-20210220032951-036812b2e83c/errgroup/errgroup.go:57
runtime.goexit
\t/usr/local/go/src/runtime/asm_amd64.s:1371"]
Error: error happen in store 5 at 10.3.8.199:20160: Io(Os { code: 13, kind: PermissionDenied, message: "Permission denied" }): [BR:KV:ErrKVStorage]tikv storage occur I/O error
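The error comes from the TiKV store (store 5 at 10.3.8.199), not from the machine running br. With local:// storage each TiKV node writes its own backup files to that path, so the share probably needs to be mounted and writable on every TiKV host by the OS user that runs tikv-server. A minimal sketch of such a check follows. The function names and the probe file name are made up for illustration, and it assumes tikv-server runs as the tidb user, which the post suggests but does not confirm.

```shell
# Print the device or network share that backs a directory (second line of `df -P`).
# For a Samba mount this is typically something like //server/share.
mount_source_of() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Check that a directory exists and that the current user can create a file in it.
# Prints "ok" on success; otherwise prints the reason and returns non-zero.
check_backup_dir() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    probe="$dir/.br_write_probe.$$"   # hypothetical probe file name
    # Run the redirection in a subshell so a failure cannot abort the caller.
    if ! ( : > "$probe" ) 2>/dev/null; then
        echo "not writable by $(id -un): $dir"
        return 1
    fi
    rm -f "$probe"
    echo "ok"
}
```

To test the share, define both functions in a shell running as the TiKV user on each TiKV host, then run `check_backup_dir /tidb_backup_data/nfs/backup` and `mount_source_of /tidb_backup_data/nfs/backup`. A "missing" or "not writable" result on any node would be consistent with the Permission denied error above. It would not by itself explain the earlier "Invalid argument" (code 22) error.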