为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】
生产环境
【概述】 场景 + 问题概述
集群由V4.0.10升级到V5.3.0后同时升级br组件,备份时间由原来的不足2小时到现在5个半小时, 原br备份由原来的4T下降到679G,而且升级br后每天的备份虽然日志最后都报备份成功,但备份成功之前都有报“PD:client:ErrClientGetTSO]context canceled: context canceled”,具体日志如下:
升级集群前(V4.0.10)前备份日志
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:33] ["Welcome to Backup & Restore (BR)"]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:34] [BR] [release-version=v4.0.4]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:35] [BR] [git-hash=c91c79a8431805dcbfda2c7b2612dde6985e564c]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:36] [BR] [git-branch=heads/refs/tags/v4.0.4]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:37] [BR] [go-version=go1.13]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:38] [BR] [utc-build-time="2020-07-31 07:33:17"]
[2021/12/12 23:22:01.612 +08:00] [INFO] [version.go:39] [BR] [race-enabled=false]
......
[2021/12/13 01:12:12.107 +08:00] [INFO] [client.go:196] ["save backup meta"] [path=local:///dbbak/tidbFullBak/mg_tidb_full_20211212232201] [size=24088950]
[2021/12/13 01:12:13.021 +08:00] [INFO] [ddl.go:384] ["[ddl] DDL closed"] [ID=d57b89e9-b1c0-45a1-9659-becd4249eed3] ["take time"=198.806392ms]
[2021/12/13 01:12:13.021 +08:00] [INFO] [ddl.go:297] ["[ddl] stop DDL"] [ID=d57b89e9-b1c0-45a1-9659-becd4249eed3]
[2021/12/13 01:12:13.026 +08:00] [INFO] [domain.go:442] ["infoSyncerKeeper exited."]
[2021/12/13 01:12:13.027 +08:00] [INFO] [domain.go:612] ["domain closed"] ["take time"=203.963708ms]
[2021/12/13 01:12:13.027 +08:00] [INFO] [collector.go:61] ["Full backup Success summary: total backup ranges: 16901, total success: 16901, total failed: 0, total take(s): 6157.47, total size(MB): 4200646.35, avg speed(MB/s): 682.20,
total kv: 47292061388"] ["backup checksum"=7m4.429579554s] ["backup fast checksum"=6.692480306s] ["backup total regions"=55387] [BackupTS=429738563201400858] [Size=637484128247]
升级集群及br后(V5.3.0)的备份日志
[2021/12/30 22:02:02.197 +08:00] [INFO] [info.go:49] ["Welcome to Backup & Restore (BR)"] [release-version=v5.3.0] [git-hash=4a1b2e9fe5b5afb1068c56de47adb07098d768d6] [git-branch=heads/refs/tags/v5.3.0] [go-version=go1.16.4] [utc-bui
ld-time="2021-11-24 13:31:09"] [race-enabled=false]
[2021/12/30 22:02:02.197 +08:00] [INFO] [common.go:630] [arguments] [__command="br backup full"] [log-file=backup_full.log] [pd="[172.16.5.225:2379,172.16.5.226:2379,172.16.5.227:2379]"] [ratelimit=200] [storage=local:///dbbak/tidbFu
llBak/mg_tidb_full_20211230220202]
[2021/12/30 22:02:02.197 +08:00] [INFO] [conn.go:244] ["new mgr"] [pdAddrs=172.16.5.225:2379,172.16.5.226:2379,172.16.5.227:2379]
[2021/12/30 22:02:02.199 +08:00] [INFO] [client.go:352] ["[pd] create pd client with endpoints"] [pd-address="[172.16.5.225:2379,172.16.5.226:2379,172.16.5.227:2379]"]
[2021/12/30 22:02:02.201 +08:00] [INFO] [base_client.go:349] ["[pd] switch leader"] [new-leader=http://172.16.5.226:2379] [old-leader=]
[2021/12/30 22:02:02.201 +08:00] [INFO] [base_client.go:104] ["[pd] init cluster id"] [cluster-id=6857095228536967924]
[2021/12/30 22:02:02.201 +08:00] [INFO] [client.go:648] ["[pd] tso dispatcher created"] [dc-location=global]
[2021/12/30 22:02:02.203 +08:00] [INFO] [conn.go:219] ["checked alive KV stores"] [aliveStores=9] [totalStores=9]
[2021/12/30 22:02:02.203 +08:00] [INFO] [client.go:352] ["[pd] create pd client with endpoints"] [pd-address="[172.16.5.225:2379,172.16.5.226:2379,172.16.5.227:2379]"]
[2021/12/30 22:02:02.204 +08:00] [INFO] [base_client.go:349] ["[pd] switch leader"] [new-leader=http://172.16.5.226:2379] [old-leader=]
[2021/12/30 22:02:02.205 +08:00] [INFO] [base_client.go:104] ["[pd] init cluster id"] [cluster-id=6857095228536967924]
[2021/12/30 22:02:02.205 +08:00] [INFO] [client.go:648] ["[pd] tso dispatcher created"] [dc-location=global]
[2021/12/30 22:02:02.206 +08:00] [INFO] [client.go:93] ["new backup client"]
.......
[2021/12/31 03:31:47.083 +08:00] [ERROR] [client.go:752] ["[pd] fetch pending tso requests error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]context canceled: context canceled"] [stack="github.com/tikv/pd/client.(*clien
t).handleDispatcher\
\t/nfs/cache/mod/github.com/tikv/pd@v1.1.0-beta.0.20211104095303-69c86d05d379/client/client.go:752"]
[2021/12/31 03:31:47.083 +08:00] [INFO] [client.go:666] ["[pd] exit tso dispatcher"] [dc-location=global]
[2021/12/31 03:31:47.084 +08:00] [ERROR] [client.go:752] ["[pd] fetch pending tso requests error"] [dc-location=global] [error="[PD:client:ErrClientGetTSO]context canceled: context canceled"] [stack="github.com/tikv/pd/client.(*clien
t).handleDispatcher\
\t/nfs/cache/mod/github.com/tikv/pd@v1.1.0-beta.0.20211104095303-69c86d05d379/client/client.go:752"]
[2021/12/31 03:31:47.084 +08:00] [INFO] [client.go:666] ["[pd] exit tso dispatcher"] [dc-location=global]
[2021/12/31 03:31:47.084 +08:00] [INFO] [collector.go:66] ["Full backup success summary"] [total-ranges=59720] [ranges-succeed=59720] [ranges-failed=0] [backup-checksum=11m11.760918579s] [backup-fast-checksum=348.085778ms] [backup-to
tal-ranges=17057] [total-take=5h29m44.887146372s] [BackupTS=430144991341838386] [total-kv=24978943921] [total-kv-size=2.321TB] [average-speed=117.3MB/s] [backup-data-size(after-compressed)=679.6GB] [Size=679630218545]
注意以上日志中有[ERROR]
【备份和数据迁移策略逻辑】
每天做全备份,备份命令为
br backup full --pd "${pd_string}" -s "local://${bakDir}" --ratelimit 200 --log-file backup_full.log
【背景】 做过哪些操作
集群由V4.0.10升级到V5.3.0,同时br也升级到5.3.0
【现象】 业务和数据库现象
【问题】 当前遇到的问题
1、备份日志中有报错误
2、备份时间过长
3、备份文件相比以前小很多,怀疑备份文件是否有效
【业务影响】
无
【TiDB 版本】
V5.3.0
【附件】
- 相关日志、配置文件、Grafana 监控(https://metricstool.pingcap.com/)
- TiUP Cluster Display 信息
- TiUP CLuster Edit config 信息
- TiDB-Overview 监控
- 对应模块的 Grafana 监控(如有 BR、TiDB-binlog、TiCDC 等)
- 对应模块日志(包含问题前后 1 小时日志)
若提问为性能优化、故障排查类问题,请下载脚本运行。终端输出的打印结果,请务必全选并复制粘贴上传。