BR 备份报错 txnLockFast

【 TiDB 版本】
6.1.1
【遇到的问题:问题现象及影响】
单库全备失败,使用备份语句为:

br backup db --db mydb --pd "192.168.x.x:2379"  -s "local:///xxx/xxx" --log-file /xxx/xxx/backup.log

备份日志报错:

[2022/11/14 20:45:34.939 +08:00] [WARN] [backoff.go:158] ["txnLockFast backoffer.maxSleep 80000ms is exceeded, errors:\n[BR:Common:ErrUnknown]internal error at 2022-11-14T20:44:28.904862754+08:00\n[BR:Common:ErrUnknown]internal error at 2022-11-14T20:44:49.916982047+08:00\n[BR:Common:ErrUnknown]internal error at 2022-11-14T20:45:10.928640175+08:00\nlongest sleep type: txnLockFast, time: 80514ms"]
[2022/11/14 20:45:34.940 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=11335] [take=25m21.509103139s]
[2022/11/14 20:45:34.940 +08:00] [INFO] [client.go:594] ["backup started"] [range-sn=11498] [startKey=7480000000000078F55F720000000000000000] [endKey=7480000000000078F55F72FFFFFFFFFFFFFFFF00] [rateLimit=0] [concurrency=4]
[2022/11/14 20:45:34.940 +08:00] [ERROR] [client.go:1055] ["fail to backup"] [range-sn=11497] [store-id=1] [StoreID=1] [retry=0] [stack="github.com/pingcap/tidb/br/pkg/backup.SendBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1055\ngithub.com/pingcap/tidb/br/pkg/backup.(*pushDown).pushBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/push.go:92"]
[2022/11/14 20:45:34.940 +08:00] [WARN] [push.go:212] ["skipping disconnected stores"] [range-sn=11497] [error="[BR:Common:ErrFailedToConnect]failed to create backup stream to store 1: rpc error: code = Canceled desc = context canceled"]
[2022/11/14 20:45:34.940 +08:00] [INFO] [client.go:610] ["finish backup push down"] [range-sn=11497] [small-range-count=0]
[2022/11/14 20:45:34.940 +08:00] [INFO] [client.go:718] ["start fine grained backup"] [range-sn=11497] [incomplete=1]
[2022/11/14 20:45:34.940 +08:00] [ERROR] [client.go:1055] ["fail to backup"] [range-sn=11497] [store-id=6] [StoreID=6] [retry=0] [stack="github.com/pingcap/tidb/br/pkg/backup.SendBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:1055\ngithub.com/pingcap/tidb/br/pkg/backup.(*pushDown).pushBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/push.go:92"]
[2022/11/14 20:45:34.940 +08:00] [WARN] [backoff.go:179] ["unexcepted error, stop to retry"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
......
......
[2022/11/14 20:45:35.705 +08:00] [WARN] [backoff.go:179] ["unexcepted error, stop to retry"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
[2022/11/14 20:45:35.705 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=19248] [take=45.618µs]
[2022/11/14 20:45:35.942 +08:00] [ERROR] [client.go:680] ["can not find leader"] [key=7480000000000078FFF45F698000000000FF0000030000000000FB] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).findRegionLeader\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:680\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:880\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/14 20:45:35.942 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=11497] [take=1.544899708s]
[2022/11/14 20:45:37.939 +08:00] [WARN] [push.go:103] ["reset the connection in push"] [range-sn=11495] [store-id=50288811]
[2022/11/14 20:45:37.939 +08:00] [WARN] [push.go:103] ["reset the connection in push"] [range-sn=11478] [store-id=50288811]
[2022/11/14 20:45:37.939 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=11495] [take=4.907649714s]
[2022/11/14 20:45:37.939 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=11478] [take=22.184525987s]
[2022/11/14 20:45:37.939 +08:00] [INFO] [client.go:574] ["Backup Ranges"] [take=149ns]
[2022/11/14 20:45:37.940 +08:00] [INFO] [client.go:768] ["[pd] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2022/11/14 20:45:37.940 +08:00] [INFO] [client.go:706] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/11/14 20:45:37.940 +08:00] [INFO] [client.go:768] ["[pd] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2022/11/14 20:45:37.940 +08:00] [INFO] [client.go:706] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/11/14 20:45:37.941 +08:00] [INFO] [metafile.go:592] ["exit write metas by context done"]
[2022/11/14 20:45:37.947 +08:00] [INFO] [collector.go:204] ["units canceled"] [cancel-unit=2]
[2022/11/14 20:45:38.143 +08:00] [INFO] [collector.go:70] ["Database Backup failed summary"] [total-ranges=7755] [ranges-succeed=0] [ranges-failed=7755] [backup-total-ranges=19249] [unit-name="range start:74800000000000794a5f69800000000000000500 end:74800000000000794a5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000080a55f69800000000000000200 end:7480000000000080a55f698000000000000002fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010eeb5f69800000000000000500 end:748000000000010eeb5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010c935f69800000000000000300 end:748000000000010c935f698000000000000003fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client)
[2022/11/14 20:45:38.143 +08:00] [INFO] [collector.go:70] ["Database Backup failed summary"] [total-ranges=7755] [ranges-succeed=0] [ranges-failed=7755] [backup-total-ranges=19249] [unit-name="range start:74800000000000794a5f69800000000000000500 end:74800000000000794a5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000080a55f69800000000000000200 end:7480000000000080a55f698000000000000002fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010eeb5f69800000000000000500 end:748000000000010eeb5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010c935f69800000000000000300 end:748000000000010c935f698000000000000003fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client)
[2022/11/14 20:45:38.143 +08:00] [INFO] [collector.go:70] ["Database Backup failed summary"] [total-ranges=7755] [ranges-succeed=0] [ranges-failed=7755] [backup-total-ranges=19249] [unit-name="range start:74800000000000794a5f69800000000000000500 end:74800000000000794a5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000080a55f69800000000000000200 end:7480000000000080a55f698000000000000002fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010eeb5f69800000000000000500 end:748000000000010eeb5f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:748000000000010c935f69800000000000000300 end:748000000000010c935f698000000000000003fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client)

tikv 日志没有 Lockfast 相关内容:

tikv 日志 lock 如下:

lock 的 key 查询结果如下(未备份系统表):

tidb_decode_key('7480000000000000175F72800000000113A339')|
---------------------------------------------------------+
{"_tidb_rowid":18064185,"table_id":"23"}                 |


tidb_decode_key('7480000000000000155F698000000000000002038000000000004795')|
---------------------------------------------------------------------------+
{"index_id":2,"index_vals":{"table_id":"18325"},"table_id":21}             |

lock 的 key 的 table_id 对应如下:

table_schema|table_name      |TIDB_TABLE_ID|
------------+----------------+-------------+
mysql       |stats_meta      |           21|
mysql       |stats_histograms|           23|

tikv 日志的 ERROR 如下:

  1. 首先确认是不是这个表引发的问题
    a. 这个是统计信息表,br 备份命令是啥?版本是啥?目前没法证明就是这个表导致的备份失败。不过也不排除

  2. 其实从崩溃 range start_key 就能反解出来 table 信息,从而锁定是拿个表出的问题。因为,br 的备份逻辑是取 table_prefix 从正无穷到负无穷作为 range 下发给 tikv 的。然后 range 按照一定的并发执行,所以基本解完这个 key 就能知道问题表。

br 的版本是 6.1.1,查了 start_key,那个时间这个表确实有写操作,但是那个时段的 lock 却没有这个表,这个是今天备份失败的日志:

[2022/11/16 13:35:02.547 +08:00] [WARN] [backoff.go:158] ["txnLockFast backoffer.maxSleep 80000ms is exceeded, errors:\n[BR:Common:ErrUnknown]internal error at 2022-11-16T13:34:05.527343742+08:00\n[BR:Common:ErrUnknown]internal error at 2022-11-16T13:34:26.528966798+08:00\n[BR:Common:ErrUnknown]internal error at 2022-11-16T13:34:44.563046299+08:00\nlongest sleep type: txnLockFast, time: 80305ms"]
[2022/11/16 13:35:02.547 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=5759] [take=22m18.438076389s]
[2022/11/16 13:35:02.547 +08:00] [INFO] [client.go:594] ["backup started"] [range-sn=5769] [startKey=748000000000003BA15F69800000000000000500] [endKey=748000000000003BA15F698000000000000005FB] [rateLimit=0] [concurrency=4]
[2022/11/16 13:35:02.547 +08:00] [WARN] [backoff.go:179] ["unexcepted error, stop to retry"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
[2022/11/16 13:35:02.547 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=5769] [take=372.686µs]
[2022/11/16 13:35:02.547 +08:00] [INFO] [client.go:594] ["backup started"] [range-sn=5770] [startKey=748000000000003BA25F720000000000000000] [endKey=748000000000003BA25F72FFFFFFFFFFFFFFFF00] [rateLimit=0] [concurrency=4]
[2022/11/16 13:35:02.547 +08:00] [WARN] [backoff.go:179] ["unexcepted error, stop to retry"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"]
[2022/11/16 13:35:02.548 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=5770] [take=182.424µs]
[2022/11/16 13:35:02.548 +08:00] [INFO] [client.go:594] ["backup started"] [range-sn=5771] [startKey=748000000000003BA25F69800000000000000100] [endKey=748000000000003BA25F698000000000000001FB] [rateLimit=0] [concurrency=4]
......
......
[2022/11/16 13:35:05.541 +08:00] [WARN] [client.go:922] ["reset the connection in handleFineGrained"] [range-sn=5768] [storeID=50288811]
[2022/11/16 13:35:05.541 +08:00] [ERROR] [client.go:932] ["failed to send fine-grained backup"] [range-sn=5768] [storeID=50288811] [error="failed to reset backup connection on store:50288811 please check the tikv status: context canceled"] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:932\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/16 13:35:05.543 +08:00] [WARN] [client.go:922] ["reset the connection in handleFineGrained"] [range-sn=5768] [storeID=50288811]
[2022/11/16 13:35:05.543 +08:00] [ERROR] [client.go:932] ["failed to send fine-grained backup"] [range-sn=5768] [storeID=50288811] [error="failed to reset backup connection on store:50288811 please check the tikv status: context canceled"] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:932\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/16 13:35:05.544 +08:00] [WARN] [client.go:922] ["reset the connection in handleFineGrained"] [range-sn=5704] [storeID=50288811]
[2022/11/16 13:35:05.544 +08:00] [ERROR] [client.go:932] ["failed to send fine-grained backup"] [range-sn=5704] [storeID=50288811] [error="failed to reset backup connection on store:50288811 please check the tikv status: context canceled"] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:932\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/16 13:35:05.545 +08:00] [WARN] [client.go:922] ["reset the connection in handleFineGrained"] [range-sn=5764] [storeID=50288811]
[2022/11/16 13:35:05.545 +08:00] [ERROR] [client.go:932] ["failed to send fine-grained backup"] [range-sn=5764] [storeID=50288811] [error="failed to reset backup connection on store:50288811 please check the tikv status: context canceled"] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:932\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/16 13:35:05.545 +08:00] [INFO] [client.go:588] ["backup range finished"] [range-sn=5764] [take=6m34.561498857s]
[2022/11/16 13:35:05.545 +08:00] [INFO] [client.go:574] ["Backup Ranges"] [take=164ns]
[2022/11/16 13:35:05.546 +08:00] [WARN] [client.go:922] ["reset the connection in handleFineGrained"] [range-sn=5704] [storeID=50288811]
[2022/11/16 13:35:05.546 +08:00] [ERROR] [client.go:932] ["failed to send fine-grained backup"] [range-sn=5704] [storeID=50288811] [error="failed to reset backup connection on store:50288811 please check the tikv status: context canceled"] [stack="github.com/pingcap/tidb/br/pkg/backup.(*Client).handleFineGrained\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:932\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup.func2\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:737"]
[2022/11/16 13:35:05.546 +08:00] [INFO] [client.go:768] ["[pd] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2022/11/16 13:35:05.546 +08:00] [INFO] [client.go:706] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/11/16 13:35:05.547 +08:00] [INFO] [client.go:768] ["[pd] stop fetching the pending tso requests due to context canceled"] [dc-location=global]
[2022/11/16 13:35:05.547 +08:00] [INFO] [client.go:706] ["[pd] exit tso dispatcher"] [dc-location=global]
[2022/11/16 13:35:05.547 +08:00] [INFO] [metafile.go:592] ["exit write metas by context done"]
[2022/11/16 13:35:05.567 +08:00] [INFO] [collector.go:204] ["units canceled"] [cancel-unit=3]
[2022/11/16 13:35:05.914 +08:00] [INFO] [collector.go:70] ["Database Backup failed summary"] [total-ranges=13484] [ranges-succeed=0] [ranges-failed=13484] [backup-total-ranges=19249] [unit-name="range start:748000000000010f135f69800000000000000a00 end:748000000000010f135f69800000000000000afb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000044df5f69800000000000000700 end:7480000000000044df5f698000000000000007fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000047445f720000000000000000 end:7480000000000047445f72ffffffffffffffff00"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [unit-name="range start:7480000000000047a55f69800000000000000500 end:7480000000000047a55f698000000000000005fb"] [error="rpc error: code = Canceled desc = context canceled"] [errorVerbose="rpc error: code = Canceled desc = context canceled\ngithub.com/tikv/pd/client.(*client).GetAllStores\n\t/go/pkg/mod/github.com/tikv/pd/client@v0.0.0-20220307081149-841fa61e9710/client.go:1596\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStores\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:155\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:192\ngithub.com/pingcap/tidb/br/pkg/utils.WithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/retry.go:51\ngithub.com/pingcap/tidb/br/pkg/conn.GetAllTiKVStoresWithRetry\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/conn/conn.go:189\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:600\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).
[2022/11/16 13:35:06.185 +08:00] [ERROR] [backup.go:40] ["failed to backup"] [error="resolve lock timeout"] [errorVerbose="resolve lock timeout\ngithub.com/tikv/client-go/v2/error.init\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/error/error.go:74\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6222\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ngithub.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/internal/retry/backoff.go:160\ngithub.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithMaxSleepTxnLockFast\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/internal/retry/backoff.go:127\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:798\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:614\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/juju_adaptor.go:15\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:800\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:614\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [stack="main.runBackupCommand\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:40\nmain.newDBBackupCommand.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/backup.go:122\ngithub.com/spf13/cobra.(*Command).execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:856\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:974\ngithub.com/spf13/cobra.(*Command).Execute\n\t/go/pkg/mod/github.com/spf13/cobra@v1.4.0/command.go:902\nmain.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:57\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]
[2022/11/16 13:35:06.185 +08:00] [ERROR] [main.go:59] ["br failed"] [error="resolve lock timeout"] [errorVerbose="resolve lock timeout\ngithub.com/tikv/client-go/v2/error.init\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/error/error.go:74\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6222\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6199\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ngithub.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithCfgAndMaxSleep\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/internal/retry/backoff.go:160\ngithub.com/tikv/client-go/v2/internal/retry.(*Backoffer).BackoffWithMaxSleepTxnLockFast\n\t/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.1-0.20220614073425-1693f8c71524/internal/retry/backoff.go:127\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:798\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:614\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571\ngithub.com/pingcap/errors.AddStack\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/errors.go:174\ngithub.com/pingcap/errors.Trace\n\t/go/pkg/mod/github.com/pingcap/errors@v0.11.5-0.20211224045212-9687c2b0f87c/juju_adaptor.go:15\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).fineGrainedBackup\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:800\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRange\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:614\ngithub.com/pingcap/tidb/br/pkg/backup.(*Client).BackupRanges.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/backup/client.go:562\ngithub.com/pingcap/tidb/br/pkg/utils.(*WorkerPool).ApplyOnErrorGroup.func1\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/pkg/utils/worker.go:73\ngolang.org/x/sync/errgroup.(*Group).Go.func1\n\t/go/pkg/mod/golang.org/x/sync@v0.0.0-20220513210516-0976fa681c29/errgroup/errgroup.go:74\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1571"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/br/br/cmd/br/main.go:59\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:250"]

clinic 的链接:Clinic Service

  1. 截图时间和 clinic 采集时间是对不上的
  2. 如果 “假设” 当成同一个原因导致的问题看,IP-24 的 backup worker 是跑的最高的,就看这个
    image
    image
    image
    3.说会 backup failed,采集的时间里,能找到的第一条就是 tikv service 给 BR 发 response 发不过去。说明 2022 @ 13:13:22.781 就有 backup request 接不到消息了。
    image
  3. 至于是什么动作导致的 BR 端断连,没有对应时间点的 BR 日志,无法判定。
  4. 说回 txnLockFast backoffer.maxSleep 80000ms is exceeded,入口在这 → https://github.com/pingcap/tidb/blob/5263a0abda61f102122735049fd0dfadc7b7f8b2/br/pkg/backup/client.go#L798
    backup range 被重试超过时间阈值就报错,代码里默认这种重拾的原因就是 txnLockFast :thinking:,可以看看日志里是否有 fine-grained-range-start,后面会锁定到 range 的 startkey and endkey。
    image

现在有几点无法解释:

  1. 为什么 tikv backup response 回传 BR 发送失败,没有更多日志解释了.
  2. 这个 backoff 重试的到底是哪个表的 range,也不一定真的是因为解锁失败,只是代码里默认 backup request 下去会没有回传就是 txnlockfast. 得看下 br 在 txnLockFast backoffer.maxSleep 80000ms is exceeded 之前的动作日志。

1.截图是14号备份的,clinic 上传的是16号的,下面这个是16号 br 的 备份日志:
full_20221116.zip (1.4 MB)
2.从多次备份失败时间段的监控来看,确实每一次失败前都有较高的 balance-leader,但是15号的备份中同样也有较高的 balance-leader 的情况下,备份却是成功的




3.查看日志确实有 fine-grained-range-start,但是 key 对应的表与 tikv 日志内 backup 失败的 key 的表一致,没有 lock 的情况

又仔细看了下,应该还是有 txnLockFast

  1. 这 4 条 failed to send fine-grained backup 下面的 4 个 goroutine 报出来的,是备份崩溃的一个中间流程。
    cat full_20221116.log|grep “range-sn=5704”|less 得到下图。5704 就代表第 5704 个表(包含 index …)


  2. 为什么会崩溃,fine-grained 是语句 tikv 的反馈 rangeTree 重拾请求,这第 5704 个 table 中剩余的 range。可以看到 range 一直在变。

  3. 看了下这个时间点附近没有网络问题,ping and tcp_retrans …,所以应该还是解锁超时。
    然后这个重拾时间是写死的,想硬绕应该是没有办法。

    backupFineGrainedMaxBackoff = 80000
    backupRetryTimes            = 5
    
  4. 为什么在日志中搜不到,读写冲突的报错在 tikv 侧会不会暴出来,我不太确定🤔。在 tidb 侧会暴,但 BR 不请求 TiDB。

  5. 不过一个解决思路是,2022/11/16 13:35:05.546 前推 80s ,然后去 slow_log 中看下,然后直到 2022/11/16 13:35:05.546 还没有 commit 的 txn。并且那个 txn 中涉及的表又是 5704 这 range-sn 解出来的 table-name。 如果幸运,能找到,然下那是什么业务,然后错峰备份。

  1. 从官方文档看,tikv 日志是可以看出读写冲突的 https://docs.pingcap.com/zh/tidb/stable/troubleshoot-lock-conflicts#读写冲突 ,但是按照这个文档在 tikv 日志中没有找到记录,按照关键字 Lockfast 搜索 tikv 日志也没有找到任何的记录
  2. 很不幸运,从慢日志中查找,只有一条记录在13:35以后,但事务开始时间是在13:35:25.125


    image

哦哦,其实 log 里有 key,没搜出来

  1. 日志只暴露出来 primary_lock 关键字,之前我只搜了 ip-14.24 这一个 tikv。
  2. 把下面的 primary-key 解下吧(就 ip-14.24 上的冲突可能是 txn 中的 secondary-key🤔),找到对应的 txn 应该就能找到对应事务了,然后慢日志返找下 SQL 中可能存在的 range-sn 的那个表。
Auto
Documents
Time	cluster_id	component	message
	Nov 16, 2022 @ 13:57:56.027	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F6980000000000000020380000000000044AD lock_version: 437407783577190475 key: 7480000000000000155F698000000000000001040611FC0FCA1C004B03800000000028BEA1 lock_ttl: 3003 txn_size: 2 lock_for_update_ts: 437407783577190475 min_commit_ts: 437407783577190476"]
	Nov 16, 2022 @ 13:55:02.477	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000079675F698000000000000005038000000000000002037FFFFFFFFFFFFFFF03800000000000000003800000000000001E0419AE60C97400000003800000000328A9C3 lock_version: 437407737937920025 key: 7480000000000079675F698000000000000006038000000000000002037FFFFFFFFFFFFFFF0003800000000000001E0419AE60C97400000003800000000328A9C3 lock_ttl: 3001 txn_size: 1 min_commit_ts: 437407737937920026"]
	Nov 16, 2022 @ 13:53:56.028	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000003843 lock_version: 437407720662630479 key: 7480000000000000155F698000000000000001040611FC01241C004F03800000000024A218 lock_ttl: 3004 txn_size: 4 lock_for_update_ts: 437407720662630479 min_commit_ts: 437407720662630480"]
	Nov 16, 2022 @ 13:47:56.018	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000175F7280000000010C0BDE lock_version: 437407626303897605 key: 7480000000000000175F7280000000010C0BDE lock_ttl: 3003 txn_size: 4 min_commit_ts: 437407626303897606"]
	Nov 16, 2022 @ 13:46:56.035	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000175F728000000001156EB6 lock_version: 437407610575257606 key: 7480000000000000175F728000000001156EB6 lock_ttl: 5661 txn_size: 27 min_commit_ts: 437407610575257607"]
	Nov 16, 2022 @ 13:46:56.035	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000175F728000000001156EB6 lock_version: 437407610575257606 key: 7480000000000000175F728000000001156ED7 lock_ttl: 5661 txn_size: 12 min_commit_ts: 437407610575257607"]
	Nov 16, 2022 @ 13:46:56.028	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000004376 lock_version: 437407610562150456 key: 7480000000000000155F698000000000000001040611FBE7819C0038038000000000258C4F lock_ttl: 3005 txn_size: 4 lock_for_update_ts: 437407610562150456 min_commit_ts: 437407610562150457"]
	Nov 16, 2022 @ 13:45:56.029	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000003C2F lock_version: 437407594833510478 key: 7480000000000000155F72800000000028C06C lock_ttl: 20001 txn_size: 2 lock_for_update_ts: 437407594833510478 min_commit_ts: 437407594833510479"]
	Nov 16, 2022 @ 13:40:03.213	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000079675F69800000000000000503800000000000000203800000000000000003800000000000000003800000000000001E0419AE60C98700000003800000000328B394 lock_version: 437407502204928044 key: 7480000000000079675F69800000000000000503800000000000000203800000000000000003800000000000000003800000000000001E0419AE60C98700000003800000000328B394 lock_ttl: 3001 txn_size: 1 lock_type: Del min_commit_ts: 437407502204928045"]
	Nov 16, 2022 @ 13:39:56.020	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000175F72800000000115ED65 lock_version: 437407500474777607 key: 7480000000000000175F72800000000115ED6C lock_ttl: 3360 txn_size: 5 min_commit_ts: 437407500474777608"]
	Nov 16, 2022 @ 13:24:56.034	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000175F72800000000112A837 lock_version: 437407264545177607 key: 7480000000000000175F72800000000112A83F lock_ttl: 3514 txn_size: 2 min_commit_ts: 437407264545177608"]
	Nov 16, 2022 @ 13:11:21.789	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000004371 lock_version: 437407048158412856 key: 7480000000000000155F72800000000028731E lock_ttl: 25945 txn_size: 66 lock_type: Del lock_for_update_ts: 437407051055104097 min_commit_ts: 437407051055104098"]
	Nov 16, 2022 @ 13:11:21.765	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000004371 lock_version: 437407048158412856 key: 7480000000000000155F698000000000000001040611FB63BE5C009B03800000000028731E lock_ttl: 25945 txn_size: 2 lock_type: Del lock_for_update_ts: 437407051055104097 min_commit_ts: 437407051055104098"]
	Nov 16, 2022 @ 13:11:21.122	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F6980000000000000020380000000000033D5 lock_version: 437407050897817694 key: 7480000000000000155F698000000000000001040611FB65323C0015038000000000260170 lock_ttl: 3006 txn_size: 6 lock_type: Del lock_for_update_ts: 437407050897817694 min_commit_ts: 437407050897817695"]
	Nov 16, 2022 @ 13:09:36.770	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F69800000000000000203800000000000354C lock_version: 437407023477555226 key: 7480000000000000175F7280000000010EF298 lock_ttl: 20039 txn_size: 1 lock_type: Del lock_for_update_ts: 437407023490662403 min_commit_ts: 437407023490662404"]
	Nov 16, 2022 @ 13:09:36.121	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F69800000000000000203800000000000354C lock_version: 437407023307161751 key: 7480000000000000155F7280000000002A6AB6 lock_ttl: 20002 txn_size: 3 lock_type: Del lock_for_update_ts: 437407023320268952 min_commit_ts: 437407023320268953"]
	Nov 16, 2022 @ 13:09:36.113	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F69800000000000000203800000000000354C lock_version: 437407023307161751 key: 7480000000000000155F698000000000000001040611FB5EC104002A0380000000002A6AB6 lock_ttl: 3164 txn_size: 3 lock_type: Del lock_for_update_ts: 437407023320268952 min_commit_ts: 437407023320268953"]
	Nov 16, 2022 @ 13:02:56.027	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000155F698000000000000002038000000000004371 lock_version: 437406918501990522 key: 7480000000000000155F698000000000000001040611FB461564005D038000000000258C4F lock_ttl: 3004 txn_size: 4 lock_type: Del lock_for_update_ts: 437406918501990522 min_commit_ts: 437406918501990523"]
​

我把这些 key 解了一遍,解出来的除了系统表 mysql.stats_meta 和 mysql.stats_histograms 外只有一个业务的表,但是这个表和 range-sn 的表不是一个业务 :joy:不会放一个事务里

  1. 这几条是业务表的
Nov 16, 2022 @ 13:55:02.477	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000079675F698000000000000005038000000000000002037FFFFFFFFFFFFFFF03800000000000000003800000000000001E0419AE60C97400000003800000000328A9C3 lock_version: 437407737937920025 key: 7480000000000079675F698000000000000006038000000000000002037FFFFFFFFFFFFFFF0003800000000000001E0419AE60C97400000003800000000328A9C3 lock_ttl: 3001 txn_size: 1 min_commit_ts: 437407737937920026"]
Nov 16, 2022 @ 13:40:03.213	6801054821581547599	tikv	[endpoint.rs:621] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000079675F69800000000000000503800000000000000203800000000000000003800000000000000003800000000000001E0419AE60C98700000003800000000328B394 lock_version: 437407502204928044 key: 7480000000000079675F69800000000000000503800000000000000203800000000000000003800000000000000003800000000000001E0419AE60C98700000003800000000328B394 lock_ttl: 3001 txn_size: 1 lock_type: Del min_commit_ts: 437407502204928045"]
  1. key 对应的表
{"index_id":6,"index_vals":{"audio_status":null,"createtime":"2022-11-16 12:37:52","download_status":"-1","priority":"30","type":"2"},"table_id":31079}
{"index_id":5,"index_vals":{"createtime":"2022-11-16 12:38:07","download_status":"0","frame_status":"0","priority":"30","type":"2"},"table_id":31079}
  1. range-sn 5704 的 key 对应的表
{"index_id":1,"index_vals":{"fid":"59918113182","uid":"39236972323528"},"table_id":14792} 
  1. 慢日志搜索了 12:30:00-14:00:00 时间段,没有搜索到涉及 range-sn 5704 的表的 SQL

奇怪啊,提了个 issue 看看有没有人帮着解, 可以 follow → https://github.com/pingcap/tidb/issues/39256

近期 tikv 扩容后,使用 BR 备份没有再发现有锁冲突导致失败的现象了

此话题已在最后回复的 60 天后被自动关闭。不再允许新回复。