BR backup fails

[2024/09/03 23:03:20.763 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Key is locked (will clean up) primary_lock: 748000000000000B595F728000000001214EEE lock_version: 452296913832378543 key: 748000000000000C435F7203800000000009AE5503800000000083C07E lock_ttl: 20002 txn_size: 1 lock_for_update_ts: 452296913832378543 use_async_commit: true min_commit_ts: 452296913911021666”]
[2024/09/03 23:03:27.668 +08:00] [INFO] [endpoint.rs:519] [“cdc deregister”] [deregister=“Deregister { deregister: "delegate", region_id: 11360907, observe_id: ObserveId(2408), err: Request(message: "peer is not leader for region 11360907, leader may Some(id: 11908454 store_id: 4)" not_leader { region_id: 11360907 leader { id: 11908454 store_id: 4 } }) }”]
[2024/09/03 23:03:27.668 +08:00] [INFO] [delegate.rs:342] [“cdc met region error”] [error=“Request(message: "peer is not leader for region 11360907, leader may Some(id: 11908454 store_id: 4)" not_leader { region_id: 11360907 leader { id: 11908454 store_id: 4 } })”] [region_id=11360907]
[2024/09/03 23:03:32.907 +08:00] [INFO] [endpoint.rs:519] [“cdc deregister”] [deregister=“Deregister { deregister: "delegate", region_id: 11352802, observe_id: ObserveId(2429), err: Request(message: "peer is not leader for region 11352802, leader may Some(id: 11905094 store_id: 4)" not_leader { region_id: 11352802 leader { id: 11905094 store_id: 4 } }) }”]
[2024/09/03 23:03:32.907 +08:00] [INFO] [delegate.rs:342] [“cdc met region error”] [error=“Request(message: "peer is not leader for region 11352802, leader may Some(id: 11905094 store_id: 4)" not_leader { region_id: 11352802 leader { id: 11905094 store_id: 4 } })”] [region_id=11352802]
[2024/09/03 23:03:33.002 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 9059757 is missing" region_not_found { region_id: 9059757 }”]
[2024/09/03 23:03:38.543 +08:00] [INFO] [endpoint.rs:519] [“cdc deregister”] [deregister=“Deregister { deregister: "delegate", region_id: 11898449, observe_id: ObserveId(2406), err: Request(message: "peer is not leader for region 11898449, leader may Some(id: 11908520 store_id: 3 role: IncomingVoter)" not_leader { region_id: 11898449 leader { id: 11908520 store_id: 3 role: IncomingVoter } }) }”]
[2024/09/03 23:03:38.543 +08:00] [INFO] [delegate.rs:342] [“cdc met region error”] [error=“Request(message: "peer is not leader for region 11898449, leader may Some(id: 11908520 store_id: 3 role: IncomingVoter)" not_leader { region_id: 11898449 leader { id: 11908520 store_id: 3 role: IncomingVoter } })”] [region_id=11898449]
[2024/09/03 23:03:39.254 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 11898449 is missing" region_not_found { region_id: 11898449 }”]
[2024/09/03 23:03:39.797 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 11898449 is missing" region_not_found { region_id: 11898449 }”]
[2024/09/03 23:03:51.213 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Key is locked (will clean up) primary_lock: 748000000000000B595F728000000001223923 lock_version: 452296921801556015 key: 748000000000000C435F720380000000000B139C038000000000512AC4 lock_ttl: 20002 txn_size: 1 lock_for_update_ts: 452296921801556015 use_async_commit: true min_commit_ts: 452296921893306401”]
[2024/09/03 23:03:52.168 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 11898449 is missing" region_not_found { region_id: 11898449 }”]
[2024/09/03 23:03:52.660 +08:00] [INFO] [scheduler.rs:727] [“get snapshot failed”] [err=“Error(Request(message: "region 11898449 is missing" region_not_found { region_id: 11898449 }))”] [cid=224744]
[2024/09/03 23:03:57.273 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 11898449 is missing" region_not_found { region_id: 11898449 }”]
[2024/09/03 23:03:57.729 +08:00] [INFO] [scheduler.rs:727] [“get snapshot failed”] [err=“Error(Request(message: "region 11898449 is missing" region_not_found { region_id: 11898449 }))”] [cid=225187]
[2024/09/03 23:03:59.231 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Key is locked (will clean up) primary_lock: 748000000000000D725F72038077A82B26413C1A038000000000928AF9 lock_version: 452296923990458389 key: 748000000000000D415F72038077A82B26413C1A040000000000000002 lock_ttl: 20020 txn_size: 1 lock_for_update_ts: 452296923990458389 use_async_commit: true min_commit_ts: 452296923990458478”]
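
A side note for reading the `Key is locked` entries above: `lock_version` is a TSO, which packs a physical timestamp in milliseconds into the high bits and an 18-bit logical counter into the low bits. A minimal sketch for decoding it, where the helper name `decode_tso` is made up for illustration:

```python
from datetime import datetime, timedelta, timezone

LOGICAL_BITS = 18  # the low 18 bits of a TSO hold the logical counter


def decode_tso(ts):
    """Split a TSO into (wall-clock time at UTC+8, logical counter)."""
    physical_ms = ts >> LOGICAL_BITS
    logical = ts & ((1 << LOGICAL_BITS) - 1)
    tz = timezone(timedelta(hours=8))  # same offset as the +08:00 log timestamps
    # Integer arithmetic avoids float rounding on the millisecond part.
    when = datetime.fromtimestamp(physical_ms // 1000, tz=tz)
    when += timedelta(milliseconds=physical_ms % 1000)
    return when, logical


# lock_version taken from the first "Key is locked" line in the log.
when, logical = decode_tso(452296913832378543)
print(when.isoformat(timespec="milliseconds"), logical)
```

For the first entry this gives a lock time of 23:03:20.447 +08:00, about 300 ms before the warning was logged, so that lock looks like a short-lived one from an in-flight transaction rather than a stale leftover.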

[2024/09/03 22:51:12.235 +08:00] [ERROR] [endpoint.rs:241] [“backup save file failed”] [err_code=KV:Unknown] [err=“Io(Custom { kind: Other, error: "failed to put object rusoto error Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)" })”]
[2024/09/03 22:51:12.235 +08:00] [ERROR] [endpoint.rs:251] [“backup region failed”] [err_code=KV:Unknown] [err=“Io(Custom { kind: Other, error: "failed to put object rusoto error Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)" })”] [end_key=748000000000000B355F72038000000000700E5503800000000000000303800000000000019B] [start_key=748000000000000B355F720380000000006D558B03800000000000000103800000000000015D] [region=“id: 10311979 start_key: 748000000000000BFF355F720380000000FF006D558B03800000FF0000000001038000FF00000000015D0000FD end_key: 748000000000000BFF355F720380000000FF00700E5503800000FF0000000003038000FF00000000019B0000FD region_epoch { conf_ver: 44189 version: 1250 } peers { id: 11856914 store_id: 5 } peers { id: 11872624 store_id: 2 } peers { id: 11897824 store_id: 6 }”]
[2024/09/03 22:51:12.235 +08:00] [ERROR] [endpoint.rs:266] [“backup failed to send response”] [err_code=KV:Unknown] [err=“TrySendError { kind: Disconnected }”] [end_key=748000000000000B355F72038000000000700E5503800000000000000303800000000000019B] [start_key=748000000000000B355F720380000000006D558B03800000000000000103800000000000015D] [region=“id: 10311979 start_key: 748000000000000BFF355F720380000000FF006D558B03800000FF0000000001038000FF00000000015D0000FD end_key: 748000000000000BFF355F720380000000FF00700E5503800000FF0000000003038000FF00000000019B0000FD region_epoch { conf_ver: 44189 version: 1250 } peers { id: 11856914 store_id: 5 } peers { id: 11872624 store_id: 2 } peers { id: 11897824 store_id: 6 }”]
[2024/09/03 22:51:14.275 +08:00] [WARN] [util.rs:90] [“aws request meet error.”] [uuid=371df431-7f48-4077-b913-7a658e0d205f] [context=begin_upload] [retry?=true] [err=“Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)”]
[2024/09/03 22:51:14.277 +08:00] [ERROR] [endpoint.rs:241] [“backup save file failed”] [err_code=KV:Unknown] [err=“Io(Custom { kind: Other, error: "failed to put object rusoto error Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)" })”]
[2024/09/03 22:51:14.277 +08:00] [ERROR] [endpoint.rs:251] [“backup region failed”] [err_code=KV:Unknown] [err=“Io(Custom { kind: Other, error: "failed to put object rusoto error Error during dispatch: error trying to connect: tcp connect error: Connection timed out (os error 110)" })”] [end_key=748000000000000B355F7203800000000080E8B7038000000000000083038000000000000181] [start_key=748000000000000B355F720380000000007F929A038000000000000001038000000000000184] [region=“id: 5971885 start_key: 748000000000000BFF355F720380000000FF007F929A03800000FF0000000001038000FF0000000001840000FD end_key: 748000000000000BFF355F720380000000FF0080E8B703800000FF0000000083038000FF0000000001810000FD region_epoch { conf_ver: 44009 version: 1249 } peers { id: 10945944 store_id: 5 } peers { id: 11490296 store_id: 6 } peers { id: 11901434 store_id: 3 }”]
[2024/09/03 22:51:14.277 +08:00] [ERROR] [endpoint.rs:266] [“backup failed to send response”] [err_code=KV:Unknown] [err=“TrySendError { kind: Disconnected }”] [end_key=748000000000000B355F7203800000000080E8B7038000000000000083038000000000000181] [start_key=748000000000000B355F720380000000007F929A038000000000000001038000000000000184] [region=“id: 5971885 start_key: 748000000000000BFF355F720380000000FF007F929A03800000FF0000000001038000FF0000000001840000FD end_key: 748000000000000BFF355F720380000000FF0080E8B703800000FF0000000083038000FF0000000001810000FD region_epoch { conf_ver: 44009 version: 1249 } peers { id: 10945944 store_id: 5 } peers { id: 11490296 store_id: 6 } peers { id: 11901434 store_id: 3 }”]
[2024/09/03 22:51:19.414 +08:00] [ERROR] [service.rs:133] [“backup canceled”] [error=RemoteStopped]
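
To get an overview of a log like the one above, a rough sketch that tallies region errors per Region ID and groups the ERROR lines by message, flagging the ones that carry `Connection timed out`. The function name `summarize` and the regular expressions are assumptions based only on the line format shown in this post, including the curly quotes the forum put around some fields:

```python
import re
from collections import Counter

# Leading fields of a log line: [timestamp] [LEVEL] [file:line] [message]
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\] \[(?P<msg>[^\]]+)\]"
)
# Structured region errors as they appear in the pasted lines.
REGION_ERR = re.compile(r"(not_leader|region_not_found) \{ region_id: (\d+)")


def summarize(lines):
    """Return (region error counts, ERROR message counts) for the given lines."""
    region_errors = Counter()   # keyed by (error kind, region id)
    error_messages = Counter()  # keyed by (message, mentions a connect timeout?)
    for line in lines:
        head = LINE.match(line)
        if head is None:
            continue  # blank lines and non-log text
        hit = REGION_ERR.search(line)
        if hit:
            region_errors[(hit.group(1), int(hit.group(2)))] += 1
        if head.group("level") == "ERROR":
            msg = head.group("msg").strip('“”"')
            error_messages[(msg, "Connection timed out" in line)] += 1
    return region_errors, error_messages
```

Counts are per log line, so a `cdc deregister` line and the `cdc met region error` line that follows it count as two hits for the same Region. In the excerpt above, every `backup save file failed` and `backup region failed` entry carries the `Connection timed out (os error 110)` text from the put-object call, which is worth keeping apart from the `not_leader` and `region_not_found` messages.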

Is this caused by the Region having no leader? How do I fix a Region that has no leader?

@大龙虾爱小龙虾

Maybe try merging this Region?

Have you checked in pd-ctl whether there are any Regions without a leader?

[2024/09/04 18:02:26.304 +08:00] [INFO] [endpoint.rs:519] [“cdc deregister”] [deregister=“Deregister { deregister: "delegate", region_id: 11938184, observe_id: ObserveId(46281), err: Request(message: "peer is not leader for region 11938184, leader may Some(id: 11952283 store_id: 6 role: IncomingVoter)" not_leader { region_id: 11938184 leader { id: 11952283 store_id: 6 role: IncomingVoter } }) }”]
[2024/09/04 18:02:26.304 +08:00] [INFO] [delegate.rs:342] [“cdc met region error”] [error=“Request(message: "peer is not leader for region 11938184, leader may Some(id: 11952283 store_id: 6 role: IncomingVoter)" not_leader { region_id: 11938184 leader { id: 11952283 store_id: 6 role: IncomingVoter } })”] [region_id=11938184]
[2024/09/04 18:02:26.400 +08:00] [WARN] [endpoint.rs:823] [error-response] [err=“Region error (will back off and retry) message: "region 11938184 is missing" region_not_found { region_id: 11938184 }”]

How do I check that?
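
One possible way, as a sketch only: save the JSON printed by pd-ctl's `region` command to a file and filter out the entries that have no leader. The JSON shape assumed here (a top-level `regions` list whose items have `id`, `peers`, and an optional `leader` object) should be verified against your PD version, and the function name `regions_without_leader` is made up for illustration:

```python
import json


def regions_without_leader(region_dump):
    """Return (region_id, peer_store_ids) for every region lacking a leader."""
    result = []
    for region in region_dump.get("regions", []):
        leader = region.get("leader") or {}
        # Treat a missing or empty leader object as "no leader".
        if not leader.get("store_id"):
            stores = [peer.get("store_id") for peer in region.get("peers", [])]
            result.append((region["id"], stores))
    return result


def load_region_dump(path):
    """Load a JSON file that was saved from pd-ctl output (path is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

For example, `regions_without_leader(load_region_dump("regions.json"))` returns an empty list when every Region in the dump has a leader.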


All of the TiDB instances are reporting this "region is missing" warning. I remember taking one TiDB node offline at some point, but TiDB shouldn't affect TiKV.

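Check the load on the machines across the whole cluster.
Check the TiKV monitoring dashboard to see whether any TiKV instance has restarted.
On the PD dashboard, check Region health and whether any leader drops have occurred.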

I reloaded tikv, pd, and tidb-server one by one yesterday, all of them, but it made no difference.

All tikv, tidb-server, and pd instances are currently in the Up state.

My suspicion is that TiKV is under heavy load and dropping leaders as a result. Take a look at the panels under TiKV => Error.

This feels like a fairly complicated problem. A long time ago I scaled in one tidb-server node, and once it showed the Tombstone state I pruned it. I'm not sure whether that operation is related.

As long as you didn't use --force, it should be fine.

Run tiup cluster check <cluster_name> --cluster to check the Region status. Reference: https://docs.pingcap.com/zh/tidb/v8.3/tiup-component-cluster-check

Oh no. That was probably half a year ago, so I can't rule out that I added --force.

All regions are healthy

Then the Regions are fine. Can you try BR again?