【TiKV environment】Production (TiKV-only deployment)
【TiKV version】v7.1.5
【tikv-client-go version】v2.0.7
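In production, we used tiup to pin some tikv-server instances to specific CPU cores and then restarted them.
After the restart completed, the GetRegion QPS on the PD leader rose abnormally to about 70 times its normal level, and the failure rate of tikv-client-go write transactions also increased. This lasted 24 hours.
After we restarted one of the tikv-client-go services, that client's get_region QPS dropped back to its normal level. We suspect that restarting all online tikv-client-go services would work around the problem.
Part of the tikv-client-go log follows: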
[2025/01/17 20:26:14.814 +08:00] [WARN] [backoff.go:158] ["regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\nepoch_not_match:<> at 2025-01-17T20:26:13.31283912+08:00\nepoch_not_match:<> at 2025-01-17T20:26:13.81350834+08:00\nepoch_not_match:<> at 2025-01-17T20:26:14.314214462+08:00\nlongest sleep type: regionMiss, time: 40010ms"]
[2025/01/17 20:26:15.121 +08:00] [WARN] [backoff.go:158] ["regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\nepoch_not_match:<> at 2025-01-17T20:26:13.61906455+08:00\nepoch_not_match:<> at 2025-01-17T20:26:14.119663957+08:00\nepoch_not_match:<> at 2025-01-17T20:26:14.620620328+08:00\nlongest sleep type: regionMiss, time: 40010ms"]
[2025/01/17 20:26:15.262 +08:00] [WARN] [backoff.go:158] ["regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\nno leader, ctx: region ID: 3388860, meta: id:3388860 start_key:\"sys_clog_00_647_25790834::org_\\006Mq\\216\\223\\024\\002\\t_\\000\\001\" end_key:\"sys_clog_00_647_38588195::org_\\006K\\211\\027\\247\\\\\\003H_\\000\\001\" region_epoch:<conf_ver:8906522 version:158 > peers:<id:111351882 store_id:110768518 > peers:<id:115817841 store_id:16 > , peer: id:111351882 store_id:110768518 , addr: 10.251.55.136:10005, idx: 0, reqStoreType: TiKvOnly, runStoreType: tikv at 2025-01-17T20:26:14.003735777+08:00\nepoch_not_match:<> at 2025-01-17T20:26:14.260171212+08:00\nepoch_not_match:<> at 2025-01-17T20:26:14.761690215+08:00\nlongest sleep type: regionMiss, time: 39510ms"]
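To see which error dominates across many such warnings, the error list inside each message can be tallied per kind. Below is a minimal Python sketch for triage only. It assumes the message text has already been extracted from the log line and that newlines appear as the literal two characters `\n`, as in the excerpt above. `tally_backoff_errors` is a hypothetical helper, not part of tikv-client-go.

```python
import re
from collections import Counter

# One "<error text> at <RFC 3339 timestamp>" entry from the message.
ENTRY = re.compile(r"(?P<err>.+?) at \d{4}-\d{2}-\d{2}T[\d:.]+[+-]\d{2}:\d{2}")


def tally_backoff_errors(message: str) -> Counter:
    """Count error kinds listed after 'errors:' in a backoff warning.

    Hypothetical helper: the kind is taken as the text before the first
    ':' or ',' of each entry (e.g. 'epoch_not_match', 'no leader').
    """
    _, _, tail = message.partition("errors:")
    counts = Counter()
    # Assumption: the log stores newlines as the two characters '\' and 'n'.
    for part in tail.replace("\\n", "\n").splitlines():
        m = ENTRY.match(part.strip())
        if m:
            kind = re.split(r"[:,]", m.group("err"), maxsplit=1)[0].strip()
            counts[kind] += 1
    return counts


sample = (
    "regionMiss backoffer.maxSleep 40000ms is exceeded, errors:\\n"
    "epoch_not_match:<> at 2025-01-17T20:26:13.31283912+08:00\\n"
    "epoch_not_match:<> at 2025-01-17T20:26:13.81350834+08:00\\n"
    "epoch_not_match:<> at 2025-01-17T20:26:14.314214462+08:00\\n"
    "longest sleep type: regionMiss, time: 40010ms"
)
print(tally_backoff_errors(sample))
```

Summing the counters over all warnings from one client would show whether `epoch_not_match` accounts for nearly all retries, which is what the three entries above suggest.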