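【TiDB environment】
【Overview】: The TiDB cluster suddenly went down while running and then came back up on its own.
【Background】: What operations were performed
【Symptoms】: The Dashboard shows roughly 10 to 20+ slow queries, each taking longer than 5 seconds.
【Problem】: The TiDB cluster suddenly went down and returned to normal on its own a few minutes later.
【Business impact】:
【TiDB version】:
【Attachments】:
- Relevant logs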
tidb error log
{"level":"warn","ts":"2022-01-12T09:59:37.793+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"endpoint://client-39c4e526-d6bc-445e-bb67-400b056592e8/168.18.163.100:2379","attempt":0,"error":"rpc error: code = Unavailable desc = transport is closing"}
[2022/01/12 09:59:38.226 +08:00] [ERROR] [domain.go:870] ["load privilege loop watch channel closed"]
[2022/01/12 10:00:17.807 +08:00] [WARN] [base_client.go:194] ["[pd] cannot update leader"] [address=http://192.18.163.106:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.18.163.106:2379 status:READY"]
[2022/01/12 10:00:56.236 +08:00] [INFO] [region_cache.go:839] ["switch region leader to specific leader due to kv return NotLeader"] [regionID=135201] [currIdx=1] [leaderStoreID=7]
tikv.log
[2022/01/12 09:57:49.973 +08:00] [WARN] [endpoint.rs:530] [error-response] [err="Region error (will back off and retry) message: \"peer is not leader for region 125117, leader may Some(id: 125119 store_id: 2)\" not_leader { region_id: 125117 leader { id: 125119 store_id: 2 } }"]
[2022/01/12 09:59:49.050 +08:00] [ERROR] [] ["ipv4:192.18.163.106:45616: Keepalive watchdog fired. Closing transport."]
pd.log
[2022/01/12 09:59:42.262 +08:00] [WARN] [retry_interceptor.go:61] ["retrying of unary invoker failed"] [target=endpoint://client-2fe2dba4-3c53-4d0d-b259-5c489ff81db2/192.16.163.100:2379] [attempt=0] [error="rpc error: code = Unavailable desc = transport is closing"]
- Configuration files
- Grafana monitoring (https://metricstool.pingcap.com/)