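TiKV suddenly cannot read or write normally

TiKV suddenly stopped serving reads and writes. The PD log is below. Could someone take a look and suggest how to fix this?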

[2023/11/16 17:17:32.172 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=313.580658ms] [expected-duration=100ms] [prefix=] [request=“header:<ID:1224288430602737517 username:"pd.server.cn" auth_revision:1 > txn:<compare:<> success:<request_put:<key:"/pd/7283055360825016244/raft/s/00000000000000000005" value_size:167 >> failure:<>>”] [response=size:20] []
[2023/11/16 17:17:32.172 +08:00] [INFO] [trace.go:152] [“trace[783287637] linearizableReadLoop”] [detail=“{readStateIndex:26719986; appliedIndex:26719985; }”] [duration=283.806938ms] [start=2023/11/16 17:17:31.888 +08:00] [end=2023/11/16 17:17:32.172 +08:00] [steps=“["trace[783287637] ‘read index received’ (duration: 463.039µs)","trace[783287637] ‘applied index is now lower than readState.Index’ (duration: 283.341135ms)"]”]
[2023/11/16 17:17:32.172 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=283.950121ms] [expected-duration=100ms] [prefix=“read-only range “] [request=“key:"/pd/7283055360825016244/config" “] [response=“range_response_count:1 size:3843”] []
[2023/11/16 17:17:32.172 +08:00] [INFO] [trace.go:152] [“trace[102465159] range”] [detail=”{range_begin:/pd/7283055360825016244/config; range_end:; response_count:1; response_revision:26718721; }”] [duration=284.030995ms] [start=2023/11/16 17:17:31.888 +08:00] [end=2023/11/16 17:17:32.172 +08:00] [steps=”["trace[102465159] ‘agreement among raft nodes before linearized reading’ (duration: 283.923158ms)"]”]
[2023/11/16 17:17:32.173 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=203.547403ms] [expected-duration=100ms] [prefix=“read-only range “] [request=“key:"/tidb/store/gcworker/saved_safe_point" “] [response=“range_response_count:0 size:7”] []
[2023/11/16 17:17:32.173 +08:00] [INFO] [trace.go:152] [“trace[1270457873] range”] [detail=”{range_begin:/tidb/store/gcworker/saved_safe_point; range_end:; response_count:0; response_revision:26718721; }”] [duration=203.659537ms] [start=2023/11/16 17:17:31.969 +08:00] [end=2023/11/16 17:17:32.173 +08:00] [steps=”["trace[1270457873] ‘agreement among raft nodes before linearized reading’ (duration: 203.538012ms)"]”]
[2023/11/16 17:17:37.832 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=326.211425ms] [expected-duration=100ms] [prefix=] [request=“header:<ID:1224288430602737588 username:"pd.server.cn" auth_revision:1 > txn:<compare:<> success:<request_put:<key:"/pd/7283055360825016244/raft/s/00000000000000000001" value_size:169 >> failure:<>>”] [response=size:20] []
[2023/11/16 17:18:10.124 +08:00] [INFO] [trace.go:152] [“trace[826103753] linearizableReadLoop”] [detail=“{readStateIndex:26720194; appliedIndex:26720194; }”] [duration=235.440112ms] [start=2023/11/16 17:18:09.889 +08:00] [end=2023/11/16 17:18:10.124 +08:00] [steps=“["trace[826103753] ‘read index received’ (duration: 235.436211ms)","trace[826103753] ‘applied index is now lower than readState.Index’ (duration: 3.217µs)"]”]
[2023/11/16 17:18:10.125 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=235.622061ms] [expected-duration=100ms] [prefix=“read-only range “] [request=“key:"/pd/7283055360825016244/config" “] [response=“range_response_count:1 size:3843”] []
[2023/11/16 17:18:10.125 +08:00] [INFO] [trace.go:152] [“trace[2097090256] range”] [detail=”{range_begin:/pd/7283055360825016244/config; range_end:; response_count:1; response_revision:26718928; }”] [duration=235.715696ms] [start=2023/11/16 17:18:09.889 +08:00] [end=2023/11/16 17:18:10.125 +08:00] [steps=”["trace[2097090256] ‘agreement among raft nodes before linearized reading’ (duration: 235.584921ms)"]”]
[2023/11/16 17:18:10.552 +08:00] [WARN] [util.go:163] [“apply request took too long”] [took=115.088823ms] [expected-duration=100ms] [prefix=] [request=“header:<ID:1224288430602737898 username:"pd.server.cn" auth_revision:1 > txn:<compare:<> success:<request_put:<key:"/pd/7283055360825016244/raft/s/00000000000000000001" value_size:169 >> failure:<>>”] [response=size:20] []

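First check whether there is a network problem.

The network is fine.

It feels like the location being written to has become read-only and no longer accepts writes.

Same feeling, +1

How about restarting the KV nodes?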

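The log shows that applying a request took longer than the expected duration (expected-duration=100ms), i.e. requests were being processed with noticeable latency.

  1. Check the network connection;
  2. Check the load.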
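
To see how far past the 100ms expectation the requests actually went, the `took=` values can be pulled out of the PD log. This is only a minimal sketch: the function name and the sample lines are illustrative (shortened from the log above), and it assumes each warning carries a single `took=` field in µs, ms, or s.

```python
import re

# Matches the took= field of the "apply request took too long" warning.
TOOK_RE = re.compile(r"\[took=([0-9.]+)(µs|ms|s)\]")
UNIT_TO_MS = {"µs": 0.001, "ms": 1.0, "s": 1000.0}


def slow_apply_durations_ms(lines):
    """Return the took= duration (in ms) of every slow-apply warning."""
    durations = []
    for line in lines:
        if "apply request took too long" not in line:
            continue  # skip trace lines and anything unrelated
        match = TOOK_RE.search(line)
        if match:
            value, unit = match.groups()
            durations.append(float(value) * UNIT_TO_MS[unit])
    return durations


# Shortened copies of lines from the log above (illustrative only).
sample = [
    "[2023/11/16 17:17:32.172 +08:00] [WARN] [util.go:163] "
    "[apply request took too long] [took=313.580658ms] [expected-duration=100ms]",
    "[2023/11/16 17:17:32.172 +08:00] [INFO] [trace.go:152] "
    "[trace[783287637] linearizableReadLoop] [duration=283.806938ms]",
    "[2023/11/16 17:18:10.552 +08:00] [WARN] [util.go:163] "
    "[apply request took too long] [took=115.088823ms] [expected-duration=100ms]",
]

if __name__ == "__main__":
    for ms in slow_apply_durations_ms(sample):
        print(f"{ms:.1f} ms")
```

Running it over the whole PD log file (for example `slow_apply_durations_ms(open(path, encoding="utf-8"))`) gives a quick picture of whether the slowness is occasional or constant.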

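Also take a look at the TiDB log, the slow query log, and the error log.

Restarting the KV nodes did not recover it either.

After wiping the data with clean --data, the cluster recovered. I still don't know what the cause was???

While the problem was happening, tiup showed both the tikv and pd processes as Up, but data simply could not be read or written.

Are there no error logs? It feels like a disk problem.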

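The [prefix="read-only range "] entry looks like it might be a disk hardware problem. Go to the corresponding directory and try creating a file or directory there.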
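
The "create a file and see" test can also be scripted. A minimal sketch, assuming it is run as the same OS user as the TiKV/PD process; `can_write` is a name made up for this example, and the directory passed in should be the real data directory rather than the temp directory used here as a placeholder.

```python
import os
import tempfile


def can_write(directory):
    """Create, write, fsync and delete a small file in `directory`.

    Returns (True, "ok") on success, or (False, <error message>) if any
    step fails, e.g. because the filesystem was remounted read-only.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="write_probe_", dir=directory)
    except OSError as exc:
        return False, str(exc)
    try:
        os.write(fd, b"probe")
        os.fsync(fd)  # force the data to disk, not just the page cache
    except OSError as exc:
        return False, str(exc)
    finally:
        os.close(fd)
        try:
            os.remove(path)
        except OSError:
            pass
    return True, "ok"


if __name__ == "__main__":
    # Placeholder: replace with the actual TiKV or PD data directory.
    print(can_write(tempfile.gettempdir()))
```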

What does that line mean? Is there a consistency problem between the Leader and the Followers?

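This is clearly high latency.
You can check the following points:
1. Network: use the usual network tools (ping/traceroute/mtr, ethtool, ifconfig/ip, netstat, tcpdump, and so on) to test connectivity and latency, and check the NIC speed and error counters, paying particular attention to packet loss.
2. Disk I/O: look at the WAL-related metric disk_wal_fsync in monitoring, specifically its P99 latency.
3. Expensive requests: check for things like very large requests or requests that scan a large number of keys.
4. Capacity bottleneck: too many write requests can degrade the performance of linearizable reads.
5. Node resources: a busy CPU delaying request handling, insufficient memory, and so on.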
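
If the monitoring dashboard is not at hand, point 2 can be roughly approximated by timing fsync directly on the disk that holds the PD data. This is only a sketch and not a replacement for the real WAL fsync metric: the function names, the sample count, and the 4 KiB block size are all arbitrary choices for illustration. As a reference point, etcd's own guidance is that the P99 of WAL fsync should stay below about 10ms.

```python
import math
import os
import tempfile
import time


def fsync_latencies_ms(directory, samples=200, block_size=4096):
    """Append small blocks to a temp file in `directory`, timing each fsync."""
    payload = b"\0" * block_size
    latencies = []
    fd, path = tempfile.mkstemp(prefix="fsync_probe_", dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct in 1..100)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    # Placeholder: replace with a directory on the same disk as the PD data.
    lat = fsync_latencies_ms(tempfile.gettempdir(), samples=50)
    print(f"p50={percentile(lat, 50):.3f} ms  p99={percentile(lat, 99):.3f} ms")
```

The probe measures the disk in isolation, so a good result here while the cluster is idle does not rule out contention under the real workload.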

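Check the network latency and the disk read/write throughput.

Check how high the disk usage has climbed. Once it reaches a certain level, write stalls can occur.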
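
A quick way to read the usage of the filesystem behind a given directory, as a sketch: `disk_used_percent` is a name invented here, and the path is a placeholder for the actual TiKV data directory. No particular threshold is implied; compare the number against whatever limit applies to the deployment.

```python
import shutil


def disk_used_percent(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


if __name__ == "__main__":
    # Placeholder: replace "." with the actual TiKV data directory.
    print(f"{disk_used_percent('.'):.1f}% used")
```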