tikv ERROR

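To help us resolve this quickly, please provide the following information; a clear problem description gets answered faster:
[TiDB environment]
No TiDB

[Overview]: scenario + problem summary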
[2021/07/31 03:44:54.262 +00:00] [INFO] [peer.rs:606] ["deleting applied snap file"] [snap_file=2_6_7] [peer_id=6] [region_id=2]
[2021/07/31 04:12:55.062 +00:00] [INFO] [raft.rs:1739] ["[term 6] received MsgTimeoutNow from 3 and starts an election to get leadership."] [from=3] [term=6] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.062 +00:00] [INFO] [raft.rs:1177] ["starting a new election"] [term=6] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.062 +00:00] [INFO] [raft.rs:807] ["became candidate at term 7"] [term=7] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.062 +00:00] [INFO] [raft.rs:902] ["6 received message from 6"] [term=7] [msg=MsgRequestVote] [from=6] [id=6] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.062 +00:00] [INFO] [raft.rs:923] ["[logterm: 6, index: 76840] sent request to 7"] [msg=MsgRequestVote] [term=7] [id=7] [log_index=76840] [log_term=6] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.063 +00:00] [INFO] [raft.rs:923] ["[logterm: 6, index: 76840] sent request to 3"] [msg=MsgRequestVote] [term=7] [id=3] [log_index=76840] [log_term=6] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.064 +00:00] [INFO] [transport.rs:144] ["resolve store address ok"] [addr=basic-tikv-1.basic-tikv-peer.tidb-cluster.svc:20160] [store_id=5]
[2021/07/31 04:12:55.064 +00:00] [INFO] [raft_client.rs:48] ["server: new connection with tikv endpoint"] [addr=basic-tikv-1.basic-tikv-peer.tidb-cluster.svc:20160]
[2021/07/31 04:12:55.064 +00:00] [INFO] [raft.rs:1673] ["received from 3"] [term=7] ["msg type"=MsgRequestVoteResponse] [from=3] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.065 +00:00] [INFO] [raft.rs:874] ["became leader at term 7"] [term=7] [raft_id=6] [region_id=2]
[2021/07/31 04:12:55.065 +00:00] [INFO] [deadlock.rs:629] ["became the leader of deadlock detector!"] [self_id=4]
[2021/07/31 04:12:55.065 +00:00] [ERROR] [client.rs:350] ["failed to send heartbeat"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))"]
[2021/07/31 04:12:55.065 +00:00] [ERROR] [util.rs:286] ["request failed, retry"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))"]
[2021/07/31 04:12:55.066 +00:00] [ERROR] [util.rs:286] ["request failed, retry"] [err="Other(SendError("..."))"]
[2021/07/31 04:12:55.066 +00:00] [ERROR] [util.rs:286] ["request failed, retry"] [err="Other(SendError("..."))"]
[2021/07/31 04:12:55.066 +00:00] [INFO] [util.rs:398] ["connecting to PD endpoint"] [endpoints=http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379]
[2021/07/31 04:12:55.067 +00:00] [INFO] [] ["New connected subchannel at 0x7fb722e80330 for subchannel 0x7fb722ede1c0"]
[2021/07/31 04:12:55.069 +00:00] [INFO] [util.rs:398] ["connecting to PD endpoint"] [endpoints=http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379]
[2021/07/31 04:12:55.072 +00:00] [INFO] [util.rs:457] ["connected to PD leader"] [endpoints=http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379]
[2021/07/31 04:12:55.072 +00:00] [INFO] [util.rs:175] ["heartbeat sender and receiver are stale, refreshing ..."]
[2021/07/31 04:12:55.072 +00:00] [WARN] [util.rs:194] ["updating PD client done"] [spend=6.554492ms]
[2021/07/31 04:12:55.073 +00:00] [INFO] [kv.rs:587] ["batch_raft RPC is called, new gRPC stream established"]
Do these TiKV errors have any impact, and what causes them?
They appear while data is being imported.



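From the log, the heartbeat could not be sent and the leader could not be found, which means TiKV lost its connection to PD.

[2021/07/31 04:12:55.065 +00:00] [ERROR] [client.rs:350] ["failed to send heartbeat"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))"]

If no similar ERROR shows up afterwards, check the state of the cluster; if it is healthy, there is no need to worry much about this.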
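
One way to check whether similar ERROR entries recur is to group them by source location and message and look at their timestamps. The following is a minimal sketch in Python, assuming the log lines follow the `[time] [LEVEL] [file:line] ["message"] ...` layout seen in this thread; the function name `errors_by_message` and the `SAMPLE` list are made up for illustration, and the sample lines are copied from the log posted above.

```python
import re
from collections import defaultdict

# Leading fields of a log line: [timestamp] [LEVEL] [source] ["message"] ...
# The message is usually quoted and may itself contain square brackets.
LOG_LINE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]*)\] '
    r'\[(?:"(?P<quoted>(?:[^"\\]|\\.)*)"|(?P<bare>[^\]]*))\]'
)


def errors_by_message(lines, level="ERROR"):
    """Map (source, message) to the list of timestamps at which it was logged."""
    seen = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None or m.group("level") != level:
            continue  # not a log line in the expected layout, or another level
        msg = m.group("quoted") if m.group("quoted") is not None else m.group("bare")
        seen[(m.group("src"), msg)].append(m.group("ts"))
    return dict(seen)


# A few lines taken from the log in this thread.
SAMPLE = [
    '[2021/07/31 04:12:55.065 +00:00] [INFO] [raft.rs:874] ["became leader at term 7"] [term=7] [raft_id=6] [region_id=2]',
    '[2021/07/31 04:12:55.065 +00:00] [ERROR] [client.rs:350] ["failed to send heartbeat"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))"]',
    '[2021/07/31 04:12:55.065 +00:00] [ERROR] [util.rs:286] ["request failed, retry"] [err="Grpc(RpcFinished(Some(RpcStatus { status: 0-OK, details: None })))"]',
    '[2021/07/31 04:12:55.066 +00:00] [ERROR] [util.rs:286] ["request failed, retry"] [err="Other(SendError("..."))"]',
    '[2021/07/31 04:12:55.072 +00:00] [WARN] [util.rs:194] ["updating PD client done"] [spend=6.554492ms]',
]

for (src, msg), stamps in errors_by_message(SAMPLE).items():
    print(f"{len(stamps)}x {src} {msg}: first {stamps[0]}, last {stamps[-1]}")
```

To run it against a full log, pass an open file instead of `SAMPLE`, for example `errors_by_message(open(path, encoding="utf-8"))`. If every group has timestamps clustered around a single moment, the errors did not recur.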

Here is the current situation: this error appeared only once, and the import job's log still looks the same as before. Is that a problem?

Does the heartbeat recover after each of these retries?

One moment, I'll upload the complete logs and the PD heartbeat monitoring.

logpd0 (657.5 KB) logkv1 (275.2 KB) logkv2 (304.8 KB)

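How many PD and TiDB nodes do you have? The logs show no TiDB node at all.

If this is a test, it would still be better to provision the nodes the way a test environment requires.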

There is no TiDB. We use the raw interface, so there is only KV. Is that a problem as it stands?

One PD and three KV nodes.

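Using it as a plain key/value store is fine too (though I still think a full deployment would be better).

Are you having any trouble reading or writing data right now?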

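[image] Data is still being written. Apart from what I described above, everything else seems fine.
We had TiDB before as well, but it produced errors.


Later in the import there were also DDL errors, which is the one shown below.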

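I've asked colleagues on the CTC team whether they have best practices to share with you... it will take a little while.

OK, thank you very much, I appreciate it.

Hello~ What is the use case for this TiKV cluster, and what does the front-end client look like? For now the goal is just to get it running first, right?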

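Cluster status


KV version 4.0.1, Operator version 1.1.12
Problems:

Log of the import job


In the monitoring, the data volume drops rather quickly at 21:40. I am not sure whether that is a problem.

Also, PD became a follower, as shown below. I am not sure whether that is a problem either.
The complete PD and TiKV logs are below:
logpd0 (4.1 MB) logtikv0 (968.2 KB) logtikv1 (804.0 KB) logtikv2 (928.3 KB)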
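
Besides screenshots, the state of the TiKV stores can be checked by asking PD directly. The sketch below is only an assumption-laden illustration: it assumes PD's HTTP API exposes `/pd/api/v1/stores` on the client port and returns a `stores` list whose entries carry `store.id`, `store.address` and `store.state_name`, which should be verified against the PD version in use. The helper names and the `EXAMPLE` payload are made up and do not come from this cluster.

```python
import json
from urllib.request import urlopen


def fetch_stores(pd_url, timeout=5.0):
    """GET <pd_url>/pd/api/v1/stores and return the decoded JSON payload.

    The endpoint path is an assumption; check it against your PD version.
    """
    url = f"{pd_url.rstrip('/')}/pd/api/v1/stores"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def stores_not_up(payload):
    """Return (id, address, state_name) for every store whose state is not "Up"."""
    bad = []
    for entry in payload.get("stores", []):
        store = entry.get("store", {})
        state = store.get("state_name", "Unknown")
        if state != "Up":
            bad.append((store.get("id"), store.get("address"), state))
    return bad


# Illustrative payload only; the ids and addresses are hypothetical.
EXAMPLE = {
    "count": 3,
    "stores": [
        {"store": {"id": 1, "address": "tikv-0:20160", "state_name": "Up"}},
        {"store": {"id": 4, "address": "tikv-1:20160", "state_name": "Disconnected"}},
        {"store": {"id": 5, "address": "tikv-2:20160", "state_name": "Up"}},
    ],
}

print(stores_not_up(EXAMPLE))
```

Against a live cluster, the call would look like `stores_not_up(fetch_stores("http://basic-pd-0.basic-pd-peer.tidb-cluster.svc:2379"))`, using the PD endpoint that appears in the TiKV log; an empty list would mean PD reports every store as `Up`.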

Reference material for TiKV:

TiKV official documentation

TiKV java-client
Development guide: https://tikv.org/docs/5.1/develop/clients/java/

Sample code:
https://github.com/marsishandsome/tikv-client-examples

Description of the client parameters:
https://github.com/tikv/client-java