tidb server节点出现大量的报错

为提高效率,请提供以下信息,问题描述清晰能够更快得到解决:
【 TiDB 使用环境】
生产环境5.3.0

【概述】场景+问题概述

目前新业务上线,上线后,tidb-server节点日志中出现大量如下两类的错误日志:

第一类:
[2022/02/15 14:44:46.187 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402677] [error=“Failed to get user record”]
[2022/02/15 14:44:46.215 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402679] [error=“Failed to get user record”]
[2022/02/15 14:44:46.234 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402681] [error=“Failed to get user record”]
[2022/02/15 14:44:46.327 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402683] [error=“Failed to get user record”]
[2022/02/15 14:44:46.419 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402685] [error=“Failed to get user record”]
[2022/02/15 14:44:46.434 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402687] [error=“Failed to get user record”]
[2022/02/15 14:44:46.450 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402689] [error=“Failed to get user record”]
[2022/02/15 14:44:46.466 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402691] [error=“Failed to get user record”]
[2022/02/15 14:44:46.481 +08:00] [WARN] [conn.go:713] [“failed to check the user authplugin”] [conn=402693] [error=“Failed to get user record”]

通过搜索asktug,发现也有人遇到过,

https://github.com/pingcap/tidb/issues/29725

想请问一下,目前5.3.0这个版本,这个问题还没有解决吗?或者是否有其他的原因导致该问题的产生?

第二类错误:
[2022/02/15 14:44:51.820 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 14:45:06.836 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 14:45:21.852 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 14:45:36.871 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]

从日志中看不出是啥原因,请各位大佬帮忙协助分析一下;

【背景】做过哪些操作
【现象】业务和数据库现象
【业务影响】
【TiDB 版本】
5.3.0
【附件】

  1. TiUP Cluster Display 信息

  2. TiUP Cluster Edit Config 信息

  3. TiDB- Overview 监控

  • 对应模块日志(包含问题前后1小时日志)
1 个赞

第一个问题,你自己搜的帖子有答案了,就是低版本的client 在连接,换成高版本的就ok了

第二个问题,有完整的日志信息么?或者 说,能描述下这个错误出现的场景,以及步骤,不然没办法的判断

第二个报错正式不知道怎么产生的,所以才没定位到原因

该日志每隔15s在tidb.log中记录一条

找到一个在 v5.3.0 版本中类似的一个问题,可以先看下:

1 个赞

试试新版本

线上好像和这个不太一样,目前我这边的详细是这样的:

目前我已经把业务的写入已经停止了,集群的QPS为0了,但是依旧保持数据库长连接,依旧会出现如下的报错:
[2022/02/15 19:45:26.615 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=“read tcp 10.30.128.105:4000->10.30.128.105:32633: read: connection reset by peer”] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:45:41.631 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:45:56.646 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:46:11.662 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:46:26.681 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:46:41.696 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 19:46:56.710 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ github.com/pingcap/tidb/server.(*Server).onConn\ \t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]

该日志每隔15s在tidb.log中记录一条

1 个赞

可以抓包看下链接的来源 ~

1 个赞

现在我把业务都停了,数据库也没有连接了,tidb.log日志中还是没15s记录一条报错信息:
image

image

没有连接了,抓包也没有抓出来,还是会出现报错,感觉是5.3.0版本引入的bug吧

[2022/02/15 20:01:57.682 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:02:12.702 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:02:27.719 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:02:42.736 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:02:57.751 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:03:12.767 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:03:27.784 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:03:42.799 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:03:57.816 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=EOF] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]
[2022/02/15 20:04:12.832 +08:00] [ERROR] [terror.go:307] [“encountered error”] [error=“read tcp 10.30.128.105:4000->10.30.128.105:35797: read: connection reset by peer”] [stack=“github.com/pingcap/tidb/parser/terror.Log
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307
github.com/pingcap/tidb/server.(*Server).onConn
\t/home/jenkins/agent/workspace/optimization-build-tidb-linux-amd/go/src/github.com/pingcap/tidb/server/server.go:519”]

1 个赞

这个信息可以作为参考,但是也请用 tcpdump 把访问 4000 端口的包抓出来吧。tcpdump 使用方式,可以搜索下,示例文档如下:

2 个赞

tidbserver节点ip:10.30.128.105

1、使用tcpdump -i eth0 port 4000命令,啥也抓不到

2、使用如下命令:tcpdump -i eth0 -s0 host 10.30.128.105 -w /data/10.30.128.105_20220215.pcap

抓到的包没看出有啥异常,抓包内容见附件
10.30.128.105_20220215.pcap (1.5 MB)

1 个赞

@zhenjiaogao 大佬,这个问题看的咋样,还需要我提供啥信息吗

  1. 这个环境中,tidb server 上层有使用 lb 产品或组件吗?

  2. ip addr 提供下 10.30.128.105 对应的网卡名

  3. tiup cluster display {cluster_name} 看下集群的拓扑

  4. 上面的 tcpdump 包的命令是在 10.30.128.105 服务器运行的不?如果是,方便用这个命令再抓一次吗?辛苦tcpdump -i any port 4000 -w xxx

1、tidb server 上层有consul层提供访问域名,但是目前业务全部停了,数据库也没有连接了;
2、10.30.128.105 对应的网卡名是eth0


3、集群拓扑如下:

4、使用tcpdump -i any port 4000 -w /data/10.30.128.105_20220216.pcap命令抓包结果见附件
10.30.128.105_20220216.pcap (6.4 KB)

数据包内容如下:

原始的 tcpdump 内容如下:

17:51:34.854020 IP 10.30.128.105.29230 > 10.30.128.105.terabase: Flags [S], seq 1253906537, win 43690, options [mss 65495], length 0
17:51:34.854033 IP 10.30.128.105.terabase > 10.30.128.105.29230: Flags [S.], seq 3903843819, ack 1253906538, win 43690, options [mss 65495], length 0
17:51:34.854044 IP 10.30.128.105.29230 > 10.30.128.105.terabase: Flags [.], ack 1, win 43690, length 0
17:51:34.854134 IP 10.30.128.105.29230 > 10.30.128.105.terabase: Flags [F.], seq 1, ack 1, win 43690, length 0
17:51:34.854305 IP 10.30.128.105.terabase > 10.30.128.105.29230: Flags [.], ack 2, win 43690, length 0
17:51:34.854334 IP 10.30.128.105.terabase > 10.30.128.105.29230: Flags [P.], seq 1:91, ack 2, win 43690, length 90
17:51:34.854345 IP 10.30.128.105.29230 > 10.30.128.105.terabase: Flags [R], seq 1253906539, win 0, length 0
17:51:49.868525 IP 10.30.128.105.29272 > 10.30.128.105.terabase: Flags [S], seq 520147196, win 43690, options [mss 65495], length 0
17:51:49.868593 IP 10.30.128.105.terabase > 10.30.128.105.29272: Flags [S.], seq 1875314534, ack 520147197, win 43690, options [mss 65495], length 0
17:51:49.868621 IP 10.30.128.105.29272 > 10.30.128.105.terabase: Flags [.], ack 1, win 43690, length 0
17:51:49.868710 IP 10.30.128.105.29272 > 10.30.128.105.terabase: Flags [F.], seq 1, ack 1, win 43690, length 0
17:51:49.868952 IP 10.30.128.105.terabase > 10.30.128.105.29272: Flags [P.], seq 1:91, ack 2, win 43690, length 90
17:51:49.868968 IP 10.30.128.105.29272 > 10.30.128.105.terabase: Flags [R], seq 520147198, win 0, length 0
17:52:04.885999 IP 10.30.128.105.29304 > 10.30.128.105.terabase: Flags [S], seq 3574505248, win 43690, options [mss 65495], length 0
17:52:04.886016 IP 10.30.128.105.terabase > 10.30.128.105.29304: Flags [S.], seq 687320293, ack 3574505249, win 43690, options [mss 65495], length 0
17:52:04.886038 IP 10.30.128.105.29304 > 10.30.128.105.terabase: Flags [.], ack 1, win 43690, length 0
17:52:04.886216 IP 10.30.128.105.29304 > 10.30.128.105.terabase: Flags [F.], seq 1, ack 1, win 43690, length 0
17:52:04.886299 IP 10.30.128.105.terabase > 10.30.128.105.29304: Flags [.], ack 2, win 43690, length 0
17:52:04.886317 IP 10.30.128.105.terabase > 10.30.128.105.29304: Flags [P.], seq 1:91, ack 2, win 43690, length 90
17:52:04.886331 IP 10.30.128.105.29304 > 10.30.128.105.terabase: Flags [R], seq 3574505250, win 0, length 0
17:52:19.901650 IP 10.30.128.105.29338 > 10.30.128.105.terabase: Flags [S], seq 2918664409, win 43690, options [mss 65495], length 0
17:52:19.901664 IP 10.30.128.105.terabase > 10.30.128.105.29338: Flags [S.], seq 2953225914, ack 2918664410, win 43690, options [mss 65495], length 0
17:52:19.901674 IP 10.30.128.105.29338 > 10.30.128.105.terabase: Flags [.], ack 1, win 43690, length 0

时间基本上非常规律,15s 一次,请确认下『consul层』对 tidb server 的心跳检查机制是怎样的,比如心跳检查的时间间隔 ~

感谢你的回复,我确认了一下consul层的健康检查时间间隔就是15s,符合日志输出的规律

但是我有个疑问,为啥会在5.3.0这个版本出现这个问题呢?这个报错日志对应的功能具体是干什么的呢?
在低版本5.0.3这个版本就不会出现这个问题;

另外这个问题,有什么办法解决或规避掉吗?

这个报错目前看确实是不友好的,如果 tidb 上层是其他的 lb 产品,也会出现这个问题,针对这个问题,已经有了 pr 来进行 fix:

https://github.com/pingcap/tidb/pull/31081

好吧,多谢;那我关注下这个问题什么时候解决吧

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。