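[TiDB environment] Production / Test / PoC
[TiDB version]
[Reproduction path] What operations were performed before the problem appeared
[Problem encountered: symptoms and impact]
I upgraded the cluster from v4.0.9 to v5.4.3. Since the upgrade, the TiDB log is flooded with errors like these: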
[stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
[2023/04/21 11:06:12.400 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="write tcp 192.168.241.72:4000->192.168.241.55:21118: write: connection reset by peer"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
[2023/04/21 11:06:12.444 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="write tcp 192.168.241.72:4000->192.168.241.55:21123: write: connection reset by peer"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
[2023/04/21 11:06:12.507 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="write tcp 192.168.241.72:4000->192.168.241.54:40415: write: connection reset by peer"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
[2023/04/21 11:06:12.519 +08:00] [ERROR] [terror.go:307] ["encountered error"] [error="write tcp 192.168.241.72:4000->192.168.241.54:40416: write: connection reset by peer"] [stack="github.com/pingcap/tidb/parser/terror.Log\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/parser/terror/terror.go:307\ngithub.com/pingcap/tidb/server.(*Server).onConn\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/tidb/server/server.go:516"]
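I searched AskTUG and found many threads about this same problem. In most of them the load balancer's health check was simply turned off, and none seems to have reached a final solution.
Someone on AskTUG suggested changing HAProxy's health check port, but did not explain how to do it. The HAProxy configuration in the official documentation does not cover this either.
I never saw this problem on v4.0.9; it only appeared after upgrading to v5.4.3. Is this a bug? Has it been fixed in v6.5.1? My target version is v6.5.1.
Below is my HAProxy configuration file, which follows the official example.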
# cat /etc/haproxy/haproxy.cfg
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 4000
    user haproxy
    group haproxy
    nbproc 10
    daemon
    stats socket /var/lib/haproxy/stats
defaults
    log global
    retries 2
    timeout connect 2s
    timeout client 30000s
    timeout server 30000s
listen admin_stats
    bind 192.168.241.54:18080
    mode http
    option httplog
    maxconn 10
    stats refresh 30s
    stats uri /haproxy
    stats realm HAProxy
    stats auth admin:UXnxFu5Mxxxxxxxxxxxx
    stats hide-version
    stats admin if TRUE
listen tidb-xxxxx
    bind 0.0.0.0:14000
    mode tcp
    balance leastconn
    server tidb-71 192.168.241.71:4000 send-proxy check inter 2000 rise 2 fall 3
    server tidb-72 192.168.241.72:4000 send-proxy check inter 2000 rise 2 fall 3
    server tidb-73 192.168.241.73:4000 send-proxy check inter 2000 rise 2 fall 3
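
For reference, this is my rough guess at what the "change the health check port" suggestion might look like for the `listen tidb-xxxxx` section. It is an untested sketch, not a confirmed fix. It assumes the TiDB status port is the default 10080 and that the HTTP `/status` endpoint on that port is an acceptable health check target; neither value comes from my current setup. The idea is that the check would no longer open and drop a TCP connection on port 4000 every 2 seconds, which is presumably what produces the "connection reset by peer" lines.

```
listen tidb-xxxxx
    bind 0.0.0.0:14000
    mode tcp
    balance leastconn
    # Assumption: probe TiDB over HTTP instead of a bare TCP connect.
    option httpchk GET /status
    # Assumption: 10080 is the TiDB status port (the default); adjust if changed.
    # "check port 10080" moves only the health check; traffic still goes to 4000.
    server tidb-71 192.168.241.71:4000 send-proxy check port 10080 inter 2000 rise 2 fall 3
    server tidb-72 192.168.241.72:4000 send-proxy check port 10080 inter 2000 rise 2 fall 3
    server tidb-73 192.168.241.73:4000 send-proxy check port 10080 inter 2000 rise 2 fall 3
```

As far as I understand the HAProxy documentation, once `port` is set on the check, the PROXY protocol header is no longer sent with health checks unless `check-send-proxy` is added. That looks like the desired behavior here, since the status port is not expected to speak the PROXY protocol.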