Errors on 3.0.12

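The workload had been running normally. Version: 3.0.12. Today the service keeps restarting frequently.
Note that the /home/jenkins directory shown in the stack trace does not actually exist in our environment.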

[2020/10/12 12:11:03.009 +08:00] [INFO] [server.go:367] ["new connection"] [conn=201] [remoteAddr=172.16.116.95:24143]
[2020/10/12 12:11:03.013 +08:00] [INFO] [set.go:192] ["set session var"] [conn=200] [name=character_set_results] [val=NULL]
[2020/10/12 12:11:03.013 +08:00] [INFO] [set.go:192] ["set session var"] [conn=200] [name=tx_read_only] [val=0]
[2020/10/12 12:11:03.013 +08:00] [INFO] [set.go:192] ["set session var"] [conn=200] [name=transaction_read_only] [val=0]
[2020/10/12 12:11:03.016 +08:00] [INFO] [set.go:192] ["set session var"] [conn=201] [name=character_set_results] [val=NULL]
[2020/10/12 12:11:03.017 +08:00] [INFO] [set.go:192] ["set session var"] [conn=201] [name=tx_read_only] [val=0]
[2020/10/12 12:11:03.017 +08:00] [INFO] [set.go:192] ["set session var"] [conn=201] [name=transaction_read_only] [val=0]
[2020/10/12 12:11:05.053 +08:00] [INFO] [server.go:367] ["new connection"] [conn=202] [remoteAddr=172.16.116.94:47391]
[2020/10/12 12:11:05.056 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=202]
[2020/10/12 12:11:06.253 +08:00] [INFO] [server.go:367] ["new connection"] [conn=203] [remoteAddr=172.16.118.7:56601]
[2020/10/12 12:11:06.254 +08:00] [INFO] [set.go:192] ["set session var"] [conn=203] [name=sql_mode] [val=NO_ENGINE_SUBSTITUTION]
[2020/10/12 12:11:06.268 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=203]
[2020/10/12 12:11:08.062 +08:00] [INFO] [server.go:367] ["new connection"] [conn=204] [remoteAddr=172.16.116.94:47397]
[2020/10/12 12:11:08.066 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=204]
[2020/10/12 12:11:09.656 +08:00] [INFO] [server.go:367] ["new connection"] [conn=205] [remoteAddr=172.16.118.7:57027]
[2020/10/12 12:11:09.656 +08:00] [INFO] [set.go:192] ["set session var"] [conn=205] [name=sql_mode] [val=NO_ENGINE_SUBSTITUTION]
[2020/10/12 12:11:09.672 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=205]
[2020/10/12 12:11:11.072 +08:00] [INFO] [server.go:367] ["new connection"] [conn=206] [remoteAddr=172.16.116.94:47401]
[2020/10/12 12:11:11.075 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=206]
[2020/10/12 12:11:14.082 +08:00] [INFO] [server.go:367] ["new connection"] [conn=207] [remoteAddr=172.16.116.94:47407]
[2020/10/12 12:11:14.087 +08:00] [INFO] [server.go:370] ["connection closed"] [conn=207]
[2020/10/12 12:11:16.813 +08:00] [INFO] [server.go:367] ["new connection"] [conn=208] [remoteAddr=172.16.118.7:58391]
[2020/10/12 12:11:16.813 +08:00] [INFO] [set.go:192] ["set session var"] [conn=208] [name=sql_mode] [val=NO_ENGINE_SUBSTITUTION]
2020/10/12 12:11:16.814 terror.go:357: [error] EOF
github.com/pingcap/errors.AddStack
/home/jenkins/agent/workspace/tidb_v3.0.12/go/pkg/mod/github.com/pingcap/errors@v0.11.4/errors.go:174
github.com/pingcap/errors.Trace
/home/jenkins/agent/workspace/tidb_v3.0.12/go/pkg/mod/github.com/pingcap/errors@v0.11.4/juju_adaptor.go:15
github.com/pingcap/tidb/server.(*packetIO).readOnePacket
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/packetio.go:80
github.com/pingcap/tidb/server.(*packetIO).readPacket
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/packetio.go:105
github.com/pingcap/tidb/server.(*clientConn).readPacket
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/conn.go:265
github.com/pingcap/tidb/server.(*clientConn).readOptionalSSLRequestAndHandshakeResponse
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/conn.go:471
github.com/pingcap/tidb/server.(*clientConn).handshake
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/conn.go:172
github.com/pingcap/tidb/server.(*Server).onConn
/home/jenkins/agent/workspace/tidb_v3.0.12/go/src/github.com/pingcap/tidb/server/server.go:345
runtime.goexit
/usr/local/go/src/runtime/asm_amd64.s:1357

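Hi, please upload the complete tidb.log and tidb_stderr.log. You can also run dmesg -T | grep -i oom to check whether an OOM occurred. Also, have there been any recent changes on the business side?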

tidb_stderr.log (8.8 MB)
[Sun Oct 11 20:33:10 2020] [] oom_kill_process+0x254/0x3d0
[Sun Oct 11 20:33:10 2020] [] ? oom_unkillable_task+0xcd/0x120
[Sun Oct 11 20:33:10 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 00:01:17 2020] tidb-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Oct 12 00:01:17 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 00:01:17 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 00:01:17 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 00:01:17 2020] pd-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Oct 12 00:01:17 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 00:01:17 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 00:01:17 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 00:01:17 2020] pd-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Oct 12 00:01:17 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 00:01:17 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 00:01:17 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 00:01:18 2020] pd-server invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[Mon Oct 12 00:01:18 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 00:01:18 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 00:01:18 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 01:32:45 2020] tidb-server invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Mon Oct 12 01:32:45 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 01:32:45 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 01:32:45 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[Mon Oct 12 03:37:09 2020] tidb-server invoked oom-killer: gfp_mask=0x280da, order=0, oom_score_adj=0
[Mon Oct 12 03:37:09 2020] [] oom_kill_process+0x254/0x3d0
[Mon Oct 12 03:37:09 2020] [] ? oom_unkillable_task+0xcd/0x120
[Mon Oct 12 03:37:09 2020] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
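
With this many entries it can help to tally the output by process. The following is only a rough sketch, assuming the text comes from `dmesg -T` and uses the same "invoked oom-killer" line format as above; the function name `summarize_oom` is made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as:
#   [Mon Oct 12 00:01:17 2020] pd-server invoked oom-killer: gfp_mask=...
OOM_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<proc>\S+) invoked oom-killer")


def summarize_oom(dmesg_text):
    """Return (per-process counts, list of (timestamp, process)) for OOM triggers.

    Hypothetical helper: it only looks at 'invoked oom-killer' lines and
    ignores the stack-trace and table-header lines around them.
    """
    counts = Counter()
    events = []
    for line in dmesg_text.splitlines():
        match = OOM_LINE.match(line.strip())
        if match:
            counts[match.group("proc")] += 1
            events.append((match.group("ts"), match.group("proc")))
    return counts, events
```

Keep in mind that the process named on an "invoked oom-killer" line is the one whose allocation triggered the OOM killer, which is not necessarily the process that was killed; the kernel's "Killed process" line, if present in the full dmesg output, shows the actual victim.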

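The dmesg output does show OOM events. Check the Server - Memory Usage panel in the TiDB monitoring to see whether memory usage rose noticeably, and confirm whether any ad hoc jobs ran today, such as bulk data imports or extracts. The tidb log may also contain SQL statements recorded under expensive query or Out Of Memory Quota.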
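
One way to look for those log entries is a small filter over tidb.log. This is a minimal sketch, not an official tool: the marker strings are assumptions taken from the wording of the reply above (the exact text in the log may differ between TiDB versions), and `find_memory_hints` is a hypothetical helper name.

```python
import re

# Assumed markers: "expensive query" / "expensive_query" and
# "Out Of Memory Quota", matched case-insensitively.
MARKERS = re.compile(r"expensive[_ ]query|out of memory quota", re.IGNORECASE)


def find_memory_hints(log_lines):
    """Return the log lines that mention one of the assumed markers."""
    return [line.rstrip("\n") for line in log_lines if MARKERS.search(line)]
```

For example, `with open("tidb.log", errors="replace") as f: hits = find_memory_hints(f)` would collect candidate lines (the file path here is an assumption); comparing their timestamps with the OOM times reported by dmesg may help narrow down which statements to review.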