tidb-server memory usage suddenly drops to 0

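【TiDB environment】Production
【TiDB version】v7.5.4
【Problem: symptoms and impact】At certain times, monitoring shows the memory used by tidb-server suddenly dropping to 0, and the application logs report database connection timeouts. However, monitoring also shows that tidb-server did not restart.
【Attachments: screenshots/logs/monitoring】The monitoring graph is as follows: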

The logs are as follows:
[2024/12/13 15:45:05.137 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:45:05.137 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:45:05.137 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:45:05.137 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:46:56.700 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:46:56.700 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:46:56.700 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 15:46:56.700 +08:00] [ERROR] [distsql.go:1486] [“table reader fetch next chunk failed”] [conn=807630880] [session_alias=] [error=“context canceled”]
[2024/12/13 17:27:50.038 +08:00] [ERROR] [pd_service_discovery.go:284] [“[pd] failed to update service mode”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.14:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.14:2379 status:READY”]
[2024/12/13 17:27:33.538 +08:00] [ERROR] [tso_dispatcher.go:202] [“[tso] tso request is canceled due to timeout”] [dc-location=global] [error=“[PD:client:ErrClientGetTSOTimeout]get TSO timeout”]
[2024/12/13 17:30:05.088 +08:00] [ERROR] [tso_dispatcher.go:202] [“[tso] tso request is canceled due to timeout”] [dc-location=global] [error=“[PD:client:ErrClientGetTSOTimeout]get TSO timeout”]
[2024/12/13 17:30:16.938 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:49290: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:17.439 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=5] [error=“rpc error: code = DeadlineExceeded desc = context deadline exceeded”]
[2024/12/13 17:30:17.440 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = DeadlineExceeded desc = context deadline exceeded”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:17.440 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.191 +08:00] [ERROR] [controller.go:537] [“[resource group controller] token bucket rpc error”] [error=“rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout”]
[2024/12/13 17:30:18.292 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:56042: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.292 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:44748: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=2] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=1] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”]
[2024/12/13 17:30:18.293 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:18.294 +08:00] [ERROR] [tso_dispatcher.go:498] [“[tso] getTS error after processing requests”] [dc-location=global] [stream-addr=http://10.0.0.14:2379] [error=“[PD:client:ErrClientGetTSO]get TSO failed, %v: rpc error: code = Canceled desc = context canceled”]
[2024/12/13 17:30:18.294 +08:00] [ERROR] [kv.go:302] [“fail to load safepoint from pd”] [error=“context deadline exceeded”]
[2024/12/13 17:30:16.838 +08:00] [ERROR] [pd_service_discovery.go:257] [“[pd] failed to update member”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetMember]get member failed”]
[2024/12/13 17:30:18.295 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:44778: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.295 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [pd.go:236] [“updateTS error”] [txnScope=global] [error=“rpc error: code = Canceled desc = context canceled”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=5] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=2] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=1] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”]
[2024/12/13 17:30:18.296 +08:00] [ERROR] [error.go:334] [“encountered error”] [error=“rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout"”] [stack=“github.com/tikv/client-go/v2/error.Log\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/error/error.go:334\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).checkAndResolve\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:633\ngithub.com/tikv/client-go/v2/internal/locate.(*RegionCache).asyncCheckAndResolveLoop\n\t/root/go/pkg/mod/github.com/tikv/client-go/v2@v2.0.8-0.20240920100427-3725b31fa3c0/internal/locate/region_cache.go:586”]
[2024/12/13 17:30:18.298 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:36074: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.298 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.300 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:59048: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.300 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.301 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:44754: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.301 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.307 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:44874: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.307 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.309 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:49420: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.309 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.309 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:52990: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.309 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.317 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:41790: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.317 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:44404: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [pd_service_discovery.go:284] [“[pd] failed to update service mode”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.14:2379 status:TRANSIENT_FAILURE: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:10.0.0.14:2379 status:TRANSIENT_FAILURE”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:57412: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [pd_service_discovery.go:284] [“[pd] failed to update service mode”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetClusterInfo]error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout" target:10.0.0.14:2379 status:TRANSIENT_FAILURE: error:rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout" target:10.0.0.14:2379 status:TRANSIENT_FAILURE”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:42102: write: broken pipe”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server/internal.(*PacketIO).WritePacket\n\t/workspace/source/tidb/pkg/server/internal/packetio.go:284\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writePacket\n\t/workspace/source/tidb/pkg/server/conn.go:466\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).writeError\n\t/workspace/source/tidb/pkg/server/conn.go:1515\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1141\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.326 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“connection was bad”] [stack=“github.com/pingcap/tidb/pkg/parser/terror.Log\n\t/workspace/source/tidb/pkg/parser/terror/terror.go:324\ngithub.com/pingcap/tidb/pkg/server.(*clientConn).Run\n\t/workspace/source/tidb/pkg/server/conn.go:1142\ngithub.com/pingcap/tidb/pkg/server.(*Server).onConn\n\t/workspace/source/tidb/pkg/server/server.go:737”]
[2024/12/13 17:30:18.415 +08:00] [ERROR] [runaway.go:153] [“try to get done runaway watch”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:18.415 +08:00] [ERROR] [manager.go:430] [“task manager met error”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”] [stack=“github.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).logErr\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:430\ngithub.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).fetchAndFastCancelTasksLoop\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:178\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:156”]
[2024/12/13 17:30:18.497 +08:00] [ERROR] [pd_service_discovery.go:257] [“[pd] failed to update member”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetMember]get member failed”]
[2024/12/13 17:30:18.898 +08:00] [ERROR] [pd_service_discovery.go:257] [“[pd] failed to update member”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetMember]get member failed”]
[2024/12/13 17:30:19.296 +08:00] [ERROR] [domain.go:1872] [“update bindinfo failed”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.296 +08:00] [ERROR] [runtime.go:236] [“error occurs when fullRefreshTimers”] [groupID=ttl] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.297 +08:00] [ERROR] [manager.go:430] [“task manager met error”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”] [stack=“github.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).logErr\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:430\ngithub.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).fetchAndHandleRunnableTasksLoop\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:156\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:156”]
[2024/12/13 17:30:19.297 +08:00] [ERROR] [gc_worker.go:774] [“delete range returns an error”] [category=“gc worker”] [uuid=64e80f5d494000e] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.298 +08:00] [ERROR] [domain.go:897] [“reload schema in loop failed”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.299 +08:00] [ERROR] [manager.go:430] [“task manager met error”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”] [stack=“github.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).logErr\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:430\ngithub.com/pingcap/tidb/pkg/disttask/framework/scheduler.(*Manager).fetchAndFastCancelTasksLoop\n\t/workspace/source/tidb/pkg/disttask/framework/scheduler/manager.go:178\ngithub.com/pingcap/tidb/pkg/util.(*WaitGroupWrapper).Run.func1\n\t/workspace/source/tidb/pkg/util/wait_group_wrapper.go:156”]
[2024/12/13 17:30:19.299 +08:00] [ERROR] [gc_worker.go:226] [runGCJob] [category=“gc worker”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.301 +08:00] [ERROR] [runaway.go:145] [“try to get new runaway watch”] [error=“[tikv:9001]PD server timeout: start timestamp may fall behind safe point”]
[2024/12/13 17:30:19.302 +08:00] [ERROR] [runtime.go:236] [“error occurs when fullRefreshTimers”] [groupID=ttl] [error=“context canceled”]
[2024/12/13 17:30:19.331 +08:00] [ERROR] [advancer.go:400] [“listen task meet error, would reopen.”] [error=“etcdserver: mvcc: required revision has been compacted”]
[2024/12/13 17:30:19.336 +08:00] [ERROR] [domain.go:1745] [“LoadSysVarCacheLoop loop watch channel closed”]
[2024/12/13 17:30:19.369 +08:00] [ERROR] [domain.go:1682] [“load privilege loop watch channel closed”]
[2024/12/13 17:30:19.459 +08:00] [ERROR] [info.go:897] [“update minStartTS failed”] [error=“etcdserver: requested lease not found”]
[2024/12/13 17:30:19.473 +08:00] [ERROR] [domain.go:753] [“refresh topology in loop failed”] [error=“etcdserver: requested lease not found”]
[2024/12/13 17:30:19.624 +08:00] [ERROR] [info.go:897] [“update minStartTS failed”] [error=“etcdserver: requested lease not found”]
[2024/12/13 17:30:19.636 +08:00] [ERROR] [domain.go:753] [“refresh topology in loop failed”] [error=“etcdserver: requested lease not found”]
[2024/12/13 17:30:19.698 +08:00] [ERROR] [pd_service_discovery.go:257] [“[pd] failed to update member”] [urls=“[http://10.0.0.14:2379]”] [error=“[PD:client:ErrClientGetMember]get member failed”]

A value of 0 usually means the node went down, most likely because of an OOM.
https://docs.pingcap.com/zh/tidb/stable/troubleshoot-tidb-oom#tidb-oom-故障排查


It probably restarted.


It most likely restarted. Check the uptime panel on the TiDB page in Grafana.

It did not restart. I checked both the dashboard and the logs, and there is no sign of a restart.

1. Check whether the TiDB log contains the keyword Welcome;
2. Check whether the network of the affected TiDB server is abnormal.
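
For the first check, here is a minimal sketch of scanning a tidb-server log for the startup keyword. It assumes the usual bracketed log header seen in the logs above; the function name `find_startup_lines` is made up for illustration.

```python
import re
from typing import Iterable, List

# Header of a TiDB log line: [2024/12/13 17:30:18.293 +08:00] [LEVEL] ...
TS_RE = re.compile(
    r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{2}:\d{2})\]"
)


def find_startup_lines(lines: Iterable[str], keyword: str = "Welcome") -> List[str]:
    """Return the timestamp of every log line that contains the keyword.

    Each hit is a candidate process start; an empty result for the failure
    window suggests the process did not restart in that window.
    """
    hits = []
    for line in lines:
        if keyword in line:
            m = TS_RE.match(line)
            hits.append(m.group(1) if m else "<no timestamp>")
    return hits
```

To use it, open the log file (for example `open("tidb.log", encoding="utf-8", errors="replace")`, where the path is a placeholder) and pass the file object as `lines`.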

Check whether there is a network problem: look at the network panels on the tidb-cluster-node_exporter page in Grafana.

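The logs above cover the entire failure window. There is no "Welcome" line, so I concluded that it did not restart. As for the network, we are on Tianyi Cloud (天翼云), and PD and tidb-server are on the same node. I have not found any problem there so far.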


The monitoring around the failure window does not show any problem either.

[error=“rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout””]

According to the log errors, a PD node is unreachable. Confirm whether the node at 10.0.0.14:2379 is healthy.
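
The `dial tcp 10.0.0.14:2379: i/o timeout` error is a timed-out TCP connect, so one simple check is to measure how long a plain TCP connect to that address takes. A rough sketch, where the helper name `probe_tcp` and the 3-second timeout are my own choices:

```python
import socket
import time
from typing import Optional


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> Optional[float]:
    """Try one TCP connect; return the latency in seconds, or None on failure.

    Timeouts and refused connections are both reported as None, since
    socket timeouts are a subclass of OSError.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

Calling `probe_tcp("10.0.0.14", 2379)` in a loop (for example once per second, logging the result with a timestamp) from the tidb-server host would show whether the PD port stops accepting connections at the moments the errors appear.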

Access to the PD node is failing. Something is wrong with the PD cluster, which caused requests to the cluster to fail.

Is TiDB under heavy load? Check whether the TiDB CPU is close to saturation.


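I checked the failure window, and PD's resource usage was normal as well. What else could cause PD access to fail? I now suspect a bug. I posted another thread earlier: whenever I open the details of a slow SQL statement in the dashboard, tidb-server restarts. Nobody else is accessing the database; I am the only user, and all I do is view the details of one slow query, on a machine with 32 GB of memory. I can only conclude that this is a bug.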

PD probably went down.

[2024/12/13 17:30:18.293 +08:00] [ERROR] [region_cache.go:2676] [“loadStore from PD failed”] [id=1] [error=“rpc error: code = Unavailable desc = connection error: desc = “transport: Error while dialing: dial tcp 10.0.0.14:2379: i/o timeout””]

[2024/12/13 17:30:18.317 +08:00] [ERROR] [terror.go:324] [“encountered error”] [error=“write tcp 10.0.0.14:4000->10.0.0.3:41790: write: broken pipe”]

loadStore from PD failed
get TSO timeout, PD

Check the logs to see whether the network and the PD node were healthy during the problem window.
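
When going through the logs, it can help to count ERROR lines per minute and per message, so that it is obvious which error appears first and when the burst starts. A small sketch under the assumption that lines follow the `[time] [LEVEL] [file:line] [message]` layout shown above; `errors_per_minute` is a made-up name, and the regex accepts both straight and curly quotes because the forum rendering replaced the original quotes.

```python
import re
from collections import Counter
from typing import Iterable

LINE_RE = re.compile(
    r'^\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2}\.\d{3} [+-]\d{2}:\d{2}\] '
    r'\[(?P<level>[A-Z]+)\] '
    r'\[(?P<src>[^\]]+)\] '
    # The message is usually quoted, but can be bare, e.g. [runGCJob].
    r'\[(?:["“](?P<quoted>.*?)["”]|(?P<bare>[^\]]*))\]'
)


def errors_per_minute(lines: Iterable[str]) -> Counter:
    """Count ERROR lines keyed by (minute, message); other lines are skipped."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if not m or m.group("level") != "ERROR":
            continue
        msg = m.group("quoted")
        if msg is None:
            msg = m.group("bare")
        counts[(m.group("ts"), msg)] += 1
    return counts
```

Sorting the result with `sorted(counts.items())` gives a per-minute timeline. Running the same function over the TiKV and PD logs for the same window makes the nodes easy to compare.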

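First, v7.5.4 is used in production by quite a few people, so at first glance this is unlikely to be a bug.
Second, every problem needs to be analyzed on its own terms. In your case, the tidb-server errors state clearly that it cannot reach the PD node, so that is where the problem lies.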

The main thing is to establish why PD is unreachable:

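  • What does your cluster topology look like?
  • Check whether other TiDB or TiKV nodes reported similar errors during the failure window, by looking at their logs.
  • Check whether a PD leader switch happened during the failure window. The PD panels in Grafana show which node was the leader over a past period.
  • Check whether the network between tidb-server and the PD node was healthy, using the node_exporter or other charts in Grafana.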
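
Besides Grafana, PD can also be queried directly over its HTTP API. The sketch below assumes that `/pd/api/v1/health` returns a JSON array of members, each with `name` and `health` fields; please verify the response shape against your PD version. The helper names `fetch_json` and `unhealthy_members` are hypothetical.

```python
import json
import urllib.request
from typing import Any, List


def fetch_json(url: str, timeout: float = 3.0) -> Any:
    """GET a URL and decode the JSON body; raises on timeout or HTTP error."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def unhealthy_members(health: List[dict]) -> List[str]:
    """Return the names of members whose 'health' field is missing or false.

    Assumes each entry looks like {"name": ..., "health": true/false}.
    """
    return [m.get("name", "<unknown>") for m in health if not m.get("health", False)]
```

For example, `unhealthy_members(fetch_json("http://10.0.0.14:2379/pd/api/v1/health"))` run from the tidb-server host; a timeout here during the failure window would point to the same problem the tidb-server log is reporting.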

But according to PD's monitoring data, its uptime did not change. :sob:

Then something is definitely wrong.

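I have two servers: PD and tidb-server on one, TiKV on the other. There are three TiKV instances, with three disks mounted for them. There is only one PD and one tidb-server. The PD machine has 32 GB of memory, of which 20 GB is allocated to tidb-server. The failure happened around 17:30 on the 13th (the time shown in the logs in this post). The dashboard shows that tidb-server started on December 7 (see the figure below).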

Grafana shows the following:

dmesg shows no OOM, as in the figure below:
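
For reference, the dmesg check can be scripted roughly like this. The patterns are the kernel OOM-killer messages I am aware of (`invoked oom-killer`, `Out of memory: Kill process` / `Killed process`, `oom-kill:`); wording varies between kernel versions, so treat the list as an assumption rather than exhaustive.

```python
import re
from typing import Iterable, List

# Common kernel OOM-killer message fragments; not guaranteed to be complete.
OOM_RE = re.compile(
    r"invoked oom-killer|Out of memory: Kill(?:ed)? process|oom-kill:",
    re.IGNORECASE,
)


def find_oom_lines(lines: Iterable[str]) -> List[str]:
    """Return the kernel log lines that look like OOM-killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_RE.search(line)]
```

The input can be the output of `dmesg -T` split into lines, for example via `subprocess.run(["dmesg", "-T"], capture_output=True, text=True).stdout.splitlines()`. An empty result is consistent with the screenshot: no OOM kill was recorded in the kernel log.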

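So with the current architecture, a leader switch cannot happen. As for the network, with both components on the same server I do not know how to investigate further (my skills are limited :smiling_face_with_tear:). I would appreciate any pointers on where to look.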