tidb-server集群夜里低峰期夯筑

tidb集群V4.0.8:
集群中有3个tidb-server节点,其中的一个节点夯筑,就是登陆后无法任何操作,操作命令是阻塞状态的:message日志:


tidb日志:

2021/06/18 02:50:44.867 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/192.168.21.184:4000/ttl] [value=1623955844730860856] [error=“etcdserver: requested lease not found”] [retryCnt=0]
[2021/06/18 02:50:45.034 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/192.168.21.184:4000/ttl] [value=1623955844730860856] [error=“etcdserver: requested lease not found”] [retryCnt=1]
[2021/06/18 02:50:45.333 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/192.168.21.184:4000/ttl] [value=1623955844730860856] [error=“etcdserver: requested lease not found”] [retryCnt=2]
[2021/06/18 02:50:45.549 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/192.168.21.184:4000/ttl] [value=1623955844730860856] [error=“etcdserver: requested lease not found”] [retryCnt=3]
[2021/06/18 02:50:45.806 +08:00] [WARN] [syncer.go:150] [“[ddl] etcd-cli put kv failed”] [key=/topology/tidb/192.168.21.184:4000/ttl] [value=1623955844730860856] [error=“etcdserver: requested lease not found”] [retryCnt=4]
[2021/06/18 02:50:45.914 +08:00] [ERROR] [domain.go:482] [“refresh topology in loop failed”] [error=“etcdserver: requested lease not found”]
[2021/06/18 02:50:45.949 +08:00] [INFO] [domain.go:485] [“server topology syncer need to restart”]
[2021/06/18 02:50:48.436 +08:00] [INFO] [domain.go:489] [“server topology syncer restarted”]
[2021/06/18 02:50:52.288 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.129:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:50:52.345 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.129:20171]
[2021/06/18 02:50:52.354 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.189:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:50:52.366 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.189:20171]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.189:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.104.9:20172] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.179:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.179:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.105:20172] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.182:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.587 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.189:20171]
[2021/06/18 02:51:17.587 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.179:20171]
[2021/06/18 02:51:17.655 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.104.9:20172]
[2021/06/18 02:51:17.734 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.179:20171]
[2021/06/18 02:51:17.734 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.182:20171]
[2021/06/18 02:51:17.791 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.105:20172]
[2021/06/18 02:50:52.345 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.129:20171]
[2021/06/18 02:50:52.354 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.189:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:50:52.366 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.189:20171]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.189:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.104.9:20172] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:16.914 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.179:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.179:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.105:20172] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.258 +08:00] [INFO] [client_batch.go:347] [“batchRecvLoop fails when receiving, needs to reconnect”] [target=192.168.21.182:20171] [error=“rpc error: code = Unavailable desc = transport is closing”]
[2021/06/18 02:51:17.587 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.189:20171]
[2021/06/18 02:51:17.587 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.179:20171]
[2021/06/18 02:51:17.655 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.104.9:20172]
[2021/06/18 02:51:17.734 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.179:20171]
[2021/06/18 02:51:17.734 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.182:20171]
[2021/06/18 02:51:17.791 +08:00] [INFO] [client_batch.go:313] [“batchRecvLoop re-create streaming success”] [target=192.168.21.105:20172]
[2021/06/18 02:53:18.599 +08:00] [WARN] [conn.go:508] [“parse attrs failed”] [conn=1807283] [error=EOF]
[2021/06/18 06:03:19.460 +08:00] [INFO] [signal_posix.go:53] [“got signal to exit”] [signal=terminated]
[2021/06/18 06:03:19.463 +08:00] [INFO] [server.go:585] [“[server] graceful shutdown.”]
[2021/06/18 06:03:19.463 +08:00] [ERROR] [http_status.go:354] [“start status/rpc server error”] [error=“accept tcp [::]:10080: use of closed network connection”]
[2021/06/18 06:03:19.463 +08:00] [ERROR] [http_status.go:349] [“http server error”] [error=“http: Server closed”]
[2021/06/18 06:03:19.465 +08:00] [ERROR] [http_status.go:344] [“grpc server error”] []
[2021/06/18 06:03:19.465 +08:00] [INFO] [server.go:598] [“graceful shutdown…”] [“conn count”=74]
[2021/06/18 06:03:34.462 +08:00] [INFO] [server.go:552] [“[server] kill all connections.”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.462 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.463 +08:00] [WARN] [txn.go:514] [“wait tso failed”] [error=“context canceled”]
[2021/06/18 06:03:34.471 +08:00] [INFO] [manager.go:276] [“failed to campaign”] [“owner info”=“[ddl] /tidb/ddl/fg/owner ownerManager 2d007e19-140b-40fc-88e4-e6a8dc2b726f”] [error=“context canceled”]
[2021/06/18 06:03:34.471 +08:00] [INFO] [manager.go:248] [“etcd session is done, creates a new one”] [“owner info”=“[ddl] /tidb/ddl/fg/owner ownerManager 2d007e19-140b-40fc-88e4-e6a8dc2b726f”]
[2021/06/18 06:03:34.472 +08:00] [INFO] [manager.go:252] [“break campaign loop, NewSession failed”] [“owner info”=“[ddl] /tidb/ddl/fg/owner ownerManager 2d007e19-140b-40fc-88e4-e6a8dc2b726f”] [error=“context canceled”]
[2021/06/18 06:03:34.476 +08:00] [INFO] [manager.go:301] [“revoke session”] [“owner info”=“[ddl] /tidb/ddl/fg/owner ownerManager 2d007e19-140b-40fc-88e4-e6a8dc2b726f”] [error=“etcdserver: requested lease not found”]

1 Like

看系统日志感觉是机器的问题,能否从这方面查查。

已经定位到了,是机器上的某个服务引起的系统负载异常20分钟左右,导致tidb-server问题,应该是元数据版本缓存不一致引起的。

:ok_hand: 好的

此话题已在最后回复的 1 分钟后被自动关闭。不再允许新回复。