The database suddenly crashed, and we haven't found the cause yet.

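【TiDB Environment】Production / Test / PoC
【TiDB Version】
【Reproduction Steps】What operations led to the problem
【Problem: Symptoms and Impact】
【Resource Configuration】Go to TiDB Dashboard - Cluster Info - Hosts and take a screenshot of this page
【Attachments: Screenshots / Logs / Monitoring】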

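1. Starting at 8:40, latency rose and QPS dropped. Our analysis found no slow queries at that time, so we ruled that out as the cause of the failure.

2. Further investigation showed a large number of seek_tombstone events, which suggested that one of the TiKV nodes had gone down. While that node failure was being handled, the entire cluster became unavailable.

3. Monitoring showed that the machine 10.204.168.71:20184 had a large number of tasks under repair. Our preliminary judgment is that the anomaly was caused by some internal TiDB mechanism. After checking the logs and other monitoring metrics, we still have not found the root cause of the TiKV failure, so we have posted the symptoms to the TiDB open-source community to help confirm it.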

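Analysis approach:
System level:
1. Check the system log (/var/log/messages) for any instance killed by OOM
2. Check system load (CPU, network latency, etc.)
Database level:
1. Check the TiDB, PD, and TiKV error logs for obvious errors
2. Check whether any SQL statements are consuming a lot of resources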
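For the system-level OOM check, here is a minimal Python sketch that scans a syslog for OOM-killer events hitting the cluster processes. It assumes the kernel's usual wording ("... invoked oom-killer", "Out of memory: Killed process ...") and the process names tikv-server, tidb-server and pd-server; kernel messages vary between versions, and the function name find_oom_events is hypothetical.

```python
import re
from typing import Iterable, List

# Assumed kernel OOM-killer wording; the exact text differs between kernel versions.
OOM_PATTERN = re.compile(r"Out of memory|oom-kill|Killed process", re.IGNORECASE)
# Assumed process names of the TiDB cluster components.
COMPONENTS = ("tikv-server", "tidb-server", "pd-server")


def find_oom_events(lines: Iterable[str]) -> List[str]:
    """Return syslog lines that look like OOM-killer events involving a cluster component."""
    hits = []
    for line in lines:
        # Keep only OOM-related lines that also name one of the components.
        if OOM_PATTERN.search(line) and any(name in line for name in COMPONENTS):
            hits.append(line.rstrip("\n"))
    return hits
```

For example, call it as find_oom_events(open("/var/log/messages", errors="replace")) on RHEL/CentOS; Debian/Ubuntu hosts usually log to /var/log/syslog instead, and dmesg -T shows the same kernel messages.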

You're still on v4, which is getting a bit old. You could try upgrading.

Check the TiKV logs, then look at TiKV resource usage during that period and at the SQL that was running at the time.
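To narrow the TiKV logs down to the incident window, here is a minimal Python sketch that counts WARN/ERROR/FATAL lines per source location (for example endpoint.rs:535) between two timestamps. It assumes the TiKV log line shape seen in the excerpt below ([time] [LEVEL] [file:line] [message] ...); the names summarize and LOG_TZ are hypothetical.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Iterable

# Assumed line shape:
# [2024/09/28 08:39:01.025 +08:00] [WARN] [endpoint.rs:535] [error-response] [err=...]
LINE_RE = re.compile(
    r"^\[(?P<time>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<rest>.*)$"
)
TIME_FMT = "%Y/%m/%d %H:%M:%S.%f %z"
# +08:00, matching the timestamps in the posted logs; use it to build window bounds.
LOG_TZ = timezone(timedelta(hours=8))


def summarize(lines: Iterable[str], start: datetime, end: datetime,
              levels=("WARN", "ERROR", "FATAL")) -> Counter:
    """Count lines per (level, source location) with a timestamp in [start, end)."""
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m or m["level"] not in levels:
            continue  # not a log line, or a level we don't care about
        ts = datetime.strptime(m["time"], TIME_FMT)  # timezone-aware
        if start <= ts < end:
            counts[(m["level"], m["source"])] += 1
    return counts
```

For example, open the TiKV log file, pass it with bounds such as datetime(2024, 9, 28, 8, 30, tzinfo=LOG_TZ) and datetime(2024, 9, 28, 9, 0, tzinfo=LOG_TZ), and print counts.most_common(10) to see which code paths dominate the warnings.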

[2024/09/28 08:39:01.025 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000EE5F698000000000000001010000000000000000F703800000030EA6D558 lock_version: 452849550219280740 key: 7480000000000000EE5F698000000000000001014643303231323534FF3330323032343039FF3238303833383535FF3733353232354D00FE03800000030EA6D558 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.035 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C00000000380000004497F004F lock_version: 452849550219280708 key: 7480000000000000E25F698000000000000003014643303231393037FF3632303234303932FF3830383339303039FF37303532354D0000FD0380000004497F004F lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.056 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C1000000038000000449812DFC lock_version: 452849550219281177 key: 7480000000000000E25F698000000000000003014643303130313232FF3636323032343039FF3238303833393030FF3938373239344D00FE038000000449812DFC lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.612 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C10000000380000004497DC53A lock_version: 452849550363460437 key: 7480000000000000E25F698000000000000003014643373331353235FF3632303234303932FF3830383339303135FF32383331374D0000FD0380000004497DC53A lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.615 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C10000000380000004497CDBE2 lock_version: 452849550376566991 key: 7480000000000000E25F698000000000000003014643343131303131FF3632303234303932FF3830383339303135FF35303831364D0000FD0380000004497CDBE2 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.942 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C100000003800000044980A9D8 lock_version: 452849550455210503 key: 7480000000000000E25F698000000000000003014643303130313231FF3436323032343039FF3238303833393031FF3837353836344D00FE03800000044980A9D8 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:01.948 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000EE5F698000000000000001010000000000000000F703800000030EA89888 lock_version: 452849550468579910 key: 7480000000000000EE5F698000000000000001014643373535333335FF3132303234303932FF3830383338353631FF35303030354D0000FD03800000030EA89888 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:03.050 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000EE5F698000000000000001010000000000000000F703800000030EA6525B lock_version: 452849550756676199 key: 7480000000000000EE5F698000000000000001014643373535313234FF3939323032343039FF3238303833383532FF3239303237344D00FE03800000030EA6525B lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:03.104 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C30000000380000004497CDC07 lock_version: 452849550769782797 key: 7480000000000000E25F698000000000000003014643373639373136FF3432303234303932FF3830383339303330FF35363335374D0000FD0380000004497CDC07 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:04.377 +08:00] [INFO] [raft.rs:1003] ["received a message with higher term from 24975695"] ["msg type"=MsgRequestVote] [message_term=2172] [term=2171] [from=24975695] [raft_id=25605993] [region_id=44317]
[2024/09/28 08:39:04.377 +08:00] [INFO] [raft.rs:783] ["became follower at term 2172"] [term=2172] [raft_id=25605993] [region_id=44317]
[2024/09/28 08:39:04.377 +08:00] [INFO] [raft.rs:1192] ["[logterm: 2171, index: 65724339, vote: 0] cast vote for 24975695 [logterm: 2171, index: 65724339] at term 2172"] ["msg type"=MsgRequestVote] [term=2172] [msg_index=65724339] [msg_term=2171] [from=24975695] [vote=0] [log_index=65724339] [log_term=2171] [raft_id=25605993] [region_id=44317]
[2024/09/28 08:39:07.606 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C7000000038000000449812EE7 lock_version: 452849551910109433 key: 7480000000000000E25F698000000000000003014643303130313937FF3032303234303932FF3830383339303734FF31303930384D0000FD038000000449812EE7 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:07.614 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C70000000380000004497FCB67 lock_version: 452849551923216654 key: 7480000000000000E25F698000000000000003014643333136313039FF3732303234303932FF3830383339303734FF36323631374D0000FD0380000004497FCB67 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:07.983 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C70000000380000004497DC608 lock_version: 452849552054288391 key: 7480000000000000E25F698000000000000003014643303231343434FF3536323032343039FF3238303833393037FF3935323931334D00FE0380000004497DC608 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:08.011 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C800000003800000044980AA24 lock_version: 452849552054289250 key: 7480000000000000E25F698000000000000003014643303235373730FF3532303234303932FF3830383339303739FF39333430374D0000FD03800000044980AA24 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:08.104 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C80000000380000004497E8ABA lock_version: 452849552080503111 key: 7480000000000000E25F698000000000000003014643303130313439FF3834323032343039FF3238303833393038FF3036343430354D00FE0380000004497E8ABA lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:08.757 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889C800000003800000044980AA34 lock_version: 452849552250896642 key: 7480000000000000E25F698000000000000003014643303232343533FF3332303234303932FF3830383339303837FF32323936324D0000FD03800000044980AA34 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:10.073 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CA00000003800000044980AA4B lock_version: 452849552591684686 key: 7480000000000000E25F698000000000000003014643333131313633FF3132303234303932FF3830383339313030FF33373530374D0000FD03800000044980AA4B lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:11.659 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CB0000000380000004497CDCE1 lock_version: 452849552998007504 key: 7480000000000000E25F698000000000000003014643343331313639FF3132303234303932FF3830383339313136FF30303534354D0000FD0380000004497CDCE1 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:12.211 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CC0000000380000004497E4471 lock_version: 452849553155293223 key: 7480000000000000E25F698000000000000003014643303230373632FF3732303234303932FF3830383339313231FF34383437314D0000FD0380000004497E4471 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:12.485 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000EE5F698000000000000001010000000000000000F703800000030EA93EED lock_version: 452849553233936566 key: 7480000000000000EE5F698000000000000001014643303235393035FF3232303234303932FF3830383339303433FF36393838384D0000FD03800000030EA93EED lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:13.160 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CD0000000380000004497E4481 lock_version: 452849553404330873 key: 7480000000000000E25F698000000000000003014643303130323334FF3334323032343039FF3238303833393133FF3133323438304D00FE0380000004497E4481 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:13.444 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CD00000003800000044980AA9C lock_version: 452849553469866700 key: 7480000000000000E25F698000000000000003014643303130323137FF3933323032343039FF3238303833393133FF3338373536384D00FE03800000044980AA9C lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:13.446 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CD000000038000000449812F90 lock_version: 452849553469866838 key: 7480000000000000E25F698000000000000003014643303237373538FF3232303234303932FF3830383339313333FF39333030304D0000FD038000000449812F90 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:14.295 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CE000000038000000449819842 lock_version: 452849553692689847 key: 7480000000000000E25F698000000000000003014643373639333936FF3632303234303932FF3830383339313432FF34363733364D0000FD038000000449819842 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:14.497 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CE0000000380000004497E44A6 lock_version: 452849553758225057 key: 7480000000000000E25F698000000000000003014643303231333838FF3436323032343039FF3238303833393134FF3437363532354D00FE0380000004497E44A6 lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:14.807 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000E25F6980000000000000010419B47889CE0000000380000004497CDD3F lock_version: 452849553836868475 key: 7480000000000000E25F698000000000000003014643333731363638FF3832303234303932FF3830383339313437FF38373035354D0000FD0380000004497CDD3F lock_ttl: 3000 txn_size: 1"]
[2024/09/28 08:39:15.334 +08:00] [WARN] [endpoint.rs:535] [error-response] [err="Key is locked (will clean up) primary_lock: 7480000000000000EE5F698000000000000001010000000000000000F703800000030EA79D5C lock_version: 452849553981047191 key: 7480000000000000EE5F698000000000000001014643303231333933FF3034323032343039FF3238303833393130FF3037323432384D00FE03800000030EA79D5C lock_ttl: 3000 txn_size: 1"]
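The primary_lock and key values in these lock errors are hex-encoded TiDB index keys. Decoding them shows which tables and indexes the conflicting transactions touched, which may help connect the lock waits to the slow SQL. Below is a minimal Python sketch, assuming TiDB's memcomparable key layout ("t" + table ID + "_i" + index ID + flag-prefixed column values) and handling only the flags that appear in these keys; decode_index_key and unpack_datetime are hypothetical names, and the datetime unpacking assumes TiDB's packed-time layout for DATETIME/TIMESTAMP index columns.

```python
from typing import List, Optional, Tuple, Union

SIGN_MASK = 1 << 63
# Datum flag bytes from TiDB's key codec (only the subset seen in these logs).
NIL_FLAG, BYTES_FLAG, INT_FLAG, UINT_FLAG = 0, 1, 3, 4
Value = Optional[Union[bytes, int]]


def _int(buf: bytes, pos: int) -> Tuple[int, int]:
    """Signed int64: 8 bytes big-endian with the sign bit flipped."""
    v = int.from_bytes(buf[pos:pos + 8], "big") ^ SIGN_MASK
    return (v - (1 << 64) if v >= SIGN_MASK else v), pos + 8


def _uint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Unsigned uint64: 8 bytes big-endian."""
    return int.from_bytes(buf[pos:pos + 8], "big"), pos + 8


def _bytes(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Memcomparable bytes: 8-byte groups, each followed by a marker 0xFF - padding."""
    out = bytearray()
    while True:
        group, marker = buf[pos:pos + 8], buf[pos + 8]
        pos += 9
        pad = 0xFF - marker
        if pad == 0:
            out += group  # full group, more groups follow
        else:
            out += group[:8 - pad]  # last group: drop the zero padding
            return bytes(out), pos


def decode_index_key(hex_key: str) -> Tuple[int, int, List[Value]]:
    """Return (table_id, index_id, column values + row handle) for an index key."""
    buf = bytes.fromhex(hex_key)
    if buf[:1] != b"t":
        raise ValueError("not a table key")
    table_id, pos = _int(buf, 1)
    if buf[pos:pos + 2] != b"_i":
        raise ValueError("not an index key (row keys use '_r')")
    index_id, pos = _int(buf, pos + 2)
    values: List[Value] = []
    while pos < len(buf):
        flag, pos = buf[pos], pos + 1
        if flag == BYTES_FLAG:
            v, pos = _bytes(buf, pos)
        elif flag == INT_FLAG:
            v, pos = _int(buf, pos)
        elif flag == UINT_FLAG:
            v, pos = _uint(buf, pos)
        elif flag == NIL_FLAG:
            v = None
        else:
            raise ValueError(f"flag {flag} is not handled in this sketch")
        values.append(v)
    return table_id, index_id, values


def unpack_datetime(packed: int) -> Tuple[int, int, int, int, int, int, int]:
    """Assumed packed layout: ((ymd << 17 | hms) << 24) | microseconds."""
    ymdhms = packed >> 24
    ymd, hms = ymdhms >> 17, ymdhms & ((1 << 17) - 1)
    year, month = divmod(ymd >> 5, 13)  # ymd >> 5 == year * 13 + month
    return (year, month, ymd & 31, hms >> 12, (hms >> 6) & 63, hms & 63,
            packed & ((1 << 24) - 1))
```

Under these assumptions, the first lock key decodes to table 0xEE (238), index 1, followed by a string value and the row handle, while the table 0xE2 primary locks carry a uint value on index 1 that, if that column is a DATETIME, unpacks to 2024-09-28 08:39:00, right at the start of the incident. The table IDs can then be mapped back to table names through the TIDB_TABLE_ID column of INFORMATION_SCHEMA.TABLES.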

This version is rather old. I'd recommend upgrading to a newer version.

https://cn.pingcap.com/tidb-release-support-policy/

Consider upgrading; 4.0 has already reached EOL.

Check whether there was a surge of business queries during that period, or a massive volume of data being written from upstream.

Has this problem been resolved? If so, could you share how you solved it?

Yes, there was slow SQL: a query suddenly started using the wrong index.

The ones recommending an upgrade are just here to farm points.
If the monitoring and logs show no obvious errors, you've probably hit a bug.

If it's a bug, is there any option other than upgrading?

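😀 The general view is that upgrading will fix it, but sometimes it's hard to commit to an upgrade, so you have to come up with other compliant workarounds.
This is where the depth of your knowledge gets put to the test.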