【TiDB Environment】Production
【TiDB Version】v7.1.5
【Cluster Data Size】
Cluster version: v7.1.5
Deploy user: tidb
SSH type: builtin
Dashboard URL: http://192.168.241.59:2379/dashboard
Grafana URL: http://192.168.241.72:3000
ID Role Host Ports OS/Arch Status Data Dir Deploy Dir
-- ---- ---- ----- ------- ------ -------- ----------
192.168.241.71:8300 cdc 192.168.241.71 8300 linux/x86_64 Up /disk2/cdc-8300 /home/tidb/deploy/cdc-8300
192.168.241.72:8300 cdc 192.168.241.72 8300 linux/x86_64 Up /disk2/cdc-8300 /home/tidb/deploy/cdc-8300
192.168.241.72:3000 grafana 192.168.241.72 3000 linux/x86_64 Up - /home/tidb/deploy/grafana-3000
192.168.241.59:2379 pd 192.168.241.59 2379/2380 linux/x86_64 Up|UI /disk2/pd-2379 /home/tidb/deploy/pd-2379
192.168.241.60:2379 pd 192.168.241.60 2379/2380 linux/x86_64 Up /disk2/pd-2379 /home/tidb/deploy/pd-2379
192.168.241.61:2379 pd 192.168.241.61 2379/2380 linux/x86_64 Up|L /disk2/pd-2379 /home/tidb/deploy/pd-2379
192.168.241.71:9090 prometheus 192.168.241.71 9090/12020 linux/x86_64 Up /disk2/prometheus-9090 /home/tidb/deploy/prometheus-9090
192.168.241.59:4000 tidb 192.168.241.59 4000/10080 linux/x86_64 Up - /home/tidb/deploy/tidb-4000
192.168.241.60:4000 tidb 192.168.241.60 4000/10080 linux/x86_64 Up - /home/tidb/deploy/tidb-4000
192.168.241.61:4000 tidb 192.168.241.61 4000/10080 linux/x86_64 Up - /home/tidb/deploy/tidb-4000
192.168.241.81:4000 tidb 192.168.241.81 4000/10080 linux/x86_64 Up - /home/tidb/deploy/tidb-4000
192.168.241.85:4000 tidb 192.168.241.85 4000/10080 linux/x86_64 Up - /home/tidb/deploy/tidb-4000
192.168.241.71:9000 tiflash 192.168.241.71 9000/8123/3930/20170/20292/8234 linux/x86_64 Up /disk2/tiflash-9000 /home/tidb/deploy/tiflash-9000
192.168.241.71:20160 tikv 192.168.241.71 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.72:20160 tikv 192.168.241.72 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.73:20160 tikv 192.168.241.73 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.74:20160 tikv 192.168.241.74 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.75:20160 tikv 192.168.241.75 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.76:20160 tikv 192.168.241.76 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
192.168.241.83:20160 tikv 192.168.241.83 20160/20180 linux/x86_64 Up /disk2/tikv-20160 /home/tidb/deploy/tikv-20160
Total nodes: 20
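【Problem Encountered: Symptoms and Impact】
At around 16:00 yesterday afternoon, I suddenly received a large number of alerts, mainly these two: TiDB_query_duration and PD_tidb_handle_requests_duration.
The TiDB log on node 85 (192.168.241.85) keeps printing "get timestamp too slow":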
$ tailf log/tidb.log |grep -v INFO
[2025/04/14 12:45:05.789 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=209.251121ms]
[2025/04/14 12:45:05.789 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=38.943349ms]
[2025/04/14 12:45:06.217 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=136.5953ms]
[2025/04/14 12:45:06.217 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=136.692471ms]
[2025/04/14 12:45:06.217 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=132.049231ms]
[2025/04/14 12:45:06.427 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=81.115165ms]
[2025/04/14 12:45:06.427 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=81.152082ms]
[2025/04/14 12:45:06.427 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=81.40483ms]
[2025/04/14 12:45:06.955 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=137.884456ms]
[2025/04/14 12:45:07.164 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=194.533174ms]
[2025/04/14 12:45:07.375 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=187.693764ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=151.272534ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=126.892345ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=155.55664ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=155.644418ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=155.697309ms]
[2025/04/14 12:45:07.752 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=155.757283ms]
[2025/04/14 12:45:07.956 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=193.618624ms]
[2025/04/14 12:45:07.956 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=197.917383ms]
[2025/04/14 12:45:07.956 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=89.226347ms]
[2025/04/14 12:45:07.956 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=89.737459ms]
[2025/04/14 12:45:07.956 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=111.226659ms]
[2025/04/14 12:45:08.164 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=199.095257ms]
[2025/04/14 12:45:08.369 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=172.168247ms]
[2025/04/14 12:45:08.369 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=169.918419ms]
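To gauge how slow the TSO requests are, rather than reading the warnings one by one, the "cost time" values can be extracted and summarized. The following is a minimal sketch, not an official tool: the function names `tso_costs_ms` and `summarize` are made up for illustration, and the regular expression assumes the exact log format shown above (a single number followed by `ms`, `s`, or `µs`).

```python
import re
from statistics import mean

# Matches the "cost time" field of a "get timestamp too slow" warning.
# Assumption: the duration is one number with a unit of ms, s, or µs.
PATTERN = re.compile(
    r'\["get timestamp too slow"\] \["cost time"=([0-9.]+)(ms|s|µs)\]'
)
UNIT_TO_MS = {"ms": 1.0, "s": 1000.0, "µs": 0.001}


def tso_costs_ms(lines):
    """Return the cost time, in milliseconds, of every matching log line."""
    costs = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            costs.append(float(match.group(1)) * UNIT_TO_MS[match.group(2)])
    return costs


def summarize(costs):
    """Return count, mean, and max of the costs, or None if there are none."""
    if not costs:
        return None
    return {"count": len(costs), "mean_ms": mean(costs), "max_ms": max(costs)}


if __name__ == "__main__":
    # Three warnings copied from the log excerpt above.
    sample = [
        '[2025/04/14 12:45:05.789 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=209.251121ms]',
        '[2025/04/14 12:45:05.789 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=38.943349ms]',
        '[2025/04/14 12:45:06.217 +08:00] [WARN] [pd.go:156] ["get timestamp too slow"] ["cost time"=136.5953ms]',
    ]
    print(summarize(tso_costs_ms(sample)))
```

Running the same functions over the full `tidb.log` of each TiDB node would show whether the slowdown is limited to node 85 or affects every node talking to the PD leader.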
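PD leader log: it keeps logging warnings that stores do not have enough disk space. Disk usage on this cluster often exceeds 80%, and that has never caused a problem before, so I don't know why things went wrong this time.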
$ tailf log/pd.log |grep -v INFO
[2025/04/14 12:45:47.746 +08:00] [WARN] [cluster.go:893] ["store does not have enough disk space"] [store-id=2] [capacity=3937850605568] [available=781844189184]
[2025/04/14 12:45:47.990 +08:00] [WARN] [cluster.go:893] ["store does not have enough disk space"] [store-id=3] [capacity=3937850605568] [available=781035110400]
[2025/04/14 12:45:55.193 +08:00] [WARN] [cluster.go:893] ["store does not have enough disk space"] [store-id=2949399] [capacity=3937850605568] [available=787348148224]
[2025/04/14 12:45:57.746 +08:00] [WARN] [cluster.go:893] ["store does not have enough disk space"] [store-id=2] [capacity=3937850605568] [available=781844279296]
[2025/04/14 12:45:57.992 +08:00] [WARN] [cluster.go:893] ["store does not have enough disk space"] [store-id=3] [capacity=3937850605568] [available=780242685952]
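The `capacity` and `available` fields in these warnings can be turned into a usage ratio to see how far each store is past the threshold. The sketch below assumes the warning is tied to PD's `low-space-ratio` setting and that it is at its default of 0.8; if the cluster overrides that value, the constant should be changed. The helper name `used_ratio` is illustrative only.

```python
# Assumption: PD's schedule.low-space-ratio is at its default of 0.8.
LOW_SPACE_RATIO = 0.8

# (capacity, available) in bytes, copied from the PD warnings above.
STORES = {
    2: (3937850605568, 781844189184),
    3: (3937850605568, 781035110400),
    2949399: (3937850605568, 787348148224),
}


def used_ratio(capacity, available):
    """Fraction of the store's capacity that is in use."""
    return 1.0 - available / capacity


if __name__ == "__main__":
    for store_id, (capacity, available) in STORES.items():
        ratio = used_ratio(capacity, available)
        flag = "low space" if ratio > LOW_SPACE_RATIO else "ok"
        print(f"store {store_id}: used {ratio:.2%} ({flag})")
```

With these numbers all three stores come out just above 80% used, which is consistent with the warning being logged, but on its own it does not explain the slow TSO responses.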
【Other Attachments: Screenshots/Logs/Monitoring】