TiDB keeps getting interrupted and the log is full of warnings. What could be the problem?

[2023/04/21 14:34:25.069 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.860438519s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.309973486s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.289187957s]
[2023/04/21 14:34:25.207 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.216492095s]
[2023/04/21 14:34:25.376 +08:00] [INFO] [coprocessor.go:1109] ["[TIME_COP_WAIT] resp_time:349.669223ms txnStartTS:440941620226686978 region_id:377911 store_addr:10.201.14.5:20160 kv_process_ms:0 kv_wait_ms:0 kv_read_ms:0 processed_versions:29 total_versions:86 rocksdb_delete_skipped_count:0 rocksdb_key_skipped_count:85 rocksdb_cache_hit_count:17 rocksdb_read_count:1 rocksdb_read_byte:16303"]
[2023/04/21 14:34:25.411 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=203.569255ms]
[2023/04/21 14:34:25.873 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=235.976268ms]
[2023/04/21 14:34:25.873 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941553969790984] [newTTL=345500]
[2023/04/21 14:34:26.326 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=190.976232ms]
[2023/04/21 14:34:26.326 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941553039179784] [newTTL=349500]

[2023/04/21 14:38:55.662 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457843] ["mutations count"=205371]
[2023/04/21 14:38:55.664 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457839] ["mutations count"=166580]
[2023/04/21 14:38:55.667 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=122896]
[2023/04/21 14:38:55.675 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=115353]
[2023/04/21 14:38:55.689 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457847] ["mutations count"=184994]
[2023/04/21 14:38:55.713 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457831] ["mutations count"=274089]
[2023/04/21 14:38:55.746 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457851] ["mutations count"=197124]
[2023/04/21 14:38:55.772 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=456779] ["mutations count"=223606]
[2023/04/21 14:38:55.800 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457839] ["mutations count"=168292]
[2023/04/21 14:38:55.821 +08:00] [INFO] [2pc.go:840] ["2PC detect large amount of mutations on a single region"] [region=457835] ["mutations count"=125067]
[2023/04/21 14:39:04.426 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705161342984] [newTTL=47400]
[2023/04/21 14:39:04.586 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705187557384] [newTTL=47500]
[2023/04/21 14:39:04.923 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705266462729] [newTTL=47499]
[2023/04/21 14:39:05.257 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705292414984] [newTTL=47750]
[2023/04/21 14:39:05.425 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941705318629385] [newTTL=47800]

Fetching the TSO is slow. Check the machine load on the TiDB and PD nodes, the disk I/O on PD, and the network between TiDB and PD.
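To see how bad the TSO latency is before digging into PD load, disk I/O, and the network, one option is to pull the "cost time" values out of the tidb-server log and summarize them. The following is only a rough sketch: the function name `tso_costs_ms` is made up for illustration, and it assumes the durations are printed with an `s` or `ms` suffix as in the warnings above.

```python
import re
from statistics import mean

# Matches the "cost time" field of "get timestamp too slow" warnings.
# [^=]* skips the closing quote, whether straight or typographic.
TSO_RE = re.compile(r"get timestamp too slow.*?cost time[^=]*=([0-9.]+)(ms|s)\b")


def tso_costs_ms(lines):
    """Return the TSO wait times found in the given log lines, in milliseconds."""
    costs = []
    for line in lines:
        m = TSO_RE.search(line)
        if m:
            value, unit = float(m.group(1)), m.group(2)
            costs.append(value * 1000 if unit == "s" else value)
    return costs


# Three lines copied from the log above; the last one is not a TSO warning.
sample = [
    '[2023/04/21 14:34:25.069 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=1.860438519s]',
    '[2023/04/21 14:34:25.411 +08:00] [WARN] [pd.go:152] ["get timestamp too slow"] ["cost time"=203.569255ms]',
    '[2023/04/21 14:34:25.873 +08:00] [INFO] [2pc.go:1162] ["send TxnHeartBeat"] [startTS=440941553969790984] [newTTL=345500]',
]

costs = tso_costs_ms(sample)
print(f"count={len(costs)} max={max(costs):.1f}ms mean={mean(costs):.1f}ms")
```

To run it against a real log, pass an open file object instead of `sample`. The log path depends on your deployment.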

TiDB keeps getting interrupted

Do you mean the tidb-server instance restarts frequently, or that there is a problem with the SQL being executed?

Check the resource load across the whole cluster. Could the frequent interruptions be TiDB running into OOM?

Apr 21 16:00:17 tidb1 kernel: Out of memory: Kill process 19780 (tidb-server) score 959 or sacrifice child
Apr 21 16:00:17 tidb1 kernel: Killed process 19780 (tidb-server), UID 1000, total-vm:129511996kB, anon-rss:126488904kB, file-rss:56kB, shmem-rss:0kB
Apr 21 16:00:27 tidb1 systemd: tidb-4000.service: main process exited, code=killed, status=9/KILL
Apr 21 16:00:27 tidb1 systemd: Unit tidb-4000.service entered failed state.
Apr 21 16:00:27 tidb1 systemd: tidb-4000.service failed.
Apr 21 16:00:42 tidb1 systemd: tidb-4000.service holdoff time over, scheduling restart.
Apr 21 16:00:42 tidb1 systemd: Stopped tidb service.
Apr 21 16:00:42 tidb1 systemd: Started tidb service.
Apr 21 16:00:42 tidb1 bash: [2023/04/21 16:00:42.866 +08:00] [WARN] [config.go:1112] ["Some configuration options should be moved to [instance] section. Please use the latter config options in [instance] instead: (slow-threshold, tidb_slow_log_threshold)."]

The system log does show out-of-memory errors.

Then you need to check whether a slow SQL statement is causing the OOM.

Frequent OOM is usually caused by a few large SQL statements.

I am loading CSV files to migrate data into TiDB.

Do the interruptions still happen when you are not loading?

You can watch how memory grows while the load is running.
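If no monitoring dashboard is at hand, a small script on the TiDB host can do the watching. This is just a sketch, assuming a Linux host where `/proc/<pid>/status` is readable. The helper names `parse_vm_rss_kb` and `watch_rss` are made up for illustration.

```python
import time
from pathlib import Path


def parse_vm_rss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:   126488904 kB".
            return int(line.split()[1])
    return None


def watch_rss(pid, interval_s=5.0, samples=12):
    """Print the resident memory of a process every interval_s seconds."""
    for _ in range(samples):
        text = Path(f"/proc/{pid}/status").read_text()
        rss_kb = parse_vm_rss_kb(text)
        if rss_kb is None:
            print("no VmRSS field found; stopping")
            return
        print(f"{time.strftime('%H:%M:%S')} rss={rss_kb / 1024 / 1024:.2f} GiB")
        time.sleep(interval_s)
```

Start the load, then call `watch_rss(<pid of tidb-server>)` in another terminal. If the number climbs steadily until the process is killed, the load itself is what exhausts memory.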

When I am not loading data, there are definitely no interruptions.

The tidb-server probably has too little memory. How about splitting the CSV into multiple files and importing them one by one?
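A minimal sketch of the splitting step, assuming the first row of the CSV is a header that every part should repeat (drop the header handling if your file has none). The function name `split_csv` and the default of 100,000 rows per file are arbitrary choices for illustration, not values recommended by TiDB.

```python
import csv
from pathlib import Path


def split_csv(src, out_dir, rows_per_file=100_000):
    """Split src into numbered part files, repeating the header in each part."""
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:  # empty input file, nothing to split
            return parts
        out, writer, count = None, None, 0
        for row in reader:
            # Open a new part file at the start and whenever the current one is full.
            if out is None or count >= rows_per_file:
                if out:
                    out.close()
                path = out_dir / f"{src.stem}.part{len(parts):04d}.csv"
                out = path.open("w", newline="", encoding="utf-8")
                writer = csv.writer(out)
                writer.writerow(header)
                parts.append(path)
                count = 0
            writer.writerow(row)
            count += 1
        if out:
            out.close()
    return parts
```

For example, `split_csv("orders.csv", "parts")` (a hypothetical file name) writes `parts/orders.part0000.csv`, `parts/orders.part0001.csv`, and so on. Whether smaller files actually avoid the OOM depends on how the load is executed, so keep watching memory during the first few parts.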
