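【TiDB environment】Production
【TiDB version】v8.1.0
【Operating system】Ubuntu 20.04.1
【Deployment method】
【Cluster data volume】
【Number of cluster nodes】3 machines, each running 1 PD, 1 TiFlash, and 3 TiKV instances
【Steps to reproduce】What operations led to the problem
【Problem encountered: symptoms and impact】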
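At night the database became unreachable, and the daily tiup dumpling job also failed with an error saying it could not connect to PD. Monitoring shows a short window during which no data was collected. The logs show that a TiKV instance on one node was OOM-killed, and the PD log reports a clock synchronization problem with the PD on one node. Time synchronization had not been configured before; I have only just set it up.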
【Resource configuration】
【Copy and paste the ERROR log】
Error when connecting to TiDB:
Copy_of_ODBC_Connector_1: ODBC function "SQLExecDirect()" reported: SQLSTATE = HY000: Native Error Code = 9,001: Msg = [MySQL][ODBC 5.3(w) Driver][mysqld-8.0.11-TiDB-v8.1.0]PD server timeout: (CC_OdbcAdapter::preRun, file CC_OdbcAdapter.cpp, line 501)
tiup dumpling error:
[2025/07/30 23:22:00.183 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:22:02.997 +08:00] [INFO] [status.go:37] [progress] [tables="432/1084 (39.9%)"] ["finished rows"=177064834] ["estimate total rows"=447005139] ["finished size"=163.7GB] ["average speed(MiB/s)"=229.5108308184227] ["recent speed bps"=238963153.23269615] ["chunks progress"="52.28 %"]
[2025/07/30 23:22:21.879 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:22:27.605 +08:00] [INFO] [pd_service_discovery.go:910] ["[pd] cannot update member from this url"] [url=http://192.168.176.1:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.1:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.1:2379 status:READY"]
[2025/07/30 23:22:31.152 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:22:47.849 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:23:00.430 +08:00] [INFO] [pd_service_discovery.go:910] ["[pd] cannot update member from this url"] [url=http://192.168.176.1:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.1:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.1:2379 status:READY"]
[2025/07/30 23:23:00.430 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:23:31.351 +08:00] [INFO] [pd_service_discovery.go:910] ["[pd] cannot update member from this url"] [url=http://192.168.176.2:2379] [error="[PD:client:ErrClientGetMember]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:23:31.351 +08:00] [ERROR] [pd_service_discovery.go:586] ["[pd] failed to update service mode"] [urls="[http://192.168.176.1:2379,http://192.168.176.2:2379,http://192.168.176.3:2379]"] [error="[PD:client:ErrClientGetClusterInfo]error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY: error:rpc error: code = DeadlineExceeded desc = context deadline exceeded target:192.168.176.2:2379 status:READY"]
[2025/07/30 23:23:43.279 +08:00] [INFO] [writer.go:272] ["no data written in table chunk"] [database=ODS_GF_ISMS] [table=CE_DIM_DAT05_SAP] [chunkIdx=0]
[2025/07/30 23:24:02.087 +08:00] [INFO] [status.go:37] [progress] [tables="433/1084 (39.9%)"] ["finished rows"=183050526] ["estimate total rows"=447005139] ["finished size"=172.5GB] ["average speed(MiB/s)"=71.01821853528952] ["recent speed bps"=74418596.28237112] ["chunks progress"="54.06 %"]
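To see which PD endpoint the timeouts concentrate on, the log can be tallied by level and by the address that follows `target:`. The Python below is only a rough sketch: it assumes the log file on disk has one entry per line (not wrapped as in a pasted excerpt), and the function name `tally_timeouts` is made up for illustration, not part of any TiDB tool.

```python
import re
from collections import Counter

# Log prefix in the excerpt: [timestamp] [LEVEL] [file.go:line]
ENTRY_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] \[(?P<src>[^\]]+)\]")
# Endpoint that follows "target:" inside the gRPC error string
TARGET_RE = re.compile(r"target:(\d+\.\d+\.\d+\.\d+:\d+)")


def tally_timeouts(lines):
    """Count DeadlineExceeded entries, keyed by (level, target endpoint).

    Hypothetical helper; assumes each log entry sits on a single line.
    """
    counts = Counter()
    for line in lines:
        m = ENTRY_RE.match(line)
        if m is None or "DeadlineExceeded" not in line:
            continue
        # An entry repeats its target twice, so count each distinct
        # target only once per entry.
        for target in set(TARGET_RE.findall(line)):
            counts[(m.group("level"), target)] += 1
    return counts
```

Passing an open file object for the Dumpling log to `tally_timeouts` returns a `Counter`. In the excerpt above, all six ERROR entries ("failed to update service mode") name 192.168.176.2:2379 as the target, while the INFO entries name 192.168.176.1:2379 twice and 192.168.176.2:2379 once.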
【Other attachments: screenshots / logs / monitoring】
ERROR-level logs collected from TiDB Dashboard: logs.zip (262.9 KB)