Frequent "PD server timeout" errors after rebuilding the PD cluster

You can follow the official documentation for this:
https://docs.pingcap.com/zh/tidb/stable/clinic-introduction

The Diag diagnostic client supports clusters running TiDB v4.0 and later.

I've collected a set of logs.
Should I post the link directly here?

Please upload the result file collected by Clinic.
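For anyone following along, collecting and uploading with Diag looks roughly like the sketch below. `CLUSTER_NAME` and the time window are placeholders I made up, and the upload step assumes a Clinic access token has already been configured; check the linked documentation for the exact options in your version.

```shell
# CLUSTER_NAME is a placeholder (assumption); `tiup cluster list` shows the real name.
CLUSTER_NAME="${CLUSTER_NAME:-my-cluster}"

# Collect diagnostic data for a time window.
# Args: start and end relative to now, for example -4h and -2h.
collect_diag() {
  tiup diag collect "$CLUSTER_NAME" -f="$1" -t="$2"
}

# Upload a collected data directory to Clinic.
# Assumes a token was set beforehand with `tiup diag config clinic.token <token>`.
upload_diag() {
  tiup diag upload "$1"
}

# Usage:
#   tiup install diag
#   collect_diag -4h -2h
#   upload_diag <path-to-collected-data>
```

Pick a window that covers a moment when the error actually occurred, otherwise the logs will not show much.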

member leader resign

Try manually transferring the PD leader.
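A minimal sketch of a manual transfer with pd-ctl through TiUP. The PD address is only a guess based on the `pd-2379` directories in this cluster, and `CTL_VERSION` and the member name are placeholders; use the values from your own deployment.

```shell
# Assumptions: PD_ADDR is guessed from the pd-2379 deploy directories,
# and CTL_VERSION must be replaced with the version of your cluster.
PD_ADDR="${PD_ADDR:-http://10.130.1.1:2379}"
CTL_VERSION="${CTL_VERSION:-v<cluster-version>}"

# Thin wrapper around pd-ctl.
pd_ctl() {
  tiup ctl:"$CTL_VERSION" pd -u "$PD_ADDR" "$@"
}

# Usage:
#   pd_ctl member                              # list members and their names
#   pd_ctl member leader show                  # show the current leader
#   pd_ctl member leader transfer <pd-name>    # move the leader to a named member
#   pd_ctl member leader resign                # ask the current leader to step down
```

Running `member leader show` again afterwards confirms whether the leader actually moved.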

I tried that; it didn't help.

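Does this table have a TiFlash replica?
If it does, temporarily stop using TiFlash and try again.
If that turns out to be the cause, delete the replica first and then re-sync it.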
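As a rough sketch, the replica can be checked, removed, and re-created from the MySQL client as shown here. The host, port, user, and the schema and table names in the usage lines are all placeholders, not values confirmed in this thread.

```shell
# Placeholders (assumptions): adjust host, port, and user for your cluster.
TIDB_HOST="${TIDB_HOST:-10.130.1.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

# Run one SQL string against TiDB; -p prompts for the password.
run_sql() { mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p -e "$1"; }

# Show whether a table has a TiFlash replica and whether it is available.
show_tiflash_replica() {  # args: schema table
  run_sql "SELECT TABLE_SCHEMA, TABLE_NAME, REPLICA_COUNT, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica WHERE TABLE_SCHEMA = '$1' AND TABLE_NAME = '$2';"
}

# Set the replica count: 0 removes the replica, 1 creates it again and starts a new sync.
set_tiflash_replica() {  # args: schema table count
  run_sql "ALTER TABLE $1.$2 SET TIFLASH REPLICA $3;"
}

# Usage (hypothetical names):
#   show_tiflash_replica test orders
#   set_tiflash_replica test orders 0
#   set_tiflash_replica test orders 1
```

If you only want to bypass TiFlash for a test query without dropping the replica, setting `tidb_isolation_read_engines` to `'tikv,tidb'` in the session should have a similar effect.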

We're not using TiFlash, and it isn't just one table: a number of tables have this problem.

From the monitoring above, TiKV keeps restarting. Were those restarts done manually?

Yes, they were done manually.

I've sent you the link in a private message.

tiup cluster check <cluster-name> --cluster
Run this to check the cluster.

10.130.1.7 service Pass service firewalld not found, ignore
10.130.1.7 permission Pass /tidb/tidb-deploy/tiflash-9000 is writable
10.130.1.7 permission Pass /tidb/tidb-data/tiflash-9000 is writable
10.130.1.7 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.7 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.7 memory Pass memory size is 131072MB
10.130.1.7 network Pass network speed of eno7 is 1000MB
10.130.1.7 network Pass network speed of eno8 is 1000MB
10.130.1.7 network Pass network speed of ens1f0 is 10000MB
10.130.1.7 network Pass network speed of ens1f1 is 10000MB
10.130.1.7 network Pass network speed of vlan500 is 20000MB
10.130.1.7 network Pass network speed of bond0 is 20000MB
10.130.1.7 network Pass network speed of eno5 is 1000MB
10.130.1.7 network Pass network speed of eno6 is 1000MB
10.130.1.7 selinux Pass SELinux is disabled
10.130.1.7 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.7 thp Fail THP is enabled, please disable it for best performance
10.130.1.7 command Pass numactl: policy: default
10.130.1.3 os-version Warn OS is Ubuntu 20.04.5 LTS 20.04.5 (ubuntu support is not fully tested, be careful)
10.130.1.3 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.3 memory Pass memory size is 131072MB
10.130.1.3 network Pass network speed of vlan500 is 20000MB
10.130.1.3 network Pass network speed of bond0 is 20000MB
10.130.1.3 network Pass network speed of eno5 is 1000MB
10.130.1.3 network Pass network speed of eno6 is 1000MB
10.130.1.3 network Pass network speed of eno7 is 1000MB
10.130.1.3 network Pass network speed of eno8 is 1000MB
10.130.1.3 network Pass network speed of ens1f0 is 10000MB
10.130.1.3 network Pass network speed of ens1f1 is 10000MB
10.130.1.3 thp Fail THP is enabled, please disable it for best performance
10.130.1.3 service Pass service firewalld not found, ignore
10.130.1.3 command Fail numactl not usable, bash: numactl: command not found
10.130.1.3 permission Pass /tidb/tidb-deploy/tidb-4000 is writable
10.130.1.3 permission Pass /tidb/tidb-data/pd-2379 is writable
10.130.1.3 permission Pass /tidb/tidb-deploy/pd-2379 is writable
10.130.1.3 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.3 disk Fail mount point / does not have 'nodelalloc' option set
10.130.1.3 disk Warn mount point / does not have 'noatime' option set
10.130.1.3 selinux Pass SELinux is disabled
10.130.1.6 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.6 network Pass network speed of eno8 is 1000MB
10.130.1.6 network Pass network speed of ens1f0 is 10000MB
10.130.1.6 network Pass network speed of ens1f1 is 10000MB
10.130.1.6 network Pass network speed of vlan500 is 20000MB
10.130.1.6 network Pass network speed of bond0 is 20000MB
10.130.1.6 network Pass network speed of eno5 is 1000MB
10.130.1.6 network Pass network speed of eno6 is 1000MB
10.130.1.6 network Pass network speed of eno7 is 1000MB
10.130.1.6 thp Fail THP is enabled, please disable it for best performance
10.130.1.6 command Pass numactl: policy: default
10.130.1.6 permission Pass /tidb/tidb-deploy/tikv-20161 is writable
10.130.1.6 permission Pass /tidb/tidb-data/tikv-20161 is writable
10.130.1.6 permission Pass /data/node1/tidb-deploy/tikv-20162 is writable
10.130.1.6 permission Pass /data/node1/tidb-data/tikv-20162 is writable
10.130.1.6 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.6 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.6 memory Pass memory size is 131072MB
10.130.1.6 selinux Pass SELinux is disabled
10.130.1.6 service Pass service firewalld not found, ignore
10.130.1.1 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.1 network Pass network speed of ens1f0 is 10000MB
10.130.1.1 network Pass network speed of ens1f1 is 10000MB
10.130.1.1 network Pass network speed of vlan500 is 20000MB
10.130.1.1 network Pass network speed of bond0 is 20000MB
10.130.1.1 network Pass network speed of eno5 is 1000MB
10.130.1.1 network Pass network speed of eno6 is 1000MB
10.130.1.1 network Pass network speed of eno7 is 1000MB
10.130.1.1 network Pass network speed of eno8 is 1000MB
10.130.1.1 thp Fail THP is enabled, please disable it for best performance
10.130.1.1 service Pass service firewalld not found, ignore
10.130.1.1 command Pass numactl: policy: default
10.130.1.1 permission Pass /tidb/tidb-deploy/pd-2379 is writable
10.130.1.1 permission Pass /tidb/tidb-data/pd-2379 is writable
10.130.1.1 permission Pass /tidb/tidb-deploy/tidb-4000 is writable
10.130.1.1 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.1 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.1 memory Pass memory size is 131072MB
10.130.1.1 selinux Pass SELinux is disabled
10.130.1.5 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.5 permission Pass /tidb/tidb-data/tikv-20161 is writable
10.130.1.5 permission Pass /tidb/tidb-data/pd-2379 is writable
10.130.1.5 permission Pass /data/node1/tidb-deploy/tikv-20162 is writable
10.130.1.5 permission Pass /data/node1/tidb-data/tikv-20162 is writable
10.130.1.5 permission Pass /tidb/tidb-deploy/tikv-20161 is writable
10.130.1.5 permission Pass /tidb/tidb-deploy/pd-2379 is writable
10.130.1.5 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.5 memory Pass memory size is 131072MB
10.130.1.5 network Pass network speed of bond0 is 20000MB
10.130.1.5 network Pass network speed of eno5 is 1000MB
10.130.1.5 network Pass network speed of eno6 is 1000MB
10.130.1.5 network Pass network speed of eno7 is 1000MB
10.130.1.5 network Pass network speed of eno8 is 1000MB
10.130.1.5 network Pass network speed of ens1f0 is 10000MB
10.130.1.5 network Pass network speed of ens1f1 is 10000MB
10.130.1.5 network Pass network speed of vlan500 is 20000MB
10.130.1.5 selinux Pass SELinux is disabled
10.130.1.5 thp Fail THP is enabled, please disable it for best performance
10.130.1.5 service Pass service firewalld not found, ignore
10.130.1.5 command Pass numactl: policy: default
10.130.1.5 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.8 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.8 selinux Pass SELinux is disabled
10.130.1.8 command Pass numactl: policy: default
10.130.1.8 thp Fail THP is enabled, please disable it for best performance
10.130.1.8 service Pass service firewalld not found, ignore
10.130.1.8 permission Pass /tidb/tidb-deploy/tiflash-9000 is writable
10.130.1.8 permission Pass /tidb/tidb-data/tiflash-9000 is writable
10.130.1.8 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.8 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.8 memory Pass memory size is 131072MB
10.130.1.8 network Pass network speed of eno7 is 1000MB
10.130.1.8 network Pass network speed of eno8 is 1000MB
10.130.1.8 network Pass network speed of ens1f0 is 10000MB
10.130.1.8 network Pass network speed of ens1f1 is 10000MB
10.130.1.8 network Pass network speed of vlan500 is 20000MB
10.130.1.8 network Pass network speed of bond0 is 20000MB
10.130.1.8 network Pass network speed of eno5 is 1000MB
10.130.1.8 network Pass network speed of eno6 is 1000MB
10.130.1.4 thp Fail THP is enabled, please disable it for best performance
10.130.1.4 service Pass service firewalld not found, ignore
10.130.1.4 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.4 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.4 memory Pass memory size is 131072MB
10.130.1.4 network Pass network speed of bond0 is 20000MB
10.130.1.4 network Pass network speed of eno5 is 1000MB
10.130.1.4 network Pass network speed of eno6 is 1000MB
10.130.1.4 network Pass network speed of eno7 is 1000MB
10.130.1.4 network Pass network speed of eno8 is 1000MB
10.130.1.4 network Pass network speed of ens1f0 is 10000MB
10.130.1.4 network Pass network speed of ens1f1 is 10000MB
10.130.1.4 network Pass network speed of vlan500 is 20000MB
10.130.1.4 permission Pass /tidb/tidb-deploy/tikv-20161 is writable
10.130.1.4 permission Pass /tidb/tidb-deploy/pd-2379 is writable
10.130.1.4 permission Pass /tidb/tidb-data/tikv-20161 is writable
10.130.1.4 permission Pass /tidb/tidb-data/pd-2379 is writable
10.130.1.4 permission Pass /data/node1/tidb-deploy/tikv-20162 is writable
10.130.1.4 permission Pass /data/node1/tidb-data/tikv-20162 is writable
10.130.1.4 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.4 selinux Pass SELinux is disabled
10.130.1.4 command Pass numactl: policy: default
10.130.1.9 command Fail numactl not usable, bash: numactl: command not found
10.130.1.9 permission Pass /tidb/tidb-deploy/prometheus-8249 is writable
10.130.1.9 permission Pass /tidb/tidb-deploy/grafana-3000 is writable
10.130.1.9 permission Pass /tidb/tidb-data/prometheus-8249 is writable
10.130.1.9 permission Pass /tidb/tidb-deploy/alertmanager-9093 is writable
10.130.1.9 permission Pass /tidb/tidb-data/alertmanager-9093 is writable
10.130.1.9 cpu-cores Pass number of CPU cores / threads: 20
10.130.1.9 network Pass network speed of eno5 is 1000MB
10.130.1.9 network Pass network speed of eno6 is 1000MB
10.130.1.9 network Pass network speed of eno7 is 1000MB
10.130.1.9 network Pass network speed of eno8 is 1000MB
10.130.1.9 network Pass network speed of ens1f0 is 10000MB
10.130.1.9 network Pass network speed of ens1f1 is 10000MB
10.130.1.9 network Pass network speed of vlan500 is 20000MB
10.130.1.9 network Pass network speed of bond0 is 20000MB
10.130.1.9 memory Pass memory size is 32768MB
10.130.1.9 limits Fail soft limit of 'nofile' for user 'tidb' is not set or too low
10.130.1.9 limits Fail hard limit of 'nofile' for user 'tidb' is not set or too low
10.130.1.9 limits Fail soft limit of 'stack' for user 'tidb' is not set or too low
10.130.1.9 sysctl Fail net.core.somaxconn = 4096, should be greater than 32768
10.130.1.9 sysctl Fail net.ipv4.tcp_syncookies = 1, should be 0
10.130.1.9 sysctl Fail vm.swappiness = 60, should be 0
10.130.1.9 service Pass service firewalld not found, ignore
10.130.1.9 os-version Warn OS is Ubuntu 20.04.4 LTS 20.04.4 (ubuntu support is not fully tested, be careful)
10.130.1.9 disk Fail mount point / does not have 'nodelalloc' option set
10.130.1.9 disk Warn mount point / does not have 'noatime' option set
10.130.1.9 thp Fail THP is enabled, please disable it for best performance
10.130.1.9 cpu-governor Warn Unable to determine current CPU frequency governor policy
10.130.1.9 swap Fail swap is enabled, please disable it for best performance
10.130.1.9 selinux Pass SELinux is disabled
10.130.1.2 permission Pass /tidb/tidb-deploy/tidb-4000 is writable
10.130.1.2 permission Pass /tidb/tidb-deploy/pd-2379 is writable
10.130.1.2 permission Pass /tidb/tidb-data/pd-2379 is writable
10.130.1.2 os-version Warn OS is Ubuntu 20.04.3 LTS 20.04.3 (ubuntu support is not fully tested, be careful)
10.130.1.2 cpu-governor Fail CPU frequency governor is powersave, should use performance
10.130.1.2 memory Pass memory size is 131072MB
10.130.1.2 selinux Pass SELinux is disabled
10.130.1.2 thp Fail THP is enabled, please disable it for best performance
10.130.1.2 service Pass service firewalld not found, ignore
10.130.1.2 cpu-cores Pass number of CPU cores / threads: 48
10.130.1.2 network Pass network speed of ens1f1 is 10000MB
10.130.1.2 network Pass network speed of vlan500 is 20000MB
10.130.1.2 network Pass network speed of bond0 is 20000MB
10.130.1.2 network Pass network speed of eno5 is 1000MB
10.130.1.2 network Pass network speed of eno6 is 1000MB
10.130.1.2 network Pass network speed of eno7 is 1000MB
10.130.1.2 network Pass network speed of eno8 is 1000MB
10.130.1.2 network Pass network speed of ens1f0 is 10000MB
10.130.1.2 command Pass numactl: policy: default
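With this much output it is easy to lose the failed items among the passes. Assuming the check output has been saved to a file (the file name below is made up), a small awk filter can summarize which checks failed and on which hosts.

```shell
# Print "check-name count" for every check that failed on at least one host.
# Arg: path to a file containing saved `tiup cluster check` output.
summarize_fails() {
  awk '$3 == "Fail" { n[$2]++ } END { for (k in n) print k, n[k] }' "$1" | sort
}

# List the hosts on which a given check failed.
# Args: path to the saved output, check name (for example thp).
hosts_failing() {
  awk -v c="$2" '$2 == c && $3 == "Fail" { print $1 }' "$1" | sort -u
}

# Usage (check.txt is a hypothetical file name):
#   tiup cluster check <cluster-name> --cluster > check.txt
#   summarize_fails check.txt
#   hosts_failing check.txt cpu-governor
```

In the output above the failures are mostly `thp` and `cpu-governor` on every node, plus `limits`, `sysctl`, `swap`, and `disk` items on 10.130.1.9. `tiup cluster check` also accepts an `--apply` flag that attempts to repair some failed items automatically, though it is worth reviewing what it changes before running it on a production cluster.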

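One suggestion: I noticed that PD and TiDB are deployed on the same hosts. Consider separating them so the components don't interfere with each other.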

It worked fine before. The problem only appeared after PD was rebuilt, and it persists even after I migrated PD to other nodes.

Try adding a TiFlash replica for the affected tables, then run the query again and see whether the problem is still there.

Still doesn't work.

Force the query to read from TiFlash with /*+ read_from_storage(tiflash[table_name]) */ and try again, to determine whether this is a Region problem.
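One way to confirm that the hint really took effect is to look at the execution plan. The sketch below runs `EXPLAIN` on a hinted query and checks the plan for TiFlash tasks; the connection settings and the schema and table names are placeholders.

```shell
# Placeholders (assumptions): adjust host, port, and user for your cluster.
TIDB_HOST="${TIDB_HOST:-10.130.1.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

# Run one SQL string against TiDB; -p prompts for the password.
run_sql() { mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p -e "$1"; }

# Explain a hinted query, then show warnings from the same session.
explain_forced_tiflash() {  # args: schema table
  run_sql "EXPLAIN SELECT /*+ read_from_storage(tiflash[$2]) */ COUNT(*) FROM $1.$2; SHOW WARNINGS;"
}

# Succeeds if a plan read from stdin contains a TiFlash task such as cop[tiflash] or mpp[tiflash].
plan_uses_tiflash() { grep -q '\[tiflash\]'; }

# Usage (hypothetical names):
#   explain_forced_tiflash test orders | plan_uses_tiflash && echo "reads from TiFlash"
```

If the plan still shows only `cop[tikv]` tasks, the hint was not applied. That usually means the table has no available TiFlash replica or the name in the hint does not match the table name or alias used in the query, and `SHOW WARNINGS` should say why.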

Also, please describe the steps you followed when rebuilding PD.


Did my attempt to force TiFlash fail?

I suspect the problem was introduced when PD was rebuilt. Try dropping and recreating the corresponding indexes on this table.
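A sketch of how that could be done from the MySQL client. The table, index, and column names are hypothetical, so record the real index definition from `SHOW CREATE TABLE` before dropping anything.

```shell
# Placeholders (assumptions): adjust host, port, and user for your cluster.
TIDB_HOST="${TIDB_HOST:-10.130.1.1}"
TIDB_PORT="${TIDB_PORT:-4000}"
TIDB_USER="${TIDB_USER:-root}"

# Run one SQL string against TiDB; -p prompts for the password.
run_sql() { mysql -h "$TIDB_HOST" -P "$TIDB_PORT" -u "$TIDB_USER" -p -e "$1"; }

# Check data and index consistency, and print the current table definition.
inspect_table() {  # args: schema table
  run_sql "ADMIN CHECK TABLE $1.$2; SHOW CREATE TABLE $1.$2;"
}

# Drop an index and create it again with the given column list.
rebuild_index() {  # args: schema table index_name column_list
  run_sql "ALTER TABLE $1.$2 DROP INDEX $3; ALTER TABLE $1.$2 ADD INDEX $3 ($4);"
}

# Usage (hypothetical names):
#   inspect_table test orders
#   rebuild_index test orders idx_created_at created_at
```

Rebuilding an index on a large table takes time and adds load, so try it on one affected table first and see whether the error goes away before repeating it elsewhere.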