Why does the Grafana panel only show data for one PD node?

TiDB version: 3.0.12, deployed with TiUP

TiDB Cluster: tidb-test
TiDB Version: v3.0.12
ID                 Role          Host         Ports        Status     Data Dir                          Deploy Dir
--                 ----          ----         -----        ------     --------                          ----------
10.3.87.221:9093   alertmanager  10.3.87.221  9093/9094    Up         /tidb/app/data/alertmanager-9093  /tidb/app/deploy/alertmanager-9093
10.3.87.221:3000   grafana       10.3.87.221  3000         Up         -                                 /tidb/app/deploy/grafana-3000
10.3.87.202:2379   pd            10.3.87.202  2379/2380    Healthy    /tidb/app/data/pd-2379            /tidb/app/deploy/pd-2379
10.3.87.221:2379   pd            10.3.87.221  2379/2380    Healthy|L  /tidb/app/data/pd-2379            /tidb/app/deploy/pd-2379
10.3.87.34:2389    pd            10.3.87.34   2389/2381    Healthy    /tidb/app/data/pd-2389            /tidb/app/deploy/pd-2389
10.3.87.221:9090   prometheus    10.3.87.221  9090         Up         /tidb/app/data/prometheus-9090    /tidb/app/deploy/prometheus-9090
10.3.87.202:4000   tidb          10.3.87.202  4000/10080   Up         -                                 /tidb/app/deploy/tidb-4000
10.3.87.221:4000   tidb          10.3.87.221  4000/10080   Up         -                                 /tidb/app/deploy/tidb-4000
10.3.87.34:4000    tidb          10.3.87.34   4000/10080   Up         -                                 /tidb/app/deploy/tidb-4000
10.3.87.202:20160  tikv          10.3.87.202  20160/20180  Up         /tidb/app/data/tikv-20160         /tidb/app/deploy/tikv-20160
10.3.87.221:20160  tikv          10.3.87.221  20160/20180  Up         /tidb/app/data/tikv-20160         /tidb/app/deploy/tikv-20160
10.3.87.34:20160   tikv          10.3.87.34   20160/20180  Up         /tidb/app/data/tikv-20160         /tidb/app/deploy/tikv-20160

I took a look at the Prometheus configuration file, and it does scrape all three PD nodes (a quick check against Prometheus' targets API is sketched after the config snippet below):

- job_name: pd
  honor_labels: true
  scrape_interval: 15s
  scrape_timeout: 10s
  metrics_path: /metrics
  scheme: http
  static_configs:
  - targets:
    - 10.3.87.221:2379
    - 10.3.87.202:2379
    - 10.3.87.34:2389
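A quick way to confirm what Prometheus is actually scraping (a minimal sketch, assuming the Prometheus address 10.3.87.221:9090 from the topology above and that jq is available) is to ask its targets API for the health of the pd job:

# List the PD scrape targets and their health as Prometheus sees them
curl -s http://10.3.87.221:9090/api/v1/targets \
  | jq -r '.data.activeTargets[] | select(.labels.job=="pd") | "\(.labels.instance) \(.health)"'

If all three instances report "up" here, the scrape side is fine and the question becomes what each PD node actually exports.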

But the data I get is as follows (a per-instance check is sketched after the output):

pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="leader_count"} 79
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="region_count"} 237
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="storage_capacity"} 263819968512
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="storage_size"} 325384432
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_disconnected_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_down_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_low_space_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_offline_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_tombstone_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_unhealth_count"} 0
pd_cluster_status{instance="10.3.87.221:2379",job="pd",namespace="global",type="store_up_count"}
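To see which PD instances actually expose this metric (a sketch under the same assumptions about the Prometheus address and jq as above), you can count the pd_cluster_status series per instance:

# Count pd_cluster_status series per PD instance
curl -s -G 'http://10.3.87.221:9090/api/v1/query' \
  --data-urlencode 'query=count by (instance) (pd_cluster_status)' \
  | jq -r '.data.result[] | "\(.metric.instance) \(.value[1])"'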

(Screenshots attached: WeChat captures of the Grafana panels, taken 2020-04-16)

I also found that the same instance shows a different role on the two panels:
one shows it as leader, the other as follower.

1. In principle, the Grafana PD monitoring panels only display the node whose role is leader, and only the leader's data is up to date and changing. The exception is when a PD leader switch occurs.

2. If the roles shown on the TiDB Overview panel and the PD panel are inconsistent, first use pd-ctl member to check which node is the current PD leader, and then correct the corresponding monitoring panel against the monitoring expression. For pd-ctl, refer to the following (a minimal usage sketch is included below):

https://pingcap.com/docs-cn/v3.0/reference/tools/pd-control/#下载安装包

3. Let me also confirm on our side whether the monitoring template itself has a problem. Thanks~~
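For reference, a minimal pd-ctl check (a sketch, assuming the v3.0 single-command syntax with -d and that the pd-ctl binary from the toolkit package linked above is in the current directory; adjust the path and the PD address to your environment):

# Show the current PD leader in single-command mode (v3.0 pd-ctl syntax)
./pd-ctl -u http://10.3.87.221:2379 -d member leader show

# Or list all members; the output also contains a leader section
./pd-ctl -u http://10.3.87.221:2379 -d member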

count(delta(pd_tso_events{type="save",instance="$instance"}[1m]))

count(delta(pd_server_tso{type="save",instance="$instance"}[1m]))

I took a look: the two panels are not using the same query. My leader is 10.3.87.221.
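To see why the two panels disagree, both expressions quoted above can be run directly against Prometheus with the actual leader address substituted for Grafana's $instance variable (a sketch, assuming the Prometheus address from the topology above; an empty result means that metric name has no data in this PD version):

# Run both panel expressions with the leader address filled in for $instance
for q in \
  'count(delta(pd_tso_events{type="save",instance="10.3.87.221:2379"}[1m]))' \
  'count(delta(pd_server_tso{type="save",instance="10.3.87.221:2379"}[1m]))'
do
  curl -s -G 'http://10.3.87.221:9090/api/v1/query' \
    --data-urlencode "query=$q" \
    | jq -c '.data.result'
done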

:+1:, please set up the Overview panel following the PD panel's expression. I will pass this feedback along on our side. Thanks~~~