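The monitoring server for the TiDB cluster and the monitoring server for DM are separate. For the TiDB cluster, Alertmanager can be configured to send WeChat or email alerts, and that works.
DM is configured the same way but never sends an alert, and both alertmanager.log and prometheus.log look normal.

[tidb@pd-tikv03 conf]$ cat prometheus.yml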
global:
  scrape_interval: 15s # By default, scrape targets every 15 seconds.
  evaluation_interval: 15s # How frequently to evaluate rules.
  # scrape_timeout is set to the global default (10s).
  external_labels:
    cluster: 'hcloud-test-cluster'
    monitor: "prometheus"

# Load and evaluate rules in this file every 'evaluation_interval' seconds.
rule_files:
  - 'dm_worker.rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - '10.200.25.83:9093'
        - targets:

scrape_configs:
  - job_name: "dm_worker"
    honor_labels: true # don't overwrite job & instance labels
    static_configs:
      - targets:
          - '10.200.25.83:8262'
          - '10.200.25.83:8263'
          - '10.200.25.82:8262'
          - '10.200.25.82:8263'
      - targets:

[tidb@pd-tikv03 log]$ tail -f alertmanager.log
level=info ts=2020-04-24T02:21:25.266047945Z caller=main.go:275 msg="Loading configuration file" file=conf/alertmanager.yml
level=info ts=2020-04-24T02:21:25.274735349Z caller=main.go:350 msg=Listening address=:9093
level=info ts=2020-04-24T02:36:25.266644124Z caller=nflog.go:293 component=nflog msg="Running maintenance"
level=info ts=2020-04-24T02:36:25.266762038Z caller=silence.go:269 component=silences msg="Running maintenance"
level=info ts=2020-04-24T02:36:25.268486797Z caller=silence.go:271 component=silences msg="Maintenance done" duration=1.834747ms size=0
level=info ts=2020-04-24T02:36:25.268797692Z caller=nflog.go:295 component=nflog msg="Maintenance done" duration=2.182355ms size=1756
level=info ts=2020-04-24T02:48:17.669480674Z caller=main.go:136 msg="Starting Alertmanager" version="(version=0.14.0, branch=HEAD, revision=30af4d051b37ce817ea7e35b56c57a0e2ec9dbb0)"
level=info ts=2020-04-24T02:48:17.669634111Z caller=main.go:137 build_context="(go=go1.9.2, user=root@37b6a49ebba9, date=20180213-08:16:42)"
level=info ts=2020-04-24T02:48:17.670981714Z caller=main.go:275 msg="Loading configuration file" file=conf/alertmanager.yml
level=info ts=2020-04-24T02:48:17.683878104Z caller=main.go:350 msg=Listening address=:9093
^C

[tidb@pd-tikv03 log]$ tail -f prometheus.log
level=error ts=2020-04-24T02:04:58.371952703Z caller=notifier.go:473 component=notifier alertmanager=http://10.200.25.83:9093/api/v1/alerts count=0 msg="Error sending alert" err="Post http://10.200.25.83:9093/api/v1/alerts: dial tcp 10.200.25.83:9093: connect: connection refused"
level=info ts=2020-04-24T02:48:11.444826815Z caller=main.go:220 msg="Starting Prometheus" version="(version=2.2.1, branch=HEAD, revision=bc6058c81272a8d938c05e75607371284236aadc)"
level=info ts=2020-04-24T02:48:11.445014157Z caller=main.go:221 build_context="(go=go1.10, user=root@149e5b3f0829, date=20180314-14:15:45)"
level=info ts=2020-04-24T02:48:11.44510546Z caller=main.go:222 host_details="(Linux 3.10.0-1062.el7.x86_64 #1 SMP Wed Aug 7 18:08:02 UTC 2019 x86_64 pd-tikv03 (none))"
level=info ts=2020-04-24T02:48:11.445140388Z caller=main.go:223 fd_limits="(soft=1000000, hard=1000000)"
level=info ts=2020-04-24T02:48:11.452491503Z caller=main.go:504 msg="Starting TSDB ..."
level=info ts=2020-04-24T02:48:11.452681391Z caller=web.go:382 component=web msg="Start listening for connections" address=:9090
level=info ts=2020-04-24T02:48:11.923424153Z caller=main.go:514 msg="TSDB started"
level=info ts=2020-04-24T02:48:11.923565384Z caller=main.go:588 msg="Loading configuration file" filename=/data/dm-master/conf/prometheus.yml
level=info ts=2020-04-24T02:48:11.927782969Z caller=main.go:491 msg="Server is ready to receive web requests."

Test steps: 1. Stopped the task: no alert. 2. Stopped the DM cluster: still no alert.
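
To narrow down where the alert gets lost, it may help to ask Prometheus directly whether it sees the dm_worker targets as down and whether any rule is firing at all. The following is only a rough sketch, not a fix: it uses the standard Prometheus instant-query endpoint (/api/v1/query) and the built-in ALERTS series. The base URL http://localhost:9090 is an assumption (the log only shows that Prometheus listens on :9090), and the helper names are made up for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: run on the DM monitoring host, where Prometheus listens on :9090.
PROMETHEUS = "http://localhost:9090"


def query_url(base, expr):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def fetch(url, timeout=5):
    """GET a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def alert_names(payload):
    """Return the sorted alert names found in an ALERTS query result."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    return sorted({s["metric"].get("alertname", "") for s in payload["data"]["result"]})


def down_instances(payload):
    """Return the instances whose `up` sample is 0 (scrape failed)."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    return sorted(
        s["metric"].get("instance", "")
        for s in payload["data"]["result"]
        if s["value"][1] == "0"
    )


def check(base=PROMETHEUS):
    """Print which dm_worker targets are down and which alerts are firing."""
    up = fetch(query_url(base, 'up{job="dm_worker"}'))
    firing = fetch(query_url(base, 'ALERTS{alertstate="firing"}'))
    print("down dm_worker instances:", down_instances(up))
    print("firing alerts:", alert_names(firing))

# Usage (needs network access to Prometheus): check()
```

If the stopped workers show up as down but the list of firing alerts stays empty, the rules in dm_worker.rules.yml are the place to look; if alerts are firing but no notification arrives, the problem is more likely between Prometheus and Alertmanager or in the Alertmanager receiver configuration.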