alertmanager cannot send email

I configured email alerting in alertmanager, but no email ever arrives. alertmanager.log shows the following:
level=error ts=2021-08-12T10:51:17.235759877Z caller=email.go:147 msg="failed to close SMTP connection" err="x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"
level=error ts=2021-08-12T10:51:43.718760827Z caller=email.go:147 msg="failed to close SMTP connection" err="x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"
level=error ts=2021-08-12T10:52:41.07742121Z caller=email.go:147 msg="failed to close SMTP connection" err="x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"
level=error ts=2021-08-12T10:53:33.155078534Z caller=notify.go:339 component=dispatcher msg="Error on notify" err="starttls failed: x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"
level=error ts=2021-08-12T10:53:33.155199631Z caller=dispatch.go:264 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="starttls failed: x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"
level=error ts=2021-08-12T11:01:33.161492016Z caller=email.go:147 msg="failed to close SMTP connection" err="x509: certificate is valid for mail.service.test.cn, not mail.in.service.test.cn"

The network itself is reachable, though.


Check whether the domain name mail.in.service.test.cn resolves correctly.
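One way to run that check from the alertmanager node is a few lines of Python. This is only a minimal sketch: `resolve` is a hypothetical helper name, and port 25 is taken from the smarthost used in this thread. It lists the addresses the system resolver returns, or an empty list if the lookup fails.

```python
import socket


def resolve(host: str, port: int = 25) -> list:
    """Return the sorted, de-duplicated IP addresses that host resolves to.

    An empty list means the system resolver could not resolve the name.
    """
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the IP address for both IPv4 and IPv6.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print(resolve("mail.in.service.test.cn"))
```

A non-empty result only shows that DNS works. It says nothing about whether the certificate the server presents matches that name.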


The domain name resolves fine. Also, a Python program I wrote can send mail through this address from the alertmanager node.

The errors are now as follows:
level=error ts=2021-08-12T19:17:25.053044474Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53145->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:17:27.271730275Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53159->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:17:33.715002817Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53175->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:17:40.238189504Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53197->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:17:48.982331404Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53241->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:18:09.190726778Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53317->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:18:44.919456324Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53441->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:19:24.603175171Z caller=email.go:147 msg="failed to close SMTP connection" err="write tcp 10.30.xx.1:53587->10.30.xx.2:25: use of closed network connection"
level=error ts=2021-08-12T19:20:18.152998932Z caller=notify.go:339 component=dispatcher msg="Error on notify" err="*smtp.plainAuth failed: unencrypted connection"
level=error ts=2021-08-12T19:20:18.153125647Z caller=dispatch.go:264 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="*smtp.plainAuth failed: unencrypted connection"

See whether this is your problem: http://www.haosan.com/www/doc/view/?doc_id=2616

I can't tell where the problem is. This is really strange :pensive:

I already have this parameter set to false, and the errors I get are exactly the ones I posted above: repeated "failed to close SMTP connection" entries with err="write tcp ...: use of closed network connection", followed by "*smtp.plainAuth failed: unencrypted connection".

Check the alertmanager startup log: has the parameter setting actually taken effect?

Yes, it has taken effect. Please have a look at these two links and see whether this is the same problem: https://github.com/prometheus/alertmanager/issues/1358, https://github.com/umputun/remark42/pull/681

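  1. I need to see your alertmanager configuration; otherwise it is hard to help you troubleshoot.
  2. Have you dealt with both errors shown here? The first is a certificate mismatch; the second is PLAIN auth being refused on an unencrypted connection.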

alertmanager.yml configuration:

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: "mail.in.service.test.cn:25"
  smtp_from: "xxxxx@service.xxx.cn"
  smtp_auth_username: "xxxx"
  smtp_auth_password: "xxxx"
  smtp_require_tls: false
route:
  # A default receiver
  receiver: "db-alert-email"

receivers:
  - name: "db-alert-email"
    email_configs:

Setting smtp_require_tls: false fixed the first problem. The second one is still unresolved, and I don't know what is causing it.

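There are two options:

  1. smtp_require_tls: true
  • You need to fix the certificate problem. The error is clear: the SMTP address you are using is mail.in.service.test.cn, but the certificate the service presents is for mail.service.test.cn, so the two do not match.
  2. smtp_require_tls: false
  • You do not need to fix the certificate problem, but you cannot use this authentication method. That is exactly the topic of the GitHub issue you posted; read it carefully.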
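To make the certificate mismatch in option 1 concrete, here is a minimal Python sketch of the hostname check a TLS client performs. It is a simplified illustration, not the code alertmanager runs: `name_matches` and `cert_covers` are hypothetical helper names, and the only wildcard rule modelled is that `*.` stands for exactly one left-most label.

```python
from typing import Iterable


def name_matches(pattern: str, host: str) -> bool:
    """Return True if one certificate name covers host.

    Simplified rule: names compare case-insensitively, and a leading
    "*." in the certificate name matches exactly one left-most label.
    """
    pattern = pattern.lower().rstrip(".")
    host = host.lower().rstrip(".")
    if pattern.startswith("*."):
        first, _, rest = host.partition(".")
        return bool(first) and rest == pattern[2:]
    return pattern == host


def cert_covers(host: str, cert_names: Iterable[str]) -> bool:
    """Return True if any name listed in the certificate covers host."""
    return any(name_matches(name, host) for name in cert_names)


if __name__ == "__main__":
    names = ["mail.service.test.cn"]  # the name reported in the x509 error
    print(cert_covers("mail.in.service.test.cn", names))
    print(cert_covers("mail.service.test.cn", names))
```

Under this rule, even a wildcard certificate for `*.service.test.cn` would not cover mail.in.service.test.cn, because that name has two labels in front of service.test.cn.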

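Thank you for the answer.
I didn't fully understand the two solutions you described:
1. Fixing the certificate problem: how exactly should it be fixed?
2. The GitHub issue gives this explanation:


I don't fully understand it. Could you explain?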

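  1. You can change the address to mail.service.test.cn (and then enable TLS), since that is the domain the server's certificate was issued for.
  2. It means that either you enable TLS or the SMTP server is local; otherwise PLAIN auth cannot be used.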
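The rule in point 2 can be written down as a tiny Python model. As far as I know it mirrors the check in Go's net/smtp PLAIN auth, which is where the "unencrypted connection" error text comes from; treat that as my reading rather than a statement about alertmanager's internals. `plain_auth_allowed` and `LOCAL_NAMES` are hypothetical names for this sketch.

```python
# Server names treated as "local" in this sketch.
LOCAL_NAMES = {"localhost", "127.0.0.1", "::1"}


def plain_auth_allowed(server_name: str, tls_active: bool) -> bool:
    """Return True if sending PLAIN credentials is acceptable.

    PLAIN auth sends the password effectively in the clear, so it is
    only allowed over TLS or when talking to a local server.
    """
    return tls_active or server_name in LOCAL_NAMES


if __name__ == "__main__":
    # smtp_require_tls: false against a remote host, as in this thread
    print(plain_auth_allowed("mail.in.service.test.cn", tls_active=False))
    # the same host once STARTTLS has succeeded
    print(plain_auth_allowed("mail.in.service.test.cn", tls_active=True))
```

This would also explain why a Python script can succeed where alertmanager fails: Python's smtplib does not apply such a check before `login()`, so it will send the credentials over an unencrypted connection if the server accepts them.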

Option 1 doesn't work: connections to mail.service.test.cn time out.
Option 2 isn't feasible either: the SMTP server is not local.

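Oh, so that's why you used the mail.in.service.test.cn address.
Either way, this is an internal matter on your side: a network and authentication problem between you and the SMTP server. It is not a TiDB cluster problem, and arguably not even an alertmanager problem. There is certainly a way to solve it, so I suggest you sort it out internally.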


OK, thanks a lot.
