基于Prometheus的全方位监控平台--告警平台(Alertmanager)高级配置一、基于企业微信的报警媒介二、基于钉钉的报警媒介2.1、dingtalk部署配置2.2、配置alertmanager配置文件configmap三、告警静默3.1、什么是告警静默3.2、如何设置临时静默四、总结
xxxxxxxxxx
apiVersion v1
kind ConfigMap
metadata
name alertmanager-config
namespace monitor
data
alertmanager.yml -
global
resolve_timeout 1m
smtp_smarthost'smtp.exmail.qq.com:465' # 邮箱服务器的SMTP主机配置
smtp_from'zhdya@zhdya.cn' # 发送邮件主题
smtp_auth_username'zhdya@zhdya.cn' # 登录用户名
smtp_auth_password'yfXBjd6J73DwYTjn' # 此处的auth password是邮箱的第三方登录授权密码,而非用户密码
smtp_require_tls false # 有些邮箱需要开启此配置,这里使用的是企微邮箱,仅做测试,不需要开启此功能。
templates
'/etc/alertmanager/*.tmpl'
route
group_by'env''instance''type''group''job''alertname''cluster'
group_wait 10s
group_interval 2m
repeat_interval 10m
receiver'email'
routes
receiver'wechat'
match
severity critical
receivers
name'email'
email_configs
to'zhdyaa@163.com'
send_resolvedtrue
html'{{ template "email.to.html" . }}'
name'wechat'
wechat_configs
corp_id'ww182a29bdbaeXXXc4'
to_party'413'
to_user'@all'
agent_id1000035
api_secret'XXXXS6lb5WRCq2-EoDxoqFXSnBdY3fyocuDP-tc'
send_resolvedtrue
inhibit_rules
source_match
severity'critical'
target_match
severity'warning'
equal'alertname' 'dev' 'instance'
wechat.tmpl -
"wechat.default.message" define
- if gt (len .Alerts.Firing) 0 -
- range $index $alert := .Alerts -
- if eq $index 0
========= 监控报警 =========
告警状态: .Status
告警级别: .Labels.severity
告警类型: $alert.Labels.alertname
故障主机 $alert.Labels.instance
告警主题 $alert.Annotations.summary
告警详情 $alert.Annotations.message $alert.Annotations.description ;
触发阀值: .Annotations.value
故障时间 ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05"
========= = end = =========
- end
- end
- end
- if gt (len .Alerts.Resolved) 0 -
- range $index $alert := .Alerts -
- if eq $index 0
========= 告警恢复 =========
告警类型: .Labels.alertname
告警状态: .Status
告警主题 $alert.Annotations.summary
告警详情 $alert.Annotations.message $alert.Annotations.description ;
故障时间 ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05"
恢复时间 ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05"
- if gt (len $alert.Labels.instance) 0
实例信息 $alert.Labels.instance
- end
========= = end = =========
- end
- end
- end
- end
email.tmpl -
"email.from" xxx.com end define
"email.to" xxx.com end define
"email.to.html" define
- if gt (len .Alerts.Firing) 0 -
range .Alerts
========= 监控报警 =========<br>
告警程序 prometheus_alert <br>
告警级别 .Labels.severity <br>
告警类型 .Labels.alertname <br>
告警主机 .Labels.instance <br>
告警主题 .Annotations.summary <br>
告警详情 .Annotations.description <br>
触发时间 .StartsAt.Format "2006-01-02 15:04:05" <br>
========= = end = =========<br>
end end -
- if gt (len .Alerts.Resolved) 0 -
range .Alerts
========= 告警恢复 =========<br>
告警程序 prometheus_alert <br>
告警级别 .Labels.severity <br>
告警类型 .Labels.alertname <br>
告警主机 .Labels.instance <br>
告警主题 .Annotations.summary <br>
告警详情 .Annotations.description <br>
触发时间 .StartsAt.Format "2006-01-02 15:04:05" <br>
恢复时间 .EndsAt.Format "2006-01-02 15:04:05" <br>
========= = end = =========<br>
end end -
- end
测试验证:
xxxxxxxxxx
## 匹配如上webhook标签:hostname:zhdya
$ curl -XPOST -H 'Content-Type: application/json' http://alertmanager.kubernets.cn/api/v1/alerts -d '[{"labels":{"severity":"critical"},"annotations":{"summary":"This is a test alert"}}]'
告警触发:
钉钉-机器人管理(复制生成的webhook)
自定义机器人安全设置 - 钉钉开放平台 (dingtalk.com)
xxxxxxxxxx
cat << EOF > dingtalk-webhook.yaml
apiVersion apps/v1
kind Deployment
metadata
labels
run dingtalk
name webhook-dingtalk
namespace monitor
spec
replicas1
selector
matchLabels
run dingtalk
template
metadata
labels
run dingtalk
spec
containers
name dingtalk
image timonwong/prometheus-webhook-dingtalk v1.4.0
imagePullPolicy IfNotPresent
args
--ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=<替换成你的token>
ports
containerPort8060
protocol TCP
---
apiVersion v1
kind Service
metadata
labels
run dingtalk
name webhook-dingtalk
namespace monitor
spec
ports
port8060
protocol TCP
targetPort8060
selector
run dingtalk
sessionAffinity None
EOF
xxxxxxxxxx
route
group_by'env''instance''type''group''job''alertname''cluster'
group_wait 10s
group_interval 2m
repeat_interval 10m
receiver'email'
routes
receiver'wechat'
match
severity critical
receiver'webhook' ## 新增告警receiver通道
match
hostname zhdya
receivers
name'email'
email_configs
to'zhdyaa@163.com'
send_resolvedtrue
html'{{ template "email.to.html" . }}'
name'wechat'
wechat_configs
corp_id'ww187a29bdbaececc4'
to_party'413'
to_user'@all'
agent_id1000035
api_secret'IVRfzG15S6lb5WRCq2-EoDxoqFXSnBdY3fyocuDP-tc'
send_resolvedtrue
name'webhook' ## 配置接收告警的媒介
webhook_configs
url'http://webhook-dingtalk.monitor.svc.cluster.local:8060/dingtalk/webhook1/send'
send_resolvedtrue
测试验证:
xxxxxxxxxx
## 匹配如上webhook标签:hostname:zhdya
$ curl -XPOST -H 'Content-Type: application/json' http://alertmanager.kubernets.cn/api/v1/alerts -d '[{"labels":{"hostname":"zhdya"},"annotations":{"summary":"This is a test alert"}}]'
静默 Silences
:指让通过设置让警报在指定时间暂时不会发送警报的一种方式。