基于Prometheus的全方位监控平台--黑盒监控Blackbox一、Blackbox Exporter 部署二、DNS 监控三、ICMP监控四、HTTP 监控(K8S 内部发现方法)4.1、自定义发现 Service
监控 端口
和 路径
4.2、TCP检测4.3、Ingress服务的探测(视频中未讲到,扩展+)五、HTTP 监控(监控外部域名)六、HTTP Post 监控(监控外部域名)七、小结
Prometheus
监控分为两种:
白盒监控:是指我们日常监控主机的资源用量、容器的运行状态的运行数据。
黑盒监控:常见的黑盒监控包括 HTTP探针
、TCP探针
、Dns
、Icmp
等用于检测站点、服务的可访问性、服务的连通性,以及访问效率等。
两者比较:
一个完善的监控目标是要能够从白盒的角度发现潜在问题,能够在黑盒的角度快速发现已经发生的问题。
目前支持的应用场景:
ICMP 测试
TCP 测试
HTTP 测试
POST 测试
SSL 证书过期时间
Exporter Configmap
定义,可以参考下面两个链接
https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md
https://github.com/prometheus/blackbox_exporter/blob/master/example.yml
首先得声明一个 Blackbox 的 Deployment,并利用 Configmap 来为 Blackbox 提供配置文件。
Configmap:
参考 BlackBox Exporter 的 Github 提供的 示例配置文件
xxxxxxxxxx
apiVersion: v1
kind: ConfigMap
metadata:
name: blackbox-exporter
namespace: monitor
labels:
app: blackbox-exporter
data:
blackbox.yml: |-
modules:
## ----------- DNS 检测配置 -----------
dns_tcp:
prober: dns
dns:
transport_protocol: "tcp"
preferred_ip_protocol: "ip4"
query_name: "kubernetes.default.svc.cluster.local" # 用于检测域名可用的网址
query_type: "A"
## ----------- TCP 检测模块配置 -----------
tcp_connect:
prober: tcp
timeout: 5s
## ----------- ICMP 检测配置 -----------
icmp:
prober: icmp
timeout: 5s
icmp:
preferred_ip_protocol: "ip4"
## ----------- HTTP GET 2xx 检测模块配置 -----------
http_get_2xx:
prober: http
timeout: 10s
http:
method: GET
preferred_ip_protocol: "ip4"
valid_http_versions: ["HTTP/1.1","HTTP/2"]
valid_status_codes: [200] # 验证的HTTP状态码,默认为2xx
no_follow_redirects: false # 是否不跟随重定向
## ----------- HTTP GET 3xx 检测模块配置 -----------
http_get_3xx:
prober: http
timeout: 10s
http:
method: GET
preferred_ip_protocol: "ip4"
valid_http_versions: ["HTTP/1.1","HTTP/2"]
valid_status_codes: [301,302,304,305,306,307] # 验证的HTTP状态码,默认为2xx
no_follow_redirects: false # 是否不跟随重定向
## ----------- HTTP POST 监测模块 -----------
http_post_2xx:
prober: http
timeout: 10s
http:
method: POST
preferred_ip_protocol: "ip4"
valid_http_versions: ["HTTP/1.1", "HTTP/2"]
#headers: # HTTP头设置
# Content-Type: application/json
#body: '{}' # 请求体设置
Deployment:
xxxxxxxxxx
apiVersion v1
kind Service
metadata
name blackbox-exporter
namespace monitor
labels
k8s-app blackbox-exporter
spec
type ClusterIP
ports
name http
port9115
targetPort9115
selector
k8s-app blackbox-exporter
---
apiVersion apps/v1
kind Deployment
metadata
name blackbox-exporter
namespace monitor
labels
k8s-app blackbox-exporter
spec
replicas1
selector
matchLabels
k8s-app blackbox-exporter
template
metadata
labels
k8s-app blackbox-exporter
spec
containers
name blackbox-exporter
image prom/blackbox-exporter v0.21.0
imagePullPolicy IfNotPresent
args
--config.file=/etc/blackbox_exporter/blackbox.yml
--web.listen-address=:9115
--log.level=info
ports
name http
containerPort9115
resources
limits
cpu 200m
memory 256Mi
requests
cpu 100m
memory 50Mi
livenessProbe
tcpSocket
port9115
initialDelaySeconds5
timeoutSeconds5
periodSeconds10
successThreshold1
failureThreshold3
readinessProbe
tcpSocket
port9115
initialDelaySeconds5
timeoutSeconds5
periodSeconds10
successThreshold1
failureThreshold3
volumeMounts
name config
mountPath /etc/blackbox_exporter
volumes
name config
configMap
name blackbox-exporter
defaultMode420
xxxxxxxxxx
# 部署
$ kubectl apply -f blackbox-configmap.yaml
$ kubectl apply -f blackbox-exporter.yaml
# 查看部署后的资源
$ kg all -nmonitor |grep blackbox
定义 BlackBox 在 Prometheus 抓取设置
下面抓取设置,都存放在
prometheus-config.yaml
文件中,设置可参考
xxxxxxxxxx
job_name"kubernetes-dns"
metrics_path /probe # 不是 metrics,是 probe
params
module dns_tcp # 使用 DNS TCP 模块
static_configs
targets
# 不要省略端口号 kube-dns.kube-system:53
8.8.4.4:53
8.8.8.8:53
223.5.5.5:53
relabel_configs
source_labels __address__
target_label __param_target
source_labels __param_target
target_label instance
target_label __address__
replacement blackbox-exporter.monitor 9115 # 服务地址,和上面的 Service 定义保持一致
参数解释:
xxxxxxxxxx
################ DNS 服务器监控 ###################
- job_name: "kubernetes-dns"
metrics_path: /probe
params:
## 配置要使用的模块,要与blackbox exporter配置中的一致
## 这里使用DNS模块
module: [dns_tcp]
static_configs:
## 配置要检测的地址
- targets:
- kube-dns.kube-system:53
- 8.8.4.4:53
- 8.8.8.8:53
- 223.5.5.5
relabel_configs:
## 将上面配置的静态DNS服务器地址转换为临时变量 “__param_target”
- source_labels: [__address__]
target_label: __param_target
## 将 “__param_target” 内容设置为 instance 实例名称
- source_labels: [__param_target]
target_label: instance
## BlackBox Exporter 的 Service 地址
- target_label: __address__
replacement: blackbox-exporter.monitor:9115
更新 prometheus-config.yaml
配置 :
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload
打开 Prometheus 的 Target 页面,就会看到 上面定义的 blackbox-k8s-service-dns
任务;
graph 页面,可以使用 probe_success
和 probe_duration_seconds
等来检查历史结果。
xxxxxxxxxx
- job_name: icmp-status
metrics_path: /probe
params:
module: [icmp]
static_configs:
- targets:
- 192.10.192.222
labels:
group: icmp
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter.monitor:9115
按上面方法重载 Prometheus,打开 Prometheus 的 Target 页面,就会看到 上面定义的 blackbox-k8s-http-services
任务
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload
Service
监控 端口
和 路径
可以如下设置:
xxxxxxxxxx
job_name'kubernetes-services'
metrics_path /probe
params
module## 使用HTTP_GET_2xx与HTTP_GET_3XX模块
"http_get_2xx"
"http_get_3xx"
kubernetes_sd_configs## 使用Kubernetes动态服务发现,且使用Service类型的发现
role service
relabel_configs## 设置只监测Kubernetes Service中Annotation里配置了注解prometheus.io/http_probe: true的service
action keep
source_labels __meta_kubernetes_service_annotation_prometheus_io_http_probe
regex"true"
action replace
source_labels
"__meta_kubernetes_service_name"
"__meta_kubernetes_namespace"
"__meta_kubernetes_service_annotation_prometheus_io_http_probe_port"
"__meta_kubernetes_service_annotation_prometheus_io_http_probe_path"
target_label __param_target
regex (.+);(.+);(.+);(.+)
replacement $1.$2 $3$4
target_label __address__
replacement blackbox-exporter.monitor 9115 ## BlackBox Exporter 的 Service 地址
source_labels __param_target
target_label instance
action labelmap
regex __meta_kubernetes_service_label_(.+)
source_labels __meta_kubernetes_namespace
target_label kubernetes_namespace
source_labels __meta_kubernetes_service_name
target_label kubernetes_name
然后,需要在 Service
中配置这样的 annotation
:
xxxxxxxxxx
annotations
prometheus.io/http-probe"true" ## 开启 HTTP 探针
prometheus.io/http-probe-port"8080" ## HTTP 探针会使用 8080 端口来进行探测
prometheus.io/http-probe-path"/healthCheck" ## HTTP 探针会请求 /healthCheck 路径来进行探测,以检查应用程序是否正常运行
示例:Java应用的svc:
xxxxxxxxxx
apiVersion: v1
kind: Service
metadata:
name: springboot
annotations:
prometheus.io/http-probe: "true"
prometheus.io/http-probe-port: "8080"
prometheus.io/http-probe-path: "/apptwo"
spec:
type: ClusterIP
selector:
app: springboot
ports:
- name: http
port: 8080
protocol: TCP
targetPort: 8080
按上面方法重载 Prometheus,打开 Prometheus 的 Target 页面,就会看到 上面定义的 blackbox-k8s-http-services
任务
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload
xxxxxxxxxx
job_name"service-tcp-probe"
scrape_interval 1m
metrics_path /probe
# 使用blackbox exporter配置文件的tcp_connect的探针
params
module tcp_connect
kubernetes_sd_configs
role service
relabel_configs
# 保留prometheus.io/scrape: "true"和prometheus.io/tcp-probe: "true"的service
source_labels __meta_kubernetes_service_annotation_prometheus_io_scrape __meta_kubernetes_service_annotation_prometheus_io_tcp_probe
action keep
regex true;true
# 将原标签名__meta_kubernetes_service_name改成service_name
source_labels __meta_kubernetes_service_name
action replace
regex (.*)
target_label service_name
# 将原标签名__meta_kubernetes_service_name改成service_name
source_labels __meta_kubernetes_namespace
action replace
regex (.*)
target_label namespace
# 将instance改成 `clusterIP:port` 地址
source_labels __meta_kubernetes_service_cluster_ip __meta_kubernetes_service_annotation_prometheus_io_http_probe_port
action replace
regex (.*);(.*)
target_label __param_target
replacement $1 $2
source_labels __param_target
target_label instance
# 将__address__的值改成 `blackbox-exporter.monitor:9115`
target_label __address__
replacement blackbox-exporter.monitor9115
按上面方法重载 Prometheus,打开 Prometheus 的 Target 页面,就会看到 上面定义的 service-tcp-probe
任务
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload
则需要在service上添加注释必须有以下三行
xxxxxxxxxx
annotations
prometheus.io/scrape"true" ## 这个服务是可以被采集指标的,Prometheus 可以对这个服务进行数据采集
prometheus.io/tcp-probe"true" ## 开启 TCP 探针
prometheus.io/http-probe-port"8080" ## HTTP 探针会使用 8080 端口来进行探测,以检查应用程序是否正常运行
示例:Java应用的svc:
xxxxxxxxxx
apiVersion: v1
kind: Service
metadata:
name: springboot
annotations:
prometheus.io/scrape: "true"
prometheus.io/tcp-probe: "true"
prometheus.io/http-probe-port: "8080"
spec:
type: ClusterIP
selector:
app: springboot
ports:
- name: http
port: 8080
protocol: TCP
targetPort: 8080
xxxxxxxxxx
- job_name: 'blackbox-k8s-ingresses'
scrape_interval: 30s
scrape_timeout: 10s
metrics_path: /probe
params:
module: [http_get_2xx] # 使用定义的http模块
kubernetes_sd_configs:
- role: ingress # ingress 类型的服务发现
relabel_configs:
# 只有ingress的annotation中配置了 prometheus.io/http_probe=true 的才进行发现
- source_labels: [__meta_kubernetes_ingress_annotation_prometheus_io_http_probe]
action: keep
regex: true
- source_labels: [__meta_kubernetes_ingress_scheme,__address__,__meta_kubernetes_ingress_path]
regex: (.+);(.+);(.+)
replacement: ${1}://${2}${3}
target_label: __param_target
- target_label: __address__
replacement: blackbox-exporter.monitor:9115
- source_labels: [__param_target]
target_label: instance
- action: labelmap
regex: __meta_kubernetes_ingress_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_ingress_name]
target_label: kubernetes_name
则需要在ingress上添加注释必须有以下三行
xxxxxxxxxx
annotations
prometheus.io/http_probe"true"
prometheus.io/http-probe-port'8080'
prometheus.io/http-probe-path'/healthz'
示例:Java应用的ing:
xxxxxxxxxx
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: java-ingress-nginx
namespace: default
annotations:
prometheus.io/http_probe: "true"
prometheus.io/http-probe-port: '8080'
prometheus.io/http-probe-path: '/apptwo'
spec:
ingressClassName: nginx
rules:
- host: java.kubernets.cn
http:
paths:
- pathType: Prefix
backend:
service:
name: springboot
port:
number: 8080
path: /
xxxxxxxxxx
job_name"blackbox-external-website"
scrape_interval 30s
scrape_timeout 15s
metrics_path /probe
params
module http_get_2xx
static_configs
targets
# 改为公司对外服务的域名 https://www.baidu.com
https://www.jd.com
relabel_configs
source_labels __address__
target_label __param_target
source_labels __param_target
target_label instance
target_label __address__
replacement blackbox-exporter.monitor9115
按上面方法重载 Prometheus,打开 Prometheus 的 Target 页面,就会看到 上面定义的 blackbox-external-website
任务
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload
xxxxxxxxxx
job_name'blackbox-http-post'
metrics_path /probe
params
module http_post_2xx
static_configs
targets
# 要检查的网址 https://www.example.com/api
relabel_configs
source_labels __address__
target_label __param_target
source_labels __param_target
target_label instance
target_label __address__
replacement blackbox-exporter.monitor9115
按上面方法重载 Prometheus,打开 Prometheus 的 Target 页面,就会看到 上面定义的 blackbox-http-post
任务
xxxxxxxxxx
curl -XPOST http://prometheus.kubernets.cn/-/reload