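1. Building a full-coverage monitoring platform based on Prometheus
1.1 Preface
1.2 Prometheus architecture
1.3 Prometheus time series data
1.3.1 What is time series data?
1.3.2 Characteristics of time series data
1.3.3 Use cases suited to Prometheus
2. Deployment and configuration
2.1 Deploying Prometheus
2.1.1 Create the namespace
2.1.2 Create the RBAC rules
2.1.3 Create the Prometheus configuration file as a ConfigMap
2.1.4 Create the Prometheus rules configuration file as a ConfigMap
2.1.5 Create the Prometheus Service
2.1.6 Create the Prometheus Deployment
2.1.7 Create the Prometheus Ingress for external domain access
3. A first look at the Prometheus monitoring platform
4. Summary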
Official site: https://prometheus.io/docs/prometheus/latest/getting_started/
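Prometheus works in five main steps: 1. Data collection (Exporters): Prometheus periodically pulls data from its targets over HTTP. A target can be an application, a system, a service, or any other resource.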
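To make the pull model concrete, here is a minimal Python sketch of what the scraper does with the text it receives from a target. The sample payload and the `parse_metrics` helper are illustrative assumptions, not Prometheus code, and the parser is deliberately simplified (it ignores optional timestamps and escaped characters in label values).

```python
import re

# Hand-written sample of the text a target might expose on /metrics.
SAMPLE_METRICS = """\
# HELP up Whether the target is reachable.
# TYPE up gauge
up{job="prometheus",instance="prometheus"} 1
process_cpu_seconds_total 12.5
"""

# metric_name, optional {labels}, then the sample value.
LINE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE_METRICS):
    print(name, labels, value)
```

In a real deployment the text would come from an HTTP GET against each target's metrics endpoint at every scrape interval; the sketch only covers the parsing step so that it runs without a network.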
Time series data: data that records changes in the state of a system or device in chronological order is called time series data.
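As a rough illustration of that definition, the sketch below models a time series as a metric name plus a set of labels, with samples appended in time order. `SeriesKey`, `make_key`, and `TinyTSDB` are hypothetical names invented for this example; they do not reflect how Prometheus stores data internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesKey:
    """Identity of one series: metric name plus sorted (label, value) pairs."""
    name: str
    labels: tuple


def make_key(name, **labels):
    # Sorting makes the key independent of the order labels are passed in.
    return SeriesKey(name, tuple(sorted(labels.items())))


class TinyTSDB:
    """Toy in-memory store: one append-only list of samples per series."""

    def __init__(self):
        self._series = {}

    def append(self, key, timestamp, value):
        samples = self._series.setdefault(key, [])
        # Time series data is recorded in chronological order.
        if samples and timestamp <= samples[-1][0]:
            raise ValueError("samples must arrive in increasing time order")
        samples.append((timestamp, value))

    def samples(self, key):
        return list(self._series.get(key, []))


db = TinyTSDB()
up_prometheus = make_key("up", job="prometheus", instance="prometheus")
db.append(up_prometheus, 0, 1.0)
db.append(up_prometheus, 15, 1.0)
print(db.samples(up_prometheus))
```

Two samples belong to the same series only when both the metric name and every label match, which is why changing a single label value creates a new series.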
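Official figures: Prometheus stores time series data very efficiently. Each sample is quoted at only about 3.5 bytes, and one million time series scraped every 30 seconds and retained for 60 days are quoted at a little over 200 GB. Note that the two figures do not multiply out exactly: that workload is about 1.7 × 10^11 samples, which is roughly 600 GB at 3.5 bytes per sample, so the 200 GB figure implies closer to 1.2 bytes per sample.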
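The capacity estimate can be reproduced with a few lines of arithmetic. The sketch below is a back-of-the-envelope calculation only; `estimate_storage_bytes` is a helper made up for this example, and real disk usage also depends on compression, churn, and index overhead.

```python
def estimate_storage_bytes(series_count, scrape_interval_s, retention_days,
                           bytes_per_sample):
    """Rough size estimate: series x samples per series x bytes per sample."""
    samples_per_series = retention_days * 24 * 60 * 60 // scrape_interval_s
    return series_count * samples_per_series * bytes_per_sample


# Workload quoted in the text: 1,000,000 series, 30 s interval, 60 days.
total_samples = estimate_storage_bytes(1_000_000, 30, 60, 1)
at_3_5_bytes = estimate_storage_bytes(1_000_000, 30, 60, 3.5)

print(f"total samples: {total_samples:.3e}")
print(f"size at 3.5 bytes per sample: {at_3_5_bytes / 1e9:.1f} GB")
# Bytes per sample that a 200 GB total would imply for the same workload.
print(f"bytes per sample implied by 200 GB: {200e9 / total_samples:.2f}")
```

Swapping in your own series count, scrape interval, and retention period gives a first-order figure for sizing the PersistentVolumeClaim used later in the deployment.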
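Prometheus is well suited to recording any purely numeric time series. It fits both machine-centric monitoring and the monitoring of highly dynamic service-oriented architectures.
The monitoring stack as a whole involves a fairly large set of technologies and covers almost every scenario found in real-world enterprises. The main components are:
Deploying an externally accessible Prometheus:
The deployment order is shown in the figure below: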
$ kubectl create namespace monitor
Create the RBAC rules. They consist of three YAML documents: a ServiceAccount, a ClusterRole, and a ClusterRoleBinding.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: monitor
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions"]
  # The resource name must be the plural form.
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  # Note: this binds the built-in cluster-admin role, not the
  # prometheus ClusterRole defined above.
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: monitor
Verify:
$ kubectl get sa prometheus -n monitor
$ kubectl get clusterrole prometheus
$ kubectl get clusterrolebinding prometheus
Create the Prometheus configuration file as a ConfigMap:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitor
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
      external_labels:
        cluster: "kubernetes"
    ############ Scrape jobs ###################
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets: ['127.0.0.1:9090']
        labels:
          instance: prometheus
    ############ Location of the alerting rule files ###################
    rule_files:
    - /etc/prometheus/rules/*.rules
Verify:
$ kubectl get cm prometheus-config -n monitor
Create the Prometheus rules configuration as a ConfigMap. It contains two parts, general.rules and node.rules. Use the following manifest to create these two additional Prometheus configuration files:
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rules
  namespace: monitor
data:
  general.rules: |
    groups:
    - name: general.rules
      rules:
      - alert: InstanceDown
        expr: |
          up{job=~"k8s-nodes|prometheus"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}) has been down for more than 1 minute."
  node.rules: |
    groups:
    - name: node.rules
      rules:
      - alert: NodeFilesystemUsage
        expr: |
          100 - (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 > 85
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Instance {{ $labels.instance }}: usage of {{ $labels.mountpoint }} is too high"
          description: "{{ $labels.instance }} (hostname: {{ $labels.hostname }}): usage of {{ $labels.mountpoint }} is above 85% (current value: {{ $value }})"
Verify:
$ kubectl get cm -n monitor prometheus-rules
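To see what the NodeFilesystemUsage rule computes, the following Python sketch evaluates the same expression on plain numbers and applies a simplified model of the `for: 1m` clause. The function names and sample values are assumptions made for illustration; the state handling approximates Prometheus behaviour (pending until the condition has held for the full duration) and is not its actual implementation.

```python
def filesystem_usage_percent(avail_bytes, size_bytes):
    """Same arithmetic as the rule: 100 - (avail / size) * 100."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100 - (avail_bytes / size_bytes) * 100


def alert_state(evaluations, threshold=85.0, hold_seconds=60):
    """Return the alert state after the last evaluation.

    evaluations: list of (timestamp_seconds, usage_percent) in time order.
    """
    active_since = None
    state = "inactive"
    for timestamp, usage in evaluations:
        if usage > threshold:
            if active_since is None:
                active_since = timestamp
            held_for = timestamp - active_since
            state = "firing" if held_for >= hold_seconds else "pending"
        else:
            # The condition stopped holding, so the timer resets.
            active_since = None
            state = "inactive"
    return state


GIB = 2 ** 30
usage = filesystem_usage_percent(avail_bytes=1 * GIB, size_bytes=8 * GIB)
print(f"usage: {usage}%")

# Rule evaluations every 15 s, matching evaluation_interval in the config.
print(alert_state([(t, usage) for t in (0, 15, 30, 45)]))
print(alert_state([(t, usage) for t in (0, 15, 30, 45, 60)]))
```

With a 15-second evaluation interval, the condition has to hold across evaluations spanning a full minute before the alert moves from pending to firing, which filters out short spikes.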
Create the Service for Prometheus:
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  type: ClusterIP
  ports:
  - name: http
    port: 9090
    targetPort: 9090
  selector:
    k8s-app: prometheus
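Prometheus needs to persist its data so that historical data can be recovered after a restart. Here we use the NFS storage deployed in an earlier lesson for persistence.
The data is stored through the StorageClass provided by NFS.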
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data-pvc
  namespace: monitor
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: "nfs-storage"
  resources:
    requests:
      storage: 10Gi
Prometheus Deployment manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: monitor
  labels:
    k8s-app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus
  template:
    metadata:
      labels:
        k8s-app: prometheus
    spec:
      serviceAccountName: prometheus
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 9090
        securityContext:
          runAsUser: 65534
          privileged: true
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--web.enable-lifecycle"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=10d"
        - "--web.console.libraries=/etc/prometheus/console_libraries"
        - "--web.console.templates=/etc/prometheus/consoles"
        resources:
          limits:
            cpu: 2000m
            memory: 2048Mi
          requests:
            cpu: 1000m
            memory: 512Mi
        readinessProbe:
          httpGet:
            path: /-/ready
            port: 9090
          initialDelaySeconds: 5
          timeoutSeconds: 10
        livenessProbe:
          httpGet:
            path: /-/healthy
            port: 9090
          initialDelaySeconds: 30
          timeoutSeconds: 30
        volumeMounts:
        - name: data
          mountPath: /prometheus
          subPath: prometheus
        - name: config
          mountPath: /etc/prometheus
        - name: prometheus-rules
          mountPath: /etc/prometheus/rules
      - name: configmap-reload
        image: jimmidyson/configmap-reload:v0.5.0
        imagePullPolicy: IfNotPresent
        args:
        - "--volume-dir=/etc/config"
        - "--webhook-url=http://localhost:9090/-/reload"
        resources:
          limits:
            cpu: 100m
            memory: 100Mi
          requests:
            cpu: 10m
            memory: 10Mi
        volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: prometheus-data-pvc
      - name: prometheus-rules
        configMap:
          name: prometheus-rules
      - name: config
        configMap:
          name: prometheus-config
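The containers section of the Deployment manifest configures two containers:
- prometheus: the Prometheus server itself.
- configmap-reload: watches the mounted configuration directory (/etc/config) and calls http://localhost:9090/-/reload when it changes, so that Prometheus reloads its configuration.
Prometheus arguments used in the manifest above:
- --config.file: path of the configuration file to load.
- --web.enable-lifecycle: enables the HTTP endpoints for reload and shutdown (/-/reload and /-/quit).
- --storage.tsdb.path: directory where the time series data is stored.
- --storage.tsdb.retention.time: how long data is kept (10 days here).
- --web.console.libraries: path to the console library directory.
- --web.console.templates: path to the console template directory.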
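Because `--web.enable-lifecycle` is set, a configuration reload can also be triggered by hand with an HTTP POST to `/-/reload`, which is the same call the configmap-reload sidecar makes through its webhook URL. The sketch below is a minimal example using only the Python standard library; the helper names and the default base URL (taken from the sidecar's `--webhook-url`) are assumptions for illustration.

```python
import urllib.request


def build_reload_request(base_url="http://localhost:9090"):
    """Build the POST request for the Prometheus reload endpoint."""
    url = f"{base_url.rstrip('/')}/-/reload"
    return urllib.request.Request(url, data=b"", method="POST")


def reload_prometheus(base_url="http://localhost:9090", timeout=5.0):
    """Send the reload request; returns True on an HTTP 200 response.

    Needs network access to a running Prometheus, for example from
    inside the pod or through a port-forward.
    """
    request = build_reload_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status == 200


request = build_reload_request()
print(request.get_method(), request.full_url)
```

Only `build_reload_request` runs when the snippet is executed as-is; call `reload_prometheus()` once the endpoint is reachable from where the script runs.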
Verify:
$ kubectl get deploy -n monitor
$ kubectl get pods -n monitor
Create the Ingress that exposes Prometheus on an external domain name:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  namespace: monitor
  name: prometheus-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: prometheus.kubernets.cn
    http:
      paths:
      - pathType: Prefix
        backend:
          service:
            name: prometheus
            port:
              number: 9090
        path: /
Access check:
# curl prometheus.kubernets.cn
<a href="/graph">Found</a>.
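Beyond the redirect returned by curl, the deployment can be checked through the Prometheus HTTP API by running an instant query such as `up`. The sketch below builds the `/api/v1/query` URL and extracts the instances that report 0 from a response body. The helper names are hypothetical, and `SAMPLE_RESPONSE` is a hand-written example in the shape of an instant-query result, not output captured from this cluster.

```python
import json
import urllib.parse


def build_query_url(base_url, promql):
    """URL for an instant query against the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def down_instances(response_body):
    """Return the instance labels whose sample value is 0."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    return [
        result["metric"].get("instance", "<unknown>")
        for result in payload["data"]["result"]
        # Each value is a [timestamp, "string value"] pair.
        if float(result["value"][1]) == 0
    ]


# Hand-written sample response (assumed data, for illustration only).
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "up", "job": "prometheus",
                        "instance": "prometheus"},
             "value": [1700000000.0, "1"]},
            {"metric": {"__name__": "up", "job": "k8s-nodes",
                        "instance": "node-1"},
             "value": [1700000000.0, "0"]},
        ],
    },
})

print(build_query_url("http://prometheus.kubernets.cn", "up"))
print(down_instances(SAMPLE_RESPONSE))
```

Fetching the built URL with any HTTP client and passing the body to `down_instances` gives a quick check that mirrors the InstanceDown alert condition.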
The Prometheus monitoring platform: