
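Prometheus Installation and Configuration: Installing, Configuring, Upgrading and Migrating the Prometheus Monitoring System, Step by Step

1. Prometheus Overview and Environment Planning

Prometheus is an open-source monitoring system and time-series database used to track the performance and health of systems and applications. It works on a pull model: Prometheus periodically scrapes metrics from target systems over HTTP.

1.1 Prometheus Version Notes

At the time of writing, the main Prometheus release line is 2.x, and this tutorial uses Prometheus 2.44.0 throughout. Compared with earlier releases, 2.x brings significant improvements in performance, stability and features, and supports more monitoring capabilities.

# Check the Prometheus version
$ prometheus --version
prometheus, version 2.44.0 (branch: HEAD, revision: 734b0952f4c3a96b8586e05e661312425a1a05b0)
build user: root@1a2b3c4d5e6f
build date: 2023-04-05T15:33:19Z
go version: go1.19.6
platform: linux/amd64

# Check the OS version
$ cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.9"
ID="ol"
PRETTY_NAME="Oracle Linux Server 8.9"

# Check the kernel version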
$ uname -r
5.4.17-2136.302.7.2.el8uek.x86_64

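1.2 Environment Planning

The environment for this installation is planned as follows:

Monitoring servers:
monitor01.fgedu.net.cn (192.168.1.81): Prometheus primary node
monitor02.fgedu.net.cn (192.168.1.82): Prometheus standby node

Prometheus version: 2.44.0
AlertManager version: 0.25.0
Grafana version: 9.5.2
Installation method: binary (tarball) installation
Data storage: local file system + NFS shared storage (note: the Prometheus documentation states that NFS is not supported for the TSDB's local storage, so keep the data directory on local disk and use NFS for shared files such as backups)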

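2. Hardware Requirements

How much hardware Prometheus needs depends on two factors. The first is the number of monitored targets: more precisely, how many active time series they expose and how often they are scraped. The second is how long data is retained.

2.1 Physical Host Requirements

# Monitoring server requirements
- CPU: at least 8 cores
- Memory: at least 32 GB
- Disk: 120 GB SSD system disk + 1 TB SSD data disk

# Check the monitoring server's resources
# free -h
              total        used        free      shared  buff/cache   available
Mem:            32G        8.4G         22G        512M        3.6G         23G
Swap:            8G          0B          8G

# Check disk space
# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       120G   20G  100G  17% /
/dev/sdb1        1TB   50G  950G   5% /data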
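For rough sizing, the Prometheus storage documentation gives a formula. Disk usage is about retention time × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample after compression. The shell function below is a minimal sketch of that formula.

Several parts of it are assumptions:
- the name `estimate_disk_gib`;
- the default of 2 bytes per sample;
- the example series count. Take your real active-series count from your own environment, for example from the `prometheus_tsdb_head_series` metric.

```shell
#!/usr/bin/env bash
# Rough TSDB disk-size estimate (hypothetical helper, not part of Prometheus):
#   bytes = retention_seconds * samples_per_second * bytes_per_sample
# where samples_per_second = active_series / scrape_interval_seconds.
estimate_disk_gib() {
  local retention_days=$1          # e.g. 15 (matches --storage.tsdb.retention.time=15d)
  local active_series=$2           # number of active time series
  local scrape_interval_s=$3       # e.g. 15 (matches scrape_interval: 15s)
  local bytes_per_sample=${4:-2}   # assumption: 1-2 bytes per sample after compression
  awk -v d="$retention_days" -v s="$active_series" -v i="$scrape_interval_s" -v b="$bytes_per_sample" \
    'BEGIN { printf "%.1f\n", d * 86400 * (s / i) * b / (1024 * 1024 * 1024) }'
}

# Example: 15 days of retention, 1,000,000 active series scraped every 15s
estimate_disk_gib 15 1000000 15   # prints 160.9 (GiB)
```

Treat the result as a lower bound. The write-ahead log and temporary space used during compaction come on top of it.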

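Production recommendations:
- Run at least two monitoring servers for high availability.
- Use SSD storage for better I/O performance.
- Provide 10 Gbps or more of network bandwidth to carry large volumes of metric traffic.

2.2 vSphere Virtual Machine Requirements

Virtual machine configuration:
- Monitoring server:
  - vCPU: 8 cores
  - Memory: 32 GB
  - Disk: 120 GB SSD system disk + 1 TB SSD data disk
  - Network: VMXNET3 adapter, 10 Gbps network

Resource pool configuration:
- CPU reservation: 4 GHz
- Memory reservation: 16 GB
- Memory limit: 32 GB
- CPU shares: Normal
- Memory shares: Normal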

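2.3 Cloud Host Requirements

Cloud instance specification (Alibaba Cloud / Tencent Cloud / Huawei Cloud):
- Monitoring server:
  - Instance type: ecs.g6.4xlarge or an equivalent type
  - vCPU: 16 cores
  - Memory: 64 GB
  - System disk: 120 GB SSD cloud disk
  - Data disk: 1 TB SSD cloud disk
  - Network bandwidth: 10 Gbps or more

Storage configuration:
- OSS object storage: stores backups of the monitoring data
- NAS file storage: shared storage for monitoring data
- Cloud disk snapshots: periodic backups of the monitoring data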

3. Operating System Preparation

Before installing Prometheus, configure and tune the operating system as described below.

3.1 OS Version Check

# Check the OS version
# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.9"
ID="ol"
PRETTY_NAME="Oracle Linux Server 8.9"

# Check the kernel version
# uname -r
5.4.17-2136.302.7.2.el8uek.x86_64

# Check the SELinux status
# getenforce
Enforcing

# Check the firewall status
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running)

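3.2 Installing Dependencies

# Install the required packages
# dnf install -y wget curl tar gzip

# Stop and disable the firewall
# (test environment; in production, prefer opening only the ports in use, e.g.
#  firewall-cmd --permanent --add-port=9090/tcp for each port, then firewall-cmd --reload)
# systemctl stop firewalld
# systemctl disable firewalld

# Disable SELinux
# (setenforce 0 switches to permissive mode immediately; the config change applies after a reboot)
# setenforce 0
# sed -i 's/SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config

# Create the prometheus user (system account without a login shell)
# useradd -r -s /bin/false prometheus

# Create the directory structure
# mkdir -p /data/prometheus/{data,config,bin}
# chown -R prometheus:prometheus /data/prometheus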

3.3 Network Configuration

# Configure a static IP address
# vi /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.1.81
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
DNS1=114.114.114.114

# Restart networking
# systemctl restart NetworkManager

# Verify network connectivity
# ping -c 4 google.com

4. Installing and Configuring Prometheus

With the environment prepared, install Prometheus.

4.1 Install Prometheus

# Download Prometheus
# wget https://github.com/prometheus/prometheus/releases/download/v2.44.0/prometheus-2.44.0.linux-amd64.tar.gz

# Extract the archive; consoles and console_libraries are also needed, because the
# --web.console.* flags in the service file below point at them
# tar -xzf prometheus-2.44.0.linux-amd64.tar.gz
# mv prometheus-2.44.0.linux-amd64/{prometheus,promtool,consoles,console_libraries} /data/prometheus/bin/

# Create the configuration file
# vi /data/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

# Create the rules directory
# mkdir -p /data/prometheus/config/rules

# Create the systemd service file
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/prometheus \
  --config.file=/data/prometheus/config/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --web.console.templates=/data/prometheus/bin/consoles \
  --web.console.libraries=/data/prometheus/bin/console_libraries \
  --web.listen-address=:9090
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Start Prometheus and enable it at boot
# systemctl daemon-reload
# systemctl start prometheus
# systemctl enable prometheus

# Verify the installation
# systemctl status prometheus
# curl http://localhost:9090/metrics
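Before starting the service, it is worth validating the configuration file, because Prometheus will not start with an invalid one. `promtool check config` does this; promtool ships in the same tarball. `amtool check-config` does the same for the AlertManager file in 4.3.

The wrapper below is a small sketch. The function name `check_prom_config` and the `PROMTOOL` override variable are hypothetical conveniences.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around "promtool check config"; set PROMTOOL to use another binary.
check_prom_config() {
  local cfg=${1:-/data/prometheus/config/prometheus.yml}
  local promtool=${PROMTOOL:-/data/prometheus/bin/promtool}
  if [ ! -x "$promtool" ]; then
    echo "promtool not found or not executable: $promtool" >&2
    return 2
  fi
  # Parses prometheus.yml and every rule file matched by rule_files
  "$promtool" check config "$cfg"
}

# Usage:
#   check_prom_config && systemctl start prometheus
```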
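Right after `systemctl start`, Prometheus may still be replaying its write-ahead log, so a single `curl` can fail spuriously. Prometheus exposes `/-/healthy` and `/-/ready` endpoints for this purpose.

The sketch below polls `/-/ready` until it returns HTTP 200 or a timeout of about the given number of seconds expires. The function name and the 30-second default are assumptions. AlertManager exposes the same endpoints on port 9093.

```shell
#!/usr/bin/env bash
# Poll a readiness URL until it returns HTTP 200 or about timeout_s seconds have passed.
wait_until_ready() {
  local url=${1:-http://localhost:9090/-/ready}
  local timeout_s=${2:-30}
  local waited=0
  until [ "$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$url")" = "200" ]; do
    if [ "$waited" -ge "$timeout_s" ]; then
      echo "not ready after ${timeout_s}s: $url" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "ready: $url"
}

# Usage after "systemctl start prometheus":
#   wait_until_ready && curl -s http://localhost:9090/metrics | head
```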

4.2 Install Node Exporter

# Download Node Exporter
# wget https://github.com/prometheus/node_exporter/releases/download/v1.5.0/node_exporter-1.5.0.linux-amd64.tar.gz

# Extract the archive
# tar -xzf node_exporter-1.5.0.linux-amd64.tar.gz
# mv node_exporter-1.5.0.linux-amd64/node_exporter /data/prometheus/bin/

# Create the systemd service file
# vi /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/node_exporter
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Start Node Exporter and enable it at boot
# systemctl daemon-reload
# systemctl start node_exporter
# systemctl enable node_exporter

# Verify the installation
# systemctl status node_exporter
# curl http://localhost:9100/metrics
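The `/metrics` output is plain text in the Prometheus exposition format: `# HELP` and `# TYPE` comment lines followed by `name{labels} value` samples. A single value can therefore be pulled out with `awk` when verifying an exporter.

This is a minimal sketch. The helper name is hypothetical, and it simply prints the value of the first sample whose metric name matches.

```shell
#!/usr/bin/env bash
# Print the value of the first sample named $1 from exposition-format text on stdin.
# Matches "name value" and "name{labels} value"; skips # HELP / # TYPE lines.
metric_value() {
  awk -v name="$1" '
    /^#/ { next }
    $1 == name || index($1, name "{") == 1 { print $NF; exit }
  '
}

# Usage against the local Node Exporter:
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```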

4.3 Install AlertManager

# Download AlertManager
# wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz

# Extract the archive
# tar -xzf alertmanager-0.25.0.linux-amd64.tar.gz
# mv alertmanager-0.25.0.linux-amd64/{alertmanager,amtool} /data/prometheus/bin/

# Create the configuration file
# vi /data/prometheus/config/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@fgedu.net.cn'
        from: 'prometheus@fgedu.net.cn'
        smarthost: 'smtp.fgedu.net.cn:25'
        auth_username: 'prometheus'
        auth_password: 'password'

# Check the file syntax
# /data/prometheus/bin/amtool check-config /data/prometheus/config/alertmanager.yml

# Create the systemd service file
# vi /etc/systemd/system/alertmanager.service
[Unit]
Description=AlertManager
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/alertmanager \
  --config.file=/data/prometheus/config/alertmanager.yml \
  --storage.path=/data/prometheus/data/alertmanager
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Start AlertManager and enable it at boot
# systemctl daemon-reload
# systemctl start alertmanager
# systemctl enable alertmanager

# Verify the installation
# systemctl status alertmanager
# curl http://localhost:9093/metrics

5. Prometheus Configuration Tuning

To improve Prometheus performance and stability, apply the following configuration tuning.

5.1 Storage Configuration Tuning

# Edit the Prometheus configuration (prometheus.yml itself stays the same as in 4.1)
# vi /data/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']

# Edit the systemd service file and add --storage.tsdb.wal-compression
# (WAL compression has been on by default since Prometheus 2.20, so on 2.44 the flag
#  mainly makes the setting explicit)
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/prometheus \
  --config.file=/data/prometheus/config/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.wal-compression \
  --web.console.templates=/data/prometheus/bin/consoles \
  --web.console.libraries=/data/prometheus/bin/console_libraries \
  --web.listen-address=:9090
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Restart Prometheus
# systemctl daemon-reload
# systemctl restart prometheus

5.2 High-Availability Configuration

# Install Prometheus on the standby node
# (repeat the installation steps used on the primary node)

# Configure Prometheus on the primary node
# vi /data/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
            - monitor02.fgedu.net.cn:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090', 'monitor02.fgedu.net.cn:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', 'monitor02.fgedu.net.cn:9100']

# Configure Prometheus on the standby node
# vi /data/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
            - monitor01.fgedu.net.cn:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090', 'monitor01.fgedu.net.cn:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', 'monitor01.fgedu.net.cn:9100']

# Note: each Prometheus sends its alerts to both AlertManagers. The two AlertManagers only
# deduplicate notifications between themselves when they are clustered (for example, start
# each one with --cluster.peer=<other node>:9094); otherwise both send the notification.

# Restart Prometheus (on both nodes)
# systemctl restart prometheus
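The two prometheus.yml files above differ only in the peer host name. Generating both from one template keeps the nodes from drifting apart.

The function below is a sketch of that idea. It assumes the host names from the environment plan in 1.2, and the function name is hypothetical. On each node, redirect its output to /data/prometheus/config/prometheus.yml and validate it with `promtool check config` before restarting.

```shell
#!/usr/bin/env bash
# Render the HA prometheus.yml for one node; $1 is the *other* node's host name.
render_ha_config() {
  local peer=$1
  # Unquoted heredoc: ${peer} is expanded, while "rules/*.yml" is written literally
  cat <<EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "rules/*.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
            - ${peer}:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090', '${peer}:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100', '${peer}:9100']
EOF
}

# On monitor01:  render_ha_config monitor02.fgedu.net.cn > /data/prometheus/config/prometheus.yml
# On monitor02:  render_ha_config monitor01.fgedu.net.cn > /data/prometheus/config/prometheus.yml
```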

5.3 Memory Configuration

# Edit the systemd service file
# GODEBUG=madvdontneed=1 makes the Go runtime return freed memory with MADV_DONTNEED, so the
# process RSS drops as soon as memory is released. Go 1.16 and later already default to this
# on Linux, so for a 2.44 build the setting only makes it explicit.
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network.target

[Service]
User=prometheus
Environment="GODEBUG=madvdontneed=1"
ExecStart=/data/prometheus/bin/prometheus \
  --config.file=/data/prometheus/config/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.wal-compression \
  --web.console.templates=/data/prometheus/bin/consoles \
  --web.console.libraries=/data/prometheus/bin/console_libraries \
  --web.listen-address=:9090
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Restart Prometheus
# systemctl daemon-reload
# systemctl restart prometheus

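6. Prometheus Exporter Configuration

Prometheus collects metrics from systems and applications through exporters.

6.1 Node Exporter Configuration

# Edit the Node Exporter service file
# systemd and processes are disabled by default in Node Exporter and are switched on here;
# the other collectors listed (diskstats, filesystem, netstat, loadavg, meminfo, cpu) are
# already enabled by default, so their flags only make that explicit
# vi /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/node_exporter \
  --collector.systemd \
  --collector.processes \
  --collector.diskstats \
  --collector.filesystem \
  --collector.netstat \
  --collector.loadavg \
  --collector.meminfo \
  --collector.cpu
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Restart Node Exporter
# systemctl daemon-reload
# systemctl restart node_exporter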

6.2 Blackbox Exporter Configuration

# Download Blackbox Exporter
# wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.23.0/blackbox_exporter-0.23.0.linux-amd64.tar.gz

# Extract the archive
# tar -xzf blackbox_exporter-0.23.0.linux-amd64.tar.gz
# mv blackbox_exporter-0.23.0.linux-amd64/blackbox_exporter /data/prometheus/bin/

# Create the configuration file
# vi /data/prometheus/config/blackbox.yml
modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      # Go reports HTTP/2 responses with the protocol string "HTTP/2.0"
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200, 201, 202, 203, 204, 205, 206, 207, 208, 226]

  tcp_connect:
    prober: tcp
    timeout: 5s

  icmp:
    prober: icmp
    timeout: 5s

# Create the systemd service file
# vi /etc/systemd/system/blackbox_exporter.service
[Unit]
Description=Blackbox Exporter
After=network.target

[Service]
User=prometheus
# The icmp module needs raw sockets, which a non-root user only gets with CAP_NET_RAW
AmbientCapabilities=CAP_NET_RAW
ExecStart=/data/prometheus/bin/blackbox_exporter \
  --config.file=/data/prometheus/config/blackbox.yml \
  --web.listen-address=:9115
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Start Blackbox Exporter and enable it at boot
# systemctl daemon-reload
# systemctl start blackbox_exporter
# systemctl enable blackbox_exporter

# Configure Prometheus: add this job under the existing scrape_configs in prometheus.yml
# vi /data/prometheus/config/prometheus.yml
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - http://prometheus.io
          - http://grafana.com
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: localhost:9115
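The three relabel rules above work together:
1. The first turns each listed URL into a `target` query parameter.
2. The second keeps that URL as the `instance` label.
3. The third sends the actual scrape to the Blackbox Exporter.

The sketch below reproduces the request URL this should produce, so the probe can be tried by hand with `curl`. The parameters appear in alphabetical order, as Go's query encoding writes them.

The helper names are hypothetical. The percent-encoding only mirrors Go's for ASCII input.

```shell
#!/usr/bin/env bash
# Percent-encode a string for a URL query (ASCII only; unreserved characters kept as-is).
urlencode() {
  local LC_ALL=C s=$1 out="" c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9._~-]) out+=$c ;;
      *) out+=$(printf '%%%02X' "'$c") ;;   # e.g. ":" -> %3A, "/" -> %2F
    esac
  done
  printf '%s\n' "$out"
}

# Build the URL that the 'blackbox' job scrapes for one target.
blackbox_probe_url() {
  local exporter=$1 module=$2 target=$3
  printf 'http://%s/probe?module=%s&target=%s\n' \
    "$exporter" "$(urlencode "$module")" "$(urlencode "$target")"
}

# Try a probe by hand; probe_success 1 means the check passed:
#   curl -s "$(blackbox_probe_url localhost:9115 http_2xx http://prometheus.io)" | grep probe_success
```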

6.3 MySQL Exporter Configuration

# Download MySQL Exporter
# wget https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz

# Extract the archive
# tar -xzf mysqld_exporter-0.14.0.linux-amd64.tar.gz
# mv mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter /data/prometheus/bin/

# Create a MySQL user for the exporter
# mysql -u root -p
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
EXIT;

# Create the credentials file
# vi /data/prometheus/config/.my.cnf
[client]
user=exporter
password=password

# The file contains a password: make it readable only by the prometheus user
# chown prometheus:prometheus /data/prometheus/config/.my.cnf
# chmod 600 /data/prometheus/config/.my.cnf

# Create the systemd service file
# vi /etc/systemd/system/mysqld_exporter.service
[Unit]
Description=MySQL Exporter
After=network.target

[Service]
User=prometheus
ExecStart=/data/prometheus/bin/mysqld_exporter \
  --config.my-cnf=/data/prometheus/config/.my.cnf \
  --web.listen-address=:9104
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Start MySQL Exporter and enable it at boot
# systemctl daemon-reload
# systemctl start mysqld_exporter
# systemctl enable mysqld_exporter

# Configure Prometheus: add this job under the existing scrape_configs in prometheus.yml
# vi /data/prometheus/config/prometheus.yml
scrape_configs:
  - job_name: 'mysql'
    static_configs:
      - targets: ['localhost:9104']

7. AlertManager Configuration

AlertManager handles the alerts generated by Prometheus (grouping, deduplication, inhibition) and routes them to the configured notification channels.

7.1 AlertManager Configuration

# Edit the AlertManager configuration
# vi /data/prometheus/config/alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.fgedu.net.cn:25'
  smtp_from: 'prometheus@fgedu.net.cn'
  smtp_auth_username: 'prometheus'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
  routes:
    - match:
        severity: critical
      receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@fgedu.net.cn'
        send_resolved: true

  # Defined but not used by any route yet; set receiver: 'slack' in a route to use it
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXXXXXXXXXXXXXXXXXX'
        channel: '#alerts'
        send_resolved: true

inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'cluster', 'service']

# Restart AlertManager
# systemctl restart alertmanager

7.2 Alert Rule Configuration

# Create the alert rule file
# vi /data/prometheus/config/rules/node-alerts.yml
groups:
  - name: node-alerts
    rules:
      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} down"
          description: "{{ $labels.instance }} has been down for more than 5 minutes"

      - alert: HighCPUUsage
        expr: (100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% for more than 5 minutes"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 80% for more than 5 minutes"

      - alert: HighDiskUsage
        expr: (node_filesystem_size_bytes{mountpoint="/"} - node_filesystem_free_bytes{mountpoint="/"}) / node_filesystem_size_bytes{mountpoint="/"} * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High disk usage on {{ $labels.instance }}"
          description: "Disk usage is above 80% for more than 5 minutes"

# Check the rule file, then restart Prometheus
# /data/prometheus/bin/promtool check rules /data/prometheus/config/rules/node-alerts.yml
# systemctl restart prometheus
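Node Exporter's `node_memory_*_bytes` gauges come from `/proc/meminfo`, which reports the values in kB. The HighMemoryUsage expression can therefore be cross-checked on the host itself.

The sketch below applies the same formula, (MemTotal - MemAvailable) / MemTotal * 100, to `/proc/meminfo` or to a file in the same format. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Same formula as the HighMemoryUsage rule, computed from /proc/meminfo-style input.
mem_used_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total > 0) printf "%.1f\n", (total - avail) / total * 100
      else { print "MemTotal not found" > "/dev/stderr"; exit 1 }
    }
  ' "${1:-/proc/meminfo}"
}

# On the monitored host (the rule fires when this stays above 80 for 5 minutes):
#   mem_used_pct
```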

8. Grafana Integration

Grafana is used to visualize the metrics collected by Prometheus.

8.1 Install Grafana

# Install Grafana
# dnf install -y https://dl.grafana.com/oss/release/grafana-9.5.2-1.x86_64.rpm

# Start Grafana and enable it at boot
# systemctl start grafana-server
# systemctl enable grafana-server

# Verify the installation
# systemctl status grafana-server
# curl http://localhost:3000

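8.2 Configure Grafana

# Open the Grafana web UI
# Browse to http://localhost:3000
# Log in with username admin and password admin (Grafana asks you to change it on first login)

# Add the Prometheus data source
# 1. In the left-hand menu, click "Configuration" -> "Data sources"
# 2. Click "Add data source"
# 3. Select "Prometheus"
# 4. Set the URL to http://localhost:9090
# 5. Click "Save & Test"

# Import a dashboard
# 1. In the left-hand menu, click "Dashboards" -> "Import"
# 2. Enter dashboard ID 1860 (Node Exporter Full)
# 3. Click "Load"
# 4. Select the Prometheus data source
# 5. Click "Import"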
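Instead of clicking through the UI, the data source can also be declared in a provisioning file. Grafana reads such files at startup from /etc/grafana/provisioning/datasources/, which is the default location for the RPM package.

The function below is a sketch that writes such a file. Several names are assumptions: the function name, the file name prometheus.yaml, and the data source name. Restart grafana-server afterwards.

```shell
#!/usr/bin/env bash
# Write a Grafana data-source provisioning file pointing at the local Prometheus.
write_grafana_datasource() {
  local dir=${1:-/etc/grafana/provisioning/datasources}
  local url=${2:-http://localhost:9090}
  mkdir -p "$dir"
  cat > "$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Usage (as root):
#   write_grafana_datasource && systemctl restart grafana-server
```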

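9. Prometheus Security Configuration

Prometheus can protect its HTTP endpoints (web UI, API and /metrics) with basic authentication and TLS encryption. Both are configured in a separate web configuration file passed to Prometheus with the --web.config.file flag; they are not settings in prometheus.yml.

9.1 Authentication Configuration

# Install htpasswd (provided by httpd-tools)
# dnf install -y httpd-tools

# Generate a bcrypt password hash (Prometheus only accepts bcrypt hashes)
# htpasswd -nBC 10 admin
New password:
Re-type new password:
admin:$2y$10$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Create the web configuration file with the hash (the part after "admin:")
# vi /data/prometheus/config/web.yml
basic_auth_users:
  admin: $2y$10$xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Add this flag to ExecStart in /etc/systemd/system/prometheus.service
--web.config.file=/data/prometheus/config/web.yml

# Restart Prometheus
# systemctl daemon-reload
# systemctl restart prometheus

# Once authentication is on, the 'prometheus' self-scrape job needs a basic_auth section
# (username/password), and curl checks need -u admin:<password>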

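9.2 TLS Encryption Configuration

# Generate a self-signed TLS certificate (valid for 365 days)
# openssl req -newkey rsa:2048 -nodes -keyout /data/prometheus/config/prometheus.key -x509 -days 365 -out /data/prometheus/config/prometheus.crt
# chown prometheus:prometheus /data/prometheus/config/prometheus.key /data/prometheus/config/prometheus.crt

# Add the following to the web configuration file /data/prometheus/config/web.yml
# (loaded via --web.config.file=/data/prometheus/config/web.yml in the service file's ExecStart)
tls_server_config:
  cert_file: /data/prometheus/config/prometheus.crt
  key_file: /data/prometheus/config/prometheus.key

# Restart Prometheus; the UI and API are then served over https://
# systemctl daemon-reload
# systemctl restart prometheus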

10. Prometheus Performance Tuning

In production, tune Prometheus so that it keeps monitoring efficiently as the load grows.

10.1 Storage Tuning

# Edit the Prometheus service file
# Note: setting --storage.tsdb.min-block-duration and --storage.tsdb.max-block-duration to the
# same 2h value disables local compaction. This is normally done only when a Thanos sidecar
# uploads the blocks; on a standalone server, consider leaving both flags out.
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network.target

[Service]
User=prometheus
Environment="GODEBUG=madvdontneed=1"
ExecStart=/data/prometheus/bin/prometheus \
  --config.file=/data/prometheus/config/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.wal-compression \
  --storage.tsdb.max-block-duration=2h \
  --storage.tsdb.min-block-duration=2h \
  --web.console.templates=/data/prometheus/bin/consoles \
  --web.console.libraries=/data/prometheus/bin/console_libraries \
  --web.listen-address=:9090
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Restart Prometheus
# systemctl daemon-reload
# systemctl restart prometheus

10.2 Memory Tuning

# Edit the Prometheus service file
# GOMAXPROCS limits how many OS threads execute Go code at the same time. Go already defaults
# it to the number of CPUs, so set it explicitly only to cap Prometheus below the core count.
# vi /etc/systemd/system/prometheus.service
[Unit]
Description=Prometheus Monitoring System
After=network.target

[Service]
User=prometheus
Environment="GODEBUG=madvdontneed=1"
Environment="GOMAXPROCS=8"
ExecStart=/data/prometheus/bin/prometheus \
  --config.file=/data/prometheus/config/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.wal-compression \
  --web.console.templates=/data/prometheus/bin/consoles \
  --web.console.libraries=/data/prometheus/bin/console_libraries \
  --web.listen-address=:9090
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

# Restart Prometheus
# systemctl daemon-reload
# systemctl restart prometheus

10.3 Scrape Configuration Tuning

# Edit the Prometheus configuration (only the relevant sections are shown). A job-level
# scrape_interval overrides the global one, so less critical targets can be scraped less often.
# vi /data/prometheus/config/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:9100']

  - job_name: 'mysql'
    scrape_interval: 60s
    static_configs:
      - targets: ['localhost:9104']

# Restart Prometheus
# systemctl restart prometheus
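The effect of per-job intervals follows from samples per second = series per scrape / scrape interval. Doubling a job's interval halves the samples it adds to memory and disk.

The awk sketch below totals this for a list of jobs. The per-job series counts in the example are made-up placeholders. Real counts can be read from each target's `scrape_samples_scraped` metric.

```shell
#!/usr/bin/env bash
# Estimate ingestion rate from lines of "job series_per_scrape scrape_interval_seconds".
ingest_rate() {
  awk '
    NF == 3 && $3 > 0 { rate = $2 / $3; total += rate; printf "%-12s %10.1f samples/s\n", $1, rate }
    END { printf "%-12s %10.1f samples/s\n", "total", total }
  '
}

# Hypothetical series counts; the intervals match the configuration above
ingest_rate <<'EOF'
prometheus 1000 15
node       2000 30
mysql      1200 60
EOF
```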

11. Prometheus Upgrade and Migration

This section covers upgrading Prometheus to a new version and migrating its data to another server.

11.1 Prometheus Version Upgrade

# Stop Prometheus first, so that the copy of the data directory is consistent
# systemctl stop prometheus

# Back up the Prometheus data and configuration
# mkdir -p /backup
# cp -r /data/prometheus/data /backup/prometheus-data-$(date +%Y%m%d)
# cp /data/prometheus/config/prometheus.yml /backup/prometheus-config-$(date +%Y%m%d).yml

# Download the new Prometheus release
# wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz

# Extract the archive and replace the binaries and console files
# tar -xzf prometheus-2.45.0.linux-amd64.tar.gz
# rm -rf /data/prometheus/bin/{consoles,console_libraries}
# mv prometheus-2.45.0.linux-amd64/{prometheus,promtool,consoles,console_libraries} /data/prometheus/bin/

# Start Prometheus
# systemctl start prometheus

# Verify the upgrade
# /data/prometheus/bin/prometheus --version
prometheus, version 2.45.0 (branch: HEAD, revision: abcdefg1234567890abcdefg1234567890abcdefg)
build user: root@1a2b3c4d5e6f
build date: 2023-05-01T15:33:19Z
go version: go1.19.6
platform: linux/amd64

# Open the Prometheus web UI
# Browse to http://localhost:9090
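To script the post-upgrade check, parse the first line of `prometheus --version` and compare it with the expected release. This is a minimal sketch. The helper names are hypothetical, and `sort -V` assumes GNU coreutils, which Oracle Linux provides.

```shell
#!/usr/bin/env bash
# Extract "2.45.0" from "prometheus, version 2.45.0 (branch: ..., revision: ...)".
prom_version() {
  awk 'NR == 1 { for (i = 1; i < NF; i++) if ($i == "version") { print $(i + 1); exit } }'
}

# Succeed if version $1 >= version $2 (natural version ordering via sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage:
#   installed=$(/data/prometheus/bin/prometheus --version 2>&1 | prom_version)
#   version_ge "$installed" 2.45.0 && echo "upgrade OK: $installed"
```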

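11.2 Prometheus Data Migration

# Stop Prometheus
# systemctl stop prometheus

# Copy the data and configuration to the new server
# (Prometheus must already be installed on new-server as described in Section 4)
# scp -r /data/prometheus/data root@new-server:/data/prometheus/
# scp /data/prometheus/config/prometheus.yml root@new-server:/data/prometheus/config/

# On the new server, give the copied data back to the prometheus user and start Prometheus
# chown -R prometheus:prometheus /data/prometheus/data
# systemctl start prometheus

# Verify the migration
# curl http://new-server:9090/metrics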

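12. Prometheus Backup and Restore

This section covers backing up and restoring Prometheus.

12.1 Prometheus Backup

# Create the backup script
# mkdir -p /data/prometheus/scripts
# vi /data/prometheus/scripts/backup.sh

#!/bin/bash
BACKUP_DIR="/backup/prometheus"
DATE=$(date +%Y%m%d)

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Stop Prometheus so that the copied TSDB files are consistent
systemctl stop prometheus

# Back up the data and configuration
cp -r /data/prometheus/data "$BACKUP_DIR/data-$DATE"
cp /data/prometheus/config/prometheus.yml "$BACKUP_DIR/config-$DATE.yml"
cp /data/prometheus/config/alertmanager.yml "$BACKUP_DIR/alertmanager-$DATE.yml"
cp -r /data/prometheus/config/rules "$BACKUP_DIR/rules-$DATE"

# Start Prometheus again
systemctl start prometheus

# Remove backups older than 7 days. Only top-level entries (backup directories and .yml files)
# are matched, so find never descends into a directory it is deleting
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -mtime +7 -exec rm -rf {} +

# Make the script executable
# chmod +x /data/prometheus/scripts/backup.sh

# Add a cron job (daily at 00:00)
# crontab -e
0 0 * * * /data/prometheus/scripts/backup.sh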
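The cold backup above stops Prometheus, so nothing is scraped while the copy runs. As an alternative, Prometheus can write a consistent snapshot of its TSDB while running, through the admin API call `POST /api/v1/admin/tsdb/snapshot`. This API is only available when Prometheus is started with `--web.enable-admin-api`. It also exposes destructive endpoints such as series deletion, so enable it only where port 9090 is not reachable by untrusted users.

The response is JSON of the form `{"status":"success","data":{"name":"<snapshot-name>"}}`, and the snapshot is written under `<data dir>/snapshots/`. The sketch below wraps this call. The function names and the `DATA_DIR` default are assumptions.

```shell
#!/usr/bin/env bash
# Extract the snapshot name from the admin API's JSON response on stdin.
snapshot_name() {
  sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Trigger a TSDB snapshot and print its directory (requires --web.enable-admin-api).
take_snapshot() {
  local api=${1:-http://localhost:9090}
  local data_dir=${DATA_DIR:-/data/prometheus/data}
  local resp name
  resp=$(curl -s -X POST "$api/api/v1/admin/tsdb/snapshot") || return 1
  name=$(printf '%s\n' "$resp" | snapshot_name)
  if [ -z "$name" ]; then
    echo "snapshot failed: $resp" >&2
    return 1
  fi
  echo "$data_dir/snapshots/$name"
}

# Usage: copy the snapshot away, then delete it from the data directory
#   snap=$(take_snapshot) && cp -r "$snap" /backup/prometheus/snapshot-$(date +%Y%m%d) && rm -rf "$snap"
```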

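12.2 Prometheus Restore

# Stop Prometheus
# systemctl stop prometheus

# Remove the existing data
# rm -rf /data/prometheus/data

# Restore the data and configuration (replace 20230405 with the date of the backup to restore)
# cp -r /backup/prometheus/data-20230405 /data/prometheus/data
# cp /backup/prometheus/config-20230405.yml /data/prometheus/config/prometheus.yml
# cp /backup/prometheus/alertmanager-20230405.yml /data/prometheus/config/alertmanager.yml
# cp -r /backup/prometheus/rules-20230405/* /data/prometheus/config/rules/

# cp run as root creates root-owned files: give the data back to the prometheus user
# chown -R prometheus:prometheus /data/prometheus/data

# Start Prometheus, and restart AlertManager so it loads the restored alertmanager.yml
# systemctl start prometheus
# systemctl restart alertmanager

# Verify the restore
# curl http://localhost:9090/metrics
# Browse to http://localhost:9090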

12.3 Prometheus Monitoring Script

# Create the Prometheus monitoring script
# vi /data/prometheus/scripts/monitor.sh

#!/bin/bash
# Checks the Prometheus, AlertManager and Node Exporter services and the Prometheus web endpoint.
# Alerts are sent with mail (mailx package); stopped services are started again.
LOG_FILE="/var/log/prometheus_monitor.log"
ALERT_EMAIL="admin@fgedu.net.cn"

# Check that a systemd service is active; if not, log it, send an alert and start it
check_service() {
    local svc=$1
    echo "$(date): Checking $svc status..." >> "$LOG_FILE"
    if systemctl is-active --quiet "$svc"; then
        echo "$(date): $svc is running" >> "$LOG_FILE"
    else
        echo "$(date): $svc is not running" >> "$LOG_FILE"
        echo "$svc is not running" | mail -s "Prometheus Alert" "$ALERT_EMAIL"
        systemctl start "$svc"
    fi
}

# Check the Prometheus web endpoint. "/" answers with a 302 redirect to /graph, so the
# health endpoint /-/healthy is probed instead (add -u user:password if basic auth is on)
check_prometheus_web() {
    local status
    echo "$(date): Checking prometheus web..." >> "$LOG_FILE"
    status=$(curl -s -o /dev/null --max-time 10 -w "%{http_code}" http://localhost:9090/-/healthy)
    if [ "$status" = "200" ]; then
        echo "$(date): Prometheus web: OK" >> "$LOG_FILE"
    else
        echo "$(date): Prometheus web: FAILED (HTTP $status)" >> "$LOG_FILE"
        echo "Prometheus web failed" | mail -s "Prometheus Alert" "$ALERT_EMAIL"
    fi
}

main() {
    check_service prometheus
    check_prometheus_web
    check_service alertmanager
    check_service node_exporter
}

main

# Make the script executable
# chmod +x /data/prometheus/scripts/monitor.sh

# Add a cron job (every 15 minutes)
# crontab -e
*/15 * * * * /data/prometheus/scripts/monitor.sh

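Production recommendations:
- Back up the Prometheus configuration and data regularly, ideally with one full backup a day.
- Run the monitoring script every 15 minutes so that problems are detected and handled promptly.
- Always stop the Prometheus service before a restore, to avoid inconsistent data.

The steps above cover installing, configuring, tuning, upgrading, migrating, backing up and restoring Prometheus. As an open-source monitoring system, Prometheus collects and analyzes monitoring data efficiently and is an important building block of enterprise monitoring solutions.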

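This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html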
