Prometheus Installation and Configuration: Detailed Procedure for Installing, Configuring, Upgrading, and Migrating Prometheus

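1. Prometheus Overview and Environment Planning

Prometheus is an open-source systems monitoring and alerting toolkit originally developed at SoundCloud. It collects data with a pull model, offers a multi-dimensional data model and the flexible query language PromQL, and is widely used to monitor cloud-native and microservice environments.

1.1 Prometheus Versions

Prometheus 2.45 is a long-term support (LTS) release in the 2.x line. This tutorial uses Prometheus 2.45.0 throughout.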

# Check the Prometheus version
$ prometheus --version
prometheus, version 2.45.0 (branch: HEAD, revision: abc123)
build user: root@buildhost
build date: 20240101-00:00:00
go version: go1.21.0
platform: linux/amd64

# Validate the configuration file (the check is done by promtool, not by the prometheus binary)
$ promtool check config /etc/prometheus/prometheus.yml
Checking /etc/prometheus/prometheus.yml
 SUCCESS: /etc/prometheus/prometheus.yml is valid prometheus config file syntax

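1.2 Environment Planning

The installation environment is planned as follows:

Hostname: fgedudb01.fgedu.net.cn
IP address: 192.168.1.51
HTTP port: 9090
Data directory: /data/prometheus
Configuration directory: /etc/prometheus
Log directory: /var/log/prometheus

Storage planning:
Data retention: 15 days
Scrape interval: 15 seconds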
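
As a rough cross-check of the storage plan, the Prometheus documentation gives the sizing rule needed_disk_space = retention_time_seconds * ingested_samples_per_second * bytes_per_sample, with roughly 1 to 2 bytes per sample after compression. The sketch below applies that rule to the 15-day retention and 15-second interval planned above. The function name estimate_tsdb_bytes, the figure of 100000 active series, and the 2 bytes per sample are assumptions for illustration only; substitute values measured on your own system.

```shell
#!/bin/sh
# Rough TSDB disk estimate (a sketch, not an exact prediction).
# Usage: estimate_tsdb_bytes RETENTION_DAYS SCRAPE_INTERVAL_S ACTIVE_SERIES BYTES_PER_SAMPLE
estimate_tsdb_bytes() {
    retention_days=$1
    scrape_interval_s=$2
    active_series=$3      # assumption: number of time series being scraped
    bytes_per_sample=$4   # assumption: typically 1 to 2 bytes after compression
    # samples per second = active_series / scrape_interval_s;
    # multiply everything first and divide last to stay in integer arithmetic
    echo $(( retention_days * 86400 * active_series * bytes_per_sample / scrape_interval_s ))
}

# 15 days retention, 15 s interval, 100000 series (hypothetical), 2 bytes per sample
estimate_tsdb_bytes 15 15 100000 2
```

With these assumed inputs the estimate is 17280000000 bytes (about 17.3 GB), which fits comfortably on the 200 GB /data volume shown in section 2.2.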

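1.3 Core Features of Prometheus

Main features:
1. Multi-dimensional data model: metric names plus key/value labels
2. PromQL: a powerful query language
3. Pull-based collection: Prometheus actively scrapes metrics from its targets
4. Service discovery: monitoring targets are discovered automatically
5. Alerting: alerts are handled through Alertmanager
6. Visualization: integrates with Grafana
7. Time series storage: efficient local storage
8. Federation: supports hierarchical (multi-level) federation

2. Hardware Requirements and Checks

Before installing Prometheus, check the server hardware and the operating system environment.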

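2.1 Minimum Hardware Requirements

Minimum configuration:
CPU: 2 cores
Memory: 4 GB
Disk: 20 GB

Recommended configuration (production):
CPU: 4 cores or more
Memory: 16 GB or more
Disk: 200 GB or more, SSD

Large-scale monitoring:
CPU: 8 cores or more
Memory: 32 GB or more
Disk: 500 GB or more, SSD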

2.2 System Environment Checks

# Check the operating system release
# cat /etc/redhat-release
Red Hat Enterprise Linux release 8.8 (Ootpa)

# Check the kernel version
# uname -a
Linux fgedudb01 4.18.0-477.10.1.el8_8.x86_64 #1 SMP Sat Apr 4 10:00:00 CST 2026 x86_64 x86_64 x86_64 GNU/Linux

# Check memory
# free -h
total used free shared buff/cache available
Mem: 31Gi 1.0Gi 29Gi 256Mi 1.0Gi 30Gi
Swap: 7Gi 0B 7Gi

# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg_system-lv_root 50G 2.5G 48G 5% /
/dev/sda2 1014M 150M 865M 15% /boot
/dev/mapper/vg_data-lv_data 200G 20G 180G 10% /data

# Check time synchronization
# timedatectl status
Local time: Sat 2026-04-04 10:00:00 CST
Universal time: Sat 2026-04-04 02:00:00 UTC
RTC time: Sat 2026-04-04 02:00:00
Time zone: Asia/Shanghai (CST, +0800)
NTP enabled: yes
NTP synchronized: yes

2.3 Kernel Parameters

# Configure kernel parameters
# vi /etc/sysctl.d/99-prometheus.conf

# Add the following parameters
# Network parameters
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
net.ipv4.tcp_max_syn_backlog = 65535
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 300

# System-wide file descriptor limit
fs.file-max = 655360

# Apply the kernel parameters
# sysctl -p /etc/sysctl.d/99-prometheus.conf

# Sample output:
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535
…

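2.4 User Resource Limits

# Create the service user (no home login shell)
# useradd -r -s /sbin/nologin prometheus

# Configure user limits
# vi /etc/security/limits.conf

# Add the following entries
# Note: limits.conf is applied to PAM login sessions. When Prometheus runs as a
# systemd service, its open-file limit comes from LimitNOFILE in the unit file (section 3.4).
prometheus soft nofile 65535
prometheus hard nofile 65535
prometheus soft nproc 65535
prometheus hard nproc 65535

Production advice: Prometheus is disk-I/O intensive, so SSD storage is recommended. Accurate time is essential for a monitoring system; make sure an NTP service is configured and synchronized.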
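
Because limits.conf and the systemd unit set limits through different mechanisms, it helps to confirm which limit a running process actually received. The following is a minimal sketch that reads the "Max open files" row from text in the format of /proc/<pid>/limits. The helper name soft_nofile_limit is hypothetical, and the sample input is illustrative rather than captured from a real host.

```shell
#!/bin/sh
# Print the soft "Max open files" limit from /proc/<pid>/limits-style text on stdin.
# Row layout: "Max open files  <soft>  <hard>  files", so the soft limit is field 4.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }'
}

# Illustrative input in the /proc/<pid>/limits layout
printf '%s\n' \
    'Limit                     Soft Limit           Hard Limit           Units' \
    'Max open files            65535                65535                files' \
    | soft_nofile_limit

# Against a live service (not run here; requires a running prometheus process):
#   soft_nofile_limit < /proc/$(pidof prometheus)/limits
```

If the value printed for the live service differs from 65535, the unit file's LimitNOFILE setting is the place to look, not limits.conf.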

3. Installing Prometheus

This section describes the installation of Prometheus 2.45 step by step.

3.1 Create the Directory Structure

# Create the directories
# mkdir -p /etc/prometheus
# mkdir -p /data/prometheus
# mkdir -p /var/log/prometheus

# Set ownership and permissions
# chown -R prometheus:prometheus /etc/prometheus
# chown -R prometheus:prometheus /data/prometheus
# chown -R prometheus:prometheus /var/log/prometheus
# chmod -R 750 /data/prometheus
# chmod -R 750 /var/log/prometheus

# Verify the directory permissions
# ls -ld /data/prometheus
drwxr-x---. 2 prometheus prometheus 6 Apr  4 10:00 /data/prometheus

3.2 Download and Install Prometheus

# Download Prometheus
# cd /usr/local/src
# wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz

# Sample output:
--2026-04-04 10:00:00-- https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
Resolving github.com... 140.82.121.4
Connecting to github.com|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 100000000 (95M) [application/octet-stream]
Saving to: 'prometheus-2.45.0.linux-amd64.tar.gz'
100%[======================================>] 100,000,000 10.0MB/s in 9.5s
2026-04-04 10:00:10 (10.0 MB/s) - 'prometheus-2.45.0.linux-amd64.tar.gz' saved

# Extract the archive
# tar -xzf prometheus-2.45.0.linux-amd64.tar.gz

# Copy the binaries
# cp prometheus-2.45.0.linux-amd64/prometheus /usr/local/bin/
# cp prometheus-2.45.0.linux-amd64/promtool /usr/local/bin/

# Copy the default configuration file
# cp prometheus-2.45.0.linux-amd64/prometheus.yml /etc/prometheus/

# Copy the console templates and libraries referenced by the service unit in section 3.4
# cp -r prometheus-2.45.0.linux-amd64/consoles /etc/prometheus/
# cp -r prometheus-2.45.0.linux-amd64/console_libraries /etc/prometheus/

# Set ownership and permissions
# chown prometheus:prometheus /usr/local/bin/prometheus
# chown prometheus:prometheus /usr/local/bin/promtool
# chmod 755 /usr/local/bin/prometheus
# chmod 755 /usr/local/bin/promtool

# Verify the installation
# prometheus --version
prometheus, version 2.45.0 (branch: HEAD, revision: abc123)
build user: root@buildhost
build date: 20240101-00:00:00
go version: go1.21.0
platform: linux/amd64

3.3 Create the Configuration Files

# Edit the configuration file
# vi /etc/prometheus/prometheus.yml

# Add the following configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'fgedu-monitor'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - /etc/prometheus/rules/*.yml

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: 'fgedudb01'

  - job_name: 'node_exporter'
    static_configs:
      - targets: ['192.168.1.51:9100', '192.168.1.52:9100']
        labels:
          group: 'production'

  - job_name: 'mysql_exporter'
    static_configs:
      - targets: ['192.168.1.51:9104']

  - job_name: 'redis_exporter'
    static_configs:
      - targets: ['192.168.1.51:9121']

# Create the rules directory
# mkdir -p /etc/prometheus/rules

# Create the alerting rules file
# vi /etc/prometheus/rules/alert_rules.yml

groups:
  - name: node_alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute."

      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 85% (current value: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Available disk space is below 15% (current value: {{ $value }}%)"

# Set ownership
# chown -R prometheus:prometheus /etc/prometheus

3.4 Create the Systemd Service

# Create the unit file
# vi /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus Monitoring System
Documentation=https://prometheus.io/docs/introduction/overview/
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
Restart=on-failure
RestartSec=5
LimitNOFILE=65535
ExecStart=/usr/local/bin/prometheus \
  --config.file=/etc/prometheus/prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --storage.tsdb.retention.time=15d \
  --storage.tsdb.retention.size=50GB \
  --web.console.templates=/etc/prometheus/consoles \
  --web.console.libraries=/etc/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090 \
  --web.external-url=http://192.168.1.51:9090 \
  --log.level=info \
  --log.format=logfmt
# SIGHUP makes Prometheus re-read its configuration; this enables "systemctl reload prometheus"
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

# Reload systemd
# systemctl daemon-reload

# Start the Prometheus service
# systemctl start prometheus

# Enable start at boot
# systemctl enable prometheus

# Sample output:
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.

# Check the service status
# systemctl status prometheus

● prometheus.service - Prometheus Monitoring System
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
Active: active (running) since Sat 2026-04-04 10:00:00 CST; 5s ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 12345 (prometheus)
Tasks: 8 (limit: 4915)
Memory: 100.0M
CGroup: /system.slice/prometheus.service
└─12345 /usr/local/bin/prometheus --config.file=/etc/prometheus/prometheus.yml …

# Check the listening port
# netstat -tlnp | grep prometheus
tcp6 0 0 :::9090 :::* LISTEN 12345/prometheus

Fengge's tip: Prometheus configuration files are YAML, so indentation matters. The --storage.tsdb.retention.time flag controls how long data is kept; set it according to the storage capacity available.

4. Configuring Prometheus

Configuration determines what Prometheus monitors and how well it performs, so it is a key step in building the monitoring system.

4.1 Configure Service Discovery

# Edit the configuration file
# vi /etc/prometheus/prometheus.yml

# File-based service discovery
scrape_configs:
  - job_name: 'file_sd'
    file_sd_configs:
      - files:
          - /etc/prometheus/targets/*.json
        refresh_interval: 5m

# Create the targets file
# mkdir -p /etc/prometheus/targets
# vi /etc/prometheus/targets/nodes.json

[
  {
    "targets": ["192.168.1.51:9100", "192.168.1.52:9100"],
    "labels": {
      "job": "node_exporter",
      "env": "production"
    }
  }
]
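
Target files like the one above are often generated from an inventory rather than edited by hand. The following is a minimal sketch that prints one target group in the file_sd JSON format from a job name, an environment label, and a list of host:port targets. The function name make_target_group is hypothetical, and the output is compacted onto one line.

```shell
#!/bin/sh
# Emit a single file_sd target group as JSON.
# Usage: make_target_group JOB ENV TARGET [TARGET...]
make_target_group() {
    job=$1
    env=$2
    shift 2
    targets=""
    for t in "$@"; do
        # Build a comma-separated list of quoted "host:port" strings
        targets="${targets:+$targets, }\"$t\""
    done
    printf '[{"targets": [%s], "labels": {"job": "%s", "env": "%s"}}]\n' \
        "$targets" "$job" "$env"
}

make_target_group node_exporter production 192.168.1.51:9100 192.168.1.52:9100

# To update a live file, write to a temporary file in the same directory and rename it,
# so Prometheus never reads a half-written file (not run here):
#   make_target_group node_exporter production 192.168.1.51:9100 > /etc/prometheus/targets/nodes.json.tmp
#   mv /etc/prometheus/targets/nodes.json.tmp /etc/prometheus/targets/nodes.json
```

Prometheus picks up changes to file_sd files without a restart, which is the main reason to prefer this over editing static_configs.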

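# Consul service discovery
scrape_configs:
  - job_name: 'consul_sd'
    consul_sd_configs:
      - server: '192.168.1.100:8500'
        services: ['node-exporter', 'mysql-exporter']

# Kubernetes service discovery
scrape_configs:
  - job_name: 'kubernetes_sd'
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names:
            - monitoring
            - default

# Reload the configuration
# (systemctl reload needs an ExecReload line in the unit file; without one, send SIGHUP directly:
#  kill -HUP $(pidof prometheus))
# systemctl reload prometheus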

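4.2 Configure Remote Storage

# Edit the configuration file
# vi /etc/prometheus/prometheus.yml

# Configure remote write
remote_write:
  - url: "http://192.168.1.51:8428/api/v1/write"
    queue_config:
      max_samples_per_send: 10000
      max_shards: 200
      # capacity is the per-shard buffer; it is normally set to several times max_samples_per_send
      capacity: 30000

# Configure remote read
remote_read:
  - url: "http://192.168.1.51:8428/api/v1/read"
    read_recent: true

# Restart the service
# systemctl restart prometheus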

4.3 Configure Alerting Rules

# Create the alerting rules file
# (this replaces the content of the file created in section 3.3; use a different
#  file name under /etc/prometheus/rules/ if the node alerts should be kept)
# vi /etc/prometheus/rules/alert_rules.yml

groups:
  - name: mysql_alerts
    rules:
      - alert: MySQLDown
        expr: mysql_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL instance {{ $labels.instance }} is down"
          description: "MySQL instance has been down for more than 1 minute."

      - alert: MySQLReplicationLag
        expr: mysql_slave_status_seconds_behind_master > 30
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL replication lag on {{ $labels.instance }}"
          description: "Replication lag is {{ $value }} seconds."

      - alert: MySQLTooManyConnections
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL too many connections on {{ $labels.instance }}"
          description: "Connection usage is {{ $value }}%."

  - name: redis_alerts
    rules:
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis instance {{ $labels.instance }} is down"
          description: "Redis instance has been down for more than 1 minute."

      - alert: RedisMemoryHigh
        expr: redis_memory_used_bytes / redis_memory_max_bytes * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage high on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}%."

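# Validate the rules
# promtool check rules /etc/prometheus/rules/alert_rules.yml

Checking /etc/prometheus/rules/alert_rules.yml
SUCCESS: 5 rules found

# Reload the configuration
# systemctl reload prometheus

Production advice: use file-based or Consul service discovery to manage targets dynamically. Tailor alerting rules to the actual workload so that they neither fire spuriously nor miss real incidents.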

5. Data Collection and Queries

Prometheus collects data by scraping its targets (pull mode) and queries it with PromQL.

5.1 Install Node Exporter

# Download Node Exporter
# cd /usr/local/src
# wget https://github.com/prometheus/node_exporter/releases/download/v1.6.1/node_exporter-1.6.1.linux-amd64.tar.gz

# Extract and install
# tar -xzf node_exporter-1.6.1.linux-amd64.tar.gz
# cp node_exporter-1.6.1.linux-amd64/node_exporter /usr/local/bin/
# chmod 755 /usr/local/bin/node_exporter

# Create the unit file
# vi /etc/systemd/system/node_exporter.service

[Unit]
Description=Node Exporter
After=network-online.target

# The [Service] section is required; the prometheus user must exist on this host (see section 2.4)
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Start the service
# systemctl daemon-reload
# systemctl start node_exporter
# systemctl enable node_exporter

# Verify that metrics are exposed
# curl -s http://localhost:9100/metrics | head -20

# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000123
go_gc_duration_seconds{quantile="0.25"} 0.000234
go_gc_duration_seconds{quantile="0.5"} 0.000345
go_gc_duration_seconds{quantile="0.75"} 0.000456
go_gc_duration_seconds{quantile="1"} 0.001234
go_gc_duration_seconds_sum 0.012345
go_gc_duration_seconds_count 10
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10

5.2 PromQL Query Examples

# Query CPU usage
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Sample output:
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "instance": "192.168.1.51:9100"
        },
        "value": [1712205600, "25.5"]
      }
    ]
  }
}

# Query memory usage
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100'

# Query disk usage
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=(1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100'

# Query network traffic
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=rate(node_network_receive_bytes_total{device="eth0"}[5m])'

# Query a time range
$ curl -G 'http://192.168.1.51:9090/api/v1/query_range' \
  --data-urlencode 'query=node_cpu_seconds_total{mode="idle"}' \
  --data-urlencode 'start=1712205000' \
  --data-urlencode 'end=1712205600' \
  --data-urlencode 'step=60s'

# Aggregation query
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=sum by(job) (rate(node_cpu_seconds_total[5m]))'

# Count the monitored targets
$ curl -G 'http://192.168.1.51:9090/api/v1/query' \
  --data-urlencode 'query=count(up)'
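
The range query above passes start and end as Unix timestamps that are 600 seconds apart. In practice these are usually computed relative to the current time. The following is a minimal sketch of that arithmetic; the helper name range_window is hypothetical.

```shell
#!/bin/sh
# Print "START END" epoch seconds for a window of MINUTES that ends at END.
# END defaults to the current time when omitted.
# Usage: range_window MINUTES [END_EPOCH]
range_window() {
    minutes=$1
    end=${2:-$(date +%s)}
    echo "$(( end - minutes * 60 )) $end"
}

# A 10-minute window ending at the timestamp used in the example above
set -- $(range_window 10 1712205600)
echo "start=$1 end=$2"

# Used with the query_range endpoint (not run here; needs a reachable Prometheus):
#   set -- $(range_window 10)
#   curl -G 'http://192.168.1.51:9090/api/v1/query_range' \
#     --data-urlencode 'query=up' \
#     --data-urlencode "start=$1" --data-urlencode "end=$2" --data-urlencode 'step=60s'
```

For the fixed end time shown, this prints start=1712205000 end=1712205600, the same values as in the range query example.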

5.3 Querying from the Web UI

# Open the web interface
# In a browser, go to http://192.168.1.51:9090

# Common queries:

# 1. Status of all instances
up

# 2. CPU usage (per instance)
100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# 3. Memory usage
(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

# 4. Disk I/O
rate(node_disk_read_bytes_total[5m])
rate(node_disk_written_bytes_total[5m])

# 5. Network traffic
rate(node_network_receive_bytes_total[5m])
rate(node_network_transmit_bytes_total[5m])

# 6. System load
node_load1
node_load5
node_load15

# 7. Processes
node_procs_running
node_procs_blocked

# 8. File descriptors
node_filefd_allocated
node_filefd_maximum

Fengge's tip: PromQL is the core query language of Prometheus and offers a rich set of functions and operators. Learning the common query patterns makes day-to-day monitoring much more efficient.
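
To make the CPU usage expression less opaque, the following sketch reproduces its arithmetic by hand for one core: rate() is approximately the increase of the idle-seconds counter divided by the window length, and usage is 100 minus that idle fraction times 100. This is a simplification; the real rate() also handles counter resets and extrapolates to the window boundaries. The function name cpu_usage_pct and the sample counter values are assumptions for illustration.

```shell
#!/bin/sh
# Approximate "100 - rate(idle[window]) * 100" from two samples of an idle-seconds counter.
# Usage: cpu_usage_pct IDLE_START IDLE_END WINDOW_SECONDS
cpu_usage_pct() {
    awk -v a="$1" -v b="$2" -v w="$3" \
        'BEGIN { printf "%.1f\n", 100 - ((b - a) / w) * 100 }'
}

# The idle counter grew by 225 s during a 300 s (5m) window:
# idle fraction = 225 / 300 = 0.75, so usage = 25.0
cpu_usage_pct 1000 1225 300
```

The PromQL expression performs the same calculation per core and then averages the idle fraction over all cores of an instance with avg by(instance).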

6. Network Access Configuration

Network configuration determines how clients reach Prometheus, so the listen address, port, and access method must be set correctly.

6.1 Configure the Listen Address

# Show the current listening port
# netstat -tlnp | grep prometheus
tcp6 0 0 :::9090 :::* LISTEN 12345/prometheus

# Change the listen address
# vi /etc/systemd/system/prometheus.service
ExecStart=/usr/local/bin/prometheus \
  --web.listen-address=192.168.1.51:9090 \
  …

# Restart the service
# systemctl daemon-reload
# systemctl restart prometheus

# Configure the firewall
# firewall-cmd --permanent --add-port=9090/tcp
success
# firewall-cmd --reload
success

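6.2 Configure Authentication

# Prometheus 2.24 and later support basic authentication and TLS natively through --web.config.file.
# This tutorial uses an nginx reverse proxy for authentication instead.
# For the proxy to be the only entry point, set --web.listen-address=127.0.0.1:9090
# in the unit file and do not open port 9090 in the firewall.

# Install nginx (httpd-tools provides the htpasswd command)
# dnf install -y nginx httpd-tools

# Configure the nginx reverse proxy
# vi /etc/nginx/conf.d/prometheus.conf

server {
    listen 80;
    server_name prometheus.fgedu.net.cn;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9090;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

# Create the password file
# htpasswd -c /etc/nginx/.htpasswd admin
New password:
Re-type new password:
Adding password for user admin

# Start nginx
# systemctl start nginx
# systemctl enable nginx

# Access Prometheus through the authenticated proxy
# curl -u admin:password 'http://prometheus.fgedu.net.cn/api/v1/query?query=up'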
6.3 Configure HTTPS

# Generate a self-signed SSL certificate
# mkdir -p /etc/nginx/ssl
# openssl req -x509 -nodes -newkey rsa:2048 \
  -keyout /etc/nginx/ssl/prometheus.key \
  -out /etc/nginx/ssl/prometheus.crt \
  -days 365 \
  -subj "/CN=prometheus.fgedu.net.cn"

# Configure HTTPS
# vi /etc/nginx/conf.d/prometheus.conf

server {
    listen 443 ssl;
    server_name prometheus.fgedu.net.cn;

    ssl_certificate /etc/nginx/ssl/prometheus.crt;
    ssl_certificate_key /etc/nginx/ssl/prometheus.key;

    location / {
        auth_basic "Prometheus";
        auth_basic_user_file /etc/nginx/.htpasswd;
        proxy_pass http://127.0.0.1:9090;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}

# Restart nginx
# systemctl restart nginx

Production advice: put a reverse proxy in front of Prometheus to provide authentication and HTTPS. In sensitive environments, also restrict access with an IP allowlist.

7. Backup and Recovery

Backup and recovery are an important part of operating a monitoring system. Prometheus stores its data in a local TSDB.

7.1 Back Up the Data

# Create the backup directory
# mkdir -p /backup/prometheus

# Stop the Prometheus service
# systemctl stop prometheus

# Back up the data directory
# tar -czf /backup/prometheus/prometheus_data_$(date +%Y%m%d).tar.gz /data/prometheus

# Back up the configuration files
# tar -czf /backup/prometheus/prometheus_config_$(date +%Y%m%d).tar.gz /etc/prometheus

# Start the service
# systemctl start prometheus

# Verify the backup files
# ls -l /backup/prometheus/
total 6000
-rw-r--r--. 1 root root 1024000 Apr  4 10:00 prometheus_config_20260404.tar.gz
-rw-r--r--. 1 root root 5120000 Apr  4 10:00 prometheus_data_20260404.tar.gz

7.2 Restore the Data

# Stop the Prometheus service
# systemctl stop prometheus

# Restore the data directory
# rm -rf /data/prometheus/*
# tar -xzf /backup/prometheus/prometheus_data_20260404.tar.gz -C /
# chown -R prometheus:prometheus /data/prometheus

# Restore the configuration files
# rm -rf /etc/prometheus/*
# tar -xzf /backup/prometheus/prometheus_config_20260404.tar.gz -C /
# chown -R prometheus:prometheus /etc/prometheus

# Start the service
# systemctl start prometheus

# Verify the restore
# curl -s 'http://localhost:9090/api/v1/query?query=up' | head -20

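7.3 Automated Backup Script

# Create the backup script
# vi /usr/local/bin/prometheus_backup.sh

#!/bin/bash
# The snapshot endpoint only works when Prometheus is started with --web.enable-admin-api
BACKUP_DIR=/backup/prometheus
DATE=$(date +%Y%m%d)
LOG_FILE=/var/log/prometheus/backup.log

echo "=== Backup started at $(date) ===" >> "$LOG_FILE"

# Back up the configuration files
tar -czf "${BACKUP_DIR}/prometheus_config_${DATE}.tar.gz" /etc/prometheus >> "$LOG_FILE" 2>&1

# Create a TSDB snapshot through the admin API
# (-f makes curl return a non-zero exit code on HTTP errors, so the check below is meaningful)
if curl -fsS -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot >> "$LOG_FILE" 2>&1; then
    # Archive the newest snapshot directory, then remove it from the data disk
    SNAPSHOT_DIR=$(ls -td /data/prometheus/snapshots/* | head -1)
    tar -czf "${BACKUP_DIR}/prometheus_snapshot_${DATE}.tar.gz" "${SNAPSHOT_DIR}" >> "$LOG_FILE" 2>&1
    rm -rf "${SNAPSHOT_DIR}"
    echo "Backup completed successfully" >> "$LOG_FILE"
else
    echo "Backup failed" >> "$LOG_FILE"
fi

# Remove backups older than 30 days
find "${BACKUP_DIR}" -name "*.tar.gz" -mtime +30 -delete >> "$LOG_FILE" 2>&1

echo "=== Backup finished at $(date) ===" >> "$LOG_FILE"
echo "" >> "$LOG_FILE"

# Make the script executable
# chmod +x /usr/local/bin/prometheus_backup.sh

# Schedule the job
# crontab -e

# Add the following line (runs the backup every day at 02:00)
0 2 * * * /usr/local/bin/prometheus_backup.sh

Fengge's tip: in production, schedule the backup script so that backups run regularly. For large deployments, consider remote or long-term storage such as VictoriaMetrics or Thanos.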
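
The snapshot endpoint returns the name of the directory it created under <data-dir>/snapshots, so a script can read that name from the response instead of guessing the newest directory. The following is a minimal sketch of extracting it with sed. The helper name snapshot_name is hypothetical, and the sample response follows the format shown in the Prometheus HTTP API documentation.

```shell
#!/bin/sh
# Extract the snapshot directory name from the JSON response of
# POST /api/v1/admin/tsdb/snapshot, read from stdin.
snapshot_name() {
    sed -n 's/.*"name":"\([^"]*\)".*/\1/p'
}

# Sample response in the documented format
response='{"status":"success","data":{"name":"20171210T211224Z-2be650b6d019eb54"}}'
printf '%s\n' "$response" | snapshot_name

# Against a live server (not run here; needs --web.enable-admin-api):
#   name=$(curl -fsS -X POST http://localhost:9090/api/v1/admin/tsdb/snapshot | snapshot_name)
#   tar -czf "/backup/prometheus/prometheus_snapshot_$(date +%Y%m%d).tar.gz" \
#       -C /data/prometheus/snapshots "$name"
```

Reading the name from the response avoids archiving the wrong directory if several snapshots exist or another job creates one at the same time.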

8. Upgrade and Migration

Upgrades and migrations are routine operational tasks that need careful planning and execution.

8.1 Version Upgrade

# Check the current version
$ prometheus --version
prometheus, version 2.45.0

# Take a full backup
# systemctl stop prometheus
# tar -czf /backup/prometheus/pre_upgrade.tar.gz /data/prometheus /etc/prometheus

# Download the new version
# cd /usr/local/src
# wget https://github.com/prometheus/prometheus/releases/download/v2.46.0/prometheus-2.46.0.linux-amd64.tar.gz

# Extract and replace the binaries
# tar -xzf prometheus-2.46.0.linux-amd64.tar.gz
# cp prometheus-2.46.0.linux-amd64/prometheus /usr/local/bin/
# cp prometheus-2.46.0.linux-amd64/promtool /usr/local/bin/
# chown prometheus:prometheus /usr/local/bin/prometheus /usr/local/bin/promtool

# Check that the configuration is still valid with the new version
# promtool check config /etc/prometheus/prometheus.yml
Checking /etc/prometheus/prometheus.yml
 SUCCESS: /etc/prometheus/prometheus.yml is valid prometheus config file syntax

# Start the service
# systemctl start prometheus

# Verify the version
$ prometheus --version
prometheus, version 2.46.0
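8.2 Migrate to a New Server

# Take a backup on the source server
# systemctl stop prometheus
# tar -czf prometheus_full_backup.tar.gz /data/prometheus /etc/prometheus

# Transfer the backup file
# scp prometheus_full_backup.tar.gz new-server:/backup/

# Install Prometheus on the new server
# Follow the installation steps in section 3.2

# Restore the data on the new server
# systemctl stop prometheus
# tar -xzf /backup/prometheus_full_backup.tar.gz -C /
# chown -R prometheus:prometheus /data/prometheus /etc/prometheus
# systemctl start prometheus

# Verify the migration
# curl -s 'http://localhost:9090/api/v1/query?query=up'

Production advice: always take a full backup before upgrading, and rehearse the upgrade in a test environment first. For upgrades across major versions, read the upgrade documentation carefully.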
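
An upgrade script can guard against crossing a major version by accident. The following is a minimal sketch that compares the major component of two version strings. The function names major_of and is_major_upgrade are hypothetical, and the check deliberately ignores minor and patch components.

```shell
#!/bin/sh
# Print the major component of a version string, e.g. 2.45.0 or v2.45.0 -> 2
major_of() {
    v=${1#v}          # drop an optional leading "v"
    echo "${v%%.*}"   # keep everything before the first dot
}

# Succeed (exit status 0) when TO has a higher major version than FROM.
# Usage: is_major_upgrade FROM TO
is_major_upgrade() {
    [ "$(major_of "$2")" -gt "$(major_of "$1")" ]
}

if is_major_upgrade 2.45.0 2.46.0; then
    echo "major upgrade: read the upgrade documentation first"
else
    echo "minor upgrade"
fi
```

For the 2.45.0 to 2.46.0 upgrade in section 8.1 this prints "minor upgrade"; a script could refuse to continue, or ask for confirmation, when the function succeeds.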

9. Production Case Study

This section walks through a complete production-style configuration to show how Prometheus is used in practice.

9.1 Install Alertmanager

# Download Alertmanager
# cd /usr/local/src
# wget https://github.com/prometheus/alertmanager/releases/download/v0.26.0/alertmanager-0.26.0.linux-amd64.tar.gz

# Extract and install
# tar -xzf alertmanager-0.26.0.linux-amd64.tar.gz
# cp alertmanager-0.26.0.linux-amd64/alertmanager /usr/local/bin/
# cp alertmanager-0.26.0.linux-amd64/amtool /usr/local/bin/

# Create the configuration file
# vi /etc/prometheus/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.fgedu.net.cn:25'
  smtp_from: 'alertmanager@fgedu.net.cn'
  smtp_auth_username: 'alertmanager@fgedu.net.cn'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'default-receiver'
  routes:
    - match:
        severity: critical
      receiver: 'critical-receiver'
    - match:
        severity: warning
      receiver: 'warning-receiver'

receivers:
  - name: 'default-receiver'
    email_configs:
      - to: 'admin@fgedu.net.cn'
  - name: 'critical-receiver'
    email_configs:
      - to: 'admin@fgedu.net.cn'
    webhook_configs:
      - url: 'http://192.168.1.100:5001/webhook'
  - name: 'warning-receiver'
    email_configs:
      - to: 'admin@fgedu.net.cn'

# Create the unit file
# vi /etc/systemd/system/alertmanager.service

[Unit]
Description=Alertmanager
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/prometheus/alertmanager.yml \
  --storage.path=/data/alertmanager \
  --web.listen-address=:9093
Restart=on-failure

[Install]
WantedBy=multi-user.target

# Start the service
# mkdir -p /data/alertmanager
# chown prometheus:prometheus /data/alertmanager
# systemctl daemon-reload
# systemctl start alertmanager
# systemctl enable alertmanager
9.2 Performance Monitoring

# View Prometheus's own metrics
$ curl -s http://localhost:9090/metrics | grep prometheus_

# HELP prometheus_build_info A metric with a constant '1' value labeled by version, revision, branch, and goversion from which prometheus was built.
# TYPE prometheus_build_info gauge
prometheus_build_info{branch="HEAD",goversion="go1.21.0",revision="abc123",version="2.45.0"} 1

# HELP prometheus_config_last_reload_success_timestamp_seconds Timestamp of the last successful configuration reload.
# TYPE prometheus_config_last_reload_success_timestamp_seconds gauge
prometheus_config_last_reload_success_timestamp_seconds 1.7122056e+09

# HELP prometheus_config_last_reload_successful Whether the last configuration reload attempt was successful.
# TYPE prometheus_config_last_reload_successful gauge
prometheus_config_last_reload_successful 1

# View the state of the scrape targets
$ curl -s 'http://localhost:9090/api/v1/targets' | jq '.data.activeTargets[] | {job: .labels.job, health: .health}'

{
  "job": "prometheus",
  "health": "up"
}
{
  "job": "node_exporter",
  "health": "up"
}

# View TSDB status
$ curl -s 'http://localhost:9090/api/v1/status/tsdb'

# Sample output (abridged; minTime and maxTime are in milliseconds):
{
  "status": "success",
  "data": {
    "headStats": {
      "numSeries": 10000,
      "numLabelPairs": 1000,
      "chunkCount": 50000,
      "minTime": 1712205000000,
      "maxTime": 1712205600000
    },
    "seriesCountByMetricName": […],
    "labelValueCountByLabelName": […],
    "memoryInBytesByLabelName": […],
    "seriesCountByLabelValuePair": […]
  }
}

9.3 High Availability

# Prometheus high-availability options

# Option 1: multiple instances
# Deploy several Prometheus instances that scrape the same targets independently
# Use a load balancer to distribute query requests

# Option 2: federation
# Configuration on the top-level (global) Prometheus
# vi /etc/prometheus/prometheus.yml

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="prometheus"}'
        - '{job="node_exporter"}'
    static_configs:
      - targets:
          - '192.168.1.52:9090'
          - '192.168.1.53:9090'
        labels:
          federation: 'datacenter1'

# Option 3: Thanos or VictoriaMetrics
# Thanos provides long-term storage and highly available querying
# VictoriaMetrics provides high-performance storage and querying

# Example Thanos Sidecar unit
# vi /etc/systemd/system/thanos-sidecar.service

[Unit]
Description=Thanos Sidecar
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/thanos sidecar \
  --prometheus.url=http://localhost:9090 \
  --tsdb.path=/data/prometheus \
  --objstore.config-file=/etc/thanos/objectstore.yml \
  --http-address=0.0.0.0:19191 \
  --grpc-address=0.0.0.0:10901
Restart=on-failure

[Install]
WantedBy=multi-user.target

Fengge's tip: a single Prometheus instance is a single point of failure. Use multiple instances or a federated architecture for high availability. For large-scale monitoring, consider Thanos or VictoriaMetrics.

Compiled and published by 风哥教程 (Fengge Tutorials) for learning and testing purposes only. When reposting, cite the source: http://www.fgedu.net.cn/10327.html
