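Summary: This tutorial from 风哥教程 (Fengge Tutorials) draws on the official documentation for Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman, and explains how to configure and use the technologies covered.
Fengge's note:
This document covers installing, configuring, and managing the Prometheus monitoring system.
Part01-Prometheus Installation
1.1 Installing the Prometheus Server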
$ wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
--2026-04-04 00:35:00--  https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123456789 (118M) [application/octet-stream]
Saving to: ‘prometheus-2.45.0.linux-amd64.tar.gz’
prometheus-2.45.0.linux-amd64.tar.gz 100%[================================================================================>] 118.00M 10.0MB/s in 12s
2026-04-04 00:35:12 (10.0 MB/s) - ‘prometheus-2.45.0.linux-amd64.tar.gz’ saved [123456789/123456789]
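Before extracting a downloaded release, it can be worth confirming the archive's integrity. The following is a minimal sketch, assuming the expected SHA-256 digest has been copied by hand from the checksum file published with the release assets (Prometheus releases typically include a sha256sums.txt). The helper name verify_sha256 is hypothetical and not part of the original steps.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Hypothetical helper: succeeds only when FILE's SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    local file="$1" expected="$2" actual
    # sha256sum prints "<digest>  <filename>"; keep only the digest field.
    actual="$(sha256sum "$file" | awk '{print $1}')"
    if [ -n "$actual" ] && [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Example (not run here; replace <digest> with the published value):
# verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz <digest> && echo "safe to extract"
```

The same check applies to the node_exporter and alertmanager archives downloaded later.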
# Extract and install
$ sudo tar xzf prometheus-2.45.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/prometheus-2.45.0.linux-amd64 /usr/local/prometheus
# Create the service user
$ sudo useradd -r -s /bin/false prometheus
# Create the data directory
$ sudo mkdir -p /data/prometheus
$ sudo chown prometheus:prometheus /data/prometheus
# Create the systemd service
$ sudo tee /etc/systemd/system/prometheus.service << EOF
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target
[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
--config.file=/usr/local/prometheus/prometheus.yml \
--storage.tsdb.path=/data/prometheus \
--web.console.templates=/usr/local/prometheus/consoles \
--web.console.libraries=/usr/local/prometheus/console_libraries \
--web.listen-address=0.0.0.0:9090
[Install]
WantedBy=multi-user.target
EOF
# Start the service and enable it at boot
$ sudo systemctl daemon-reload
$ sudo systemctl start prometheus
$ sudo systemctl enable prometheus
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.
# Check the service status
$ sudo systemctl status prometheus
● prometheus.service - Prometheus Server
Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-04-04 00:35:30 CST; 10s ago
Docs: https://prometheus.io/docs/introduction/overview/
Main PID: 12379 (prometheus)
Tasks: 7 (limit: 49152)
Memory: 50.5M
CPU: 500ms
CGroup: /system.slice/prometheus.service
└─12379 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --web.console.templates=/usr/local/prometheus/consoles --web.console.libraries=/usr/local/prometheus/console_libraries --web.listen-address=0.0.0.0:9090
Apr 04 00:35:30 rhel10 systemd[1]: Started Prometheus Server.
Apr 04 00:35:30 rhel10 prometheus[12379]: ts=2026-04-03T16:35:30.123Z caller=main.go:543 level=info msg="Starting Prometheus Server" version="(version=2.45.0, branch=HEAD, revision=1234567890)"
# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9090/tcp
success
$ sudo firewall-cmd --reload
success
# Access the web UI
http://192.168.1.100:9090
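Right after a start or restart, the web UI may take a moment to come up. A small polling loop can make scripted installs wait for it. This is a sketch only: the helper name wait_until is hypothetical, and the commented example assumes the /-/ready endpoint that Prometheus serves on its listen address.

```shell
# wait_until RETRIES DELAY CMD...
# Hypothetical helper: re-runs CMD until it succeeds or RETRIES attempts are used up.
wait_until() {
    local retries="$1" delay="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$retries" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep "$delay"
        attempt=$((attempt + 1))
    done
    return 1
}

# Example (not run here): poll the readiness endpoint once a second, up to 30 times.
# wait_until 30 1 curl -fsS http://localhost:9090/-/ready && echo "Prometheus is ready"
```

Because the probe command is passed in as arguments, the same helper can wait for node_exporter or Alertmanager by swapping the URL.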
Part02-Prometheus Configuration
2.1 Configuring prometheus.yml
$ sudo tee /usr/local/prometheus/prometheus.yml << EOF
global:
  scrape_interval: 15s
  evaluation_interval: 15s
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
rule_files:
  - "/usr/local/prometheus/rules/*.yml"
scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.1.100:9100", "192.168.1.101:9100", "192.168.1.102:9100"]
  - job_name: "mysql_exporter"
    static_configs:
      - targets: ["192.168.1.100:9104"]
  - job_name: "nginx_exporter"
    static_configs:
      - targets: ["192.168.1.100:9113"]
EOF
# Create the alert rules directory
$ sudo mkdir -p /usr/local/prometheus/rules
# Configure alert rules (the quoted 'EOF' keeps the shell from expanding $labels and $value)
$ sudo tee /usr/local/prometheus/rules/alert.yml << 'EOF'
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"
      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 80% (current value: {{ $value }}%)"
      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 20% (current value: {{ $value }}%)"
      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down on {{ $labels.instance }}"
          description: "Service has been down for more than 1 minute."
EOF
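The alert annotations above use Prometheus template variables such as $labels and $value. When such text is written through a shell heredoc, an unquoted delimiter (<< EOF) lets the shell expand those names as shell variables, which are usually unset, so they silently disappear from the file. A quoted delimiter (<< 'EOF') writes the text literally. The sketch below shows the difference using only cat; the variable names unquoted and quoted are illustrative.

```shell
# Make sure the names are unset, as they would be in a fresh shell.
unset labels value

# Unquoted delimiter: the shell expands $labels, so the template variable vanishes.
unquoted="$(cat << EOF
summary: "High CPU usage on {{ $labels.instance }}"
EOF
)"

# Quoted delimiter: the text is passed through untouched.
quoted="$(cat << 'EOF'
summary: "High CPU usage on {{ $labels.instance }}"
EOF
)"

echo "unquoted: $unquoted"   # unquoted: summary: "High CPU usage on {{ .instance }}"
echo "quoted:   $quoted"     # quoted:   summary: "High CPU usage on {{ $labels.instance }}"
```

A quick grep for '\$labels' in the written rules file confirms which form ended up on disk.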
# Restart the service to load the new configuration
$ sudo systemctl restart prometheus
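A restart with a broken configuration leaves Prometheus down, so it can help to validate the files first. The release tarball ships a promtool binary with check config and check rules subcommands. The wrapper below is a sketch under that assumption; the function name check_prometheus_files is hypothetical.

```shell
# check_prometheus_files PROMTOOL CONFIG [RULE_FILE...]
# Hypothetical wrapper: runs "check config" on CONFIG and "check rules" on each
# RULE_FILE, stopping at the first failure.
check_prometheus_files() {
    local promtool="$1" config="$2" rule
    shift 2
    "$promtool" check config "$config" || return 1
    for rule in "$@"; do
        "$promtool" check rules "$rule" || return 1
    done
}

# Example (not run here): restart only when every check passes.
# check_prometheus_files /usr/local/prometheus/promtool \
#     /usr/local/prometheus/prometheus.yml /usr/local/prometheus/rules/*.yml \
#     && sudo systemctl restart prometheus
```

Taking the promtool path as a parameter keeps the wrapper usable when the binary lives somewhere other than /usr/local/prometheus.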
# Verify the configuration by listing the scrape targets
$ curl http://localhost:9090/api/v1/targets
{
"status": "success",
"data": {
"activeTargets": [
{
"discoveredLabels": {
"__address__": "localhost:9090",
"__metrics_path__": "/metrics",
"__scheme__": "http",
"job": "prometheus"
},
"labels": {
"job": "prometheus"
},
"scrapeUrl": "http://localhost:9090/metrics",
"lastError": "",
"lastScrape": "2026-04-04T00:40:00.123456789Z",
"lastScrapeDuration": 0.001234567,
"health": "up"
}
]
}
}
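With several scrape jobs configured, the full targets response gets long. A rough way to summarize it is to count the health values. The sketch below uses only grep, sed, sort, uniq, and awk and matches on the "health" field shown in the response above; it is a text-level approximation, and a JSON-aware tool would be more robust. The function name count_target_health is hypothetical.

```shell
# count_target_health: reads /api/v1/targets JSON on stdin and prints one
# "<count> <state>" line per health value found (for example "3 up").
count_target_health() {
    grep -o '"health": *"[a-z]*"' \
        | sed 's/.*"\([a-z]*\)"$/\1/' \
        | sort | uniq -c \
        | awk '{print $1, $2}'
}

# Example (not run here):
# curl -s http://localhost:9090/api/v1/targets | count_target_health
```

Any line other than "up" in the result points at a target worth inspecting via its lastError field.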
Part03-Node Exporter Installation
3.1 Installing Node Exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
--2026-04-04 00:40:00--  https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘node_exporter-1.6.0.linux-amd64.tar.gz’
node_exporter-1.6.0.linux-amd64.tar.gz 100%[================================================================================>] 12.00M 10.0MB/s in 1s
2026-04-04 00:40:01 (10.0 MB/s) - ‘node_exporter-1.6.0.linux-amd64.tar.gz’ saved [12345678/12345678]
# Extract and install
$ sudo tar xzf node_exporter-1.6.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/node_exporter-1.6.0.linux-amd64 /usr/local/node_exporter
# Create the systemd service
$ sudo tee /etc/systemd/system/node_exporter.service << EOF
[Unit]
Description=Node Exporter
Documentation=https://prometheus.io/docs/guides/node-exporter/
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter
[Install]
WantedBy=multi-user.target
EOF
# Start the service and enable it at boot
$ sudo systemctl daemon-reload
$ sudo systemctl start node_exporter
$ sudo systemctl enable node_exporter
Created symlink /etc/systemd/system/multi-user.target.wants/node_exporter.service → /etc/systemd/system/node_exporter.service.
# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9100/tcp
success
$ sudo firewall-cmd --reload
success
# Verify the exported metrics
$ curl http://localhost:9100/metrics | head -20
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2345e-05
go_gc_duration_seconds{quantile="0.25"} 2.3456e-05
go_gc_duration_seconds{quantile="0.5"} 3.4567e-05
go_gc_duration_seconds{quantile="0.75"} 4.5678e-05
go_gc_duration_seconds{quantile="1"} 5.6789e-05
go_gc_duration_seconds_sum 0.001234567
go_gc_duration_seconds_count 100
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.4"} 1
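The text format shown above is line oriented: comment lines start with #, and each sample line is a name (optionally with labels) followed by a value. That makes it easy to pull out a single value for a quick check. The following is a minimal sketch, assuming the sample's label set contains no spaces; the function name metric_value is hypothetical.

```shell
# metric_value NAME: reads Prometheus text-format metrics on stdin and prints
# the value of the first sample whose name (including any labels) equals NAME.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (not run here):
# curl -s http://localhost:9100/metrics | metric_value go_goroutines
```

For the sample output above, looking up go_goroutines would print 10.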
Part04-Grafana Installation
4.1 Installing Grafana for Visualization
$ sudo tee /etc/yum.repos.d/grafana.repo << EOF
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF
# Install Grafana
$ sudo dnf install -y grafana
Last metadata expiration check: 0:45:23 ago on Fri 04 Apr 2026 00:40:15 AM CST.
Dependencies resolved.
================================================================================
 Package        Architecture     Version         Repository        Size
================================================================================
Installing:
 grafana        x86_64           10.0.0-1        grafana           50 M
Transaction Summary
================================================================================
Install  1 Package
Total download size: 50 M
Installed size: 150 M
Downloading Packages:
grafana-10.0.0-1.x86_64.rpm                      50 MB/s |  50 MB     00:01
--------------------------------------------------------------------------------
Total                                            50 MB/s |  50 MB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                  1/1
  Installing       : grafana-10.0.0-1.x86_64          1/1
  Running scriptlet: grafana-10.0.0-1.x86_64          1/1
  Verifying        : grafana-10.0.0-1.x86_64          1/1
Installed:
  grafana-10.0.0-1.x86_64
Complete!
# Start the service and enable it at boot
$ sudo systemctl start grafana-server
$ sudo systemctl enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.
# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=3000/tcp
success
$ sudo firewall-cmd --reload
success
# Access the web UI
http://192.168.1.100:3000
Default username: admin
Default password: admin
# Add the Prometheus data source
Configuration -> Data Sources -> Add data source -> Prometheus
URL: http://localhost:9090
Save & Test
# Import the Node Exporter dashboard
Dashboards -> Import -> enter dashboard ID: 1860
Load -> select the Prometheus data source -> Import
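The data source can also be declared in a file instead of through the web UI, using Grafana's provisioning mechanism. The sketch below writes such a file. It assumes the default provisioning directory of RPM installs (/etc/grafana/provisioning/datasources) in the commented example, and the helper name write_prometheus_datasource and the file name prometheus.yml are hypothetical choices.

```shell
# write_prometheus_datasource DIR URL
# Hypothetical helper: writes a Grafana data source provisioning file into DIR.
write_prometheus_datasource() {
    local dir="$1" url="$2"
    mkdir -p "$dir" || return 1
    # The delimiter is deliberately unquoted so that $url is expanded.
    cat > "$dir/prometheus.yml" << EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Example (not run here; needs root, and Grafana reads the file on restart):
# write_prometheus_datasource /etc/grafana/provisioning/datasources http://localhost:9090
# sudo systemctl restart grafana-server
```

Keeping the data source in a file makes the setup repeatable across hosts, at the cost of the setting no longer being editable from the UI.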
Part05-Monitoring Alerts
5.1 Configuring Alertmanager
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
--2026-04-04 00:45:00--  https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘alertmanager-0.25.0.linux-amd64.tar.gz’
alertmanager-0.25.0.linux-amd64.tar.gz 100%[================================================================================>] 12.00M 10.0MB/s in 1s
2026-04-04 00:45:01 (10.0 MB/s) - ‘alertmanager-0.25.0.linux-amd64.tar.gz’ saved [12345678/12345678]
# Extract and install
$ sudo tar xzf alertmanager-0.25.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/alertmanager-0.25.0.linux-amd64 /usr/local/alertmanager
# Configure Alertmanager (replace the SMTP host, addresses, and password with real values)
$ sudo tee /usr/local/alertmanager/alertmanager.yml << EOF
global:
  smtp_smarthost: 'smtp.fgedu.net.cn:587'
  smtp_from: 'alertmanager@fgedu.net.cn'
  smtp_auth_username: 'alertmanager@fgedu.net.cn'
  smtp_auth_password: 'password'
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@fgedu.net.cn'
        send_resolved: true
EOF
# Create the systemd service
$ sudo tee /etc/systemd/system/alertmanager.service << EOF
[Unit]
Description=Alertmanager
After=network-online.target
[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
--config.file=/usr/local/alertmanager/alertmanager.yml \
--storage.path=/usr/local/alertmanager/data
[Install]
WantedBy=multi-user.target
EOF
# Start the service and enable it at boot
$ sudo systemctl daemon-reload
$ sudo systemctl start alertmanager
$ sudo systemctl enable alertmanager
Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.
# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9093/tcp
success
$ sudo firewall-cmd --reload
success
# Access the web UI
http://192.168.1.100:9093
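To check the route and the email receiver without waiting for a real alert to fire, a synthetic alert can be posted to Alertmanager's v2 HTTP API. The sketch below only builds the JSON payload; the helper name build_test_alert, the alert name, and the summary text are hypothetical, and the commented curl call assumes Alertmanager is listening on localhost:9093 as configured above.

```shell
# build_test_alert NAME SEVERITY
# Hypothetical helper: prints a one-element JSON array in the shape accepted by
# the POST /api/v2/alerts endpoint. NAME and SEVERITY must not contain quotes.
build_test_alert() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Test alert"}}]\n' "$1" "$2"
}

# Example (not run here):
# build_test_alert TestAlert warning \
#     | curl -s -XPOST -H 'Content-Type: application/json' \
#            --data @- http://localhost:9093/api/v2/alerts
```

With group_wait set to 10s, the test alert should show up in the web UI almost immediately and reach the mailbox shortly after, provided the SMTP settings are valid.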
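1. Set a data retention period that matches your storage capacity
2. Choose alert thresholds that suit your workloads
3. Configure more than one alert notification channel
4. Review the monitored metrics regularly
5. Tune dashboards for readability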
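For the first item, Prometheus controls retention through the --storage.tsdb.retention.time flag, which defaults to 15d when unset. One possible way to set it without editing the main unit file is a systemd drop-in that replaces ExecStart. The sketch below writes such a drop-in; the helper name write_retention_dropin, the file name retention.conf, and the 30d value in the example are assumptions, and the remaining flags are copied from the unit file created in Part01.

```shell
# write_retention_dropin DIR RETENTION
# Hypothetical helper: writes a systemd drop-in that restarts Prometheus with
# the given retention period (for example 30d).
write_retention_dropin() {
    local dir="$1" retention="$2"
    mkdir -p "$dir" || return 1
    # In an unquoted heredoc, "\\" is written to the file as a single "\",
    # which systemd treats as a line continuation.
    cat > "$dir/retention.conf" << EOF
[Service]
# An empty ExecStart= clears the command inherited from the main unit file.
ExecStart=
ExecStart=/usr/local/prometheus/prometheus \\
    --config.file=/usr/local/prometheus/prometheus.yml \\
    --storage.tsdb.path=/data/prometheus \\
    --storage.tsdb.retention.time=$retention \\
    --web.console.templates=/usr/local/prometheus/consoles \\
    --web.console.libraries=/usr/local/prometheus/console_libraries \\
    --web.listen-address=0.0.0.0:9090
EOF
}

# Example (not run here; needs root):
# write_retention_dropin /etc/systemd/system/prometheus.service.d 30d
# sudo systemctl daemon-reload && sudo systemctl restart prometheus
```

Longer retention increases disk usage under /data/prometheus roughly in proportion, so the value is best chosen together with the size of that volume.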
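This article was compiled and published by 风哥教程 (Fengge Tutorials) for learning and testing purposes only. Please cite the source when reposting: http://www.fgedu.net.cn/10327.html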
