
Linux Tutorial FG243-Monitoring Server Configuration

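Summary: This 风哥 (Fengge) tutorial draws on the official Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman documentation to describe how to configure and use the technologies covered.

Fengge's note:

This document describes in detail how to install, configure, and manage the Prometheus monitoring system.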

Part01-Prometheus Installation

1.1 Installing the Prometheus Service

# Download Prometheus
$ wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
--2026-04-04 00:35:00--  https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 123456789 (118M) [application/octet-stream]
Saving to: ‘prometheus-2.45.0.linux-amd64.tar.gz’

prometheus-2.45.0.linux-amd64.tar.gz 100%[================================================================================>] 118.00M 10.0MB/s in 12s

2026-04-04 00:35:12 (10.0 MB/s) - ‘prometheus-2.45.0.linux-amd64.tar.gz’ saved [123456789/123456789]
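
Before extracting the archive, it can be worth checking it against the published checksums. The sketch below assumes the release page also offers a `sha256sums.txt` file in the usual `HASH  FILENAME` format and that `sha256sum` is installed; `verify_sha256` is a hypothetical helper name, not part of Prometheus.

```shell
# verify_sha256 FILE SUMS
# Succeeds only if FILE matches its entry in the checksum list SUMS.
# Both paths are assumed to be relative to the current directory.
verify_sha256() {
    # Select the line for FILE (fixed-string match on "  FILENAME"),
    # then let sha256sum recompute and compare the hash.
    grep -F "  $1" "$2" | sha256sum -c -
}

# Example (run in the download directory; the sha256sums.txt URL is an assumption):
#   wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/sha256sums.txt
#   verify_sha256 prometheus-2.45.0.linux-amd64.tar.gz sha256sums.txt && echo "checksum OK"
```

If the file name is missing from the list, `sha256sum` receives no input and the helper fails, so a typo in the name is not mistaken for a successful check.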

# Extract and install
$ sudo tar xzf prometheus-2.45.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/prometheus-2.45.0.linux-amd64 /usr/local/prometheus

# Create the service user
$ sudo useradd -r -s /bin/false prometheus

# Create the data directory
$ sudo mkdir -p /data/prometheus
$ sudo chown prometheus:prometheus /data/prometheus

# Create the systemd service
$ sudo tee /etc/systemd/system/prometheus.service << 'EOF'
[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/prometheus/prometheus \
  --config.file=/usr/local/prometheus/prometheus.yml \
  --storage.tsdb.path=/data/prometheus \
  --web.console.templates=/usr/local/prometheus/consoles \
  --web.console.libraries=/usr/local/prometheus/console_libraries \
  --web.listen-address=0.0.0.0:9090

[Install]
WantedBy=multi-user.target
EOF

# Start the service
$ sudo systemctl daemon-reload
$ sudo systemctl start prometheus
$ sudo systemctl enable prometheus
Created symlink /etc/systemd/system/multi-user.target.wants/prometheus.service → /etc/systemd/system/prometheus.service.

# Check the service status
$ sudo systemctl status prometheus
● prometheus.service - Prometheus Server
     Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; preset: disabled)
     Active: active (running) since Sat 2026-04-04 00:35:30 CST; 10s ago
       Docs: https://prometheus.io/docs/introduction/overview/
   Main PID: 12379 (prometheus)
      Tasks: 7 (limit: 49152)
     Memory: 50.5M
        CPU: 500ms
     CGroup: /system.slice/prometheus.service
             └─12379 /usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/data/prometheus --web.console.templates=/usr/local/prometheus/consoles --web.console.libraries=/usr/local/prometheus/console_libraries --web.listen-address=0.0.0.0:9090

Apr 04 00:35:30 rhel10 systemd[1]: Started Prometheus Server.
Apr 04 00:35:30 rhel10 prometheus[12379]: ts=2026-04-03T16:35:30.123Z caller=main.go:543 level=info msg="Starting Prometheus Server" version="(version=2.45.0, branch=HEAD, revision=1234567890)"

# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9090/tcp
success
$ sudo firewall-cmd --reload
success

# Access the web interface
http://192.168.1.100:9090
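
After `systemctl start`, the server may need a moment before it answers requests. A minimal sketch of a polling helper follows; `wait_until` is a hypothetical name, and the example assumes Prometheus's `/-/ready` endpoint on port 9090 as configured in the unit file.

```shell
# wait_until ATTEMPTS DELAY COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as COMMAND succeeds, 1 if every attempt fails.
wait_until() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Skip the pause after the final failed attempt.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Example: poll the readiness endpoint for up to about 30 seconds.
#   wait_until 30 1 curl -sf http://localhost:9090/-/ready && echo "Prometheus is ready"
```

Passing the probe as a command keeps the helper reusable for the other services in this tutorial, for example with a different URL or port.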

Part02-Prometheus Configuration

2.1 Configuring prometheus.yml

# Edit the configuration file
$ sudo tee /usr/local/prometheus/prometheus.yml << 'EOF'
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

rule_files:
  - "/usr/local/prometheus/rules/*.yml"

scrape_configs:
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: "node_exporter"
    static_configs:
      - targets: ["192.168.1.100:9100", "192.168.1.101:9100", "192.168.1.102:9100"]

  - job_name: "mysql_exporter"
    static_configs:
      - targets: ["192.168.1.100:9104"]

  - job_name: "nginx_exporter"
    static_configs:
      - targets: ["192.168.1.100:9113"]
EOF

# Create the alert rules directory
$ sudo mkdir -p /usr/local/prometheus/rules

# Configure the alert rules
$ sudo tee /usr/local/prometheus/rules/alert.yml << 'EOF'
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is above 80% (current value: {{ $value }}%)"

      - alert: HighMemoryUsage
        expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is above 80% (current value: {{ $value }}%)"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"}) * 100 < 20
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space on {{ $labels.instance }}"
          description: "Disk space is below 20% (current value: {{ $value }}%)"

      - alert: ServiceDown
        expr: up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Service {{ $labels.job }} is down on {{ $labels.instance }}"
          description: "Service has been down for more than 1 minute."
EOF

# Restart the service
$ sudo systemctl restart prometheus

# Verify the configuration
$ curl http://localhost:9090/api/v1/targets
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {
          "__address__": "localhost:9090",
          "__metrics_path__": "/metrics",
          "__scheme__": "http",
          "job": "prometheus"
        },
        "labels": {
          "job": "prometheus"
        },
        "scrapeUrl": "http://localhost:9090/metrics",
        "lastError": "",
        "lastScrape": "2026-04-04T00:40:00.123456789Z",
        "lastScrapeDuration": 0.001234567,
        "health": "up"
      }
    ]
  }
}
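
The alert annotations contain template variables such as `{{ $labels.instance }}`. When a file like this is written through a shell heredoc, an unquoted delimiter (`<< EOF`) lets the shell expand `$labels` and `$value` before the text reaches the file, while a quoted delimiter (`<< 'EOF'`) passes the body through verbatim. The following sketch illustrates the difference; it sets `labels` to an empty string to mimic a shell where that variable has no value.

```shell
labels=""   # stand-in for the usual case: no such variable in the shell

# Unquoted delimiter: the shell substitutes $labels (empty here).
unquoted=$(cat << EOF
summary: "High CPU usage on {{ $labels.instance }}"
EOF
)

# Quoted delimiter: the body is taken literally.
quoted=$(cat << 'EOF'
summary: "High CPU usage on {{ $labels.instance }}"
EOF
)

printf '%s\n' "$unquoted"   # summary: "High CPU usage on {{ .instance }}"
printf '%s\n' "$quoted"     # summary: "High CPU usage on {{ $labels.instance }}"
```

A rule file written with the unquoted form would lose its template variables, so it is worth inspecting the generated file with `cat` before restarting Prometheus.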

Part03-Node Exporter Installation

3.1 Installing Node Exporter

# Download Node Exporter
$ wget https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
--2026-04-04 00:40:00--  https://github.com/prometheus/node_exporter/releases/download/v1.6.0/node_exporter-1.6.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘node_exporter-1.6.0.linux-amd64.tar.gz’

node_exporter-1.6.0.linux-amd64.tar.gz 100%[================================================================================>] 12.00M 10.0MB/s in 1s

2026-04-04 00:40:01 (10.0 MB/s) - ‘node_exporter-1.6.0.linux-amd64.tar.gz’ saved [12345678/12345678]

# Extract and install
$ sudo tar xzf node_exporter-1.6.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/node_exporter-1.6.0.linux-amd64 /usr/local/node_exporter

# Create the systemd service
$ sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Node Exporter
Documentation=https://prometheus.io/docs/guides/node-exporter/
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/node_exporter/node_exporter

[Install]
WantedBy=multi-user.target
EOF

# Start the service
$ sudo systemctl daemon-reload
$ sudo systemctl start node_exporter
$ sudo systemctl enable node_exporter
Created symlink /etc/systemd/system/multi-user.target.wants/node_exporter.service → /etc/systemd/system/node_exporter.service.

# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9100/tcp
success
$ sudo firewall-cmd --reload
success

# Verify the metrics
$ curl http://localhost:9100/metrics | head -20
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 1.2345e-05
go_gc_duration_seconds{quantile="0.25"} 2.3456e-05
go_gc_duration_seconds{quantile="0.5"} 3.4567e-05
go_gc_duration_seconds{quantile="0.75"} 4.5678e-05
go_gc_duration_seconds{quantile="1"} 5.6789e-05
go_gc_duration_seconds_sum 0.001234567
go_gc_duration_seconds_count 100
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 10
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.20.4"} 1
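
The `/metrics` output is plain text with one sample per line, so a single value can be picked out with standard tools. A small sketch, assuming a metric without labels; `metric_value` is a hypothetical helper name.

```shell
# metric_value NAME
# Reads Prometheus text-format metrics on stdin and prints the value of the
# sample whose name is exactly NAME (label-free samples only).
# "# HELP" and "# TYPE" lines are skipped because their first field is "#".
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Example:
#   curl -s http://localhost:9100/metrics | metric_value go_goroutines
```

Samples that carry labels, such as `go_gc_duration_seconds{quantile="0"}`, do not match because the label set is part of the first field; a fuller parser would be needed for those.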

Part04-Grafana Installation

4.1 Installing Grafana for Visualization

# Add the Grafana repository
$ sudo tee /etc/yum.repos.d/grafana.repo << 'EOF'
[grafana]
name=grafana
baseurl=https://rpm.grafana.com
repo_gpgcheck=1
enabled=1
gpgcheck=1
gpgkey=https://rpm.grafana.com/gpg.key
sslverify=1
sslcacert=/etc/pki/tls/certs/ca-bundle.crt
EOF

# Install Grafana
$ sudo dnf install -y grafana
Last metadata expiration check: 0:45:23 ago on Sat 04 Apr 2026 00:40:15 AM CST.
Dependencies resolved.
================================================================================
 Package          Architecture     Version          Repository          Size
================================================================================
Installing:
 grafana          x86_64           10.0.0-1         grafana             50 M

Transaction Summary
================================================================================
Install  1 Package

Total download size: 50 M
Installed size: 150 M
Downloading Packages:
grafana-10.0.0-1.x86_64.rpm                     50 MB/s |  50 MB     00:01
--------------------------------------------------------------------------------
Total                                           50 MB/s |  50 MB     00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Preparing        :                                                        1/1
  Installing       : grafana-10.0.0-1.x86_64                                1/1
  Running scriptlet: grafana-10.0.0-1.x86_64                                1/1
  Verifying        : grafana-10.0.0-1.x86_64                                1/1

Installed:
  grafana-10.0.0-1.x86_64

Complete!

# Start the service
$ sudo systemctl start grafana-server
$ sudo systemctl enable grafana-server
Created symlink /etc/systemd/system/multi-user.target.wants/grafana-server.service → /usr/lib/systemd/system/grafana-server.service.

# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=3000/tcp
success
$ sudo firewall-cmd --reload
success

# Access the web interface
http://192.168.1.100:3000
Default username: admin
Default password: admin

# Add the Prometheus data source
Configuration -> Data Sources -> Add data source -> Prometheus
URL: http://localhost:9090
Save & Test

# Import the Node Exporter dashboard
Dashboards -> Import -> enter dashboard ID: 1860
Load -> select the Prometheus data source -> Import
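
The data source can also be defined in a file rather than through the web interface. The sketch below assumes Grafana's file-based provisioning with the default RPM path `/etc/grafana/provisioning/datasources/`; the helper name `render_datasource` and the file name `prometheus.yaml` are illustrative choices.

```shell
# render_datasource URL
# Prints a Grafana data source provisioning file for a Prometheus server at URL.
# The heredoc delimiter is left unquoted on purpose so that $1 is expanded.
render_datasource() {
    cat << EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $1
    isDefault: true
EOF
}

# Example (Grafana reads provisioning files at startup):
#   render_datasource http://localhost:9090 | sudo tee /etc/grafana/provisioning/datasources/prometheus.yaml
#   sudo systemctl restart grafana-server
```

Keeping the definition in a file makes the setup repeatable across hosts, at the cost of a Grafana restart when the file changes.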

Part05-Monitoring Alerts

5.1 Configuring Alertmanager

# Download Alertmanager
$ wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
--2026-04-04 00:45:00--  https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘alertmanager-0.25.0.linux-amd64.tar.gz’

alertmanager-0.25.0.linux-amd64.tar.gz 100%[================================================================================>] 12.00M 10.0MB/s in 1s

2026-04-04 00:45:01 (10.0 MB/s) - ‘alertmanager-0.25.0.linux-amd64.tar.gz’ saved [12345678/12345678]

# Extract and install
$ sudo tar xzf alertmanager-0.25.0.linux-amd64.tar.gz -C /usr/local/
$ sudo ln -s /usr/local/alertmanager-0.25.0.linux-amd64 /usr/local/alertmanager

# Configure Alertmanager
$ sudo tee /usr/local/alertmanager/alertmanager.yml << 'EOF'
global:
  smtp_smarthost: 'smtp.fgedu.net.cn:587'
  smtp_from: 'alertmanager@fgedu.net.cn'
  smtp_auth_username: 'alertmanager@fgedu.net.cn'
  smtp_auth_password: 'password'

route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'email'

receivers:
  - name: 'email'
    email_configs:
      - to: 'admin@fgedu.net.cn'
        send_resolved: true
EOF

# Create the systemd service
$ sudo tee /etc/systemd/system/alertmanager.service << 'EOF'
[Unit]
Description=Alertmanager
After=network-online.target

[Service]
Type=simple
ExecStart=/usr/local/alertmanager/alertmanager \
  --config.file=/usr/local/alertmanager/alertmanager.yml \
  --storage.path=/usr/local/alertmanager/data

[Install]
WantedBy=multi-user.target
EOF

# Start the service
$ sudo systemctl daemon-reload
$ sudo systemctl start alertmanager
$ sudo systemctl enable alertmanager
Created symlink /etc/systemd/system/multi-user.target.wants/alertmanager.service → /etc/systemd/system/alertmanager.service.

# Configure the firewall
$ sudo firewall-cmd --permanent --add-port=9093/tcp
success
$ sudo firewall-cmd --reload
success

# Access the web interface
http://192.168.1.100:9093
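
To check the notification path end to end, a synthetic alert can be pushed straight to Alertmanager. A minimal sketch, assuming the v2 HTTP API at `/api/v2/alerts` on port 9093; `test_alert_payload`, the alert name `TestAlert`, and the summary text are made up for illustration.

```shell
# test_alert_payload ALERTNAME SEVERITY
# Prints a JSON array with one alert. ALERTNAME and SEVERITY are inserted
# as-is, so they must not contain quotes or backslashes.
test_alert_payload() {
    printf '[{"labels":{"alertname":"%s","severity":"%s"},"annotations":{"summary":"Manual test alert"}}]\n' "$1" "$2"
}

# Example:
#   test_alert_payload TestAlert warning |
#     curl -s -X POST -H 'Content-Type: application/json' --data @- http://localhost:9093/api/v2/alerts
```

If the SMTP settings are correct, the test alert should appear in the web interface and lead to an email once `group_wait` has elapsed.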

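Fengge's configuration recommendations:
1. Set a sensible data retention period
2. Set appropriate alert thresholds
3. Configure multiple alert notification channels
4. Review monitoring metrics regularly
5. Optimize dashboard layout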
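
For the first recommendation, retention is set with the `--storage.tsdb.retention.time` flag on the Prometheus command line, for example `--storage.tsdb.retention.time=30d` added to `ExecStart`. A rough way to size the data directory is the rule of thumb from the Prometheus storage documentation: retention in seconds, times ingested samples per second, times bytes per sample, with roughly 1 to 2 bytes per sample. The helper name and the example figures below are assumptions for illustration.

```shell
# estimate_tsdb_bytes RETENTION_DAYS SAMPLES_PER_SECOND BYTES_PER_SAMPLE
# Prints the approximate on-disk size in bytes (integer arithmetic).
estimate_tsdb_bytes() {
    echo $(( $1 * 86400 * $2 * $3 ))
}

# Example: 15 days of retention at 1000 samples/s and 2 bytes per sample.
estimate_tsdb_bytes 15 1000 2   # 2592000000
```

The result, about 2.6 GB in this example, is only an estimate; actual usage depends on how well the series compress.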

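This article was compiled and published by 风哥教程 (Fengge Tutorials) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html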
