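Summary: This Fengge (风哥) tutorial draws on the official Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman documentation, among other sources, and describes how to configure and use the technologies covered.
This document explains how to configure cluster service resources and how to test failover.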
Part01-Service Resource Configuration
1.1 Creating the Web Service Resources
[root@ha-node1 ~]# pcs resource create web_vip ocf:heartbeat:IPaddr2 \
ip=192.168.1.100 cidr_netmask=24 \
op monitor interval=30s
[root@ha-node1 ~]# pcs resource create web_service systemd:httpd \
op monitor interval=20s timeout=10s
# Create the resource group
[root@ha-node1 ~]# pcs resource group add webgroup web_vip web_service
# View the resource group (pcs 0.10 and later use "resource config" in place of "resource show")
[root@ha-node1 ~]# pcs resource config webgroup
Group: webgroup
Resource: web_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=192.168.1.100
Operations: monitor interval=30s (web_vip-monitor-interval-30s)
Resource: web_service (class=systemd type=httpd)
Operations: monitor interval=20s timeout=10s (web_service-monitor-interval-20s)
# View resource status
[root@ha-node1 ~]# pcs status resources
* webgroup (ocf::heartbeat:IPaddr2): Started ha-node1
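A resource group is shorthand for two relationships between its members: each member is colocated with the one listed before it, and members start in the listed order. As a rough illustration, the sketch below prints the explicit constraints that approximate `webgroup`. The helper name `group_as_constraints` is hypothetical, and the function only prints commands; it does not run pcs or change the cluster.

```shell
# group_as_constraints RES1 RES2 ...
# Hypothetical helper: prints the colocation and ordering constraints that a
# group of the listed resources roughly implies. Nothing is executed.
group_as_constraints() (
    prev=$1
    shift
    for res in "$@"; do
        # Each member must run on the same node as the previous member...
        echo "pcs constraint colocation add $res with $prev INFINITY"
        # ...and may only start after the previous member has started.
        echo "pcs constraint order $prev then $res"
        prev=$res
    done
)

group_as_constraints web_vip web_service
```

For the two resources above this prints one colocation and one ordering command. In practice the group is the simpler choice; the explicit form is mainly useful for understanding why `web_service` never starts before, or away from, `web_vip`.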
1.2 Creating the Database Service Resources
[root@ha-node1 ~]# pcs resource create db_vip ocf:heartbeat:IPaddr2 \
ip=192.168.1.101 cidr_netmask=24 \
op monitor interval=30s
[root@ha-node1 ~]# pcs resource create db_service systemd:mariadb \
op monitor interval=20s timeout=10s
# Create the resource group
[root@ha-node1 ~]# pcs resource group add dbgroup db_vip db_service
# View all resources
[root@ha-node1 ~]# pcs status resources
* webgroup (ocf::heartbeat:IPaddr2): Started ha-node1
* dbgroup (ocf::heartbeat:IPaddr2): Started ha-node2
# Verify that the services are running
[root@ha-node1 ~]# curl -I http://192.168.1.100
HTTP/1.1 200 OK
Date: Fri, 04 Apr 2026 11:10:00 GMT
Server: Apache/2.4.53 (Rocky Linux)
Last-Modified: Fri, 04 Apr 2026 10:00:00 GMT
ETag: "1234-5678"
Accept-Ranges: bytes
Content-Length: 1234
Content-Type: text/html; charset=UTF-8
[root@ha-node1 ~]# mysql -h 192.168.1.101 -u root -p -e "SELECT 1"
Enter password:
+---+
| 1 |
+---+
| 1 |
+---+
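Resources can take a few seconds to start after they are created, so a one-off check may fail even though the configuration is correct. The sketch below retries a check command until it succeeds or a timeout passes. The helper name `wait_until` and the timeout values in the usage note are assumptions, not part of the original procedure.

```shell
# wait_until TIMEOUT INTERVAL CMD...
# Hypothetical helper: reruns CMD every INTERVAL seconds until it succeeds
# (returns 0) or TIMEOUT seconds have elapsed (returns 1).
# The subshell body "( ... )" keeps the helper's variables local.
wait_until() (
    timeout=$1
    interval=$2
    shift 2
    deadline=$(( $(date +%s) + timeout ))
    until "$@" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
)
```

A possible use is `wait_until 60 2 curl -fsI http://192.168.1.100 && echo "web VIP is answering"`, which polls the web VIP for up to a minute.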
Part02-Failover Testing
2.1 Node Failure Test
[root@ha-node1 ~]# pcs status resources
* webgroup (ocf::heartbeat:IPaddr2): Started ha-node1
* dbgroup (ocf::heartbeat:IPaddr2): Started ha-node2
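# Simulate a node1 failure by putting it into standby (pcs 0.10 and later: "pcs node standby")
[root@ha-node1 ~]# pcs node standby ha-node1
# View resource status (resources should move to node2)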
[root@ha-node1 ~]# pcs status resources
* webgroup (ocf::heartbeat:IPaddr2): Started ha-node2
* dbgroup (ocf::heartbeat:IPaddr2): Started ha-node2
# Verify that the services are reachable
[root@client ~]# curl -I http://192.168.1.100
HTTP/1.1 200 OK
Date: Fri, 04 Apr 2026 11:12:00 GMT
Server: Apache/2.4.53 (Rocky Linux)
Content-Type: text/html; charset=UTF-8
[root@client ~]# mysql -h 192.168.1.101 -u root -p -e "SELECT 1"
Enter password:
+---+
| 1 |
+---+
| 1 |
+---+
# Bring node1 back (pcs 0.10 and later: "pcs node unstandby")
[root@ha-node1 ~]# pcs node unstandby ha-node1
# View node status
[root@ha-node1 ~]# pcs status nodes
Pacemaker Nodes:
Online: ha-node1 ha-node2
Standby:
Maintenance:
Offline:
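A single check after the switchover shows that the service came back, but not how long it was unreachable. The sketch below probes a service repeatedly while the node is put into standby and reports how many probes failed. The helper name `count_outage` and the probe settings are assumptions; the longest run of failures multiplied by the probe interval gives only a rough estimate of downtime.

```shell
# count_outage PROBES INTERVAL CMD...
# Hypothetical helper: runs CMD PROBES times, INTERVAL seconds apart, and
# prints "failed=<n> longest_run=<m>", where n is the total number of failed
# probes and m is the longest run of consecutive failures.
count_outage() (
    probes=$1
    interval=$2
    shift 2
    failed=0
    run=0
    longest=0
    i=0
    while [ "$i" -lt "$probes" ]; do
        if "$@" >/dev/null 2>&1; then
            run=0                      # service answered: reset the current run
        else
            failed=$(( failed + 1 ))
            run=$(( run + 1 ))
            if [ "$run" -gt "$longest" ]; then
                longest=$run
            fi
        fi
        i=$(( i + 1 ))
        if [ "$i" -lt "$probes" ]; then
            sleep "$interval"
        fi
    done
    echo "failed=$failed longest_run=$longest"
)
```

Run from the client, for example `count_outage 60 1 curl -fsI --max-time 1 http://192.168.1.100`, and trigger the standby on ha-node1 while it is probing.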
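2.2 Service Failure Test
# Stop httpd outside cluster control to simulate a service failure
[root@ha-node1 ~]# systemctl stop httpd
# View resource status (the cluster should restart the service once the next monitor run, interval=20s, detects the failure)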
[root@ha-node1 ~]# pcs status resources
* webgroup (ocf::heartbeat:IPaddr2): Started ha-node1
# View service status
[root@ha-node1 ~]# systemctl status httpd
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-04-04 11:15:00 CST; 10s ago
Docs: man:httpd(8)
man:apachectl(8)
Main PID: 12345 (httpd)
Status: "Total requests: 0; Idle/Busy workers 100/0;Requests/sec: 0; Bytes served/sec: 0 B/sec"
Tasks: 213 (limit: 11232)
Memory: 25.6M
CGroup: /system.slice/httpd.service
├─12345 /usr/sbin/httpd -DFOREGROUND
├─12346 /usr/sbin/httpd -DFOREGROUND
└─12347 /usr/sbin/httpd -DFOREGROUND
Apr 04 11:15:00 ha-node1 systemd[1]: Started The Apache HTTP Server.
# View the fail count (the pcs subcommand for this is "resource failcount show")
[root@ha-node1 ~]# pcs resource failcount show web_service
Fail counts for web_service
ha-node1: 1 (last-failure: Fri Apr 4 11:14:50 2026)
# Clear the fail count
[root@ha-node1 ~]# pcs resource cleanup web_service
Cleaned up web_service on ha-node1
Cleaned up web_service on ha-node2
Part03-Failover Tuning
3.1 Setting the Migration Threshold
[root@ha-node1 ~]# pcs resource update web_service meta migration-threshold=3
# Set the failure timeout
[root@ha-node1 ~]# pcs resource update web_service meta failure-timeout=60s
# View the resource configuration
[root@ha-node1 ~]# pcs resource config web_service
Resource: web_service (class=systemd type=httpd)
Meta Attrs: failure-timeout=60s migration-threshold=3
Operations: monitor interval=20s timeout=10s (web_service-monitor-interval-20s)
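With `migration-threshold=3`, the cluster keeps restarting `web_service` on the same node until its fail count there reaches 3, and only then moves it away. `failure-timeout=60s` lets the fail count expire once no new failure has occurred for 60 seconds (Pacemaker checks this periodically, so expiry may not be immediate). The sketch below models that decision as a simplified rule; the function name `recovery_action` is hypothetical, and real placement also depends on scores and constraints.

```shell
# recovery_action THRESHOLD FAILCOUNT
# Hypothetical, simplified model of migration-threshold:
#   prints "move"    when the fail count has reached the threshold,
#   prints "restart" otherwise (a threshold of 0 disables the feature).
recovery_action() {
    threshold=$1
    failcount=$2
    if [ "$threshold" -gt 0 ] && [ "$failcount" -ge "$threshold" ]; then
        echo "move"
    else
        echo "restart"
    fi
}

recovery_action 3 1   # one failure so far: restart in place
recovery_action 3 3   # threshold reached: move to another node
```

After the single failure produced in section 2.2, the fail count on ha-node1 is 1, so two more failures within the timeout window would be needed before the service leaves that node.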
# Set resource stickiness
[root@ha-node1 ~]# pcs resource update webgroup meta resource-stickiness=100
# View the resource group configuration
[root@ha-node1 ~]# pcs resource config webgroup
Group: webgroup
Meta Attrs: resource-stickiness=100
Resource: web_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=192.168.1.100
Operations: monitor interval=30s (web_vip-monitor-interval-30s)
Resource: web_service (class=systemd type=httpd)
Meta Attrs: failure-timeout=60s migration-threshold=3
Operations: monitor interval=20s timeout=10s (web_service-monitor-interval-20s)
3.2 Setting the Failover Policy
[root@ha-node1 ~]# pcs constraint location webgroup prefers ha-node1=100
[root@ha-node1 ~]# pcs constraint location webgroup prefers ha-node2=50
# View location constraints
[root@ha-node1 ~]# pcs constraint location
Location Constraints:
Resource: webgroup
Enabled on: ha-node1 (score:100)
Enabled on: ha-node2 (score:50)
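# Set the failback policy: a stickiness of 200 exceeds the location-score gap (100 - 50 = 50),
# so the group stays on its current node after ha-node1 recovers instead of moving back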
[root@ha-node1 ~]# pcs resource update webgroup meta resource-stickiness=200
# Verify the configuration
[root@ha-node1 ~]# pcs resource config webgroup
Group: webgroup
Meta Attrs: resource-stickiness=200
Resource: web_vip (class=ocf provider=heartbeat type=IPaddr2)
Attributes: cidr_netmask=24 ip=192.168.1.100
Operations: monitor interval=30s (web_vip-monitor-interval-30s)
Resource: web_service (class=systemd type=httpd)
Meta Attrs: failure-timeout=60s migration-threshold=3
Operations: monitor interval=20s timeout=10s (web_service-monitor-interval-20s)
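Whether `webgroup` moves back to ha-node1 after a failover comes down to comparing scores: the node where the group is currently running gets its location score plus the stickiness, and the group moves only if another node scores higher. The sketch below is a simplified model of that comparison. The function name `placement` is hypothetical, ties are treated as "stay", and Pacemaker may count stickiness once per group member, which would only widen the margin.

```shell
# placement CURRENT_SCORE OTHER_SCORE STICKINESS
# Hypothetical, simplified model: the resource stays where it is when
# (location score of current node + stickiness) >= score of the other node.
placement() {
    current_score=$1
    other_score=$2
    stickiness=$3
    if [ $(( current_score + stickiness )) -ge "$other_score" ]; then
        echo "stay"
    else
        echo "move"
    fi
}

placement 50 100 200   # running on ha-node2, stickiness 200: 250 >= 100
placement 50 100 0     # same situation without stickiness: 50 < 100
```

The first call prints `stay` and the second prints `move`, which is why the location preferences alone would cause automatic failback and the stickiness setting prevents it.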
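Part04-Failover Monitoring
4.1 Monitoring Failover
# View recent cluster events. If your pcs version rejects the "events" subcommand,
# "pcs status --full" and "journalctl -u pacemaker" also report failures and recoveries.
[root@ha-node1 ~]# pcs status events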
Events:
* Fri Apr 4 11:14:50 2026: web_service failed on ha-node1
* Fri Apr 4 11:14:51 2026: web_service restarted on ha-node1
* Fri Apr 4 11:15:00 2026: web_service recovered on ha-node1
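# View the resource operation history ("crm_mon -1 --operations" is an alternative
# if the "operations" subcommand is unavailable in your pcs version)
[root@ha-node1 ~]# pcs status operations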
Operations:
* web_vip: monitor interval=30s last-rc-change=Fri Apr 4 11:15:00 2026 exec-time=10ms
* web_service: monitor interval=20s last-rc-change=Fri Apr 4 11:15:00 2026 exec-time=5ms
* db_vip: monitor interval=30s last-rc-change=Fri Apr 4 11:10:00 2026 exec-time=10ms
* db_service: monitor interval=20s last-rc-change=Fri Apr 4 11:10:00 2026 exec-time=8ms
# View node status for both Pacemaker and Corosync
[root@ha-node1 ~]# pcs status nodes both
Pacemaker Nodes:
Online: ha-node1 ha-node2
Standby:
Maintenance:
Offline:
Corosync Nodes:
Online: ha-node1 ha-node2
Offline:
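The node listing above is easy to scan by eye but awkward to alert on. The sketch below filters that output down to the nodes that are not online. The function name `nodes_not_online` is hypothetical, and the parsing assumes the line layout shown above (`Standby:`, `Maintenance:`, and `Offline:` labels followed by node names), which may differ between pcs versions.

```shell
# nodes_not_online
# Hypothetical filter: reads "pcs status nodes" style output on stdin and
# prints one "<State> <node>" line per node listed under Standby,
# Maintenance, or Offline. Prints nothing when every node is online.
nodes_not_online() {
    awk '
        /^[ \t]*(Standby|Maintenance|Offline):/ {
            state = $1
            sub(/:$/, "", state)          # drop the trailing colon from the label
            for (i = 2; i <= NF; i++) {
                print state, $i
            }
        }
    '
}
```

A possible use is `pcs status nodes | nodes_not_online`, for example from a cron job that sends mail whenever the output is non-empty.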
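- Run failover tests regularly
- Set a reasonable migration threshold
- Configure resource stickiness to avoid frequent switching
- Monitor failover time
- Record failover events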
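One way to record failover events is a Pacemaker alert agent: a script the cluster runs on each event, with the details passed in `CRM_alert_*` environment variables. The sketch below appends one line per event to a log file. The function name, the log line format, and the default log path are assumptions for illustration, not something the original procedure configures.

```shell
# Hypothetical alert agent. Pacemaker sets CRM_alert_* variables for each
# event; CRM_alert_recipient is used here as the log file path, with an
# assumed default when no recipient is configured.
log_cluster_alert() (
    logfile=${CRM_alert_recipient:-/var/log/pcmk_failover_events.log}
    stamp=${CRM_alert_timestamp:-$(date '+%Y-%m-%d %H:%M:%S')}
    printf '%s kind=%s node=%s rsc=%s task=%s desc=%s\n' \
        "$stamp" \
        "${CRM_alert_kind:-unknown}" \
        "${CRM_alert_node:--}" \
        "${CRM_alert_rsc:--}" \
        "${CRM_alert_task:--}" \
        "${CRM_alert_desc:--}" >> "$logfile"
)

# Only log when invoked by the cluster, that is, when an alert kind is set.
if [ -n "${CRM_alert_kind:-}" ]; then
    log_cluster_alert
fi
```

Saved as an executable script on every node (for instance at a path such as /usr/local/bin/log_cluster_alert.sh, chosen here only as an example), it could be registered with `pcs alert create path=/usr/local/bin/log_cluster_alert.sh id=failover_log` and pointed at a log file with `pcs alert recipient add failover_log value=/var/log/pcmk_failover_events.log`. The log file needs to be writable by the user the cluster runs alert agents as.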
Compiled and published by 风哥教程 for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
