About this document: this Fenge (风哥) tutorial draws on the official documentation for Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, Podman, and related projects, and describes the configuration and use of the relevant technologies in detail.
This document walks through the installation and configuration of a Pacemaker/Corosync high-availability cluster, covering package installation, environment preparation, and cluster initialization. It is aimed at operations staff for learning and testing; verify everything yourself before applying it to production.
Part01-Basic Concepts and Theory
1.1 High-Availability Cluster Concepts
A high-availability (HA) cluster is an architecture in which multiple servers work together to provide service redundancy and failover, ensuring business continuity. When a node fails, its services automatically switch to a healthy node and keep running.
- Automatic failure detection: node and service state is monitored in real time
- Automatic failover: services are switched over automatically on failure
- Data consistency: data is neither lost nor corrupted
- Service continuity: service interruption is kept to a minimum
1.2 The Pacemaker Cluster Resource Manager
Pacemaker is an enterprise-grade open-source cluster resource manager. It manages cluster resources (IP addresses, services, storage, and so on) and handles starting, stopping, monitoring, and failing over those resources.
- Resource management: manages IPs, services, storage, and other resources
- Failure detection: monitors node and resource health
- Failure recovery: automatically restarts or migrates failed resources
- Constraint management: controls where resources run and in what order
- Quorum handling: deals with cluster partition scenarios
1.3 The Corosync Cluster Engine
Corosync is the cluster communication layer. It provides messaging between nodes, membership management, and quorum services; Pacemaker relies on Corosync for inter-node coordination.
- Messaging: reliable message delivery between nodes
- Membership management: maintains the cluster member list
- Quorum service: prevents cluster split-brain
- Heartbeat: monitors node liveness
Part02-Production Environment Planning and Recommendations
2.1 Cluster Environment Planning
Planning a production cluster involves the number of nodes, the network architecture, the storage design, and more.
Node count: at least 2 nodes; 3 nodes (or any odd number) is recommended
Node configuration:
- node1: 192.168.1.10 (ha-node1.fgedu.net.cn)
- node2: 192.168.1.11 (ha-node2.fgedu.net.cn)
- node3: 192.168.1.12 (ha-node3.fgedu.net.cn) [optional]
# VIP planning
Virtual IP: 192.168.1.100 (used for service access)
# Storage planning
Shared storage: NFS/iSCSI/GFS2
Local storage: independent storage on each node
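As a convenience, the node plan above can be turned into the matching /etc/hosts entries with a small bash helper (an illustrative sketch; the IPs and names are this document's example values — adjust them to your environment):

```shell
#!/usr/bin/env bash
# gen_hosts_entries: print /etc/hosts lines for the planned cluster nodes.
# The IP/name pairs below are the example values from the plan above.
gen_hosts_entries() {
    local domain=fgedu.net.cn
    local entries=(
        "192.168.1.10 ha-node1"
        "192.168.1.11 ha-node2"
        "192.168.1.12 ha-node3"   # optional third node
    )
    local e ip name
    for e in "${entries[@]}"; do
        read -r ip name <<< "$e"
        printf '%s %s.%s %s\n' "$ip" "$name" "$domain" "$name"
    done
}

gen_hosts_entries
# On each node, append the output to /etc/hosts (as root), e.g.:
#   gen_hosts_entries >> /etc/hosts
```

Keeping the node list in one place like this avoids the classic mistake of hosts files that drift apart between nodes.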
2.2 Network Planning Recommendations
Network planning should separate the business, heartbeat, and management networks.
Option 1: single network (test environments)
- all traffic shares one network
Option 2: dual networks (recommended for production)
- business network: serves external clients
- heartbeat network: inter-node communication
Option 3: three networks (high-security environments)
- business network: external services
- heartbeat network: cluster communication
- management network: operations and administration
# Port planning
TCP 2224: pcsd communication
UDP 5405-5406: corosync communication
TCP 3121: Pacemaker Remote connections
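These TCP ports can be spot-checked from a peer with plain bash, no extra tools needed (a hedged sketch using bash's /dev/tcp; UDP 5405-5406 cannot be probed this way, and every port will report closed until pcsd and the cluster stack are actually running):

```shell
# check_tcp_port HOST PORT: report whether a TCP port accepts connections.
check_tcp_port() {
    local host=$1 port=$2
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "${host}:${port} open"
    else
        echo "${host}:${port} closed"
    fi
}

# Example: probe pcsd (2224) and Pacemaker Remote (3121) on the peer node
for p in 2224 3121; do
    check_tcp_port ha-node2.fgedu.net.cn "$p"
done
```

Running this from each node toward every other node quickly surfaces firewall or routing problems before the cluster is built.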
2.3 Storage Planning Recommendations
Choose a storage design that matches the workload.
1. Shared storage (recommended)
- NFS: simple to use, suited to file sharing
- iSCSI: block storage, suited to databases
- GFS2: cluster file system with concurrent multi-node access
2. Local storage
- independent storage on each node
- requires a data synchronization mechanism
3. Distributed storage
- GlusterFS: distributed file system
- Ceph: unified storage platform
Part03-Production Environment Implementation
3.1 Installing the Cluster Packages
3.1.1 Install the packages on all nodes
[root@ha-node1 ~]# dnf install -y pacemaker corosync pcs
Updating Subscription Management repositories.
Last metadata expiration check: 0:05:23 ago on Fri Apr 4 10:00:00 2026.
Dependencies resolved.
================================================================================
Package Architecture Version Repository Size
================================================================================
Installing:
pacemaker x86_64 2.1.6-1.el9 appstream 1.2 M
corosync x86_64 3.1.7-1.el9 appstream 280 k
pcs x86_64 0.11.4-1.el9 appstream 3.5 M
Transaction Summary
================================================================================
Install 3 Packages
Total download size: 5.0 M
Installed size: 25 M
Downloading Packages:
(1/3): corosync-3.1.7-1.el9.x86_64.rpm 2.8 MB/s | 280 kB 00:00
(2/3): pacemaker-2.1.6-1.el9.x86_64.rpm 5.0 MB/s | 1.2 MB 00:00
(3/3): pcs-0.11.4-1.el9.x86_64.rpm 8.0 MB/s | 3.5 MB 00:00
--------------------------------------------------------------------------------
Total 5.0 MB/s | 5.0 MB 00:01
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
Preparing : 1/1
Installing : corosync-3.1.7-1.el9.x86_64 1/3
Installing : pacemaker-2.1.6-1.el9.x86_64 2/3
Installing : pcs-0.11.4-1.el9.x86_64 3/3
Running scriptlet: pcs-0.11.4-1.el9.x86_64 3/3
Verifying : corosync-3.1.7-1.el9.x86_64 1/3
Verifying : pacemaker-2.1.6-1.el9.x86_64 2/3
Verifying : pcs-0.11.4-1.el9.x86_64 3/3
Installed:
corosync-3.1.7-1.el9.x86_64 pacemaker-2.1.6-1.el9.x86_64 pcs-0.11.4-1.el9.x86_64
Complete!
# Verify the installation
[root@ha-node1 ~]# rpm -qa | grep -E "pacemaker|corosync|pcs"
pacemaker-libs-2.1.6-1.el9.x86_64
pacemaker-cluster-libs-2.1.6-1.el9.x86_64
pacemaker-2.1.6-1.el9.x86_64
corosync-3.1.7-1.el9.x86_64
pcs-0.11.4-1.el9.x86_64
3.2 Configure Hostnames and the hosts File
3.2.1 Set the hostnames
[root@ha-node1 ~]# hostnamectl set-hostname ha-node1.fgedu.net.cn
# Verify the hostname
[root@ha-node1 ~]# hostnamectl
Static hostname: ha-node1.fgedu.net.cn
Icon name: computer-vm
Chassis: vm
Machine ID: abc123def456789
Boot ID: xyz789uvw123456
Virtualization: vmware
Operating System: Red Hat Enterprise Linux 9.4 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:9::baseos
Kernel: Linux 5.14.0-427.13.1.el9_4.x86_64
Architecture: x86-64
Hardware Vendor: VMware, Inc.
Hardware Model: VMware Virtual Platform
# Set the hostname on node2
[root@ha-node2 ~]# hostnamectl set-hostname ha-node2.fgedu.net.cn
# Verify the hostname
[root@ha-node2 ~]# hostnamectl
Static hostname: ha-node2.fgedu.net.cn
Icon name: computer-vm
Chassis: vm
Machine ID: def456ghi789012
Boot ID: abc123def456789
Virtualization: vmware
Operating System: Red Hat Enterprise Linux 9.4 (Plow)
CPE OS Name: cpe:/o:redhat:enterprise_linux:9::baseos
Kernel: Linux 5.14.0-427.13.1.el9_4.x86_64
Architecture: x86-64
Hardware Vendor: VMware, Inc.
Hardware Model: VMware Virtual Platform
3.2.2 Configure the hosts file
[root@ha-node1 ~]# cat >> /etc/hosts << EOF
192.168.1.10 ha-node1.fgedu.net.cn ha-node1
192.168.1.11 ha-node2.fgedu.net.cn ha-node2
EOF
# Verify the hosts file
[root@ha-node1 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.10 ha-node1.fgedu.net.cn ha-node1
192.168.1.11 ha-node2.fgedu.net.cn ha-node2
# Test hostname resolution
[root@ha-node1 ~]# ping -c 2 ha-node2.fgedu.net.cn
PING ha-node2.fgedu.net.cn (192.168.1.11) 56(84) bytes of data.
64 bytes from ha-node2.fgedu.net.cn (192.168.1.11): icmp_seq=1 ttl=64 time=0.521 ms
64 bytes from ha-node2.fgedu.net.cn (192.168.1.11): icmp_seq=2 ttl=64 time=0.398 ms
--- ha-node2.fgedu.net.cn ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 0.398/0.459/0.521/0.061 ms
3.3 Configure Passwordless SSH
3.3.1 Generate an SSH key
[root@ha-node1 ~]# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa
Your public key has been saved in /root/.ssh/id_rsa.pub
The key fingerprint is:
SHA256:abc123def456ghi789jkl012mno345pqr678stu901vwx root@ha-node1.fgedu.net.cn
The key's randomart image is:
+---[RSA 3072]----+
| .o. |
| . o |
| . . o |
| . o .o . |
| + oo S |
| . =.+o . |
| = B.o. |
| o O.=.E |
| . =.++o |
+----[SHA256]-----+
# View the generated keys
[root@ha-node1 ~]# ls -la ~/.ssh/
total 12
drwx------. 2 root root 57 Apr 4 10:05 .
dr-xr-x---. 6 root root 243 Apr 4 10:00 ..
-rw-------. 1 root root 2602 Apr 4 10:05 id_rsa
-rw-r--r--. 1 root root 572 Apr 4 10:05 id_rsa.pub
3.3.2 Distribute the public key to the other node
[root@ha-node1 ~]# ssh-copy-id ha-node2.fgedu.net.cn
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are
already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to
install the new keys
root@ha-node2.fgedu.net.cn's password:
Number of key(s) added: 1
Now try logging into the machine, with: "ssh 'ha-node2.fgedu.net.cn'"
and check to make sure that only the key(s) you wanted were added.
# Test passwordless login
[root@ha-node1 ~]# ssh ha-node2.fgedu.net.cn 'hostname'
ha-node2.fgedu.net.cn
# Repeat the same steps on node2
[root@ha-node2 ~]# ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
[root@ha-node2 ~]# ssh-copy-id ha-node1.fgedu.net.cn
[root@ha-node2 ~]# ssh ha-node1.fgedu.net.cn 'hostname'
ha-node1.fgedu.net.cn
3.4 Configure Firewall Rules
3.4.1 Open the ports the cluster needs
[root@ha-node1 ~]# firewall-cmd --permanent --add-service=high-availability
success
[root@ha-node1 ~]# firewall-cmd --permanent --add-port=2224/tcp
success
[root@ha-node1 ~]# firewall-cmd --permanent --add-port=3121/tcp
success
[root@ha-node1 ~]# firewall-cmd --permanent --add-port=5405-5406/udp
success
[root@ha-node1 ~]# firewall-cmd --permanent --add-port=5405-5406/tcp
success
# Reload the firewall
[root@ha-node1 ~]# firewall-cmd --reload
success
# Verify the firewall rules
[root@ha-node1 ~]# firewall-cmd --list-all
public (active)
target: default
icmp-block-inversion: no
interfaces: ens33
sources:
services: cockpit dhcpv6-client high-availability ssh
ports: 2224/tcp 3121/tcp 5405-5406/udp 5405-5406/tcp
protocols:
forward: no
masquerade: no
forward-ports:
source-ports:
icmp-blocks:
rich rules:
Part04-Production Cases and Hands-on Walkthrough
4.1 Initialize the Cluster
4.1.1 Set the hacluster user password
[root@ha-node1 ~]# echo "hacluster:YourStrongPassword123!" | chpasswd
[root@ha-node2 ~]# echo "hacluster:YourStrongPassword123!" | chpasswd
# Start the pcsd service
[root@ha-node1 ~]# systemctl start pcsd
[root@ha-node1 ~]# systemctl enable pcsd
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service →
/usr/lib/systemd/system/pcsd.service.
[root@ha-node2 ~]# systemctl start pcsd
[root@ha-node2 ~]# systemctl enable pcsd
Created symlink /etc/systemd/system/multi-user.target.wants/pcsd.service →
/usr/lib/systemd/system/pcsd.service.
# Verify the pcsd service status
[root@ha-node1 ~]# systemctl status pcsd
● pcsd.service - PCS GUI and remote configuration interface
Loaded: loaded (/usr/lib/systemd/system/pcsd.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-04-04 10:10:00 CST; 30s ago
Main PID: 12345 (pcsd)
Tasks: 1 (limit: 23456)
Memory: 25.6M
CPU: 120ms
CGroup: /system.slice/pcsd.service
└─12345 /usr/libexec/pcsd/pcsd
Apr 04 10:10:00 ha-node1.fgedu.net.cn systemd[1]: Started PCS GUI and remote configuration interface.
4.1.2 Authenticate the cluster nodes
[root@ha-node1 ~]# pcs host auth ha-node1.fgedu.net.cn ha-node2.fgedu.net.cn -u hacluster -p 'YourStrongPassword123!'
ha-node1.fgedu.net.cn: Authorized
ha-node2.fgedu.net.cn: Authorized
# Verify the authentication status
[root@ha-node1 ~]# pcs host list
ha-node1.fgedu.net.cn
ha-node2.fgedu.net.cn
4.1.3 Create the cluster
[root@ha-node1 ~]# pcs cluster setup hacluster ha-node1.fgedu.net.cn ha-node2.fgedu.net.cn
No addresses specified for host 'ha-node1.fgedu.net.cn', using 'ha-node1.fgedu.net.cn'
No addresses specified for host 'ha-node2.fgedu.net.cn', using 'ha-node2.fgedu.net.cn'
Destroying cluster on hosts: 'ha-node1.fgedu.net.cn', 'ha-node2.fgedu.net.cn'...
ha-node2.fgedu.net.cn: Successfully destroyed cluster
ha-node1.fgedu.net.cn: Successfully destroyed cluster
Requesting remove 'pcsd settings' from 'ha-node1.fgedu.net.cn', 'ha-node2.fgedu.net.cn'
ha-node1.fgedu.net.cn: successful removal of the file 'pcsd settings'
ha-node2.fgedu.net.cn: successful removal of the file 'pcsd settings'
Sending 'corosync authkey', 'pacemaker authkey' to 'ha-node1.fgedu.net.cn', 'ha-node2.fgedu.net.cn'
ha-node1.fgedu.net.cn: successful distribution of the file 'corosync authkey'
ha-node1.fgedu.net.cn: successful distribution of the file 'pacemaker authkey'
ha-node2.fgedu.net.cn: successful distribution of the file 'corosync authkey'
ha-node2.fgedu.net.cn: successful distribution of the file 'pacemaker authkey'
Sending 'corosync.conf' to 'ha-node1.fgedu.net.cn', 'ha-node2.fgedu.net.cn'
ha-node1.fgedu.net.cn: successful distribution of the file 'corosync.conf'
ha-node2.fgedu.net.cn: successful distribution of the file 'corosync.conf'
Cluster has been successfully set up.
4.2 Start the Cluster Services
4.2.1 Start the cluster
[root@ha-node1 ~]# pcs cluster start --all
ha-node1.fgedu.net.cn: Starting Cluster (corosync)…
ha-node1.fgedu.net.cn: Starting Cluster (pacemaker)…
ha-node2.fgedu.net.cn: Starting Cluster (corosync)…
ha-node2.fgedu.net.cn: Starting Cluster (pacemaker)…
# Enable the cluster services at boot
[root@ha-node1 ~]# pcs cluster enable --all
ha-node1.fgedu.net.cn: Cluster Enabled
ha-node2.fgedu.net.cn: Cluster Enabled
# Verify the service status
[root@ha-node1 ~]# systemctl status corosync
● corosync.service - Corosync Cluster Engine
Loaded: loaded (/usr/lib/systemd/system/corosync.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-04-04 10:15:00 CST; 1min ago
Docs: man:corosync
Main PID: 23456 (corosync)
Tasks: 2 (limit: 23456)
Memory: 12.3M
CPU: 350ms
CGroup: /system.slice/corosync.service
└─23456 corosync
Apr 04 10:15:00 ha-node1.fgedu.net.cn systemd[1]: Started Corosync Cluster Engine.
[root@ha-node1 ~]# systemctl status pacemaker
● pacemaker.service - Pacemaker High Availability Cluster Manager
Loaded: loaded (/usr/lib/systemd/system/pacemaker.service; enabled; preset: disabled)
Active: active (running) since Fri 2026-04-04 10:15:01 CST; 1min ago
Docs: man:pacemakerd
Main PID: 24567 (pacemakerd)
Tasks: 6 (limit: 23456)
Memory: 45.2M
CPU: 890ms
CGroup: /system.slice/pacemaker.service
├─24567 /usr/sbin/pacemakerd -f
├─24568 /usr/libexec/pacemaker/pacemaker-based
├─24569 /usr/libexec/pacemaker/pacemaker-controld
├─24570 /usr/libexec/pacemaker/pacemaker-attrd
├─24571 /usr/libexec/pacemaker/pacemaker-schedulerd
└─24572 /usr/libexec/pacemaker/pacemaker-fenced
Apr 04 10:15:01 ha-node1.fgedu.net.cn systemd[1]: Started Pacemaker High Availability Cluster
Manager.
4.3 Verify Cluster Status
4.3.1 View the cluster status
[root@ha-node1 ~]# pcs status cluster
Cluster Status:
Cluster Summary:
* Stack: corosync
* Current DC: ha-node1.fgedu.net.cn (version 2.1.6-1.el9-eab9cfc1ae) - partition with quorum
* Last updated: Fri Apr 4 10:20:00 2026
* Last change: Fri Apr 4 10:15:01 2026 by hacluster via crmd on ha-node1.fgedu.net.cn
* 2 nodes configured
* 0 resource instances configured
# View the node status
[root@ha-node1 ~]# pcs status nodes
Pacemaker Nodes:
Online: ha-node1.fgedu.net.cn ha-node2.fgedu.net.cn
Standby:
Maintenance:
Offline:
# View the full status
[root@ha-node1 ~]# pcs status
Cluster name: hacluster
Cluster Summary:
* Stack: corosync
* Current DC: ha-node1.fgedu.net.cn (version 2.1.6-1.el9-eab9cfc1ae) - partition with quorum
* Last updated: Fri Apr 4 10:20:00 2026
* Last change: Fri Apr 4 10:15:01 2026 by hacluster via crmd on ha-node1.fgedu.net.cn
* 2 nodes configured
* 0 resource instances configured
Node List:
* Online: [ ha-node1.fgedu.net.cn ha-node2.fgedu.net.cn ]
Full List of Resources:
* No resources
Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
4.3.2 Verify the Corosync configuration
[root@ha-node1 ~]# cat /etc/corosync/corosync.conf
totem {
version: 2
cluster_name: hacluster
secauth: off
transport: knet
}
nodelist {
node {
ring0_addr: ha-node1.fgedu.net.cn
name: ha-node1.fgedu.net.cn
nodeid: 1
}
node {
ring0_addr: ha-node2.fgedu.net.cn
name: ha-node2.fgedu.net.cn
nodeid: 2
}
}
quorum {
provider: corosync_votequorum
two_node: 1
}
logging {
to_logfile: yes
logfile: /var/log/cluster/corosync.log
to_syslog: yes
timestamp: on
}
# Verify corosync membership
[root@ha-node1 ~]# corosync-cmapctl | grep members
runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.1.10)
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.1.11)
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
Key points to keep in mind:
- use at least 2 nodes; 3 or an odd number of nodes is recommended
- configure a dedicated network for heartbeat traffic
- keep time synchronized (NTP/Chrony)
- back up the cluster configuration regularly
- monitor cluster status and resource health
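With the cluster online, the VIP planned in Part02 (192.168.1.100) would normally become the cluster's first resource. The sketch below shows plausible pcs commands for a two-node lab; disabling STONITH is acceptable only for testing, and the resource name cluster-vip is just an example:

```shell
# TEST LAB ONLY: relax fencing/quorum handling for a 2-node cluster.
# In production, configure a real fence device instead of disabling STONITH.
pcs property set stonith-enabled=false
pcs property set no-quorum-policy=ignore

# Create the planned virtual IP as a highly available resource.
pcs resource create cluster-vip ocf:heartbeat:IPaddr2 \
    ip=192.168.1.100 cidr_netmask=24 \
    op monitor interval=30s

# Check placement, then exercise a failover.
pcs status resources
pcs node standby ha-node1.fgedu.net.cn     # the VIP should move to ha-node2
pcs node unstandby ha-node1.fgedu.net.cn
```

The standby/unstandby pair is a safe way to rehearse failover without pulling cables or killing nodes.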
Part05-Fenge's Experience Summary and Sharing
5.1 High-Availability Cluster Best Practices
Best practices for deploying a high-availability cluster include:
- Node planning: at least 2 nodes; 3 are recommended to avoid split-brain
- Network planning: separate the business and heartbeat networks
- Storage planning: use shared storage to ensure data consistency
- Monitoring and alerting: watch cluster status and resource health in real time
- Regular drills: test failover on a recurring schedule
5.2 Common Problems and Solutions
# Problem 1: cluster nodes cannot communicate
Solutions:
1. Check network connectivity
2. Check the firewall rules
3. Check hostname resolution in /etc/hosts
4. Check time synchronization
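The four checks above map to concrete commands, for example (run on every node and compare the results):

```shell
# 1. Network connectivity between the nodes
ping -c 2 ha-node2.fgedu.net.cn

# 2. Firewall: the high-availability service and cluster ports must be open
firewall-cmd --list-services
firewall-cmd --list-ports

# 3. Hostname resolution from /etc/hosts
getent hosts ha-node1.fgedu.net.cn ha-node2.fgedu.net.cn

# 4. Time synchronization (chrony)
chronyc tracking
timedatectl | grep -i synchronized
```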
# Problem 2: cluster split-brain
Solutions:
1. Use an odd number of nodes
2. Configure a quorum device
3. Configure a fence (STONITH) device
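For a two-node cluster, the usual tie-breaker is a quorum device on a third machine. A hedged sketch (the arbiter hostname qdevice-host.fgedu.net.cn is a placeholder, and the commands assume RHEL 9 with pcs 0.11):

```shell
# On the arbiter host (NOT a cluster member): install and start qnetd.
dnf install -y pcs corosync-qnetd
systemctl enable --now pcsd
pcs qdevice setup model net --enable --start

# On one cluster node: install the qdevice client and register the arbiter.
dnf install -y corosync-qdevice            # run on every cluster node
pcs host auth qdevice-host.fgedu.net.cn -u hacluster -p 'YourStrongPassword123!'
pcs quorum device add model net host=qdevice-host.fgedu.net.cn algorithm=ffsplit

# Verify that quorum now includes the qdevice vote.
pcs quorum device status
pcs quorum status
```

The ffsplit algorithm gives the vote to exactly one half of a 50/50 partition, which is what a two-node cluster needs.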
# Problem 3: a resource fails to start
Solutions:
1. Check the resource configuration
2. Check the resource's constraints and dependencies
3. Review the cluster logs
4. Test the resource manually
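The four steps above correspond to commands like these (resource-name is a placeholder for the failing resource):

```shell
# 1. Inspect the resource's definition and options
pcs resource config resource-name

# 2. Check location/order/colocation constraints that may block it
pcs constraint --full

# 3. Review the cluster logs around the failure
journalctl -u pacemaker -u corosync --since "30 min ago"
grep -i error /var/log/cluster/corosync.log

# 4. Start the agent by hand for verbose errors, then clear the fail count
pcs resource debug-start resource-name
pcs resource cleanup resource-name
```

debug-start runs the resource agent in the foreground with its errors visible, which is usually the fastest way to see why an agent refuses to start.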
5.3 Recommended Cluster Management Tools
Commonly used cluster management tools:
- pcs: the primary CLI for cluster configuration and management
- crm_mon: real-time cluster status monitoring
- pcs web UI: browser-based cluster management
- Hawk2: the web management interface shipped by SUSE
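Of these, crm_mon is the quickest for day-to-day checks; some example invocations (available once the pacemaker packages above are installed):

```shell
# One-shot snapshot of cluster state (handy in scripts)
crm_mon -1

# Continuous console view: refresh every 2s, show inactive resources and fail counts
crm_mon --interval=2 --inactive --failcounts

# Machine-readable output for monitoring integrations (pacemaker 2.x)
crm_mon -1 --output-as=xml
```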
Compiled and published by the Fenge tutorial team for learning and testing purposes only. When reposting, please credit the source: http://www.fgedu.net.cn/10327.html
