KubeSphere-049: Production Environment Best Practices and O&M Manual
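1. Basic Concepts
1.1 Characteristics of a Production Environment
The production environment is where an enterprise's core business runs. It has the following characteristics:
- High availability: the system must deliver at least 99.99% availability
- High performance: the system must sustain highly concurrent access
- High security: the system must protect data and enforce access control
- Scalability: the system must support horizontal scaling
- Observability: the system must provide complete monitoring and logging
- Maintainability: the system must be easy to maintain and upgrade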
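To make the 99.99% figure concrete, the sketch below converts an availability percentage into a yearly downtime budget. The function name `downtime_minutes_per_year` is made up for this illustration, and a 365-day year is assumed.

```shell
# Allowed downtime, in minutes per 365-day year, for an availability percentage.
downtime_minutes_per_year() {
  awk -v a="$1" 'BEGIN { printf "%.2f\n", (100 - a) / 100 * 365 * 24 * 60 }'
}

downtime_minutes_per_year 99.99   # 52.56
downtime_minutes_per_year 99.9    # 525.60
```

At 99.99% the whole year's budget is under an hour, so a single unplanned control-plane outage can use it up.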
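1.2 O&M Best Practices
O&M best practices include:
- Automated operations: use automation tools to reduce manual work
- Standardized management: establish standardized O&M processes
- Monitoring and alerting: build a thorough monitoring and alerting system
- Backup and recovery: build a reliable backup and recovery mechanism
- Documentation management: maintain complete O&M documentation
- Team collaboration: establish efficient team collaboration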
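1.3 KubeSphere in Production
A KubeSphere production environment needs to address:
- Cluster architecture: design a highly available cluster architecture
- Resource planning: plan cluster resources sensibly
- Security configuration: configure security policies and access control
- Monitoring and alerting: configure monitoring and alerts
- Backup and recovery: configure backup and restore
- Upgrades and maintenance: plan the upgrade and maintenance process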
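2. Production Environment Planning
2.1 Cluster Architecture Planning
2.1.1 High-Availability Architecture
# - 3 master nodes
# - 3 etcd nodes
# - 6 worker nodes
# - Load balancer
# - Multi-availability-zone deployment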
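The choice of three etcd members follows from etcd's majority quorum rule: a cluster of n members needs floor(n/2) + 1 of them to agree. A minimal sketch of that arithmetic follows; the helper names are hypothetical.

```shell
# Quorum size for an n-member etcd cluster: floor(n/2) + 1.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while the cluster still has quorum.
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

Three members tolerate one failure, and adding a fourth does not raise that number, which is why odd member counts are the usual choice.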
2.1.2 Network Architecture
# - Pod network: 10.233.0.0/16
# - Service network: 10.96.0.0/12
# - Network plugin: Calico
# - Network policy: enabled
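The Pod and Service ranges must not overlap each other or the node network. The sketch below checks two IPv4 CIDR blocks for overlap in plain shell arithmetic. All three function names are made up for this example, and the node subnet 192.168.1.0/24 is an assumption inferred from the host addresses used later in this manual.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Print "first last" (as integers) for a CIDR block such as 10.96.0.0/12.
cidr_range() (
  base=$(ip_to_int "${1%/*}")
  size=$(( 1 << (32 - ${1#*/}) ))
  first=$(( base / size * size ))   # align down to the block boundary
  echo "$first $(( first + size - 1 ))"
)

# Exit status 0 if the two CIDR blocks share at least one address.
cidrs_overlap() (
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
)

for pair in "10.233.0.0/16 10.96.0.0/12" "10.233.0.0/16 192.168.1.0/24" "10.96.0.0/12 192.168.1.0/24"; do
  set -- $pair
  if cidrs_overlap "$1" "$2"; then echo "$1 overlaps $2"; else echo "$1 and $2 are disjoint"; fi
done
```

With the planned values all three pairs come out disjoint.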
2.2 Resource Planning
2.2.1 Node Resources
# - Master nodes: 4 CPU, 16 GB RAM each
# - Worker nodes: 8 CPU, 32 GB RAM each
# - Total: 60 CPU, 240 GB RAM
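The totals can be reproduced from the node counts in 2.1.1 (3 masters, 6 workers). A small sketch of that arithmetic; the helper name `total_resource` is hypothetical.

```shell
# total_resource MASTERS PER_MASTER WORKERS PER_WORKER
total_resource() { echo $(( $1 * $2 + $3 * $4 )); }

echo "CPU: $(total_resource 3 4 6 8) cores"   # 3*4 + 6*8
echo "RAM: $(total_resource 3 16 6 32) GB"    # 3*16 + 6*32
```

These are raw totals. Control-plane nodes usually do not run application workloads, so the capacity available to applications is closer to the worker share alone (6 x 8 = 48 CPU, 6 x 32 = 192 GB), minus system reservations.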
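2.2.2 Storage Resources
# - System storage: 100 GB
# - Data storage: 1 TB
# - Log storage: 500 GB
# - Backup storage: 1 TB
2.3 Security Planning
2.3.1 Network Security
# - Firewall rules
# - Network isolation
# - Access control
2.3.2 Access Control
# - RBAC permissions
# - Pod security (Pod Security Admission; PodSecurityPolicy was removed in Kubernetes v1.25)
# - Audit logs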
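3. Implementation Steps
3.1 Cluster Initialization
3.1.1 Node Preparation
# Set the hostname (run each command on its own node, not all on one host)
hostnamectl set-hostname k8s-master-1   # on 192.168.1.101
hostnamectl set-hostname k8s-master-2   # on 192.168.1.102
hostnamectl set-hostname k8s-master-3   # on 192.168.1.103
hostnamectl set-hostname k8s-worker-1   # on 192.168.1.201
hostnamectl set-hostname k8s-worker-2   # on 192.168.1.202
hostnamectl set-hostname k8s-worker-3   # on 192.168.1.203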
hostname set successfully
# Configure the hosts file (run on every node)
cat >> /etc/hosts <<EOF
192.168.1.101 k8s-master-1
192.168.1.102 k8s-master-2
192.168.1.103 k8s-master-3
192.168.1.201 k8s-worker-1
192.168.1.202 k8s-worker-2
192.168.1.203 k8s-worker-3
EOF
hosts file updated
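Appending with `cat >>` adds duplicate lines every time the step is re-run. As one possible alternative, the sketch below appends an entry only when the hostname is not mapped yet. `add_host_entry` is a hypothetical helper, not a standard tool.

```shell
# add_host_entry FILE IP NAME: append "IP NAME" unless NAME already appears
# as a hostname on a non-comment line of FILE.
add_host_entry() {
  if ! awk -v n="$3" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit !found }' "$1"; then
    echo "$2 $3" >> "$1"
  fi
}
```

For example, `add_host_entry /etc/hosts 192.168.1.101 k8s-master-1` can be run any number of times and leaves a single line for that host.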
# Disable swap
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
swap disabled
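# Configure kernel parameters (the net.bridge.* keys only exist once br_netfilter is loaded)
modprobe br_netfilter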
cat >> /etc/sysctl.conf <<EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p
kernel parameters updated
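Before moving on, it can help to confirm that the three keys really ended up set to 1 in the file. The sketch below parses a sysctl-style file to check this. Both function names are made up for this illustration, and only the file is inspected, not the live kernel state.

```shell
# sysctl_file_value FILE KEY: print the last value assigned to KEY in FILE.
sysctl_file_value() {
  awk -F= -v k="$2" '
    /^[[:space:]]*[#;]/ { next }
    { key = $1; val = $2
      gsub(/[[:space:]]/, "", key); gsub(/[[:space:]]/, "", val)
      if (key == k) last = val }
    END { print last }' "$1"
}

# check_k8s_sysctls FILE: succeed only if all three keys are set to 1.
check_k8s_sysctls() {
  for key in net.bridge.bridge-nf-call-ip6tables \
             net.bridge.bridge-nf-call-iptables \
             net.ipv4.ip_forward; do
    [ "$(sysctl_file_value "$1" "$key")" = "1" ] || {
      echo "missing or wrong: $key" >&2
      return 1
    }
  done
}
```

Typical use would be `check_k8s_sysctls /etc/sysctl.conf`. The running values can be read separately with `sysctl -n net.ipv4.ip_forward`.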
3.1.2 Install Docker
yum install -y yum-utils device-mapper-persistent-data lvm2
yum-utils installed
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
repo added
yum install -y docker-ce docker-ce-cli containerd.io
docker installed
# Configure Docker
mkdir -p /etc/docker
cat > /etc/docker/daemon.json <<EOF
{
  "registry-mirrors": ["https://mirror.example.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
docker configured
systemctl enable docker
systemctl start docker
docker started
# Verify Docker
docker version
Client: Docker Engine - Community
Version: 24.0.7
Server: Docker Engine - Community
Version: 24.0.7
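The kubelet and Docker must use the same cgroup driver, so it is worth confirming that `daemon.json` really sets `systemd`. A minimal sketch that reads the driver back out of the file; `docker_cgroup_driver` is a hypothetical helper and relies on a simple text match, not a JSON parser.

```shell
# docker_cgroup_driver FILE: print the cgroup driver configured via exec-opts.
docker_cgroup_driver() {
  sed -n 's/.*native\.cgroupdriver=\([a-z]*\).*/\1/p' "$1" | head -n 1
}
```

For example, `docker_cgroup_driver /etc/docker/daemon.json` should print `systemd` with the configuration above. On a running host, `docker info --format '{{.CgroupDriver}}'` reports the driver that is actually in effect.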
3.2 Install KubeSphere
3.2.1 Install KubeKey
curl -sfL https://get-kk.kubesphere.io | VERSION=v3.0.13 sh -
Downloading kk …
kk installed successfully
# Create the configuration file
# (the root passwords below are placeholders; in production prefer SSH keys via privateKeyPath)
cat > config.yaml <<EOF
apiVersion: kubekey.kubesphere.io/v1alpha2
kind: Cluster
metadata:
  name: sample
spec:
  hosts:
  - {name: k8s-master-1, address: 192.168.1.101, internalAddress: 192.168.1.101, user: root, password: "root123"}
  - {name: k8s-master-2, address: 192.168.1.102, internalAddress: 192.168.1.102, user: root, password: "root123"}
  - {name: k8s-master-3, address: 192.168.1.103, internalAddress: 192.168.1.103, user: root, password: "root123"}
  - {name: k8s-worker-1, address: 192.168.1.201, internalAddress: 192.168.1.201, user: root, password: "root123"}
  - {name: k8s-worker-2, address: 192.168.1.202, internalAddress: 192.168.1.202, user: root, password: "root123"}
  - {name: k8s-worker-3, address: 192.168.1.203, internalAddress: 192.168.1.203, user: root, password: "root123"}
  roleGroups:
    etcd:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    control-plane:
    - k8s-master-1
    - k8s-master-2
    - k8s-master-3
    worker:
    - k8s-worker-1
    - k8s-worker-2
    - k8s-worker-3
  controlPlaneEndpoint:
    internalLoadbalancer: haproxy
    domain: lb.kubesphere.local
    address: ""
    port: 6443
  kubernetes:
    version: v1.28.0
    clusterName: cluster.local
    autoRenewCerts: true
    containerManager: docker
  etcd:
    type: kubekey
  network:
    plugin: calico
    kubePodsCIDR: 10.233.0.0/16
    kubeServiceCIDR: 10.96.0.0/12
  registry:
    privateRegistry: ""
    namespaceOverride: ""
    registryMirrors: []
    insecureRegistries: []
  addons: []
---
apiVersion: installer.kubesphere.io/v1alpha1
kind: ClusterConfiguration
metadata:
  name: ks-installer
  namespace: kubesphere-system
  labels:
    version: v3.4.1
spec:
  persistence:
    storageClass: ""
  authentication:
    jwtSecret: ""
  local_registry: ""
  namespace_override: ""
  etcd:
    monitoring: false
    endpointIps: 192.168.1.101,192.168.1.102,192.168.1.103
    port: 2379
    tlsEnable: true
  common:
    core:
      console:
        enableMultiLogin: true
        port: 30880
        type: NodePort
  alerting:
    enabled: true
  auditing:
    enabled: true
  devops:
    enabled: true
    jenkinsCpuReq: 0.5
    jenkinsCpuLim: 1
    jenkinsMemoryReq: 4Gi
    jenkinsMemoryLim: 4Gi
    jenkinsVolumeSize: 8Gi
  events:
    enabled: true
  logging:
    enabled: true
    logsidecar:
      enabled: true
      replicas: 2
  metrics_server:
    enabled: false
  monitoring:
    storageClass: ""
    prometheusMemoryRequest: 400Mi
    prometheusVolumeSize: 20Gi
    alertmanagerVolumeSize: 2Gi
  multicluster:
    clusterRole: none
  network:
    networkpolicy:
      enabled: true
    ippool:
      type: calico
    topology:
      type: none
  openpitrix:
    store:
      enabled: true
  servicemesh:
    enabled: true
    istio:
      components:
        ingressGateways:
        - name: istio-ingressgateway
          enabled: false
        cni:
          enabled: false
  edgeruntime:
    enabled: false
  gatekeeper:
    enabled: false
  terminal:
    timeout: 600
EOF
config.yaml created
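Configuration text copied from web pages or chat tools often carries curly quotes or en/em dashes in place of `"` and `--`, which breaks both YAML and shell parsing. As a quick pre-flight check before running the installer, the sketch below lists any such characters. `find_typographic_chars` is a hypothetical helper.

```shell
# find_typographic_chars FILE: list lines containing curly quotes or en/em
# dashes. Exit status 0 means at least one was found.
find_typographic_chars() {
  # UTF-8 byte sequences written as octal escapes so this script stays ASCII:
  # U+201C, U+201D (curly double quotes), U+2018, U+2019 (curly single quotes),
  # U+2013 (en dash), U+2014 (em dash).
  grep -n -F \
    -e "$(printf '\342\200\234')" -e "$(printf '\342\200\235')" \
    -e "$(printf '\342\200\230')" -e "$(printf '\342\200\231')" \
    -e "$(printf '\342\200\223')" -e "$(printf '\342\200\224')" \
    "$1"
}
```

Running `find_typographic_chars config.yaml` should print nothing for a clean file. Any line it does print needs the characters replaced by their plain ASCII equivalents.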
3.2.2 Install KubeSphere
./kk create cluster -f config.yaml
Cluster installation started successfully
# Check the installation status
kubectl logs -n kubesphere-system deployment/ks-installer -f
Waiting for installation to complete…
Installation completed successfully!
4. Hands-On Cases
4.1 Cluster Operations
4.1.1 Node Maintenance
# Mark the node as unschedulable
kubectl cordon k8s-worker-1
node/k8s-worker-1 cordoned
# Evict the Pods running on the node
kubectl drain k8s-worker-1 --ignore-daemonsets --delete-emptydir-data
node/k8s-worker-1 drained
# Maintain the node
# ... perform the maintenance work ...
# Make the node schedulable again
kubectl uncordon k8s-worker-1
node/k8s-worker-1 uncordoned
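The cordon, drain, maintain, uncordon sequence can be wrapped so that each step runs only if the previous one succeeded. This is a sketch under stated assumptions: `maintain_node` and the `KUBECTL` override variable are made up for this example and are not kubectl features.

```shell
# KUBECTL is a hypothetical override for this sketch (e.g. KUBECTL=echo gives
# a dry run that only prints the commands).
: "${KUBECTL:=kubectl}"

# maintain_node NODE CMD [ARGS...]
# Runs CMD between drain and uncordon. If any step fails, the sequence stops
# and the node stays cordoned so that an operator can inspect it.
maintain_node() {
  node=$1
  shift
  "$KUBECTL" cordon "$node" &&
    "$KUBECTL" drain "$node" --ignore-daemonsets --delete-emptydir-data &&
    "$@" &&
    "$KUBECTL" uncordon "$node"
}
```

For instance, `maintain_node k8s-worker-1 ssh root@k8s-worker-1 'yum update -y'` would patch the node while it is drained. Leaving the node cordoned on failure is a deliberate choice here, since a half-maintained node should not receive new Pods.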
4.1.2 Cluster Upgrade
# Back up etcd (run on an etcd node; the certificate paths follow the kubeadm layout and may differ on a KubeKey-managed etcd)
ETCDCTL_API=3 etcdctl snapshot save snapshot.db \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
Snapshot saved at snapshot.db
# Upgrade Kubernetes (kubeadm itself must be at the target version before planning)
yum install -y kubeadm-1.28.0
kubeadm upgrade plan
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       AVAILABLE
kubelet     1 x v1.27.0   v1.28.0
# Upgrade the control plane
kubeadm upgrade apply v1.28.0
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.28.0". Enjoy!
# Upgrade kubelet
yum install -y kubelet-1.28.0 kubeadm-1.28.0 kubectl-1.28.0
systemctl daemon-reload
systemctl restart kubelet
kubelet upgraded successfully
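kubeadm supports upgrading only one minor version at a time (for example v1.27 to v1.28), so a simple guard in an upgrade script can reject a skipped minor version before anything is touched. A minimal sketch; the function names are hypothetical.

```shell
# major_of / minor_of VERSION: extract parts of a version such as v1.28.0.
major_of() ( v=${1#v}; echo "${v%%.*}" )
minor_of() ( v=${1#v}; v=${v#*.}; echo "${v%%.*}" )

# can_upgrade CURRENT TARGET: succeed if TARGET has the same major version and
# is either the same minor version (patch upgrade) or the next one.
can_upgrade() (
  [ "$(major_of "$1")" = "$(major_of "$2")" ] || exit 1
  skew=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$skew" -ge 0 ] && [ "$skew" -le 1 ]
)

if can_upgrade v1.27.0 v1.28.0; then
  echo "v1.27.0 -> v1.28.0: allowed"
fi
```

A cluster on v1.26 would first have to go to v1.27 and only then to v1.28.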
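4.2 Backup and Restore
4.2.1 Back Up the Cluster
# Install Velero (for an S3-compatible store such as MinIO, --backup-location-config usually also needs s3ForcePathStyle and s3Url)
velero install --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.8.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero \
  --use-volume-snapshots=true \
  --backup-location-config region=minio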
Velero is installed!
# Create a backup
velero backup create myapp-backup --include-namespaces myapp
Backup request "myapp-backup" submitted successfully.
Run `velero backup describe myapp-backup` or `velero backup logs myapp-backup` for more details.
# List backups
velero backup get
NAME           STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
myapp-backup   Completed   0        0          2026-01-15 10:00:00 +0000 UTC   29d       default            <none>
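A submitted backup is not necessarily a successful one, so scripts should read the STATUS column and not stop at the "submitted" message. The sketch below extracts it from the listing format shown above. `backup_status` is a hypothetical helper that reads the listing on standard input.

```shell
# backup_status NAME: read `velero backup get` output on stdin and print the
# STATUS column for the backup called NAME (empty if there is no such backup).
backup_status() {
  awk -v n="$1" '$1 == n { print $2 }'
}
```

Typical use would be `[ "$(velero backup get | backup_status myapp-backup)" = "Completed" ] || echo "backup not complete" >&2`.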
4.2.2 Restore the Cluster
# Run the restore
velero restore create --from-backup myapp-backup
Restore request "myapp-backup-20260115102000" submitted successfully.
Run `velero restore describe myapp-backup-20260115102000` or `velero restore logs myapp-backup-20260115102000` for more details.
# Check the restore status
velero restore get
NAME                          BACKUP         STATUS      STARTED                         COMPLETED                       ERRORS   WARNINGS   CREATED                         SELECTOR
myapp-backup-20260115102000   myapp-backup   Completed   2026-01-15 10:20:00 +0000 UTC   2026-01-15 10:20:30 +0000 UTC   0        0          2026-01-15 10:20:00 +0000 UTC   <none>
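A restore runs asynchronously, so automation typically polls until the status reaches `Completed`. The sketch below is a generic polling helper plus a status reader for the listing format shown above. Both `poll_until` and `restore_status` are hypothetical names, and the retry count and delay in the usage line are arbitrary.

```shell
# poll_until EXPECTED ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds between tries, and
# succeeds as soon as CMD prints exactly EXPECTED.
poll_until() (
  expected=$1 attempts=$2 delay=$3
  shift 3
  i=1
  while [ "$i" -le "$attempts" ]; do
    [ "$("$@")" = "$expected" ] && exit 0
    sleep "$delay"
    i=$(( i + 1 ))
  done
  exit 1
)

# restore_status NAME: print the STATUS column (third field) of one restore.
restore_status() {
  velero restore get | awk -v n="$1" '$1 == n { print $3 }'
}
```

For example, `poll_until Completed 30 10 restore_status myapp-backup-20260115102000` waits up to roughly five minutes and returns non-zero if the restore never completes.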
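5. Lessons Learned
5.1 Best Practices
5.1.1 Cluster Management Best Practices
- High-availability architecture: deploy a highly available cluster architecture
- Resource planning: plan cluster resources sensibly
- Security configuration: configure security policies and access control
- Monitoring and alerting: configure monitoring and alerts
- Backup and recovery: configure backup and restore
5.1.2 O&M Management Best Practices
- Automated operations: use automation tools to reduce manual work
- Standardized management: establish standardized O&M processes
- Documentation management: maintain complete O&M documentation
- Team collaboration: establish efficient team collaboration
- Continuous improvement: keep improving the O&M processes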
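5.2 Common Problems
5.2.1 Cluster Problems
- Problem 1: a node is NotReady
- Solution: check the kubelet and the container runtime
- Problem 2: a Pod fails to start
- Solution: check resource quotas and the image
- Problem 3: network connectivity fails
- Solution: check the network plugin and network policies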
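For the NotReady case, a first triage step is to list exactly which nodes are affected. The sketch below filters the node listing for that. `not_ready_nodes` is a hypothetical helper that reads `kubectl get nodes --no-headers` output on standard input.

```shell
# not_ready_nodes: print the names of nodes whose STATUS column does not start
# with "Ready". A cordoned but healthy node shows "Ready,SchedulingDisabled"
# and is therefore not reported.
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}
```

Typical use would be `kubectl get nodes --no-headers | not_ready_nodes`. For each node it reports, `kubectl describe node <name>` and, on the node itself, `journalctl -u kubelet` are the usual next places to look.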
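5.2.2 O&M Problems
- Problem 1: a backup fails
- Solution: check the storage backend and permissions
- Problem 2: an upgrade fails
- Solution: check version compatibility
- Problem 3: a restore fails
- Solution: check the integrity of the backup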
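5.3 Security Recommendations
5.3.1 Network Security
- Network isolation: use network policies to isolate traffic
- Firewall rules: configure firewall rules to restrict access
- Encrypted transport: encrypt traffic with TLS
- Certificate management: renew certificates regularly
5.3.2 Access Control
- RBAC permissions: configure RBAC for access control
- Pod security: protect Pods with Pod Security Admission (PodSecurityPolicy was removed in Kubernetes v1.25)
- Secret management: use Secrets to manage sensitive information
- Audit logs: enable audit logging to record operations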
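This article was compiled and published by 风哥教程 for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html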
