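Summary: This Fengge tutorial draws on the official Linux, Red Hat Enterprise Linux, Ansible Automation Platform, Docker, Kubernetes, and Podman documentation and explains how to configure and use the technologies covered.
This document covers best practices for running Kubernetes in production.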
Part01-Cluster Planning
1.1 Production Environment Planning
[root@k8s-master ~]# cat > /root/k8s-production-planning.txt << 'EOF'
Kubernetes Production Environment Planning
==========================================

1. Cluster size
   - Small cluster: 3-5 nodes
   - Medium cluster: 10-20 nodes
   - Large cluster: 50+ nodes

2. Node sizing
   - Master: at least 4 cores / 8 GB
   - Worker: at least 8 cores / 16 GB
   - Storage: SSD recommended

3. High-availability architecture
   - 3-node master cluster
   - Dedicated etcd deployment
   - Load balancer

4. Network planning
   - Pod CIDR: 10.244.0.0/16
   - Service CIDR: 10.96.0.0/12
   - Node network: business network segment
EOF

# Production node plan
[root@k8s-master ~]# cat > production-nodes.txt << 'EOF'
Node Role Plan
==============

Control-plane nodes (3):
  - k8s-master-1: 192.168.1.10
  - k8s-master-2: 192.168.1.11
  - k8s-master-3: 192.168.1.12
  Sizing: 4 cores / 8 GB, 100 GB SSD

Worker nodes (5):
  - k8s-worker-1: 192.168.1.20
  - k8s-worker-2: 192.168.1.21
  - k8s-worker-3: 192.168.1.22
  - k8s-worker-4: 192.168.1.23
  - k8s-worker-5: 192.168.1.24
  Sizing: 8 cores / 16 GB, 200 GB SSD

Load balancers (2):
  - k8s-lb-1: 192.168.1.30
  - k8s-lb-2: 192.168.1.31
  VIP: 192.168.1.100
EOF
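The Pod, Service, and node networks in the plan must not overlap, otherwise routing inside the cluster becomes ambiguous. The following is a minimal sketch of an overlap check in plain shell arithmetic. The function names (ip_to_int, cidr_range, cidrs_overlap) are illustrative, and 192.168.1.0/24 is an assumed node network inferred from the addresses in the node plan.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() {
  # The subshell keeps the modified IFS and positional parameters local.
  ( IFS=.; set -- $1; echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 )) )
}

# Print "first last" (as integers) for a CIDR such as 10.244.0.0/16.
cidr_range() {
  ip=${1%/*}
  prefix=${1#*/}
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  first=$(( $(ip_to_int "$ip") & mask ))
  last=$(( first | (~mask & 0xFFFFFFFF) ))
  echo "$first $last"
}

# Succeed (exit status 0) if the two CIDRs share at least one address.
cidrs_overlap() {
  # After set: $1=a_first $2=a_last $3=b_first $4=b_last
  set -- $(cidr_range "$1") $(cidr_range "$2")
  [ "$1" -le "$4" ] && [ "$3" -le "$2" ]
}

# Compare the planned networks pairwise (node network is an assumption).
POD=10.244.0.0/16; SVC=10.96.0.0/12; NODE=192.168.1.0/24
for pair in "$POD $SVC" "$POD $NODE" "$SVC $NODE"; do
  set -- $pair
  if cidrs_overlap "$1" "$2"; then
    echo "OVERLAP: $1 and $2"
  else
    echo "ok: $1 and $2"
  fi
done
```

Running the same check against the real node and upstream networks before kubeadm init is cheaper than renumbering a live cluster later.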
Part02-High-Availability Deployment
2.1 High-Availability Control Plane
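The plan calls for three control-plane nodes. If each of them also runs an etcd member (the stacked topology that kubeadm sets up by default), the odd count matters: etcd needs a majority of members to accept writes. The sketch below prints the quorum size and the number of member failures tolerated for small cluster sizes. The function names are illustrative.

```shell
#!/bin/sh
# Majority of members required for etcd to accept writes.
etcd_quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a majority is still available.
etcd_fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  echo "members=$n quorum=$(etcd_quorum "$n") tolerated_failures=$(etcd_fault_tolerance "$n")"
done
```

Four members tolerate no more failures than three, which is why control planes are usually sized at 3 or 5 nodes.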
[root@k8s-lb-1 ~]# cat > /etc/haproxy/haproxy.cfg << 'EOF'
global
    log /dev/log local0
    log /dev/log local1 notice
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

defaults
    log global
    mode tcp
    option tcplog
    option dontlognull
    timeout connect 5000
    timeout client 50000
    timeout server 50000

frontend k8s-api
    bind *:6443
    mode tcp
    option tcplog
    default_backend k8s-api

backend k8s-api
    mode tcp
    option tcp-check
    balance roundrobin
    server k8s-master-1 192.168.1.10:6443 check
    server k8s-master-2 192.168.1.11:6443 check
    server k8s-master-3 192.168.1.12:6443 check
EOF
[root@k8s-lb-1 ~]# systemctl enable haproxy --now
Created symlink /etc/systemd/system/multi-user.target.wants/haproxy.service → /usr/lib/systemd/system/haproxy.service.

# Configure Keepalived
[root@k8s-lb-1 ~]# cat > /etc/keepalived/keepalived.conf << 'EOF'
global_defs {
    router_id LVS_DEVEL
}

vrrp_script check_haproxy {
    script "killall -0 haproxy"
    interval 3
    weight -2
    fall 10
    rise 2
}

vrrp_instance VI_1 {
    state MASTER
    interface eth0
    virtual_router_id 51
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass fgedu123
    }
    virtual_ipaddress {
        192.168.1.100
    }
    track_script {
        check_haproxy
    }
}
EOF
[root@k8s-lb-1 ~]# systemctl enable keepalived --now
Created symlink /etc/systemd/system/multi-user.target.wants/keepalived.service → /usr/lib/systemd/system/keepalived.service.
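Only the Keepalived configuration for k8s-lb-1 is shown. A minimal sketch of deriving the second load balancer's configuration from the same file follows. The function name make_backup_conf and the default priority of 99 are assumptions for illustration, not part of the original setup. The priority is chosen deliberately: check_haproxy uses weight -2, so a failed check lowers the MASTER from 100 to 98, and the BACKUP must sit above 98 for the VIP to move when HAProxy stops.

```shell
#!/bin/sh
# Read a MASTER keepalived.conf on stdin and write a BACKUP variant to stdout.
# $1: priority for the backup node (assumed default: 99).
make_backup_conf() {
  backup_priority=${1:-99}
  sed -E \
    -e 's/^([[:space:]]*state[[:space:]]+)MASTER/\1BACKUP/' \
    -e "s/^([[:space:]]*priority[[:space:]]+)[0-9]+/\1${backup_priority}/"
}

# Example (run on k8s-lb-1, then copy the result to k8s-lb-2):
#   make_backup_conf 99 < /etc/keepalived/keepalived.conf > /tmp/keepalived-lb2.conf
```

Everything else, including virtual_router_id, auth_pass, and the VIP, has to stay identical on both nodes. The interface name eth0 may differ per host and should be checked.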
# Initialize the first master node
[root@k8s-master-1 ~]# kubeadm init \
  --control-plane-endpoint "192.168.1.100:6443" \
  --upload-certs \
  --pod-network-cidr=10.244.0.0/16 \
  --service-cidr=10.96.0.0/12
[init] Using Kubernetes version: v1.28.3
[preflight] Running pre-flight checks
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master-1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.1.10 192.168.1.100]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master-1 localhost] and IPs [192.168.1.10 127.0.0.1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master-1 localhost] and IPs [192.168.1.10 127.0.0.1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests"
[apiclient] All control plane components are healthy after 10.001256 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate directory:
[mark-control-plane] Marking the node k8s-master-1 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master-1 as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You can now join any number of control-plane nodes by running the following command:

  kubeadm join 192.168.1.100:6443 --token abc123.def456 \
    --discovery-token-ca-cert-hash sha256:1234567890abcdef \
    --control-plane --certificate-key abc123def456

You can now join any number of worker nodes by running the following command:

  kubeadm join 192.168.1.100:6443 --token abc123.def456 \
    --discovery-token-ca-cert-hash sha256:1234567890abcdef
Part03-Security Hardening
3.1 Production Security Configuration
[root@k8s-master ~]# cat > production-rbac.yaml << 'EOF'
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fgedu-admin
rules:
- apiGroups: [""]
  resources: ["pods", "services", "configmaps", "secrets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
  resources: ["deployments", "replicasets", "statefulsets"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
- apiGroups: ["networking.k8s.io"]
  resources: ["ingresses", "networkpolicies"]
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fgedu-admin-binding
subjects:
- kind: User
  name: fgedu-admin
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: fgedu-admin
  apiGroup: rbac.authorization.k8s.io
EOF
[root@k8s-master ~]# kubectl apply -f production-rbac.yaml
clusterrole.rbac.authorization.k8s.io/fgedu-admin created
clusterrolebinding.rbac.authorization.k8s.io/fgedu-admin-binding created

# Configure network policies
[root@k8s-master ~]# cat > production-networkpolicy.yaml << 'EOF'
# Deny all ingress and egress traffic for every Pod in the namespace by default.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: fgedu-prod
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-web-traffic
  namespace: fgedu-prod
spec:
  podSelector:
    matchLabels:
      app: fgedu-web
  # Egress must be listed here, otherwise the egress rules below are ignored.
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    # The namespace selectors match on a "name" label, which has to be set
    # on the namespaces explicitly; it is not added automatically.
    - namespaceSelector:
        matchLabels:
          name: ingress-nginx
    ports:
    - protocol: TCP
      port: 8080
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: fgedu-prod
    - podSelector:
        matchLabels:
          app: fgedu-database
EOF
[root@k8s-master ~]# kubectl apply -f production-networkpolicy.yaml
networkpolicy.networking.k8s.io/default-deny-all created
networkpolicy.networking.k8s.io/allow-web-traffic created
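After applying RBAC objects it is worth confirming that the binding grants what was intended and nothing more. The sketch below uses kubectl auth can-i with impersonation (--as) to probe a few verb and resource pairs for the fgedu-admin user. The function name check_access and the chosen probes are illustrative. It assumes the caller is allowed to impersonate users, which a cluster admin normally is.

```shell
#!/bin/sh
# Print whether the given user may perform a handful of sample actions.
# $1: user name to impersonate.
check_access() {
  user=$1
  while read -r verb resource; do
    # kubectl prints "yes" or "no"; a "no" also sets a non-zero exit status.
    answer=$(kubectl auth can-i "$verb" "$resource" --as "$user" 2>/dev/null </dev/null)
    printf '%-8s %-20s %s\n' "$verb" "$resource" "$answer"
  done <<'PROBES'
get pods
create deployments.apps
delete nodes
PROBES
}

# Example: check_access fgedu-admin
```

With only the ClusterRole above bound to the user, the first two probes should report yes, and delete nodes should report no because nodes are not listed in any rule.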
Part04-Monitoring and Alerting
4.1 Production Monitoring Configuration
[root@k8s-master ~]# cat > production-alerts.yaml << 'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: production-alerts
  namespace: monitoring
spec:
  groups:
  - name: node-alerts
    rules:
    - alert: NodeHighCPU
      expr: 100 - (avg by(instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node CPU usage is high"
        description: "Node {{ $labels.instance }} CPU usage is above 80%"
    - alert: NodeHighMemory
      expr: (1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100 > 85
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Node memory usage is high"
        description: "Node {{ $labels.instance }} memory usage is above 85%"
    - alert: NodeDiskPressure
      expr: (1 - (node_filesystem_avail_bytes{fstype!="tmpfs"} / node_filesystem_size_bytes{fstype!="tmpfs"})) * 100 > 85
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Node disk space is low"
        description: "Node {{ $labels.instance }} disk usage is above 85%"
  - name: pod-alerts
    rules:
    - alert: PodCrashLooping
      expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 15 > 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Pod is restarting frequently"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is restarting frequently"
    - alert: PodNotReady
      expr: kube_pod_status_phase{phase=~"Pending|Unknown"} > 0
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod is not ready"
        description: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is in an abnormal state"
EOF
[root@k8s-master ~]# kubectl apply -f production-alerts.yaml
prometheusrule.monitoring.coreos.com/production-alerts created
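The NodeHighMemory expression is the same arithmetic that can be done by hand on a node: node_exporter derives node_memory_MemAvailable_bytes and node_memory_MemTotal_bytes from /proc/meminfo. As a rough cross-check when an alert fires, the sketch below computes (1 - MemAvailable / MemTotal) * 100 locally. The function name mem_used_percent is illustrative.

```shell
#!/bin/sh
# Print memory usage in percent, computed the same way as the alert expression.
# $1: path to a meminfo-style file (defaults to /proc/meminfo).
mem_used_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%.1f\n", (1 - avail / total) * 100 }' \
    "${1:-/proc/meminfo}"
}

# Example: mem_used_percent        # on a node, compare the result with the 85% threshold
```

The value from a single reading can differ slightly from what Prometheus shows, because the rule also requires the condition to hold for 5 minutes before firing.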
# Configure Alertmanager
[root@k8s-master ~]# cat > alertmanager-config.yaml << 'EOF'
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: monitoring
stringData:
  alertmanager.yaml: |
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.fgedu.net.cn:25'
      smtp_from: 'alert@fgedu.net.cn'
      smtp_auth_username: 'alert@fgedu.net.cn'
      smtp_auth_password: 'password123'
    route:
      group_by: ['alertname', 'namespace']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 12h
      receiver: 'default-receiver'
      routes:
      - match:
          severity: critical
        receiver: 'critical-receiver'
      - match:
          severity: warning
        receiver: 'warning-receiver'
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'admin@fgedu.net.cn'
        send_resolved: true
    - name: 'critical-receiver'
      email_configs:
      - to: 'admin@fgedu.net.cn,critical@fgedu.net.cn'
        send_resolved: true
      webhook_configs:
      - url: 'http://webhook.fgedu.net.cn/alert'
    - name: 'warning-receiver'
      email_configs:
      - to: 'admin@fgedu.net.cn'
        send_resolved: true
EOF
[root@k8s-master ~]# kubectl apply -f alertmanager-config.yaml
secret/alertmanager-config created
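The route tree in this configuration is evaluated top down: an alert goes to the first child route whose matcher fits its labels, and to the top-level receiver if none does. The following sketch restates that decision as a shell function so the expected receiver for a given severity label can be reasoned about quickly. It is a model of this specific configuration, not an Alertmanager tool, and the function name route_receiver is illustrative.

```shell
#!/bin/sh
# Model of the route tree above: the first matching child route wins,
# and anything unmatched falls through to the top-level receiver.
# $1: value of the alert's "severity" label (may be empty).
route_receiver() {
  case "$1" in
    critical) echo "critical-receiver" ;;
    warning)  echo "warning-receiver" ;;
    *)        echo "default-receiver" ;;
  esac
}

for severity in critical warning info; do
  echo "severity=$severity -> $(route_receiver "$severity")"
done
```

In the rules defined earlier, NodeDiskPressure and PodCrashLooping carry severity critical and therefore reach both the e-mail recipients and the webhook of critical-receiver.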
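- Deploy a highly available control plane
- Configure load balancers
- Apply security hardening measures
- Set up comprehensive monitoring and alerting
- Define a backup and recovery strategy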
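The last item in the list is not walked through in this document. As a starting point, a minimal sketch of an etcd snapshot on a kubeadm control-plane node follows. It assumes etcdctl is installed on the node, that etcd listens on https://127.0.0.1:2379, and that the certificates sit in the kubeadm default directory /etc/kubernetes/pki/etcd. The function name backup_etcd and the directory /var/backups/etcd are hypothetical.

```shell
#!/bin/sh
# Save a timestamped etcd snapshot and print its path on success.
# $1: target directory (assumed default: /var/backups/etcd).
backup_etcd() {
  backup_dir=${1:-/var/backups/etcd}
  snapshot="$backup_dir/etcd-$(date +%Y%m%d-%H%M%S).db"
  mkdir -p "$backup_dir" || return 1
  # Certificate paths are the kubeadm defaults for a local etcd member.
  ETCDCTL_API=3 etcdctl \
    --endpoints=https://127.0.0.1:2379 \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    --cert=/etc/kubernetes/pki/etcd/server.crt \
    --key=/etc/kubernetes/pki/etcd/server.key \
    snapshot save "$snapshot" && echo "$snapshot"
}

# Example: backup_etcd /var/backups/etcd
```

A snapshot only covers cluster state. The contents of /etc/kubernetes/pki and application data on persistent volumes need their own backups, and a restore should be rehearsed before it is needed.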
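This article was compiled and published by Fengge Tutorials (风哥教程) and is intended for learning and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html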
