Kubernetes Tutorial FG078: Big Data on Kubernetes in Practice
Overview
This article explains how to deploy and manage big data workloads on Kubernetes. It draws on the big-data-related parts of the official Kubernetes documentation, combined with real production scenarios, to walk through deploying and using big data frameworks on Kubernetes.
Outline
Part 01: Basic Concepts and Theory
1.1 Big Data Overview
Big data refers to datasets too large or complex for traditional database systems to handle. It is commonly characterized by:
- Volume: large data sizes, from terabytes to petabytes
- Velocity: data is generated quickly, often with real-time processing requirements
- Variety: diverse data types, including structured, semi-structured, and unstructured data
- Veracity: data quality and reliability
1.2 Advantages of Running Big Data on Kubernetes
Advantages of running big data workloads on Kubernetes include:
- Elastic scaling: resources adjust automatically to workload demand
- Resource isolation: isolated environments for different big data jobs
- Standardized deployment: containerization provides consistent deployment environments
- Workflow management: native Kubernetes workload controllers manage big data jobs
- High availability: big data services can be kept highly available
- Cost optimization: resources are allocated on demand, reducing cost
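The elastic-scaling point above can be sketched with a HorizontalPodAutoscaler. The target Deployment name, namespace, and thresholds below are illustrative assumptions, not values mandated by this tutorial:

```shell
# Sketch only: an HPA for a hypothetical spark-worker Deployment.
# Name, namespace, and thresholds are illustrative assumptions.
cat > /tmp/spark-worker-hpa.yaml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: spark-worker
  namespace: fgedu-bigdata
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spark-worker
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
EOF
# Review, then apply with: kubectl apply -f /tmp/spark-worker-hpa.yaml
```

Scaling on CPU utilization suits stateless workers; stateful components such as HDFS or Kafka need more careful, usually manual, scaling.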
Part 02: Production Planning and Recommendations
2.1 Big Data Workload Planning
- Job types:
  - Batch processing: Hadoop MapReduce, Spark batch jobs
  - Stream processing: Kafka, Spark Streaming, Flink
  - Interactive queries: Hive, Presto, Impala
  - Machine learning: Spark MLlib, H2O.ai
- Resource requirements:
  - CPU: for compute-intensive jobs
  - Memory: large amounts of memory for holding data and running computations
  - Storage: high-capacity storage for the data itself
  - Network: high bandwidth for moving data
- Deployment strategy:
  - State management: use StatefulSets for stateful services
  - Storage management: use PersistentVolumes and StorageClasses
  - Service discovery: use Services and Ingress
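On the service-discovery point: StatefulSets rely on a headless Service (`clusterIP: None`) to give each pod a stable DNS name such as `zookeeper-0.zookeeper.<namespace>.svc.cluster.local`. A minimal sketch, with illustrative names:

```shell
# Sketch: a headless Service for a hypothetical "zookeeper" StatefulSet.
cat > /tmp/zookeeper-headless.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: fgedu-bigdata
spec:
  clusterIP: None        # headless: DNS resolves to the individual pod addresses
  selector:
    app: zookeeper
  ports:
  - name: client
    port: 2181
EOF
# Apply with: kubectl apply -f /tmp/zookeeper-headless.yaml
```

A regular ClusterIP Service load-balances across pods; the headless form is what makes the per-pod addresses used later in this tutorial resolvable.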
2.2 Resource Configuration Recommendations
- CPU:
  - Hadoop: 8-32 cores
  - Spark: 16-64 cores
  - Kafka: 8-16 cores
- Memory:
  - Hadoop: 32-256 GB
  - Spark: 64-512 GB
  - Kafka: 16-64 GB
- Storage:
  - HDFS: 1 TB or more
  - Kafka: 500 GB or more
  - Use persistent storage
- Network:
  - Bandwidth: 10 Gbps or more
  - Latency: use a low-latency network
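The ranges above translate into per-container requests and limits. A sketch for a Spark worker container, using mid-range values picked from the recommendations (illustrative only; tune to your node sizes):

```shell
# Sketch: container resources for a Spark worker, with mid-range values
# from the recommendations above (illustrative assumptions, not benchmarks).
cat > /tmp/spark-worker-resources.yaml << 'EOF'
resources:
  requests:
    cpu: "16"
    memory: 64Gi
  limits:
    cpu: "32"
    memory: 128Gi
EOF
```

Setting requests near the typical working set and limits near the recommended ceiling lets the scheduler pack nodes while still capping runaway jobs.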
Part 03: Production Implementation
3.1 Hadoop Deployment
Deploy the Hadoop cluster:
Create HDFS storage
# Create the HDFS storage configuration
[root@fgedu-master ~]# cat > hdfs-storage.yaml << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs-pvc
  namespace: fgedu-bigdata
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi
  storageClassName: standard
EOF
# Apply the storage configuration
[root@fgedu-master ~]# kubectl apply -f hdfs-storage.yaml
Deploy the Hadoop NameNode
# Create the NameNode configuration
[root@fgedu-master ~]# cat > hadoop-namenode.yaml << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-namenode
  namespace: fgedu-bigdata
spec:
  serviceName: hadoop-namenode
  replicas: 1
  selector:
    matchLabels:
      app: hadoop-namenode
  template:
    metadata:
      labels:
        app: hadoop-namenode
    spec:
      containers:
      - name: hadoop-namenode
        image: hadoop:3.3.1
        # Note: this formats HDFS on every container start, which wipes
        # existing metadata; guard the -format step in production.
        command: ["/bin/bash", "-c", "hdfs namenode -format && hdfs namenode"]
        ports:
        - containerPort: 9000
        - containerPort: 9870
        volumeMounts:
        - name: hdfs-volume
          mountPath: /hdfs
  volumeClaimTemplates:
  - metadata:
      name: hdfs-volume
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: hadoop-namenode
  namespace: fgedu-bigdata
spec:
  selector:
    app: hadoop-namenode
  ports:
  - port: 9000
    targetPort: 9000
  - port: 9870
    targetPort: 9870
  type: ClusterIP
EOF
# Apply the NameNode configuration
[root@fgedu-master ~]# kubectl apply -f hadoop-namenode.yaml
Deploy the Hadoop DataNodes
# Create the DataNode configuration
[root@fgedu-master ~]# cat > hadoop-datanode.yaml << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-datanode
  namespace: fgedu-bigdata
spec:
  serviceName: hadoop-datanode
  replicas: 3
  selector:
    matchLabels:
      app: hadoop-datanode
  template:
    metadata:
      labels:
        app: hadoop-datanode
    spec:
      containers:
      - name: hadoop-datanode
        image: hadoop:3.3.1
        command: ["/bin/bash", "-c", "hdfs datanode"]
        ports:
        - containerPort: 9864
        volumeMounts:
        - name: hdfs-volume
          mountPath: /hdfs
  volumeClaimTemplates:
  - metadata:
      name: hdfs-volume
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: hadoop-datanode
  namespace: fgedu-bigdata
spec:
  selector:
    app: hadoop-datanode
  ports:
  - port: 9864
    targetPort: 9864
  type: ClusterIP
EOF
# Apply the DataNode configuration
[root@fgedu-master ~]# kubectl apply -f hadoop-datanode.yaml
3.2 Spark Deployment
Deploy the Spark cluster:
Deploy the Spark Master
# Create the Spark Master configuration
[root@fgedu-master ~]# cat > spark-master.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-master
  namespace: fgedu-bigdata
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-master
  template:
    metadata:
      labels:
        app: spark-master
    spec:
      containers:
      - name: spark-master
        image: spark:3.2.1
        command: ["/bin/bash", "-c", "spark-class org.apache.spark.deploy.master.Master"]
        ports:
        - containerPort: 7077
        - containerPort: 8080
        resources:
          requests:
            cpu: 4
            memory: 8Gi
          limits:
            cpu: 8
            memory: 16Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-master
  namespace: fgedu-bigdata
spec:
  selector:
    app: spark-master
  ports:
  - port: 7077
    targetPort: 7077
  - port: 8080
    targetPort: 8080
  type: ClusterIP
EOF
# Apply the Spark Master configuration
[root@fgedu-master ~]# kubectl apply -f spark-master.yaml
Deploy the Spark Workers
# Create the Spark Worker configuration
[root@fgedu-master ~]# cat > spark-worker.yaml << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
  namespace: fgedu-bigdata
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spark-worker
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
      - name: spark-worker
        image: spark:3.2.1
        command: ["/bin/bash", "-c", "spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077"]
        ports:
        - containerPort: 8081
        resources:
          requests:
            cpu: 8
            memory: 16Gi
          limits:
            cpu: 16
            memory: 32Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-worker
  namespace: fgedu-bigdata
spec:
  selector:
    app: spark-worker
  ports:
  - port: 8081
    targetPort: 8081
  type: ClusterIP
EOF
# Apply the Spark Worker configuration
[root@fgedu-master ~]# kubectl apply -f spark-worker.yaml
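A quick sanity check of the aggregate capacity implied by the Worker Deployment above (3 replicas with limits of 16 CPU / 32 Gi each), useful when sizing jobs submitted to this cluster:

```shell
# Aggregate capacity implied by the spark-worker Deployment above.
WORKERS=3
CPU_LIMIT=16         # cores per worker (limit)
MEM_LIMIT_GI=32      # GiB per worker (limit)
echo "total cores: $((WORKERS * CPU_LIMIT))"        # -> total cores: 48
echo "total memory: $((WORKERS * MEM_LIMIT_GI))Gi"  # -> total memory: 96Gi
```

A job requesting more executor cores or memory than these totals will simply queue, so it is worth checking before tuning spark-submit flags.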
3.3 Kafka Deployment
Deploy the Kafka cluster:
Deploy ZooKeeper
# Create the ZooKeeper configuration
[root@fgedu-master ~]# cat > zookeeper.yaml << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  namespace: fgedu-bigdata
spec:
  serviceName: zookeeper
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.6.3
        ports:
        - containerPort: 2181
        env:
        - name: ZOO_MY_ID
          # Note: this resolves to the pod name (e.g. zookeeper-0), while
          # ZooKeeper expects a small numeric id; derive the ordinal in a
          # startup script for production use.
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ZOO_SERVERS
          value: server.0=zookeeper-0.zookeeper.fgedu-bigdata.svc.cluster.local:2888:3888 server.1=zookeeper-1.zookeeper.fgedu-bigdata.svc.cluster.local:2888:3888 server.2=zookeeper-2.zookeeper.fgedu-bigdata.svc.cluster.local:2888:3888
        volumeMounts:
        - name: zookeeper-data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: zookeeper-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: fgedu-bigdata
spec:
  selector:
    app: zookeeper
  ports:
  - port: 2181
    targetPort: 2181
  type: ClusterIP
EOF
# Apply the ZooKeeper configuration
[root@fgedu-master ~]# kubectl apply -f zookeeper.yaml
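Deriving `ZOO_MY_ID` directly from `metadata.name` yields a string like `zookeeper-0`, whereas ZooKeeper expects a small integer. A common workaround is to extract the StatefulSet ordinal from the pod hostname in a startup or init script. A sketch, assuming the standard `<statefulset-name>-<ordinal>` naming convention:

```shell
# Sketch: derive a numeric server id from a StatefulSet pod name.
# Assumes pods follow the <name>-<ordinal> convention (e.g. zookeeper-2).
POD_NAME="zookeeper-2"        # inside a pod this would be "$(hostname)"
ORDINAL="${POD_NAME##*-}"     # strip everything up to the last '-'
MYID="$((ORDINAL + 1))"       # ZooKeeper server ids are conventionally 1-based
echo "$MYID"                  # -> 3
```

The same trick applies to `KAFKA_BROKER_ID` below; without it, both services receive a non-numeric id.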
Deploy Kafka
# Create the Kafka configuration
[root@fgedu-master ~]# cat > kafka.yaml << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: fgedu-bigdata
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: kafka:2.8.1
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper.fgedu-bigdata.svc.cluster.local:2181
        - name: KAFKA_ADVERTISED_LISTENERS
          # Note: in practice each broker should advertise only its own
          # address, usually derived from the pod name.
          value: PLAINTEXT://kafka-0.kafka.fgedu-bigdata.svc.cluster.local:9092,PLAINTEXT://kafka-1.kafka.fgedu-bigdata.svc.cluster.local:9092,PLAINTEXT://kafka-2.kafka.fgedu-bigdata.svc.cluster.local:9092
        - name: KAFKA_BROKER_ID
          # Note: resolves to the pod name (e.g. kafka-0); Kafka expects a
          # numeric broker id.
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: kafka-data
          mountPath: /var/lib/kafka
  volumeClaimTemplates:
  - metadata:
      name: kafka-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: fgedu-bigdata
spec:
  selector:
    app: kafka
  ports:
  - port: 9092
    targetPort: 9092
  type: ClusterIP
EOF
# Apply the Kafka configuration
[root@fgedu-master ~]# kubectl apply -f kafka.yaml
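Once the brokers are up, a throwaway Job can smoke-test the cluster by creating a replicated topic. This is a sketch: whether `kafka-topics.sh` is on the PATH depends entirely on the image used, so treat the command as an assumption to verify against your image:

```shell
# Sketch: smoke-test Job that creates a replicated test topic.
# Assumes the image ships kafka-topics.sh on the PATH (image-dependent).
cat > /tmp/kafka-smoke-test.yaml << 'EOF'
apiVersion: batch/v1
kind: Job
metadata:
  name: kafka-smoke-test
  namespace: fgedu-bigdata
spec:
  backoffLimit: 2
  template:
    spec:
      containers:
      - name: smoke-test
        image: kafka:2.8.1
        command: ["/bin/bash", "-c",
          "kafka-topics.sh --bootstrap-server kafka:9092 --create --topic smoke-test --partitions 3 --replication-factor 3"]
      restartPolicy: Never
EOF
# Apply with: kubectl apply -f /tmp/kafka-smoke-test.yaml
```

A replication factor of 3 only succeeds when all three brokers have registered, which is exactly what makes it a useful health check here.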
Part 04: Production Case Studies
4.1 Enterprise Big Data Platform Deployment
An enterprise needs to deploy an enterprise-grade big data platform for data processing and analytics.
Case Background
- Platform requirements:
  - Support batch and stream processing
  - Provide data storage and management
  - Support interactive queries
  - Integrate monitoring and logging
- Technology stack:
  - Hadoop
  - Spark
  - Kafka
  - Hive
  - Presto
Deployment Plan
# 1. Prepare the environment
# Create the namespace
kubectl create namespace fgedu-bigdata-platform
# Create storage
kubectl apply -f - << EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hdfs-pvc
  namespace: fgedu-bigdata-platform
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 500Gi
  storageClassName: standard
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: kafka-pvc
  namespace: fgedu-bigdata-platform
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi
  storageClassName: standard
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: hive-pvc
  namespace: fgedu-bigdata-platform
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 200Gi
  storageClassName: standard
EOF

# 2. Deploy ZooKeeper
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
  namespace: fgedu-bigdata-platform
spec:
  serviceName: zookeeper
  replicas: 3
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
      - name: zookeeper
        image: zookeeper:3.6.3
        ports:
        - containerPort: 2181
        env:
        - name: ZOO_MY_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: ZOO_SERVERS
          value: server.0=zookeeper-0.zookeeper.fgedu-bigdata-platform.svc.cluster.local:2888:3888 server.1=zookeeper-1.zookeeper.fgedu-bigdata-platform.svc.cluster.local:2888:3888 server.2=zookeeper-2.zookeeper.fgedu-bigdata-platform.svc.cluster.local:2888:3888
        volumeMounts:
        - name: zookeeper-data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: zookeeper-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 50Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: zookeeper
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: zookeeper
  ports:
  - port: 2181
    targetPort: 2181
  type: ClusterIP
EOF

# 3. Deploy Kafka
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
  namespace: fgedu-bigdata-platform
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
      - name: kafka
        image: kafka:2.8.1
        ports:
        - containerPort: 9092
        env:
        - name: KAFKA_ZOOKEEPER_CONNECT
          value: zookeeper.fgedu-bigdata-platform.svc.cluster.local:2181
        - name: KAFKA_ADVERTISED_LISTENERS
          value: PLAINTEXT://kafka-0.kafka.fgedu-bigdata-platform.svc.cluster.local:9092,PLAINTEXT://kafka-1.kafka.fgedu-bigdata-platform.svc.cluster.local:9092,PLAINTEXT://kafka-2.kafka.fgedu-bigdata-platform.svc.cluster.local:9092
        - name: KAFKA_BROKER_ID
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: kafka-data
          mountPath: /var/lib/kafka
  volumeClaimTemplates:
  - metadata:
      name: kafka-data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: kafka
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: kafka
  ports:
  - port: 9092
    targetPort: 9092
  type: ClusterIP
EOF

# 4. Deploy Hadoop
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-namenode
  namespace: fgedu-bigdata-platform
spec:
  serviceName: hadoop-namenode
  replicas: 1
  selector:
    matchLabels:
      app: hadoop-namenode
  template:
    metadata:
      labels:
        app: hadoop-namenode
    spec:
      containers:
      - name: hadoop-namenode
        image: hadoop:3.3.1
        command: ["/bin/bash", "-c", "hdfs namenode -format && hdfs namenode"]
        ports:
        - containerPort: 9000
        - containerPort: 9870
        volumeMounts:
        - name: hdfs-volume
          mountPath: /hdfs
  volumeClaimTemplates:
  - metadata:
      name: hdfs-volume
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 100Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: hadoop-namenode
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: hadoop-namenode
  ports:
  - port: 9000
    targetPort: 9000
  - port: 9870
    targetPort: 9870
  type: ClusterIP
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: hadoop-datanode
  namespace: fgedu-bigdata-platform
spec:
  serviceName: hadoop-datanode
  replicas: 3
  selector:
    matchLabels:
      app: hadoop-datanode
  template:
    metadata:
      labels:
        app: hadoop-datanode
    spec:
      containers:
      - name: hadoop-datanode
        image: hadoop:3.3.1
        command: ["/bin/bash", "-c", "hdfs datanode"]
        ports:
        - containerPort: 9864
        volumeMounts:
        - name: hdfs-volume
          mountPath: /hdfs
  volumeClaimTemplates:
  - metadata:
      name: hdfs-volume
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 200Gi
      storageClassName: standard
---
apiVersion: v1
kind: Service
metadata:
  name: hadoop-datanode
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: hadoop-datanode
  ports:
  - port: 9864
    targetPort: 9864
  type: ClusterIP
EOF

# 5. Deploy Spark
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-master
  namespace: fgedu-bigdata-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: spark-master
  template:
    metadata:
      labels:
        app: spark-master
    spec:
      containers:
      - name: spark-master
        image: spark:3.2.1
        command: ["/bin/bash", "-c", "spark-class org.apache.spark.deploy.master.Master"]
        ports:
        - containerPort: 7077
        - containerPort: 8080
        resources:
          requests:
            cpu: 4
            memory: 8Gi
          limits:
            cpu: 8
            memory: 16Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-master
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: spark-master
  ports:
  - port: 7077
    targetPort: 7077
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: spark-worker
  namespace: fgedu-bigdata-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: spark-worker
  template:
    metadata:
      labels:
        app: spark-worker
    spec:
      containers:
      - name: spark-worker
        image: spark:3.2.1
        command: ["/bin/bash", "-c", "spark-class org.apache.spark.deploy.worker.Worker spark://spark-master:7077"]
        ports:
        - containerPort: 8081
        resources:
          requests:
            cpu: 8
            memory: 16Gi
          limits:
            cpu: 16
            memory: 32Gi
---
apiVersion: v1
kind: Service
metadata:
  name: spark-worker
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: spark-worker
  ports:
  - port: 8081
    targetPort: 8081
  type: ClusterIP
EOF

# 6. Deploy Hive
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-metastore
  namespace: fgedu-bigdata-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-metastore
  template:
    metadata:
      labels:
        app: hive-metastore
    spec:
      containers:
      - name: hive-metastore
        image: hive:3.1.2
        command: ["/bin/bash", "-c", "hive --service metastore"]
        ports:
        - containerPort: 9083
        volumeMounts:
        - name: hive-volume
          mountPath: /hive
      volumes:
      - name: hive-volume
        persistentVolumeClaim:
          claimName: hive-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: hive-metastore
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: hive-metastore
  ports:
  - port: 9083
    targetPort: 9083
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hive-server2
  namespace: fgedu-bigdata-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hive-server2
  template:
    metadata:
      labels:
        app: hive-server2
    spec:
      containers:
      - name: hive-server2
        image: hive:3.1.2
        command: ["/bin/bash", "-c", "hive --service hiveserver2"]
        ports:
        - containerPort: 10000
        volumeMounts:
        - name: hive-volume
          mountPath: /hive
      volumes:
      - name: hive-volume
        persistentVolumeClaim:
          claimName: hive-pvc
---
apiVersion: v1
kind: Service
metadata:
  name: hive-server2
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: hive-server2
  ports:
  - port: 10000
    targetPort: 10000
  type: ClusterIP
EOF

# 7. Deploy Presto
kubectl apply -f - << EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: presto-coordinator
  namespace: fgedu-bigdata-platform
spec:
  replicas: 1
  selector:
    matchLabels:
      app: presto-coordinator
  template:
    metadata:
      labels:
        app: presto-coordinator
    spec:
      containers:
      - name: presto-coordinator
        image: presto:0.267
        command: ["/bin/bash", "-c", "presto-server run"]
        ports:
        - containerPort: 8080
        resources:
          requests:
            cpu: 4
            memory: 8Gi
          limits:
            cpu: 8
            memory: 16Gi
---
apiVersion: v1
kind: Service
metadata:
  name: presto-coordinator
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: presto-coordinator
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: presto-worker
  namespace: fgedu-bigdata-platform
spec:
  replicas: 3
  selector:
    matchLabels:
      app: presto-worker
  template:
    metadata:
      labels:
        app: presto-worker
    spec:
      containers:
      - name: presto-worker
        image: presto:0.267
        command: ["/bin/bash", "-c", "presto-server run"]
        resources:
          requests:
            cpu: 8
            memory: 16Gi
          limits:
            cpu: 16
            memory: 32Gi
---
apiVersion: v1
kind: Service
metadata:
  name: presto-worker
  namespace: fgedu-bigdata-platform
spec:
  selector:
    app: presto-worker
  ports:
  - port: 8080
    targetPort: 8080
  type: ClusterIP
EOF

# 8. Deploy monitoring and logging
# Install Prometheus and Grafana
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus prometheus-community/kube-prometheus-stack --namespace monitoring --create-namespace
# Install the ELK stack
helm repo add elastic https://helm.elastic.co
helm install elasticsearch elastic/elasticsearch --namespace logging --create-namespace
helm install kibana elastic/kibana --namespace logging
helm install filebeat elastic/filebeat --namespace logging

# 9. Verify the deployment
kubectl get pods -n fgedu-bigdata-platform
kubectl get services -n fgedu-bigdata-platform
4.2 Big Data Processing in Practice
An enterprise needs to process a massive volume of data and uses Spark for the processing.
Case background
- Data volume: 1TB
- Processing tasks: data cleansing, transformation, and analysis
- Technology stack: Spark, HDFS, Kafka
- Hardware: 10-node cluster
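Before deploying, a quick capacity check for this case is useful. Assuming HDFS defaults of a 128 MB block size and a replication factor of 3 (assumptions, not stated in the case), the 1 TB input splits into roughly 8192 blocks, which also bounds the parallelism of the Spark input stage, and consumes about 3 TB of raw HDFS capacity across the 10 nodes:

```python
# Back-of-envelope sizing for the 1 TB case study.
# Assumptions (not from the case itself): HDFS default 128 MB block
# size and the default replication factor of 3.
DATA_TB = 1
BLOCK_MB = 128
REPLICATION = 3
NODES = 10

total_mb = DATA_TB * 1024 * 1024     # 1 TB expressed in MB
blocks = total_mb // BLOCK_MB        # ~number of HDFS blocks / input splits
raw_tb = DATA_TB * REPLICATION       # raw HDFS capacity consumed
per_node_gb = raw_tb * 1024 / NODES  # average raw usage per node

print(blocks)       # 8192
print(raw_tb)       # 3
print(per_node_gb)  # 307.2
```

So each node needs roughly 300 GB of HDFS capacity for this dataset alone, which fits comfortably within the 1 TB-per-node HDFS sizing recommended in Part02.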
Deployment plan
# 1. Prepare the data
[root@fgedu-master ~]# kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: data-ingest
  namespace: fgedu-bigdata
spec:
  template:
    spec:
      containers:
      - name: data-ingest
        image: hadoop:3.3.1
        command: ["/bin/bash", "-c", "hdfs dfs -mkdir -p /data && wget -O /tmp/data.csv https://example.com/data.csv && hdfs dfs -put /tmp/data.csv /data/"]
        volumeMounts:
        - name: hdfs-volume
          mountPath: /hdfs
      volumes:
      - name: hdfs-volume
        persistentVolumeClaim:
          claimName: hdfs-pvc
      restartPolicy: OnFailure
EOF
# 2. Submit the Spark job
[root@fgedu-master ~]# kubectl apply -f - << EOF
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-job
  namespace: fgedu-bigdata
spec:
  template:
    spec:
      containers:
      - name: spark-job
        image: spark:3.2.1
        command: ["/bin/bash", "-c", "spark-submit --master spark://spark-master:7077 --class com.fgedu.data.ProcessingJob /app/processing.jar"]
        resources:
          requests:
            cpu: 8
            memory: 16Gi
          limits:
            cpu: 16
            memory: 32Gi
      restartPolicy: OnFailure
EOF
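The com.fgedu.data.ProcessingJob jar is not shown in this tutorial. Purely as an illustration, the clean → transform → analyze pipeline such a job performs can be sketched in plain Python on a tiny in-memory sample; the record layout and column semantics (id, amount) are invented for the sketch and stand in for Spark DataFrame operations:

```python
# Hypothetical sketch of the clean / transform / analyze steps a job like
# com.fgedu.data.ProcessingJob might perform. Plain-Python stand-in for
# Spark operations; the columns (id, amount) are invented.
csv_rows = [
    "1,100.0",
    "2,",        # missing amount  -> dropped by cleaning
    "3,250.5",
    "bad,42.0",  # non-numeric id  -> dropped by cleaning
]

def clean(rows):
    """Keep only rows with a numeric id and a non-empty amount."""
    out = []
    for row in rows:
        id_, _, amount = row.partition(",")
        if id_.isdigit() and amount:
            out.append((int(id_), float(amount)))
    return out

def transform(records):
    """Example transformation: convert amounts to integer cents."""
    return [(id_, round(amount * 100)) for id_, amount in records]

def analyze(records):
    """Example analysis: row count, total, and average in cents."""
    total = sum(cents for _, cents in records)
    return {"rows": len(records), "total_cents": total,
            "avg_cents": total / len(records)}

result = analyze(transform(clean(csv_rows)))
print(result)  # {'rows': 2, 'total_cents': 35050, 'avg_cents': 17525.0}
```

In the real job each step would be a distributed DataFrame operation over the 1 TB dataset in HDFS rather than a Python loop, but the staged structure is the same.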
# 3. Check the job status
[root@fgedu-master ~]# kubectl get pods -n fgedu-bigdata
NAME                READY   STATUS    RESTARTS   AGE
data-ingest-2k8x9   1/1     Running   0          5m
spark-job-5k9x8     1/1     Running   0          2m
# 4. View the job logs
[root@fgedu-master ~]# kubectl logs -f spark-job-5k9x8 -n fgedu-bigdata
2023-10-01 10:00:00,000 - INFO - Starting Spark job...
2023-10-01 10:00:01,000 - INFO - Loading data from hdfs://hadoop-namenode:9000/data/data.csv
2023-10-01 10:00:30,000 - INFO - Data loaded successfully
2023-10-01 10:00:30,000 - INFO - Cleaning data...
2023-10-01 10:01:00,000 - INFO - Data cleaned successfully
2023-10-01 10:01:00,000 - INFO - Transforming data...
2023-10-01 10:01:30,000 - INFO - Data transformed successfully
2023-10-01 10:01:30,000 - INFO - Analyzing data...
2023-10-01 10:02:00,000 - INFO - Data analyzed successfully
2023-10-01 10:02:00,000 - INFO - Saving results to hdfs://hadoop-namenode:9000/results/
2023-10-01 10:02:30,000 - INFO - Results saved successfully
2023-10-01 10:02:30,000 - INFO - Spark job completed
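Timestamps like these let you read off how long each stage took, which is handy when tuning resource requests. A small sketch (a few of the log lines are hard-coded here, and the `%Y-%m-%d %H:%M:%S,%f` timestamp format is assumed from the sample output above):

```python
from datetime import datetime

# Compute per-stage durations from the Spark job log shown above.
# Each "start" line is followed by its matching "done" line.
log_lines = [
    "2023-10-01 10:00:01,000 - INFO - Loading data",
    "2023-10-01 10:00:30,000 - INFO - Data loaded successfully",
    "2023-10-01 10:00:30,000 - INFO - Cleaning data",
    "2023-10-01 10:01:00,000 - INFO - Data cleaned successfully",
    "2023-10-01 10:01:00,000 - INFO - Transforming data",
    "2023-10-01 10:01:30,000 - INFO - Data transformed successfully",
]

def parse_ts(line):
    # The first 23 characters hold the timestamp, e.g. "2023-10-01 10:00:01,000".
    return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")

durations = {}
for start, end in zip(log_lines[::2], log_lines[1::2]):
    stage = start.split(" - INFO - ")[1]
    durations[stage] = (parse_ts(end) - parse_ts(start)).total_seconds()

print(durations)
# {'Loading data': 29.0, 'Cleaning data': 30.0, 'Transforming data': 30.0}
```

Here loading, cleaning, and transforming each take roughly 30 seconds, so no single stage dominates; in a real run you would feed the full `kubectl logs` output into the same loop.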
Part05-风哥's Experience Summary and Sharing
5.1 Big Data and Kubernetes Best Practices
- Resource management:
  - Set reasonable resource requests and limits for big data workloads
  - Use node affinity to schedule big data tasks onto suitable nodes
  - Configure resource quotas and limit ranges
- Storage management:
  - Store big data on PersistentVolumes
  - Choose an appropriate storage type (HDD, SSD)
  - Configure a StorageClass to support dynamic volume provisioning
- Network management:
  - Use a high-bandwidth network
  - Configure network policies to secure communication between big data services
  - Expose services with Service and Ingress
- Monitoring and logging:
  - Monitor resource usage with Prometheus
  - Build dashboards with Grafana
  - Aggregate and analyze logs with ELK/EFK
  - Set up alerting rules to watch the state of big data services
- Security:
  - Control access to big data resources with RBAC
  - Store sensitive information (API keys, passwords) in Secrets
  - Isolate big data workloads with network policies
- High availability:
  - Run multiple replicas to keep services available
  - Manage stateful services with StatefulSets
  - Configure backup and recovery policies
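As one concrete illustration of the resource-management and security points above, a ResourceQuota plus a same-namespace-only NetworkPolicy for the fgedu-bigdata namespace might look like the sketch below. The quota figures are examples for illustration, not recommendations; tune them to your cluster.

```yaml
# Example only: caps and selectors are illustrative.
apiVersion: v1
kind: ResourceQuota
metadata:
  name: bigdata-quota
  namespace: fgedu-bigdata
spec:
  hard:
    requests.cpu: "64"
    requests.memory: 256Gi
    limits.cpu: "128"
    limits.memory: 512Gi
---
# Deny ingress by default; allow traffic only from pods in the same namespace.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: bigdata-allow-same-namespace
  namespace: fgedu-bigdata
spec:
  podSelector: {}
  policyTypes: ["Ingress"]
  ingress:
  - from:
    - podSelector: {}
```

With the quota in place, a Spark job whose aggregate requests would exceed 64 CPUs is rejected at admission instead of starving other tenants, and the policy keeps other namespaces from reaching HDFS or Kafka directly.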
5.2 常见问题与解决方案
问题 原因 解决方案
资源不足 集群资源配置不足 增加集群资源
存储不足 存储容量不足 增加存储容量
网络带宽不足 网络配置不足 升级网络带宽
作业执行失败 配置错误或数据问题 检查配置和数据
服务启动失败 依赖项错误 检查依赖项
监控数据缺失 监控配置错误 检查监控配置
日志管理困难 日志分散 使用集中式日志管理
安全漏洞 镜像存在安全漏洞 定期扫描镜像,更新基础镜像
资源不足 集群资源配置不足 增加集群资源
存储不足 存储容量不足 增加存储容量
网络带宽不足 网络配置不足 升级网络带宽
作业执行失败 配置错误或数据问题 检查配置和数据
服务启动失败 依赖项错误 检查依赖项
监控数据缺失 监控配置错误 检查监控配置
日志管理困难 日志分散 使用集中式日志管理
安全漏洞 镜像存在安全漏洞 定期扫描镜像,更新基础镜像
Compiled and published by 风哥教程 for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html
