本文档风哥主要介绍Apache Kafka架构原理与集群部署实战,包括Kafka核心概念、Kafka集群架构设计、Kafka生产环境部署、Kafka参数配置优化、Kafka集群运维管理等内容,风哥教程参考Kafka官方文档Introduction、Getting Started、Configuration等内容,适合大数据开发运维人员在学习和测试中使用,如果要应用于生产环境则需要自行确认。更多视频教程www.fgedu.net.cn
Part01-基础概念与理论知识
1.1 Kafka核心概念与架构
Apache Kafka是一个分布式流处理平台,最初由LinkedIn开发,后来贡献给Apache基金会。Kafka具有高吞吐量、低延迟、高可用性等特点,被广泛应用于消息队列、流处理、日志收集等场景。学习交流加群风哥微信: itpux-com
- Broker:Kafka服务节点,一个Kafka集群由多个Broker组成
- Topic:消息主题,消息的逻辑分类
- Partition:分区,Topic的物理分片,实现数据并行处理
- Producer:消息生产者,向Kafka发送消息的应用
- Consumer:消息消费者,从Kafka读取消息的应用
- Consumer Group:消费者组,实现消息的负载均衡和广播
- ZooKeeper:协调服务,管理Kafka集群元数据
1.2 Kafka核心组件详解
Kafka架构主要由以下组件构成:
- 消息存储:Kafka使用日志文件存储消息,支持消息持久化和顺序读写
- 分区机制:每个Topic可以分为多个Partition,分布在不同Broker上
- 副本机制:每个Partition可以有多个副本,实现数据容错
- 生产者:支持同步/异步发送,支持消息压缩和批处理
- 消费者:支持消费者组模式,实现消息的负载均衡消费
/bigdata/kafka-logs/
├── fgedu-topic-0/ # Topic分区目录
│ ├── 00000000000000000000.log # 日志文件
│ ├── 00000000000000000000.index # 索引文件
│ ├── 00000000000000000000.timeindex # 时间索引
│ └── leader-epoch-checkpoint # Leader epoch检查点
├── fgedu-topic-1/
└── fgedu-topic-2/
1.3 Kafka核心特性与优势
Kafka具有以下核心特性:
- 高吞吐量:单节点每秒可处理百万级消息
- 低延迟:消息延迟可控制在毫秒级别
- 高可用性:支持多副本机制,自动故障转移
- 持久化存储:消息持久化到磁盘,支持消息回溯
- 水平扩展:支持在线扩容,无需停机
- 多客户端支持:支持Java、Python、Go等多种语言客户端
Part02-生产环境规划与建议
2.1 Kafka集群规划
Kafka集群规划需要考虑以下因素:
– 开发环境:3节点(1 Broker + 1 ZooKeeper + 1 Client)
– 测试环境:3节点(3 Broker + 3 ZooKeeper)
– 生产环境:5节点以上(3+ Broker + 3+ ZooKeeper)
# 生产环境推荐配置
– Broker节点:至少3个,推荐5个或更多
– ZooKeeper节点:3个或5个(奇数个)
– 副本因子:建议3(每个Partition有3个副本)
– 最小同步副本:建议2(min.insync.replicas=2)
# 集群节点角色规划
节点1(192.168.1.51):Broker + ZooKeeper
节点2(192.168.1.52):Broker + ZooKeeper
节点3(192.168.1.53):Broker + ZooKeeper
节点4(192.168.1.54):Broker
节点5(192.168.1.55):Broker
2.2 Kafka硬件资源要求
Kafka硬件资源规划建议:
– 开发环境:4核
– 测试环境:8核
– 生产环境:16核以上
# 内存要求
– 开发环境:8GB
– 测试环境:16GB
– 生产环境:32GB-128GB
– JVM堆内存:建议6-8GB,不超过32GB
# 磁盘要求
– 类型:SSD或NVMe SSD
– 容量:根据数据保留策略计算
– 建议:独立磁盘用于Kafka日志
– RAID:推荐RAID10或JBOD
# 网络要求
– 带宽:千兆网卡起步,推荐万兆
– 延迟:内网延迟小于1ms
2.3 Kafka网络与安全规划
Kafka网络与安全规划要点:
- 网络隔离:Kafka集群部署在内网,通过代理或网关对外服务
- 端口规划:Broker使用9092端口,ZooKeeper使用2181端口
- 安全认证:生产环境建议启用SASL认证和SSL加密
- 权限控制:使用ACL实现细粒度权限管理
Part03-生产环境项目实施方案
3.1 Kafka集群安装部署
3.1.1 环境准备
$ java -version
openjdk version “17.0.2” 2022-01-18 LTS
OpenJDK Runtime Environment (build 17.0.2+8-LTS-86)
OpenJDK 64-Bit Server VM (build 17.0.2+8-LTS-86, mixed mode, sharing)
# 创建Kafka用户
$ useradd -m -s /bin/bash kafka
$ id kafka
uid=1001(kafka) gid=1001(kafka) groups=1001(kafka)
# 创建安装目录
$ mkdir -p /bigdata/app/kafka
$ mkdir -p /bigdata/kafka-logs
$ mkdir -p /bigdata/zookeeper-data
$ chown -R kafka:kafka /bigdata/app/kafka
$ chown -R kafka:kafka /bigdata/kafka-logs
$ chown -R kafka:kafka /bigdata/zookeeper-data
3.1.2 下载安装Kafka
$ cd /bigdata/app
$ wget https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz
–2026-04-08 10:00:00– https://archive.apache.org/dist/kafka/3.6.1/kafka_2.13-3.6.1.tgz
Resolving archive.apache.org… 192.168.1.100
Connecting to archive.apache.org|192.168.1.100|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 105847234 (101M) [application/x-gzip]
Saving to: ‘kafka_2.13-3.6.1.tgz’
kafka_2.13-3.6.1.tgz 100%[=====================================>] 100.94M 10.2MB/s in 9.9s
2026-04-08 10:00:10 (10.2 MB/s) – ‘kafka_2.13-3.6.1.tgz’ saved [105847234/105847234]
# 解压安装
$ tar -zxvf kafka_2.13-3.6.1.tgz
$ mv kafka_2.13-3.6.1 kafka
$ chown -R kafka:kafka /bigdata/app/kafka
# 查看目录结构
$ ls -la /bigdata/app/kafka/
total 64
drwxr-xr-x 1 kafka kafka 4096 Apr 8 10:00 .
drwxr-xr-x 3 root root 4096 Apr 8 10:00 ..
drwxr-xr-x 3 kafka kafka 4096 Apr 8 10:00 bin
drwxr-xr-x 3 kafka kafka 4096 Apr 8 10:00 config
drwxr-xr-x 2 kafka kafka 4096 Apr 8 10:00 libs
-rw-r–r– 1 kafka kafka 1453 Jan 1 00:00 LICENSE
drwxr-xr-x 2 kafka kafka 4096 Apr 8 10:00 licenses
-rw-r–r– 1 kafka kafka 8762 Jan 1 00:00 NOTICE
drwxr-xr-x 2 kafka kafka 4096 Apr 8 10:00 site-docs
3.2 Kafka核心参数配置
3.2.1 ZooKeeper配置
$ cat > /bigdata/app/kafka/config/zookeeper.properties << 'EOF' # the directory where the snapshot is stored. dataDir=/bigdata/zookeeper-data # the port at which the clients will connect clientPort=2181 # disable the per-ip limit on the number of connections maxClientCnxns=0 # disable the adminserver by default admin.enableServer=false # 集群配置 server.1=192.168.1.51:2888:3888 server.2=192.168.1.52:2888:3888 server.3=192.168.1.53:2888:3888 # 自动清理快照 autopurge.snapRetainCount=3 autopurge.purgeInterval=1 EOF # 创建myid文件(每个节点不同) # 节点1 $ echo "1" > /bigdata/zookeeper-data/myid
# 节点2
$ echo “2” > /bigdata/zookeeper-data/myid
# 节点3
$ echo “3” > /bigdata/zookeeper-data/myid
3.2.2 Kafka Broker配置
$ cat > /bigdata/app/kafka/config/server.properties << 'EOF' # broker id,每个节点唯一 broker.id=1 # 监听地址 listeners=PLAINTEXT://192.168.1.51:9092 advertised.listeners=PLAINTEXT://192.168.1.51:9092 # 日志存储目录 log.dirs=/bigdata/kafka-logs # ZooKeeper连接 zookeeper.connect=192.168.1.51:2181,192.168.1.52:2181,192.168.1.53:2181 # 日志保留策略 num.partitions=3 default.replication.factor=3 min.insync.replicas=2 log.retention.hours=168 log.retention.bytes=1073741824 log.segment.bytes=1073741824 log.retention.check.interval.ms=300000 # 性能优化参数 num.network.threads=3 num.io.threads=8 socket.send.buffer.bytes=102400 socket.receive.buffer.bytes=102400 socket.request.max.bytes=104857600 # JVM参数(在启动脚本中设置) # KAFKA_HEAP_OPTS="-Xms8g -Xmx8g" # 日志清理策略 log.cleanup.policy=delete log.cleaner.enable=true EOF
3.3 ZooKeeper集群配置
3.3.1 配置ZooKeeper系统服务
$ cat > /etc/systemd/system/zookeeper.service << 'EOF' [Unit] Description=Apache ZooKeeper server Documentation=http://zookeeper.apache.org Requires=network.target remote-fs.target After=network.target remote-fs.target [Service] Type=simple User=kafka Group=kafka ExecStart=/bigdata/app/kafka/bin/zookeeper-server-start.sh /bigdata/app/kafka/config/zookeeper.properties ExecStop=/bigdata/app/kafka/bin/zookeeper-server-stop.sh Restart=on-failure RestartSec=5 [Install] WantedBy=multi-user.target EOF # 创建Kafka systemd服务 $ cat > /etc/systemd/system/kafka.service << 'EOF' [Unit] Description=Apache Kafka server Documentation=http://kafka.apache.org Requires=zookeeper.service After=zookeeper.service [Service] Type=simple User=kafka Group=kafka Environment="KAFKA_HEAP_OPTS=-Xms8g -Xmx8g" Environment="KAFKA_JVM_PERFORMANCE_OPTS=-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35" ExecStart=/bigdata/app/kafka/bin/kafka-server-start.sh /bigdata/app/kafka/config/server.properties ExecStop=/bigdata/app/kafka/bin/kafka-server-stop.sh Restart=on-failure RestartSec=5 [Install] WantedBy=multi-user.target EOF # 重载systemd $ systemctl daemon-reload
Part04-生产案例与实战讲解
4.1 Kafka集群启动与验证
4.1.1 启动ZooKeeper集群
$ systemctl start zookeeper
$ systemctl status zookeeper
● zookeeper.service – Apache ZooKeeper server
Loaded: loaded (/etc/systemd/system/zookeeper.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2026-04-08 10:15:00 CST; 5s ago
Docs: http://zookeeper.apache.org
Main PID: 12345 (java)
Tasks: 15 (limit: 4915)
Memory: 128.5M
CGroup: /system.slice/zookeeper.service
└─12345 /usr/bin/java -cp /bigdata/app/kafka/libs/*…
Apr 08 10:15:00 fgedu-kafka01 systemd[1]: Started Apache ZooKeeper server.
# 检查ZooKeeper状态
$ echo “stat” | nc localhost 2181
Zookeeper version: 3.8.3
Clients:
/192.168.1.51:45678[0](queued=0,recved=1,sent=0)
Latency min/avg/max: 0/0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x100000000
Mode: follower
Node count: 5
4.1.2 启动Kafka集群
$ systemctl start kafka
$ systemctl status kafka
● kafka.service – Apache Kafka server
Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: disabled)
Active: active (running) since Mon 2026-04-08 10:20:00 CST; 5s ago
Docs: http://kafka.apache.org
Main PID: 12456 (java)
Tasks: 65 (limit: 4915)
Memory: 2.1G
CGroup: /system.slice/kafka.service
└─12456 /usr/bin/java -Xms8g -Xmx8g -XX:+UseG1GC…
Apr 08 10:20:00 fgedu-kafka01 systemd[1]: Started Apache Kafka server.
# 检查Kafka进程
$ jps
12345 QuorumPeerMain
12456 Kafka
12567 Jps
# 设置开机自启
$ systemctl enable zookeeper
$ systemctl enable kafka
Created symlink /etc/systemd/system/multi-user.target.wants/zookeeper.service → /etc/systemd/system/zookeeper.service.
Created symlink /etc/systemd/system/multi-user.target.wants/kafka.service → /etc/systemd/system/kafka.service.
4.2 Kafka消息收发测试
4.2.1 创建Topic
$ /bigdata/app/kafka/bin/kafka-topics.sh –create \
–topic fgedu-test-topic \
–bootstrap-server 192.168.1.51:9092 \
–partitions 3 \
–replication-factor 3
Created topic fgedu-test-topic.
# 查看Topic列表
$ /bigdata/app/kafka/bin/kafka-topics.sh –list \
–bootstrap-server 192.168.1.51:9092
fgedu-test-topic
# 查看Topic详情
$ /bigdata/app/kafka/bin/kafka-topics.sh –describe \
–topic fgedu-test-topic \
–bootstrap-server 192.168.1.51:9092
Topic: fgedu-test-topic TopicId: abc123 PartitionCount: 3 ReplicationFactor: 3
Topic: fgedu-test-topic Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: fgedu-test-topic Partition: 1 Leader: 2 Replicas: 2,3,1 Isr: 2,3,1
Topic: fgedu-test-topic Partition: 2 Leader: 3 Replicas: 3,1,2 Isr: 3,1,2
4.2.2 生产消息
$ /bigdata/app/kafka/bin/kafka-console-producer.sh \
–topic fgedu-test-topic \
–bootstrap-server 192.168.1.51:9092
>hello kafka from fgedu
>this is test message 1
>this is test message 2
>welcome to bigdata world
>from bigdata视频:www.itpux.com
4.2.3 消费消息
$ /bigdata/app/kafka/bin/kafka-console-consumer.sh \
–topic fgedu-test-topic \
–from-beginning \
–bootstrap-server 192.168.1.51:9092
hello kafka from fgedu
this is test message 1
this is test message 2
welcome to bigdata world
from bigdata视频:www.itpux.com
# 查看消费者组
$ /bigdata/app/kafka/bin/kafka-consumer-groups.sh –list \
–bootstrap-server 192.168.1.51:9092
console-consumer-12345
4.3 Kafka常见问题处理
4.3.1 Broker无法启动
# 排查步骤:
# 1. 检查日志文件
$ tail -100 /bigdata/app/kafka/logs/server.log
[2026-04-08 10:25:00,000] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. (kafka.server.KafkaServer)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
# 2. 检查ZooKeeper连接
$ echo “stat” | nc localhost 2181
# 3. 检查端口占用
$ netstat -tlnp | grep 9092
# 4. 检查磁盘空间
$ df -h /bigdata/kafka-logs
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 500G 50G 450G 10% /bigdata
# 5. 解决方案
# – 确保ZooKeeper集群正常运行
# – 检查broker.id是否唯一
# – 检查listeners配置是否正确
# – 检查磁盘空间是否充足
4.3.2 消息消费延迟
$ /bigdata/app/kafka/bin/kafka-consumer-groups.sh \
–describe \
–group fgedu-consumer-group \
–bootstrap-server 192.168.1.51:9092
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
fgedu-consumer-group fgedu-test-topic 0 1000 5000 4000
fgedu-consumer-group fgedu-test-topic 1 800 4500 3700
fgedu-consumer-group fgedu-test-topic 2 1200 4800 3600
# 解决方案
# 1. 增加消费者数量
# 2. 增加分区数量
# 3. 优化消费者处理逻辑
# 4. 调整fetch.min.bytes参数
Part05-风哥经验总结与分享
5.1 Kafka生产最佳实践
Kafka生产环境最佳实践建议:
1. 至少部署3个Broker节点
2. ZooKeeper独立部署或使用3节点集群
3. 启用JMX监控
4. 配置合理的日志保留策略
# 性能优化建议
1. 使用SSD磁盘
2. 调整JVM参数,使用G1垃圾回收器
3. 合理设置分区数量
4. 启用消息压缩
# 安全配置建议
1. 启用SASL认证
2. 启用SSL加密传输
3. 配置ACL权限控制
4. 定期审计访问日志
5.2 Kafka部署检查清单
Kafka部署完成后,需要检查以下项目:
- Java版本是否正确(推荐Java 17)
- ZooKeeper集群是否正常运行
- Kafka Broker是否全部启动
- Topic创建是否成功
- 消息生产消费是否正常
- 副本同步是否正常
- 监控指标是否正常采集
- 日志文件是否有异常
5.3 Kafka运维工具推荐
Kafka运维常用工具:
- Kafka Manager:Yahoo开源的Kafka集群管理工具
- Kafka Eagle:Kafka集群监控和管理平台
- CMAK:Cluster Manager for Apache Kafka
- Burrow:Kafka消费者延迟监控工具
- JMX Exporter:JMX指标导出工具,配合Prometheus使用
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
