This document from Fenge covers Hadoop YARN deployment and resource scheduling in practice, including YARN architecture, core component roles, core features, cluster planning, configuration files, core parameter settings, service startup, cluster verification, scheduler configuration, queue management, application management, and common problem handling. The tutorial draws on the official Hadoop documentation for YARN and Cluster Setup. It is intended for big-data operations staff in learning and test environments; verify everything yourself before applying it to production.
Part01 - Basic Concepts and Theory
1.1 YARN Architecture
YARN (Yet Another Resource Negotiator) is Hadoop's resource management and job scheduling framework, responsible for managing cluster resources and scheduling applications.
- Master/worker architecture: ResourceManager (master) and NodeManager (workers)
- Resource management: unified management of cluster resources (CPU, memory)
- Job scheduling: multiple schedulers supported (FIFO, Capacity, Fair)
- Application isolation: each application runs its own ApplicationMaster
- Resource isolation: memory and CPU isolation are supported
1.2 YARN Core Component Roles
The core YARN components and their roles:
| Component | Role | Function |
|---|---|---|
| ResourceManager | Master | Global resource management, application scheduling, cluster monitoring |
| NodeManager | Worker | Per-node resource management, Container management, heartbeat reporting |
| ApplicationMaster | Per-application master | Requests resources for the application, schedules and monitors its tasks |
| Container | Resource container | Encapsulates resources and provides the task execution environment |
1.3 YARN Core Features
Core YARN features:
- Unified resource management: CPU, memory and other resources managed centrally
- Multi-tenancy: multiple users and applications share the cluster
- Resource isolation: memory and CPU isolation are supported
- Scheduling policies: multiple schedulers supported (FIFO, Capacity, Fair)
- High availability: ResourceManager HA is supported
Part02 - Production Environment Planning and Recommendations
2.1 YARN Cluster Planning
YARN cluster planning needs to account for node count, hardware configuration, and resource allocation:
## ResourceManager nodes (2)
rm1.fgedu.net.cn (192.168.1.15)
- CPU: 16 cores
- Memory: 64 GB
- Disk: 2 TB RAID1
- Roles: ResourceManager, JobHistoryServer
rm2.fgedu.net.cn (192.168.1.16)
- CPU: 16 cores
- Memory: 64 GB
- Disk: 2 TB RAID1
- Roles: ResourceManager
## NodeManager nodes (6)
dn1-dn6.fgedu.net.cn (192.168.1.12-17)
- CPU: 16 cores
- Memory: 64 GB
- Disk: 12 x 8 TB (JBOD)
- Roles: NodeManager, DataNode
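As a sizing reference (an illustrative assumption, not a value from the original plan), a 64 GB / 16-core worker that also runs a DataNode typically reserves some memory and cores for the OS and Hadoop daemons and hands the rest to YARN, for example:
yarn.nodemanager.resource.memory-mb = 51200   # 64 GB = 65536 MB, minus roughly 14 GB reserved
yarn.nodemanager.resource.cpu-vcores = 14     # 16 cores, minus roughly 2 reserved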
2.2 YARN Configuration Files
Core YARN configuration files:
## yarn-site.xml
The core YARN configuration file, including:
- yarn.resourcemanager.hostname: ResourceManager hostname
- yarn.nodemanager.aux-services: NodeManager auxiliary services
- yarn.nodemanager.resource.memory-mb: memory available to YARN on each node
- yarn.nodemanager.resource.cpu-vcores: CPU vcores available to YARN on each node
- yarn.scheduler.minimum-allocation-mb: minimum container memory allocation
- yarn.scheduler.maximum-allocation-mb: maximum container memory allocation
## mapred-site.xml
The MapReduce configuration file, including:
- mapreduce.framework.name: execution framework
- mapreduce.jobhistory.address: JobHistoryServer address
- yarn.app.mapreduce.am.resource.mb: ApplicationMaster memory
## yarn-env.sh
YARN environment variables, including:
- YARN_RESOURCEMANAGER_HEAPSIZE: ResourceManager heap size
- YARN_NODEMANAGER_HEAPSIZE: NodeManager heap size
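A minimal yarn-env.sh sketch; the heap sizes below are illustrative assumptions and should be sized to the actual nodes:
# yarn-env.sh (illustrative values)
export YARN_RESOURCEMANAGER_HEAPSIZE=8192
export YARN_NODEMANAGER_HEAPSIZE=4096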
2.3 YARN Core Parameter Configuration
Recommended settings for the core YARN parameters:
## yarn-site.xml settings
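A sketch of recommended values, sized to the 64 GB / 16-core NodeManager nodes planned in section 2.1; the concrete numbers are illustrative assumptions and should be tuned per cluster:
yarn.nodemanager.resource.memory-mb = 51200
yarn.nodemanager.resource.cpu-vcores = 14
yarn.scheduler.minimum-allocation-mb = 1024
yarn.scheduler.maximum-allocation-mb = 51200
yarn.scheduler.minimum-allocation-vcores = 1
yarn.scheduler.maximum-allocation-vcores = 14
yarn.log-aggregation-enable = true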
## mapred-site.xml settings
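A corresponding sketch for mapred-site.xml; the memory figures are illustrative assumptions, and the JobHistoryServer ports are the Hadoop defaults:
mapreduce.framework.name = yarn
mapreduce.jobhistory.address = rm1.fgedu.net.cn:10020
mapreduce.jobhistory.webapp.address = rm1.fgedu.net.cn:19888
yarn.app.mapreduce.am.resource.mb = 2048
mapreduce.map.memory.mb = 2048
mapreduce.reduce.memory.mb = 4096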
Part03 - Production Environment Implementation
3.1 Starting YARN Services
3.1.1 Create the YARN configuration files
cat > $HADOOP_HOME/etc/hadoop/yarn-site.xml << 'EOF'
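<?xml version="1.0"?>
<!-- Illustrative yarn-site.xml: hostnames follow the plan in section 2.1, and the memory/vcore
     figures are the assumed sizing from section 2.3. Adjust the values for your own cluster. -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm1.fgedu.net.cn</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>51200</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>14</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>51200</value>
  </property>
</configuration>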
EOF
# Create mapred-site.xml
cat > $HADOOP_HOME/etc/hadoop/mapred-site.xml << 'EOF'
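<?xml version="1.0"?>
<!-- Illustrative mapred-site.xml: values are assumptions consistent with sections 2.2 and 2.3;
     the JobHistoryServer ports are the Hadoop defaults. -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>rm1.fgedu.net.cn:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>rm1.fgedu.net.cn:19888</value>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.resource.mb</name>
    <value>2048</value>
  </property>
</configuration>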
EOF
3.1.2 Start the YARN services
# Start ResourceManager (run on the ResourceManager node)
yarn --daemon start resourcemanager
# Check the ResourceManager process
jps
12345 ResourceManager
12456 Jps
# Tail the ResourceManager log
tail -f /bigdata/logs/hadoop/yarn-hadoop-resourcemanager-rm1.log
2026-04-08 14:00:30,123 INFO resourcemanager.ResourceManager: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: host = rm1.fgedu.net.cn/192.168.1.15
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.3.5
************************************************************/
2026-04-08 14:00:30,456 INFO resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2026-04-08 14:00:30,789 INFO resourcemanager.ResourceManager: createResourceManager []
2026-04-08 14:00:31,012 INFO resourcemanager.ResourceManager: Transitioning to standby state
# Start NodeManager (run on every NodeManager node)
yarn --daemon start nodemanager
# Check the NodeManager process
jps
12345 NodeManager
12456 Jps
# Start the JobHistoryServer
mapred --daemon start historyserver
# Check all processes
jps
12345 ResourceManager
12456 NodeManager
12567 JobHistoryServer
12678 Jps
3.2 YARN Cluster Verification
3.2.1 Check YARN cluster status
# List YARN nodes
yarn node -list
2026-04-08 14:10:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total Nodes:6
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dn1:42657 RUNNING dn1:8042 0
dn2:42657 RUNNING dn2:8042 0
dn3:42657 RUNNING dn3:8042 0
dn4:42657 RUNNING dn4:8042 0
dn5:42657 RUNNING dn5:8042 0
dn6:42657 RUNNING dn6:8042 0
# List YARN applications
yarn application -list
2026-04-08 14:10:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
# Check YARN queue information
yarn queue -status default
2026-04-08 14:10:40,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Queue Information:
Queue Name: default
State : RUNNING
Capacity : 100.0%
Current Capacity : 0.0%
Maximum Capacity : 100.0%
Default Node Label Expression :
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
Priority : 0
Applications : 0
Allocated Containers : 0
Available Containers : 0
Reserved Containers : 0
Used Resources :
Configured Capacity : 100.0%
Configured Max Capacity : 100.0%
Configured Minimum User Limit Percent : 100 %
Configured User Limit Factor : 1.0
Current Absolute Capacity : 100.0 %
Current Absolute Maximum Capacity : 100.0 %
Used Resources :
Available Resources :
Reserved Resources :
Num Active Applications : 0
Num Pending Applications : 0
Num Containers : 0
Max Applications : 10000
Max Applications Per User : 10000
Max Application Master Resources :
Used Application Master Resources :
Max Application Master Resources Per User :
# Access the YARN web UI
# Open http://rm1:8088 in a browser
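The same cluster information can also be pulled from the ResourceManager REST API, for example:
curl -s http://rm1:8088/ws/v1/cluster/info
curl -s http://rm1:8088/ws/v1/cluster/metrics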
3.3 YARN Scheduler Configuration
3.3.1 Capacity Scheduler configuration
cat > $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml << 'EOF'
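<?xml version="1.0"?>
<!-- Illustrative capacity-scheduler.xml. The queue layout (root.default, root.prod, root.dev, root.test)
     and the prod capacities (50% capacity, 70% maximum) match the queue status shown in section 4.1;
     the remaining capacity split and the prod application limit are assumptions for this sketch. -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default,prod,dev,test</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.maximum-capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.prod.maximum-applications</name>
    <value>5000</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>20</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.test.capacity</name>
    <value>10</value>
  </property>
</configuration>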
EOF
# Refresh the queue configuration
yarn rmadmin -refreshQueues
2026-04-08 14:20:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Refresh queues successful for capacity scheduler
Part04 - Production Cases and Hands-on Practice
4.1 YARN Queue Management
4.1.1 Check queue information
# List queues
yarn queue -list
2026-04-08 14:25:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total Queues : 4
Queue Name: root
Queue Name: root.default
Queue Name: root.prod
Queue Name: root.dev
Queue Name: root.test
# Check the status of a specific queue
yarn queue -status prod
2026-04-08 14:25:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Queue Information:
Queue Name: prod
State : RUNNING
Capacity : 50.0%
Current Capacity : 0.0%
Maximum Capacity : 70.0%
Default Node Label Expression :
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
Priority : 0
Applications : 0
Allocated Containers : 0
Available Containers : 0
Reserved Containers : 0
Used Resources :
Configured Capacity : 50.0%
Configured Max Capacity : 70.0%
Configured Minimum User Limit Percent : 100 %
Configured User Limit Factor : 1.0
Current Absolute Capacity : 50.0 %
Current Absolute Maximum Capacity : 70.0 %
Used Resources :
Available Resources :
Reserved Resources :
Num Active Applications : 0
Num Pending Applications : 0
Num Containers : 0
Max Applications : 5000
Max Applications Per User : 5000
Max Application Master Resources :
Used Application Master Resources :
Max Application Master Resources Per User :
4.1.2 Submit an application to a specific queue
# Submit the pi example to the prod queue
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi -D mapreduce.job.queuename=prod 10 100
Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2026-04-08 14:30:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
2026-04-08 14:30:30,456 INFO input.FileInputFormat: Total input files to process : 10
2026-04-08 14:30:30,789 INFO mapreduce.JobSubmitter: number of splits:10
2026-04-08 14:30:31,012 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1234567890123_0001
2026-04-08 14:30:31,234 INFO impl.YarnClientImpl: Submitted application application_1234567890123_0001
2026-04-08 14:30:31,456 INFO mapreduce.Job: The url to track the job: http://rm1:8088/proxy/application_1234567890123_0001/
2026-04-08 14:30:31,678 INFO mapreduce.Job: Running job: job_1234567890123_0001
2026-04-08 14:30:40,890 INFO mapreduce.Job: Job job_1234567890123_0001 running in uber mode : false
2026-04-08 14:30:40,012 INFO mapreduce.Job: map 0% reduce 0%
2026-04-08 14:30:50,234 INFO mapreduce.Job: map 10% reduce 0%
2026-04-08 14:31:00,456 INFO mapreduce.Job: map 20% reduce 0%
2026-04-08 14:31:10,678 INFO mapreduce.Job: map 30% reduce 0%
2026-04-08 14:31:20,890 INFO mapreduce.Job: map 40% reduce 0%
2026-04-08 14:31:30,012 INFO mapreduce.Job: map 50% reduce 0%
2026-04-08 14:31:40,234 INFO mapreduce.Job: map 60% reduce 0%
2026-04-08 14:31:50,456 INFO mapreduce.Job: map 70% reduce 0%
2026-04-08 14:32:00,678 INFO mapreduce.Job: map 80% reduce 0%
2026-04-08 14:32:10,890 INFO mapreduce.Job: map 90% reduce 0%
2026-04-08 14:32:20,012 INFO mapreduce.Job: map 100% reduce 0%
2026-04-08 14:32:30,234 INFO mapreduce.Job: map 100% reduce 100%
Job Ended: 1234567890
Estimated value of Pi is 3.14800000000000000000
4.2 YARN Application Management
4.2.1 Check application status
# List running applications
yarn application -list -appStates RUNNING
2026-04-08 14:35:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total number of applications (application-types: [], states: [RUNNING] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1234567890123_0001 QuasiMonteCarlo MAPREDUCE hadoop prod RUNNING UNDEFINED 50.0% http://dn1:42657
# Check application details
yarn application -status application_1234567890123_0001
2026-04-08 14:35:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Application Report :
Application-Id : application_1234567890123_0001
Application-Name : QuasiMonteCarlo
Application-Type : MAPREDUCE
User : hadoop
Queue : prod
Application Priority : 0
Start-Time : 1234567890123
Finish-Time : 0
Progress : 50.0%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://dn1:42657
RPC Port : 42657
AM Host : dn1
Aggregate Resource Allocation : 12345678 MB-seconds, 12345 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
Unmanaged Application : false
Application Node Label Expression :
AM container Node Label Expression :
Timeout : 0
Remaining Time : -1
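# If log aggregation has been enabled (yarn.log-aggregation-enable=true; note the report above shows
# it as NOT_START in this run), the aggregated container logs can be fetched after the application finishes:
yarn logs -applicationId application_1234567890123_0001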
4.2.2 Kill an application
yarn application -kill application_1234567890123_0001
2026-04-08 14:40:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Killing application application_1234567890123_0001
2026-04-08 14:40:30,456 INFO impl.YarnClientImpl: Killed application application_1234567890123_0001
4.3 Common YARN Problems
4.3.1 ResourceManager fails to start
# Troubleshooting steps:
# 1. Check the ResourceManager log
tail -100 /bigdata/logs/hadoop/yarn-hadoop-resourcemanager-rm1.log
# 2. Check whether the port is in use
netstat -tulnp | grep 8032
tcp 0 0 0.0.0.0:8032 0.0.0.0:* LISTEN 12345/java
# 3. Check the configuration file
cat $HADOOP_HOME/etc/hadoop/yarn-site.xml
# 4. Check hostname resolution
ping rm1
# Resolution:
# - Verify the configuration files are correct
# - Verify the port is not occupied by another process
# - Verify hostname resolution works
4.3.2 NodeManager cannot connect to the ResourceManager
# Troubleshooting steps:
# 1. Check the NodeManager log
tail -100 /bigdata/logs/hadoop/yarn-hadoop-nodemanager-dn1.log
# 2. Check network connectivity
telnet rm1 8032
# 3. Check the firewall
systemctl status firewalld
# 4. Check whether the ResourceManager is running
jps
# Resolution:
# - Check network connectivity and firewall configuration
# - Make sure the ResourceManager is running
# - Verify the hostnames in the configuration files are correct
Part05 - Fenge's Experience Summary and Sharing
5.1 YARN Deployment Best Practices
YARN deployment best practices:
- High availability: production clusters must configure ResourceManager HA (see the sketch after this list)
- Resource isolation: use queues to isolate resources between tenants
- Resource reservation: leave enough memory and CPU for the operating system and co-located daemons
- Log aggregation: enable log aggregation
- Monitoring and alerting: build a complete monitoring and alerting system
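A minimal ResourceManager HA sketch for yarn-site.xml, assuming an external ZooKeeper ensemble; the zk1/zk2/zk3 hostnames are assumptions and are not part of the plan in section 2.1:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn-cluster</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.fgedu.net.cn</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.fgedu.net.cn</value>
</property>
<property>
  <name>hadoop.zk.address</name>
  <value>zk1:2181,zk2:2181,zk3:2181</value>
</property>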
5.2 YARN Performance Tuning Recommendations
YARN performance tuning recommendations:
## ResourceManager tuning
- Increase ResourceManager heap memory
- Increase the number of handler threads
- Tune the scheduler configuration
## NodeManager tuning
- Size node resources appropriately
- Increase the number of handler threads
- Optimize disk I/O
## Scheduler tuning
- Use the Capacity Scheduler or Fair Scheduler
- Allocate queue resources appropriately
- Enable preemption
## Application tuning
- Size application memory appropriately (see the submission example after this list)
- Tune the number of map and reduce tasks
- Use a Combiner to reduce data transfer
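As an illustration of per-job tuning, memory and parallelism settings can be passed on the command line at submission time; the /input and /output paths below are placeholders and the values are illustrative:
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar wordcount \
  -D mapreduce.job.queuename=prod \
  -D mapreduce.map.memory.mb=2048 \
  -D mapreduce.reduce.memory.mb=4096 \
  -D mapreduce.job.reduces=12 \
  /input /output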
5.3 YARN Deployment Checklist
YARN deployment checklist:
- □ Configuration files are correct
- □ ResourceManager is running
- □ NodeManager is running on all worker nodes
- □ JobHistoryServer is running
- □ The YARN web UI is reachable
- □ All nodes report a healthy state
- □ Applications can be submitted successfully
- □ Queues are configured correctly
- □ Monitoring and alerting are in place
This article was compiled and published by the Fenge tutorial series for learning and testing purposes only. When reposting, please credit the source: http://www.fgedu.net.cn/10327.html
