
Big Data Tutorial FG004-Hadoop YARN Deployment and Resource Scheduling in Practice

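In this document Fengge walks through Hadoop YARN deployment and resource scheduling in practice: YARN architecture, core component roles, core features, cluster planning, configuration files, core parameter settings, service startup, cluster verification, scheduler configuration, queue management, application management, and troubleshooting of common problems. The material draws on the official Hadoop documentation (YARN, Cluster Setup) and is meant for big data operations engineers to use for study and testing. Verify everything yourself before applying it to a production environment.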

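Part01-Basic Concepts and Theory

1.1 YARN Architecture

YARN (Yet Another Resource Negotiator) is Hadoop's resource management and job scheduling framework. It manages the cluster's resources and schedules applications onto them.

YARN architecture at a glance:

  • Master/worker architecture: ResourceManager (master node) and NodeManager (worker node)
  • Resource management: cluster resources (CPU, memory) are managed centrally
  • Job scheduling: several schedulers are supported (FIFO, Capacity, Fair)
  • Application isolation: each application gets its own ApplicationMaster
  • Resource isolation: memory and CPU isolation are supported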

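1.2 YARN Core Components and Roles

YARN core components and their roles:

Component          Role                    Responsibilities
ResourceManager    Master node             Global resource management, application scheduling, cluster monitoring
NodeManager        Worker node             Node resource management, container management, heartbeat reporting
ApplicationMaster  Per-application master  Requests resources for the application, schedules and monitors its tasks
Container          Resource container      Encapsulates resources and provides the runtime environment for tasks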

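1.3 YARN Core Features

YARN core features:

  • Unified resource management: CPU, memory, and other resources are managed in one place
  • Multi-tenancy: multiple users and applications share one cluster
  • Resource isolation: memory and CPU isolation are supported
  • Scheduling policies: FIFO, Capacity, and Fair
  • High availability: ResourceManager HA is supported

Fengge's tip: YARN is the resource management core of Hadoop, so understanding its architecture is essential for big data operations work.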

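Part02-Production Environment Planning and Recommendations

2.1 YARN Cluster Planning

Planning a YARN cluster means deciding on node count, hardware configuration, and resource allocation:

# YARN cluster plan (example)

## ResourceManager nodes (2)
rm1.fgedu.net.cn (192.168.1.15)
- CPU: 16 cores
- Memory: 64GB
- Disk: 2TB RAID1
- Roles: ResourceManager, JobHistoryServer

rm2.fgedu.net.cn (192.168.1.16)
- CPU: 16 cores
- Memory: 64GB
- Disk: 2TB RAID1
- Roles: ResourceManager

## NodeManager nodes (6)
dn1-dn6.fgedu.net.cn (192.168.1.12-17)
# Note: as written, this range overlaps the ResourceManager addresses above (.15 and .16); assign non-conflicting addresses in a real deployment.
- CPU: 16 cores
- Memory: 64GB
- Disk: 12 x 8TB (JBOD)
- Roles: NodeManager, DataNode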

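2.2 YARN Configuration Files

Core YARN configuration files:

# YARN configuration files

## yarn-site.xml
The main YARN configuration file. Key properties:
- yarn.resourcemanager.hostname: ResourceManager hostname
- yarn.nodemanager.aux-services: NodeManager auxiliary services
- yarn.nodemanager.resource.memory-mb: memory on the node that YARN may hand out to containers (MB)
- yarn.nodemanager.resource.cpu-vcores: vcores on the node that YARN may hand out to containers
- yarn.scheduler.minimum-allocation-mb: smallest memory allocation per container (MB)
- yarn.scheduler.maximum-allocation-mb: largest memory allocation per container (MB)

## mapred-site.xml
The MapReduce configuration file. Key properties:
- mapreduce.framework.name: execution framework
- mapreduce.jobhistory.address: JobHistoryServer address
- yarn.app.mapreduce.am.resource.mb: memory for the MapReduce ApplicationMaster (MB)

## yarn-env.sh
YARN environment variables, including:
- YARN_RESOURCEMANAGER_HEAPSIZE: ResourceManager heap size
- YARN_NODEMANAGER_HEAPSIZE: NodeManager heap size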
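
For illustration, a yarn-env.sh fragment that sets both heap sizes could look like the sketch below. The two values are placeholders picked for this example, not figures from the cluster plan; to my knowledge the Hadoop 3.x yarn-env.sh template treats a number without a unit as megabytes, but check the comments in your own copy of the file.

```shell
# yarn-env.sh fragment (illustrative values only, tune for your cluster)

# Heap for the ResourceManager JVM. It tends to grow with the number of
# nodes and applications the ResourceManager has to track.
export YARN_RESOURCEMANAGER_HEAPSIZE=4096

# Heap for each NodeManager JVM. This is daemon memory and is separate
# from the container memory set by yarn.nodemanager.resource.memory-mb.
export YARN_NODEMANAGER_HEAPSIZE=2048
```

Both daemons have to be restarted for a change in this file to take effect.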

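2.3 YARN Core Parameter Settings

Recommended values for the core YARN parameters:

# YARN core parameter settings

## yarn-site.xml
<property><name>yarn.resourcemanager.hostname</name><value>rm1</value></property>
<property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
<property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
<property><name>yarn.nodemanager.resource.memory-mb</name><value>49152</value></property>
<property><name>yarn.nodemanager.resource.cpu-vcores</name><value>16</value></property>
<property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
<property><name>yarn.scheduler.maximum-allocation-mb</name><value>16384</value></property>
<property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
<property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
<property><name>yarn.log-aggregation-enable</name><value>true</value></property>
<property><name>yarn.nodemanager.remote-app-log-dir</name><value>/yarn/logs</value></property>

## mapred-site.xml
<property><name>mapreduce.framework.name</name><value>yarn</value></property>
<property><name>mapreduce.jobhistory.address</name><value>rm1:10020</value></property>
<property><name>mapreduce.jobhistory.webapp.address</name><value>rm1:19888</value></property>
<property><name>yarn.app.mapreduce.am.resource.mb</name><value>2048</value></property>
<property><name>mapreduce.map.memory.mb</name><value>2048</value></property>
<property><name>mapreduce.reduce.memory.mb</name><value>4096</value></property>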
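
To see what these numbers mean on a single node, the sketch below divides the NodeManager memory by the container sizes configured above. `containers_per_node` is a helper name invented for this example. It looks at memory only and ignores vcores, so treat the results as upper bounds.

```shell
#!/usr/bin/env bash
# Sketch: upper bound on how many containers of a given size fit into
# one NodeManager's memory. containers_per_node is a hypothetical helper.
containers_per_node() {
  local node_mb=$1 container_mb=$2
  echo $(( node_mb / container_mb ))
}

NODE_MB=49152   # yarn.nodemanager.resource.memory-mb

echo "minimum-size containers (1024 MB):  $(containers_per_node "$NODE_MB" 1024)"
echo "map containers (2048 MB):           $(containers_per_node "$NODE_MB" 2048)"
echo "reduce containers (4096 MB):        $(containers_per_node "$NODE_MB" 4096)"
echo "maximum-size containers (16384 MB): $(containers_per_node "$NODE_MB" 16384)"
```

With these settings a node holds at most 48 minimum-size containers, 24 map containers, 12 reduce containers, or 3 maximum-size containers.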

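Production recommendation: when sizing YARN memory, account for the operating system and leave 20%-30% of each node's physical memory to it. The 49152 MB configured above is 75% of a 64GB node, which leaves 25% in reserve.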
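
The reservation rule is simple arithmetic. As a rough sketch, the helper below (`nm_memory_mb`, a name made up here) turns physical memory and a reserve percentage into a candidate value for yarn.nodemanager.resource.memory-mb, using integer division.

```shell
#!/usr/bin/env bash
# Sketch: derive a candidate yarn.nodemanager.resource.memory-mb from
# physical memory (GB) and the percentage reserved for the OS and daemons.
# nm_memory_mb is a hypothetical helper, not part of Hadoop.
nm_memory_mb() {
  local physical_gb=$1 reserve_pct=$2
  echo $(( physical_gb * 1024 * (100 - reserve_pct) / 100 ))
}

nm_memory_mb 64 20   # prints 52428
nm_memory_mb 64 25   # prints 49152
nm_memory_mb 64 30   # prints 45875
```

In practice it is convenient to round the result down to a multiple of yarn.scheduler.minimum-allocation-mb so that no memory is left over that cannot be allocated.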

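Part03-Production Implementation Plan

3.1 YARN Service Startup

3.1.1 Writing the YARN Configuration Files

# Write yarn-site.xml
cat > "$HADOOP_HOME/etc/hadoop/yarn-site.xml" << 'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.resourcemanager.hostname</name><value>rm1</value></property>
  <property><name>yarn.resourcemanager.address</name><value>rm1:8032</value></property>
  <property><name>yarn.resourcemanager.scheduler.address</name><value>rm1:8030</value></property>
  <property><name>yarn.resourcemanager.webapp.address</name><value>rm1:8088</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value></property>
  <property><name>yarn.nodemanager.resource.memory-mb</name><value>49152</value></property>
  <property><name>yarn.nodemanager.resource.cpu-vcores</name><value>16</value></property>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>1024</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>16384</value></property>
  <property><name>yarn.nodemanager.vmem-check-enabled</name><value>false</value></property>
  <property><name>yarn.nodemanager.pmem-check-enabled</name><value>false</value></property>
</configuration>
EOF

# Write mapred-site.xml
cat > "$HADOOP_HOME/etc/hadoop/mapred-site.xml" << 'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>rm1:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>rm1:19888</value></property>
</configuration>
EOF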
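
Every node needs the same configuration before its daemon starts. One way to push the files out, assuming passwordless SSH and an identical $HADOOP_HOME on all hosts (neither is stated in the plan), is a small loop like the one below. `sync_conf` and the `/opt/hadoop` fallback path are made up for this sketch. With DRY_RUN=1 it only prints the scp commands.

```shell
#!/usr/bin/env bash
# Sketch: copy the edited config files to the other nodes.
# Assumptions: passwordless SSH for the current user and the same
# $HADOOP_HOME on every host. sync_conf is a hypothetical helper.
sync_conf() {
  local conf_dir host
  conf_dir="${HADOOP_HOME:-/opt/hadoop}/etc/hadoop"
  for host in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry run: show the command instead of executing it.
      echo "scp $conf_dir/yarn-site.xml $conf_dir/mapred-site.xml $host:$conf_dir/"
    else
      scp "$conf_dir/yarn-site.xml" "$conf_dir/mapred-site.xml" "$host:$conf_dir/"
    fi
  done
}

# Preview the copies to the second ResourceManager and the six workers.
DRY_RUN=1 sync_conf rm2 dn1 dn2 dn3 dn4 dn5 dn6
```

Drop DRY_RUN=1 once the printed commands look right.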

3.1.2 Starting the YARN Services

# Start the ResourceManager
yarn --daemon start resourcemanager

# Check the ResourceManager process
jps

12345 ResourceManager
12456 Jps

# Follow the ResourceManager log
tail -f /bigdata/logs/hadoop/yarn-hadoop-resourcemanager-rm1.log

2026-04-08 14:00:30,123 INFO resourcemanager.ResourceManager: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting ResourceManager
STARTUP_MSG: host = rm1.fgedu.net.cn/192.168.1.15
STARTUP_MSG: args = []
STARTUP_MSG: version = 3.3.5
************************************************************/
2026-04-08 14:00:30,456 INFO resourcemanager.ResourceManager: registered UNIX signal handlers for [TERM, HUP, INT]
2026-04-08 14:00:30,789 INFO resourcemanager.ResourceManager: createResourceManager []
2026-04-08 14:00:31,012 INFO resourcemanager.ResourceManager: Transitioning to standby state

# Start the NodeManager (run on every NodeManager node)
yarn --daemon start nodemanager

# Check the NodeManager process
jps

12345 NodeManager
12456 Jps

# Start the JobHistoryServer (Hadoop 3.x form; mr-jobhistory-daemon.sh is the deprecated wrapper)
mapred --daemon start historyserver

# Check all processes
jps

12345 ResourceManager
12456 NodeManager
12567 JobHistoryServer
12678 Jps
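
Reading jps output by eye works for one host but gets tedious across a cluster. As a minimal sketch, the helper below reads jps output on stdin and prints every expected daemon that is missing. `missing_daemons` is a name invented for this example.

```shell
#!/usr/bin/env bash
# Sketch: report which expected daemons do not appear in jps output.
# missing_daemons is a hypothetical helper; it reads jps output on stdin
# and prints one line per expected daemon that is absent.
missing_daemons() {
  local jps_out name
  jps_out=$(cat)
  for name in "$@"; do
    # jps prints "<pid> <main class>", so compare the second field exactly.
    printf '%s\n' "$jps_out" \
      | awk -v n="$name" '$2 == n { found = 1 } END { exit !found }' \
      || echo "$name"
  done
}

# On a live node:
#   jps | missing_daemons ResourceManager NodeManager JobHistoryServer
# Demo with canned output in which the JobHistoryServer is not running:
printf '12345 ResourceManager\n12456 NodeManager\n12678 Jps\n' \
  | missing_daemons ResourceManager NodeManager JobHistoryServer
```

The demo prints JobHistoryServer. Empty output means every expected daemon was found.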

3.2 Verifying the YARN Cluster

3.2.1 Checking Cluster Status

# List the cluster's nodes and their states
yarn node -list

2026-04-08 14:10:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total Nodes:6
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dn1:42657 RUNNING dn1:8042 0
dn2:42657 RUNNING dn2:8042 0
dn3:42657 RUNNING dn3:8042 0
dn4:42657 RUNNING dn4:8042 0
dn5:42657 RUNNING dn5:8042 0
dn6:42657 RUNNING dn6:8042 0
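
For scripted checks, the node list can be reduced to a single number. The sketch below counts the lines whose second column is RUNNING. `running_nodes` is a hypothetical helper, and the column position is taken from the sample output above.

```shell
#!/usr/bin/env bash
# Sketch: count RUNNING nodes in `yarn node -list` output read from stdin.
# running_nodes is a hypothetical helper. It assumes the node state is the
# second whitespace-separated column, as in the sample output.
running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# On a live cluster (expecting the six workers from the plan):
#   [ "$(yarn node -list 2>/dev/null | running_nodes)" -eq 6 ] || echo "nodes missing"
# Demo with canned output:
running_nodes << 'EOF'
Total Nodes:2
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
dn1:42657 RUNNING dn1:8042 0
dn2:42657 RUNNING dn2:8042 0
EOF
```

The demo prints 2. The header and log lines are ignored because their second field is not RUNNING.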

# List YARN applications
yarn application -list

2026-04-08 14:10:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total number of applications (application-types: [], states: [SUBMITTED, ACCEPTED, RUNNING] and tags: []):0
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL

# Show information about the default queue
yarn queue -status default

2026-04-08 14:10:40,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Queue Information:
Queue Name: default
State : RUNNING
Capacity : 100.0%
Current Capacity : 0.0%
Maximum Capacity : 100.0%
Default Node Label Expression :
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
Priority : 0
Applications : 0
Allocated Containers : 0
Available Containers : 0
Reserved Containers : 0
Used Resources :
Configured Capacity : 100.0%
Configured Max Capacity : 100.0%
Configured Minimum User Limit Percent : 100 %
Configured User Limit Factor : 1.0
Current Absolute Capacity : 100.0 %
Current Absolute Maximum Capacity : 100.0 %
Used Resources :
Available Resources :
Reserved Resources :
Used Resources :
Num Active Applications : 0
Num Pending Applications : 0
Num Containers : 0
Max Applications : 10000
Max Applications Per User : 10000
Max Application Master Resources :
Used Application Master Resources :
Max Application Master Resources Per User :

# Open the YARN web UI
# In a browser: http://rm1:8088
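
The same check can be scripted against the ResourceManager REST API, which serves cluster information at /ws/v1/cluster/info on the web UI port. The sketch below pulls the haState field out of the JSON with sed. `ha_state` is a hypothetical helper, and the canned JSON is a trimmed sample shaped like the real response, not captured output.

```shell
#!/usr/bin/env bash
# Sketch: extract the haState field from the /ws/v1/cluster/info response.
# ha_state is a hypothetical helper; it reads JSON on stdin. A JSON tool
# such as jq is more robust if it is available on the host.
ha_state() {
  sed -n 's/.*"haState"[[:space:]]*:[[:space:]]*"\([A-Z]*\)".*/\1/p'
}

# On a live cluster:
#   curl -s http://rm1:8088/ws/v1/cluster/info | ha_state
# Demo with a trimmed sample response:
printf '{"clusterInfo":{"state":"STARTED","haState":"ACTIVE"}}\n' | ha_state
```

The demo prints ACTIVE. A standby ResourceManager in an HA pair would report STANDBY instead.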

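3.3 YARN Scheduler Configuration

3.3.1 Capacity Scheduler Configuration

# Configure the Capacity Scheduler
cat > "$HADOOP_HOME/etc/hadoop/capacity-scheduler.xml" << 'EOF'
<?xml version="1.0"?>
<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>default,prod,dev,test</value></property>
  <property><name>yarn.scheduler.capacity.root.default.capacity</name><value>10</value></property>
  <property><name>yarn.scheduler.capacity.root.default.maximum-capacity</name><value>20</value></property>
  <property><name>yarn.scheduler.capacity.root.prod.capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.prod.maximum-capacity</name><value>70</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.dev.maximum-capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.test.capacity</name><value>10</value></property>
  <property><name>yarn.scheduler.capacity.root.test.maximum-capacity</name><value>20</value></property>
</configuration>
EOF

# Reload the queue configuration
yarn rmadmin -refreshQueues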

2026-04-08 14:20:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Refresh queues successful for capacity scheduler
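
The Capacity Scheduler expects the capacities of sibling queues to add up to 100, and a refresh is rejected when they do not. A quick check before running the refresh can save a round trip. The sketch below sums whole-number percentages; `capacities_ok` is a hypothetical helper, and it does not handle fractional capacities.

```shell
#!/usr/bin/env bash
# Sketch: check that sibling queue capacities sum to exactly 100.
# capacities_ok is a hypothetical helper; integer percentages only.
capacities_ok() {
  local total=0 c
  for c in "$@"; do
    total=$(( total + c ))
  done
  [ "$total" -eq 100 ]
}

# Capacities of default, prod, dev, and test from the file above.
if capacities_ok 10 50 30 10; then
  echo "capacities sum to 100"
else
  echo "capacities do not sum to 100" >&2
fi
```

The same rule applies at every level of the queue hierarchy, so nested queues need one check per parent.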

Fengge's tip: plan the scheduler configuration around your workloads. For production, use either the Capacity Scheduler or the Fair Scheduler.

Part04-Production Cases and Hands-On Walkthroughs

4.1 YARN Queue Management

4.1.1 Viewing Queue Information

# List all queues
yarn queue -list

2026-04-08 14:25:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total Queues : 4
Queue Name: root
Queue Name: root.default
Queue Name: root.prod
Queue Name: root.dev
Queue Name: root.test

# Show the status of a specific queue
yarn queue -status prod

2026-04-08 14:25:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Queue Information:
Queue Name: prod
State : RUNNING
Capacity : 50.0%
Current Capacity : 0.0%
Maximum Capacity : 70.0%
Default Node Label Expression :
Accessible Node Labels : *
Preemption : disabled
Intra-queue Preemption : disabled
Priority : 0
Applications : 0
Allocated Containers : 0
Available Containers : 0
Reserved Containers : 0
Used Resources :
Configured Capacity : 50.0%
Configured Max Capacity : 70.0%
Configured Minimum User Limit Percent : 100 %
Configured User Limit Factor : 1.0
Current Absolute Capacity : 50.0 %
Current Absolute Maximum Capacity : 70.0 %
Used Resources :
Available Resources :
Reserved Resources :
Used Resources :
Num Active Applications : 0
Num Pending Applications : 0
Num Containers : 0
Max Applications : 5000
Max Applications Per User : 5000
Max Application Master Resources :
Used Application Master Resources :
Max Application Master Resources Per User :
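
Percentages are easier to reason about once they are converted to memory. As a rough sketch, the helper below multiplies the cluster's container memory (node count times yarn.nodemanager.resource.memory-mb) by a queue percentage. `queue_mb` is a name made up here, and the result uses integer division.

```shell
#!/usr/bin/env bash
# Sketch: convert a queue percentage into megabytes of cluster memory.
# queue_mb is a hypothetical helper: nodes * MB per node * percent / 100.
queue_mb() {
  local nodes=$1 node_mb=$2 pct=$3
  echo $(( nodes * node_mb * pct / 100 ))
}

# prod queue on 6 NodeManagers with 49152 MB each
queue_mb 6 49152 50   # guaranteed capacity, prints 147456
queue_mb 6 49152 70   # maximum capacity, prints 206438
```

So prod is guaranteed 144GB and can borrow idle capacity from other queues up to roughly 201GB.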

4.1.2 Submitting an Application to a Specific Queue

# Submit a MapReduce application to the prod queue
hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.5.jar pi -D mapreduce.job.queuename=prod 10 100

Number of Maps = 10
Samples per Map = 100
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
2026-04-08 14:30:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
2026-04-08 14:30:30,456 INFO input.FileInputFormat: Total input files to process : 10
2026-04-08 14:30:30,789 INFO mapreduce.JobSubmitter: number of splits:10
2026-04-08 14:30:31,012 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1234567890123_0001
2026-04-08 14:30:31,234 INFO impl.YarnClientImpl: Submitted application application_1234567890123_0001
2026-04-08 14:30:31,456 INFO mapreduce.Job: The url to track the job: http://rm1:8088/proxy/application_1234567890123_0001/
2026-04-08 14:30:31,678 INFO mapreduce.Job: Running job: job_1234567890123_0001
2026-04-08 14:30:40,890 INFO mapreduce.Job: Job job_1234567890123_0001 running in uber mode : false
2026-04-08 14:30:40,012 INFO mapreduce.Job: map 0% reduce 0%
2026-04-08 14:30:50,234 INFO mapreduce.Job: map 10% reduce 0%
2026-04-08 14:31:00,456 INFO mapreduce.Job: map 20% reduce 0%
2026-04-08 14:31:10,678 INFO mapreduce.Job: map 30% reduce 0%
2026-04-08 14:31:20,890 INFO mapreduce.Job: map 40% reduce 0%
2026-04-08 14:31:30,012 INFO mapreduce.Job: map 50% reduce 0%
2026-04-08 14:31:40,234 INFO mapreduce.Job: map 60% reduce 0%
2026-04-08 14:31:50,456 INFO mapreduce.Job: map 70% reduce 0%
2026-04-08 14:32:00,678 INFO mapreduce.Job: map 80% reduce 0%
2026-04-08 14:32:10,890 INFO mapreduce.Job: map 90% reduce 0%
2026-04-08 14:32:20,012 INFO mapreduce.Job: map 100% reduce 0%
2026-04-08 14:32:30,234 INFO mapreduce.Job: map 100% reduce 100%
Job Ended: 1234567890
Estimated value of Pi is 3.14800000000000000000

4.2 YARN Application Management

4.2.1 Checking Application Status

# List running applications
yarn application -list -appStates RUNNING

2026-04-08 14:35:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Total number of applications (application-types: [], states: [RUNNING] and tags: []):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1234567890123_0001 QuasiMonteCarlo MAPREDUCE hadoop prod RUNNING UNDEFINED 50.0% http://dn1:42657

# Show the details of one application
yarn application -status application_1234567890123_0001

2026-04-08 14:35:35,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Application Report :
Application-Id : application_1234567890123_0001
Application-Name : QuasiMonteCarlo
Application-Type : MAPREDUCE
User : hadoop
Queue : prod
Application Priority : 0
Start-Time : 1234567890123
Finish-Time : 0
Progress : 50.0%
State : RUNNING
Final-State : UNDEFINED
Tracking-URL : http://dn1:42657
RPC Port : 42657
AM Host : dn1
Aggregate Resource Allocation : 12345678 MB-seconds, 12345 vcore-seconds
Aggregate Resource Preempted : 0 MB-seconds, 0 vcore-seconds
Log Aggregation Status : NOT_START
Diagnostics :
Unmanaged Application : false
Application Node Label Expression :
AM container Node Label Expression :
Timeout : 0
Remaining Time : -1

4.2.2 Killing an Application

# Kill an application
yarn application -kill application_1234567890123_0001

2026-04-08 14:40:30,123 INFO client.RMProxy: Connecting to ResourceManager at rm1/192.168.1.15:8032
Killing application application_1234567890123_0001
2026-04-08 14:40:30,456 INFO impl.YarnClientImpl: Killed application application_1234567890123_0001
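
When several applications in one queue need to go, the IDs can be pulled from the list output and fed to the kill command. The sketch below filters by queue name. `apps_in_queue` is a hypothetical helper, and it assumes application names contain no whitespace, so that the queue is the fifth column as in the sample output earlier in this section.

```shell
#!/usr/bin/env bash
# Sketch: print the IDs of applications in a given queue, reading
# `yarn application -list` output on stdin. apps_in_queue is a hypothetical
# helper; it assumes names without whitespace (queue = 5th column).
apps_in_queue() {
  awk -v q="$1" '$1 ~ /^application_/ && $5 == q { print $1 }'
}

# On a live cluster (review the IDs before killing anything):
#   yarn application -list -appStates RUNNING 2>/dev/null | apps_in_queue prod \
#     | while read -r id; do yarn application -kill "$id"; done
# Demo with the sample line from this section:
printf 'application_1234567890123_0001 QuasiMonteCarlo MAPREDUCE hadoop prod RUNNING UNDEFINED 50.0%% http://dn1:42657\n' \
  | apps_in_queue prod
```

The demo prints application_1234567890123_0001.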

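4.3 Troubleshooting Common YARN Problems

4.3.1 ResourceManager Fails to Start

# Symptom: the ResourceManager does not start
# Diagnosis:

# 1. Check the ResourceManager log
tail -n 100 /bigdata/logs/hadoop/yarn-hadoop-resourcemanager-rm1.log

# 2. Check whether the port is already in use
netstat -tulnp | grep 8032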

tcp 0 0 0.0.0.0:8032 0.0.0.0:* LISTEN 12345/java

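# 3. Review the configuration file
cat "$HADOOP_HOME/etc/hadoop/yarn-site.xml"

# 4. Check hostname resolution
ping -c 3 rm1

# Fixes:
# - Correct any errors in the configuration files
# - Free the port if another process is holding it
# - Make sure hostname resolution works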
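
A plain grep for 8032 also matches ports such as 18032 and addresses that merely contain those digits. As a minimal sketch, the helper below matches the port exactly and prints the owning process. `port_listener` is a name invented here, and it assumes the TCP column layout shown in the netstat output above.

```shell
#!/usr/bin/env bash
# Sketch: print the PID/program listening on an exact TCP port, reading
# `netstat -tulnp` output on stdin. port_listener is a hypothetical helper.
# Assumed columns: proto recv-q send-q local foreign state pid/program.
port_listener() {
  awk -v p="$1" '$6 == "LISTEN" {
    n = split($4, a, ":")          # the port is the last ":"-separated part
    if (a[n] == p) print $7
  }'
}

# On a live host:
#   netstat -tulnp 2>/dev/null | port_listener 8032
# Demo with the sample line from this section:
printf 'tcp 0 0 0.0.0.0:8032 0.0.0.0:* LISTEN 12345/java\n' | port_listener 8032
```

The demo prints 12345/java. If that PID belongs to an old ResourceManager process, stop it before starting a new one.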

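4.3.2 NodeManager Cannot Connect to the ResourceManager

# Symptom: the NodeManager cannot connect to the ResourceManager
# Diagnosis:

# 1. Check the NodeManager log
tail -n 100 /bigdata/logs/hadoop/yarn-hadoop-nodemanager-dn1.log

# 2. Check network connectivity (NodeManagers register on the resource-tracker port, 8031 by default)
telnet rm1 8031

# 3. Check the firewall
systemctl status firewalld

# 4. Check that the ResourceManager is running (run on rm1)
jps

# Fixes:
# - Fix network connectivity and firewall rules
# - Make sure the ResourceManager is up and running
# - Check that the hostnames in the configuration files are correct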
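
Where telnet is not installed, bash can test a TCP port on its own. NodeManagers register with the ResourceManager's resource-tracker address (port 8031 by default), while 8032 is the client RPC port, so the sketch below probes both. `can_connect` is a hypothetical helper that relies on bash's /dev/tcp feature and the coreutils `timeout` command.

```shell
#!/usr/bin/env bash
# Sketch: test whether a TCP port on a host accepts connections.
# can_connect is a hypothetical helper. It needs bash (for /dev/tcp)
# and coreutils timeout; the third argument is the timeout in seconds.
can_connect() {
  local host=$1 port=$2 secs=${3:-3}
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# 8031: default resource-tracker port (NodeManager registration and heartbeats)
# 8032: default client RPC port
for port in 8031 8032; do
  if can_connect rm1 "$port"; then
    echo "rm1:$port reachable"
  else
    echo "rm1:$port NOT reachable"
  fi
done
```

Run it from the NodeManager host that fails to register. A port that is reachable from rm1 itself but not from the worker usually points at a firewall rule.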

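Part05-Fengge's Lessons Learned

5.1 YARN Deployment Best Practices

YARN deployment best practices:

  • High availability: production environments must configure ResourceManager HA
  • Resource isolation: use queues to isolate workloads from each other
  • Resource reservation: leave enough resources for the operating system
  • Log aggregation: enable log aggregation
  • Monitoring and alerting: build thorough monitoring and alerting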
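
As a starting point for the first item, the sketch below prints the yarn-site.xml properties that the Hadoop ResourceManager HA guide lists for a two-node setup. The cluster ID `fgedu-yarn` and the ZooKeeper hosts `zk1` to `zk3` are assumptions made for this example; the tutorial does not define a ZooKeeper ensemble. `ha_properties` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a ResourceManager HA fragment for yarn-site.xml.
# Assumptions (not from the tutorial): cluster id "fgedu-yarn" and a
# ZooKeeper ensemble on zk1, zk2, zk3. ha_properties is a hypothetical helper.
ha_properties() {
  cat << 'EOF'
<property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
<property><name>yarn.resourcemanager.cluster-id</name><value>fgedu-yarn</value></property>
<property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
<property><name>yarn.resourcemanager.hostname.rm1</name><value>rm1</value></property>
<property><name>yarn.resourcemanager.hostname.rm2</name><value>rm2</value></property>
<property><name>yarn.resourcemanager.webapp.address.rm1</name><value>rm1:8088</value></property>
<property><name>yarn.resourcemanager.webapp.address.rm2</name><value>rm2:8088</value></property>
<property><name>hadoop.zk.address</name><value>zk1:2181,zk2:2181,zk3:2181</value></property>
EOF
}

# Paste the output inside <configuration> in yarn-site.xml on every node.
ha_properties
```

After both ResourceManagers are restarted, `yarn rmadmin -getServiceState rm1` and `yarn rmadmin -getServiceState rm2` report which one is active and which is standby.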

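5.2 YARN Performance Tuning Recommendations

YARN performance tuning recommendations:

# YARN performance tuning recommendations

## ResourceManager
- Increase the ResourceManager heap
- Increase the number of handler threads
- Tune the scheduler configuration

## NodeManager
- Size node resources sensibly
- Increase the number of handler threads
- Optimize disk I/O

## Scheduler
- Use the Capacity Scheduler or the Fair Scheduler
- Size queue resources sensibly
- Enable resource preemption

## Applications
- Size application memory sensibly
- Tune the number of map and reduce tasks
- Use a Combiner to reduce data transfer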
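5.3 YARN Deployment Checklist

YARN deployment checklist:

  • □ Configuration files are correct
  • □ ResourceManager service is running
  • □ NodeManager service is running
  • □ JobHistoryServer service is running
  • □ YARN web UI is reachable
  • □ All nodes report a healthy state
  • □ Applications can be submitted
  • □ Queue configuration is correct
  • □ Monitoring and alerting are configured

Fengge's tip: YARN is the resource management core of Hadoop, and sound resource planning and scheduler configuration are critical to cluster performance.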

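This article was compiled and published by Fengge Tutorials (风哥教程) for study and testing only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html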
