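Table of Contents
Part01-Basic Concepts and Theory
1.1 The YARN Resource Model
1.2 Queue Hierarchy
1.3 Resource Allocation Strategies
Part02-Production Planning and Recommendations
2.1 Queue Planning Strategy
2.2 Resource Quota Planning
2.3 Multi-Tenant Queue Planning
Part03-Production Implementation
3.1 Capacity Scheduler Configuration
3.2 Queue Creation and Management
3.3 Resource Limit Configuration
3.4 Queue Permission Management
Part04-Production Cases and Hands-On Walkthroughs
4.1 Queue Configuration Case: Multiple Business Lines
4.2 Resource Isolation Case
4.3 Queue Resource Tuning Case
Part05-Fengge's Lessons Learned
5.1 Queue Management Best Practices
5.2 Resource Management Lessons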
Part01-Basic Concepts and Theory
1.1 The YARN Resource Model
YARN (Yet Another Resource Negotiator) is Hadoop's resource management system. It manages two main resources, memory and CPU (vCores), and allocates them in units of Containers.
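A quick way to see how the two resource dimensions interact is to count how many containers of one size fit on a node. The sketch below is a minimal illustration, assuming a node like those listed in section 2.2 (16384 + 65536 = 81920 MB of memory, 8 + 24 = 32 vCores) and a hypothetical 4096 MB / 2 vCore container; `containers_per_node` is an illustrative helper, not a YARN command. Counting both dimensions matches the Capacity Scheduler only when `yarn.scheduler.capacity.resource-calculator` is set to `DominantResourceCalculator`; the default `DefaultResourceCalculator` schedules on memory alone.

```shell
#!/bin/bash
# Sketch: how many containers of one size fit on a single NodeManager.
# Assumes memory and vCores are both enforced (DominantResourceCalculator).
# Usage: containers_per_node <node-mem-mb> <node-vcores> <container-mem-mb> <container-vcores>
containers_per_node() {
  local by_mem=$(( $1 / $3 ))   # limit imposed by memory
  local by_cpu=$(( $2 / $4 ))   # limit imposed by vCores
  # The scarcer resource decides the container count
  if [ "${by_mem}" -lt "${by_cpu}" ]; then
    echo "${by_mem}"
  else
    echo "${by_cpu}"
  fi
}

# Hypothetical 4096 MB / 2 vCore containers on an 81920 MB / 32 vCore node
containers_per_node 81920 32 4096 2   # memory allows 20, vCores allow 16 -> 16
```

With memory-only scheduling the same node would be counted as 20 containers, because vCores are not checked.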
1.2 Queue Hierarchy
YARN organizes queues hierarchically and supports multiple levels. Child queues can be created under the root queue, forming a tree.
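– root: the root queue, which holds all cluster resources
– root.default: the default queue
– root.production: the production queue
– root.development: the development and testing queue
– Queues can be nested to multiple levels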
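Because every queue's capacity is a percentage of its parent, a queue's share of the whole cluster is the product of the percentages along its path (root itself is always 100%). The minimal sketch below does that multiplication; `absolute_capacity` is a hypothetical helper, and the example figures are the production (50%) and production.etl (40%) capacities, plus their maximum capacities (70% and 100%), configured later in this article.

```shell
#!/bin/bash
# Sketch: a queue's absolute share of the cluster is the product of the
# per-level percentages along its path below root (root is always 100%).
# Usage: absolute_capacity <pct-level-1> [<pct-level-2> ...]
absolute_capacity() {
  awk 'BEGIN {
    abs = 100
    for (i = 1; i < ARGC; i++) abs = abs * ARGV[i] / 100
    printf "%g\n", abs
  }' "$@"
}

absolute_capacity 50 40    # root.production.etl guaranteed share -> 20
absolute_capacity 70 100   # root.production.etl maximum share    -> 70
```

So root.production.etl is guaranteed 20% of the cluster but may grow to 70% when other queues leave resources idle.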
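1.3 Resource Allocation Strategies
YARN supports several schedulers, such as the Capacity Scheduler and the Fair Scheduler, and each one allocates resources differently. This cluster runs the Capacity Scheduler: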
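# Scheduler type and cluster resources, as shown on the ResourceManager web UI (port 8088 by default) or via its REST endpoints /ws/v1/cluster/scheduler and /ws/v1/cluster/metrics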
Cluster Info:
Scheduler: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler
Cluster Resources: memory: 491520 MB, vCores: 48
Used Resources: memory: 245760 MB, vCores: 24
Available Resources: memory: 245760 MB, vCores: 24
Part02-Production Planning and Recommendations
2.1 Queue Planning Strategy
Queue planning has to balance business isolation, resource utilization, and fairness.
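– By business line: each business uses its own queue
– By environment: production, testing, and development are kept separate
– By priority: core jobs are separated from ordinary jobs
– Reserved resources: capacity is set aside for system jobs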
2.2 Resource Quota Planning
Quota planning must balance the needs of each business line, starting from what each node actually provides:
yarn node -list -showDetails
Total Nodes: 6
Node-Id State Memory-Used Memory-Avail VCores-Used VCores-Avail
fgedu01:8041 RUNNING 16384 MB 65536 MB 8 24
fgedu02:8041 RUNNING 16384 MB 65536 MB 8 24
fgedu03:8041 RUNNING 16384 MB 65536 MB 8 24
fgedu04:8041 RUNNING 16384 MB 65536 MB 8 24
fgedu05:8041 RUNNING 16384 MB 65536 MB 8 24
fgedu06:8041 RUNNING 16384 MB 65536 MB 8 24
Total: 98304 MB used, 393216 MB available
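Turning percentages into concrete amounts makes quota discussions with each business line easier. The sketch below uses a hypothetical helper, `quota_mb`, to convert an absolute capacity percentage into guaranteed memory, assuming the cluster total from the listing above (98304 MB used + 393216 MB available = 491520 MB, the same figure shown in section 1.3) and the top-level percentages from section 2.3.

```shell
#!/bin/bash
# Sketch: translate absolute capacity percentages into guaranteed memory (MB).
# Usage: quota_mb <cluster-total-mb> <absolute-capacity-pct>
quota_mb() {
  awk -v total="$1" -v pct="$2" 'BEGIN { printf "%d\n", total * pct / 100 }'
}

CLUSTER_MB=491520   # 98304 MB used + 393216 MB available
for entry in default:10 production:50 development:30 system:10; do
  queue="${entry%%:*}"
  pct="${entry#*:}"
  echo "${queue}: ${pct}% -> $(quota_mb "${CLUSTER_MB}" "${pct}") MB"
done
```

This prints 49152 MB for default and system, 245760 MB for production, and 147456 MB for development, which together account for all 491520 MB.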
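2.3 Multi-Tenant Queue Planning
A multi-tenant environment needs both resource isolation and fair sharing. Fengge's tip: use a hierarchical queue layout to manage multiple tenants.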
yarn queue -list
Queue Name State Capacity Maximum Capacity Used Capacity
root RUNNING 100% 100% 50%
default RUNNING 10% 100% 5%
production RUNNING 50% 70% 60%
development RUNNING 30% 50% 40%
system RUNNING 10% 20% 10%
Part03-Production Implementation
3.1 Capacity Scheduler Configuration
3.1.1 Configuring capacity-scheduler.xml
cat /bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
<configuration>
<!-- Root queue configuration -->
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,production,development,system</value>
</property>
<!-- default queue -->
<property>
<name>yarn.scheduler.capacity.root.default.capacity</name>
<value>10</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
<value>100</value>
</property>
<!-- production queue -->
<property>
<name>yarn.scheduler.capacity.root.production.capacity</name>
<value>50</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.production.maximum-capacity</name>
<value>70</value>
</property>
<!-- development queue -->
<property>
<name>yarn.scheduler.capacity.root.development.capacity</name>
<value>30</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.development.maximum-capacity</name>
<value>50</value>
</property>
<!-- system queue -->
<property>
<name>yarn.scheduler.capacity.root.system.capacity</name>
<value>10</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.system.maximum-capacity</name>
<value>20</value>
</property>
</configuration>
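In percentage mode the Capacity Scheduler refuses a configuration whose sibling capacities do not add up to 100%, so it is worth checking the file before running yarn rmadmin -refreshQueues. The sketch below is a hypothetical pre-check (`sum_child_capacity` is not a YARN tool); it assumes percentage capacities and the `<name>`/`<value>` layout used in this article, and it does not understand the absolute-resource capacity syntax.

```shell
#!/bin/bash
# Sketch: sum the capacity of the direct children of one queue.
# Assumes percentage capacities and <name>/<value> elements as in this article.
# Usage: sum_child_capacity <parent-queue-path> <capacity-scheduler.xml>
sum_child_capacity() {
  awk -v prefix="yarn.scheduler.capacity.$1." '
    /<name>/ {
      name = $0
      sub(/.*<name>[[:space:]]*/, "", name)
      sub(/[[:space:]]*<\/name>.*/, "", name)
    }
    /<value>/ && name != "" {
      value = $0
      sub(/.*<value>[[:space:]]*/, "", value)
      sub(/[[:space:]]*<\/value>.*/, "", value)
      # A direct child capacity looks like <prefix><child>.capacity
      if (index(name, prefix) == 1) {
        n = split(substr(name, length(prefix) + 1), parts, "[.]")
        if (n == 2 && parts[2] == "capacity") total += value
      }
      name = ""
    }
    END { print total + 0 }
  ' "$2"
}

CONF=/bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
if [ -f "${CONF}" ]; then
  total=$(sum_child_capacity root "${CONF}")
  [ "${total}" = "100" ] || echo "root children sum to ${total}%, not 100%"
fi
```

For the file above the root children sum to 10 + 50 + 30 + 10 = 100, so the check stays silent.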
3.1.2 Refreshing the Queue Configuration
yarn rmadmin -refreshQueues
24/01/17 15:00:00 INFO client.RMProxy: Connecting to ResourceManager
Refresh queues successfully
# Verify that the configuration has taken effect
yarn queue -list
Queue Name State Capacity Maximum Capacity
root RUNNING 100% 100%
default RUNNING 10% 100%
production RUNNING 50% 70%
development RUNNING 30% 50%
system RUNNING 10% 20%
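3.2 Queue Creation and Management
3.2.1 Creating Sub-Queues
# Edit capacity-scheduler.xml
vi /bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
# Add the sub-queue configuration (child capacities are percentages of root.production and must add up to 100)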
<property>
<name>yarn.scheduler.capacity.root.production.queues</name>
<value>etl,realtime,adhoc</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.production.etl.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.production.realtime.capacity</name>
<value>40</value>
</property>
<property>
<name>yarn.scheduler.capacity.root.production.adhoc.capacity</name>
<value>20</value>
</property>
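Writing one `<property>` per queue by hand gets repetitive as the tree grows. The sketch below is a hypothetical generator (`queue_property` and `subqueue_config` are illustrative names, not YARN tools) that prints the queues list and capacity properties for one parent; for root.production it reproduces the XML above, ready to paste into capacity-scheduler.xml before refreshing.

```shell
#!/bin/bash
# Sketch: generate Capacity Scheduler <property> elements for sub-queues.

# Print one property for a queue attribute.
# Usage: queue_property <queue-path> <attribute> <value>
queue_property() {
  printf '<property>\n<name>yarn.scheduler.capacity.%s.%s</name>\n<value>%s</value>\n</property>\n' \
    "$1" "$2" "$3"
}

# Print the .queues list plus one .capacity property per child.
# Usage: subqueue_config <parent-path> <child>=<capacity-pct> ...
subqueue_config() {
  local parent="$1" names="" pair
  shift
  for pair in "$@"; do
    names="${names:+${names},}${pair%%=*}"
  done
  queue_property "${parent}" queues "${names}"
  for pair in "$@"; do
    queue_property "${parent}.${pair%%=*}" capacity "${pair#*=}"
  done
}

subqueue_config root.production etl=40 realtime=40 adhoc=20
```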
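3.2.2 Queue State Management
# Stop a queue: set its state to STOPPED in capacity-scheduler.xml, then refresh.
# A stopped queue accepts no new applications; running applications are allowed to finish.
#   <property>
#   <name>yarn.scheduler.capacity.root.development.state</name>
#   <value>STOPPED</value>
#   </property>
yarn rmadmin -refreshQueues
# Start the queue: set the state back to RUNNING, then refresh again
yarn rmadmin -refreshQueues
# Check the queue status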
yarn queue -status root.production
Queue Information:
Queue Name: root.production
State: RUNNING
Capacity: 50%
Maximum Capacity: 70%
Current Capacity: 60%
Used Resources: memory: 147456 MB, vCores: 14
Available Resources: memory: 98304 MB, vCores: 10
Number of Applications: 15
Number of Containers: 45
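The Current Capacity field is usage relative to the queue's guaranteed capacity, not to the whole cluster, which is why it can exceed 100% when a queue borrows idle resources. A minimal sketch that reproduces the figure above, assuming the 491520 MB cluster total from section 1.3 and memory as the measured resource (`current_capacity_pct` is a hypothetical helper):

```shell
#!/bin/bash
# Sketch: Current Capacity = used / guaranteed * 100,
# where guaranteed = cluster total * absolute capacity.
# Usage: current_capacity_pct <used-mb> <absolute-capacity-pct> <cluster-total-mb>
current_capacity_pct() {
  awk -v used="$1" -v pct="$2" -v total="$3" 'BEGIN {
    guaranteed = total * pct / 100
    printf "%g\n", used * 100 / guaranteed
  }'
}

current_capacity_pct 147456 50 491520   # root.production above -> 60
```

With the same usage and the capacity raised to 60% (section 4.3.2), the result drops to 50%.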
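3.3 Resource Limit Configuration
3.3.1 Configuring Per-User Limits
grep -B2 -A2 -E "user-limit|maximum-applications|maximum-am-resource-percent" /bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
<!-- A single user may use up to 1.5x the queue's capacity (bounded by maximum-capacity) -->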
<property>
<name>yarn.scheduler.capacity.root.production.user-limit-factor</name>
<value>1.5</value>
</property>
<!-- When users compete, each active user is guaranteed at least 25% of the queue -->
<property>
<name>yarn.scheduler.capacity.root.production.minimum-user-limit-percent</name>
<value>25</value>
</property>
<!-- Maximum number of applications (running and pending) in the queue -->
<property>
<name>yarn.scheduler.capacity.root.production.maximum-applications</name>
<value>1000</value>
</property>
<!-- Maximum fraction of queue resources that ApplicationMasters may use -->
<property>
<name>yarn.scheduler.capacity.root.production.maximum-am-resource-percent</name>
<value>0.2</value>
</property>
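How the two user-limit settings combine is easiest to see with numbers. The sketch below is a simplified model, not the scheduler's exact algorithm: it assumes a single user is capped at capacity × user-limit-factor, never beyond maximum-capacity, and that with N active users each is entitled to max(100/N, minimum-user-limit-percent) percent of the queue. The helper names are hypothetical; the figures are those of root.production (capacity 50%, maximum-capacity 70%).

```shell
#!/bin/bash
# Simplified model of Capacity Scheduler per-user limits (sketch only).

# Largest share (percent of cluster) one user can hold in the queue:
# capacity * user-limit-factor, capped by maximum-capacity.
# Usage: single_user_max_pct <capacity-pct> <user-limit-factor> <max-capacity-pct>
single_user_max_pct() {
  awk -v cap="$1" -v f="$2" -v max="$3" 'BEGIN {
    limit = cap * f
    if (limit > max) limit = max
    printf "%g\n", limit
  }'
}

# Share of the queue each active user is entitled to:
# max(100 / active users, minimum-user-limit-percent).
# Usage: per_user_share_pct <active-users> <minimum-user-limit-percent>
per_user_share_pct() {
  awk -v n="$1" -v min="$2" 'BEGIN {
    share = 100 / n
    if (share < min) share = min
    printf "%g\n", share
  }'
}

single_user_max_pct 50 1.5 70   # 50 * 1.5 = 75, capped at 70
per_user_share_pct 2 25         # two users -> 50% of the queue each
per_user_share_pct 5 25         # floor of 25%: only 4 users fit, a 5th waits
```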
3.3.2 Configuring Application (Container) Resource Limits
grep -B2 -A2 "allocation" /bigdata/app/hadoop/etc/hadoop/yarn-site.xml
<!-- Minimum container memory -->
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<!-- Maximum container memory -->
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>65536</value>
</property>
<!-- Minimum container vCores -->
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<!-- Maximum container vCores -->
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>32</value>
</property>
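These settings also shape each request: a memory request is rounded up to a multiple of yarn.scheduler.minimum-allocation-mb, and a request above yarn.scheduler.maximum-allocation-mb is rejected rather than shrunk. The sketch below models that for memory only, assuming, as in the Capacity Scheduler's default behaviour, that the rounding step equals the minimum allocation; `normalize_mb` is a hypothetical helper using the 1024 MB and 65536 MB values above.

```shell
#!/bin/bash
# Sketch: how a container memory request is normalized, assuming the
# rounding step equals yarn.scheduler.minimum-allocation-mb.
# Usage: normalize_mb <requested-mb> <minimum-allocation-mb> <maximum-allocation-mb>
normalize_mb() {
  local req="$1" min="$2" max="$3"
  if [ "${req}" -gt "${max}" ]; then
    echo "request of ${req} MB exceeds maximum-allocation-mb (${max} MB)" >&2
    return 1
  fi
  if [ "${req}" -le "${min}" ]; then
    echo "${min}"
  else
    # Round up to the next multiple of the minimum allocation
    echo $(( (req + min - 1) / min * min ))
  fi
}

normalize_mb 3000 1024 65536   # -> 3072
normalize_mb 512 1024 65536    # -> 1024
```

A 3000 MB request therefore consumes 3072 MB of queue capacity, so small minimum allocations waste less memory on rounding.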
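3.4 Queue Permission Management
3.4.1 Configuring Queue ACLs
grep -B2 -A2 "acl_" /bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
<!-- Submit ACL for production; the format is "users groups", so fgedu, dept1 and dept2 here are all user names -->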
<property>
<name>yarn.scheduler.capacity.root.production.acl_submit_applications</name>
<value>fgedu,dept1,dept2</value>
</property>
<!-- Queue administration ACL -->
<property>
<name>yarn.scheduler.capacity.root.production.acl_administer_queue</name>
<value>hdfs,yarn</value>
</property>
<!-- Development queue submit ACL -->
<property>
<name>yarn.scheduler.capacity.root.development.acl_submit_applications</name>
<value>dev_user1,dev_user2</value>
</property>
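A queue ACL string has the form "user1,user2 group1,group2": comma-separated users, one space, then comma-separated groups; "*" means everyone and a single space means no one. ACLs are only enforced when yarn.acl.enable is true in yarn-site.xml, and a user allowed on a parent queue is also allowed on its children, so because root defaults to "*", the root ACL must be narrowed (for example to a single space) before per-queue ACLs restrict anything. The sketch below is a hypothetical checker that models only the string format of one queue's ACL (no parent inheritance); the user alice is made up for illustration.

```shell
#!/bin/bash
# Sketch: does a single queue ACL string admit a user?
# Format: "user1,user2 group1,group2"; "*" = everyone, " " = no one.
# Ignores parent-queue inheritance.
# Usage: acl_allows "<acl>" <user> [<group> ...]
acl_allows() {
  local acl="$1" user="$2"
  shift 2
  [ "${acl}" = "*" ] && return 0
  local users="${acl%% *}" groups="" u g grp
  local -a user_list group_list
  # Everything after the first space is the group list
  case "${acl}" in *" "*) groups="${acl#* }" ;; esac
  IFS=',' read -r -a user_list <<< "${users}"
  for u in "${user_list[@]}"; do
    [ "${u}" = "${user}" ] && return 0
  done
  IFS=',' read -r -a group_list <<< "${groups}"
  for g in "${group_list[@]}"; do
    for grp in "$@"; do
      [ "${g}" = "${grp}" ] && return 0
    done
  done
  return 1
}

acl_allows "fgedu,dept1,dept2" alice dept1 || echo "rejected: dept1 is parsed as a user name"
acl_allows "fgedu dept1,dept2" alice dept1 && echo "accepted: dept1 is a group"
```

If dept1 and dept2 are meant to be groups, the production value would need to be written as "fgedu dept1,dept2".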
3.4.2 Submitting an Application to a Specific Queue
spark-submit --master yarn --deploy-mode cluster \
--queue root.production.etl \
--class com.fgedu.etl.DataProcessor \
/bigdata/app/spark/etl-job.jar
24/01/17 15:30:00 INFO client.RMProxy: Connecting to ResourceManager
24/01/17 15:30:05 INFO yarn.Client: Application report for application_1705473600000_0001
# Check which queue the application is running in
yarn application -list | grep "etl-job"
# Application list
Application-Id Application-Name Application-Type User Queue State
application_1705473600000_0001 etl-job SPARK fgedu root.production.etl RUNNING
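Part04-Production Cases and Hands-On Walkthroughs
4.1 Queue Configuration Case: Multiple Business Lines
This case gives each business line its own queue so that their resources are isolated.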
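#!/bin/bash
# queue_setup.sh
# Multi-business-line queue configuration
set -euo pipefail
CONFIG_FILE="/bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml"
# Back up the current configuration
cp "${CONFIG_FILE}" "${CONFIG_FILE}.bak"
# Write the queue definition. The file is replaced as a whole, so the top-level
# capacities (and any user limits or ACLs you rely on) must be included here too;
# without them root's children no longer add up to 100% and the refresh fails.
cat > "${CONFIG_FILE}" << 'EOF'
<configuration>
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<value>default,production,development,system</value>
</property>
<!-- Top-level queues: capacity and maximum-capacity, as in section 3.1.1 -->
<property><name>yarn.scheduler.capacity.root.default.capacity</name><value>10</value></property>
<property><name>yarn.scheduler.capacity.root.default.maximum-capacity</name><value>100</value></property>
<property><name>yarn.scheduler.capacity.root.production.capacity</name><value>50</value></property>
<property><name>yarn.scheduler.capacity.root.production.maximum-capacity</name><value>70</value></property>
<property><name>yarn.scheduler.capacity.root.development.capacity</name><value>30</value></property>
<property><name>yarn.scheduler.capacity.root.development.maximum-capacity</name><value>50</value></property>
<property><name>yarn.scheduler.capacity.root.system.capacity</name><value>10</value></property>
<property><name>yarn.scheduler.capacity.root.system.maximum-capacity</name><value>20</value></property>
<!-- production sub-queues -->
<property>
<name>yarn.scheduler.capacity.root.production.queues</name>
<value>etl,realtime,adhoc</value>
</property>
<!-- etl queue: 40% of production -->
<property>
<name>yarn.scheduler.capacity.root.production.etl.capacity</name>
<value>40</value>
</property>
<!-- realtime queue: 40% of production -->
<property>
<name>yarn.scheduler.capacity.root.production.realtime.capacity</name>
<value>40</value>
</property>
<!-- adhoc queue: 20% of production -->
<property>
<name>yarn.scheduler.capacity.root.production.adhoc.capacity</name>
<value>20</value>
</property>
</configuration>
EOF
# Apply the new queues; restore the backup if the ResourceManager rejects them
if ! yarn rmadmin -refreshQueues; then
  cp "${CONFIG_FILE}.bak" "${CONFIG_FILE}"
  echo "Queue refresh failed, previous configuration restored" >&2
  exit 1
fi
echo "Queue configuration completed"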
./queue_setup.sh
Queue configuration completed
# Verify the queue structure
yarn queue -list
Queue Name State Capacity Maximum Capacity
root RUNNING 100% 100%
default RUNNING 10% 100%
production RUNNING 50% 70%
production.etl RUNNING 40% 100%
production.realtime RUNNING 40% 100%
production.adhoc RUNNING 20% 100%
development RUNNING 30% 50%
system RUNNING 10% 20%
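4.2 Resource Isolation Case
Resource isolation keeps different business lines from interfering with one another. It is built in three layers:
# 1. Queue resource limits (capacity and maximum-capacity, section 3.1)
# 2. User resource limits (section 3.3.1)
# 3. Application resource limits (section 3.3.2)
# Check the queue's resource usage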
yarn queue -status root.production.etl
Queue Information:
Queue Name: root.production.etl
State: RUNNING
Capacity: 40% (of root.production; 20% of the cluster)
Maximum Capacity: 100%
Current Capacity: 85%
Used Resources: memory: 81920 MB, vCores: 8
Available Resources: memory: 16384 MB, vCores: 2
Number of Applications: 5
Number of Containers: 15
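# Isolation effect
# etl is guaranteed 40% of root.production, i.e. 20% of the cluster
# Because its maximum-capacity is 100% of its parent, it can still borrow idle resources up to root.production's 70% ceiling; lower its maximum-capacity if etl must not crowd out its sibling queues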
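4.3 Queue Resource Tuning Case
4.3.1 Improving Resource Utilization
# Cluster-wide memory usage (from the ResourceManager web UI or its REST endpoint /ws/v1/cluster/metrics)
Total Memory: 491520 MB
Used Memory: 245760 MB (50%)
Available Memory: 245760 MB
# Analyze the production queue's usage
yarn queue -status root.production | grep "Capacity"
Capacity: 50%
Maximum Capacity: 70%
Current Capacity: 60%
# The queue's utilization is high (60% of its guaranteed capacity), so its capacity should be adjusted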
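4.3.2 Adjusting Queue Capacity
# Edit the configuration, then refresh
vi /bigdata/app/hadoop/etc/hadoop/capacity-scheduler.xml
# Raise production's capacity from 50% to 60% (and its maximum-capacity from 70% to 80%), and lower the sibling queues by 10 percentage points in total so that root's children still add up to 100%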
yarn rmadmin -refreshQueues
# Verify the result
yarn queue -status root.production
Queue Information:
Queue Name: root.production
State: RUNNING
Capacity: 60%
Maximum Capacity: 80%
Current Capacity: 50%
Used Resources: memory: 147456 MB, vCores: 14
Available Resources: memory: 147456 MB, vCores: 14
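Part05-Fengge's Lessons Learned
5.1 Queue Management Best Practices
In production, keep the following in mind when managing queues:
1. Split queues by business line to isolate resources
2. Set each queue's capacity and maximum capacity sensibly
3. Configure per-user limits so that no single user can take too much of a queue
4. Monitor queue resource usage regularly
5. Reserve capacity for a system queue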
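5.2 Resource Management Lessons
5.2.1 Resource Management Recommendations
– Avoid queue capacities that are too small, which wastes resources
– Avoid queue capacities that are too large, which squeezes other queues
– Review queue configuration regularly and adjust it as the business changes
– Monitor resource utilization to catch bottlenecks early
– Reserve resources so that core jobs can always run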
5.2.2 Queue Monitoring Script
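#!/bin/bash
# queue_monitor.sh
ALERT_THRESHOLD=80
echo "=== YARN Queue Resource Monitor ==="
echo "Date: $(date)"
# Get all RUNNING queues (the queue name is the first column of "yarn queue -list")
QUEUES=$(yarn queue -list | grep "RUNNING" | awk '{print $1}')
for queue in ${QUEUES}; do
  # Use full queue paths, e.g. production.etl -> root.production.etl
  case "${queue}" in
    root|root.*) ;;
    *) queue="root.${queue}" ;;
  esac
  # Extract the "Current Capacity" value, tolerating spaces around ":" and decimals
  CAPACITY=$(yarn queue -status "${queue}" | awk -F':' '/Current Capacity/ { gsub(/[[:space:]%]/, "", $2); print $2 }')
  [ -z "${CAPACITY}" ] && continue
  echo "Queue: ${queue}, Usage: ${CAPACITY}%"
  # Compare in awk so that values such as 85.3 also work
  if awk -v c="${CAPACITY}" -v t="${ALERT_THRESHOLD}" 'BEGIN { exit !(c > t) }'; then
    echo "WARNING: Queue ${queue} usage is ${CAPACITY}%, exceeds threshold ${ALERT_THRESHOLD}%"
  fi
done
./queue_monitor.sh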
=== YARN Queue Resource Monitor ===
Date: Wed Jan 17 16:00:00 CST 2024
Queue: root, Usage: 50%
Queue: root.default, Usage: 5%
Queue: root.production, Usage: 60%
Queue: root.production.etl, Usage: 85%
WARNING: Queue root.production.etl usage is 85%, exceeds threshold 80%
Queue: root.production.realtime, Usage: 40%
Queue: root.development, Usage: 30%
Queue: root.system, Usage: 10%
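This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html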
