Oozie Installation and Configuration: A Detailed Guide to Workflow Setup, Upgrade, and Migration

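1. Oozie Overview and Environment Planning

Oozie is an open-source Apache Software Foundation project that manages workflows and coordinates job execution in the Hadoop ecosystem. It can orchestrate MapReduce, Pig, Hive, Sqoop, and similar jobs, and it supports complex workflow definitions and scheduling.

1.1 Oozie Version Notes

The current major release line is Oozie 5.x, and this tutorial uses Oozie 5.2.1 throughout. Compared with earlier releases, 5.x improves performance and stability and supports more job types and more flexible workflow definitions.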

# Check the Oozie version
$ oozie version
Oozie client build version: 5.2.1

# Check the Java version
$ java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

# Check the Hadoop version
$ hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r 1234567890abcdef
Compiled by root on 2024-01-15T10:00Z

# Check the Hive version
$ hive --version
Hive 3.1.3

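1.2 Environment Planning

The installation environment is planned as follows:

Oozie servers:
oozie01.fgedu.net.cn (192.168.1.51) - primary node
oozie02.fgedu.net.cn (192.168.1.52) - standby node

Oozie version: 5.2.1
Hadoop version: 3.3.6
Hive version: 3.1.3
Java version: OpenJDK 1.8.0
Installation directory: /data/oozie
Configuration directory: /data/oozie/conf
Web console port: 11000

Database:
MySQL: 192.168.1.51:3306/oozie

Storage:
HDFS: hdfs://192.168.1.51:8020
Oozie shared library: hdfs://192.168.1.51:8020/user/oozie/share/lib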

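2. Hardware Requirements

As a workflow scheduler, Oozie needs relatively modest hardware, but sizing should account for the number and complexity of the jobs it schedules.

2.1 Physical Host Requirements

# Check memory size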
# free -h
total used free shared buff/cache available
Mem: 16G 4.2G 10G 256M 1.8G 11G
Swap: 8G 0B 8G

# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 12G 39G 24% /
/dev/sdb1 500G 50G 451G 10% /data
/dev/sdc1 200G 20G 181G 10% /backup

# Check the number of CPU cores
# nproc
8

# Check the system architecture
# uname -m
x86_64

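Production recommendation: 8 GB of memory is the minimum and suits test environments only; production hosts should have 16 GB or more. Plan disk space around the number of workflows and the volume of logs, with at least 200 GB recommended. Use 8 or more CPU cores to support concurrent job scheduling.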
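The thresholds above can be checked with a short script before installation. The following is a minimal sketch rather than part of the original procedure: the function names `meets_minimum` and `preflight` are hypothetical, the limits (16 GB memory, 8 cores, 200 GB free disk) are the production recommendations stated above, and `df --output` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check; thresholds follow the production recommendations above.

# meets_minimum ACTUAL REQUIRED LABEL
# Prints a verdict line and returns 0 only when ACTUAL >= REQUIRED.
meets_minimum() {
    local actual=$1 required=$2 label=$3
    if [ "$actual" -ge "$required" ]; then
        echo "OK  $label: $actual (minimum $required)"
    else
        echo "LOW $label: $actual (minimum $required)"
        return 1
    fi
}

# preflight [DATA_DIR]
# Compares this host's memory, CPU count, and free disk space with the recommended minimums.
preflight() {
    local data_dir=${1:-/data} mem_gb cpus disk_gb rc=0
    # /proc/meminfo reports MemTotal in kB; round to the nearest GB.
    mem_gb=$(awk '/^MemTotal:/ {printf "%.0f", $2 / 1024 / 1024}' /proc/meminfo)
    cpus=$(nproc)
    # Free space, in GB, on the filesystem that holds DATA_DIR (GNU df).
    disk_gb=$(df -BG --output=avail "$data_dir" | tail -n 1 | tr -dc '0-9')
    meets_minimum "$mem_gb" 16 "memory (GB)" || rc=1
    meets_minimum "$cpus" 8 "CPU cores" || rc=1
    meets_minimum "$disk_gb" 200 "free disk (GB)" || rc=1
    return "$rc"
}
```

Running `preflight /data` prints one line per resource and returns a non-zero status if any of them falls short, which makes it easy to call from a provisioning script.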

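2.2 vSphere Virtual Machine Requirements

Virtual machine configuration:
- vCPU: 8 cores
- Memory: 16 GB
- Disk: 50 GB system disk + 500 GB data disk
- Network: VMXNET3 adapter, gigabit network
- Storage: SSD-backed storage recommended for better I/O performance

Resource pool configuration:
- CPU reservation: 4 GHz
- Memory reservation: 8 GB
- Memory limit: 16 GB
- CPU shares: Normal
- Memory shares: Normal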

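2.3 Cloud Host Requirements

Cloud instance specification (Alibaba Cloud / Tencent Cloud / Huawei Cloud):
- Instance type: ecs.g6.2xlarge or equivalent
- vCPU: 8 cores
- Memory: 32 GB
- System disk: Ultra cloud disk, 100 GB
- Data disk: SSD cloud disk, 500 GB
- Network bandwidth: 10 Mbps or higher

Storage configuration:
- OSS object storage: stores workflow definitions and configuration
- NAS file storage: shares configuration files
- Cloud disk snapshots: periodic backups of configuration and data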

3. Operating System Preparation

Before installing Oozie, configure and tune the operating system as described below.

3.1 Operating System Version Check

# Check the operating system version
# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.9"
ID="ol"
PRETTY_NAME="Oracle Linux Server 8.9"

# Check the kernel version
# uname -r
5.4.17-2136.302.7.2.el8uek.x86_64

# Check SELinux status
# getenforce
Disabled

# Check firewall status
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)

3.2 Kernel Parameter Tuning

# Edit sysctl.conf
# vi /etc/sysctl.conf

# Add the following kernel parameters
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 4294967296
kernel.shmmax = 68719476736
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 1024
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Apply the kernel parameters
# sysctl -p

# Verify the setting
# sysctl -a | grep fs.file-max
fs.file-max = 6815744

3.3 User Resource Limits

# Configure user resource limits
# vi /etc/security/limits.conf

# Add the following entries
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
* soft stack 10240
* hard stack 32768
oozie soft memlock unlimited
oozie hard memlock unlimited

# Verify the limits (log in again first so the new limits apply to the session)
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63499
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65535
virtual memory (kbytes, -v) unlimited

3.4 Java Installation

# Install OpenJDK 1.8
# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel

# Configure the Java environment variables
# vi /etc/profile.d/java.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Load the environment variables
# source /etc/profile.d/java.sh

# Verify the Java installation
# java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

# Verify JAVA_HOME
# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk

3.5 Hadoop Installation

Oozie depends on Hadoop, so Hadoop must be installed and configured first.

# Download Hadoop
# cd /tmp
# wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

# Extract and install
# tar -xzf hadoop-3.3.6.tar.gz
# mv hadoop-3.3.6 /data/hadoop

# Configure the Hadoop environment variables
# vi /etc/profile.d/hadoop.sh

export HADOOP_HOME=/data/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

# Load the environment variables
# source /etc/profile.d/hadoop.sh

# Verify the Hadoop installation
# hadoop version
Hadoop 3.3.6
Source code repository https://github.com/apache/hadoop.git -r abc123
Compiled by root on 2024-01-15T10:00Z

# Verify that HDFS is available
# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /tmp
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /data

3.6 Database Preparation

Oozie stores workflow definitions and execution state in a database; this tutorial uses MySQL as the backend.

# Install MySQL (on Oracle Linux 8 the client package is named mysql)
# yum install -y mysql-server mysql

# Start the MySQL service
# systemctl start mysqld
# systemctl enable mysqld

# Connect to MySQL
# mysql -u root -p

# Create the Oozie database and users
CREATE DATABASE oozie CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'oozie'@'localhost' IDENTIFIED BY 'oozie123';
CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie123';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'localhost';
GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
FLUSH PRIVILEGES;
EXIT;

# Verify the database connection
# mysql -u oozie -poozie123 -e "SELECT VERSION();"
+-----------+
| VERSION() |
+-----------+
| 8.0.33    |
+-----------+
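4. Oozie Installation and Configuration

With the environment prepared, install Oozie.

4.1 Download the Oozie Package

# Create the installation directories
# mkdir -p /data/oozie
# mkdir -p /data/oozie/lib

# Download Oozie
# Note: the Apache archive publishes 5.2.1 as a source release, so the directory layout shown below
# assumes a binary distribution has already been built from it (for example with bin/mkdistro.sh).
# cd /tmp
# wget https://archive.apache.org/dist/oozie/5.2.1/oozie-5.2.1.tar.gz

# Extract and install
# tar -xzf oozie-5.2.1.tar.gz
# mv oozie-5.2.1/* /data/oozie/

# List the installation directory
# ls -la /data/oozie/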
total 32
drwxr-xr-x 2 root root 4096 Apr 5 10:00 bin
drwxr-xr-x 2 root root 4096 Apr 5 10:00 conf
drwxr-xr-x 5 root root 4096 Apr 5 10:00 docs
drwxr-xr-x 2 root root 4096 Apr 5 10:00 lib
drwxr-xr-x 2 root root 4096 Apr 5 10:00 oozie-sharelib-5.2.1
drwxr-xr-x 2 root root 4096 Apr 5 10:00 webapps
-rw-r--r-- 1 root root 8196 Apr 5 10:00 README.txt

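4.2 Configure the Oozie Environment Variables

# Configure the Oozie environment variables
# vi /etc/profile.d/oozie.sh

export OOZIE_HOME=/data/oozie
export OOZIE_CONFIG=$OOZIE_HOME/conf
export PATH=$OOZIE_HOME/bin:$PATH

# Load the environment variables
# source /etc/profile.d/oozie.sh

# Verify the environment variable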
# echo $OOZIE_HOME
/data/oozie

4.3 Configure oozie-site.xml

# Edit oozie-site.xml
# vi /data/oozie/conf/oozie-site.xml

<?xml version="1.0"?>
<configuration>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.cj.jdbc.Driver</value>
<description>MySQL JDBC driver</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://192.168.1.51:3306/oozie?serverTimezone=UTC&amp;characterEncoding=utf8&amp;useSSL=false</value>
<description>MySQL connection URL</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
<description>MySQL user name</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie123</value>
<description>MySQL password</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/data/hadoop/etc/hadoop</value>
<description>Hadoop configuration directory</description>
</property>
<property>
<name>oozie.http.hostname</name>
<value>oozie01.fgedu.net.cn</value>
<description>Oozie server host name</description>
</property>
<property>
<name>oozie.http.port</name>
<value>11000</value>
<description>Oozie web console port</description>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/oozie/share/lib</value>
<description>Shared library path</description>
</property>
<property>
<name>oozie.service.SchemaService.wf.schema.url</name>
<value>hdfs://192.168.1.51:8020/user/oozie/workflows-schema-0.4.xsd</value>
<description>Workflow schema URL</description>
</property>
<property>
<name>oozie.service.SchemaService.coord.schema.url</name>
<value>hdfs://192.168.1.51:8020/user/oozie/coordinator-schema-0.4.xsd</value>
<description>Coordinator schema URL</description>
</property>
<property>
<name>oozie.service.SchemaService.bundle.schema.url</name>
<value>hdfs://192.168.1.51:8020/user/oozie/bundle-schema-0.1.xsd</value>
<description>Bundle schema URL</description>
</property>
<property>
<name>oozie.service.PurgeService.purge.limit</name>
<value>1000</value>
<description>Number of workflows purged per run</description>
</property>
<property>
<name>oozie.service.PurgeService.purge.interval</name>
<value>1440</value>
<description>Purge service interval (documented in seconds in oozie-default.xml)</description>
</property>
<property>
<name>oozie.service.PurgeService.delete.interval</name>
<value>10080</value>
<description>Delete interval (minutes)</description>
</property>
</configuration>
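An ampersand inside an XML value has to be written as `&amp;`, which matters for JDBC URLs that carry several query parameters. The helper below is a small sketch of that escaping step; the function name `xml_escape` is hypothetical and not part of Oozie.

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a string for use as XML element text.

# xml_escape TEXT
# Replaces &, < and > with their XML entities. The ampersand is handled first
# so that the entities produced for < and > are not escaped a second time.
xml_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# Example: print the XML-safe form of the JDBC URL used in this tutorial.
jdbc_url='jdbc:mysql://192.168.1.51:3306/oozie?serverTimezone=UTC&characterEncoding=utf8&useSSL=false'
xml_escape "$jdbc_url"
echo
```

If the libxml2 tools are installed, `xmllint --noout /data/oozie/conf/oozie-site.xml` is one way to confirm that the edited file is well-formed before starting Oozie.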

4.4 Install the MySQL JDBC Driver

# Download the MySQL JDBC driver
# cd /tmp
# wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.28/mysql-connector-java-8.0.28.jar

# Copy it into the Oozie lib directory
# cp mysql-connector-java-8.0.28.jar /data/oozie/lib/

# Verify the driver
# ls -la /data/oozie/lib/ | grep mysql
-rw-r--r-- 1 root root 2580752 Apr 5 10:00 mysql-connector-java-8.0.28.jar

4.5 Configure Hadoop for Oozie

# Edit core-site.xml
# vi /data/hadoop/etc/hadoop/core-site.xml

<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>

# Edit hdfs-site.xml (disabling permission checks is acceptable only on a test cluster)
# vi /data/hadoop/etc/hadoop/hdfs-site.xml

<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>

# Restart the Hadoop services
# stop-all.sh
# start-all.sh

# Verify the Hadoop daemons
# jps
12345 NameNode
12346 DataNode
12347 ResourceManager
12348 NodeManager
12349 SecondaryNameNode

4.6 Initialize the Oozie Database

# Copy the database script
# cp /data/oozie/oozie-sharelib-5.2.1/tools/sql/dbc/mysql.sql /tmp/

# Initialize the database
# mysql -u oozie -poozie123 oozie < /tmp/mysql.sql

# Verify the database tables
# mysql -u oozie -poozie123 -e "USE oozie; SHOW TABLES;" | wc -l
42

# Run the Oozie initialization
# cd /data/oozie
# bin/oozie-setup.sh db create -run -sqlfile oozie-sharelib-5.2.1/tools/sql/dbc/mysql.sql

# Sample output:
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check if DB schema is up-to-date
DONE
DB schema is up-to-date
DONE

4.7 Deploy the Oozie Web Console

# Unpack the web console
# cd /data/oozie
# unzip -q webapps/oozie.war -d webapps/oozie

# Download the ExtJS library (the Oozie 5.2.1 web console requires ExtJS 2.2)
# cd /tmp
# wget https://archive.apache.org/dist/oozie/oozie-4.3.1/oozie-4.3.1-distro.tar.gz

# Extract ExtJS
# tar -xzf oozie-4.3.1-distro.tar.gz
# mkdir -p /data/oozie/libext
# cp oozie-4.3.1-distro/oozie-4.3.1/libext/ext-2.2.zip /data/oozie/libext/

# Verify ExtJS
# ls -la /data/oozie/libext/ | grep ext
-rw-r--r-- 1 root root 1434567 Apr 5 10:00 ext-2.2.zip

# Prepare the Oozie WAR file
# cd /data/oozie
# bin/oozie-setup.sh prepare-war

# Sample output:
INFO: Oozie WAR file with added libraries at /data/oozie/oozie-server/webapps/oozie.war
INFO: Oozie WAR file prepared successfully

4.8 Deploy the Shared Library

# Upload the shared library to HDFS
# cd /data/oozie
# bin/oozie-setup.sh sharelib create -fs hdfs://192.168.1.51:8020

# Sample output:
INFO: Uploading sharelib to /user/oozie/share/lib/lib_20240405100000
INFO: Sharelib upload complete

# Verify the shared library
# hdfs dfs -ls /user/oozie/share/lib/
Found 1 items
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000

# List the shared library contents
# hdfs dfs -ls /user/oozie/share/lib/lib_20240405100000/
Found 12 items
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/cdh5
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/distcp
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/hcatalog
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/hive
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/java
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/mapreduce-streaming
drwxr-xr-x - root supergroup 0 2024-04-05 10:00 /user/oozie/share/lib/lib_20240405100000/oozie

4.9 Start the Oozie Service

# Start the Oozie service
# cd /data/oozie
# bin/oozied.sh start

# Sample output:
[main] INFO org.apache.oozie.service.ServiceLoader - Excluding jar: libext/slf4j-log4j12-1.7.25.jar from classpath
[main] INFO org.apache.oozie.service.ServiceLoader - Excluding jar: libext/slf4j-api-1.7.25.jar from classpath
[main] INFO org.apache.oozie.server.XLogFilter - XLogFilter initialized with [\n\r\t\f|] log pattern
[main] INFO org.apache.oozie.servlet.CallbackServlet - CallbackServlet initialized with port 11000 and timeout 120 seconds
[main] INFO org.apache.oozie.service.JPAService - Database schema validated successfully
[main] INFO org.apache.oozie.service.JPAService - JPAService initialized
[main] INFO org.apache.oozie.service.Services - Services initialized in 14.234 seconds
[main] INFO org.apache.oozie.server.OozieServer - Starting Oozie server 5.2.1 on http://oozie01.fgedu.net.cn:11000

# Check the Oozie service status
# bin/oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

# View the Oozie log
# tail -n 100 /data/oozie/logs/oozie-server.log
24/04/05 10:00:00 INFO org.apache.oozie.server.OozieServer: STARTUP_MSG:
24/04/05 10:00:00 INFO org.apache.oozie.server.OozieServer: Apache Oozie Server
24/04/05 10:00:00 INFO org.apache.oozie.server.OozieServer: Version: 5.2.1
24/04/05 10:00:00 INFO org.apache.oozie.server.OozieServer: Oozie server started

Tip: after the Oozie service starts, allow a few minutes for all services to finish initializing before using the web console.
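Rather than waiting a fixed amount of time, a script can poll the status command used above until the server reports `System mode: NORMAL`. This is a sketch under assumptions: the function name `wait_for_oozie` and the default timeout and interval are hypothetical choices, and the `oozie` client is assumed to be on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical readiness check built on the `oozie admin -status` command shown above.

# wait_for_oozie URL [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Polls the server until it reports NORMAL mode, or gives up after TIMEOUT_SECONDS.
wait_for_oozie() {
    local url=$1 timeout=${2:-300} interval=${3:-5} waited=0
    until oozie admin -oozie "$url" -status 2>/dev/null | grep -q 'System mode: NORMAL'; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "Oozie not ready after ${timeout}s" >&2
            return 1
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Oozie ready after ${waited}s"
}
```

For example, `wait_for_oozie http://localhost:11000/oozie 600 10` waits up to ten minutes and checks every ten seconds; the return status tells a calling script whether it is safe to continue.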

5. Oozie Configuration Tuning

The following settings improve Oozie's performance and stability.

5.1 Memory Settings

# Edit oozie-env.sh
# vi /data/oozie/conf/oozie-env.sh

# Set the JVM memory options (Java 8 replaced PermGen with Metaspace, so MaxMetaspaceSize is used instead of MaxPermSize)
export OOZIE_JVM_OPTS="-Xmx4096m -Xms2048m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/oozie/logs"

# Set the log4j options
export OOZIE_LOG4J_OPTS="-Dlog4j.configuration=file:$OOZIE_HOME/conf/log4j.properties -Doozie.log.dir=$OOZIE_HOME/logs -Dorg.apache.oozie.log.dir=$OOZIE_HOME/logs"

# Verify the setting
# cat /data/oozie/conf/oozie-env.sh | grep OOZIE_JVM_OPTS
export OOZIE_JVM_OPTS="-Xmx4096m -Xms2048m -XX:MaxMetaspaceSize=512m -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/oozie/logs"

5.2 Thread Pool Settings

# Edit oozie-site.xml
# vi /data/oozie/conf/oozie-site.xml

<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>1000</value>
<description>Callable queue size</description>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>100</value>
<description>Number of callable threads</description>
</property>
<property>
<name>oozie.service.LiteWorkflowStoreService.bulk.size</name>
<value>100</value>
<description>Bulk operation size</description>
</property>
<property>
<name>oozie.service.PurgeService.threads</name>
<value>10</value>
<description>Number of purge threads</description>
</property>

5.3 Database Connection Pool Settings

# Edit oozie-site.xml
# vi /data/oozie/conf/oozie-site.xml

<property>
<name>oozie.service.JPAService.validation.query</name>
<value>select 1</value>
<description>Connection validation query</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.max.active.conn</name>
<value>100</value>
<description>Maximum number of active connections</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.max.idle.conn</name>
<value>10</value>
<description>Maximum number of idle connections</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.min.idle.conn</name>
<value>5</value>
<description>Minimum number of idle connections</description>
</property>
<property>
<name>oozie.service.JPAService.jdbc.validation.interval</name>
<value>30000</value>
<description>Connection validation interval (milliseconds)</description>
</property>

5.4 Logging Settings

# Edit log4j.properties
# vi /data/oozie/conf/log4j.properties

# Set the log level
log4j.rootLogger=INFO, file, console

# File appender
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=/data/oozie/logs/oozie.log
log4j.appender.file.MaxFileSize=100MB
log4j.appender.file.MaxBackupIndex=10
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Console appender
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n

# Per-package log levels
log4j.logger.org.apache.oozie=INFO
log4j.logger.org.apache.oozie.action.hadoop=INFO
log4j.logger.org.apache.oozie.service=INFO

# Create the log directory
# mkdir -p /data/oozie/logs
# chmod 775 /data/oozie/logs

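6. Oozie Web Console

Oozie provides a web console for managing and monitoring workflows. This section covers how to configure and use it.

6.1 Access the Web Console

# Start the Oozie service
# cd /data/oozie
# bin/oozied.sh start

# Open the web console
# In a browser, go to http://oozie01.fgedu.net.cn:11000/oozie

# Verify access to the web console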
# curl -I http://localhost:11000/oozie
HTTP/1.1 200 OK
Date: Fri, 05 Apr 2024 10:00:00 GMT
Content-Type: text/html;charset=UTF-8
Content-Length: 12345
Server: Apache-Coyote/1.1

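6.2 Web Console Features

Main features of the web console:
1. Workflow management: view, start, suspend, and kill workflows
2. Coordinator management: manage workflows that run on a schedule
3. Bundle management: manage groups of related coordinators
4. Job monitoring: view job execution status and logs
5. System configuration: view and modify system configuration
6. Shared library management: manage shared library versions

Web console access:
- By default, every user can access the console
- Access control can be added by configuring Oozie's authorization system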

6.3 Secure the Web Console

# Edit oozie-site.xml
# vi /data/oozie/conf/oozie-site.xml

<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>true</value>
<description>Enable security authorization</description>
</property>
<property>
<name>oozie.service.AuthorizationService.role.configs</name>
<value>*</value>
<description>Role configuration</description>
</property>
<property>
<name>oozie.service.AuthorizationService.admin.groups</name>
<value>oozie,admin</value>
<description>Administrator groups</description>
</property>
<property>
<name>oozie.service.AuthorizationService.user.groups.mapping</name>
<value>org.apache.oozie.service.UserGroupInformationService</value>
<description>User-to-group mapping service</description>
</property>

7. Oozie Workflows in Practice

This section covers creating and running an Oozie workflow.

7.1 Create the Workflow Definition

# Create the workflow directory
# mkdir -p /data/oozie/workflows/wordcount

# Create workflow.xml
# vi /data/oozie/workflows/wordcount/workflow.xml

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.5" name="wordcount-wf">
<start to="prepare-input"/>
<action name="prepare-input">
<fs>
<delete path="hdfs://192.168.1.51:8020/user/oozie/wordcount/input"/>
<mkdir path="hdfs://192.168.1.51:8020/user/oozie/wordcount/input"/>
</fs>
<ok to="copy-input"/>
<error to="fail"/>
</action>
<action name="copy-input">
<!-- Check this step against the fs action schema of your Oozie version: the documented fs commands are delete, mkdir, move, chmod, touchz and chgrp, so the input file may need to be staged with hdfs dfs -put instead. -->
<fs>
<copyFromLocal source="/data/oozie/workflows/wordcount/input.txt" target="hdfs://192.168.1.51:8020/user/oozie/wordcount/input/input.txt"/>
</fs>
<ok to="wordcount"/>
<error to="fail"/>
</action>
<action name="wordcount">
<map-reduce>
<job-tracker>yarn</job-tracker>
<name-node>hdfs://192.168.1.51:8020</name-node>
<prepare>
<delete path="hdfs://192.168.1.51:8020/user/oozie/wordcount/output"/>
</prepare>
<configuration>
<property>
<name>mapreduce.job.queuename</name>
<value>default</value>
</property>
<property>
<name>mapreduce.mapper.class</name>
<value>org.apache.hadoop.mapred.lib.NLineInputFormat</value>
</property>
<property>
<name>mapreduce.input.format.class</name>
<value>org.apache.hadoop.mapred.lib.NLineInputFormat</value>
</property>
<property>
<name>mapreduce.output.key.class</name>
<value>org.apache.hadoop.io.Text</value>
</property>
<property>
<name>mapreduce.output.value.class</name>
<value>org.apache.hadoop.io.IntWritable</value>
</property>
<property>
<name>mapreduce.input.dir</name>
<value>/user/oozie/wordcount/input</value>
</property>
<property>
<name>mapreduce.output.dir</name>
<value>/user/oozie/wordcount/output</value>
</property>
<property>
<name>mapreduce.job.name</name>
<value>WordCount</value>
</property>
</configuration>
</map-reduce>
<ok to="end"/>
<error to="fail"/>
</action>
<kill name="fail">
<message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<end name="end"/>
</workflow-app>

# Create the input file
# echo "hello world hello oozie" > /data/oozie/workflows/wordcount/input.txt

# Upload the workflow to HDFS
# hdfs dfs -put /data/oozie/workflows/wordcount /user/oozie/

# Verify the upload
# hdfs dfs -ls /user/oozie/wordcount/
Found 2 items
-rw-r--r-- 1 root supergroup 24 2024-04-05 10:00 /user/oozie/wordcount/input.txt
-rw-r--r-- 1 root supergroup 2048 2024-04-05 10:00 /user/oozie/wordcount/workflow.xml
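The submission step in 7.2 reads a local `job.properties` file that is not shown in this tutorial. The sketch below writes one possible version. The property names `oozie.wf.application.path` and `oozie.use.system.libpath` are standard Oozie job properties; the function name `write_job_properties` and the `nameNode` variable are assumptions made for illustration, and the paths follow the upload location used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes a minimal job.properties for the wordcount workflow.

# write_job_properties TARGET_FILE
write_job_properties() {
    local target=$1
    # The quoted heredoc keeps ${nameNode} literal so that Oozie, not the shell, resolves it.
    cat > "$target" <<'EOF'
nameNode=hdfs://192.168.1.51:8020
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/wordcount
EOF
}
```

Calling `write_job_properties /data/oozie/workflows/wordcount/job.properties` creates the file at the path passed to `-config` in the next step. The file stays on the local filesystem; only the workflow application directory has to be in HDFS.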

7.2 Submit the Workflow

# Submit and start the workflow (-submit alone leaves the job in PREP until it is started; -run does both)
# oozie job -oozie http://localhost:11000/oozie -config /data/oozie/workflows/wordcount/job.properties -run

# Sample output:
job: 0000001-240405100000000-oozie-root-W

# Check the workflow status
# oozie job -oozie http://localhost:11000/oozie -info 0000001-240405100000000-oozie-root-W
Job ID : 0000001-240405100000000-oozie-root-W
--------------------------------------------------------------------------------
Workflow Name : wordcount-wf
App Path : hdfs://192.168.1.51:8020/user/oozie/wordcount
Status : RUNNING
Run : 0
User : root
Group : -
Created : 2024-04-05 10:00:00 GMT
Started : 2024-04-05 10:00:00 GMT
Last Modified : 2024-04-05 10:00:00 GMT
Ended : -
CoordAction ID: -
--------------------------------------------------------------------------------
Actions
--------------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
--------------------------------------------------------------------------------
0000001-240405100000000-oozie-root-W@:start: OK - OK -
0000001-240405100000000-oozie-root-W@prepare-input OK - OK -
0000001-240405100000000-oozie-root-W@copy-input OK - OK -
0000001-240405100000000-oozie-root-W@wordcount RUNNING application_1712300400000_0001 RUNNING -
--------------------------------------------------------------------------------

7.3 Monitor Workflow Execution

# View the workflow log
# oozie job -oozie http://localhost:11000/oozie -log 0000001-240405100000000-oozie-root-W

# Check the MapReduce job status
# yarn application -status application_1712300400000_0001

# Wait for the workflow to finish (match only the job-level Status line, and stop on KILLED as well)
# while true; do \
status=$(oozie job -oozie http://localhost:11000/oozie -info 0000001-240405100000000-oozie-root-W | awk '$1 == "Status" {print $3; exit}'); \
if [ "$status" = "SUCCEEDED" ] || [ "$status" = "FAILED" ] || [ "$status" = "KILLED" ]; then \
echo "Workflow $status"; \
break; \
else \
echo "Workflow status: $status"; \
sleep 5; \
fi; \
done

# Sample output:
Workflow status: RUNNING
Workflow status: RUNNING
Workflow SUCCEEDED

# Verify the output
# hdfs dfs -cat /user/oozie/wordcount/output/part-r-00000
hello 2
oozie 1
world 1
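The polling loop in this section can be packaged as reusable functions with a timeout, so that a stuck job does not block a script forever. This is a sketch under assumptions: the function names are hypothetical, the parser expects the `Status : VALUE` line format shown in the `-info` output above, and SUCCEEDED, KILLED, and FAILED are treated as the terminal workflow states.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for waiting on an Oozie workflow job.

# workflow_status: reads `oozie job -info` output on stdin and prints the job-level status.
# Only a line whose first field is exactly "Status" matches, so the action table header is ignored.
workflow_status() {
    awk '$1 == "Status" && $2 == ":" {print $3; exit}'
}

# is_terminal STATUS: succeeds when the workflow can no longer change state.
is_terminal() {
    case $1 in
        SUCCEEDED|KILLED|FAILED) return 0 ;;
        *) return 1 ;;
    esac
}

# wait_for_workflow URL JOB_ID [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Returns 0 on SUCCEEDED, 1 on KILLED or FAILED, 2 on timeout.
wait_for_workflow() {
    local url=$1 job_id=$2 timeout=${3:-3600} interval=${4:-5} waited=0 status
    while [ "$waited" -le "$timeout" ]; do
        status=$(oozie job -oozie "$url" -info "$job_id" | workflow_status)
        if is_terminal "$status"; then
            echo "Workflow $status"
            [ "$status" = "SUCCEEDED" ]
            return
        fi
        echo "Workflow status: $status"
        sleep "$interval"
        waited=$((waited + interval))
    done
    echo "Timed out after ${timeout}s" >&2
    return 2
}
```

A call such as `wait_for_workflow http://localhost:11000/oozie 0000001-240405100000000-oozie-root-W 1800` prints the same progress lines as the loop above and lets the caller branch on the return status.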

7.4 Create a Coordinator Job

# Create the coordinator definition
# vi /data/oozie/workflows/wordcount/coordinator.xml

<?xml version="1.0" encoding="UTF-8"?>
<coordinator-app xmlns="uri:oozie:coordinator:0.4" name="wordcount-coord" frequency="${coord:days(1)}" start="2024-04-05T00:00Z" end="2024-04-12T00:00Z" timezone="UTC">
<controls>
<timeout>1440</timeout>
<concurrency>1</concurrency>
<throttle>1</throttle>
</controls>
<datasets>
<dataset name="input" frequency="${coord:days(1)}" initial-instance="2024-04-05T00:00Z" timezone="UTC">
<uri-template>/user/oozie/wordcount/input/${YEAR}${MONTH}${DAY}</uri-template>
<done-flag>_SUCCESS</done-flag>
</dataset>
</datasets>
<input-events>
<data-in name="input" dataset="input">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>/user/oozie/wordcount/workflow.xml</app-path>
<configuration>
<property>
<name>inputDir</name>
<value>${coord:dataIn('input')}</value>
</property>
<property>
<name>outputDir</name>
<value>/user/oozie/wordcount/output/${YEAR}${MONTH}${DAY}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>

# Submit the coordinator job
# oozie job -oozie http://localhost:11000/oozie -config /data/oozie/workflows/wordcount/coord.properties -submit -coord

# Sample output:
job: 0000002-240405100000000-oozie-root-C

# Check the coordinator status
# oozie job -oozie http://localhost:11000/oozie -info 0000002-240405100000000-oozie-root-C
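The coordinator submission reads a local `coord.properties` file that this tutorial does not list. One possible version is sketched below. `oozie.coord.application.path` is the standard property that points at a coordinator definition; the function name `write_coord_properties` and the `nameNode` variable are assumptions, and the sketch also assumes that coordinator.xml has been uploaded to HDFS next to workflow.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helper that writes a minimal coord.properties for the wordcount coordinator.

# write_coord_properties TARGET_FILE
write_coord_properties() {
    local target=$1
    # The quoted heredoc keeps ${nameNode} literal so that Oozie resolves it at submission time.
    cat > "$target" <<'EOF'
nameNode=hdfs://192.168.1.51:8020
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/oozie/wordcount/coordinator.xml
EOF
}
```

A coordinator properties file sets `oozie.coord.application.path` in place of `oozie.wf.application.path`; the workflow itself is referenced from the `app-path` element inside coordinator.xml.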

8. Oozie Performance Tuning

In production, Oozie should be tuned to improve workflow scheduling throughput.

8.1 Thread Pool Tuning

# Edit oozie-site.xml
# vi /data/oozie/conf/oozie-site.xml

<property>
<name>oozie.service.CallableQueueService.queue.size</name>
<value>2000</value>
<description>Callable queue size; adjust to the number of concurrent jobs</description>
</property>
<property>
<name>oozie.service.CallableQueueService.threads</name>
<value>200</value>
<description>Number of callable threads; adjust to the number of CPU cores</description>
</property>

8.2 Database Tuning

# MySQL tuning
# vi /etc/my.cnf

[mysqld]
binlog_format = ROW
innodb_buffer_pool_size = 1G
innodb_log_file_size = 256M
innodb_flush_log_at_trx_commit = 2
innodb_max_dirty_pages_pct = 75
innodb_file_per_table = 1
max_connections = 200
# query_cache_size and query_cache_type are omitted: the query cache was removed in MySQL 8.0,
# and mysqld refuses to start when these options are present.

# Restart the MySQL service
# systemctl restart mysqld

# Optimize the Oozie database tables
# mysql -u oozie -poozie123 -e "USE oozie; OPTIMIZE TABLE WF_JOB; OPTIMIZE TABLE WF_ACTION; OPTIMIZE TABLE COORD_JOB;"

# Show the largest database tables
# mysql -u oozie -poozie123 -e "USE oozie; SELECT table_name, round(((data_length + index_length) / 1024 / 1024), 2) as 'Size (MB)' FROM information_schema.tables WHERE table_schema = 'oozie' ORDER BY (data_length + index_length) DESC LIMIT 10;"

8.3 Memory Tuning

# Edit oozie-env.sh
# vi /data/oozie/conf/oozie-env.sh

# Adjust the JVM options to the server's memory. Select a single collector: the JVM refuses to start
# when both -XX:+UseConcMarkSweepGC and -XX:+UseG1GC are given.
export OOZIE_JVM_OPTS="-Xmx8192m -Xms4096m -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/oozie/logs"

# Restart the Oozie service
# cd /data/oozie
# bin/oozied.sh stop
# bin/oozied.sh start

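8.4 Workflow Tuning

# Workflow tuning recommendations:
1. Keep workflows simple and avoid an excessive number of action nodes
2. Use fork and join nodes to run independent tasks in parallel
3. Set sensible workflow timeouts
4. Process data incrementally rather than in full
5. Tune the MapReduce job configuration
6. Set the coordinator concurrency and throttle values appropriately
7. Use the shared library to avoid duplicated dependencies
8. Purge historical workflow data regularly

# Purge historical workflow data
# oozie job -oozie http://localhost:11000/oozie -purge

# Check the system status
# oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

# Check the queue status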
# oozie admin -oozie http://localhost:11000/oozie -queue
Queue name: default
Queue status: NORMAL
Running jobs: 0
Pending jobs: 0
Suspended jobs: 0

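Production recommendation: size Oozie's thread pool and memory settings to the server hardware and the number of workflows. Purge historical workflow data regularly so that oversized database tables do not degrade performance. Design workflows carefully and remove unnecessary action nodes.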

9. Oozie Upgrade and Migration

This section covers upgrading Oozie to a new version and migrating its data.

9.1 Oozie Version Upgrade

# Back up the current Oozie configuration
# cp -r /data/oozie/conf /backup/oozie_conf_$(date +%Y%m%d)

# Back up the Oozie database
# mysqldump -u oozie -poozie123 oozie > /backup/oozie_db_$(date +%Y%m%d).sql

# Stop the running Oozie service
# cd /data/oozie
# bin/oozied.sh stop

# Download the new Oozie version
# cd /tmp
# wget https://archive.apache.org/dist/oozie/5.2.1/oozie-5.2.1.tar.gz

# Extract the new version
# tar -xzf oozie-5.2.1.tar.gz

# Replace the installation directory
# mv /data/oozie /data/oozie_old
# mv oozie-5.2.1 /data/oozie

# Restore the configuration files
# cp -r /backup/oozie_conf_$(date +%Y%m%d)/* /data/oozie/conf/

# Copy the JDBC driver
# cp /data/oozie_old/lib/mysql-connector-java-8.0.28.jar /data/oozie/lib/

# Prepare the WAR file
# cd /data/oozie
# bin/oozie-setup.sh prepare-war

# Upgrade the database
# bin/oozie-setup.sh db upgrade -run

# Sample output:
Validate DB Connection
DONE
Check DB schema does not exist
DONE
Check if DB schema is up-to-date
DONE
DB schema is up-to-date
DONE

# Restart the Oozie service
# bin/oozied.sh start

# Verify the upgrade
# oozie version
Oozie client build version: 5.2.1

# Verify the service status
# oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

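9.2 Oozie Configuration Migration

# Export the Oozie configuration
# cp -r /data/oozie/conf /backup/oozie_conf_export

# Export the workflow definitions
# hdfs dfs -get /user/oozie/workflows /backup/oozie_workflows_export

# Import the configuration on the new server (copy the directory contents so no nested directory is created)
# cp -r /backup/oozie_conf_export/. /data/oozie/conf/

# Import the workflow definitions
# hdfs dfs -put /backup/oozie_workflows_export /user/oozie/workflows

# Verify the configuration
# oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

# Verify the workflows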
# hdfs dfs -ls /user/oozie/workflows/

10. Oozie Backup and Restore

This section covers how to back up and restore Oozie.

10.1 Oozie Configuration Backup

# Back up the Oozie configuration
# tar -czf /backup/oozie_config_$(date +%Y%m%d).tar.gz -C /data oozie/conf

# Back up the Oozie WAR file
# tar -czf /backup/oozie_war_$(date +%Y%m%d).tar.gz -C /data oozie/oozie-server/webapps

# Back up the shared library
# hdfs dfs -get /user/oozie/share/lib /backup/oozie_sharelib_$(date +%Y%m%d)

# Back up the workflow definitions
# hdfs dfs -get /user/oozie/workflows /backup/oozie_workflows_$(date +%Y%m%d)

# Back up the Oozie logs
# tar -czf /backup/oozie_logs_$(date +%Y%m%d).tar.gz -C /data oozie/logs

10.2 Oozie Database Backup

# Back up the Oozie database
# mysqldump -u oozie -poozie123 oozie > /backup/oozie_db_$(date +%Y%m%d).sql

# Compress the backup file
# gzip /backup/oozie_db_$(date +%Y%m%d).sql

# Verify the backup file
# ls -la /backup/oozie_db_$(date +%Y%m%d).sql.gz
-rw-r--r-- 1 root root 1234567 Apr 5 10:00 /backup/oozie_db_20240405.sql.gz
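A directory listing only shows that the file exists. Before relying on a dump for a restore, it can also be checked for archive integrity and for actual schema content. The function below is a sketch: the name `verify_dump` is hypothetical, and looking for `CREATE TABLE` is a simple heuristic, not a full validation of the dump.

```shell
#!/usr/bin/env bash
# Hypothetical sanity check for a gzip-compressed mysqldump file.

# verify_dump FILE.sql.gz
# Succeeds only if the file is non-empty, passes the gzip integrity test,
# and contains at least one CREATE TABLE statement.
verify_dump() {
    local dump=$1
    [ -s "$dump" ] || { echo "missing or empty: $dump" >&2; return 1; }
    gzip -t "$dump" 2>/dev/null || { echo "corrupt archive: $dump" >&2; return 1; }
    gunzip -c "$dump" | grep -q 'CREATE TABLE' \
        || { echo "no CREATE TABLE statements in: $dump" >&2; return 1; }
    echo "dump looks restorable: $dump"
}
```

For example, `verify_dump /backup/oozie_db_20240405.sql.gz` can run at the end of a backup job so that a truncated dump is noticed on the day it is written instead of during a restore.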

# Scheduled backup script
# mkdir -p /data/oozie/scripts
# vi /data/oozie/scripts/backup_oozie.sh

#!/bin/bash
# Stop at the first failing step so the success message is printed only for a complete backup
set -e
BACKUP_DIR="/backup/oozie_backups/$(date +%Y%m%d)"
OOZIE_HOME="/data/oozie"
HDFS_PATH="hdfs://192.168.1.51:8020"

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Back up the configuration
cp -r "$OOZIE_HOME/conf" "$BACKUP_DIR/"

# Back up the database
mysqldump -u oozie -poozie123 oozie > "$BACKUP_DIR/oozie_db.sql"
gzip "$BACKUP_DIR/oozie_db.sql"

# Back up the shared library
hdfs dfs -get "$HDFS_PATH/user/oozie/share/lib" "$BACKUP_DIR/"

# Back up the workflows
hdfs dfs -get "$HDFS_PATH/user/oozie/workflows" "$BACKUP_DIR/"

# Back up the logs
cp -r "$OOZIE_HOME/logs" "$BACKUP_DIR/"

echo "Oozie backup completed successfully: $BACKUP_DIR"

# Make the script executable
# chmod +x /data/oozie/scripts/backup_oozie.sh

# Add a cron job
# crontab -e
0 2 * * * /data/oozie/scripts/backup_oozie.sh
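A nightly backup into dated directories grows without bound unless old copies are removed. The function below is a sketch of a retention step; the name `prune_backups` and the retention period are assumptions, and it only touches directories whose names are eight digits, matching the `date +%Y%m%d` naming used by the backup script.

```shell
#!/usr/bin/env bash
# Hypothetical retention helper for the dated backup directories.

# prune_backups ROOT KEEP_DAYS
# Deletes direct subdirectories of ROOT that are named like YYYYMMDD and whose
# modification time is more than KEEP_DAYS days old. Other entries are left alone.
prune_backups() {
    local root=$1 keep_days=$2
    find "$root" -mindepth 1 -maxdepth 1 -type d \
        -name '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' \
        -mtime +"$keep_days" -exec rm -rf {} +
}
```

Adding a line such as `prune_backups /backup/oozie_backups 14` at the end of the backup script would keep roughly two weeks of backups; the right period depends on available disk space and recovery requirements.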

10.3 Oozie Restore

# Stop the Oozie service
# cd /data/oozie
# bin/oozied.sh stop

# Restore the database (the dump was gzip-compressed in 10.2)
# gunzip -c /backup/oozie_db_20240405.sql.gz | mysql -u oozie -poozie123 oozie

# Restore the configuration
# cp -r /backup/oozie_conf_20240405/* /data/oozie/conf/

# Restore the shared library
# hdfs dfs -rm -r /user/oozie/share/lib
# hdfs dfs -put /backup/oozie_sharelib_20240405 /user/oozie/share/lib

# Restore the workflows
# hdfs dfs -rm -r /user/oozie/workflows
# hdfs dfs -put /backup/oozie_workflows_20240405 /user/oozie/workflows

# Restart the Oozie service
# bin/oozied.sh start

# Verify the restore
# oozie admin -oozie http://localhost:11000/oozie -status
System mode: NORMAL

# Verify the workflows
# oozie jobs -oozie http://localhost:11000/oozie

10.4 Oozie Monitoring Script

# Create the Oozie monitoring script
# vi /data/oozie/scripts/oozie_monitor.sh

#!/bin/bash
OOZIE_HOME="/data/oozie"
LOG_FILE="/data/oozie/logs/oozie_monitor.log"
ALERT_EMAIL="admin@fgedu.net.cn"

# Alert when the server does not report NORMAL mode
check_oozie_status() {
echo "$(date): Checking Oozie status..." >> "$LOG_FILE"
status=$("$OOZIE_HOME/bin/oozie" admin -oozie http://localhost:11000/oozie -status 2>/dev/null)
if echo "$status" | grep -q "NORMAL"; then
echo "$(date): Oozie status: NORMAL" >> "$LOG_FILE"
else
echo "$(date): Oozie status: FAILED" >> "$LOG_FILE"
echo "Oozie status failed: $status" | mail -s "Oozie Alert" "$ALERT_EMAIL"
fi
}

# Alert when the Oozie database user cannot connect
check_database_connection() {
echo "$(date): Checking database connection..." >> "$LOG_FILE"
if mysql -u oozie -poozie123 -e "SELECT 1" > /dev/null 2>&1; then
echo "$(date): Database connection OK" >> "$LOG_FILE"
else
echo "$(date): Database connection FAILED" >> "$LOG_FILE"
echo "Oozie database connection failed" | mail -s "Oozie Alert" "$ALERT_EMAIL"
fi
}

# Alert when HDFS cannot be listed
check_hdfs_connection() {
echo "$(date): Checking HDFS connection..." >> "$LOG_FILE"
if hdfs dfs -ls / > /dev/null 2>&1; then
echo "$(date): HDFS connection OK" >> "$LOG_FILE"
else
echo "$(date): HDFS connection FAILED" >> "$LOG_FILE"
echo "Oozie HDFS connection failed" | mail -s "Oozie Alert" "$ALERT_EMAIL"
fi
}

# Alert when more than 100 workflow jobs are running.
# Only RUNNING jobs are requested, and only lines that start with a workflow job ID are counted.
check_running_jobs() {
echo "$(date): Checking running jobs..." >> "$LOG_FILE"
jobs=$("$OOZIE_HOME/bin/oozie" jobs -oozie http://localhost:11000/oozie -filter status=RUNNING -len 1000 2>/dev/null |
grep -cE '^[0-9]+-[0-9]+-.*-W')
echo "$(date): Running jobs: $jobs" >> "$LOG_FILE"
if [ "$jobs" -gt 100 ]; then
echo "$(date): Too many running jobs: $jobs" >> "$LOG_FILE"
echo "Oozie has too many running jobs: $jobs" | mail -s "Oozie Alert" "$ALERT_EMAIL"
fi
}

main() {
check_oozie_status
check_database_connection
check_hdfs_connection
check_running_jobs
}

main

# Make the script executable
# chmod +x /data/oozie/scripts/oozie_monitor.sh

# Add a cron job
# crontab -e
*/15 * * * * /data/oozie/scripts/oozie_monitor.sh

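Production recommendation: back up the Oozie configuration and database regularly, ideally with one full backup per day. Run the monitoring script every 15 minutes so that problems are detected and handled promptly. Always stop the Oozie service before a restore to avoid inconsistent data.

That completes the installation and configuration, performance tuning, upgrade and migration, and backup and restore of Oozie. As the workflow scheduler of the Hadoop ecosystem, Oozie coordinates and manages the execution of many kinds of jobs and is a core component for task scheduling on big data platforms.

This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html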
