1. Oozie Overview and Versions
Oozie is Apache's open-source workflow scheduling system for managing Hadoop jobs. An Oozie workflow is a DAG (directed acyclic graph) made up of action nodes and control nodes, with support for time-based scheduling and dependency management.
Recent Oozie releases:
Oozie 5.2.1 (stable, used in this guide)
Oozie 5.2.0 (stable)
Oozie 5.1.0 (stable)
Oozie 4.3.1 (legacy)
Oozie 4.2.0 (legacy)
Supported job types:
– MapReduce jobs
– Hive jobs
– Hive Server 2 jobs
– Pig jobs
– Sqoop jobs
– Spark jobs
– Shell scripts
– Java programs
– DistCp jobs
– SSH commands
– Email notifications
Scheduling types:
– Workflow
– Coordinator
– Bundle
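Of the three scheduling types, the Bundle is the only one not shown later in this guide. A Bundle simply groups one or more Coordinators so they can be started, suspended, and killed together. A minimal sketch of a bundle.xml (the coordinator path and property name are illustrative placeholders):

```xml
<!-- Minimal Bundle sketch: one coordinator grouped under one bundle.
     The app-path and the start_time property are hypothetical examples. -->
<bundle-app name="etl-bundle" xmlns="uri:oozie:bundle:0.2">
  <coordinator name="daily-etl">
    <app-path>${nameNode}/user/oozie/coordinators/daily_etl</app-path>
    <configuration>
      <property>
        <name>start_time</name>
        <value>2026-04-01T00:00Z</value>
      </property>
    </configuration>
  </coordinator>
</bundle-app>
```

A bundle is submitted the same way as a workflow or coordinator, with `oozie.bundle.application.path` set in its job.properties.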
2. Downloading Oozie
Oozie can be downloaded from the official Apache archive or from mirrors hosted in China.
Option 1: Official download
$ mkdir -p /fgeudb/software/oozie
$ cd /fgeudb/software/oozie
# Download Oozie 5.2.1
$ wget https://archive.apache.org/dist/oozie/5.2.1/oozie-5.2.1.tar.gz
# Download the Oozie client
$ wget https://archive.apache.org/dist/oozie/5.2.1/oozie-client-5.2.1.tar.gz
# List the downloaded files
$ ls -lh
Sample output:
total 350M
-rw-r--r-- 1 root root 320M Apr 4 10:00 oozie-5.2.1.tar.gz
-rw-r--r-- 1 root root 30M Apr 4 10:00 oozie-client-5.2.1.tar.gz
Option 2: Mirror download (China)
# Use the Aliyun mirror
$ wget https://mirrors.aliyun.com/apache/oozie/5.2.1/oozie-5.2.1.tar.gz
# Use the Huawei Cloud mirror
$ wget https://mirrors.huawei.com/apache/oozie/5.2.1/oozie-5.2.1.tar.gz
# Use the Tsinghua University mirror
$ wget https://mirrors.tuna.tsinghua.edu.cn/apache/oozie/5.2.1/oozie-5.2.1.tar.gz
Option 3: Download additional libraries
$ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/3.3.6/hadoop-common-3.3.6.jar
$ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs/3.3.6/hadoop-hdfs-3.3.6.jar
$ wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/3.3.6/hadoop-mapreduce-client-core-3.3.6.jar
# Download the MySQL JDBC driver
$ wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.33/mysql-connector-java-8.0.33.jar
# Download ExtJS (for the web console)
$ wget https://archive.apache.org/dist/oozie/ext-2.2.zip
3. Installing and Deploying Oozie
Oozie depends on Hadoop, Java, and a relational database; install and configure these prerequisites first.
Step 1: Unpack the distribution
$ cd /fgeudb
$ tar -zxvf /fgeudb/software/oozie/oozie-5.2.1.tar.gz
$ mv oozie-5.2.1 oozie
# Inspect the directory layout
$ ls -la /fgeudb/oozie/
Sample output:
total 32
drwxr-xr-x 2 root root 4096 Apr 4 10:00 bin
drwxr-xr-x 2 root root 4096 Apr 4 10:00 conf
drwxr-xr-x 2 root root 4096 Apr 4 10:00 docs
drwxr-xr-x 2 root root 4096 Apr 4 10:00 lib
drwxr-xr-x 2 root root 4096 Apr 4 10:00 libext
drwxr-xr-x 2 root root 4096 Apr 4 10:00 oozie-server
drwxr-xr-x 2 root root 4096 Apr 4 10:00 src
Step 2: Configure environment variables
$ vi ~/.bash_profile
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk
export HADOOP_HOME=/fgeudb/hadoop
export OOZIE_HOME=/fgeudb/oozie
export PATH=$OOZIE_HOME/bin:$PATH
# Apply the changes
$ source ~/.bash_profile
# Verify the environment
$ echo $OOZIE_HOME
/fgeudb/oozie
# Verify Java
$ java -version
Sample output:
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)
Step 3: Create the Oozie database
$ mysql -h 192.168.1.51 -u root -proot123
# Create the Oozie database
mysql> CREATE DATABASE oozie DEFAULT CHARACTER SET utf8mb4;
# Create the Oozie user
mysql> CREATE USER 'oozie'@'%' IDENTIFIED BY 'oozie123';
mysql> GRANT ALL PRIVILEGES ON oozie.* TO 'oozie'@'%';
mysql> FLUSH PRIVILEGES;
# Verify the database
mysql> SHOW DATABASES;
Sample output:
+--------------------+
| Database           |
+--------------------+
| information_schema |
| fgedu_db           |
| mysql              |
| oozie              |
| performance_schema |
+--------------------+
Step 4: Configure Oozie
$ vi /fgeudb/oozie/conf/oozie-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>oozie.base.url</name>
<value>http://192.168.1.51:11000/oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.driver</name>
<value>com.mysql.cj.jdbc.Driver</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.url</name>
<value>jdbc:mysql://192.168.1.51:3306/oozie?useSSL=false&amp;serverTimezone=Asia/Shanghai</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.username</name>
<value>oozie</value>
</property>
<property>
<name>oozie.service.JPAService.jdbc.password</name>
<value>oozie123</value>
</property>
<property>
<name>oozie.service.HadoopAccessorService.hadoop.configurations</name>
<value>*=/fgeudb/hadoop/etc/hadoop</value>
</property>
<property>
<name>oozie.service.WorkflowAppService.system.libpath</name>
<value>/user/oozie/share/lib</value>
</property>
<property>
<name>oozie.service.SchemaService.wf.ext.schemas</name>
<value>shell-action-0.1.xsd,email-action-0.1.xsd,hive-action-0.2.xsd,sqoop-action-0.2.xsd,ssh-action-0.1.xsd,distcp-action-0.1.xsd,spark-action-0.1.xsd</value>
</property>
</configuration>
Step 5: Install dependency libraries
$ cp /fgeudb/hadoop/share/hadoop/common/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/common/lib/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/hdfs/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/hdfs/lib/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/mapreduce/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/yarn/*.jar /fgeudb/oozie/libext/
$ cp /fgeudb/hadoop/share/hadoop/yarn/lib/*.jar /fgeudb/oozie/libext/
# Copy the MySQL JDBC driver
$ cp /fgeudb/software/oozie/mysql-connector-java-8.0.33.jar /fgeudb/oozie/libext/
# Unpack ExtJS
$ unzip /fgeudb/software/oozie/ext-2.2.zip -d /fgeudb/oozie/libext/
# Count the files in libext
$ ls /fgeudb/oozie/libext/ | wc -l
Sample output:
150
Step 6: Initialize Oozie
$ cd /fgeudb/oozie
$ bin/oozie-setup.sh sharelib create -fs hdfs://192.168.1.51:9000
Sample output:
setting OOZIE_HOME=/fgeudb/oozie
setting OOZIE_CONFIG=/fgeudb/oozie/conf
setting OOZIE_DATA=/fgeudb/oozie/data
setting OOZIE_LOG=/fgeudb/oozie/logs
…
ShareLib created at: hdfs://192.168.1.51:9000/user/oozie/share/lib/lib_20260404100000
# Initialize the database
$ bin/oozie-setup.sh db create -run
Sample output:
Validate DB Connection
DB Connection established
Create OOZIE Database
OOZIE Database created
Done
# Start Oozie
$ bin/oozie-start.sh
Sample output:
setting OOZIE_HOME=/fgeudb/oozie
setting OOZIE_CONFIG=/fgeudb/oozie/conf
…
Oozie started
# Check Oozie status
$ bin/oozie admin -oozie http://192.168.1.51:11000/oozie -status
Sample output:
System mode: NORMAL
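The same status check can be performed over Oozie's REST API (the `v1/admin/status` endpoint), which returns a small JSON document. A sketch, assuming the server address used above:

```shell
# Query the Oozie admin status endpoint; it returns JSON such as
# {"systemMode":"NORMAL"} (other modes are NOWEBSERVICE and SAFEMODE).
curl -s http://192.168.1.51:11000/oozie/v1/admin/status
# Extract just the mode field (python3 is used here since jq may not be installed):
curl -s http://192.168.1.51:11000/oozie/v1/admin/status \
  | python3 -c 'import sys, json; print(json.load(sys.stdin)["systemMode"])'
```

This is useful for health checks from monitoring systems that cannot invoke the `oozie` CLI.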
4. Oozie Configuration in Detail
Oozie configuration covers service settings, Hadoop integration, and security. Note that pointing Oozie at the correct Hadoop configuration directory is essential for it to run properly.
Core settings
# Oozie service URL
oozie.base.url = http://192.168.1.51:11000/oozie
# Database settings
oozie.service.JPAService.jdbc.driver = com.mysql.cj.jdbc.Driver
oozie.service.JPAService.jdbc.url = jdbc:mysql://192.168.1.51:3306/oozie
oozie.service.JPAService.jdbc.username = oozie
oozie.service.JPAService.jdbc.password = oozie123
# Hadoop configuration path
oozie.service.HadoopAccessorService.hadoop.configurations = *=/fgeudb/hadoop/etc/hadoop
# ShareLib path
oozie.service.WorkflowAppService.system.libpath = /user/oozie/share/lib
# Job callback settings
oozie.service.CallbackService.callback.url.base = http://192.168.1.51:11000/oozie
# Job queue settings (worker concurrency and queue size)
oozie.service.CallableQueueService.callable.concurrency = 10
oozie.service.CallableQueueService.queue.size = 10000
# Extension services (event handling)
oozie.services.ext = org.apache.oozie.service.EventHandlerService
Hadoop integration
$ vi /fgeudb/hadoop/etc/hadoop/core-site.xml
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
# Restart Hadoop
$ stop-dfs.sh
$ start-dfs.sh
$ stop-yarn.sh
$ start-yarn.sh
5. Workflow Configuration in Practice
Oozie workflows are defined in XML and consist of action nodes and control nodes.
Step 1: Create the workflow directory structure
$ mkdir -p /fgeudb/oozie/workflows/etl_workflow
$ cd /fgeudb/oozie/workflows/etl_workflow
# Create a lib subdirectory for custom jars
$ mkdir -p lib
# Inspect the layout
$ tree
Sample output:
.
├── job.properties
├── workflow.xml
└── lib
└── custom.jar
Step 2: Define the workflow
$ vi workflow.xml
<workflow-app xmlns="uri:oozie:workflow:1.0" name="etl-workflow">
<start to="sqoop-import-node"/>
<action name="sqoop-import-node">
<sqoop xmlns="uri:oozie:sqoop-action:1.0">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/hadoop/orders"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<command>import --connect jdbc:mysql://192.168.1.51:3306/fgedu_db --username root --password root123 --table orders --target-dir /user/hadoop/orders --num-mappers 4</command>
</sqoop>
<ok to="hive-processing-node"/>
<error to="fail-node"/>
</action>
<action name="hive-processing-node">
<hive xmlns="uri:oozie:hive-action:1.0">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<prepare>
<delete path="${nameNode}/user/hadoop/orders_processed"/>
</prepare>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<script>scripts/process_orders.hql</script>
<param>input_dir=/user/hadoop/orders</param>
<param>output_dir=/user/hadoop/orders_processed</param>
</hive>
<ok to="spark-analysis-node"/>
<error to="fail-node"/>
</action>
<action name="spark-analysis-node">
<spark xmlns="uri:oozie:spark-action:1.0">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<configuration>
<property>
<name>mapred.job.queue.name</name>
<value>default</value>
</property>
</configuration>
<master>yarn</master>
<mode>cluster</mode>
<name>Spark Analysis Job</name>
<class>com.fgedu.spark.OrderAnalysis</class>
<jar>${nameNode}/user/oozie/lib/spark-jobs.jar</jar>
<arg>${input_dir}</arg>
<arg>${output_dir}</arg>
</spark>
<ok to="email-node"/>
<error to="fail-node"/>
</action>
<action name="email-node">
<email xmlns="uri:oozie:email-action:1.0">
<to>admin@fgedu.net.cn</to>
<cc>ops@fgedu.net.cn</cc>
<subject>ETL Workflow Completed</subject>
<body>ETL workflow has completed successfully.</body>
</email>
<ok to="end"/>
<error to="fail-node"/>
</action>
<kill name="fail-node">
<message>Workflow failed, error message: ${wf:errorMessage(wf:lastErrorNode())}</message>
</kill>
<end name="end"/>
</workflow-app>
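Before uploading the workflow to HDFS, it is worth checking that the XML is at least well-formed; a quick local sketch (full schema validation can be done server-side with the `oozie validate` subcommand):

```shell
# Check workflow.xml for XML well-formedness with Python's stdlib parser.
# This catches unbalanced tags and bad quoting, not Oozie schema errors.
python3 - <<'EOF'
import xml.etree.ElementTree as ET
ET.parse("workflow.xml")      # raises ParseError on malformed XML
print("workflow.xml is well-formed")
EOF
```

Smart quotes pasted from web pages are a common cause of parse failures here, so this check pays off before every upload.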
Step 3: Create the job properties file
$ vi job.properties
nameNode=hdfs://192.168.1.51:9000
jobTracker=192.168.1.51:8032
queueName=default
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/user/oozie/workflows/etl_workflow
input_dir=/user/hadoop/orders
output_dir=/user/hadoop/orders_analysis
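Note that Oozie expands `${var}` references inside job.properties itself, so `oozie.wf.application.path` above resolves against the `nameNode` value defined two lines earlier. A rough local illustration of that substitution with sed:

```shell
# Illustration only: emulate Oozie's ${nameNode} expansion in a properties line.
nameNode="hdfs://192.168.1.51:9000"
echo 'oozie.wf.application.path=${nameNode}/user/oozie/workflows/etl_workflow' \
  | sed "s|\${nameNode}|$nameNode|"
# prints oozie.wf.application.path=hdfs://192.168.1.51:9000/user/oozie/workflows/etl_workflow
```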
Step 4: Submit the workflow
$ hdfs dfs -put /fgeudb/oozie/workflows/etl_workflow /user/oozie/workflows/
# Submit the workflow
$ oozie job -oozie http://192.168.1.51:11000/oozie -config job.properties -run
Sample output:
job: 0000000-260404100000000-oozie-oozi-W
# Check the job status
$ oozie job -oozie http://192.168.1.51:11000/oozie -info 0000000-260404100000000-oozie-oozi-W
Sample output:
Job ID : 0000000-260404100000000-oozie-oozi-W
------------------------------------------------------------------------
Workflow Name : etl-workflow
App Path : hdfs://192.168.1.51:9000/user/oozie/workflows/etl_workflow
Status : RUNNING
Run : 0
User : root
Group : -
Created : 2026-04-04 10:00 GMT
Started : 2026-04-04 10:00 GMT
Last Modified : 2026-04-04 10:01 GMT
Ended : -
CoordAction ID: -
------------------------------------------------------------------------
Actions
------------------------------------------------------------------------
ID Status Ext ID Ext Status Err Code
--------------------------------------------------------------------------------------------------------------------------------------------
0000000-260404100000000-oozie-oozi-W@:start: OK - OK -
0000000-260404100000000-oozie-oozi-W@sqoop-import-node RUNNING job_1712217600000_0001 RUNNING -
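The job IDs in this output follow a fixed pattern, `<sequence>-<server start timestamp>-oozie-<system id>-<type>`, where the trailing letter distinguishes workflows (W), coordinators (C), and bundles (B). A quick shell illustration:

```shell
# Pull the job type out of an Oozie job ID using parameter expansion.
id="0000000-260404100000000-oozie-oozi-W"
type="${id##*-}"   # strip everything up to the last '-'
echo "$type"       # prints W
```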
6. Scheduled Execution with a Coordinator
An Oozie Coordinator schedules workflows on a time or data-availability basis, supporting cron expressions and frequency definitions.
Step 1: Define the Coordinator
$ mkdir -p /fgeudb/oozie/coordinators/daily_etl
$ cd /fgeudb/oozie/coordinators/daily_etl
# Create coordinator.xml
$ vi coordinator.xml
<coordinator-app name="daily-etl-coordinator"
frequency="${coord:days(1)}"
start="2026-04-01T00:00Z"
end="2027-04-01T00:00Z"
timezone="Asia/Shanghai"
xmlns="uri:oozie:coordinator:1.0">
<controls>
<timeout>1440</timeout>
<concurrency>1</concurrency>
<execution>FIFO</execution>
<throttle>1</throttle>
</controls>
<datasets>
<dataset name="orders-dataset" frequency="${coord:days(1)}" initial-instance="2026-04-01T00:00Z" timezone="Asia/Shanghai">
<uri-template>${nameNode}/user/hadoop/orders/${YEAR}/${MONTH}/${DAY}</uri-template>
<done-flag>_SUCCESS</done-flag>
</dataset>
</datasets>
<input-events>
<data-in name="orders-input" dataset="orders-dataset">
<instance>${coord:current(0)}</instance>
</data-in>
</input-events>
<action>
<workflow>
<app-path>${workflowAppPath}</app-path>
<configuration>
<property>
<name>input_dir</name>
<value>${coord:dataIn('orders-input')}</value>
</property>
<property>
<name>output_dir</name>
<value>${nameNode}/user/hadoop/output/${coord:formatTime(coord:nominalTime(), 'yyyy/MM/dd')}</value>
</property>
</configuration>
</workflow>
</action>
</coordinator-app>
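With `frequency="${coord:days(1)}"` and the start time above, the coordinator materializes one action per day, and the nominal time of the n-th action is simply start + n days. A sketch using GNU date (the offset value 3 is illustrative):

```shell
# Compute the nominal time of the n-th materialized action for a daily
# coordinator starting at 2026-04-01T00:00Z (requires GNU date).
start="2026-04-01"
n=3
date -u -d "$start + $n days" +%Y-%m-%dT00:00Z   # prints 2026-04-04T00:00Z
```

This is handy when deciding which instance `${coord:current(0)}` will resolve to for a given run.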
Step 2: Create the Coordinator properties file
$ vi job.properties
nameNode=hdfs://192.168.1.51:9000
jobTracker=192.168.1.51:8032
queueName=default
oozie.use.system.libpath=true
oozie.coord.application.path=${nameNode}/user/oozie/coordinators/daily_etl
workflowAppPath=${nameNode}/user/oozie/workflows/etl_workflow
Step 3: Submit the Coordinator
$ hdfs dfs -put /fgeudb/oozie/coordinators/daily_etl /user/oozie/coordinators/
# Submit the Coordinator
$ oozie job -oozie http://192.168.1.51:11000/oozie -config job.properties -run
Sample output:
job: 0000000-260404100000000-oozie-oozi-C
# Check the Coordinator status
$ oozie job -oozie http://192.168.1.51:11000/oozie -info 0000000-260404100000000-oozie-oozi-C
Sample output:
Job ID : 0000000-260404100000000-oozie-oozi-C
------------------------------------------------------------------------
Job Name : daily-etl-coordinator
App Path : hdfs://192.168.1.51:9000/user/oozie/coordinators/daily_etl
Status : RUNNING
Start Time : 2026-04-01 00:00
End Time : 2027-04-01 00:00
Pause Time : -
Next Materialized Time : 2026-04-05 00:00
7. Oozie Monitoring and Operations
Oozie provides a web console and a command-line client for monitoring and management.
Step 1: Access the web console
# URL: http://192.168.1.51:11000/oozie
# Web console features:
# – View the status of all jobs
# – View job details and logs
# – Suspend, resume, and kill jobs
# – View Coordinator and Bundle status
Step 2: Command-line management
$ oozie jobs -oozie http://192.168.1.51:11000/oozie
Sample output:
Job ID Status User Group App Name
------------------------------------------------------------------------
0000000-260404100000000-oozie-oozi-W SUCCEEDED root - etl-workflow
0000001-260404100000000-oozie-oozi-C RUNNING root - daily-etl-coordinator
# View detailed information for a specific job
$ oozie job -oozie http://192.168.1.51:11000/oozie -info 0000000-260404100000000-oozie-oozi-W -verbose
# Suspend a job
$ oozie job -oozie http://192.168.1.51:11000/oozie -suspend 0000000-260404100000000-oozie-oozi-W
# Resume a job
$ oozie job -oozie http://192.168.1.51:11000/oozie -resume 0000000-260404100000000-oozie-oozi-W
# Kill a job
$ oozie job -oozie http://192.168.1.51:11000/oozie -kill 0000000-260404100000000-oozie-oozi-W
# Rerun a job (failed nodes only)
$ oozie job -oozie http://192.168.1.51:11000/oozie -rerun 0000000-260404100000000-oozie-oozi-W -D oozie.wf.rerun.failnodes=true
Step 3: View logs
$ tail -f /fgeudb/oozie/logs/oozie.log
Sample output:
2026-04-04 10:00:00 INFO XLogService: Log4j configuration file [oozie-log4j.properties]
2026-04-04 10:00:00 INFO Services: Initialized services
2026-04-04 10:00:00 INFO OozieServer: Oozie server started
2026-04-04 10:00:30 INFO WorkflowAppService: Loading workflow app [etl-workflow]
2026-04-04 10:01:00 INFO ActionService: Action [sqoop-import-node] completed successfully
# View a job's execution log
$ oozie job -oozie http://192.168.1.51:11000/oozie -log 0000000-260404100000000-oozie-oozi-W
Sample output:
Log for job 0000000-260404100000000-oozie-oozi-W
------------------------------------------------------------------------
2026-04-04 10:00:00 INFO Start action [sqoop-import-node]
2026-04-04 10:00:30 INFO Action [sqoop-import-node] completed successfully
2026-04-04 10:00:30 INFO Start action [hive-processing-node]
2026-04-04 10:01:00 INFO Action [hive-processing-node] completed successfully
2026-04-04 10:01:00 INFO Start action [spark-analysis-node]
2026-04-04 10:01:30 INFO Action [spark-analysis-node] completed successfully
2026-04-04 10:01:30 INFO Workflow job completed successfully
This article was compiled and published by Fengge Tutorials for learning and testing purposes only; when republishing, please credit the source: http://www.fgedu.net.cn/10327.html
