
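DolphinScheduler Installation and Configuration: Detailed Guide to Installing, Configuring, Upgrading and Migrating the DolphinScheduler Workflow Scheduler

1. DolphinScheduler Overview and Environment Planning

DolphinScheduler is a distributed, easily extensible, visual DAG workflow scheduling system. It is designed for big data workloads, where task dependencies are complex and tasks are hard to schedule. It offers a rich set of task types, visual workflow definitions, flexible scheduling strategies, and comprehensive monitoring and alerting.

1.1 DolphinScheduler Versions

DolphinScheduler's current major release line is 3.x, and this tutorial uses DolphinScheduler 3.1.0 as the example throughout. Compared with earlier releases, 3.x significantly improves performance, stability and functionality. It also supports more task types and more flexible workflow definitions.

# Check the DolphinScheduler version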
$ dolphinscheduler-server.sh -v
DolphinScheduler version: 3.1.0

# Check the Java version
$ java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

# Check the MySQL version
$ mysql --version
mysql Ver 8.0.33 for Linux on x86_64 (MySQL Community Server - GPL)

# Check ZooKeeper status (the planned ZooKeeper version is 3.7.1)
$ zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper/conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: standalone

1.2 Environment Planning

The environment for this installation is planned as follows:

DolphinScheduler server:
dolphinscheduler01.fgedu.net.cn (192.168.1.51): Master + Worker + API Server + UI

DolphinScheduler version: 3.1.0
Java version: OpenJDK 1.8.0
MySQL version: 8.0.33
ZooKeeper version: 3.7.1
Installation directory: /data/dolphinscheduler
Master port: 5678
Worker port: 1234
API Server port: 12345
UI port: 8888

Database:
MySQL: 192.168.1.51:3306/dolphinscheduler

Storage:
Resource storage: /data/dolphinscheduler/resources
Log storage: /data/dolphinscheduler/logs

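2. Hardware Requirements

As a workflow scheduler, DolphinScheduler has relatively modest hardware requirements. Size the host according to the number and complexity of the jobs it will schedule.

2.1 Physical Host Requirements

# Check memory size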
# free -h
total used free shared buff/cache available
Mem: 16G 4.2G 10G 256M 1.8G 11G
Swap: 8G 0B 8G

# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 12G 39G 24% /
/dev/sdb1 500G 50G 451G 10% /data
/dev/sdc1 200G 20G 181G 10% /backup

# Check the number of CPU cores
# nproc
8

# Check the system architecture
# uname -m
x86_64

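Recommendations: use at least 8GB of memory for test environments and 16GB or more for production. Plan disk space around the number of workflows and the volume of logs, with at least 200GB recommended. Use 8 or more CPU cores to support concurrent job scheduling.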
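The sizing guidance above can be turned into a quick pre-flight check. The POSIX shell function below is a minimal sketch, not a DolphinScheduler tool. The name check_host_resources is hypothetical. It compares the three values you pass in with the production figures stated above: 16GB of memory, 8 CPU cores and 200GB of disk.

```shell
#!/bin/sh
# Hypothetical pre-flight helper (not part of DolphinScheduler).
# Arguments: memory in GB, CPU cores, data-disk size in GB.
# Prints one WARN line per value below the production recommendation
# (16GB RAM, 8 cores, 200GB disk), or OK; returns 1 if anything is short.
check_host_resources() {
    local mem_gb=$1 cpus=$2 disk_gb=$3 short=0
    if [ "$mem_gb" -lt 16 ]; then
        echo "WARN: memory ${mem_gb}GB < 16GB recommended for production"
        short=1
    fi
    if [ "$cpus" -lt 8 ]; then
        echo "WARN: ${cpus} CPU cores < 8 recommended"
        short=1
    fi
    if [ "$disk_gb" -lt 200 ]; then
        echo "WARN: data disk ${disk_gb}GB < 200GB recommended"
        short=1
    fi
    if [ "$short" -eq 0 ]; then
        echo "OK"
    fi
    return "$short"
}

# Example on the host itself (MemTotal is a little below the installed RAM,
# so a 16GB machine may report 15 here):
#   mem_gb=$(free -g | awk '/^Mem:/ {print $2}')
#   disk_gb=$(df -BG --output=size /data | tail -n 1 | tr -dc '0-9')
#   check_host_resources "$mem_gb" "$(nproc)" "$disk_gb"
```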

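2.2 vSphere Virtual Machine Requirements

VM configuration:
- vCPU: 8
- Memory: 16GB
- Disk: 50GB system disk + 500GB data disk
- Network: VMXNET3 adapter, gigabit network
- Storage: SSD-backed storage is recommended for better I/O performance

Resource pool configuration:
- CPU reservation: 4GHz
- Memory reservation: 8GB
- Memory limit: 16GB
- CPU shares: Normal
- Memory shares: Normal

2.3 Cloud Host Requirements

Cloud instance specification (Alibaba Cloud / Tencent Cloud / Huawei Cloud):
- Instance type: ecs.g6.2xlarge or equivalent
- vCPU: 8
- Memory: 32GB
- System disk: 100GB ultra cloud disk
- Data disk: 500GB SSD cloud disk
- Network bandwidth: 10Mbps or higher

Storage configuration:
- OSS object storage: for resource files and backups (workflow definitions themselves are stored in the MySQL database)
- NAS file storage: for shared configuration files
- Cloud disk snapshots: for periodic backups of configuration and data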

3. Operating System Preparation

Before installing DolphinScheduler, configure and tune the operating system as follows.

3.1 OS Version Check

# Check the OS version
# cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.9"
ID="ol"
PRETTY_NAME="Oracle Linux Server 8.9"

# Check the kernel version
# uname -r
5.4.17-2136.302.7.2.el8uek.x86_64

# Check SELinux status
# getenforce
Disabled

# Check firewall status
# systemctl status firewalld
● firewalld.service - firewalld - dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; disabled; vendor preset: enabled)
Active: inactive (dead)

3.2 Kernel Parameter Tuning

# Edit /etc/sysctl.conf
# vi /etc/sysctl.conf

# Add the following kernel parameters
fs.file-max = 6815744
kernel.sem = 250 32000 100 128
kernel.shmmni = 4096
kernel.shmall = 4294967296
kernel.shmmax = 68719476736
net.ipv4.ip_local_port_range = 9000 65500
net.core.rmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_default = 262144
net.core.wmem_max = 1048576
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 1200
net.ipv4.tcp_max_syn_backlog = 8192
net.core.somaxconn = 1024
vm.swappiness = 10
vm.dirty_ratio = 15
vm.dirty_background_ratio = 5

# Apply the kernel parameters
# sysctl -p

# Verify the settings
# sysctl -a | grep fs.file-max
fs.file-max = 6815744

3.3 User Resource Limits

# Configure user resource limits
# vi /etc/security/limits.conf

# Add the following lines
* soft nproc 65535
* hard nproc 65535
* soft nofile 65535
* hard nofile 65535
* soft stack 10240
* hard stack 32768
dolphinscheduler soft memlock unlimited
dolphinscheduler hard memlock unlimited

# Verify the limits (limits.conf is applied at login, so check from a new session)
# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 63499
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 65535
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 65535
virtual memory (kbytes, -v) unlimited
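pam_limits applies limits.conf when a session starts, so the new values only show up in ulimit -a after a fresh login. The following is a minimal sketch for scripting that re-check. limit_at_least is a hypothetical helper. It compares one reported limit with a required minimum and treats unlimited as always sufficient.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a ulimit value (as printed by ulimit -n,
# ulimit -u, ...) is at least the required minimum.
# "unlimited" always satisfies the requirement.
limit_at_least() {
    local actual=$1 required=$2
    if [ "$actual" = unlimited ]; then
        return 0
    fi
    if [ "$required" = unlimited ]; then
        return 1
    fi
    [ "$actual" -ge "$required" ]
}

# Example in bash, after logging in again as the dolphinscheduler user:
#   limit_at_least "$(ulimit -n)" 65535 || echo "open files limit too low"
#   limit_at_least "$(ulimit -u)" 65535 || echo "max user processes too low"
```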

3.4 Java Installation

# Install OpenJDK 1.8
# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel

# Configure the Java environment variables
# vi /etc/profile.d/java.sh

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

# Load the environment variables
# source /etc/profile.d/java.sh

# Verify the Java installation
# java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)

# Verify JAVA_HOME
# echo $JAVA_HOME
/usr/lib/jvm/java-1.8.0-openjdk

3.5 Dependent Services

DolphinScheduler depends on MySQL and ZooKeeper.

# Install MySQL (on Oracle Linux 8 the MySQL client package is named mysql)
# yum install -y mysql-server mysql

# Start MySQL and enable it at boot
# systemctl start mysqld
# systemctl enable mysqld

# Log in to MySQL as root
# mysql -u root -p

# Create the DolphinScheduler database and user
CREATE DATABASE dolphinscheduler CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER 'dolphinscheduler'@'localhost' IDENTIFIED BY 'dolphinscheduler123';
CREATE USER 'dolphinscheduler'@'%' IDENTIFIED BY 'dolphinscheduler123';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'localhost';
GRANT ALL PRIVILEGES ON dolphinscheduler.* TO 'dolphinscheduler'@'%';
FLUSH PRIVILEGES;
EXIT;

# Verify the database connection
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "SELECT VERSION();"
+-----------+
| VERSION() |
+-----------+
| 8.0.33    |
+-----------+
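The SQL above writes the password dolphinscheduler123 directly into the text. One way to keep it out of the instructions is sketched below. The hypothetical function ds_db_sql prints the same statements for a database name, user and password passed as arguments. It uses IF NOT EXISTS so the output can be re-run safely. It assumes the password contains no backslashes.

```shell
#!/bin/sh
# Hypothetical helper: print the DDL from this section with the database name,
# user and password passed as arguments rather than written into the text.
# Single quotes in the password are doubled for the SQL string literal;
# passwords containing backslashes are assumed not to occur.
ds_db_sql() {
    local db=$1 user=$2 pass=$3 esc
    esc=$(printf '%s' "$pass" | sed "s/'/''/g")
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db} CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
CREATE USER IF NOT EXISTS '${user}'@'localhost' IDENTIFIED BY '${esc}';
CREATE USER IF NOT EXISTS '${user}'@'%' IDENTIFIED BY '${esc}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'localhost';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Example (DS_DB_PASSWORD is a hypothetical variable holding the password):
#   ds_db_sql dolphinscheduler dolphinscheduler "$DS_DB_PASSWORD" | mysql -u root -p
```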

# Install ZooKeeper
# wget https://archive.apache.org/dist/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz
# tar -xzf apache-zookeeper-3.7.1-bin.tar.gz -C /data/
# mv /data/apache-zookeeper-3.7.1-bin /data/zookeeper

# Configure ZooKeeper
# cp /data/zookeeper/conf/zoo_sample.cfg /data/zookeeper/conf/zoo.cfg
# vi /data/zookeeper/conf/zoo.cfg

# Set the following values
dataDir=/data/zookeeper/data
dataLogDir=/data/zookeeper/logs
clientPort=2181

# Create the data and log directories
# mkdir -p /data/zookeeper/data /data/zookeeper/logs

# Start ZooKeeper
# /data/zookeeper/bin/zkServer.sh start

# Verify ZooKeeper status
# /data/zookeeper/bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/zookeeper/conf/zoo.cfg
Client port found: 2181. Client address: localhost.
Mode: standalone
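For scripted checks, it helps to extract just the Mode value from the status output shown above. The following is a minimal sketch. zk_mode is a hypothetical helper name. Because it only parses text, it works equally on a standalone node and on an ensemble member (leader or follower).

```shell
#!/bin/sh
# Hypothetical helper: read zkServer.sh status output on stdin and print the
# value of its "Mode:" line (standalone, leader or follower); prints nothing
# when that line is missing, e.g. because ZooKeeper is not running.
zk_mode() {
    awk -F': *' '/^Mode:/ { print $2; exit }'
}

# Example (2>&1 captures messages printed to either stream):
#   mode=$(/data/zookeeper/bin/zkServer.sh status 2>&1 | zk_mode)
#   [ -n "$mode" ] || echo "ZooKeeper is not serving requests"
```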

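4. DolphinScheduler Installation and Configuration

With the environment prepared, install DolphinScheduler. The commands and paths in this section follow the author's layout, and the official 3.x binary package may be organized differently. In that package each service has its own directory and configuration (for example master-server/conf/application.yaml). Services are started with bin/dolphinscheduler-daemon.sh start master-server, and likewise worker-server, api-server and alert-server. The web UI is served by the API server at http://<host>:12345/dolphinscheduler/ui. Adjust paths and commands to match the package you download.

4.1 Download the DolphinScheduler Package

# Download DolphinScheduler
# cd /tmp
# wget https://archive.apache.org/dist/dolphinscheduler/3.1.0/apache-dolphinscheduler-3.1.0-bin.tar.gz

# Extract the package and rename it to the installation directory
# (do not create /data/dolphinscheduler beforehand: if it already exists, mv moves
# the extracted directory into it instead of renaming it)
# tar -xzf apache-dolphinscheduler-3.1.0-bin.tar.gz -C /data/
# mv /data/apache-dolphinscheduler-3.1.0-bin /data/dolphinscheduler

# Create the log and resource directories
# mkdir -p /data/dolphinscheduler/logs
# mkdir -p /data/dolphinscheduler/resources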

# Inspect the installation directory
# ls -la /data/dolphinscheduler/
total 32
drwxr-xr-x 7 root root 4096 Apr 5 10:00 .
drwxr-xr-x 3 root root 4096 Apr 5 10:00 ..
drwxr-xr-x 2 root root 4096 Apr 5 10:00 bin
drwxr-xr-x 2 root root 4096 Apr 5 10:00 conf
drwxr-xr-x 4 root root 4096 Apr 5 10:00 lib
drwxr-xr-x 2 root root 4096 Apr 5 10:00 logs
drwxr-xr-x 2 root root 4096 Apr 5 10:00 resources

4.2 Configure DolphinScheduler

# Note: the MySQL JDBC driver (mysql-connector-java 8.x) is not bundled with the
# binary release for licensing reasons; copy the jar into the lib directory first

# Edit the configuration file
# vi /data/dolphinscheduler/conf/config/application.yaml

# Basic settings
server:
  port: 12345

# Database settings
spring:
  datasource:
    url: jdbc:mysql://192.168.1.51:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai
    username: dolphinscheduler
    password: dolphinscheduler123
    driver-class-name: com.mysql.cj.jdbc.Driver

# ZooKeeper settings
registry:
  type: zookeeper
  zookeeper:
    connect-string: localhost:2181
    namespace: dolphinscheduler

# Logging settings
logging:
  config: classpath:logback-spring.xml
  level:
    org.springframework: info
    org.apache.dolphinscheduler: info

# Verify the configuration
# cat /data/dolphinscheduler/conf/config/application.yaml | grep -v "^#" | grep -v "^$"
server:
  port: 12345
spring:
  datasource:
    url: jdbc:mysql://192.168.1.51:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai
    username: dolphinscheduler
    password: dolphinscheduler123
    driver-class-name: com.mysql.cj.jdbc.Driver
registry:
  type: zookeeper
  zookeeper:
    connect-string: localhost:2181
    namespace: dolphinscheduler
logging:
  config: classpath:logback-spring.xml
  level:
    org.springframework: info
    org.apache.dolphinscheduler: info
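The same JDBC URL must be repeated wherever a service reads the datasource settings, so generating it once helps avoid typos. The following is a minimal sketch. ds_jdbc_url is a hypothetical helper that rebuilds the URL format and query parameters used above. It takes a host, an optional port (default 3306) and an optional database name (default dolphinscheduler).

```shell
#!/bin/sh
# Hypothetical helper: build the MySQL JDBC URL used in application.yaml from
# host, port (default 3306) and database name (default dolphinscheduler).
ds_jdbc_url() {
    local host=$1 port=${2:-3306} db=${3:-dolphinscheduler}
    local params="useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai"
    printf 'jdbc:mysql://%s:%s/%s?%s\n' "$host" "$port" "$db" "$params"
}

# Example:
#   ds_jdbc_url 192.168.1.51
```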

4.3 Initialize the Database

# Initialize the database
# cd /data/dolphinscheduler
# bash bin/init-database.sh

# Sample output:
Start initialize database
Create database if not exists
Execute SQL file: /data/dolphinscheduler/sql/dolphinscheduler_mysql.sql
Initialize database successful

# Verify the tables (the count includes the header line)
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "USE dolphinscheduler; SHOW TABLES;" | wc -l
100

# Inspect a table structure (in 3.x, workflow definitions are stored in t_ds_process_definition;
# sample output, the exact columns vary by version)
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "USE dolphinscheduler; DESCRIBE t_ds_process_definition;"
+---------------------------+--------------+------+-----+---------+----------------+
| Field                     | Type         | Null | Key | Default | Extra          |
+---------------------------+--------------+------+-----+---------+----------------+
| id                        | int(11)      | NO   | PRI | NULL    | auto_increment |
| name                      | varchar(255) | NO   | MUL | NULL    |                |
| project_code              | bigint(20)   | NO   | MUL | NULL    |                |
| description               | text         | YES  |     | NULL    |                |
| user_id                   | int(11)      | NO   |     | NULL    |                |
| tenant_id                 | int(11)      | NO   |     | NULL    |                |
| release_state             | int(11)      | NO   |     | NULL    |                |
| global_params             | text         | YES  |     | NULL    |                |
| create_time               | datetime     | NO   |     | NULL    |                |
| update_time               | datetime     | NO   |     | NULL    |                |
| version                   | int(11)      | NO   |     | NULL    |                |
| last_update_user          | varchar(100) | YES  |     | NULL    |                |
| task_priority             | int(11)      | YES  |     | NULL    |                |
| worker_group              | varchar(100) | YES  |     | NULL    |                |
| environment_code          | bigint(20)   | YES  | MUL | NULL    |                |
| failure_strategy          | int(11)      | YES  |     | NULL    |                |
| process_instance_priority | int(11)      | YES  |     | NULL    |                |
| warning_group_id          | int(11)      | YES  | MUL | NULL    |                |
| timeout                   | int(11)      | YES  |     | NULL    |                |
| cluster_name              | varchar(100) | YES  |     | NULL    |                |
+---------------------------+--------------+------+-----+---------+----------------+

4.4 Configure the Master Service

# Edit the Master configuration
# vi /data/dolphinscheduler/conf/config/master.properties

# Basic settings
master.listen.port=5678
master.exec.threads=100
master.heartbeat.interval=30
master.task.commit.retryTimes=5
master.task.commit.retryInterval=1000
master.max.cpuload.avg=100
master.reserved.memory=0.3

# Verify the configuration
# cat /data/dolphinscheduler/conf/config/master.properties | grep -v "^#" | grep -v "^$"
master.listen.port=5678
master.exec.threads=100
master.heartbeat.interval=30
master.task.commit.retryTimes=5
master.task.commit.retryInterval=1000
master.max.cpuload.avg=100
master.reserved.memory=0.3

4.5 Configure the Worker Service

# Edit the Worker configuration
# vi /data/dolphinscheduler/conf/config/worker.properties

# Basic settings
worker.listen.port=1234
worker.exec.threads=100
worker.heartbeat.interval=30
worker.max.cpuload.avg=100
worker.reserved.memory=0.3
worker.groups=default

# Verify the configuration
# cat /data/dolphinscheduler/conf/config/worker.properties | grep -v "^#" | grep -v "^$"
worker.listen.port=1234
worker.exec.threads=100
worker.heartbeat.interval=30
worker.max.cpuload.avg=100
worker.reserved.memory=0.3
worker.groups=default
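When scripting post-change checks, reading a single key is often easier than grepping the whole file. The sketch below uses a hypothetical helper, get_prop. It handles only the simple key=value lines used in master.properties and worker.properties. It is not a full Java .properties parser, so it does not support ":" separators, line continuations or escapes.

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY from a simple key=value
# properties file such as master.properties or worker.properties.
# Comment lines (# or !) are skipped, whitespace around key and value is
# trimmed, and if a key appears more than once the last definition wins.
# Prints nothing if the key is absent.
get_prop() {
    local file=$1 key=$2
    awk -v k="$key" '
        /^[ \t]*[#!]/ { next }
        {
            pos = index($0, "=")
            if (pos == 0) next
            name = substr($0, 1, pos - 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                val = substr($0, pos + 1)
                gsub(/^[ \t]+|[ \t]+$/, "", val)
                found = val
                seen = 1
            }
        }
        END { if (seen) print found }
    ' "$file"
}

# Example:
#   get_prop /data/dolphinscheduler/conf/config/worker.properties worker.exec.threads
```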

4.6 Start the DolphinScheduler Services

# Start the Master service
# cd /data/dolphinscheduler
# bash bin/start-master.sh

# Sample output:
Starting master server…
Master server started successfully

# Start the Worker service
# bash bin/start-worker.sh

# Sample output:
Starting worker server…
Worker server started successfully

# Start the API Server
# bash bin/start-api.sh

# Sample output:
Starting api server…
Api server started successfully

# Start the UI service
# bash bin/start-ui.sh

# Sample output:
Starting ui server…
UI server started successfully

# Check service status
# ps -ef | grep dolphinscheduler
root 12345 1 0 10:00 ? 00:00:00 java -Xms2G -Xmx4G -Dlog4j.configuration=file:/data/dolphinscheduler/conf/logback.xml -cp /data/dolphinscheduler/lib/*: org.apache.dolphinscheduler.server.master.MasterServer
root 12346 1 0 10:00 ? 00:00:00 java -Xms2G -Xmx4G -Dlog4j.configuration=file:/data/dolphinscheduler/conf/logback.xml -cp /data/dolphinscheduler/lib/*: org.apache.dolphinscheduler.server.worker.WorkerServer
root 12347 1 0 10:00 ? 00:00:00 java -Xms2G -Xmx4G -Dlog4j.configuration=file:/data/dolphinscheduler/conf/logback-spring.xml -cp /data/dolphinscheduler/lib/*: org.apache.dolphinscheduler.api.ApiApplicationServer
root 12348 1 0 10:00 ? 00:00:00 node /data/dolphinscheduler/ui/server.js

# View the Master log
# tail -n 100 /data/dolphinscheduler/logs/dolphinscheduler-master.log
24/04/05 10:00:00 INFO [MasterServer] Starting master server…
24/04/05 10:00:00 INFO [MasterServer] Master server started successfully on port 5678
24/04/05 10:00:00 INFO [MasterServer] Registering master to zookeeper…
24/04/05 10:00:00 INFO [MasterServer] Master registered successfully

Tip: after the services start, allow a few minutes for all of them to finish initializing before you use the web console.
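Rather than waiting a fixed number of minutes, a start-up script can poll the log for the registration message shown in the sample Master log above. The following is a minimal sketch. wait_for_log is a hypothetical helper. The message it waits for should be whatever your version actually writes, so check your own log first.

```shell
#!/bin/sh
# Hypothetical helper: poll FILE once per second until a line matching
# PATTERN (a grep basic regular expression) appears, or TIMEOUT seconds
# (default 120) pass. Returns 0 on a match and 1 on timeout.
wait_for_log() {
    local file=$1 pattern=$2 timeout=${3:-120} waited=0
    while [ "$waited" -lt "$timeout" ]; do
        if [ -f "$file" ] && grep -q -- "$pattern" "$file"; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}

# Example: wait up to 3 minutes for the Master to register in ZooKeeper.
#   wait_for_log /data/dolphinscheduler/logs/dolphinscheduler-master.log \
#       "Master registered successfully" 180
```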

5. DolphinScheduler Configuration Tuning

To improve DolphinScheduler's performance and stability, apply the following configuration changes.

5.1 Memory Settings

# Edit the Master memory settings
# vi /data/dolphinscheduler/bin/start-master.sh

# Set the JVM memory options (JDK 8 removed PermGen and ignores -XX:MaxPermSize,
# so the metaspace limit is set with -XX:MaxMetaspaceSize instead)
JAVA_OPTS="-Xms4G -Xmx8G -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Edit the Worker memory settings
# vi /data/dolphinscheduler/bin/start-worker.sh

# Set the JVM memory options
JAVA_OPTS="-Xms4G -Xmx8G -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Edit the API Server memory settings
# vi /data/dolphinscheduler/bin/start-api.sh

# Set the JVM memory options
JAVA_OPTS="-Xms4G -Xmx8G -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Note: all three services run on the same host in this plan; size -Xmx so that
# the combined heaps fit in physical memory (3 x 8GB exceeds a 16GB host)

# Verify the configuration
# grep JAVA_OPTS /data/dolphinscheduler/bin/start-master.sh
JAVA_OPTS="-Xms4G -Xmx8G -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"
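In the single-node plan, Master, Worker and API Server share one 16GB host, so their maximum heaps add up: three services with -Xmx8G can claim 24GB of heap. The sketch below uses the hypothetical helpers jvm_size_mb and check_heap_budget. It sums the -Xmx values of several JAVA_OPTS strings and compares the total with the host memory in MB. It ignores non-heap memory, so passing this check is necessary but not sufficient.

```shell
#!/bin/sh
# Hypothetical helpers: convert a JVM size such as 8G, 4096m or 512k to MB,
# then add up the -Xmx values found in several JAVA_OPTS strings.
jvm_size_mb() {
    local v
    v=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')   # 8G -> 8g
    case $v in
        *g) echo $(( ${v%g} * 1024 )) ;;
        *m) echo "${v%m}" ;;
        *k) echo $(( ${v%k} / 1024 )) ;;
        *)  echo $(( v / 1048576 )) ;;                    # plain bytes
    esac
}

# Usage: check_heap_budget HOST_MB "JAVA_OPTS 1" "JAVA_OPTS 2" ...
# Prints the summed maximum heap in MB and returns 1 if it exceeds HOST_MB.
# Metaspace, thread stacks and direct memory come on top of this figure.
check_heap_budget() {
    local host_mb=$1 total=0 opts opt
    shift
    for opts in "$@"; do
        for opt in $opts; do                              # split on whitespace
            case $opt in
                -Xmx*) total=$(( total + $(jvm_size_mb "${opt#-Xmx}") )) ;;
            esac
        done
    done
    echo "$total"
    [ "$total" -le "$host_mb" ]
}
```

For the three JAVA_OPTS strings above on the 16GB host, check_heap_budget 16384 prints 24576 and returns 1.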

5.2 Executor Settings

# Edit the Master configuration
# vi /data/dolphinscheduler/conf/config/master.properties

# Tune the Master parameters
master.exec.threads=200
master.heartbeat.interval=20
master.task.commit.retryTimes=3
master.task.commit.retryInterval=500
master.max.cpuload.avg=80
master.reserved.memory=0.2

# Edit the Worker configuration
# vi /data/dolphinscheduler/conf/config/worker.properties

# Tune the Worker parameters
worker.exec.threads=200
worker.heartbeat.interval=20
worker.max.cpuload.avg=80
worker.reserved.memory=0.2
worker.groups=default,bigdata,etl

# Verify the configuration
# cat /data/dolphinscheduler/conf/config/master.properties | grep -v "^#" | grep -v "^$"
master.listen.port=5678
master.exec.threads=200
master.heartbeat.interval=20
master.task.commit.retryTimes=3
master.task.commit.retryInterval=500
master.max.cpuload.avg=80
master.reserved.memory=0.2

5.3 Database Connection Pool Tuning

# Edit the API Server configuration
# vi /data/dolphinscheduler/conf/config/application.yaml

# Tune the database connection pool
spring:
  datasource:
    url: jdbc:mysql://192.168.1.51:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&serverTimezone=Asia/Shanghai&useSSL=false&allowPublicKeyRetrieval=true
    username: dolphinscheduler
    password: dolphinscheduler123
    driver-class-name: com.mysql.cj.jdbc.Driver
    hikari:
      maximum-pool-size: 100
      minimum-idle: 20
      idle-timeout: 30000
      max-lifetime: 1800000
      connection-timeout: 30000
      validation-timeout: 5000
      connection-test-query: SELECT 1

# Verify the configuration
# grep -A 7 "hikari" /data/dolphinscheduler/conf/config/application.yaml
    hikari:
      maximum-pool-size: 100
      minimum-idle: 20
      idle-timeout: 30000
      max-lifetime: 1800000
      connection-timeout: 30000
      validation-timeout: 5000
      connection-test-query: SELECT 1
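maximum-pool-size is a per-process limit. Every JVM that loads this datasource configuration may open up to 100 connections, and MySQL's max_connections must cover all of them. The number of services sharing the configuration depends on your layout, so the hypothetical helper below takes that count as a parameter. The default headroom of 50 is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: worst-case MySQL connection count if SERVICES JVMs each
# open a full Hikari pool of POOL_SIZE connections, plus HEADROOM for admin
# sessions and tools (default 50, an arbitrary assumption).
required_mysql_connections() {
    local pool_size=$1 services=$2 headroom=${3:-50}
    echo $(( pool_size * services + headroom ))
}

# Compare the result with the server setting, for example:
#   mysql -u root -p -N -e "SHOW VARIABLES LIKE 'max_connections'"
```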

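5.4 Logging Settings

# Edit the logback configuration
# vi /data/dolphinscheduler/conf/logback.xml

<configuration>
    <property name="log.path" value="/data/dolphinscheduler/logs"/>

    <!-- File output: roll daily, keep 7 days of history, cap the total size at 1GB -->
    <appender name="FILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${log.path}/dolphinscheduler.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${log.path}/dolphinscheduler.%d{yyyy-MM-dd}.log</fileNamePattern>
            <maxHistory>7</maxHistory>
            <totalSizeCap>1GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Console output -->
    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Log level -->
    <root level="INFO">
        <appender-ref ref="FILE"/>
        <appender-ref ref="STDOUT"/>
    </root>
</configuration>

# Edit the API Server logging configuration
# vi /data/dolphinscheduler/conf/logback-spring.xml

<configuration>
    <property name="log.path" value="/data/dolphinscheduler/logs"/>

    <!-- File output: roll daily, keep 7 days of history, cap the total size at 1GB -->
    <appender name="APIFILE" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>${log.path}/dolphinscheduler-api.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
            <fileNamePattern>${log.path}/dolphinscheduler-api.%d{yyyy-MM-dd}.log</fileNamePattern>
            <maxHistory>7</maxHistory>
            <totalSizeCap>1GB</totalSizeCap>
        </rollingPolicy>
        <encoder>
            <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Log level -->
    <root level="INFO">
        <appender-ref ref="APIFILE"/>
    </root>
</configuration>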

6. DolphinScheduler Web Console

DolphinScheduler provides a web console for managing and monitoring workflows. Its configuration and use are described below.

6.1 Access the Web Console

# Start the DolphinScheduler services
# cd /data/dolphinscheduler
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

# Access the web console
# Open http://dolphinscheduler01.fgedu.net.cn:8888 in a browser

# Verify that the web console responds
# curl -I http://localhost:8888
HTTP/1.1 200 OK
Date: Fri, 05 Apr 2024 10:00:00 GMT
Content-Type: text/html
Content-Length: 12345
Server: Node.js/14.18.1

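6.2 Web Console Features

Main features of the web console:
1. Project management: create and manage projects
2. Workflow management: create, edit, run and monitor workflows
3. Resource management: manage files, UDFs and other resources
4. Environment management: manage execution environments
5. Tenant management: manage tenant information
6. User management: manage users and permissions
7. Alert management: configure alert rules and notifications
8. Monitoring center: view system status and execution details

Web console login:
- Username: admin
- Password: dolphinscheduler123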

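6.3 Web Console Security

# Edit the UI configuration
# vi /data/dolphinscheduler/ui/config/config.js

// Set the API address
window.SITE_CONFIG = {
  baseUrl: 'http://localhost:12345'
}

# Configure user permissions
# Web console -> Security Center -> User Management
# Create users and grant permissions

# Configure tenants
# Web console -> Security Center -> Tenant Management
# Create a tenant

# Configure alerts
# Web console -> Security Center -> Alert Group Management
# Create an alert group and configure its notification method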

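7. DolphinScheduler Workflows in Practice

This section shows how to create and run a DolphinScheduler workflow.

7.1 Create a Workflow Definition

# Log in to the web console
# Open http://dolphinscheduler01.fgedu.net.cn:8888 in a browser
# Log in with username admin and password dolphinscheduler123

# Create a project
# Click "Project Management" -> "Create Project"
# Project name: WordCount
# Description: WordCount Example
# Click "OK"

# Create a workflow
# Click the project name "WordCount"
# Click "Workflow Definition" -> "Create Workflow"
# Workflow name: WordCount Flow
# Click "OK"

# Add task nodes
# Drag a "Shell" node onto the canvas
# Task name: prepare-input
# Command: hdfs dfs -mkdir -p /user/dolphinscheduler/wordcount/input

# Drag a "Shell" node onto the canvas
# Task name: copy-input
# Command: hdfs dfs -copyFromLocal -f /data/dolphinscheduler/resources/input.txt /user/dolphinscheduler/wordcount/input/

# Drag a "Shell" node onto the canvas
# Task name: wordcount
# Command: hadoop jar /data/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /user/dolphinscheduler/wordcount/input /user/dolphinscheduler/wordcount/output
# (MapReduce fails if the output directory already exists, so remove it before re-running the workflow)

# Drag a "Shell" node onto the canvas
# Task name: check-output
# Command: hdfs dfs -cat /user/dolphinscheduler/wordcount/output/part-r-00000

# Connect the task nodes
# Drag an arrow from prepare-input to copy-input
# Drag an arrow from copy-input to wordcount
# Drag an arrow from wordcount to check-output

# Save the workflow
# Click "Save"
# Version note: v1.0
# Click "OK"

# Upload the resource file (do this before running the workflow; copy-input reads
# /data/dolphinscheduler/resources/input.txt, so the file must be present at that path)
# Click "Resources" -> "Upload File"
# File: input.txt
# Content: hello world hello dolphinscheduler
# Click "OK"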

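7.2 Run the Workflow

# Run the workflow
# Click the workflow definition "WordCount Flow"
# Click "Run"
# Click "OK"

# Monitor the run
# Click "Workflow Instance"
# Click the instance ID
# View the execution status and logs

# Sample output: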
2024-04-05 10:00:00 INFO ShellTask – [taskAppId=TASK-2-1-1, executePath=/data/dolphinscheduler/exec/process/2/1/1] task execute start
2024-04-05 10:00:01 INFO ShellTask – [taskAppId=TASK-2-1-1, executePath=/data/dolphinscheduler/exec/process/2/1/1] task execute success, return code: 0
2024-04-05 10:00:01 INFO ShellTask – [taskAppId=TASK-2-1-2, executePath=/data/dolphinscheduler/exec/process/2/1/2] task execute start
2024-04-05 10:00:02 INFO ShellTask – [taskAppId=TASK-2-1-2, executePath=/data/dolphinscheduler/exec/process/2/1/2] task execute success, return code: 0
2024-04-05 10:00:02 INFO ShellTask – [taskAppId=TASK-2-1-3, executePath=/data/dolphinscheduler/exec/process/2/1/3] task execute start
2024-04-05 10:00:30 INFO ShellTask – [taskAppId=TASK-2-1-3, executePath=/data/dolphinscheduler/exec/process/2/1/3] task execute success, return code: 0
2024-04-05 10:00:30 INFO ShellTask – [taskAppId=TASK-2-1-4, executePath=/data/dolphinscheduler/exec/process/2/1/4] task execute start
2024-04-05 10:00:31 INFO ShellTask – [taskAppId=TASK-2-1-4, executePath=/data/dolphinscheduler/exec/process/2/1/4] task execute success, return code: 0
2024-04-05 10:00:31 INFO WorkflowExecuteThread – [workflowAppId=PROCESS-2-1] workflow execute success

# Verify the output (WordCount emits the words sorted by key)
# hdfs dfs -cat /user/dolphinscheduler/wordcount/output/part-r-00000
dolphinscheduler 1
hello 2
world 1
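To know in advance what part-r-00000 should contain, you can reproduce the count locally. This coreutils sketch uses a hypothetical name, local_wordcount, and is not part of the Hadoop examples. It splits on whitespace and sorts the words in byte order, the same order WordCount uses for its Text keys. It prints word<TAB>count lines, since a tab is the actual separator in the MapReduce output file.

```shell
#!/bin/sh
# Compute word counts locally: split on whitespace, sort in byte order
# (LC_ALL=C), count duplicates, and print "word<TAB>count" per line.
local_wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | LC_ALL=C sort | uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}

# Example:
#   echo "hello world hello dolphinscheduler" | local_wordcount
```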

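7.3 Create a Scheduled Workflow

# Configure the schedule
# Click the workflow definition "WordCount Flow"
# Click "Schedule"
# Click "Create Schedule"
# Schedule name: WordCount Schedule
# Trigger type: CRON
# CRON expression: 0 0 0 * * ? * (Quartz syntax: seconds, minutes, hours, day of month,
#   month, day of week, optional year; this runs once a day at midnight)
# Start time: 2024-04-05 00:00:00
# Click "OK"

# View the schedule
# Click "Schedule Management"
# View the list of schedules
# Click a schedule ID to see its details

# Trigger the scheduled workflow manually
# Click the schedule "WordCount Schedule"
# Click "Run Now"
# Click "OK"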
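DolphinScheduler's scheduling is built on Quartz. Quartz cron expressions start with a seconds field and have 6 or 7 fields, the seventh (year) being optional. This differs from the 5-field Unix crontab format used for the backup and monitoring jobs later in this article. The hypothetical check below is only a sketch. It counts fields to catch a crontab-style expression pasted into the schedule dialog, and it does not validate the field values.

```shell
#!/bin/sh
# Hypothetical helper: rough check that a cron expression has the 6 or 7
# whitespace-separated fields of the Quartz format rather than the 5 fields
# of a Unix crontab entry. Individual field values are not validated.
is_quartz_cron() {
    local n
    n=$(printf '%s\n' "$1" | awk '{ print NF }')
    [ "$n" -eq 6 ] || [ "$n" -eq 7 ]
}

# Example:
#   is_quartz_cron "0 0 0 * * ? *" && echo "looks like Quartz cron"
```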

8. DolphinScheduler Performance Tuning

In production, tune DolphinScheduler to schedule workflows more efficiently.

8.1 Executor Tuning

# Edit the Master configuration
# vi /data/dolphinscheduler/conf/config/master.properties

# Tune the Master parameters
master.exec.threads=300
master.heartbeat.interval=15
master.task.commit.retryTimes=3
master.task.commit.retryInterval=500
master.max.cpuload.avg=70
master.reserved.memory=0.1

# Edit the Worker configuration
# vi /data/dolphinscheduler/conf/config/worker.properties

# Tune the Worker parameters
worker.exec.threads=300
worker.heartbeat.interval=15
worker.max.cpuload.avg=70
worker.reserved.memory=0.1
worker.groups=default,bigdata,etl,spark,hive

# Restart the services
# cd /data/dolphinscheduler
# bash bin/stop-all.sh
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

8.2 Database Tuning

# MySQL tuning
# vi /etc/my.cnf

[mysqld]
binlog_format = ROW
innodb_buffer_pool_size = 4G
innodb_log_file_size = 1G
innodb_flush_log_at_trx_commit = 2
innodb_max_dirty_pages_pct = 75
innodb_file_per_table = 1
max_connections = 1000
# Do not set query_cache_size or query_cache_type: the query cache was removed
# in MySQL 8.0, and mysqld refuses to start with these options

# Restart the MySQL service
# systemctl restart mysqld

# Optimize the main DolphinScheduler tables
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "USE dolphinscheduler; OPTIMIZE TABLE t_ds_process_definition, t_ds_process_instance, t_ds_task_instance;"

# Show the 10 largest tables
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "USE dolphinscheduler; SELECT table_name, round(((data_length + index_length) / 1024 / 1024), 2) AS 'Size (MB)' FROM information_schema.tables WHERE table_schema = 'dolphinscheduler' ORDER BY (data_length + index_length) DESC LIMIT 10;"
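Options that MySQL 8.0 no longer recognizes make mysqld refuse to start. The query cache settings are an example, since the query cache was removed in 8.0. It is worth scanning my.cnf before restarting. The following is a minimal sketch. removed_mysql80_options is a hypothetical helper that only looks for query_cache_* option names; it is not a general my.cnf validator.

```shell
#!/bin/sh
# Hypothetical helper: list query cache option names (removed in MySQL 8.0)
# that appear at the start of a line in a my.cnf-style file, accepting both
# underscore and dash spellings. Prints nothing if none are found.
removed_mysql80_options() {
    grep -Eo '^[[:space:]]*query[_-]cache[_-][a-z_-]+' "$1" | sed 's/^[[:space:]]*//'
}

# Example:
#   removed_mysql80_options /etc/my.cnf
```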

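8.3 Memory Tuning

# Edit the Master memory settings
# vi /data/dolphinscheduler/bin/start-master.sh

# Adjust the JVM options to the server's memory (JDK 8 ignores -XX:MaxPermSize; use -XX:MaxMetaspaceSize)
export JAVA_OPTS="-Xms8G -Xmx16G -XX:MaxMetaspaceSize=2048m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Edit the Worker memory settings
# vi /data/dolphinscheduler/bin/start-worker.sh

# Adjust the JVM options to the server's memory
export JAVA_OPTS="-Xms8G -Xmx16G -XX:MaxMetaspaceSize=2048m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Edit the API Server memory settings
# vi /data/dolphinscheduler/bin/start-api.sh

# Adjust the JVM options to the server's memory
export JAVA_OPTS="-Xms8G -Xmx16G -XX:MaxMetaspaceSize=2048m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/data/dolphinscheduler/logs"

# Note: three 16GB heaps need a host with well over 48GB of RAM; on the 16GB and 32GB
# hosts described in section 2, scale -Xms/-Xmx down so that the combined heaps fit

# Restart the services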
# cd /data/dolphinscheduler
# bash bin/stop-all.sh
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

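8.4 Workflow Optimization

# Workflow optimization tips:
1. Keep workflows simple and avoid excessive numbers of nodes
2. Run independent tasks as parallel nodes
3. Set reasonable task timeouts
4. Process data incrementally instead of reprocessing everything
5. Tune task execution parameters
6. Schedule workflows outside peak hours
7. Choose task types suited to the work to improve efficiency
8. Purge historical execution data regularly

# Purge workflow instances older than 30 days (back up the database first; this
# statement does not remove the related rows in t_ds_task_instance)
# mysql -u dolphinscheduler -pdolphinscheduler123 -e "USE dolphinscheduler; DELETE FROM t_ds_process_instance WHERE end_time < DATE_SUB(NOW(), INTERVAL 30 DAY);"

# Check system status
# curl -s http://localhost:12345/dolphinscheduler/monitor/statistics
{"masterCount":1,"workerCount":1,"projectCount":10,"processInstanceCount":1000,"taskInstanceCount":5000}

Recommendations: tune the executor parameters and memory settings to the server hardware and the number of workflows. Purge historical execution data regularly so that oversized tables do not degrade database performance. Design workflows without unnecessary nodes.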

9. DolphinScheduler Upgrade and Migration

This section covers version upgrades and data migration.

9.1 Version Upgrade

# Back up the current configuration
# cp -r /data/dolphinscheduler/conf /backup/dolphinscheduler_conf_$(date +%Y%m%d)

# Back up the database
# mysqldump -u dolphinscheduler -pdolphinscheduler123 dolphinscheduler > /backup/dolphinscheduler_db_$(date +%Y%m%d).sql

# Stop the running services
# cd /data/dolphinscheduler
# bash bin/stop-all.sh

# Download the new release (replace <new-version> with the target version, which must be newer than the installed 3.1.0)
# cd /tmp
# wget https://archive.apache.org/dist/dolphinscheduler/<new-version>/apache-dolphinscheduler-<new-version>-bin.tar.gz

# Extract the new release
# tar -xzf apache-dolphinscheduler-<new-version>-bin.tar.gz -C /data/
# mv /data/apache-dolphinscheduler-<new-version>-bin /data/dolphinscheduler_new

# Restore the configuration files (compare them with the new release's defaults,
# because configuration keys can change between versions)
# cp -r /backup/dolphinscheduler_conf_$(date +%Y%m%d)/* /data/dolphinscheduler_new/conf/

# Upgrade the database schema
# cd /data/dolphinscheduler_new
# bash bin/upgrade-database.sh

# Start the services
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

# Verify the upgrade
# curl -I http://localhost:8888
HTTP/1.1 200 OK

# Verify service status
# ps -ef | grep dolphinscheduler
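The upgrade steps above only make sense with a release newer than the installed one. The following is a minimal sketch of a guard for scripted upgrades. It assumes GNU sort for sort -V version ordering, and is_upgrade is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if NEW is a strictly higher version than
# OLD, using GNU sort's version ordering (sort -V).
is_upgrade() {
    local old=$1 new=$2 highest
    [ "$old" != "$new" ] || return 1
    highest=$(printf '%s\n%s\n' "$old" "$new" | sort -V | tail -n 1)
    [ "$highest" = "$new" ]
}

# Example (NEW_VERSION is a hypothetical variable holding the target release):
#   is_upgrade 3.1.0 "$NEW_VERSION" || { echo "not an upgrade"; exit 1; }
```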

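9.2 Configuration Migration

# Export the configuration and resources
# cp -r /data/dolphinscheduler/conf /backup/dolphinscheduler_conf_export
# cp -r /data/dolphinscheduler/resources /backup/dolphinscheduler_resources_export

# On the new server, install DolphinScheduler, create the database and user as in
# section 3.5, and copy the /backup files over. Then import the configuration.
# (Copying the directory contents with /. avoids nesting the export directory inside conf/.
# Update the database address in application.yaml if the new server's IP differs.)
# cp -r /backup/dolphinscheduler_conf_export/. /data/dolphinscheduler/conf/
# cp -r /backup/dolphinscheduler_resources_export/. /data/dolphinscheduler/resources/

# Import the database dump taken in section 9.1
# mysql -u dolphinscheduler -pdolphinscheduler123 dolphinscheduler < /backup/dolphinscheduler_db_20240405.sql

# Start the services
# cd /data/dolphinscheduler
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

# Verify the web console
# curl -I http://localhost:8888
HTTP/1.1 200 OK

# Verify the resources
# ls -la /data/dolphinscheduler/resources/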

10. DolphinScheduler Backup and Restore

This section covers backing up and restoring DolphinScheduler.

10.1 Configuration Backup

# Back up the configuration and resources
# tar -czf /backup/dolphinscheduler_config_$(date +%Y%m%d).tar.gz -C /data dolphinscheduler/conf dolphinscheduler/resources

# Back up the logs
# tar -czf /backup/dolphinscheduler_logs_$(date +%Y%m%d).tar.gz -C /data dolphinscheduler/logs

# Verify the backup file
# ls -la /backup/dolphinscheduler_config_$(date +%Y%m%d).tar.gz
-rw-r--r-- 1 root root 1234567 Apr 5 10:00 /backup/dolphinscheduler_config_20240405.tar.gz

10.2 Database Backup

# Back up the database
# mysqldump -u dolphinscheduler -pdolphinscheduler123 dolphinscheduler > /backup/dolphinscheduler_db_$(date +%Y%m%d).sql

# Compress the backup
# gzip /backup/dolphinscheduler_db_$(date +%Y%m%d).sql

# Verify the backup file
# ls -la /backup/dolphinscheduler_db_$(date +%Y%m%d).sql.gz
-rw-r--r-- 1 root root 1234567 Apr 5 10:00 /backup/dolphinscheduler_db_20240405.sql.gz

# Scheduled backup script
# mkdir -p /data/dolphinscheduler/scripts
# vi /data/dolphinscheduler/scripts/backup_dolphinscheduler.sh

#!/bin/bash
# Stop at the first failing step so that a failed dump is not reported as a success
set -e

BACKUP_DIR="/backup/dolphinscheduler_backups/$(date +%Y%m%d)"
DOLPHINSCHEDULER_HOME="/data/dolphinscheduler"

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Back up the configuration and resources
cp -r "$DOLPHINSCHEDULER_HOME/conf" "$BACKUP_DIR/"
cp -r "$DOLPHINSCHEDULER_HOME/resources" "$BACKUP_DIR/"

# Back up the database
mysqldump -u dolphinscheduler -pdolphinscheduler123 dolphinscheduler > "$BACKUP_DIR/dolphinscheduler_db.sql"
gzip "$BACKUP_DIR/dolphinscheduler_db.sql"

# Back up the logs
cp -r "$DOLPHINSCHEDULER_HOME/logs" "$BACKUP_DIR/"

echo "DolphinScheduler backup completed successfully: $BACKUP_DIR"

# Make the script executable
# chmod +x /data/dolphinscheduler/scripts/backup_dolphinscheduler.sh

# Add a cron job
# crontab -e
0 2 * * * /data/dolphinscheduler/scripts/backup_dolphinscheduler.sh
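The nightly job above creates a new dated directory every night, including a full copy of the logs. It never removes old ones, so /backup will eventually fill up. The following is a minimal retention sketch that assumes GNU find. prune_backups is a hypothetical helper, and the 7-day default is an arbitrary choice to adjust to your retention policy.

```shell
#!/bin/sh
# Hypothetical helper: delete the dated backup directories directly under BASE
# whose modification time is more than KEEP_DAYS days old (default 7),
# printing each removed path first.
prune_backups() {
    local base=$1 keep_days=${2:-7}
    find "$base" -mindepth 1 -maxdepth 1 -type d -mtime +"$keep_days" \
        -print -exec rm -rf {} +
}

# Example, e.g. appended to the backup script or run from its own cron entry:
#   prune_backups /backup/dolphinscheduler_backups 7
```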

10.3 Restore

# Stop the DolphinScheduler services
# cd /data/dolphinscheduler
# bash bin/stop-all.sh

# Restore the database from the compressed dump created in section 10.2
# gunzip < /backup/dolphinscheduler_db_20240405.sql.gz | mysql -u dolphinscheduler -pdolphinscheduler123 dolphinscheduler

# Restore the configuration and resources from the archive created in section 10.1
# (the archive stores dolphinscheduler/conf and dolphinscheduler/resources relative to /data)
# tar -xzf /backup/dolphinscheduler_config_20240405.tar.gz -C /data

# Start the services
# bash bin/start-master.sh
# bash bin/start-worker.sh
# bash bin/start-api.sh
# bash bin/start-ui.sh

# Verify the restore
# curl -I http://localhost:8888
HTTP/1.1 200 OK

# Verify the resources
# ls -la /data/dolphinscheduler/resources/
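Before you feed a compressed dump to mysql during a restore, it is cheap to confirm that the archive is intact and really contains mysqldump output. The following is a minimal sketch. verify_dump is a hypothetical helper. It relies on the default mysqldump header, whose first line contains "MySQL dump"; that line is absent with --compact or --skip-comments.

```shell
#!/bin/sh
# Hypothetical helper: check that DUMP is a valid gzip file whose first line
# looks like a mysqldump header. Prints OK, or a reason and returns 1.
verify_dump() {
    local dump=$1 header
    if ! gzip -t "$dump" 2>/dev/null; then
        echo "corrupt or not gzip: $dump"
        return 1
    fi
    # head may close the pipe early; ignore gzip's resulting exit status
    header=$(gzip -dc "$dump" | head -n 1) || true
    case $header in
        *"MySQL dump"*) echo "OK" ;;
        *) echo "no mysqldump header: $dump"; return 1 ;;
    esac
}

# Example:
#   verify_dump /backup/dolphinscheduler_db_20240405.sql.gz
```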

10.4 Monitoring Script

# Create the DolphinScheduler monitoring script
# vi /data/dolphinscheduler/scripts/dolphinscheduler_monitor.sh

#!/bin/bash
DOLPHINSCHEDULER_HOME="/data/dolphinscheduler"
LOG_FILE="/data/dolphinscheduler/logs/dolphinscheduler_monitor.log"
ALERT_EMAIL="admin@fgedu.net.cn"

check_master_status() {
    echo "$(date): Checking master status..." >> "$LOG_FILE"
    status=$(ps -ef | grep MasterServer | grep -v grep | wc -l)
    if [ "$status" -ge 1 ]; then
        echo "$(date): Master status: OK" >> "$LOG_FILE"
    else
        echo "$(date): Master status: FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler master failed" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

check_worker_status() {
    echo "$(date): Checking worker status..." >> "$LOG_FILE"
    status=$(ps -ef | grep WorkerServer | grep -v grep | wc -l)
    if [ "$status" -ge 1 ]; then
        echo "$(date): Worker status: OK" >> "$LOG_FILE"
    else
        echo "$(date): Worker status: FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler worker failed" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

check_api_status() {
    echo "$(date): Checking API server status..." >> "$LOG_FILE"
    status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:12345)
    if [ "$status" -eq 200 ]; then
        echo "$(date): API server status: OK" >> "$LOG_FILE"
    else
        echo "$(date): API server status: FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler API server failed with status: $status" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

check_ui_status() {
    echo "$(date): Checking UI server status..." >> "$LOG_FILE"
    status=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8888)
    if [ "$status" -eq 200 ]; then
        echo "$(date): UI server status: OK" >> "$LOG_FILE"
    else
        echo "$(date): UI server status: FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler UI server failed with status: $status" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

check_database_connection() {
    echo "$(date): Checking database connection..." >> "$LOG_FILE"
    if mysql -u dolphinscheduler -pdolphinscheduler123 -e "SELECT 1" > /dev/null 2>&1; then
        echo "$(date): Database connection OK" >> "$LOG_FILE"
    else
        echo "$(date): Database connection FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler database connection failed" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

check_zookeeper_connection() {
    echo "$(date): Checking zookeeper connection..." >> "$LOG_FILE"
    if /data/zookeeper/bin/zkCli.sh -server localhost:2181 ls / > /dev/null 2>&1; then
        echo "$(date): Zookeeper connection OK" >> "$LOG_FILE"
    else
        echo "$(date): Zookeeper connection FAILED" >> "$LOG_FILE"
        echo "DolphinScheduler zookeeper connection failed" | mail -s "DolphinScheduler Alert" "$ALERT_EMAIL"
    fi
}

main() {
    check_master_status
    check_worker_status
    check_api_status
    check_ui_status
    check_database_connection
    check_zookeeper_connection
}

main

# Make the script executable
# chmod +x /data/dolphinscheduler/scripts/dolphinscheduler_monitor.sh

# Add a cron job
# crontab -e
*/15 * * * * /data/dolphinscheduler/scripts/dolphinscheduler_monitor.sh

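Recommendations: back up the DolphinScheduler configuration and database regularly, with one full backup per day. Run the monitoring script every 15 minutes so that problems are detected and handled promptly. Always stop the DolphinScheduler services before a restore to avoid inconsistent data.

The steps above cover DolphinScheduler installation and configuration, performance tuning, upgrade and migration, and backup and restore. As a workflow scheduler, DolphinScheduler efficiently coordinates and manages the execution of many kinds of jobs, and it is a key task scheduling component of a big data platform.

This article was compiled and published by Fengge Tutorial (风哥教程) for learning and testing purposes only; please credit the source when reposting: http://www.fgedu.net.cn/10327.html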
