1. Superset概述与环境规划
Apache Superset是一个开源的数据可视化和探索平台,提供丰富的可视化图表类型、交互式仪表板、SQL IDE等功能。Superset支持多种数据源,包括MySQL、PostgreSQL、Oracle、Hive、Presto、Impala等,是大数据可视化分析的理想选择。更多学习教程www.fgedu.net.cn
1.1 Superset版本说明
Superset目前主要版本为3.x系列,本教程以Superset 3.1为例进行详细讲解。Superset 3.x版本相比2.x版本在性能、稳定性和功能方面都有显著提升。
$ superset version
Apache Superset 3.1.0
# 查看Python版本
$ python –version
Python 3.10.13
# 查看系统信息
$ cat /etc/os-release
NAME=”Oracle Linux Server”
VERSION=”8.9″
ID=”ol”
PRETTY_NAME=”Oracle Linux Server 8.9″
1.2 环境规划
本次安装环境规划如下:
主机名:superset01.fgedu.net.cn
IP地址:192.168.1.51
端口:8088(Web服务)
Superset版本:3.1.0
Python版本:3.10.13
数据库:PostgreSQL 15(元数据存储)
Redis版本:7.2(缓存)
安装目录:/data/superset
数据目录:/data/superset/data
日志目录:/data/superset/logs
数据源连接:
MySQL:192.168.1.51:3306
PostgreSQL:192.168.1.51:5432
Impala:192.168.1.51:21050
Hive:192.168.1.51:10000
2. 硬件环境要求
Superset对硬件资源要求适中,以下是生产环境的硬件配置建议。学习交流加群风哥微信: itpux-com
2.1 最低硬件要求
# free -h
total used free shared buff/cache available
Mem: 32G 4.5G 26G 1.2G 1.2G 27G
Swap: 16G 0B 16G
# 检查磁盘空间
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 12G 39G 24% /
/dev/sdb1 500G 50G 450G 10% /data
# 检查CPU核心数
# nproc
16
# 检查系统架构
# uname -m
x86_64
2.2 网络配置要求
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
inet 192.168.1.51/24 brd 192.168.1.255 scope global eth0
# 检查主机名解析
# hostname -f
superset01.fgedu.net.cn
# 配置/etc/hosts
# cat /etc/hosts
127.0.0.1 localhost
192.168.1.51 superset01.fgedu.net.cn superset01
3. 操作系统配置
Superset运行在Linux操作系统上,需要对系统进行一系列配置。学习交流加群风哥QQ113257174
3.1 关闭防火墙和SELinux
# getenforce
Disabled
# 关闭SELinux(如未关闭)
# vi /etc/selinux/config
SELINUX=disabled
# 检查防火墙状态
# systemctl status firewalld
# 关闭防火墙(生产环境建议开放特定端口)
# systemctl stop firewalld
# systemctl disable firewalld
# 或者开放Superset所需端口
# firewall-cmd –permanent –add-port=8088/tcp
# firewall-cmd –reload
3.2 系统参数优化
# vi /etc/sysctl.conf
# 添加以下参数
fs.file-max = 6815744
vm.swappiness = 10
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 65536
# 使配置生效
# sysctl -p
# 配置文件描述符限制
# vi /etc/security/limits.conf
# 添加以下内容
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
# 验证配置
# ulimit -n
65536
3.3 安装依赖包
# yum install -y gcc gcc-c++ libffi-devel \
python3-devel python3-pip \
openssl-devel cyrus-sasl-devel \
openldap-devel postgresql-devel \
libxml2-devel libxslt-devel
# 安装额外工具
# yum install -y wget curl vim git
# 验证Python版本
$ python3 –version
Python 3.9.18
# 升级pip
# pip3 install –upgrade pip setuptools wheel
# 输出示例:
Requirement already satisfied: pip in /usr/lib/python3.9/site-packages (21.2.3)
Collecting pip
Downloading pip-24.0-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.2.3
Uninstalling pip-21.2.3:
Successfully uninstalled pip-21.2.3
Successfully installed pip-24.0
4. Python环境配置
Superset基于Python开发,需要配置合适的Python环境。更多学习教程公众号风哥教程itpux_com
4.1 安装Python 3.10
# yum install -y python3.10 python3.10-pip python3.10-devel
# 验证Python版本
$ python3.10 –version
Python 3.10.13
# 创建虚拟环境目录
# mkdir -p /data/superset
# cd /data/superset
# 创建Python虚拟环境
# python3.10 -m venv venv
# 激活虚拟环境
# source /data/superset/venv/bin/activate
# 验证虚拟环境
(venv) $ python –version
Python 3.10.13
(venv) $ which python
/data/superset/venv/bin/python
# 升级虚拟环境中的pip
(venv) $ pip install –upgrade pip setuptools wheel
# 输出示例:
Requirement already satisfied: pip in /data/superset/venv/lib/python3.10/site-packages (24.0)
Requirement already satisfied: setuptools in /data/superset/venv/lib/python3.10/site-packages (69.2.0)
Requirement already satisfied: wheel in /data/superset/venv/lib/python3.10/site-packages (0.43.0)
4.2 安装数据库依赖
# yum install -y postgresql-server postgresql-contrib
# 初始化PostgreSQL数据库
# postgresql-setup –initdb
# 启动PostgreSQL服务
# systemctl start postgresql
# systemctl enable postgresql
# 创建Superset数据库
# su – postgres
$ psql
postgres=# CREATE DATABASE superset;
postgres=# CREATE USER superset WITH PASSWORD ‘fgedu_superset_2024’;
postgres=# GRANT ALL PRIVILEGES ON DATABASE superset TO superset;
postgres=# \q
# 验证数据库连接
$ psql -U superset -d superset -h localhost
Password for user superset:
psql (15.5)
Type “help” for help.
superset=> \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
———–+———-+———-+————-+————-+———————–
postgres | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
superset | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =Tc/postgres +
| | | | | postgres=CTc/postgres+
| | | | | superset=CTc/postgres
template0 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF-8 | en_US.UTF-8 | =c/postgres +
| | | | | postgres=CTc/postgres
(4 rows)
superset=> \q
4.3 安装Redis缓存
# yum install -y redis
# 配置Redis
# vi /etc/redis.conf
# 修改以下配置
bind 127.0.0.1
port 6379
maxmemory 4gb
maxmemory-policy allkeys-lru
# 启动Redis服务
# systemctl start redis
# systemctl enable redis
# 验证Redis连接
# redis-cli ping
PONG
# 检查Redis状态
# redis-cli info server
# Server
redis_version:7.2.3
redis_mode:standalone
os:Linux 5.4.17-2136.302.7.2.el8uek.x86_64 x86_64
arch_bits:64
tcp_port:6379
uptime_in_seconds:60
5. Superset安装部署
完成环境配置后,开始安装Superset服务。from:www.itpux.com
5.1 安装Superset
# source /data/superset/venv/bin/activate
# 安装Superset及依赖
(venv) $ pip install apache-superset==3.1.0
# 输出示例:
Collecting apache-superset==3.1.0
Downloading apache_superset-3.1.0-py3-none-any.whl (45.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.2/45.2 MB 25.6 MB/s eta 0:00:00
Collecting flask>=2.2.0
Downloading flask-3.0.2-py3-none-any.whl (101 kB)
…
Successfully installed apache-superset-3.1.0 flask-3.0.2 …
# 安装数据库驱动
(venv) $ pip install psycopg2-binary
(venv) $ pip install redis
(venv) $ pip install mysqlclient
(venv) $ pip install pyhive
(venv) $ pip install impala
(venv) $ pip install sqlalchemy-clickhouse
# 验证安装
(venv) $ superset version
Apache Superset 3.1.0
5.2 配置Superset
# mkdir -p /data/superset/{data,logs,conf}
# 创建配置文件
# vi /data/superset/conf/superset_config.py
# 添加以下配置
import os
# 基础配置
SECRET_KEY = ‘fgedu_superset_secret_key_2024’
APP_NAME = ‘Superset’
APP_ICON = ‘/static/assets/images/superset-logo-horiz.png’
# 数据库配置
SQLALCHEMY_DATABASE_URI = ‘postgresql://superset:fgedu_superset_2024@localhost:5432/superset’
# Redis缓存配置
CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 300,
‘CACHE_KEY_PREFIX’: ‘superset_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/0’,
}
DATA_CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 86400,
‘CACHE_KEY_PREFIX’: ‘superset_data_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/1’,
}
FILTER_STATE_CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 86400,
‘CACHE_KEY_PREFIX’: ‘superset_filter_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/2’,
}
EXPLORE_FORM_DATA_CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 86400,
‘CACHE_KEY_PREFIX’: ‘superset_explore_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/3’,
}
# Celery配置
class CeleryConfig:
broker_url = ‘redis://localhost:6379/4’
result_backend = ‘redis://localhost:6379/5’
imports = (‘superset.sql_lab’, ‘superset.tasks.scheduler’)
task_annotations = {
‘sql_lab.get_sql_results’: {
‘rate_limit’: ‘100/s’,
},
}
beat_schedule = {
‘reports.scheduler’: {
‘task’: ‘reports.scheduler’,
‘schedule’: crontab(minute=’*’, hour=’*’),
},
‘reports.prune_log’: {
‘task’: ‘reports.prune_log’,
‘schedule’: crontab(minute=0, hour=0),
},
}
CELERY_CONFIG = CeleryConfig
# SQL Lab配置
SQLLAB_CTAS_NO_LIMIT = True
SQLLAB_QUERY_COST_ESTIMATES_ENABLED = True
SQLLAB_TIMEOUT = 3600
SQLLAB_ASYNC_TIME_LIMIT_SEC = 3600
# 安全配置
WTF_CSRF_ENABLED = True
WTF_CSRF_EXEMPT_LIST = [‘superset.views.core.log’]
WTF_CSRF_TIME_LIMIT = 60 * 60 * 24 * 365
SESSION_COOKIE_SECURE = False
SESSION_COOKIE_HTTPONLY = True
SESSION_COOKIE_SAMESITE = ‘Lax’
# 日志配置
ENABLE_TIME_ROTATE = True
TIME_ROTATE_LOG_LEVEL = ‘INFO’
FILENAME = ‘/data/superset/logs/superset.log’
ROLLOVER = ‘midnight’
INTERVAL = 1
BACKUP_COUNT = 30
# 其他配置
ROW_LIMIT = 100000
SUPERSET_WORKERS = 4
SUPERSET_CELERY_WORKERS = 4
ENABLE_PROXY_FIX = True
PREFERRED_URL_SCHEME = ‘http’
# 设置环境变量
# vi /etc/profile.d/superset.sh
export SUPERSET_HOME=/data/superset
export SUPERSET_CONFIG_PATH=/data/superset/conf/superset_config.py
# 使环境变量生效
# source /etc/profile.d/superset.sh
5.3 初始化Superset数据库
# source /data/superset/venv/bin/activate
# 设置环境变量
(venv) $ export FLASK_APP=superset
# 初始化数据库
(venv) $ superset db upgrade
# 输出示例:
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade -> 4e6a06bad7a8, Init
INFO [alembic.runtime.migration] Running upgrade 4e6a06bad7a8 -> 289ce07647b, Add encrypted password to databases
…
INFO [alembic.runtime.migration] Running upgrade 1a1d627ebd8e -> 3b626e2a6783, Sync db migration
INFO [alembic.runtime.migration] Running upgrade 3b626e2a6783 -> cca2f40d6f5d, add_extra_column_to_query
INFO [alembic.runtime.migration] Running upgrade cca2f40d6f5d -> 070c043f2a9b, add_grant_permision_to_dbs
# 创建管理员用户
(venv) $ superset fab create-admin
# 输出示例:
Username [admin]: admin
User first name [admin]: Admin
User last name [user]: User
Email [admin@superset.org]: admin@fgedu.net.cn
Password:
Repeat for confirmation:
Recognized Database Authentications.
Admin User admin created.
# 加载示例数据(可选)
(venv) $ superset load_examples
# 输出示例:
Loading examples into Superset…
Creating dashboards…
Creating charts…
Loading examples completed!
# 初始化缓存
(venv) $ superset init
# 输出示例:
Logging was configured successfully.
2024-04-05 10:00:00,000:INFO:superset.utils.logging_configurator:Logging was configured successfully.
2024-04-05 10:00:00,100:INFO:superset.security.manager:Syncing role definition
2024-04-05 10:00:00,200:INFO:superset.security.manager:Syncing Admin perms
2024-04-05 10:00:00,300:INFO:superset.security.manager:Syncing Alpha perms
2024-04-05 10:00:00,400:INFO:superset.security.manager:Syncing Gamma perms
2024-04-05 10:00:00,500:INFO:superset.security.manager:Syncing granter perms
2024-04-05 10:00:00,600:INFO:superset.security.manager:Syncing sql_lab perms
6. Superset服务启动
配置完成后,启动Superset服务。
6.1 启动Web服务
# source /data/superset/venv/bin/activate
# 启动Superset Web服务(开发模式)
(venv) $ superset run -p 8088 –with-threads –reload
# 输出示例:
* Serving Flask app ‘superset’
* Debug mode: off
* Running on http://0.0.0.0:8088
* Running on http://192.168.1.51:8088
# 生产环境使用Gunicorn启动
(venv) $ pip install gunicorn
(venv) $ gunicorn \
-w 4 \
-k gevent \
–timeout 120 \
-b 0.0.0.0:8088 \
–limit-request-line 0 \
–limit-request-fields 0 \
–limit-request-field_size 0 \
“superset.app:create_app()”
# 输出示例:
[2024-04-05 10:00:00 +0800] [12345] [INFO] Starting gunicorn 21.2.0
[2024-04-05 10:00:00 +0800] [12345] [INFO] Listening at: http://0.0.0.0:8088 (12345)
[2024-04-05 10:00:00 +0800] [12345] [INFO] Using worker: gevent
[2024-04-05 10:00:00 +0800] [12567] [INFO] Booting worker with pid: 12567
[2024-04-05 10:00:00 +0800] [12568] [INFO] Booting worker with pid: 12568
[2024-04-05 10:00:00 +0800] [12569] [INFO] Booting worker with pid: 12569
[2024-04-05 10:00:00 +0800] [12570] [INFO] Booting worker with pid: 12570
6.2 启动Celery Worker
(venv) $ celery –app=superset.tasks.celery_app:app worker –loglevel=info \
> /data/superset/logs/celery_worker.log 2>&1 &
# 输出示例:
[2024-04-05 10:00:00,000: INFO/MainProcess] Connected to redis://localhost:6379/4
[2024-04-05 10:00:00,100: INFO/MainProcess] mingle: searching for neighbors
[2024-04-05 10:00:01,000: INFO/MainProcess] mingle: all alone
[2024-04-05 10:00:01,100: INFO/MainProcess] celery@superset01.fgedu.net.cn ready.
# 启动Celery Beat(定时任务调度)
(venv) $ celery –app=superset.tasks.celery_app:app beat –loglevel=info \
> /data/superset/logs/celery_beat.log 2>&1 &
# 输出示例:
[2024-04-05 10:00:00,000: INFO/MainProcess] beat: Starting…
[2024-04-05 10:00:00,100: INFO/MainProcess] Scheduler: Sending due task reports.scheduler (reports.scheduler)
[2024-04-05 10:00:00,200: INFO/MainProcess] Scheduler: Sending due task reports.prune_log (reports.prune_log)
# 检查服务状态
$ ps -ef | grep -E ‘gunicorn|celery’ | grep -v grep
root 12345 1 2 10:00 ? 00:00:05 gunicorn: master [superset]
root 12567 12345 5 10:00 ? 00:00:10 gunicorn: worker [superset]
root 12568 12345 5 10:00 ? 00:00:10 gunicorn: worker [superset]
root 12569 12345 5 10:00 ? 00:00:10 gunicorn: worker [superset]
root 12570 12345 5 10:00 ? 00:00:10 gunicorn: worker [superset]
root 12789 1 3 10:01 ? 00:00:15 celery worker
root 12890 1 1 10:02 ? 00:00:05 celery beat
6.3 配置系统服务
# vi /etc/systemd/system/superset.service
[Unit]
Description=Apache Superset Web Server
After=network.target postgresql.service redis.service
[Service]
Type=simple
User=root
Environment=”SUPERSET_HOME=/data/superset”
Environment=”SUPERSET_CONFIG_PATH=/data/superset/conf/superset_config.py”
ExecStart=/data/superset/venv/bin/gunicorn \
-w 4 \
-k gevent \
–timeout 120 \
-b 0.0.0.0:8088 \
–limit-request-line 0 \
–limit-request-fields 0 \
–limit-request-field_size 0 \
“superset.app:create_app()”
ExecReload=/bin/kill -s HUP $MAINPID
ExecStop=/bin/kill -s QUIT $MAINPID
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
# 创建Celery Worker服务文件
# vi /etc/systemd/system/superset-celery-worker.service
[Unit]
Description=Apache Superset Celery Worker
After=network.target redis.service
[Service]
Type=simple
User=root
Environment=”SUPERSET_HOME=/data/superset”
Environment=”SUPERSET_CONFIG_PATH=/data/superset/conf/superset_config.py”
ExecStart=/data/superset/venv/bin/celery –app=superset.tasks.celery_app:app worker –loglevel=info
Restart=on-failure
RestartSec=5s
[Install]
WantedBy=multi-user.target
# 重载systemd配置
# systemctl daemon-reload
# 启动服务
# systemctl start superset
# systemctl start superset-celery-worker
# 设置开机自启
# systemctl enable superset
# systemctl enable superset-celery-worker
# 检查服务状态
# systemctl status superset
# 输出示例:
● superset.service – Apache Superset Web Server
Loaded: loaded (/etc/systemd/system/superset.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-04-05 10:00:00 CST; 1min ago
Main PID: 12345 (gunicorn)
Tasks: 5 (limit: 49143)
Memory: 512.0M
CGroup: /system.slice/superset.service
├─12345 gunicorn: master [superset]
├─12567 gunicorn: worker [superset]
├─12568 gunicorn: worker [superset]
├─12569 gunicorn: worker [superset]
└─12570 gunicorn: worker [superset]
7. Superset功能测试
完成安装后,需要进行功能测试验证Superset是否正常工作。
7.1 访问Web界面
$ netstat -tlnp | grep 8088
tcp 0 0 0.0.0.0:8088 0.0.0.0:* LISTEN 12345/gunicorn
# 访问Web界面
# 浏览器访问:http://192.168.1.51:8088
# 使用管理员账号登录
# 用户名:admin
# 密码:之前设置的密码
# 检查日志
$ tail -f /data/superset/logs/superset.log
2024-04-05 10:00:00,000:INFO:superset.views.core:User admin logged in
2024-04-05 10:00:05,000:INFO:superset.views.core:User admin accessed dashboard list
7.2 创建测试数据源
# 步骤:
# 1. 点击 “Data” -> “Databases”
# 2. 点击 “+ DATABASE” 按钮
# 3. 选择数据库类型(如PostgreSQL)
# 4. 填写连接信息:
# – Host: localhost
# – Port: 5432
# – Database: superset
# – Username: superset
# – Password: fgedu_superset_2024
# 5. 点击 “Test Connection” 测试连接
# 6. 点击 “Connect” 保存连接
# 通过命令行创建数据源
(venv) $ superset set-database-uri \
-d fgedudb_postgres \
-u “postgresql://superset:fgedu_superset_2024@localhost:5432/superset”
# 输出示例:
Database fgedudb_postgres created successfully
# 创建数据集
# 步骤:
# 1. 点击 “Data” -> “Datasets”
# 2. 点击 “+ DATASET” 按钮
# 3. 选择数据库和数据表
# 4. 点击 “Add” 保存
7.3 创建测试图表
# 步骤:
# 1. 点击 “Charts” -> “+ CHART”
# 2. 选择数据集
# 3. 选择可视化类型(如折线图、柱状图、饼图等)
# 4. 配置指标和维度
# 5. 点击 “Create Chart” 生成图表
# 6. 保存图表到仪表板
# 创建测试仪表板
# 步骤:
# 1. 点击 “Dashboards” -> “+ DASHBOARD”
# 2. 拖拽已创建的图表到仪表板
# 3. 调整布局和样式
# 4. 保存仪表板
# 查看创建的资源
(venv) $ superset list-dashboards
# 输出示例:
Dashboard ID | Dashboard Title | URL
————-|—————–|—–
1 | Sales Overview | /superset/dashboard/1/
2 | User Analytics | /superset/dashboard/2/
8. 数据源配置
Superset支持多种数据源,需要进行相应的配置。
8.1 配置MySQL数据源
(venv) $ pip install mysqlclient
# 通过Web界面添加MySQL数据源
# 连接字符串格式:
mysql://username:password@host:port/database
# 示例:
mysql://fgedu:fgedu_mysql_2024@192.168.1.51:3306/fgedudb
# 测试连接
# 在Web界面点击 “Test Connection” 按钮
# 输出示例:
Connection successful!
# 通过命令行添加
(venv) $ superset set-database-uri \
-d fgedudb_mysql \
-u “mysql://fgedu:fgedu_mysql_2024@192.168.1.51:3306/fgedudb”
# 输出示例:
Database fgedudb_mysql created successfully
8.2 配置Impala数据源
(venv) $ pip install impyla
(venv) $ pip install thrift
(venv) $ pip install sasl
(venv) $ pip install thrift-sasl
# 通过Web界面添加Impala数据源
# 连接字符串格式:
impala://host:port/database
# 示例:
impala://192.168.1.51:21050/fgedudb
# 测试连接
# 在Web界面点击 “Test Connection” 按钮
# 输出示例:
Connection successful!
# 通过命令行添加
(venv) $ superset set-database-uri \
-d fgedudb_impala \
-u “impala://192.168.1.51:21050/fgedudb”
# 输出示例:
Database fgedudb_impala created successfully
8.3 配置Hive数据源
(venv) $ pip install pyhive
(venv) $ pip install thrift
(venv) $ pip install sasl
(venv) $ pip install thrift-sasl
# 通过Web界面添加Hive数据源
# 连接字符串格式:
hive://host:port/database
# 示例:
hive://192.168.1.51:10000/fgedudb
# 测试连接
# 在Web界面点击 “Test Connection” 按钮
# 输出示例:
Connection successful!
# 通过命令行添加
(venv) $ superset set-database-uri \
-d fgedudb_hive \
-u “hive://192.168.1.51:10000/fgedudb”
# 输出示例:
Database fgedudb_hive created successfully
9. Superset性能优化
Superset性能优化涉及多个方面,包括缓存配置、查询优化、资源调整等。
9.1 缓存优化配置
# vi /data/superset/conf/superset_config.py
# 增加缓存超时时间
CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 3600, # 1小时
‘CACHE_KEY_PREFIX’: ‘superset_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/0’,
‘CACHE_OPTIONS’: {
‘CACHE_CLIENT_CLASS’: ‘redis.client.StrictRedis’,
}
}
# 数据缓存配置
DATA_CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 86400, # 1天
‘CACHE_KEY_PREFIX’: ‘superset_data_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/1’,
}
# 检查Redis缓存状态
# redis-cli
127.0.0.1:6379> INFO memory
# Memory
used_memory:123456789
used_memory_human:117.74M
used_memory_rss:134217728
used_memory_rss_human:128.00M
used_memory_peak:234567890
used_memory_peak_human:223.74M
# 清理缓存
127.0.0.1:6379> FLUSHDB
OK
9.2 查询优化配置
# vi /data/superset/conf/superset_config.py
# SQL Lab配置
SQLLAB_CTAS_NO_LIMIT = True
SQLLAB_QUERY_COST_ESTIMATES_ENABLED = True
SQLLAB_TIMEOUT = 3600
SQLLAB_ASYNC_TIME_LIMIT_SEC = 3600
SQLLAB_MAX_RESULTS = 100000
# 查询结果限制
ROW_LIMIT = 100000
SQL_MAX_ROW = 100000
DISPLAY_MAX_ROW = 10000
# 启用查询缓存
ENABLE_QUERY_CACHE = True
QUERY_CACHE_CONFIG = {
‘CACHE_TYPE’: ‘RedisCache’,
‘CACHE_DEFAULT_TIMEOUT’: 86400,
‘CACHE_KEY_PREFIX’: ‘superset_query_’,
‘CACHE_REDIS_URL’: ‘redis://localhost:6379/6’,
}
# 配置数据库连接池
SQLALCHEMY_ENGINE_OPTIONS = {
‘pool_size’: 10,
‘pool_recycle’: 3600,
‘pool_pre_ping’: True,
‘max_overflow’: 20,
}
# 重启服务使配置生效
# systemctl restart superset
9.3 资源调整
# 根据CPU核心数调整,建议 (2 * CPU核心数) + 1
# 查看CPU核心数
$ nproc
16
# 修改服务配置
# vi /etc/systemd/system/superset.service
ExecStart=/data/superset/venv/bin/gunicorn \
-w 8 \
-k gevent \
–timeout 120 \
-b 0.0.0.0:8088 \
–limit-request-line 0 \
–limit-request-fields 0 \
–limit-request-field_size 0 \
–worker-connections 1000 \
–max-requests 1000 \
–max-requests-jitter 100 \
“superset.app:create_app()”
# 重载配置
# systemctl daemon-reload
# systemctl restart superset
# 调整Celery worker数量
# vi /etc/systemd/system/superset-celery-worker.service
ExecStart=/data/superset/venv/bin/celery –app=superset.tasks.celery_app:app worker \
–loglevel=info \
–concurrency=4 \
–max-tasks-per-child=1000
# 重载配置
# systemctl daemon-reload
# systemctl restart superset-celery-worker
10. Superset升级迁移
Superset升级需要谨慎操作,确保数据安全和业务连续性。
10.1 升级前准备
(venv) $ superset version
Apache Superset 2.1.0
# 备份配置文件
# cp -r /data/superset/conf /backup/superset_conf_$(date +%Y%m%d)
# 备份数据库
# pg_dump -U superset superset > /backup/superset_db_$(date +%Y%m%d).sql
# 输出示例:
Password:
pg_dump: reading extensions
pg_dump: identifying extension members
pg_dump: reading schemas
pg_dump: reading user-defined tables
pg_dump: reading user-defined functions
pg_dump: reading user-defined types
…
# 备份仪表板和图表配置
(venv) $ superset export-dashboards -o /backup/dashboards_$(date +%Y%m%d).json
# 输出示例:
Exporting dashboards…
Exported 5 dashboards to /backup/dashboards_20260405.json
# 停止服务
# systemctl stop superset
# systemctl stop superset-celery-worker
10.2 执行升级操作
# source /data/superset/venv/bin/activate
# 升级Superset
(venv) $ pip install –upgrade apache-superset==3.1.0
# 输出示例:
Collecting apache-superset==3.1.0
Downloading apache_superset-3.1.0-py3-none-any.whl (45.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45.2/45.2 MB 25.6 MB/s eta 0:00:00
Requirement already satisfied: flask>=2.2.0 in ./venv/lib/python3.10/site-packages (from apache-superset==3.1.0) (3.0.2)
…
Successfully installed apache-superset-3.1.0
# 升级依赖包
(venv) $ pip install –upgrade -r /data/superset/venv/lib/python3.10/site-packages/superset/requirements.txt
# 升级数据库
(venv) $ superset db upgrade
# 输出示例:
INFO [alembic.runtime.migration] Context impl PostgresqlImpl.
INFO [alembic.runtime.migration] Will assume transactional DDL.
INFO [alembic.runtime.migration] Running upgrade 1a1d627ebd8e -> 3b626e2a6783, Sync db migration
INFO [alembic.runtime.migration] Running upgrade 3b626e2a6783 -> cca2f40d6f5d, add_extra_column_to_query
# 初始化
(venv) $ superset init
# 输出示例:
Logging was configured successfully.
2024-04-05 10:00:00,000:INFO:superset.utils.logging_configurator:Logging was configured successfully.
2024-04-05 10:00:00,100:INFO:superset.security.manager:Syncing role definition
10.3 升级后验证
# systemctl start superset
# systemctl start superset-celery-worker
# 验证版本
(venv) $ superset version
Apache Superset 3.1.0
# 检查服务状态
# systemctl status superset
# 输出示例:
● superset.service – Apache Superset Web Server
Loaded: loaded (/etc/systemd/system/superset.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-04-05 10:00:00 CST; 1min ago
# 访问Web界面验证
# 浏览器访问:http://192.168.1.51:8088
# 验证仪表板
(venv) $ superset list-dashboards
# 输出示例:
Dashboard ID | Dashboard Title | URL
————-|—————–|—–
1 | Sales Overview | /superset/dashboard/1/
2 | User Analytics | /superset/dashboard/2/
# 验证数据源连接
# 在Web界面测试各数据源连接
# 检查日志
$ tail -f /data/superset/logs/superset.log
2024-04-05 10:00:00,000:INFO:superset.views.core:User admin logged in
2024-04-05 10:00:05,000:INFO:superset.views.core:User admin accessed dashboard list
11. Superset监控运维
Superset的监控运维包括服务状态监控、日志管理、性能监控等。
11.1 服务状态监控
# systemctl status superset
# 输出示例:
● superset.service – Apache Superset Web Server
Loaded: loaded (/etc/systemd/system/superset.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-04-05 10:00:00 CST; 1h ago
Main PID: 12345 (gunicorn)
Tasks: 5 (limit: 49143)
Memory: 512.0M
CGroup: /system.slice/superset.service
├─12345 gunicorn: master [superset]
├─12567 gunicorn: worker [superset]
├─12568 gunicorn: worker [superset]
├─12569 gunicorn: worker [superset]
└─12570 gunicorn: worker [superset]
# 检查端口监听
$ netstat -tlnp | grep 8088
tcp 0 0 0.0.0.0:8088 0.0.0.0:* LISTEN 12345/gunicorn
# 检查Celery Worker状态
# systemctl status superset-celery-worker
# 输出示例:
● superset-celery-worker.service – Apache Superset Celery Worker
Loaded: loaded (/etc/systemd/system/superset-celery-worker.service; enabled; vendor preset: disabled)
Active: active (running) since Fri 2024-04-05 10:00:00 CST; 1h ago
Main PID: 12789 (celery)
Tasks: 4 (limit: 49143)
Memory: 256.0M
# 检查Redis连接
# redis-cli ping
PONG
# 检查PostgreSQL连接
# su – postgres
$ psql -c “SELECT count(*) FROM superset.dashboards;”
# 输出示例:
count
——-
5
(1 row)
11.2 日志管理
$ tail -100 /data/superset/logs/superset.log
2024-04-05 10:00:00,000:INFO:superset.views.core:User admin logged in
2024-04-05 10:00:05,000:INFO:superset.views.core:User admin accessed dashboard list
2024-04-05 10:00:10,000:INFO:superset.views.core:User admin created chart
# 查看Celery Worker日志
$ tail -100 /data/superset/logs/celery_worker.log
[2024-04-05 10:00:00,000: INFO/MainProcess] Task reports.scheduler received
[2024-04-05 10:00:05,000: INFO/MainProcess] Task reports.scheduler succeeded in 5.0s
# 查看Gunicorn访问日志
$ tail -100 /data/superset/logs/gunicorn_access.log
192.168.1.100 – – [05/Apr/2024:10:00:00 +0800] “GET /superset/dashboard/1/ HTTP/1.1” 200 12345
# 配置日志轮转
# vi /etc/logrotate.d/superset
/data/superset/logs/*.log {
daily
rotate 30
compress
delaycompress
missingok
notifempty
create 0644 root root
sharedscripts
postrotate
systemctl reload superset > /dev/null 2>&1 || true
endscript
}
# 测试日志轮转配置
# logrotate -d /etc/logrotate.d/superset
11.3 性能监控
$ top -p $(pgrep -d’,’ -f gunicorn)
# 输出示例:
top – 10:00:00 up 1 day, 2:00, 2 users, load average: 0.50, 0.45, 0.40
Tasks: 5 total, 0 running, 5 sleeping, 0 stopped, 0 zombie
%Cpu(s): 5.0 us, 2.0 sy, 0.0 ni, 92.0 id, 1.0 wa, 0.0 hi, 0.0 si
MiB Mem : 32768.0 total, 28000.0 free, 4000.0 used, 768.0 buff/cache
MiB Swap: 16384.0 total, 16384.0 free, 0.0 used. 28000.0 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12345 root 20 0 512000 51200 10240 S 5.0 0.2 0:10.00 gunicorn
12567 root 20 0 512000 51200 10240 S 2.0 0.2 0:05.00 gunicorn
12568 root 20 0 512000 51200 10240 S 2.0 0.2 0:05.00 gunicorn
# 监控Redis内存使用
# redis-cli info memory | grep used_memory_human
used_memory_human:117.74M
# 监控PostgreSQL连接数
# su – postgres
$ psql -c “SELECT count(*) FROM pg_stat_activity;”
# 输出示例:
count
——-
15
(1 row)
# 监控查询性能
# 在Superset Web界面查看SQL Lab查询历史
# 创建监控脚本
# vi /backup/scripts/superset_monitor.sh
#!/bin/bash
echo “=== Superset Monitor ===”
echo “Date: $(date)”
echo “”
echo “=== Service Status ===”
systemctl is-active superset
systemctl is-active superset-celery-worker
echo “”
echo “=== Memory Usage ===”
free -h
echo “”
echo “=== Redis Status ===”
redis-cli ping
redis-cli info memory | grep used_memory_human
echo “”
echo “=== PostgreSQL Connections ===”
su – postgres -c “psql -c ‘SELECT count(*) FROM pg_stat_activity;'”
echo “”
echo “=== Active Users ===”
tail -100 /data/superset/logs/superset.log | grep “logged in” | tail -5
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
