1. Trino简介与版本说明
Trino(原名PrestoSQL)是一个开源的分布式SQL查询引擎,专为大数据分析而设计。更多学习教程www.fgedu.net.cn。Trino能够高效查询各种数据源,包括Hive、MySQL、PostgreSQL、Kafka等,支持PB级数据的交互式分析。
Trino采用MPP(大规模并行处理)架构,支持标准ANSI SQL,提供丰富的连接器生态系统。学习交流加群风哥微信: itpux-com。它广泛应用于数据湖分析、实时数据仓库、联邦查询等场景,是现代数据架构的核心组件。
Trino核心特性:
– 多数据源:支持Hive、MySQL、PostgreSQL、Kafka等数十种数据源
– 标准SQL:支持ANSI SQL标准
– 高性能:内存计算,亚秒级响应
– 可扩展:支持数千节点集群
– 联邦查询:跨数据源联合查询
– 安全性:支持Kerberos、LDAP、TLS等认证
– 资源管理:支持资源组和工作负载管理
– 容错性:自动故障恢复
– 插件架构:丰富的连接器和函数扩展
Trino架构组件:
Coordinator 协调节点,负责解析SQL、生成执行计划、调度任务
Worker 工作节点,负责执行任务和处理数据
Connector 连接器,用于连接不同数据源
Catalog 目录,数据源的逻辑分组
Schema 模式,数据库的命名空间
Table 表,数据的具体存储
2. Trino版本选择与下载地址
Trino采用快速迭代发布模式,每两周发布一个新版本。
Trino版本状态:
480 2026-XX-XX 最新稳定版
479 2026-XX-XX 稳定版
478 2025-10-29 稳定版
477 2025-10-15 稳定版
Trino 480主要更新:
– 新增连接器和函数
– 性能优化
– 安全性增强
– Bug修复
官方下载地址:
下载页面:https://trino.io/download.html
源码仓库:https://github.com/trinodb/trino
文档中心:https://trino.io/docs/current/
Docker镜像:https://hub.docker.com/r/trinodb/trino
3. Trino下载方式详解
方式一:下载二进制包(推荐)
$ cd /fgeudb/software
$ wget https://repo1.maven.org/maven2/io/trino/trino-server/480/trino-server-480.tar.gz
输出示例如下:
–2026-04-04 10:00:00– https://repo1.maven.org/maven2/io/trino/trino-server/480/trino-server-480.tar.gz
Resolving repo1.maven.org… 199.232.196.215
Connecting to repo1.maven.org|199.232.196.215|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 456789012 (435M) [application/octet-stream]
Saving to: ‘trino-server-480.tar.gz’
trino-server-480.tar.gz 100%[======================================================================>] 435.23M 25.6MB/s in 17s
2026-04-04 10:00:18 (25.6 MB/s) – ‘trino-server-480.tar.gz’ saved [456789012/456789012]
下载Trino CLI:
$ wget https://repo1.maven.org/maven2/io/trino/trino-cli/480/trino-cli-480-executable.jar -O trino
输出示例如下:
–2026-04-04 10:00:20– https://repo1.maven.org/maven2/io/trino/trino-cli/480/trino-cli-480-executable.jar
Resolving repo1.maven.org… 199.232.196.215
Connecting to repo1.maven.org|199.232.196.215|:443… connected.
HTTP request sent, awaiting response… 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘trino’
trino 100%[======================================================================>] 11.77M 8.5MB/s in 1.4s
2026-04-04 10:00:22 (8.5 MB/s) – ‘trino’ saved [12345678]
设置执行权限:
$ chmod +x trino
解压Server:
$ tar -zxvf trino-server-480.tar.gz -C /fgeudb/
方式二:Docker容器部署
$ docker pull trinodb/trino:480
输出示例如下:
480: Pulling from trinodb/trino
Digest: sha256:abc123def456…
Status: Downloaded newer image for trinodb/trino:480
启动Trino容器:
$ docker run –name trino-coordinator \
-p 8080:8080 \
-v /fgeudb/trino/etc:/etc/trino:ro \
-v /fgeudb/trino/data:/data/trino \
-d trinodb/trino:480
输出示例如下:
abc123def456789…
查看容器状态:
$ docker ps | grep trino
输出示例如下:
abc123def456 trinodb/trino:480 “/usr/lib/trino/bin/…” 10 seconds ago Up 9 seconds 0.0.0.0:8080->8080/tcp trino-coordinator
方式三:Kubernetes部署
$ helm repo add trino https://trinodb.github.io/charts
$ helm repo update
安装Trino:
$ helm install trino trino/trino –namespace trino –create-namespace
输出示例如下:
NAME: trino
LAST DEPLOYED: Fri Apr 4 10:05:00 2026
NAMESPACE: trino
STATUS: deployed
REVISION: 1
TEST SUITE: None
查看部署状态:
$ kubectl get pods -n trino
输出示例如下:
NAME READY STATUS RESTARTS AGE
trino-coordinator-0 1/1 Running 0 2m
trino-worker-0 1/1 Running 0 2m
trino-worker-1 1/1 Running 0 2m
4. Trino安装部署实战
步骤1:安装Java环境
# yum install -y java-24-openjdk java-24-openjdk-devel
或Ubuntu/Debian:
# apt install -y openjdk-24-jdk
验证Java版本:
$ java -version
输出示例如下:
openjdk version “24.0.1” 2026-03-18
OpenJDK Runtime Environment (build 24.0.1+0-20260318)
OpenJDK 64-Bit Server VM (build 24.0.1+0-20260318, mixed mode, sharing)
设置JAVA_HOME:
# vi /etc/profile.d/java.sh
export JAVA_HOME=/usr/lib/jvm/java-24-openjdk
export PATH=$JAVA_HOME/bin:$PATH
使配置生效:
$ source /etc/profile.d/java.sh
步骤2:创建目录结构
# mkdir -p /fgeudb/trino/{etc,data,logs}
# mkdir -p /fgeudb/trino/etc/{catalog}
创建trino用户:
# groupadd trino
# useradd -g trino -s /sbin/nologin -M trino
设置权限:
# chown -R trino:trino /fgeudb/trino
# chmod -R 755 /fgeudb/trino
步骤3:配置Coordinator节点
# vi /fgeudb/trino/etc/node.properties
node.environment = production
node.id = coordinator-1
node.data-dir = /fgeudb/trino/data
创建jvm.config:
# vi /fgeudb/trino/etc/jvm.config
-server
-Xmx16G
-XX:InitialRAMPercentage=80
-XX:MaxRAMPercentage=80
-XX:G1HeapRegionSize=32M
-XX:+ExplicitGCInvokesConcurrent
-XX:+ExitOnOutOfMemoryError
-XX:+HeapDumpOnOutOfMemoryError
-XX:-OmitStackTraceInFastThrow
-XX:ReservedCodeCacheSize=512M
-XX:PerMethodRecompilationCutoff=10000
-XX:PerBytecodeRecompilationCutoff=10000
-Djdk.attach.allowAttachSelf=true
-Djdk.nio.maxCachedBufferSize=2000000
创建config.properties(Coordinator):
# vi /fgeudb/trino/etc/config.properties
coordinator = true
node-scheduler.include-coordinator = false
http-server.http.port = 8080
discovery.uri = http://192.168.1.51:8080
query.max-memory = 64GB
query.max-memory-per-node = 8GB
query.max-total-memory-per-node = 16GB
创建log.properties:
# vi /fgeudb/trino/etc/log.properties
io.trino = INFO
步骤4:配置Worker节点
# vi /fgeudb/trino/etc/node.properties
node.environment = production
node.id = worker-1
node.data-dir = /fgeudb/trino/data
创建config.properties(Worker):
# vi /fgeudb/trino/etc/config.properties
coordinator = false
http-server.http.port = 8080
discovery.uri = http://192.168.1.51:8080
query.max-memory-per-node = 8GB
query.max-total-memory-per-node = 16GB
步骤5:配置systemd服务
# vi /etc/systemd/system/trino.service
[Unit]
Description=Trino Server
After=network.target
[Service]
Type=forking
User=trino
Group=trino
Environment=”JAVA_HOME=/usr/lib/jvm/java-24-openjdk”
ExecStart=/fgeudb/trino-server-480/bin/launcher start
ExecStop=/fgeudb/trino-server-480/bin/launcher stop
PIDFile=/fgeudb/trino/data/var/run/trino.pid
LimitNOFILE=65535
LimitNPROC=65535
[Install]
WantedBy=multi-user.target
重载服务配置:
# systemctl daemon-reload
启动服务:
# systemctl start trino
设置开机自启:
# systemctl enable trino
查看服务状态:
# systemctl status trino
输出示例如下:
● trino.service – Trino Server
Loaded: loaded (/etc/systemd/system/trino.service; enabled)
Active: active (running) since Fri 2026-04-04 10:10:00 CST; 10s ago
Main PID: 12345 (java)
CGroup: /system.slice/trino.service
└─12345 /usr/lib/jvm/java-24-openjdk/bin/java …
5. Trino配置文件详解
config.properties详解
coordinator = true 是否为Coordinator节点
node-scheduler.include-coordinator = false Coordinator是否参与计算
http-server.http.port = 8080 HTTP服务端口
discovery.uri = http://192.168.1.51:8080 服务发现地址
query.max-memory = 64GB 查询最大内存
query.max-memory-per-node = 8GB 每节点最大内存
query.max-total-memory-per-node = 16GB 每节点总内存上限
query.max-length = 1000000 查询最大长度
query.max-stage-count = 150 最大Stage数量
Worker配置:
coordinator = false 是否为Coordinator节点
http-server.http.port = 8080 HTTP服务端口
discovery.uri = http://192.168.1.51:8080 服务发现地址
query.max-memory-per-node = 8GB 每节点最大内存
query.max-total-memory-per-node = 16GB 每节点总内存上限
JVM参数优化
-server
-Xmx32G 最大堆内存
-XX:InitialRAMPercentage=80 初始RAM百分比
-XX:MaxRAMPercentage=80 最大RAM百分比
-XX:G1HeapRegionSize=32M G1区域大小
-XX:+ExplicitGCInvokesConcurrent 并发显式GC
-XX:+ExitOnOutOfMemoryError OOM时退出
-XX:+HeapDumpOnOutOfMemoryError OOM时生成堆转储
-XX:ReservedCodeCacheSize=512M 代码缓存大小
-Djdk.attach.allowAttachSelf=true 允许自我附加
-Djdk.nio.maxCachedBufferSize=2000000 NIO缓存大小
GC日志配置:
-Xlog:gc*:file=/fgeudb/trino/logs/gc.log:time,uptime,level,tags:filecount=10,filesize=100M
6. Trino连接器配置
步骤1:配置Hive连接器
# vi /fgeudb/trino/etc/catalog/hive.properties
connector.name = hive
hive.metastore.uri = thrift://192.168.1.51:9083
hive.allow-drop-table = true
hive.allow-rename-table = true
hive.max-partitions-per-scan = 1000000
hive.storage-format = ORC
hive.compression-codec = ZSTD
配置HDFS:
# vi /fgeudb/trino/etc/catalog/hive.properties
connector.name = hive
hive.metastore.uri = thrift://192.168.1.51:9083
hive.metastore.authentication.type = KERBEROS
hive.metastore.service.principal = hive/_HOST@.NET.CN
hive.metastore.client.principal = trino/_HOST@.NET.CN
hive.metastore.client.keytab = /etc/security/keytabs/trino.service.keytab
hive.hdfs.authentication.type = KERBEROS
hive.hdfs.trino.principal = trino/_HOST@.NET.CN
hive.hdfs.trino.keytab = /etc/security/keytabs/trino.service.keytab
步骤2:配置MySQL连接器
# vi /fgeudb/trino/etc/catalog/mysql.properties
connector.name = mysql
connection-url = jdbc:mysql://192.168.1.51:3306
connection-user = trino
connection-password = trino123
case-insensitive-name-matching = true
case-insensitive-name-matching.strategy = LOWERCASE
连接多个MySQL实例:
# vi /fgeudb/trino/etc/catalog/mysql_fgedu.properties
connector.name = mysql
connection-url = jdbc:mysql://192.168.1.51:3306,192.168.1.52:3306/fgedudb
connection-user = trino
connection-password = trino123
步骤3:配置PostgreSQL连接器
# vi /fgeudb/trino/etc/catalog/postgres.properties
connector.name = postgresql
connection-url = jdbc:postgresql://192.168.1.51:5432/postgres
connection-user = trino
connection-password = trino123
步骤4:配置Kafka连接器
# vi /fgeudb/trino/etc/catalog/kafka.properties
connector.name = kafka
kafka.nodes = 192.168.1.51:9092,192.168.1.52:9092,192.168.1.53:9092
kafka.table-names = topic1,topic2,topic3
kafka.default-schema = default
7. Trino查询实战
使用CLI连接Trino
$ ./trino –server http://192.168.1.51:8080 –catalog hive –schema default
输出示例如下:
trino:default>
查看Catalog:
trino:default> SHOW CATALOGS;
Catalog
———
hive
mysql
postgres
kafka
system
(5 rows)
Query 20260404_101000_00001_xxxxx, FINISHED, 3 nodes
Splits: 3 total, 3 done (100.00%)
0:00 [5 rows, 123B] [25 rows/s, 615B/s]
查看Schema:
trino:default> SHOW SCHEMAS FROM hive;
Schema
——————–
default
information_schema
sys
(3 rows)
查看表:
trino:default> SHOW TABLES FROM hive.default;
Table
——-
users
orders
products
(3 rows)
执行SQL查询
trino:default> SELECT * FROM hive.default.users LIMIT 10;
id | name | email | created_at
—-+———+——————–+——————
1 | user1 | user1@fgedu.net.cn | 2026-01-01 00:00
2 | user2 | user2@fgedu.net.cn | 2026-01-02 00:00
…
聚合查询:
trino:default> SELECT
date_trunc(‘day’, created_at) as day,
COUNT(*) as user_count
FROM hive.default.users
GROUP BY date_trunc(‘day’, created_at)
ORDER BY day DESC
LIMIT 10;
跨数据源联合查询:
trino:default> SELECT
h.user_id,
h.order_amount,
m.user_name
FROM hive.default.orders h
JOIN mysql.fgedu.users m ON h.user_id = m.id
WHERE h.order_date >= DATE ‘2026-01-01’
LIMIT 100;
8. 安装验证与测试
查看Trino状态
$ ps -ef | grep trino
输出示例如下:
trino 12345 1 1 10:10 ? 00:00:30 /usr/lib/jvm/java-24-openjdk/bin/java …
查看端口监听:
$ netstat -tlnp | grep 8080
输出示例如下:
tcp6 0 0 :::8080 :::* LISTEN 12345/java
访问Web UI:
打开浏览器访问:http://192.168.1.51:8080
查看集群状态:
$ curl http://192.168.1.51:8080/v1/info
输出示例如下:
{
“nodeVersion”: {
“version”: “480”
},
“environment”: “production”,
“coordinator”: true,
“state”: “ACTIVE”
}
查看Worker节点:
$ curl http://192.168.1.51:8080/v1/node
输出示例如下:
[
{
“nodeIdentifier”: “coordinator-1”,
“nodeVersion”: “480”,
“coordinator”: true,
“state”: “ACTIVE”
},
{
“nodeIdentifier”: “worker-1”,
“nodeVersion”: “480”,
“coordinator”: false,
“state”: “ACTIVE”
}
]
性能测试
trino:tpch> SELECT
nationkey,
COUNT(*) as customer_count
FROM tpch.tiny.customer
GROUP BY nationkey
ORDER BY customer_count DESC;
Query 20260404_102000_00001_xxxxx, FINISHED, 3 nodes
Splits: 6 total, 6 done (100.00%)
0:02 [15000 rows, 2.3MB] [7500 rows/s, 1.15MB/s]
查看查询计划:
trino:tpch> EXPLAIN SELECT * FROM customer WHERE nationkey = 1;
查看查询执行详情:
trino:tpch> EXPLAIN ANALYZE SELECT COUNT(*) FROM orders;
9. 常见问题与解决方案
问题1:内存不足
解决方案:
1. 增加每节点内存限制:
query.max-memory-per-node = 16GB
query.max-total-memory-per-node = 32GB
2. 增加JVM堆内存:
-Xmx32G
3. 优化查询:
– 减少处理的数据量
– 添加分区过滤条件
– 使用LIMIT限制结果集
4. 查看内存使用:
trino> SELECT * FROM system.runtime.queries WHERE state = ‘RUNNING’;
问题2:连接器连接失败
解决方案:
1. 检查连接器配置:
$ cat /fgeudb/trino/etc/catalog/hive.properties
2. 检查网络连通性:
$ telnet 192.168.1.51 9083
3. 检查认证配置:
– Kerberos配置是否正确
– Keytab文件是否存在
– Principal是否正确
4. 查看日志:
$ tail -f /fgeudb/trino/data/var/log/server.log
问题3:查询超时
解决方案:
1. 增加查询超时时间:
query.max-execution-time = 1h
2. 优化查询性能:
– 添加索引
– 使用分区裁剪
– 减少数据扫描量
3. 检查资源使用:
trino> SELECT * FROM system.runtime.nodes;
4. 查看查询统计:
trino> SELECT * FROM system.runtime.queries ORDER BY created DESC LIMIT 10;
Trino管理命令
# systemctl start trino
或
$ /fgeudb/trino-server-480/bin/launcher start
停止服务:
# systemctl stop trino
或
$ /fgeudb/trino-server-480/bin/launcher stop
重启服务:
# systemctl restart trino
或
$ /fgeudb/trino-server-480/bin/launcher restart
查看状态:
$ /fgeudb/trino-server-480/bin/launcher status
查看日志:
$ tail -f /fgeudb/trino/data/var/log/server.log
使用CLI:
$ ./trino –server http://192.168.1.51:8080
$ ./trino –server http://192.168.1.51:8080 –catalog hive –schema default
$ ./trino –server http://192.168.1.51:8080 –execute “SELECT 1”
执行SQL文件:
$ ./trino –server http://192.168.1.51:8080 -f query.sql
1. 使用Trino 480最新稳定版本;2. Coordinator和Worker分离部署;3. 配置合理的内存参数;4. 使用G1垃圾收集器;5. 配置Kerberos认证保障安全;6. 使用资源组管理工作负载;7. 配置监控和告警;8. 定期备份配置文件;9. 优化查询减少资源消耗;10. 配置日志轮转防止磁盘满。
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
