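1. Introduction to Apache Kudu and Version Notes
Apache Kudu is an open-source distributed columnar storage engine designed for workloads that need both fast analytics and real-time updates. It fills the gap between HDFS (high-throughput sequential scans, but append-only files with no in-place updates) and HBase (low-latency random reads and writes, but slow analytical scans) by offering efficient random access together with fast scans.
Kudu integrates tightly with Impala, so tables can be queried and updated in real time through SQL. It is widely used for real-time data warehousing, time-series data, and machine-learning feature stores.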
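Core features of Apache Kudu:
- Columnar storage: an on-disk format optimized for analytical access
- Fast scans: high-throughput batch scans
- SQL integration: tight integration with Impala
- Strong consistency: replicas are kept in sync through Raft consensus and single-row operations are atomic; multi-row transactions are available only in limited form, so Kudu should not be treated as a general-purpose ACID database
- Horizontal scaling: tables are partitioned into tablets and replicated across servers
- Flexible schema: tables can be altered after creation
- High availability: multiple replicas with automatic failover
- Compression and encoding: several compression codecs and column encodings
- Security: Kerberos authentication and encryption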
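Kudu architecture components:
Master          Manages metadata and coordinates the cluster
Tablet Server   Stores data and serves client requests
Tablet          A table partition, the basic unit of storage
Catalog Table   Stores table and tablet metadata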
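2. Choosing a Kudu Version and Where to Download It
Apache Kudu uses semantic version numbers; the 1.18.x series is the one currently maintained.
Kudu release status:
Version  Release date  Status
1.18.0   2025-07-14    Latest stable release
1.17.1   2024-11-15    Stable, still supported
1.16.0   2023-XX-XX    Older release
1.15.0   2022-XX-XX    Older release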
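Main changes in Kudu 1.18.0 (see the release notes on the download page for specifics):
- Performance improvements
- Security enhancements
- Bug fixes
- Compatibility improvements
- New features
Official download locations:
Downloads: https://kudu.apache.org/releases/
Source repository: https://github.com/apache/kudu
Documentation: https://kudu.apache.org/docs/
3. Kudu Download Methods in Detail
Method 1: Download the source tarball (recommended)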
$ cd /fgeudb/software
$ wget https://archive.apache.org/dist/kudu/1.18.0/apache-kudu-1.18.0.tar.gz
Example output:
--2026-04-04 10:00:00--  https://archive.apache.org/dist/kudu/1.18.0/apache-kudu-1.18.0.tar.gz
Resolving archive.apache.org... 163.172.17.49
Connecting to archive.apache.org|163.172.17.49|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 12345678 (12M) [application/octet-stream]
Saving to: ‘apache-kudu-1.18.0.tar.gz’
apache-kudu-1.18.0.tar.gz 100%[======================================================================>] 11.77M 8.5MB/s in 1.4s
2026-04-04 10:00:02 (8.5 MB/s) - ‘apache-kudu-1.18.0.tar.gz’ saved [12345678/12345678]
Verify the download:
$ wget https://archive.apache.org/dist/kudu/1.18.0/apache-kudu-1.18.0.tar.gz.sha512
$ sha512sum -c apache-kudu-1.18.0.tar.gz.sha512
Example output:
apache-kudu-1.18.0.tar.gz: OK
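The download and verification steps above can be combined so that extraction only happens when the checksum matches. The following is a minimal sketch, not part of the Kudu tooling: the function name verify_sha512 is hypothetical, and it assumes that GNU sha512sum is installed and that the checksum file uses the format sha512sum -c understands.

```shell
# verify_sha512 FILE CHECKSUM_FILE   (hypothetical helper)
# Returns 0 only when sha512sum confirms the checksum file.
# Returns 2 when either file is missing.
verify_sha512() {
    file=$1
    sums=$2
    [ -f "$file" ] && [ -f "$sums" ] || return 2
    # sha512sum -c looks up the file name listed inside the checksum
    # file relative to the current directory, so run it from there.
    (
        cd "$(dirname "$sums")" &&
        sha512sum -c "$(basename "$sums")" >/dev/null 2>&1
    )
}
```

With the file names used in this guide, a call could look like: verify_sha512 apache-kudu-1.18.0.tar.gz apache-kudu-1.18.0.tar.gz.sha512 && tar -zxf apache-kudu-1.18.0.tar.gz -C /fgeudb/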
Extract the source:
$ tar -zxvf apache-kudu-1.18.0.tar.gz -C /fgeudb/
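Method 2: Install from RPM packages
The Apache Kudu project publishes source releases; the RPM packages below come from Cloudera's repository, which may require subscription credentials, and the Kudu version they contain depends on the Cloudera Runtime release.
Configure the yum repository:
# vi /etc/yum.repos.d/cloudera.repo
[cloudera-runtime]
name=Cloudera Runtime
baseurl=https://archive.cloudera.com/p/cdh7/7.3.1/redhat7/yum/
gpgkey=https://archive.cloudera.com/p/cdh7/7.3.1/redhat7/yum/RPM-GPG-KEY-cloudera
gpgcheck=1
Install Kudu:
# yum install -y kudu kudu-master kudu-tserver kudu-client0 kudu-client-devel
Example output: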
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Resolving Dependencies
--> Running transaction check
---> Package kudu.x86_64 0:1.18.0-1.el7 will be installed
---> Package kudu-master.x86_64 0:1.18.0-1.el7 will be installed
---> Package kudu-tserver.x86_64 0:1.18.0-1.el7 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
kudu x86_64 1.18.0-1.el7 cloudera-runtime 50 M
kudu-master x86_64 1.18.0-1.el7 cloudera-runtime 100 M
kudu-tserver x86_64 1.18.0-1.el7 cloudera-runtime 100 M
Transaction Summary
================================================================================
Install 3 Packages
Total download size: 250 M
Installed size: 500 M
Downloading packages:
(1/3): kudu-1.18.0-1.el7.x86_64.rpm | 50 MB 00:00:30
(2/3): kudu-master-1.18.0-1.el7.x86_64.rpm | 100 MB 00:01:00
(3/3): kudu-tserver-1.18.0-1.el7.x86_64.rpm | 100 MB 00:01:00
Complete!
Method 3: Build and install from source
Install the build dependencies:
# yum install -y autoconf automake cmake gcc-c++ git \
krb5-devel libtool make ncurses-devel openssl-devel \
python3-devel rsync unzip wget which zip
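Build Kudu (build the bundled third-party dependencies from the source root first, then configure with the cmake binary they provide):
$ cd /fgeudb/apache-kudu-1.18.0
$ thirdparty/build-if-necessary.sh
$ mkdir -p build/release
$ cd build/release
$ ../../thirdparty/installed/common/bin/cmake -DCMAKE_BUILD_TYPE=release ../..
$ make -j4
Example output:
[  1%] Building CXX object src/kudu/CMakeFiles/kudu.dir/...
...
[100%] Built target kudu
Install:
# make install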
4. Installing and Deploying Kudu
Step 1: Create the directory layout
# mkdir -p /fgeudb/kudu/logs
# mkdir -p /fgeudb/kudu/master/{data,wal}
# mkdir -p /fgeudb/kudu/tserver/{data,wal}
Create the kudu user:
# groupadd kudu
# useradd -g kudu -s /sbin/nologin -M kudu
Set ownership and permissions:
# chown -R kudu:kudu /fgeudb/kudu
# chmod -R 755 /fgeudb/kudu
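The directory layout can also be created by a small helper, which makes it easy to reuse the same layout under a different root. This is a sketch only: create_kudu_dirs is a hypothetical name, and the subdirectories are the ones the configuration files in this guide point at.

```shell
# create_kudu_dirs ROOT   (hypothetical helper)
# Creates the WAL, data and log directories referenced by the
# master and tserver flag files in this guide under ROOT.
create_kudu_dirs() {
    root=$1
    [ -n "$root" ] || return 2
    for d in master/data master/wal tserver/data tserver/wal logs; do
        mkdir -p "$root/$d" || return 1
    done
}
```

For the paths used here the call would be create_kudu_dirs /fgeudb/kudu, followed by the chown and chmod commands above.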
Step 2: Configure the Master nodes
Multi-Master configuration:
# vi /etc/kudu/conf/master.gflagfile
--master_addresses=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
--fs_wal_dir=/fgeudb/kudu/master/wal
--fs_data_dirs=/fgeudb/kudu/master/data
--log_dir=/fgeudb/kudu/logs
--rpc_bind_addresses=0.0.0.0:7051
--webserver_port=8051
--webserver_interface=0.0.0.0
--default_num_replicas=3
--heartbeat_interval_ms=3000
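In a multi-Master deployment the same address list appears in the Master flag file of every Master and in the Tablet Server flag file of every Tablet Server, so it helps to build it in one place. The sketch below does that in plain POSIX shell; join_master_addrs and render_master_flags are hypothetical helper names, and the paths are the ones assumed in this guide.

```shell
# join_master_addrs PORT HOST...   (hypothetical helper)
# Prints HOST:PORT pairs joined by commas.
join_master_addrs() {
    port=$1
    shift
    out=""
    for host in "$@"; do
        out="${out:+$out,}$host:$port"
    done
    printf '%s\n' "$out"
}

# render_master_flags PORT HOST...   (hypothetical helper)
# Prints a minimal master flag file for the given Masters.
render_master_flags() {
    addrs=$(join_master_addrs "$@")
    cat <<EOF
--master_addresses=$addrs
--fs_wal_dir=/fgeudb/kudu/master/wal
--fs_data_dirs=/fgeudb/kudu/master/data
--log_dir=/fgeudb/kudu/logs
--rpc_bind_addresses=0.0.0.0:$1
EOF
}
```

Redirecting the output of render_master_flags 7051 192.168.1.51 192.168.1.52 192.168.1.53 to a file gives a starting point that can then be extended by hand.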
Single-Master configuration:
# vi /etc/kudu/conf/master.gflagfile
--fs_wal_dir=/fgeudb/kudu/master/wal
--fs_data_dirs=/fgeudb/kudu/master/data
--log_dir=/fgeudb/kudu/logs
--rpc_bind_addresses=0.0.0.0:7051
--webserver_port=8051
Step 3: Configure the Tablet Server nodes
Configuration for a multi-Master cluster:
# vi /etc/kudu/conf/tserver.gflagfile
--tserver_master_addrs=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
--fs_wal_dir=/fgeudb/kudu/tserver/wal
--fs_data_dirs=/fgeudb/kudu/tserver/data
--log_dir=/fgeudb/kudu/logs
--rpc_bind_addresses=0.0.0.0:7050
--webserver_port=8050
--webserver_interface=0.0.0.0
--heartbeat_interval_ms=3000
--memory_limit_hard_bytes=4294967296
Single-node configuration:
# vi /etc/kudu/conf/tserver.gflagfile
--tserver_master_addrs=192.168.1.51:7051
--fs_wal_dir=/fgeudb/kudu/tserver/wal
--fs_data_dirs=/fgeudb/kudu/tserver/data
--log_dir=/fgeudb/kudu/logs
--rpc_bind_addresses=0.0.0.0:7050
--webserver_port=8050
--memory_limit_hard_bytes=4294967296
Step 4: Start the Kudu services
Start the Master:
# systemctl start kudu-master
On success systemctl start prints nothing; use systemctl status, shown further down, to confirm that the service is running.
Start the Tablet Server:
# systemctl start kudu-tserver
Enable both services at boot:
# systemctl enable kudu-master
# systemctl enable kudu-tserver
Check the Master status:
# systemctl status kudu-master
Example output:
● kudu-master.service - Apache Kudu Master Server
   Loaded: loaded (/usr/lib/systemd/system/kudu-master.service; enabled)
   Active: active (running) since Sat 2026-04-04 10:10:00 CST; 10s ago
 Main PID: 12345 (kudu-master)
   CGroup: /system.slice/kudu-master.service
           └─12345 /usr/bin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
Check the Tablet Server status:
# systemctl status kudu-tserver
Example output:
● kudu-tserver.service - Apache Kudu Tablet Server
   Loaded: loaded (/usr/lib/systemd/system/kudu-tserver.service; enabled)
   Active: active (running) since Sat 2026-04-04 10:10:00 CST; 10s ago
 Main PID: 12346 (kudu-tserver)
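5. Kudu Configuration Files in Detail
Core Master configuration parameters
Network settings:
--rpc_bind_addresses=0.0.0.0:7051          RPC bind address
--webserver_port=8051                      Web UI port
--webserver_interface=0.0.0.0              Web UI bind interface
Storage settings:
--fs_wal_dir=/fgeudb/kudu/master/wal       Write-ahead log (WAL) directory
--fs_data_dirs=/fgeudb/kudu/master/data    Data directories
Cluster settings:
--master_addresses=192.168.1.51:7051,...   List of Master addresses
--default_num_replicas=3                   Default number of replicas for new tables
Logging settings:
--log_dir=/fgeudb/kudu/logs                Log directory
--minloglevel=0                            Minimum log level (0 = INFO)
Core Tablet Server configuration parameters
Network settings:
--rpc_bind_addresses=0.0.0.0:7050          RPC bind address
--webserver_port=8050                      Web UI port
Storage settings:
--fs_wal_dir=/fgeudb/kudu/tserver/wal      Write-ahead log (WAL) directory
--fs_data_dirs=/fgeudb/kudu/tserver/data   Data directories
Memory settings:
--memory_limit_hard_bytes=4294967296       Hard memory limit (4 GiB)
--block_cache_capacity_mb=512              Block cache size, in MiB
Performance settings:
--num_tablet_servers_to_describe=3         Number of Tablet Servers to describe (confirm in the configuration reference that your version accepts this flag)
--heartbeat_interval_ms=3000               Interval between heartbeats to the Master, in milliseconds
--scanner_batch_size_rows=1024             Scanner batch size, in rows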
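The memory limit is given as a raw byte count, which is easy to mistype. A minimal sketch of the conversion, assuming binary units (1 GiB = 1024 * 1024 * 1024 bytes); gib_to_bytes and memory_limit_flag are hypothetical helper names, not Kudu commands.

```shell
# gib_to_bytes N   (hypothetical helper)
# Converts a whole number of GiB to bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

# memory_limit_flag N   (hypothetical helper)
# Prints the hard memory limit flag line for N GiB.
memory_limit_flag() {
    printf '%s\n' "--memory_limit_hard_bytes=$(gib_to_bytes "$1")"
}

memory_limit_flag 4    # prints --memory_limit_hard_bytes=4294967296
memory_limit_flag 16   # prints --memory_limit_hard_bytes=17179869184
```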
Tuning the configuration for production
A flag that the server does not recognize can prevent it from starting, so check every flag below against the configuration reference for the installed version.
Master production configuration:
--master_addresses=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
--fs_wal_dir=/fgeudb/kudu/master/wal
--fs_data_dirs=/fgeudb/kudu/master/data
--log_dir=/fgeudb/kudu/logs
--default_num_replicas=3
--heartbeat_interval_ms=3000
--catalog_manager_wait_for_new_tablets_to_elect_leader_timeout_ms=60000
Tablet Server production configuration:
--tserver_master_addrs=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
--fs_wal_dir=/fgeudb/kudu/tserver/wal
--fs_data_dirs=/fgeudb/kudu/tserver/data
--log_dir=/fgeudb/kudu/logs
--memory_limit_hard_bytes=17179869184
--block_cache_capacity_mb=2048
--heartbeat_interval_ms=3000
--scanner_batch_size_rows=2048
--num_scanner_threads=4
--minloglevel=0
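A flag file is plain text with one flag per line, so simple formatting mistakes, such as a typographic dash pasted from a web page in place of the two leading ASCII hyphens, can be caught before a restart. The check below is a rough sketch: check_flagfile is a hypothetical helper, it only enforces the --name=value form used in this guide, and it does not know which flag names Kudu actually accepts.

```shell
# check_flagfile FILE   (hypothetical helper)
# Prints every line that is neither blank, a # comment, nor of the
# form --name or --name=value. Returns 1 if any such line was found.
check_flagfile() {
    awk '
        /^[ \t]*$/ || /^[ \t]*#/ { next }
        !/^--[A-Za-z0-9_]+(=.*)?$/ {
            printf "line %d: %s\n", NR, $0
            bad = 1
        }
        END { exit bad }
    ' "$1"
}
```

It could be run as check_flagfile /etc/kudu/conf/tserver.gflagfile before restarting the service.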
6. Working with Kudu Tables
Using the kudu CLI
Check the cluster:
$ kudu cluster ksck 192.168.1.51:7051
Example output (simplified):
Connected to the Master
Master Summary
UUID: abc123def456
Address: 192.168.1.51:7051
State: RUNNING
Tablet Server Summary
UUID: def456ghi789
Address: 192.168.1.51:7050
State: RUNNING
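Create a table (kudu table create takes the Master address list and a JSON description of the table):
$ kudu table create 192.168.1.51:7051 '
{
  "table_name": "fgedu_db.users",
  "schema": {
    "columns": [
      {"column_name": "id", "column_type": "INT64", "is_nullable": false},
      {"column_name": "name", "column_type": "STRING", "is_nullable": true},
      {"column_name": "email", "column_type": "STRING", "is_nullable": true},
      {"column_name": "created_at", "column_type": "UNIXTIME_MICROS", "is_nullable": true}
    ],
    "key_column_names": ["id"]
  },
  "partition": {
    "hash_partitions": [{"columns": ["id"], "num_buckets": 4}]
  },
  "num_replicas": 3
}'
The JSON field names follow the format printed by kudu table create --help; compare them with the help text of the installed version before relying on them.
Example output: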
Created table fgedu_db.users
List tables:
$ kudu table list 192.168.1.51:7051
Example output:
fgedu_db.users
Describe a table:
$ kudu table describe 192.168.1.51:7051 fgedu_db.users
Example output (simplified):
TABLE: fgedu_db.users
COLUMN     | TYPE            | NULLABLE | KEY
-----------+-----------------+----------+------
id         | INT64           | false    | true
name       | STRING          | true     | false
email      | STRING          | true     | false
created_at | UNIXTIME_MICROS | true     | false
Working with Kudu tables from Impala
Connect to Impala:
$ impala-shell -i 192.168.1.51:21000
Create a Kudu table (the fgedu_db database must already exist):
[192.168.1.51:21000] > CREATE TABLE fgedu_db.orders (
> id BIGINT,
> user_id BIGINT,
> product_name STRING,
> amount DECIMAL(10,2),
> order_time TIMESTAMP,
> PRIMARY KEY (id)
> )
> PARTITION BY HASH(id) PARTITIONS 4
> STORED AS KUDU;
Example output:
Query: create TABLE fgedu_db.orders (
id BIGINT,
user_id BIGINT,
product_name STRING,
amount DECIMAL(10,2),
order_time TIMESTAMP,
PRIMARY KEY (id)
)
PARTITION BY HASH(id) PARTITIONS 4
STORED AS KUDU
Fetched 0 row(s) in 0.25s
Insert data:
[192.168.1.51:21000] > INSERT INTO fgedu_db.orders VALUES
> (1, 100, 'Product A', 1000.00, NOW()),
> (2, 101, 'Product B', 2000.00, NOW()),
> (3, 102, 'Product C', 3000.00, NOW());
Update data:
[192.168.1.51:21000] > UPDATE fgedu_db.orders
> SET amount = 1500.00
> WHERE id = 1;
Delete data:
[192.168.1.51:21000] > DELETE FROM fgedu_db.orders WHERE id = 3;
Query data:
[192.168.1.51:21000] > SELECT * FROM fgedu_db.orders;
Example output:
+----+---------+--------------+---------+---------------------+
| id | user_id | product_name | amount  | order_time          |
+----+---------+--------------+---------+---------------------+
| 1  | 100     | Product A    | 1500.00 | 2026-04-04 10:15:00 |
| 2  | 101     | Product B    | 2000.00 | 2026-04-04 10:15:00 |
+----+---------+--------------+---------+---------------------+
Fetched 2 row(s) in 0.05s
Partitioning strategies
Hash partitioning:
CREATE TABLE fgedu_db.events (
id BIGINT,
event_type STRING,
event_data STRING,
event_time TIMESTAMP,
PRIMARY KEY (id)
)
PARTITION BY HASH(id) PARTITIONS 8
STORED AS KUDU;
Range partitioning:
CREATE TABLE fgedu_db.logs (
id BIGINT,
log_date DATE,
log_message STRING,
PRIMARY KEY (id, log_date)
)
PARTITION BY RANGE(log_date) (
PARTITION VALUES < '2026-01-01',
PARTITION '2026-01-01' <= VALUES < '2026-02-01',
PARTITION '2026-02-01' <= VALUES < '2026-03-01',
PARTITION '2026-03-01' <= VALUES
)
STORED AS KUDU;
Combined hash and range partitioning:
CREATE TABLE fgedu_db.metrics (
id BIGINT,
metric_date DATE,
metric_name STRING,
metric_value DOUBLE,
PRIMARY KEY (id, metric_date)
)
PARTITION BY HASH(id) PARTITIONS 4,
RANGE(metric_date) (
PARTITION VALUES < '2026-01-01',
PARTITION '2026-01-01' <= VALUES < '2026-04-01',
PARTITION '2026-04-01' <= VALUES
)
STORED AS KUDU;
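With combined partitioning, every range partition is split into the given number of hash buckets, so the tablet count is the product of the two, and each tablet is stored once per replica. The arithmetic is sketched below for the fgedu_db.metrics table above (4 hash buckets, 3 range partitions), assuming a replication factor of 3; the function names are hypothetical.

```shell
# tablet_count HASH_BUCKETS RANGE_PARTITIONS   (hypothetical helper)
tablet_count() {
    echo $(( $1 * $2 ))
}

# replica_count HASH_BUCKETS RANGE_PARTITIONS REPLICATION_FACTOR
# (hypothetical helper) Total tablet replicas across the cluster.
replica_count() {
    echo $(( $1 * $2 * $3 ))
}

tablet_count 4 3      # prints 12
replica_count 4 3 3   # prints 36
```

Keeping an eye on this number is useful because every added range partition adds a full set of hash buckets, not a single tablet.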
7. Verifying and Testing the Installation
Check the Kudu processes
$ ps -ef | grep kudu
Example output:
kudu 12345 1 5 10:10 ? 00:00:30 /usr/bin/kudu-master --flagfile=/etc/kudu/conf/master.gflagfile
kudu 12346 1 5 10:10 ? 00:00:30 /usr/bin/kudu-tserver --flagfile=/etc/kudu/conf/tserver.gflagfile
Check the listening ports:
$ netstat -tlnp | grep kudu
Example output:
tcp6 0 0 :::7050 :::* LISTEN 12346/kudu-tserver
tcp6 0 0 :::7051 :::* LISTEN 12345/kudu-master
tcp6 0 0 :::8050 :::* LISTEN 12346/kudu-tserver
tcp6 0 0 :::8051 :::* LISTEN 12345/kudu-master
Open the Web UIs:
Master Web UI: http://192.168.1.51:8051
Tablet Server Web UI: http://192.168.1.51:8050
Check cluster health:
$ kudu cluster ksck 192.168.1.51:7051
Example output (simplified):
Connected to the Master
Master Summary
UUID: abc123def456
Address: 192.168.1.51:7051
State: RUNNING
Tablet Server Summary
UUID: def456ghi789
Address: 192.168.1.51:7050
State: RUNNING
Tables Summary
Name: fgedu_db.users
Replicas: 3
State: HEALTHY
Cluster Health: OK
Performance test
$ kudu perf loadgen 192.168.1.51:7051 \
--num_threads=4 \
--num_rows_per_thread=100000 \
--string_len=100
Example output (simplified):
Starting load generation...
Rows inserted: 400000
Time elapsed: 45.23 seconds
Throughput: 8843.69 rows/sec
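Throughput is the number of rows written divided by the elapsed time, which is handy for comparing runs made with different thread counts. A small sketch using awk for the floating-point division; rows_per_second is a hypothetical helper, not a kudu subcommand.

```shell
# rows_per_second ROWS SECONDS   (hypothetical helper)
# Prints ROWS / SECONDS with two decimal places.
# Returns 1 when SECONDS is not positive.
rows_per_second() {
    awk -v rows="$1" -v secs="$2" 'BEGIN {
        if (secs <= 0) exit 1
        printf "%.2f\n", rows / secs
    }'
}
```

Calling rows_per_second 400000 45.23 recomputes the figure for a run that wrote 400000 rows in 45.23 seconds.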
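Run a test query from Impala (kudu perf loadgen writes to its own table, so the count below assumes that one million rows were loaded into fgedu_db.orders separately):
[192.168.1.51:21000] > SELECT COUNT(*) FROM fgedu_db.orders;
Example output:
+----------+
| count(*) |
+----------+
| 1000000  |
+----------+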
Fetched 1 row(s) in 0.25s
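8. Common Problems and Solutions
Problem 1: Out of memory
Solutions:
1. Raise the memory limit:
--memory_limit_hard_bytes=17179869184
2. Enlarge the block cache:
--block_cache_capacity_mb=2048
3. Optimize queries:
- Scan less data
- Take advantage of partition pruning
- Select only the columns you need
4. Monitor memory usage:
Check memory usage in the Web UI
Problem 2: Unbalanced tablets
Solutions:
1. Review the partitioning strategy:
Make sure the hash partition key distributes rows evenly
2. Add range partitions (hash bucket counts are fixed when the table is created):
ALTER TABLE ... ADD RANGE PARTITION ...
3. Rebalance the cluster, or move a single replica by hand:
$ kudu cluster rebalance 192.168.1.51:7051
$ kudu tablet change_config move_replica 192.168.1.51:7051 tablet_id from_ts_uuid to_ts_uuid
4. Inspect the tablet distribution:
Open the Master Web UI and review how tablets are spread across servers
Problem 3: Low write throughput
Solutions:
1. Use more partitions when creating the table (changing the hash bucket count of an existing table means recreating it):
PARTITION BY HASH(id) PARTITIONS 16
2. Adjust the memory settings:
--memory_limit_hard_bytes=17179869184
3. Write in batches:
Use multi-row INSERT statements to raise throughput
4. Put the WAL on fast storage:
Point --fs_wal_dir at an SSD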
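Kudu management commands
Start the services:
# systemctl start kudu-master
# systemctl start kudu-tserver
Stop the services:
# systemctl stop kudu-tserver
# systemctl stop kudu-master
Restart the services:
# systemctl restart kudu-master
# systemctl restart kudu-tserver
Check status:
# systemctl status kudu-master
# systemctl status kudu-tserver
Check the cluster:
$ kudu cluster ksck 192.168.1.51:7051
Table management:
$ kudu table list 192.168.1.51:7051
$ kudu table describe 192.168.1.51:7051 table_name
$ kudu table delete 192.168.1.51:7051 table_name
Tablet management (list the tablets of a table, move a replica, change a table's replication factor; confirm with kudu table --help that your version provides set_replication_factor):
$ kudu table list 192.168.1.51:7051 --tables=table_name --list_tablets
$ kudu tablet change_config move_replica 192.168.1.51:7051 tablet_id from_ts_uuid to_ts_uuid
$ kudu table set_replication_factor 192.168.1.51:7051 table_name 3
Master management:
$ kudu master list 192.168.1.51:7051
$ kudu master status 192.168.1.51:7051
Tablet Server management:
$ kudu tserver list 192.168.1.51:7051
$ kudu tserver status 192.168.1.51:7050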
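Summary of recommendations:
1. Use Kudu 1.18.0, the latest stable release
2. Deploy multiple Masters for high availability
3. Configure at least 3 replicas
4. Design the partitioning strategy carefully
5. Put the WAL on SSD storage
6. Provision enough memory
7. Enable Kerberos authentication
8. Monitor cluster health
9. Back up data regularly
10. Use Kudu together with Impala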
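This article was compiled and published by 风哥教程 (Fengge Tutorials) for study and testing purposes only. When reposting, please cite the source: http://www.fgedu.net.cn/10327.html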
