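1. Kudu Overview and Environment Planning
Apache Kudu is an open-source distributed columnar storage engine designed for fast analytics. It combines the high throughput of HDFS with the low latency of HBase and supports both real-time reads/writes and batch analytics, which makes it a good fit for real-time analytics on big data.
1.1 Kudu Versions
Kudu is currently released in the 1.x series; this tutorial uses Kudu 1.17 as the example. Kudu integrates tightly with Impala, which provides high-performance SQL queries on top of it.
$ kudu --version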
kudu 1.17.0
revision 1a2b3c4d5e6f7g8h9i0j
# Check the operating system
$ cat /etc/os-release
NAME="Oracle Linux Server"
VERSION="8.9"
ID="ol"
PRETTY_NAME="Oracle Linux Server 8.9"
# Check the Java version
$ java -version
openjdk version "1.8.0_402"
OpenJDK Runtime Environment (build 1.8.0_402-b06)
OpenJDK 64-Bit Server VM (build 25.402-b06, mixed mode)
1.2 Environment Planning
The installation environment is planned as follows:
Node 1: kudu01.fgedu.net.cn (192.168.1.51) - Master + Tablet Server
Node 2: kudu02.fgedu.net.cn (192.168.1.52) - Master + Tablet Server
Node 3: kudu03.fgedu.net.cn (192.168.1.53) - Master + Tablet Server
Kudu version: 1.17.0
Operating system: Oracle Linux 8.9 / RHEL 8.9
Installation directory: /data/kudu
Data directory: /data/kudu/data
Log directory: /data/kudu/logs
WAL directory: /data/kudu/wal
Master ports:
- RPC port: 7051
- Web port: 8051
Tablet Server ports:
- RPC port: 7050
- Web port: 8050
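The three Master RPC addresses from this plan recur as one comma-separated list in later flags and commands. A minimal sketch of building that list from the hostnames and the RPC port; the function name `join_master_addrs` is hypothetical and not part of Kudu:

```shell
# Hypothetical helper (not part of Kudu): join hosts into "host:port,host:port,...".
join_master_addrs() {
    port=$1
    shift
    result=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry
        result="${result:+$result,}$host:$port"
    done
    printf '%s\n' "$result"
}

join_master_addrs 7051 kudu01.fgedu.net.cn kudu02.fgedu.net.cn kudu03.fgedu.net.cn
```

Keeping the list in one variable avoids typing the three addresses by hand in every command.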
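2. Hardware Requirements
Kudu is demanding on hardware, especially memory and disk I/O. The following are recommended hardware configurations for production.
2.1 Minimum Hardware Requirements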
# free -h
total used free shared buff/cache available
Mem: 128G 8.5G 115G 2.1G 4.2G 118G
Swap: 32G 0B 32G
# Check disk space
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 50G 12G 39G 24% /
/dev/sdb1 2.0T 50G 2.0T 3% /data
/dev/sdc1 500G 20G 480G 4% /backup
# Check the number of CPU cores
# nproc
64
# Check disk I/O performance
# fio --name=randread --ioengine=libaio --iodepth=16 --rw=randread --bs=4k --direct=1 --size=1G --numjobs=4 --runtime=60 --group_reporting
randread: (g=0): rw=randread, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=16
…
read : io=1024.0MB, bw=17485KB/s, iops=4371, runt= 60013msec
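The bandwidth and IOPS figures in the fio result are tied together by the block size: bandwidth is roughly IOPS times block size. A small sketch of that arithmetic; the function name `iops_to_kbps` is made up for illustration:

```shell
# Hypothetical helper: approximate bandwidth in KB/s from IOPS and block size in KiB.
iops_to_kbps() {
    echo $(( $1 * $2 ))
}

# 4371 IOPS at a 4 KiB block size
iops_to_kbps 4371 4
```

This prints 17484, in line with the reported bw=17485KB/s; the small difference comes from fio rounding the IOPS value.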
2.2 Network Requirements
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
inet 127.0.0.1/8 scope host lo
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP
inet 192.168.1.51/24 brd 192.168.1.255 scope global eth0
# Check network connectivity
# ping -c 3 192.168.1.52
PING 192.168.1.52 (192.168.1.52) 56(84) bytes of data.
64 bytes from 192.168.1.52: icmp_seq=1 ttl=64 time=0.234 ms
64 bytes from 192.168.1.52: icmp_seq=2 ttl=64 time=0.198 ms
64 bytes from 192.168.1.52: icmp_seq=3 ttl=64 time=0.212 ms
# Check hostname resolution
# hostname -f
kudu01.fgedu.net.cn
# Configure /etc/hosts
# cat /etc/hosts
127.0.0.1 localhost
192.168.1.51 kudu01.fgedu.net.cn kudu01
192.168.1.52 kudu02.fgedu.net.cn kudu02
192.168.1.53 kudu03.fgedu.net.cn kudu03
# Check clock synchronization
# chronyc sources
210 Number of sources = 4
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.1.1 2 6 37 15 +12us[ +15us] +/- 23ms
3. Operating System Configuration
Kudu runs on Linux, and the system needs a series of tuning changes before installation.
3.1 Disable the Firewall and SELinux
# getenforce
Disabled
# Disable SELinux (if not already disabled)
# vi /etc/selinux/config
SELINUX=disabled
# Check the firewall status
# systemctl status firewalld
# Stop the firewall (in production, opening only the required ports is recommended instead)
# systemctl stop firewalld
# systemctl disable firewalld
# Or open the ports Kudu needs
# firewall-cmd --permanent --add-port=7050/tcp
# firewall-cmd --permanent --add-port=7051/tcp
# firewall-cmd --permanent --add-port=8050/tcp
# firewall-cmd --permanent --add-port=8051/tcp
# firewall-cmd --reload
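When the port list grows, the same firewall-cmd calls can be generated in a loop. The sketch below only prints the commands so they can be reviewed before being run as root; the function name `print_firewall_cmds` is hypothetical:

```shell
# Hypothetical helper: print (do not run) the firewall-cmd calls for a list of TCP ports.
print_firewall_cmds() {
    for port in "$@"; do
        echo "firewall-cmd --permanent --add-port=${port}/tcp"
    done
    # A reload is needed once for the permanent rules to take effect
    echo "firewall-cmd --reload"
}

print_firewall_cmds 7050 7051 8050 8051
```

Piping the output to `sh` as root would apply the rules; printing first keeps the step reviewable.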
3.2 Kernel Parameter Tuning
# vi /etc/sysctl.conf
# Add the following parameters
fs.file-max = 6815744
vm.swappiness = 10
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10
vm.overcommit_memory = 1
vm.max_map_count = 262144
vm.min_free_kbytes = 2097152
net.core.somaxconn = 32768
net.ipv4.tcp_max_syn_backlog = 65536
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 1024 65535
# Apply the settings
# sysctl -p
# Configure file descriptor limits
# vi /etc/security/limits.conf
# Add the following lines
* soft nofile 65536
* hard nofile 65536
* soft nproc 65536
* hard nproc 65536
* soft memlock unlimited
* hard memlock unlimited
# Verify the settings (in a new login session, so that the new limits apply)
# ulimit -n
65536
# ulimit -u
65536
3.3 Disable Transparent Huge Pages and Check NUMA
# cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never
# Disable transparent huge pages (effective until the next reboot)
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Disable permanently
# vi /etc/rc.d/rc.local
# Add the following lines
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# Make the file executable
# chmod +x /etc/rc.d/rc.local
# Check the NUMA topology
# numactl --hardware
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
node 0 size: 65536 MB
node 0 free: 62000 MB
node 1 cpus: 8 9 10 11 12 13 14 15 24 25 26 27 28 29 30 31
node 1 size: 65536 MB
node 1 free: 63000 MB
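For the transparent huge page check earlier in this section, the active setting is the word in square brackets. A minimal sketch of extracting it, for example in a pre-flight script; the function name `thp_active_setting` is an assumption for illustration:

```shell
# Hypothetical helper: print the bracketed (active) value from a THP status line.
thp_active_setting() {
    # Input looks like "[always] madvise never"
    printf '%s\n' "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

thp_active_setting "[always] madvise never"
# On a real host:
# thp_active_setting "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
```

After the steps above, the function should report `never` on each node.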
3.4 Time Synchronization
# yum install -y chrony
# Configure chrony
# vi /etc/chrony.conf
# Set the time servers
server 192.168.1.1 iburst
server ntp.aliyun.com iburst
server ntp.tencent.com iburst
# Start the chrony service
# systemctl start chronyd
# systemctl enable chronyd
# Check the synchronization status
# chronyc sources
210 Number of sources = 3
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* 192.168.1.1 2 6 37 15 +12us[ +15us] +/- 23ms
^+ ntp.aliyun.com 2 6 37 17 -23us[ -20us] +/- 45ms
^+ ntp.tencent.com 2 6 37 19 +18us[ +21us] +/- 38ms
# Check the clock offset
# chronyc tracking
Reference ID : C0A80101 (192.168.1.1)
Stratum : 3
Ref time (UTC) : Fri Apr 05 10:00:00 2024
System time : 0.000012345 seconds fast of NTP time
Last offset : +0.000012345 seconds
RMS offset : 0.000023456 seconds
Frequency : 12.345 ppm fast
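The "System time" line of the tracking output can be checked automatically before starting Kudu. The sketch below reads `chronyc tracking` output on standard input and succeeds only when the offset is within a limit; the function name `clock_offset_ok` and the 0.5 second limit are assumptions for illustration, not Kudu defaults:

```shell
# Hypothetical helper: exit 0 when the "System time" offset is within the limit.
# $1 = maximum allowed offset in seconds; chronyc tracking output on stdin.
clock_offset_ok() {
    awk -v max="$1" '
        /^System time/ {
            # Layout: System time : <offset> seconds fast|slow of NTP time
            for (i = 2; i <= NF; i++) {
                if ($i == "seconds") { offset = $(i - 1); found = 1 }
            }
        }
        END { exit !(found && offset + 0 <= max + 0) }
    '
}

# Demonstration with the sample line from the output above
printf 'System time : 0.000012345 seconds fast of NTP time\n' | clock_offset_ok 0.5 \
    && echo "clock offset OK"
# On a real host: chronyc tracking | clock_offset_ok 0.5
```

The function also fails when no "System time" line is present, so a stopped chronyd is reported as a problem rather than silently passing.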
4. Kudu Installation and Deployment
With the operating system configured, install the Kudu service.
4.1 Create the Kudu User and Directories
# useradd kudu
# echo "kudu:fgedu_kudu_2024" | chpasswd
# Create the directories (conf holds the gflag files edited in section 5)
# mkdir -p /data/kudu/{data,logs,wal,tmp,conf}
# mkdir -p /var/run/kudu
# chown -R kudu:kudu /data/kudu
# chown -R kudu:kudu /var/run/kudu
# Check directory ownership and permissions
# ls -la /data/kudu/
total 0
drwxr-xr-x 2 kudu kudu 6 Apr 5 10:00 conf
drwxr-xr-x 2 kudu kudu 6 Apr 5 10:00 data
drwxr-xr-x 2 kudu kudu 6 Apr 5 10:00 logs
drwxr-xr-x 2 kudu kudu 6 Apr 5 10:00 tmp
drwxr-xr-x 2 kudu kudu 6 Apr 5 10:00 wal
4.2 Install Dependencies
# yum install -y autoconf automake cmake gcc gcc-c++ git \
krb5-devel libtool make ncurses-devel openssl-devel \
cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain \
libev-devel libutf8proc-devel lz4-devel \
protobuf-devel python2 python2-devel \
rapidjson-devel snappy-devel thrift-devel \
boost-devel gflags-devel glog-devel \
google-perftools-devel libunwind-devel
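# Additional dependencies: time synchronization is handled by chrony (installed in section 3.4);
# the ntp and ntpdate packages are not shipped with RHEL / Oracle Linux 8
# Verify the toolchain
$ gcc --version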
gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-20)
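4.3 Download and Install Kudu
# cd /tmp
# wget https://downloads.apache.org/kudu/1.17.0/apache-kudu-1.17.0.tar.gz
# Extract and install
# Note: the Apache release archive contains source code; the kudu, kudu-master and kudu-tserver
# binaries must be built from it before the later steps will work (build steps are not covered here)
# tar -xzf apache-kudu-1.17.0.tar.gz
# mv apache-kudu-1.17.0 /data/kudu/kudu
# chown -R kudu:kudu /data/kudu
# Configure environment variables
# vi /etc/profile.d/kudu.sh
export KUDU_HOME=/data/kudu/kudu
export PATH=$KUDU_HOME/bin:$PATH
# Load the environment variables
# source /etc/profile.d/kudu.sh
# Verify the installation
$ kudu --version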
kudu 1.17.0
revision 1a2b3c4d5e6f7g8h9i0j
5. Kudu Configuration
Kudu's configuration parameters strongly affect performance and stability and should be tuned for the actual environment.
5.1 Configure the Master
# vi /data/kudu/conf/master.gflagfile
# Basic Master settings
--log_dir=/data/kudu/logs
--log_filename=kudu-master
--log_level=1
# Data directories
--fs_data_dirs=/data/kudu/data
--fs_wal_dir=/data/kudu/wal
--fs_metadata_dir=/data/kudu/data
# Network
--rpc_bind_addresses=0.0.0.0:7051
--webserver_interface=0.0.0.0
--webserver_port=8051
--webserver_doc_root=/data/kudu/kudu/www
# Master cluster
--master_addresses=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
# Memory
--memory_limit_hard_bytes=53687091200
# Heartbeats
--heartbeat_timeout_ms=30000
--leader_failure_max_missed_heartbeat_periods=3
# Security
--rpc_authentication=disabled
--rpc_encryption=disabled
--rpc_negotiation_timeout_ms=3000
# Logging
--logbuflevel=-1
--max_log_files=10
--max_log_size=100
# Tables
--num_tablet_servers_to_describe=10
--num_tablets_per_range_partition=1
# Tablet Server heartbeat and unresponsive timeouts
--tablet_server_heartbeat_rpc_timeout_ms=15000
--tserver_unresponsive_timeout_ms=60000
5.2 Configure the Tablet Server
# vi /data/kudu/conf/tserver.gflagfile
# Basic Tablet Server settings
--log_dir=/data/kudu/logs
--log_filename=kudu-tserver
--log_level=1
# Data directories
--fs_data_dirs=/data/kudu/data
--fs_wal_dir=/data/kudu/wal
--fs_metadata_dir=/data/kudu/data
# Network
--rpc_bind_addresses=0.0.0.0:7050
--webserver_interface=0.0.0.0
--webserver_port=8050
--webserver_doc_root=/data/kudu/kudu/www
# Master addresses
--tserver_master_addrs=192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
# Memory
--memory_limit_hard_bytes=107374182400
--memory_soft_limit_in_bytes=85899345920
# Threads
--num_io_threads=16
--num_network_reactor_threads=16
--num_reactor_threads=16
--num_tablet_copy_threads=4
--num_tablet_server_workers=16
# WAL
--log_segment_size_mb=64
--log_async_preallocate_segments=true
# Data blocks
--block_manager=file
--data_log_max_segments=300
# Memory pressure
--memory_pressure_percentage=60
--memory_limit_soft_percentage=80
# Security
--rpc_authentication=disabled
--rpc_encryption=disabled
--rpc_negotiation_timeout_ms=3000
# Compression
--compression_algorithm=lz4
# Logging
--logbuflevel=-1
--max_log_files=10
--max_log_size=100
# Disks
--disk_reserved_bytes_free_for_testing=0
--fs_data_dirs_full_disk_cache_seconds=30
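The byte values used for the memory limits in both gflag files are whole GiB amounts. A quick sketch of the conversion, useful when adapting the limits to a host with a different amount of RAM; the function name `gib_to_bytes` is hypothetical:

```shell
# Hypothetical helper: convert GiB to bytes for the memory limit flags.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 50    # 53687091200, the Master hard limit
gib_to_bytes 100   # 107374182400, the Tablet Server hard limit
gib_to_bytes 80    # 85899345920, the Tablet Server soft limit value
```

On the 128 GB hosts shown in section 2.1, the 100 GiB Tablet Server limit and the 50 GiB Master limit together exceed physical memory, so the values are worth recomputing when both roles share a node.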
6. Starting the Kudu Services
Once configuration is complete, start the Kudu services in the correct order: Masters first, then Tablet Servers.
6.1 Start the Master Service
# su - kudu
$ nohup $KUDU_HOME/bin/kudu-master \
--flagfile=/data/kudu/conf/master.gflagfile \
> /data/kudu/logs/kudu-master.out 2>&1 &
# Check the process
$ ps -ef | grep kudu-master
kudu 12345 1 5 10:00 ? 00:00:10 /data/kudu/kudu/bin/kudu-master --flagfile=/data/kudu/conf/master.gflagfile
# Check the ports
$ netstat -tlnp | grep kudu
tcp 0 0 0.0.0.0:7051 0.0.0.0:* LISTEN 12345/kudu-master
tcp 0 0 0.0.0.0:8051 0.0.0.0:* LISTEN 12345/kudu-master
# View the log
$ tail -f /data/kudu/logs/kudu-master.INFO
I0405 10:00:00.000000 12345 master_main.cc:65] Master server started
I0405 10:00:00.000100 12345 master.cc:80] Master listening on 0.0.0.0:7051
I0405 10:00:00.000200 12345 webserver.cc:100] Webserver started on port 8051
I0405 10:00:05.000300 12345 raft_consensus.cc:150] Master raft consensus initialized
6.2 Start the Tablet Server Service
$ nohup $KUDU_HOME/bin/kudu-tserver \
--flagfile=/data/kudu/conf/tserver.gflagfile \
> /data/kudu/logs/kudu-tserver.out 2>&1 &
# Check the process
$ ps -ef | grep kudu-tserver
kudu 12567 1 15 10:01 ? 00:00:30 /data/kudu/kudu/bin/kudu-tserver --flagfile=/data/kudu/conf/tserver.gflagfile
# Check the ports
$ netstat -tlnp | grep kudu
tcp 0 0 0.0.0.0:7050 0.0.0.0:* LISTEN 12567/kudu-tserver
tcp 0 0 0.0.0.0:8050 0.0.0.0:* LISTEN 12567/kudu-tserver
# View the log
$ tail -f /data/kudu/logs/kudu-tserver.INFO
I0405 10:01:00.000000 12567 tserver_main.cc:65] Tablet server started
I0405 10:01:00.000100 12567 tserver.cc:80] Tablet server listening on 0.0.0.0:7050
I0405 10:01:00.000200 12567 webserver.cc:100] Webserver started on port 8050
I0405 10:01:10.000300 12567 heartbeater.cc:150] Connected to master at 192.168.1.51:7051
6.3 Verify the Cluster Status
$ kudu cluster_info kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051
# Sample output:
Master addresses: 192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
Masters:
UUID: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
RPC Address: 192.168.1.51:7051
State: RUNNING
UUID: 2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e
RPC Address: 192.168.1.52:7051
State: RUNNING
UUID: 3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f
RPC Address: 192.168.1.53:7051
State: RUNNING
Tablet Servers:
UUID: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
RPC Address: 192.168.1.51:7050
State: RUNNING
Num Tablets: 0
UUID: 5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
RPC Address: 192.168.1.52:7050
State: RUNNING
Num Tablets: 0
UUID: 6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c
RPC Address: 192.168.1.53:7050
State: RUNNING
Num Tablets: 0
# Check the Tablet Server status
$ kudu tserver_status kudu01.fgedu.net.cn:7050
# Sample output:
UUID: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
RPC Address: 192.168.1.51:7050
State: RUNNING
Num Tablets: 0
Memory Used: 1.2 GB
Memory Soft Limit: 80.0 GB
Memory Hard Limit: 100.0 GB
7. Kudu Functional Testing
After installation, run functional tests to verify that Kudu works correctly.
7.1 Create a Test Table
$ kudu table create kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees' \
'id INT64 NOT NULL, name STRING NOT NULL, department STRING, salary DOUBLE, hire_date STRING' \
'id' \
--num_replicas=3
# Sample output:
Created table fgedudb.fgedu_employees
# Show the table schema
$ kudu table describe kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees'
# Sample output:
TABLE: fgedudb.fgedu_employees
OWNER: kudu
NUM_REPLICAS: 3
COLUMN     | TYPE   | NULLABLE | DEFAULT | ENCODING | COMPRESSION | COMMENT
-----------|--------|----------|---------|----------|-------------|--------
id         | INT64  | false    |         | AUTO     | DEFAULT     |
name       | STRING | false    |         | AUTO     | DEFAULT     |
department | STRING | true     |         | AUTO     | DEFAULT     |
salary     | DOUBLE | true     |         | AUTO     | DEFAULT     |
hire_date  | STRING | true     |         | AUTO     | DEFAULT     |
PRIMARY KEY (id)
7.2 Insert Test Data
$ kudu table insert kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees' \
'(1, "张三", "技术部", 15000.00, "2020-01-15")' \
'(2, "李四", "市场部", 12000.00, "2020-03-20")' \
'(3, "王五", "财务部", 13000.00, "2020-05-10")' \
'(4, "赵六", "技术部", 18000.00, "2019-08-01")' \
'(5, "钱七", "人事部", 11000.00, "2021-02-15")'
# Sample output:
Inserted 5 rows
# Scan the table
$ kudu table scan kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees'
# Sample output:
id: 1
name: "张三"
department: "技术部"
salary: 15000.0
hire_date: "2020-01-15"
id: 2
name: "李四"
department: "市场部"
salary: 12000.0
hire_date: "2020-03-20"
id: 3
name: "王五"
department: "财务部"
salary: 13000.0
hire_date: "2020-05-10"
id: 4
name: "赵六"
department: "技术部"
salary: 18000.0
hire_date: "2019-08-01"
id: 5
name: "钱七"
department: "人事部"
salary: 11000.0
hire_date: "2021-02-15"
Fetched 5 rows
7.3 Update and Delete Data
$ kudu table update kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees' \
'(1, "张三", "技术部", 16000.00, "2020-01-15")'
# Sample output:
Updated 1 row
# Delete a row
$ kudu table delete kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees' \
'(5)'
# Sample output:
Deleted 1 row
# Scan again to verify
$ kudu table scan kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees'
# Sample output:
id: 1
name: "张三"
department: "技术部"
salary: 16000.0
hire_date: "2020-01-15"
id: 2
name: "李四"
department: "市场部"
salary: 12000.0
hire_date: "2020-03-20"
id: 3
name: "王五"
department: "财务部"
salary: 13000.0
hire_date: "2020-05-10"
id: 4
name: "赵六"
department: "技术部"
salary: 18000.0
hire_date: "2019-08-01"
Fetched 4 rows
8. Kudu and Impala Integration
Kudu integrates tightly with Impala, so Kudu tables can be operated on with Impala SQL statements.
8.1 Create an Impala External Table Mapped to a Kudu Table
$ impala-shell -i impala01.fgedu.net.cn
# Create the database
[impala01.fgedu.net.cn:21000] > CREATE DATABASE fgedudb;
Query: create DATABASE fgedudb
# Create an Impala table mapped to the Kudu table
[impala01.fgedu.net.cn:21000] > CREATE EXTERNAL TABLE fgedudb.fgedu_employees_impala
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.table_name' = 'fgedudb.fgedu_employees',
> 'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051'
> );
Query: create EXTERNAL TABLE fgedudb.fgedu_employees_impala
STORED AS KUDU
TBLPROPERTIES (
'kudu.table_name' = 'fgedudb.fgedu_employees',
'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051'
)
# Query the Kudu table
[impala01.fgedu.net.cn:21000] > SELECT * FROM fgedudb.fgedu_employees_impala;
Query: select * from fgedudb.fgedu_employees_impala
+----+------+------------+----------+------------+
| id | name | department | salary   | hire_date  |
+----+------+------------+----------+------------+
| 1  | 张三 | 技术部     | 16000.00 | 2020-01-15 |
| 2  | 李四 | 市场部     | 12000.00 | 2020-03-20 |
| 3  | 王五 | 财务部     | 13000.00 | 2020-05-10 |
| 4  | 赵六 | 技术部     | 18000.00 | 2019-08-01 |
+----+------+------------+----------+------------+
Fetched 4 row(s) in 0.15s
8.2 Create a Kudu Table Through Impala
[impala01.fgedu.net.cn:21000] > CREATE TABLE fgedudb.fgedu_sales
> (
> sale_id BIGINT,
> product_id INT,
> customer_id INT,
> sale_amount DECIMAL(12,2),
> sale_date STRING,
> region STRING,
> PRIMARY KEY (sale_id)
> )
> PARTITION BY HASH(sale_id) PARTITIONS 16
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
> 'kudu.num_tablet_replicas' = '3'
> );
Query: create TABLE fgedudb.fgedu_sales
(
sale_id BIGINT,
product_id INT,
customer_id INT,
sale_amount DECIMAL(12,2),
sale_date STRING,
region STRING,
PRIMARY KEY (sale_id)
)
PARTITION BY HASH(sale_id) PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
'kudu.num_tablet_replicas' = '3'
)
# Insert data
[impala01.fgedu.net.cn:21000] > INSERT INTO fgedudb.fgedu_sales VALUES
> (1, 100, 1001, 1500.00, '2024-01-15', '华东'),
> (2, 200, 1002, 2300.00, '2024-01-16', '华南'),
> (3, 300, 1003, 1800.00, '2024-01-17', '华北');
Query: insert into fgedudb.fgedu_sales values
(1, 100, 1001, 1500.00, '2024-01-15', '华东'),
(2, 200, 1002, 2300.00, '2024-01-16', '华南'),
(3, 300, 1003, 1800.00, '2024-01-17', '华北')
Inserted 3 row(s) in 0.35s
# Query the data
[impala01.fgedu.net.cn:21000] > SELECT * FROM fgedudb.fgedu_sales;
Query: select * from fgedudb.fgedu_sales
+---------+------------+-------------+-------------+------------+--------+
| sale_id | product_id | customer_id | sale_amount | sale_date  | region |
+---------+------------+-------------+-------------+------------+--------+
| 1       | 100        | 1001        | 1500.00     | 2024-01-15 | 华东   |
| 2       | 200        | 1002        | 2300.00     | 2024-01-16 | 华南   |
| 3       | 300        | 1003        | 1800.00     | 2024-01-17 | 华北   |
+---------+------------+-------------+-------------+------------+--------+
Fetched 3 row(s) in 0.12s
8.3 Update and Delete Operations
[impala01.fgedu.net.cn:21000] > UPDATE fgedudb.fgedu_sales
> SET sale_amount = 2000.00
> WHERE sale_id = 1;
Query: update fgedudb.fgedu_sales
set sale_amount = 2000.00
where sale_id = 1
Updated 1 row(s) in 0.15s
# Delete a row
[impala01.fgedu.net.cn:21000] > DELETE FROM fgedudb.fgedu_sales
> WHERE sale_id = 3;
Query: delete from fgedudb.fgedu_sales
where sale_id = 3
Deleted 1 row(s) in 0.12s
# Upsert (insert, or update when the primary key already exists)
[impala01.fgedu.net.cn:21000] > UPSERT INTO fgedudb.fgedu_sales VALUES
> (1, 100, 1001, 2500.00, '2024-01-15', '华东'),
> (4, 400, 1004, 3200.00, '2024-01-18', '华中');
Query: upsert into fgedudb.fgedu_sales values
(1, 100, 1001, 2500.00, '2024-01-15', '华东'),
(4, 400, 1004, 3200.00, '2024-01-18', '华中')
Upserted 2 row(s) in 0.18s
# Query to verify
[impala01.fgedu.net.cn:21000] > SELECT * FROM fgedudb.fgedu_sales;
Query: select * from fgedudb.fgedu_sales
+---------+------------+-------------+-------------+------------+--------+
| sale_id | product_id | customer_id | sale_amount | sale_date  | region |
+---------+------------+-------------+-------------+------------+--------+
| 1       | 100        | 1001        | 2500.00     | 2024-01-15 | 华东   |
| 2       | 200        | 1002        | 2300.00     | 2024-01-16 | 华南   |
| 4       | 400        | 1004        | 3200.00     | 2024-01-18 | 华中   |
+---------+------------+-------------+-------------+------------+--------+
Fetched 3 row(s) in 0.10s
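9. Kudu Performance Tuning
Kudu performance tuning covers several areas, including memory configuration, disk I/O, and table design.
9.1 Memory Tuning
$ kudu tserver_status kudu01.fgedu.net.cn:7050
# Sample output: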
UUID: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
RPC Address: 192.168.1.51:7050
State: RUNNING
Num Tablets: 16
Memory Used: 15.5 GB
Memory Soft Limit: 80.0 GB
Memory Hard Limit: 100.0 GB
# Adjust the memory settings
# vi /data/kudu/conf/tserver.gflagfile
--memory_limit_hard_bytes=107374182400
--memory_soft_limit_in_bytes=85899345920
--memory_pressure_percentage=60
--memory_limit_soft_percentage=80
# Restart the Tablet Server to apply the changes
$ pkill -f kudu-tserver
$ nohup $KUDU_HOME/bin/kudu-tserver \
--flagfile=/data/kudu/conf/tserver.gflagfile \
> /data/kudu/logs/kudu-tserver.out 2>&1 &
9.2 Table Design
[impala01.fgedu.net.cn:21000] > CREATE TABLE fgedudb.fgedu_sales_optimized
> (
> sale_id BIGINT,
> product_id INT,
> customer_id INT,
> sale_amount DECIMAL(12,2),
> sale_date STRING,
> region STRING,
> PRIMARY KEY (sale_id, sale_date)
> )
> PARTITION BY HASH(sale_id) PARTITIONS 8,
> RANGE(sale_date) (
> PARTITION '2024-01-01' <= VALUES < '2024-02-01',
> PARTITION '2024-02-01' <= VALUES < '2024-03-01',
> PARTITION '2024-03-01' <= VALUES < '2024-04-01'
> )
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
> 'kudu.num_tablet_replicas' = '3'
> );
Query: create TABLE fgedudb.fgedu_sales_optimized
(
sale_id BIGINT,
product_id INT,
customer_id INT,
sale_amount DECIMAL(12,2),
sale_date STRING,
region STRING,
PRIMARY KEY (sale_id, sale_date)
)
PARTITION BY HASH(sale_id) PARTITIONS 8,
RANGE(sale_date) (
PARTITION '2024-01-01' <= VALUES < '2024-02-01',
PARTITION '2024-02-01' <= VALUES < '2024-03-01',
PARTITION '2024-03-01' <= VALUES < '2024-04-01'
)
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
'kudu.num_tablet_replicas' = '3'
)
# Show the table structure
[impala01.fgedu.net.cn:21000] > DESCRIBE fgedudb.fgedu_sales_optimized;
Query: describe fgedudb.fgedu_sales_optimized
+-------------+---------------+---------+
| name        | type          | comment |
+-------------+---------------+---------+
| sale_id     | bigint        |         |
| product_id  | int           |         |
| customer_id | int           |         |
| sale_amount | decimal(12,2) |         |
| sale_date   | string        |         |
| region      | string        |         |
+-------------+---------------+---------+
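With hash and range partitioning combined, the number of tablets is the product of the hash bucket count and the number of range partitions, and each tablet is stored once per replica. A rough sketch of that arithmetic for the table above (8 hash buckets, 3 range partitions, 3 replicas); the function names are hypothetical:

```shell
# Hypothetical helpers for partition arithmetic.
tablet_count() {
    # $1 = hash buckets, $2 = range partitions
    echo $(( $1 * $2 ))
}
replica_count() {
    # $1 = tablets, $2 = replication factor
    echo $(( $1 * $2 ))
}

tablets=$(tablet_count 8 3)
echo "tablets: $tablets"
echo "replicas: $(replica_count "$tablets" 3)"
```

This gives 24 tablets and 72 replicas, spread over the three Tablet Servers; the count grows with every range partition added later.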
9.3 Compression and Encoding
[impala01.fgedu.net.cn:21000] > CREATE TABLE fgedudb.fgedu_sales_compressed
> (
> sale_id BIGINT,
> product_id INT ENCODING RLE,
> customer_id INT ENCODING BIT_SHUFFLE,
> sale_amount DECIMAL(12,2) ENCODING BIT_SHUFFLE,
> sale_date STRING ENCODING PREFIX_ENCODING,
> region STRING ENCODING DICT_ENCODING,
> PRIMARY KEY (sale_id)
> )
> PARTITION BY HASH(sale_id) PARTITIONS 16
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
> 'kudu.num_tablet_replicas' = '3',
> 'kudu.compression' = 'LZ4'
> );
Query: create TABLE fgedudb.fgedu_sales_compressed
(
sale_id BIGINT,
product_id INT ENCODING RLE,
customer_id INT ENCODING BIT_SHUFFLE,
sale_amount DECIMAL(12,2) ENCODING BIT_SHUFFLE,
sale_date STRING ENCODING PREFIX_ENCODING,
region STRING ENCODING DICT_ENCODING,
PRIMARY KEY (sale_id)
)
PARTITION BY HASH(sale_id) PARTITIONS 16
STORED AS KUDU
TBLPROPERTIES (
'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
'kudu.num_tablet_replicas' = '3',
'kudu.compression' = 'LZ4'
)
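10. Kudu Upgrade and Migration
Upgrade Kudu carefully so that data stays safe and the service stays available.
10.1 Preparation Before the Upgrade
$ kudu --version
kudu 1.15.0
# Back up the configuration files
# cp -r /data/kudu/conf /backup/kudu_conf_$(date +%Y%m%d)
# Check the cluster status
$ kudu cluster_info kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051
# Sample output: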
Master addresses: 192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
Masters:
UUID: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
RPC Address: 192.168.1.51:7051
State: RUNNING
# Check the tables
$ kudu table list kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051
# Sample output:
fgedudb.fgedu_employees
fgedudb.fgedu_sales
10.2 Perform the Rolling Upgrade
# Stop the Tablet Server on kudu01
$ pkill -f kudu-tserver
# Wait for data migration to finish
$ sleep 60
# Back up the old version
# mv /data/kudu/kudu /data/kudu/kudu_1.15.0_backup
# Extract the new version
# cd /tmp
# tar -xzf apache-kudu-1.17.0.tar.gz
# mv apache-kudu-1.17.0 /data/kudu/kudu
# chown -R kudu:kudu /data/kudu/kudu
# Restore the configuration files
# cp -r /backup/kudu_conf_$(date +%Y%m%d)/* /data/kudu/conf/
# Start the new-version Tablet Server
$ nohup $KUDU_HOME/bin/kudu-tserver \
--flagfile=/data/kudu/conf/tserver.gflagfile \
> /data/kudu/logs/kudu-tserver.out 2>&1 &
# Check the service
$ ps -ef | grep kudu-tserver
kudu 12567 1 15 10:01 ? 00:00:30 /data/kudu/kudu/bin/kudu-tserver
# Verify the version
$ kudu --version
kudu 1.17.0
# Repeat the steps above to upgrade kudu02 and kudu03
# Rolling upgrade of the Masters (one node at a time)
# Stop the Master on kudu01
$ pkill -f kudu-master
# Wait for the Master leader election to finish
$ sleep 30
# Start the new-version Master
$ nohup $KUDU_HOME/bin/kudu-master \
--flagfile=/data/kudu/conf/master.gflagfile \
> /data/kudu/logs/kudu-master.out 2>&1 &
# Repeat the steps above to upgrade kudu02 and kudu03
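The upgrade steps stop a process with pkill and then sleep for a fixed time. As one possible refinement, a script can instead wait until the old process has actually exited before replacing the binaries. A minimal sketch; the function name `wait_for_pid_exit` is hypothetical, and obtaining the PID (for example with `pgrep -f kudu-tserver` before calling pkill) is left to the caller:

```shell
# Hypothetical helper: wait until a process exits, or give up after a timeout.
# $1 = PID, $2 = timeout in seconds. Returns 0 if the process is gone, 1 on timeout.
wait_for_pid_exit() {
    pid=$1
    timeout_s=$2
    waited=0
    # kill -0 sends no signal; it only tests whether the process still exists
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example (commented out because it needs a running Tablet Server):
# pid=$(pgrep -f kudu-tserver) && pkill -f kudu-tserver && wait_for_pid_exit "$pid" 60
```

This only confirms that the process has stopped; it does not replace checking that the cluster is healthy before moving on to the next node.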
10.3 Verification After the Upgrade
$ kudu cluster_info kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051
# Sample output:
Master addresses: 192.168.1.51:7051,192.168.1.52:7051,192.168.1.53:7051
Masters:
UUID: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
RPC Address: 192.168.1.51:7051
State: RUNNING
Tablet Servers:
UUID: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
RPC Address: 192.168.1.51:7050
State: RUNNING
Num Tablets: 16
# Verify the table data
$ kudu table scan kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_employees' | head -5
# Sample output:
id: 1
name: "张三"
department: "技术部"
salary: 16000.0
hire_date: "2020-01-15"
# Verify the Impala integration
$ impala-shell -i impala01.fgedu.net.cn -q "SELECT COUNT(*) FROM fgedudb.fgedu_sales;"
# Sample output:
Query: select count(*) from fgedudb.fgedu_sales
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.15s
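11. Kudu Monitoring and Operations
Kudu monitoring and operations cover service status monitoring, performance monitoring, and log management.
11.1 Service Status Monitoring
$ ps -ef | grep -E 'kudu-master|kudu-tserver' | grep -v grep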
kudu 12345 1 2 10:00 ? 00:05:23 /data/kudu/kudu/bin/kudu-master
kudu 12567 1 15 10:01 ? 00:35:67 /data/kudu/kudu/bin/kudu-tserver
# Check the listening ports
$ netstat -tlnp | grep -E '7050|7051|8050|8051'
tcp 0 0 0.0.0.0:7050 0.0.0.0:* LISTEN 12567/kudu-tserver
tcp 0 0 0.0.0.0:7051 0.0.0.0:* LISTEN 12345/kudu-master
tcp 0 0 0.0.0.0:8050 0.0.0.0:* LISTEN 12567/kudu-tserver
tcp 0 0 0.0.0.0:8051 0.0.0.0:* LISTEN 12345/kudu-master
# Monitor through the Web UI
# Master Web UI: http://192.168.1.51:8051
# Tablet Server Web UI: http://192.168.1.51:8050
# Check the Tablet Server status
$ kudu tserver_status kudu01.fgedu.net.cn:7050
# Sample output:
UUID: 4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a
RPC Address: 192.168.1.51:7050
State: RUNNING
Num Tablets: 16
Memory Used: 15.5 GB
11.2 Performance Monitoring
$ kudu cluster_info kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051
# Sample output:
Cluster Statistics:
Total Tablets: 48
Total Replicas: 144
Total Disk Size: 125.5 GB
Total Memory Used: 45.2 GB
Average Tablet Size: 2.6 GB
# Show table statistics
$ kudu table statistics kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_sales'
# Sample output:
Table: fgedudb.fgedu_sales
Statistics:
Total Tablets: 16
Total Replicas: 48
Total Disk Size: 12.5 GB
Total Memory Used: 3.2 GB
On-disk Size: 12.5 GB
Live Row Count: 1000000
# Show the tablet distribution
$ kudu table list kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
--list_tablets
# Sample output:
fgedudb.fgedu_sales:
Tablet ID: 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d
Partition: HASH (sale_id) = 0
State: RUNNING
Replicas: 3
Tablet ID: 2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e
Partition: HASH (sale_id) = 1
State: RUNNING
Replicas: 3
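The average tablet size in the cluster statistics earlier in this section is simply the total disk size divided by the tablet count. A small sketch for recomputing it from other figures; the function name `avg_size_gb` is made up for illustration:

```shell
# Hypothetical helper: average size per tablet, rounded to one decimal place.
avg_size_gb() {
    # $1 = total size in GB, $2 = number of tablets
    awk -v total="$1" -v n="$2" 'BEGIN { printf "%.1f\n", total / n }'
}

# Cluster-wide figures from the sample output: 125.5 GB over 48 tablets
avg_size_gb 125.5 48
```

This prints 2.6, matching the "Average Tablet Size" line. The same call with the per-table figures (12.5 GB over 16 tablets) shows how evenly a single table is spread.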
11.3 Log Management
$ tail -100 /data/kudu/logs/kudu-master.INFO
I0405 10:00:00.000000 12345 master_main.cc:65] Master server started
I0405 10:00:00.000100 12345 master.cc:80] Master listening on 0.0.0.0:7051
# View the Tablet Server log
$ tail -100 /data/kudu/logs/kudu-tserver.INFO
I0405 10:01:00.000000 12567 tserver_main.cc:65] Tablet server started
I0405 10:01:00.000100 12567 tserver.cc:80] Tablet server listening on 0.0.0.0:7050
# View the error log
$ tail -100 /data/kudu/logs/kudu-tserver.ERROR
E0405 10:05:00.000000 12567 tserver.cc:300] Tablet write failed
# Log rotation settings
$ cat /data/kudu/conf/tserver.gflagfile | grep log
--log_dir=/data/kudu/logs
--log_filename=kudu-tserver
--log_level=1
--max_log_files=10
--max_log_size=100
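The log lines shown in this section start with a severity letter (I for info, E for error) followed by the month and day. Assuming that format, a quick way to count error and fatal lines in a log is sketched below; the function name `count_glog_problems` is hypothetical:

```shell
# Hypothetical helper: count lines whose severity letter is E or F.
# Reads log text on stdin; assumes lines start like "E0405 10:05:00.000000 ...".
count_glog_problems() {
    # grep -c exits non-zero when the count is 0, so mask that status
    grep -c -E '^[EF][0-9]{4} ' || true
}

# Demonstration with two lines in the format shown above
printf '%s\n' \
    'I0405 10:01:00.000000 12567 tserver_main.cc:65] Tablet server started' \
    'E0405 10:05:00.000000 12567 tserver.cc:300] Tablet write failed' \
    | count_glog_problems
# On a real host: count_glog_problems < /data/kudu/logs/kudu-tserver.INFO
```

A count that keeps growing between checks is a reasonable trigger for looking at the ERROR log in detail.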
12. Kudu Backup and Recovery
Backup and recovery are an important safeguard for data stored in Kudu.
12.1 Data Backup
$ impala-shell -i impala01.fgedu.net.cn -q "
INSERT OVERWRITE DIRECTORY '/backup/kudu/fgedu_sales_$(date +%Y%m%d)'
STORED AS PARQUET
SELECT * FROM fgedudb.fgedu_sales;"
# Sample output:
Query: insert overwrite directory '/backup/kudu/fgedu_sales_20260405'
stored as parquet
select * from fgedudb.fgedu_sales
Inserted 3 row(s) in 0.35s
# Verify the backup files
$ hdfs dfs -ls /backup/kudu/fgedu_sales_20260405
Found 1 items
-rw-r--r-- 3 hadoop supergroup 1024 2024-04-05 10:00 /backup/kudu/fgedu_sales_20260405/1234567890.parquet
# Export the table schema with the Kudu tool
$ kudu table describe kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051 \
'fgedudb.fgedu_sales' > /backup/kudu/fgedu_sales_schema_$(date +%Y%m%d).txt
# Back up the configuration files
# tar -czf /backup/kudu/kudu_conf_$(date +%Y%m%d).tar.gz /data/kudu/conf
12.2 Data Recovery
$ impala-shell -i impala01.fgedu.net.cn
# Create a new table
[impala01.fgedu.net.cn:21000] > CREATE TABLE fgedudb.fgedu_sales_restore
> (
> sale_id BIGINT,
> product_id INT,
> customer_id INT,
> sale_amount DECIMAL(12,2),
> sale_date STRING,
> region STRING,
> PRIMARY KEY (sale_id)
> )
> PARTITION BY HASH(sale_id) PARTITIONS 16
> STORED AS KUDU
> TBLPROPERTIES (
> 'kudu.master_addresses' = 'kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051',
> 'kudu.num_tablet_replicas' = '3'
> );
# Load the data from the backup files
[impala01.fgedu.net.cn:21000] > INSERT INTO fgedudb.fgedu_sales_restore
> SELECT * FROM parquet.'/backup/kudu/fgedu_sales_20260405/*.parquet';
# Sample output:
Inserted 3 row(s) in 0.45s
# Verify the restored data
[impala01.fgedu.net.cn:21000] > SELECT COUNT(*) FROM fgedudb.fgedu_sales_restore;
Query: select count(*) from fgedudb.fgedu_sales_restore
+----------+
| count(*) |
+----------+
| 3        |
+----------+
Fetched 1 row(s) in 0.12s
12.3 Scheduled Backup Script
# vi /backup/scripts/kudu_backup.sh
#!/bin/bash
# Kudu backup script
BACKUP_DATE=$(date +%Y%m%d)
BACKUP_DIR="/backup/kudu"
KUDU_MASTERS="kudu01.fgedu.net.cn:7051,kudu02.fgedu.net.cn:7051,kudu03.fgedu.net.cn:7051"
IMPALA_HOST="impala01.fgedu.net.cn"
# Create the backup directory
mkdir -p "$BACKUP_DIR/$BACKUP_DATE"
# Get the list of all tables
TABLES=$(impala-shell -i "$IMPALA_HOST" -q "SHOW TABLES IN fgedudb" -B)
# Back up each table
for TABLE in $TABLES; do
echo "Backing up table: $TABLE"
# Export the data to HDFS
impala-shell -i "$IMPALA_HOST" -q "
INSERT OVERWRITE DIRECTORY '$BACKUP_DIR/$BACKUP_DATE/${TABLE}'
STORED AS PARQUET
SELECT * FROM fgedudb.$TABLE;"
# Export the table definition
impala-shell -i "$IMPALA_HOST" -q "SHOW CREATE TABLE fgedudb.$TABLE;" > "$BACKUP_DIR/$BACKUP_DATE/${TABLE}_schema.sql"
done
# Back up the configuration files
tar -czf "$BACKUP_DIR/$BACKUP_DATE/kudu_conf.tar.gz" /data/kudu/conf
# Remove backups older than 30 days (only the dated top-level directories, never $BACKUP_DIR itself)
find "$BACKUP_DIR" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +
echo "Backup completed at $(date)"
# Make the script executable
# chmod +x /backup/scripts/kudu_backup.sh
# Schedule the job with cron
# crontab -e
0 2 * * * /backup/scripts/kudu_backup.sh >> /backup/logs/kudu_backup.log 2>&1
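The script above prunes backups by age. A count-based alternative, which is a different technique from the one in the script, keeps only the newest N dated directories regardless of age, so a paused cron job never leaves the backup area empty. A minimal sketch that only selects the names to remove; the function name `dirs_to_prune` is hypothetical:

```shell
# Hypothetical helper: given directory names (YYYYMMDD) on stdin,
# print all but the newest $1 of them, oldest first.
dirs_to_prune() {
    sort | awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

# Three dated backups, keep the newest two
printf '20240401\n20240403\n20240402\n' | dirs_to_prune 2
```

This prints 20240401. Because the names sort chronologically, the output can be reviewed first and then fed to a removal command.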
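This article was compiled and published by 风哥教程 and is intended for learning and testing only; please credit the source when reposting: http://www.fgedu.net.cn/10327.html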
