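Contents
Part01-Basic Concepts and Theory
1.1 HBase Architecture Overview
1.2 HBase Data Model
1.3 HBase Storage Principles
Part02-Production Environment Planning and Recommendations
2.1 Cluster Deployment Planning
2.2 Table Design Planning
2.3 Performance Optimization Planning
Part03-Production Implementation Plan
3.1 HBase Installation and Configuration
3.2 Table and Data Management
3.3 Data Read and Write Operations
3.4 Cluster Management and Maintenance
Part04-Production Cases and Hands-On Walkthroughs
4.1 Real-Time Query Case
4.2 Bulk Import Case
4.3 Performance Tuning Case
Part05-Fengge's Lessons Learned
5.1 HBase Best Practices
5.2 Performance Optimization Summary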
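Part01-Basic Concepts and Theory
1.1 HBase Architecture Overview
HBase is a distributed, column-family-oriented database built on top of Hadoop. It provides highly reliable, high-performance, scalable data storage.
1.2 HBase Data Model
The HBase data model is built from tables, rows, column families, column qualifiers, and timestamps.
– Table: a table, made up of rows
– Row: a row, uniquely identified by its RowKey
– Column Family: a logical grouping of columns
– Column Qualifier: the name of a specific column within a family
– Timestamp: identifies the version of a cell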
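To make the model above concrete, the following sketch treats each cell as a tab-separated record (RowKey, family:qualifier, timestamp, value) and orders the records the way HBase orders cells: by row, then by column, then newest timestamp first. The sample records and the `sort_cells` and `sample_cells` helpers are hypothetical and exist only for illustration; nothing here talks to HBase.

```shell
# Illustrative only: model HBase cells as tab-separated records
#   rowkey <TAB> family:qualifier <TAB> timestamp <TAB> value
# and sort them by row, then column, then newest timestamp first.
sort_cells() {
  LC_ALL=C sort -t "$(printf '\t')" -k1,1 -k2,2 -k3,3nr
}

# Hypothetical sample data: user001 has two versions of info:email.
sample_cells() {
  printf 'user002\tcf:name\t1705540800000\tfgedu02\n'
  printf 'user001\tinfo:email\t1705540800000\told@example.com\n'
  printf 'user001\tinfo:email\t1705540900000\tnew@example.com\n'
  printf 'user001\tcf:name\t1705540800000\tfgedu01\n'
}

sample_cells | sort_cells
```

In the sorted result the newer `info:email` version appears before the older one, which mirrors why a plain read returns the latest version of a cell.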
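1.3 HBase Storage Principles
HBase storage is based on an LSM tree: a write is recorded in the write-ahead log and the in-memory MemStore, and the MemStore is later flushed to disk as an HFile.
# Check the HBase version
hbase version
# Check HBase status
hbase hbck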
HBase 2.4.14
Source code repository git://git.apache.org/hbase.git revision=1234567890abcdef
# HBase status
Number of live regionservers: 6
Number of dead regionservers: 0
Number of regions: 100
Number of tables: 10
# Cluster status is normal
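The MemStore-then-HFile write path from section 1.3 can be imitated with a toy script: writes accumulate in a buffer (standing in for the MemStore) and, once the buffer reaches a threshold, are sorted and written out as one file (standing in for an HFile). This is a deliberately simplified sketch, not how HBase is implemented; `FLUSH_THRESHOLD`, `put_cell`, and `flush_memstore` are hypothetical names, and the write-ahead log, compaction, and the read path are left out.

```shell
# Toy LSM write path (illustration only, not HBase code).
FLUSH_THRESHOLD=3            # hypothetical: flush after 3 buffered cells
STORE_DIR="$(mktemp -d)"     # stands in for the directory holding HFiles
MEMSTORE=""                  # stands in for the in-memory MemStore
MEMSTORE_COUNT=0
HFILE_SEQ=0

flush_memstore() {
  if [ "$MEMSTORE_COUNT" -eq 0 ]; then return 0; fi
  HFILE_SEQ=$((HFILE_SEQ + 1))
  # An HFile is sorted and immutable: sort the buffer and write it once.
  printf '%s' "$MEMSTORE" | LC_ALL=C sort > "$STORE_DIR/hfile_$HFILE_SEQ"
  MEMSTORE=""
  MEMSTORE_COUNT=0
}

put_cell() {                 # usage: put_cell <rowkey> <value>
  MEMSTORE="${MEMSTORE}$1=$2
"
  MEMSTORE_COUNT=$((MEMSTORE_COUNT + 1))
  if [ "$MEMSTORE_COUNT" -ge "$FLUSH_THRESHOLD" ]; then
    flush_memstore
  fi
}

put_cell user003 c
put_cell user001 a
put_cell user002 b           # third write reaches the threshold and flushes
put_cell user004 d           # stays in the buffer until the next flush

echo "cells in hfile_1: $(wc -l < "$STORE_DIR/hfile_1" | tr -d ' ')"
echo "cells still in memory: $MEMSTORE_COUNT"
```

Writes arrive in arbitrary order but land on disk sorted by key, which is the property that lets an LSM store turn random writes into sequential file writes.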
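Part02-Production Environment Planning and Recommendations
2.1 Cluster Deployment Planning
An HBase cluster deployment has to account for node roles and resource allocation.
– Master nodes: 2, configured for high availability
– RegionServer nodes: sized according to data volume
– ZooKeeper nodes: 3 or 5
– Memory: 32 GB or more recommended per RegionServer
2.2 Table Design Planning
Table design is key to HBase performance.
# List tables
echo "list" | hbase shell
# Show the table schema
echo "describe 'fgedu_user'" | hbase shell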
TABLE
fgedu_order
fgedu_user
fgedu_log
3 row(s)
Took 0.3000 seconds
# Table schema
Table fgedu_user is ENABLED
fgedu_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '3', COMPRESSION => 'SNAPPY', …}
1 row(s)
Took 0.1000 seconds
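2.3 Performance Optimization Planning
Performance has to be tuned from several angles. Fengge's tip: RowKey design is the core of HBase performance optimization.
– Design RowKeys to avoid hotspots
– Keep the number of column families small
– Use a compression algorithm
– Pre-split tables
– Tune the MemStore size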
Part03-Production Implementation Plan
3.1 HBase Installation and Configuration
3.1.1 Installing HBase
# Download and unpack HBase
wget https://downloads.apache.org/hbase/2.4.14/hbase-2.4.14-bin.tar.gz
tar -xzf hbase-2.4.14-bin.tar.gz -C /bigdata/app/
ln -s /bigdata/app/hbase-2.4.14 /bigdata/app/hbase
# Configure environment variables
export HBASE_HOME=/bigdata/app/hbase
export PATH=$PATH:$HBASE_HOME/bin
# Configure hbase-site.xml
cat > /bigdata/app/hbase/conf/hbase-site.xml << 'EOF'
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://fgedu01:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>fgedu01,fgedu02,fgedu03</value>
</property>
</configuration>
EOF
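# Verify the installation
hbase version
# Done
# Environment variables
# Configuration complete
# Configuration file
# Configuration complete
# Verify the installation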
HBase 2.4.14
Source code repository git://git.apache.org/hbase.git revision=1234567890abcdef
# HBase installed successfully
3.1.2 Starting HBase
# Start HBase
start-hbase.sh
# Check the processes
jps | grep -E "HMaster|HRegionServer"
# Access the Web UI
echo "HBase Web UI: http://fgedu01:16010"
# Verify cluster status
echo "status" | hbase shell
starting master, logging to /bigdata/app/hbase/logs/hbase-root-master-fgedu01.log
fgedu02: starting regionserver, logging to /bigdata/app/hbase/logs/hbase-root-regionserver-fgedu02.log
fgedu03: starting regionserver, logging to /bigdata/app/hbase/logs/hbase-root-regionserver-fgedu03.log
# Processes
12345 HMaster
12346 HRegionServer
# Web UI
HBase Web UI: http://fgedu01:16010
# Cluster status
active master: fgedu01,16000,1705540000000
backup master: fgedu02,16000,1705540000000
number of live regionservers: 6
number of dead regionservers: 0
# HBase started successfully
3.2 Table and Data Management
3.2.1 Creating Tables
# Enter the HBase shell
hbase shell
# Create a table
create 'fgedu_user', 'cf', {NAME => 'info', VERSIONS => 3}, {NAME => 'order', VERSIONS => 1}
# Create a pre-split table
create 'fgedu_order', 'cf', {SPLITS => ['10','20','30','40','50','60','70','80','90']}
# List tables
list
# Show the table schema
describe 'fgedu_user'
HBase Shell
Use "help" to get list of supported commands.
hbase(main):001:0>
# Create a table
Created table fgedu_user
Took 2.3000 seconds
# Create a pre-split table
Created table fgedu_order
Took 2.5000 seconds
# Table list
TABLE
fgedu_order
fgedu_user
2 row(s)
Took 0.3000 seconds
# Table schema
Table fgedu_user is ENABLED
fgedu_user
COLUMN FAMILIES DESCRIPTION
{NAME => 'cf', BLOOMFILTER => 'ROW', VERSIONS => '1', …}
{NAME => 'info', BLOOMFILTER => 'ROW', VERSIONS => '3', …}
{NAME => 'order', BLOOMFILTER => 'ROW', VERSIONS => '1', …}
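Typing a long SPLITS list by hand is error-prone, so the pre-split `create` statement used for `fgedu_order` can also be generated. The helper below is a minimal sketch that only prints the statement; `gen_presplit_create` and its argument order are hypothetical, and it assumes evenly spaced numeric split points of equal width.

```shell
# Print an HBase shell `create` statement with evenly spaced split points.
# usage: gen_presplit_create <table> <family> <start> <end> <step>
gen_presplit_create() {
  table=$1; family=$2; start=$3; end=$4; step=$5
  splits=""
  i=$start
  while [ "$i" -le "$end" ]; do
    # Append a comma only when the list already has an entry.
    splits="${splits}${splits:+,}'${i}'"
    i=$((i + step))
  done
  printf "create '%s', '%s', {SPLITS => [%s]}\n" "$table" "$family" "$splits"
}

gen_presplit_create fgedu_order cf 10 90 10
```

The printed line can be pasted into the HBase shell or piped to it. Split points are compared as byte strings, not numbers, so they should all have the same width ('100' would sort before '20').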
3.2.2 Altering Tables
# Disable the table
disable 'fgedu_user'
# Add a column family
alter 'fgedu_user', NAME => 'extra', VERSIONS => 1
# Delete a column family
alter 'fgedu_user', NAME => 'extra', METHOD => 'delete'
# Change column family attributes
alter 'fgedu_user', {NAME => 'info', VERSIONS => 5}
# Enable the table
enable 'fgedu_user'
# Drop a table
disable 'fgedu_test'
drop 'fgedu_test'
Took 1.2000 seconds
# Add a column family
Updating all regions with the new schema…
Took 2.3000 seconds
# Delete a column family
Updating all regions with the new schema…
Took 2.1000 seconds
# Change column family attributes
Updating all regions with the new schema…
Took 2.2000 seconds
# Enable the table
Took 1.5000 seconds
# Drop a table
Took 1.2000 seconds
Took 0.5000 seconds
3.3 Data Read and Write Operations
3.3.1 Writing Data
# Insert single cells
put 'fgedu_user', 'user001', 'cf:name', 'fgedu01'
put 'fgedu_user', 'user001', 'cf:age', '25'
put 'fgedu_user', 'user001', 'cf:gender', 'M'
put 'fgedu_user', 'user001', 'info:email', 'fgedu01@fgedu.net.cn'
# Batch insert
put 'fgedu_user', 'user002', 'cf:name', 'fgedu02'
put 'fgedu_user', 'user002', 'cf:age', '30'
put 'fgedu_user', 'user002', 'cf:gender', 'F'
# Batch write from the command line
echo "put 'fgedu_user', 'user003', 'cf:name', 'fgedu03'" | hbase shell
echo "put 'fgedu_user', 'user003', 'cf:age', '28'" | hbase shell
Took 0.1000 seconds
Took 0.0500 seconds
Took 0.0500 seconds
Took 0.0500 seconds
# Batch insert
Took 0.0500 seconds
Took 0.0500 seconds
Took 0.0500 seconds
# Batch write
Took 0.1000 seconds
Took 0.1000 seconds
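Each `echo … | hbase shell` call in section 3.3.1 starts a new shell session, which is slow for more than a handful of rows. One way around this, sketched below, is to generate all the `put` statements first and send them to a single session. `gen_puts` and the whitespace-separated input format are assumptions made for this example, and values containing single quotes are not handled.

```shell
# Turn "rowkey column value" lines on stdin into HBase shell put statements.
# usage: gen_puts <table> < input.txt
gen_puts() {
  table=$1
  while read -r rowkey column value; do
    if [ -z "$rowkey" ]; then continue; fi   # skip blank lines
    printf "put '%s', '%s', '%s', '%s'\n" "$table" "$rowkey" "$column" "$value"
  done
}

# Hypothetical input; in practice this would come from a file.
printf '%s\n' 'user003 cf:name fgedu03' 'user003 cf:age 28' | gen_puts fgedu_user
```

The generated statements can then be piped into one non-interactive session, for example `gen_puts fgedu_user < rows.txt | hbase shell -n`, where `rows.txt` is a placeholder file name.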
3.3.2 Querying Data
# Get a whole row
get 'fgedu_user', 'user001'
# Get one column family
get 'fgedu_user', 'user001', 'cf'
# Get one column
get 'fgedu_user', 'user001', 'cf:name'
# Get multiple versions
get 'fgedu_user', 'user001', {COLUMN => 'info', VERSIONS => 3}
# Scan the table
scan 'fgedu_user', {LIMIT => 10}
# Range scan (STOPROW is exclusive)
scan 'fgedu_user', {STARTROW => 'user001', STOPROW => 'user003'}
# Filtered scan
scan 'fgedu_user', {FILTER => "ValueFilter(=, 'binary:fgedu01')"}
COLUMN CELL
cf:age timestamp=1705540800000, value=25
cf:gender timestamp=1705540800000, value=M
cf:name timestamp=1705540800000, value=fgedu01
info:email timestamp=1705540800000, value=fgedu01@fgedu.net.cn
1 row(s)
Took 0.1000 seconds
# Get one column family
COLUMN CELL
cf:age timestamp=1705540800000, value=25
cf:gender timestamp=1705540800000, value=M
cf:name timestamp=1705540800000, value=fgedu01
1 row(s)
Took 0.0500 seconds
# Get one column
COLUMN CELL
cf:name timestamp=1705540800000, value=fgedu01
1 row(s)
Took 0.0500 seconds
# Scan the table
ROW COLUMN+CELL
user001 column=cf:age, timestamp=1705540800000, value=25
user001 column=cf:gender, timestamp=1705540800000, value=M
user001 column=cf:name, timestamp=1705540800000, value=fgedu01
user002 column=cf:age, timestamp=1705540800000, value=30
user002 column=cf:gender, timestamp=1705540800000, value=F
user002 column=cf:name, timestamp=1705540800000, value=fgedu02
2 row(s)
Took 0.1000 seconds
3.3.3 Deleting Data
# Delete one column
delete 'fgedu_user', 'user001', 'cf:gender'
# Delete a column family from a row
deleteall 'fgedu_user', 'user001', 'cf'
# Delete a whole row
deleteall 'fgedu_user', 'user003'
# Verify the deletion
get 'fgedu_user', 'user001'
Took 0.0500 seconds
# Delete a column family from a row
Took 0.0500 seconds
# Delete a whole row
Took 0.0500 seconds
# Verify the deletion
COLUMN CELL
info:email timestamp=1705540800000, value=fgedu01@fgedu.net.cn
1 row(s)
Took 0.0500 seconds
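3.4 Cluster Management and Maintenance
3.4.1 Monitoring Cluster Status
# Check cluster status
echo "status" | hbase shell
# Show Region distribution
echo "status 'detailed'" | hbase shell
# Show the table's Region distribution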
echo "list_regions 'fgedu_user'" | hbase shell
# Check the balancer state
echo "balancer_enabled" | hbase shell
active master: fgedu01,16000,1705540000000
backup master: fgedu02,16000,1705540000000
number of live regionservers: 6
number of dead regionservers: 0
# Region distribution
fgedu01:16020
numberOfRegions: 20
fgedu02:16020
numberOfRegions: 18
fgedu03:16020
numberOfRegions: 22
# Table Region distribution
fgedu_user,,1705540000000.fgedu01
fgedu_user,user005,1705540000000.fgedu02
# Balancer state
true
3.4.2 Region Management
# Trigger a major compaction manually
echo "major_compact 'fgedu_user'" | hbase shell
# Split a Region manually
echo "split 'fgedu_user', 'user050'" | hbase shell
# Move a Region manually
echo "move 'region_encoded_name', 'fgedu02,16020,1705540000000'" | hbase shell
# Trigger the balancer manually
echo "balancer" | hbase shell
Took 30.0000 seconds
# Split a Region
Took 5.0000 seconds
# Move a Region
Took 2.0000 seconds
# Balancer
true
Took 10.0000 seconds
Part04-Production Cases and Hands-On Walkthroughs
4.1 Real-Time Query Case
HBase is well suited to real-time query workloads such as low-latency lookups by RowKey.
#!/bin/bash
# hbase_realtime_query.sh
echo "=== HBase Realtime Query ==="
echo "Date: $(date)"
# Query user info
echo "Query user info…"
echo "get 'fgedu_user', 'user001'" | hbase shell
# Query order info
echo "Query order info…"
echo "scan 'fgedu_order', {STARTROW => 'order001', LIMIT => 10}" | hbase shell
# Count rows
echo "Count records…"
echo "count 'fgedu_user'" | hbase shell
echo "=== Query Completed ==="
# Run the script
./hbase_realtime_query.sh
=== HBase Realtime Query ===
Date: Thu Jan 18 23:00:00 CST 2024
Query user info…
COLUMN CELL
cf:age timestamp=1705540800000, value=25
cf:name timestamp=1705540800000, value=fgedu01
1 row(s)
Query order info…
ROW COLUMN+CELL
order001 column=cf:amount, value=100.00
order001 column=cf:user_id, value=user001
order002 column=cf:amount, value=200.00
order002 column=cf:user_id, value=user002
2 row(s)
Count records…
Current count: 1000, row: user999
1000 row(s)
Took 5.0000 seconds
=== Query Completed ===
4.2 Bulk Import Case
Bulk import is an important way to migrate data into HBase.
# Prepare the data file
cat > /tmp/hbase_data.txt << 'EOF'
put 'fgedu_user', 'user101', 'cf:name', 'fgedu101'
put 'fgedu_user', 'user101', 'cf:age', '25'
put 'fgedu_user', 'user102', 'cf:name', 'fgedu102'
put 'fgedu_user', 'user102', 'cf:age', '30'
put 'fgedu_user', 'user103', 'cf:name', 'fgedu103'
put 'fgedu_user', 'user103', 'cf:age', '28'
exit
EOF
# Batch import (the trailing exit in the file ends the shell session afterwards)
hbase shell /tmp/hbase_data.txt
# Import with Bulk Load
hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
-Dimporttsv.columns=HBASE_ROW_KEY,cf:name,cf:age \
-Dimporttsv.bulk.output=/tmp/hbase_output \
fgedu_user \
/bigdata/data/user_data.tsv
# Complete the Bulk Load
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
/tmp/hbase_output \
fgedu_user
# Data file prepared
# Batch import
Took 0.1000 seconds
Took 0.0500 seconds
Took 0.0500 seconds
Took 0.0500 seconds
Took 0.0500 seconds
Took 0.0500 seconds
# Bulk Load import
24/01/18 23:30:00 INFO mapreduce.ImportTsv: Import complete
# Complete the Bulk Load
24/01/18 23:35:00 INFO mapreduce.LoadIncrementalHFiles: Load complete
# Bulk import succeeded
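ImportTsv maps each tab-separated field to one entry of `importtsv.columns`, so a line with the wrong number of fields is a common cause of rejected rows. A small pre-flight check, sketched below, can catch this before the job is submitted. `check_tsv_columns` and the sample file are hypothetical; the expected count of 3 corresponds to `HBASE_ROW_KEY,cf:name,cf:age` in the command above.

```shell
# Check that every line of a tab-separated file has the expected number of
# fields before handing the file to ImportTsv.
# usage: check_tsv_columns <file> <expected_field_count>
check_tsv_columns() {
  awk -F '\t' -v want="$2" '
    NF != want { printf "line %d: %d fields, expected %d\n", NR, NF, want; bad = 1 }
    END { exit bad }
  ' "$1"
}

# Hypothetical sample: the second line is missing the age field.
sample="$(mktemp)"
printf 'user101\tfgedu101\t25\nuser102\tfgedu102\n' > "$sample"
check_tsv_columns "$sample" 3 || echo "fix the input before importing"
```

The function exits with status 0 only when every line matches, so it can gate the import in a script, for example `check_tsv_columns /bigdata/data/user_data.tsv 3 && hbase org.apache.hadoop.hbase.mapreduce.ImportTsv …`.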
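4.3 Performance Tuning Case
4.3.1 RowKey Design Optimization
# Original RowKey design (sequential keys concentrate writes on one Region)
# user001, user002, user003…
# Optimized RowKey design (salted)
# 0_user001, 1_user002, 2_user003…
# Create a pre-split table
echo "create 'fgedu_user_optimized', 'cf', {SPLITS => ['0','1','2','3','4','5','6','7','8','9']}" | hbase shell
# Insert data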
# Generate all puts first and send them to a single shell session
for i in {1..100}; do
salt=$((i % 10))
echo "put 'fgedu_user_optimized', '${salt}_user${i}', 'cf:name', 'fgedu${i}'"
done | hbase shell
# Verify data distribution
echo "status 'detailed'" | hbase shell | grep "numberOfRegions"
Created table fgedu_user_optimized
Took 2.3000 seconds
# Insert data
Took 0.0500 seconds
…
Took 0.0500 seconds
# Data distribution check
fgedu01:16020
numberOfRegions: 15
fgedu02:16020
numberOfRegions: 15
fgedu03:16020
numberOfRegions: 15
# Data is evenly distributed
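The loop in section 4.3.1 derives the salt from the loop counter. For keys that carry no such counter, a common variant derives the salt from a hash of the key itself, so that a reader who knows only the original key can recompute the prefix. The sketch below uses the POSIX `cksum` CRC for this; it is a swapped-in technique rather than the method used above, and `salted_key` and `SALT_BUCKETS` are hypothetical names.

```shell
SALT_BUCKETS=10   # assumption: one bucket per salt prefix 0..9

# Derive a stable salt from the key itself, so the same key always gets
# the same prefix and a reader can recompute it.
salted_key() {
  crc=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  printf '%s_%s\n' "$((crc % SALT_BUCKETS))" "$1"
}

salted_key user001
salted_key user002
```

The trade-off of any salting scheme still applies: a range scan over the original key order has to fan out across all salt prefixes.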
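4.3.2 Parameter Tuning
# Add tuning parameters to hbase-site.xml. The new properties must sit inside
# <configuration>, so remove the closing tag first and re-add it at the end.
sed -i '/<\/configuration>/d' /bigdata/app/hbase/conf/hbase-site.xml
cat >> /bigdata/app/hbase/conf/hbase-site.xml << 'EOF'
<property>
<name>hbase.regionserver.global.memstore.size</name>
<value>0.4</value>
</property>
<property>
<name>hbase.hregion.memstore.flush.size</name>
<value>134217728</value>
</property>
<property>
<name>hbase.regionserver.handler.count</name>
<value>200</value>
</property>
</configuration>
EOF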
# Restart HBase
stop-hbase.sh
start-hbase.sh
# Verify the configuration
echo "status" | hbase shell
# Configuration complete
# Restart HBase
stopping hbase..........
starting master, logging to /bigdata/app/hbase/logs/hbase-root-master-fgedu01.log
# Verify the configuration
active master: fgedu01,16000,1705540800000
number of live regionservers: 6
# Configuration in effect
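The two MemStore settings in section 4.3.2 interact: the global limit is a fraction of the RegionServer heap, and each Region flushes once its MemStore reaches the flush size. A rough back-of-the-envelope estimate of how many Regions can hold a full MemStore at the same time is heap × fraction ÷ flush size. The sketch below does that arithmetic; the 32 GB heap is an assumption for illustration (the planning section recommends 32 GB of memory, not a specific heap size), and `max_full_memstores` is a hypothetical helper.

```shell
# Rough estimate of how many Regions can hold a full MemStore at once
# before the global MemStore limit forces flushes.
# usage: max_full_memstores <heap_mb> <global_fraction_percent> <flush_size_mb>
max_full_memstores() {
  echo $(( $1 * $2 / 100 / $3 ))
}

# Assumed 32768 MB heap; 40 = global.memstore.size of 0.4;
# 128 = memstore.flush.size of 134217728 bytes.
max_full_memstores 32768 40 128   # prints 102
```

If a RegionServer takes writes on many more Regions than this estimate, flushes tend to be triggered by the global limit and produce smaller HFiles, which is one reason to keep an eye on the Region count per server.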
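Part05-Fengge's Lessons Learned
5.1 HBase Best Practices
In real production environments, keep the following points in mind when using HBase:
1. Design RowKeys to avoid hotspots
2. Keep the number of column families small
3. Use pre-split tables
4. Run Major Compaction regularly, ideally during off-peak hours
5. Monitor cluster status
5.2 Performance Optimization Summary
5.2.1 Optimization Recommendations
– Design RowKeys to avoid hotspots
– Use a compression algorithm
– Size the MemStore appropriately
– Tune the Handler count
– Monitor GC behavior
5.2.2 HBase Maintenance Script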
#!/bin/bash
# hbase_maintenance.sh
echo "=== HBase Maintenance ==="
echo "Date: $(date)"
# 1. Check cluster status
echo "=== Check Cluster Status ==="
echo "status" | hbase shell
# 2. Check table status
echo "=== Check Table Status ==="
echo "list" | hbase shell
# 3. Run Major Compaction
echo "=== Major Compaction ==="
echo "major_compact 'fgedu_user'" | hbase shell
# 4. Check Region distribution
echo "=== Check Region Distribution ==="
echo "status 'detailed'" | hbase shell | grep "numberOfRegions"
echo "=== Maintenance Completed ==="
# Run the script
./hbase_maintenance.sh
=== HBase Maintenance ===
Date: Fri Jan 19 00:00:00 CST 2024
=== Check Cluster Status ===
active master: fgedu01,16000,1705540800000
number of live regionservers: 6
=== Check Table Status ===
TABLE
fgedu_order
fgedu_user
fgedu_user_optimized
=== Major Compaction ===
Took 60.0000 seconds
=== Check Region Distribution ===
numberOfRegions: 20
numberOfRegions: 18
numberOfRegions: 22
=== Maintenance Completed ===
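This article was compiled and published by Fengge Tutorials (风哥教程) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html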
