Elasticsearch Tutorial FG012-Elasticsearch Aggregation and Statistical Analysis in Practice
Part01-Basic Concepts and Theory
1.1 Aggregation Concepts
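Aggregations are the Elasticsearch feature for computing statistics over data, comparable to GROUP BY, SUM, and AVG in SQL. An aggregation runs over the documents matched by the search request and can compute sums, averages, maximums, minimums, grouped counts, and similar summaries.
Characteristics of aggregations:
- Many aggregation types are available
- Aggregations can be nested inside one another
- Combining and nesting them supports complex analysis
- They are efficient: each shard computes a partial result, and the coordinating node merges the partial results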
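To make the SQL analogy concrete, the sketch below computes in plain Python what a terms bucket with an avg sub-aggregation returns, roughly SELECT category, COUNT(*), AVG(price) ... GROUP BY category. The sample documents are made up for illustration, and no cluster is involved.

```python
from collections import defaultdict

# Hypothetical sample documents; field names mirror the examples in this tutorial.
docs = [
    {"category": "books", "price": 80.0},
    {"category": "books", "price": 120.0},
    {"category": "electronics", "price": 500.0},
]


def group_avg(documents, group_field, value_field):
    """Return {group: {"doc_count": n, "avg": mean}}, like terms + avg."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc[group_field]].append(doc[value_field])
    return {
        key: {"doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in groups.items()
    }


result = group_avg(docs, "category", "price")
print(result)
```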
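1.2 Aggregation Types
Elasticsearch supports several families of aggregations:
- Metric aggregations: compute numeric values from a field, such as sum, avg, max, min, and value_count
- Bucket aggregations: group documents into buckets, such as terms, range, and date_histogram
- Pipeline aggregations: compute over the output of other aggregations rather than over documents
- Matrix aggregations: operate on several fields at once and report how they relate, such as matrix_stats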
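As a rough illustration of the four families, the Python sketch below builds one minimal request body for each. The field names (price, stock, category) and the aggregation names are assumptions borrowed from the later examples; the bodies are only constructed and printed, not sent to a cluster.

```python
import json

# One minimal aggregation per family (names and fields are illustrative).
metric_agg = {"avg_price": {"avg": {"field": "price"}}}
bucket_agg = {"by_category": {"terms": {"field": "category"}}}
matrix_agg = {"price_vs_stock": {"matrix_stats": {"fields": ["price", "stock"]}}}
# A pipeline aggregation reads another aggregation's output via buckets_path.
pipeline_agg = {
    "by_category": {
        "terms": {"field": "category"},
        "aggs": {"avg_price": {"avg": {"field": "price"}}},
    },
    "max_avg_price": {"max_bucket": {"buckets_path": "by_category>avg_price"}},
}


def search_body(aggs):
    """Wrap aggregations in a search body that returns no hits."""
    return {"size": 0, "aggs": aggs}


print(json.dumps(search_body(pipeline_agg), indent=2))
```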
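1.3 Aggregation Structure
Every aggregation has the same basic structure:
- Aggregation name: a user-defined identifier, which also becomes the key of the result in the response
- Aggregation type: which aggregation to run
- Aggregation parameters: settings specific to that type
- Sub-aggregations: other aggregations nested inside this one (only bucket aggregations accept them)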
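The four parts listed above map directly onto the JSON of a request. A minimal Python sketch, using a hypothetical helper named agg (not part of any Elasticsearch client), shows where each part ends up:

```python
def agg(agg_type, params=None, sub_aggs=None):
    """Build the body of one aggregation: {type: params, "aggs": {...}}."""
    body = {agg_type: dict(params or {})}
    if sub_aggs:
        body["aggs"] = sub_aggs
    return body


request = {
    "size": 0,
    "aggs": {
        # name -> type + parameters + nested sub-aggregation
        "category_stats": agg(
            "terms",
            {"field": "category"},
            sub_aggs={"avg_price": agg("avg", {"field": "price"})},
        )
    },
}
print(request)
```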
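Part02-Production Planning and Recommendations
2.1 Aggregation Performance Optimization
Ways to make aggregations faster:
- Set the size parameter sensibly so that a request does not return more buckets than needed
- Use approximate aggregations (such as cardinality and percentiles) to reduce computation cost
- Avoid deeply nested aggregations
- Narrow the document set with a filter query before aggregating
- Optimize the index structure and mapping, for example by aggregating on keyword fields rather than text fields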
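Several of these points can be combined in one request. The Python sketch below builds such a body: a filter narrows the documents, the terms aggregation is capped with size, and a cardinality aggregation gives an approximate distinct count. The brand field and the threshold value are assumptions for illustration only.

```python
def optimized_body(category, top_n=10):
    """Search body that filters first, caps buckets, and counts approximately."""
    return {
        "size": 0,  # no hits, aggregations only
        "query": {"bool": {"filter": [{"term": {"category": category}}]}},
        "aggs": {
            # "brand" is a hypothetical keyword field
            "top_brands": {"terms": {"field": "brand", "size": top_n}},
            "distinct_brands": {
                "cardinality": {"field": "brand", "precision_threshold": 1000}
            },
        },
    }


body = optimized_body("技术书籍", top_n=5)
print(body)
```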
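2.2 Aggregation Caching Strategy
Caching strategy for aggregations:
- Use the shard request cache to cache aggregation results (by default it only caches requests with size set to 0)
- Size the cache appropriately (indices.requests.cache.size, 1% of the heap by default)
- Monitor the cache hit rate
- Avoid cache bloat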
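Monitoring the hit rate can be sketched as follows, assuming the statistics come from the index stats API (for example GET /fgedu-products/_stats/request_cache) and have its usual hit_count and miss_count counters. The numbers in the sample are invented.

```python
def request_cache_hit_rate(stats):
    """Hit rate of the shard request cache from an index stats response."""
    cache = stats["_all"]["total"]["request_cache"]
    lookups = cache["hit_count"] + cache["miss_count"]
    return cache["hit_count"] / lookups if lookups else 0.0


# Invented sample, shaped like the relevant part of the stats response.
sample_stats = {
    "_all": {
        "total": {
            "request_cache": {
                "memory_size_in_bytes": 2048,
                "evictions": 0,
                "hit_count": 75,
                "miss_count": 25,
            }
        }
    }
}
hit_rate = request_cache_hit_rate(sample_stats)
print(hit_rate)
```

A steadily rising evictions counter alongside a low hit rate would suggest that the cache is too small or that the requests are rarely identical.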
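2.3 Production Best Practices
When running aggregations in production:
- Limit aggregation complexity
- Monitor aggregation performance
- Set a reasonable timeout
- Use the composite aggregation to page through large numbers of buckets; scroll pages through hits, not aggregation results
- Consider a dedicated analytics tool for heavy workloads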
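The timeout and paging points can be sketched together. The Python helper below builds a composite aggregation request with a search timeout and carries the after_key of one response into the next request. The aggregation name by_category, the page size, and the 5s timeout are assumptions, not recommended values.

```python
def composite_page(after_key=None, page_size=1000, timeout="5s"):
    """Build one page of a composite aggregation over the category field."""
    composite = {
        "size": page_size,
        "sources": [{"category": {"terms": {"field": "category"}}}],
    }
    if after_key is not None:
        composite["after"] = after_key  # resume after the previous page
    return {
        "size": 0,
        "timeout": timeout,
        "aggs": {"by_category": {"composite": composite}},
    }


def next_after_key(response):
    """Return the key to pass to the next page, or None on the last page."""
    return response["aggregations"]["by_category"].get("after_key")


first_page = composite_page()
print(first_page)
```

A caller would send first_page, read next_after_key from the response, and repeat with composite_page(after_key=...) until no key is returned.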
Part03-Production Implementation
3.1 Metric Aggregations in Practice
Using metric aggregations:
# value_count counts the documents that have a value in the given field
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "total_products": { "value_count": { "field": "price" } },
    "avg_price": { "avg": { "field": "price" } },
    "max_price": { "max": { "field": "price" } },
    "min_price": { "min": { "field": "price" } },
    "sum_stock": { "sum": { "field": "stock" } }
  }
}'
# Run
# Output
{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "total_products": { "value": 3 },
    "avg_price": { "value": 99.9 },
    "max_price": { "value": 119.9 },
    "min_price": { "value": 79.9 },
    "sum_stock": { "value": 215 }
  }
}
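Each single-value metric comes back under its aggregation name with a value key. A small Python sketch for reading such a response, with the aggregations part copied from the output above:

```python
# Aggregations part of the response shown above.
response = {
    "aggregations": {
        "total_products": {"value": 3},
        "avg_price": {"value": 99.9},
        "max_price": {"value": 119.9},
        "min_price": {"value": 79.9},
        "sum_stock": {"value": 215},
    }
}


def metric_values(resp):
    """Map each single-value metric aggregation name to its value."""
    return {
        name: result["value"]
        for name, result in resp["aggregations"].items()
        if "value" in result
    }


summary = metric_values(response)
print(summary)
```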
3.2 Bucket Aggregations in Practice
Using bucket aggregations:
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "category_stats": {
      "terms": { "field": "category" },
      "aggs": {
        "product_count": { "value_count": { "field": "price" } },
        "avg_price": { "avg": { "field": "price" } },
        "sum_stock": { "sum": { "field": "stock" } }
      }
    }
  }
}'
# Run
# Output
{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "category_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "技术书籍",
          "doc_count": 3,
          "product_count": { "value": 3 },
          "avg_price": { "value": 99.9 },
          "sum_stock": { "value": 215 }
        }
      ]
    }
  }
}
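A terms result is a list of buckets, each carrying its key, its doc_count, and one entry per sub-aggregation. The Python sketch below flattens such a result into one row per bucket; the input is the category_stats part of the output above, and bucket_rows is a hypothetical helper name.

```python
# category_stats part of the response shown above.
terms_result = {
    "buckets": [
        {
            "key": "技术书籍",
            "doc_count": 3,
            "product_count": {"value": 3},
            "avg_price": {"value": 99.9},
            "sum_stock": {"value": 215},
        }
    ]
}


def bucket_rows(result):
    """Flatten buckets into rows, keeping single-value sub-aggregations."""
    rows = []
    for bucket in result["buckets"]:
        row = {"key": bucket["key"], "doc_count": bucket["doc_count"]}
        for name, value in bucket.items():
            if isinstance(value, dict) and "value" in value:
                row[name] = value["value"]
        rows.append(row)
    return rows


rows = bucket_rows(terms_result)
print(rows)
```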
# Range bucket aggregation
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "to": 80 },
          { "from": 80, "to": 100 },
          { "from": 100 }
        ]
      },
      "aggs": {
        "product_count": { "value_count": { "field": "price" } }
      }
    }
  }
}'
# Run
# Output
{
  "took": 15,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "price_ranges": {
      "buckets": [
        {
          "key": "*-80.0",
          "to": 80.0,
          "doc_count": 1,
          "product_count": { "value": 1 }
        },
        {
          "key": "80.0-100.0",
          "from": 80.0,
          "to": 100.0,
          "doc_count": 1,
          "product_count": { "value": 1 }
        },
        {
          "key": "100.0-*",
          "from": 100.0,
          "doc_count": 1,
          "product_count": { "value": 1 }
        }
      ]
    }
  }
}
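In a range aggregation each bucket includes its from value and excludes its to value, so a price of exactly 80 lands in the 80.0-100.0 bucket. The plain-Python sketch below mimics that rule for the three ranges used above; it is a local illustration, not how Elasticsearch implements it.

```python
RANGES = [{"to": 80}, {"from": 80, "to": 100}, {"from": 100}]


def bucket_key(r):
    """Format a key the way the response above does, e.g. '80.0-100.0'."""
    low = str(float(r["from"])) if "from" in r else "*"
    high = str(float(r["to"])) if "to" in r else "*"
    return f"{low}-{high}"


def matching_ranges(value, ranges):
    """Keys of the ranges containing value: from inclusive, to exclusive."""
    return [
        bucket_key(r)
        for r in ranges
        if r.get("from", float("-inf")) <= value < r.get("to", float("inf"))
    ]


for price in (79.9, 80, 99.9, 100, 119.9):
    print(price, matching_ranges(price, RANGES))
```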
3.3 Pipeline Aggregations in Practice
Using pipeline aggregations:
# avg_bucket averages the avg_price values of the category_stats buckets
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "category_stats": {
      "terms": { "field": "category" },
      "aggs": {
        "avg_price": { "avg": { "field": "price" } }
      }
    },
    "avg_price_overall": {
      "avg_bucket": { "buckets_path": "category_stats>avg_price" }
    }
  }
}'
# Run
# Output
{
  "took": 25,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "category_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "技术书籍",
          "doc_count": 3,
          "avg_price": { "value": 99.9 }
        }
      ]
    },
    "avg_price_overall": { "value": 99.9 }
  }
}
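With a single bucket, avg_price_overall equals the bucket's own average, which hides an important detail: avg_bucket averages the bucket values without weighting them by doc_count. The plain-Python sketch below uses two invented buckets to show how that differs from the average over all documents.

```python
# Invented buckets: category A has 3 documents, category B has 1.
buckets = [
    {"key": "A", "doc_count": 3, "avg_price": 100.0},
    {"key": "B", "doc_count": 1, "avg_price": 500.0},
]


def avg_of_buckets(items, metric):
    """What avg_bucket computes: the unweighted mean of the bucket values."""
    return sum(b[metric] for b in items) / len(items)


def weighted_avg(items, metric):
    """Mean over all documents: bucket values weighted by doc_count."""
    total_docs = sum(b["doc_count"] for b in items)
    return sum(b[metric] * b["doc_count"] for b in items) / total_docs


print(avg_of_buckets(buckets, "avg_price"))  # mean of the two bucket averages
print(weighted_avg(buckets, "avg_price"))    # mean price over all four documents
```

When the per-document average is what is needed, a plain avg aggregation at the top level is the simpler choice.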
Part04-Production Cases and Walkthroughs
4.1 E-commerce Sales Statistics
An e-commerce sales statistics scenario:
# calendar_interval replaces the interval parameter, which is deprecated since Elasticsearch 7.2 and removed in 8.0
curl -X GET "http://192.168.1.10:9200/fgedu-orders/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "range": {
      "order_date": { "gte": "2024-01-01", "lte": "2024-01-31" }
    }
  },
  "aggs": {
    "daily_sales": {
      "date_histogram": {
        "field": "order_date",
        "calendar_interval": "day",
        "format": "yyyy-MM-dd"
      },
      "aggs": {
        "total_amount": { "sum": { "field": "amount" } },
        "order_count": { "value_count": { "field": "amount" } }
      }
    },
    "category_stats": {
      "terms": { "field": "category" },
      "aggs": {
        "total_amount": { "sum": { "field": "amount" } },
        "order_count": { "value_count": { "field": "amount" } }
      }
    },
    "total_stats": {
      "stats": { "field": "amount" }
    }
  }
}'
# Run
# Output
{
  "took": 30,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 1000, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "daily_sales": {
      "buckets": [
        {
          "key_as_string": "2024-01-01",
          "key": 1704067200000,
          "doc_count": 50,
          "total_amount": { "value": 50000 },
          "order_count": { "value": 50 }
        },
        {
          "key_as_string": "2024-01-02",
          "key": 1704153600000,
          "doc_count": 60,
          "total_amount": { "value": 60000 },
          "order_count": { "value": 60 }
        }
      ]
    },
    "category_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "技术书籍",
          "doc_count": 300,
          "total_amount": { "value": 300000 },
          "order_count": { "value": 300 }
        },
        {
          "key": "电子产品",
          "doc_count": 200,
          "total_amount": { "value": 200000 },
          "order_count": { "value": 200 }
        }
      ]
    },
    "total_stats": {
      "count": 1000,
      "min": 100,
      "max": 10000,
      "avg": 1000,
      "sum": 1000000
    }
  }
}
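Derived figures such as each category's share of revenue are often computed on the client from a response like this one. A minimal Python sketch, using only the category_stats and total_stats values shown in the output above (revenue_share is a hypothetical helper name):

```python
# Relevant parts of the response shown above.
aggregations = {
    "category_stats": {
        "buckets": [
            {"key": "技术书籍", "doc_count": 300, "total_amount": {"value": 300000}},
            {"key": "电子产品", "doc_count": 200, "total_amount": {"value": 200000}},
        ]
    },
    "total_stats": {"count": 1000, "min": 100, "max": 10000, "avg": 1000, "sum": 1000000},
}


def revenue_share(aggs):
    """Fraction of the total amount contributed by each category bucket."""
    total = aggs["total_stats"]["sum"]
    return {
        bucket["key"]: bucket["total_amount"]["value"] / total
        for bucket in aggs["category_stats"]["buckets"]
    }


shares = revenue_share(aggregations)
print(shares)
```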
4.2 Log Analysis Statistics
A log analysis scenario:
# calendar_interval replaces the interval parameter, which is deprecated since Elasticsearch 7.2 and removed in 8.0
curl -X GET "http://192.168.1.10:9200/fgedu-logs-2024.01/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "range": {
      "timestamp": {
        "gte": "2024-01-01T00:00:00",
        "lte": "2024-01-01T23:59:59"
      }
    }
  },
  "aggs": {
    "hourly_stats": {
      "date_histogram": {
        "field": "timestamp",
        "calendar_interval": "hour",
        "format": "HH:mm"
      },
      "aggs": {
        "error_count": {
          "filter": { "term": { "level": "ERROR" } }
        },
        "warn_count": {
          "filter": { "term": { "level": "WARN" } }
        }
      }
    },
    "level_stats": {
      "terms": { "field": "level" }
    },
    "service_stats": {
      "terms": { "field": "service" },
      "aggs": {
        "error_count": {
          "filter": { "term": { "level": "ERROR" } }
        }
      }
    }
  }
}'
# Run
# Output
{
  "took": 25,
  "timed_out": false,
  "_shards": { "total": 3, "successful": 3, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 10000, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "hourly_stats": {
      "buckets": [
        {
          "key_as_string": "00:00",
          "key": 1704067200000,
          "doc_count": 400,
          "error_count": { "doc_count": 10 },
          "warn_count": { "doc_count": 20 }
        },
        {
          "key_as_string": "01:00",
          "key": 1704070800000,
          "doc_count": 350,
          "error_count": { "doc_count": 8 },
          "warn_count": { "doc_count": 15 }
        }
      ]
    },
    "level_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "INFO", "doc_count": 9000 },
        { "key": "ERROR", "doc_count": 500 },
        { "key": "WARN", "doc_count": 500 }
      ]
    },
    "service_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "fgedu-api",
          "doc_count": 5000,
          "error_count": { "doc_count": 300 }
        },
        {
          "key": "fgedu-web",
          "doc_count": 3000,
          "error_count": { "doc_count": 150 }
        },
        {
          "key": "fgedu-mobile",
          "doc_count": 2000,
          "error_count": { "doc_count": 50 }
        }
      ]
    }
  }
}
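Raw error counts favor busy services, so a per-service error rate is usually the more useful figure. The Python sketch below derives it from the service_stats buckets in the output above by dividing the filter sub-aggregation's doc_count by the bucket's doc_count.

```python
# service_stats buckets from the response shown above.
service_buckets = [
    {"key": "fgedu-api", "doc_count": 5000, "error_count": {"doc_count": 300}},
    {"key": "fgedu-web", "doc_count": 3000, "error_count": {"doc_count": 150}},
    {"key": "fgedu-mobile", "doc_count": 2000, "error_count": {"doc_count": 50}},
]


def error_rates(buckets):
    """Return (service, error rate) pairs, highest rate first."""
    rates = [
        (b["key"], b["error_count"]["doc_count"] / b["doc_count"])
        for b in buckets
        if b["doc_count"]  # skip empty buckets to avoid dividing by zero
    ]
    return sorted(rates, key=lambda item: item[1], reverse=True)


rates = error_rates(service_buckets)
for service, rate in rates:
    print(f"{service}: {rate:.1%}")
```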
4.3 Performance Tuning in Practice
Aggregation performance tuning:
# Performance tuning: filter the documents before aggregating
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "query": {
    "bool": {
      "filter": [
        { "term": { "category": "技术书籍" } }
      ]
    }
  },
  "aggs": {
    "price_stats": {
      "stats": { "field": "price" }
    }
  }
}'
# Run
# Output
{
  "took": 5,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "price_stats": {
      "count": 3,
      "min": 79.9,
      "max": 119.9,
      "avg": 99.9,
      "sum": 299.7
    }
  }
}
# Performance tuning: limit the number of buckets
curl -X GET "http://192.168.1.10:9200/fgedu-products/_search" -H "Content-Type: application/json" -d '{
  "size": 0,
  "aggs": {
    "category_stats": {
      "terms": {
        "field": "category",
        "size": 10
      }
    }
  }
}'
# Run
# Output
{
  "took": 3,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
  "hits": {
    "total": { "value": 3, "relation": "eq" },
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "category_stats": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        { "key": "技术书籍", "doc_count": 3 }
      ]
    }
  }
}
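Tip from Fengge (风哥): narrowing the data with a filter query before aggregating can significantly improve performance, because filter clauses skip scoring and their results can be cached.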
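Limiting size trades completeness for speed, and the terms response reports how much was left out. The Python sketch below reads the two fields involved: sum_other_doc_count (documents in buckets that were not returned) and doc_count_error_upper_bound (the worst-case error of the returned counts). The helper name and the second sample are invented for illustration.

```python
def terms_quality(terms_result):
    """Summarize whether a terms result is complete and its counts exact."""
    return {
        "complete": terms_result["sum_other_doc_count"] == 0,
        "counts_exact": terms_result["doc_count_error_upper_bound"] == 0,
    }


# category_stats from the output above: nothing was cut off.
shown = {
    "doc_count_error_upper_bound": 0,
    "sum_other_doc_count": 0,
    "buckets": [{"key": "技术书籍", "doc_count": 3}],
}
# Invented example of a truncated result.
truncated = {
    "doc_count_error_upper_bound": 5,
    "sum_other_doc_count": 42,
    "buckets": [{"key": "技术书籍", "doc_count": 3}],
}
print(terms_quality(shown))
print(terms_quality(truncated))
```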
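Part05-Fengge's (风哥) Experience and Takeaways
5.1 Aggregation Best Practices
- Narrow the data with a filter query before aggregating
- Choose bucket sizes and bucket counts deliberately
- Avoid deeply nested aggregations
- Use approximate aggregations for large data sets
- Monitor aggregation performance metrics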
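5.2 Common Problems and Solutions
- Slow aggregations: check the data volume and the aggregation complexity
- Out-of-memory errors: limit the scope of the aggregation and the size of the result set
- Inaccurate results: check the field types and the data quality
- Low cache hit rate: optimize the query structure and the caching strategy
- Aggregation timeouts: set a reasonable timeout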
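A frequent field-type problem is aggregating on an analyzed text field, which by default cannot be used in a terms aggregation; the usual fix is to target a keyword field or sub-field instead. The Python sketch below checks the properties section of an index mapping under that assumption. The mapping is invented, the helper name is hypothetical, and the check ignores fields whose doc values have been disabled.

```python
def aggregatable_field(properties, field):
    """Return a field name suitable for terms aggregation, or None."""
    mapping = properties[field]
    if mapping.get("type") != "text":
        return field  # keyword, numeric, and date fields aggregate directly
    # For text fields, look for a keyword sub-field such as "title.keyword".
    for sub_name, sub_mapping in mapping.get("fields", {}).items():
        if sub_mapping.get("type") == "keyword":
            return f"{field}.{sub_name}"
    return None


# Invented mapping, shaped like the "properties" part of GET <index>/_mapping.
properties = {
    "category": {"type": "keyword"},
    "title": {"type": "text", "fields": {"keyword": {"type": "keyword"}}},
    "description": {"type": "text"},
}
for name in properties:
    print(name, "->", aggregatable_field(properties, name))
```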
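5.3 Production Tuning Recommendations
- Use filter queries to improve performance
- Size the cache appropriately
- Monitor aggregation performance metrics
- Use the composite aggregation to page through large numbers of buckets; scroll pages through hits, not aggregation results
- Consider a dedicated analytics tool for heavy workloads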
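This article was compiled and published by 风哥教程 (Fengge Tutorials) for learning and testing purposes only. Please credit the source when reposting: http://www.fgedu.net.cn/10327.html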
