KubeSphere Tutorial FG036: Application Performance Monitoring and Distributed Tracing in KubeSphere, Hands-On
This tutorial walks through application performance monitoring (APM) and distributed tracing in KubeSphere: core concepts, production planning, a concrete implementation plan, and hands-on examples. It draws on the official KubeSphere container platform user guide, the KubeSphere observability documentation, and the Jaeger tracing documentation.
Outline
Part01: Concepts and Background
1.1 Core Concepts of Application Performance Monitoring
Application performance monitoring (APM) tracks how an application performs at runtime. The key dimensions are:
- Response time: how long each request takes to complete
- Throughput: the number of requests handled per unit of time
- Error rate: the fraction of requests that fail
- Resource usage: CPU, memory, disk, and network consumption
- User experience: end-user-facing indicators such as page load time
1.2 Core Concepts of Distributed Tracing
Distributed tracing follows a request across the services of a distributed system. The key terms are:
- Trace: the complete call chain of one request
- Span: a single step within a call chain
- Trace ID: uniquely identifies a trace
- Span ID: uniquely identifies a span
- Parent Span ID: identifies a span's parent span
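To make the Trace/Span relationship concrete, here is a minimal, hypothetical span model in Python (an illustration of the concepts only, not a real tracing SDK):

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    """One step in a call chain; a root span has no parent_span_id."""
    operation: str
    trace_id: str
    span_id: str = field(default_factory=lambda: uuid.uuid4().hex[:16])
    parent_span_id: Optional[str] = None

    def child(self, operation: str) -> "Span":
        # A child span shares its parent's trace_id and records the
        # parent's span_id as its parent_span_id.
        return Span(operation, self.trace_id, parent_span_id=self.span_id)

# One trace = all spans that share the same trace_id.
root = Span("GET /checkout", trace_id=uuid.uuid4().hex)
db = root.child("SELECT orders")
assert db.trace_id == root.trace_id and db.parent_span_id == root.span_id
```

Tracing backends such as Jaeger reassemble the tree from exactly these three IDs on each reported span.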
1.3 Core Concepts of Monitoring Metrics
Metrics are numeric measurements of system state. The common metric types and concepts (as used by Prometheus) are:
- Counter: monotonically increasing, never decreases
- Gauge: a value that can go up or down
- Histogram: records the distribution of observed values in buckets
- Summary: records streaming statistics (count, sum, quantiles) over observed values
- Label: a key/value dimension attached to a metric
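The semantics of these types can be sketched in a few lines of plain Python (a toy illustration of the semantics, not the prometheus_client library):

```python
class Counter:
    """Monotonically increasing; resets only on process restart."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters only go up")
        self.value += amount

class Gauge:
    """Can go up or down, e.g. number of in-flight requests."""
    def __init__(self):
        self.value = 0.0
    def inc(self, amount=1.0):
        self.value += amount
    def dec(self, amount=1.0):
        self.value -= amount

class Histogram:
    """Counts observations into cumulative buckets (like Prometheus 'le')."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, float("inf"))):
        self.buckets = buckets
        self.counts = [0] * len(buckets)
        self.total = 0.0
    def observe(self, v):
        self.total += v
        for i, le in enumerate(self.buckets):
            if v <= le:
                self.counts[i] += 1  # cumulative: increments every bucket >= v

requests = Counter(); requests.inc()
latency = Histogram(); latency.observe(0.3)
assert latency.counts == [0, 1, 1, 1]  # 0.3 lands in every bucket except le=0.1
```

The cumulative-bucket layout is what lets PromQL compute quantiles later with histogram_quantile.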
Part02: Production Planning and Recommendations
2.1 Monitoring Architecture Planning
When rolling out APM and tracing, plan the monitoring architecture first:
- Component selection: choose monitoring components that fit your stack
- Data collection: decide how metrics are scraped or pushed
- Data storage: decide where metrics are stored and how long they are retained
- Data visualization: decide how metrics are dashboarded
- Alerting: decide how alerts are evaluated and delivered
2.2 Tracing Planning
Tracing needs the same up-front planning:
- Component selection: choose a tracing backend
- Data collection: decide how applications are instrumented and how spans are exported
- Data storage: decide on the span storage backend and retention
- Data visualization: decide how traces are browsed and searched
- Data analysis: decide how trace data is analyzed (latency breakdowns, dependency graphs)
2.3 Alerting Planning
Alerting planning is an equally important part of the rollout:
- Rule design: design alert rules that map to real failure modes
- Severity levels: assign sensible severity levels
- Notification channels: choose appropriate notification channels
- Handling workflow: define who responds to an alert and how
- Ongoing tuning: review and refine alert rules regularly
Part03: Production Implementation Plan
3.1 Configuring Application Performance Monitoring
Steps to set up APM:
- Deploy the monitoring stack: Prometheus, Grafana, and related components
- Configure collection: define scrape targets (e.g. ServiceMonitors)
- Configure storage: set retention and persistent volumes
- Configure dashboards: build Grafana dashboards for the collected metrics
- Configure alerting: define alert rules on the collected metrics
3.2 Configuring Tracing
Steps to set up tracing:
- Deploy a tracing backend: Jaeger, Zipkin, or similar
- Configure collection: instrument applications to emit spans
- Configure storage: choose a span store (memory, Elasticsearch, Cassandra)
- Configure the UI: expose the trace query interface
- Configure analysis: set up trace-based latency and dependency analysis
3.3 Configuring Alerting
Steps to set up alerting:
- Create alert rules
- Configure notification channels
- Configure alert routing
- Test that alerts actually fire and are delivered
- Tune the rules based on what you learn
Part04: Hands-On Walkthroughs
4.1 APM Walkthrough
Let's walk through application performance monitoring. First, install the Prometheus Operator:
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
namespace/monitoring created
serviceaccount/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  # note: a ServiceAccount named "prometheus" (with RBAC permissions to
  # scrape targets) must exist in the monitoring namespace; the operator
  # bundle above only creates the prometheus-operator ServiceAccount
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  resources:
    requests:
      memory: 400Mi
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF
prometheus.monitoring.coreos.com/prometheus created
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: "admin"
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
      volumes:
      - name: grafana-storage
        emptyDir: {}
EOF
service/grafana created
deployment.apps/grafana created
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    # the nginx Service created below lives in the default namespace;
    # without this, the ServiceMonitor only matches its own namespace
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
EOF
servicemonitor.monitoring.coreos.com/nginx-monitor created
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: metrics
    port: 9113
    targetPort: 9113
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:latest
        ports:
        - containerPort: 9113
        args:
        # requires the stub_status endpoint to be enabled at /nginx_status
        # in the nginx configuration; the stock nginx image does not expose it
        - -nginx.scrape-uri=http://localhost:80/nginx_status
EOF
service/nginx created
deployment.apps/nginx created
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
curl http://localhost:9090/api/v1/targets
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveryLabels": {},
        "labels": {
          "app": "nginx",
          "endpoint": "metrics",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor",
          "namespace": "default",
          "pod": "nginx-6b8d9b8c7f-abcde"
        },
        "scrapePool": "nginx-monitor",
        "scrapeUrl": "http://10.244.1.20:9113/metrics",
        "globalUrl": "http://10.244.1.20:9113/metrics",
        "lastError": "",
        "lastScrape": "2024-01-01T00:00:00.000Z",
        "lastScrapeDuration": 0.015,
        "health": "up"
      }
    ]
  }
}
curl http://localhost:9090/api/v1/query?query=nginx_up
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "app": "nginx",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor"
        },
        "value": [1704067200, "1"]
      }
    ]
  }
}
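The instant-query responses shown above can also be consumed programmatically. A small sketch, run here against the sample payload from the output above (in a real cluster you would GET http://localhost:9090/api/v1/query?query=nginx_up instead):

```python
import json

def parse_instant_vector(payload: str) -> dict:
    """Turn a Prometheus instant-query response into {instance: value}."""
    doc = json.loads(payload)
    if doc["status"] != "success" or doc["data"]["resultType"] != "vector":
        raise ValueError("unexpected response shape")
    out = {}
    for item in doc["data"]["result"]:
        # "value" is a pair: [unix_timestamp, number_encoded_as_string]
        _ts, raw = item["value"]
        out[item["metric"]["instance"]] = float(raw)
    return out

sample_payload = """{"status":"success","data":{"resultType":"vector","result":[
  {"metric":{"app":"nginx","instance":"10.244.1.20:9113","job":"nginx-monitor"},
   "value":[1704067200,"1"]}]}}"""
print(parse_instant_vector(sample_payload))  # {'10.244.1.20:9113': 1.0}
```

Note that Prometheus encodes sample values as strings, so the float() conversion is required.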
4.2 Tracing Walkthrough
Let's walk through distributed tracing. The Jaeger custom resource below assumes the Jaeger Operator is already installed in the cluster:
kubectl create namespace observability
namespace/observability created
cat <<EOF | kubectl apply -f -
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      query:
        base-path: /jaeger
  storage:
    type: memory
EOF
jaeger.jaegertracing.io/simplest created
kubectl get jaeger -n observability
NAME STATUS VERSION STRATEGY STORAGE AGE
simplest Running 1.50 allInOne memory 10s
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: jaeger-query
  namespace: observability
spec:
  selector:
    app: jaeger
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: simplest
  ports:
  - port: 16686
    targetPort: 16686
    name: query
  - port: 16685
    targetPort: 16685
    name: grpc
  type: LoadBalancer
EOF
service/jaeger-query created
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  JAEGER_AGENT_HOST: simplest-collector.observability.svc
  JAEGER_AGENT_PORT: "6831"
  JAEGER_SAMPLER_TYPE: const
  JAEGER_SAMPLER_PARAM: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: jaegertracing/vertx-create-span:operator-e2e-tests
        envFrom:
        - configMapRef:
            name: app-config
        ports:
        - containerPort: 8080
EOF
configmap/app-config created
deployment.apps/demo-app created
kubectl get pods -l app=demo-app
NAME READY STATUS RESTARTS AGE
demo-app-7b8d9b8c7f-abcde 1/1 Running 0 10s
demo-app-7b8d9b8c7f-fghij 1/1 Running 0 10s
kubectl port-forward deployment/demo-app 8080:8080 &
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
curl http://localhost:8080
Hello from Vert.x!
kubectl port-forward -n observability svc/jaeger-query 16686:16686 &
Forwarding from 127.0.0.1:16686 -> 16686
Forwarding from [::1]:16686 -> 16686
4.3 Alerting Walkthrough
Let's walk through alerting. First, create the alert rules:
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: nginx.rules
    rules:
    - alert: NginxDown
      expr: nginx_up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Nginx is down"
        description: "Nginx instance {{ \$labels.instance }} is down"
    - alert: NginxHighResponseTime
      # assumes the application exports request-duration metrics; the
      # stub_status exporter alone does not provide these series
      expr: nginx_http_request_duration_seconds_sum / nginx_http_request_duration_seconds_count > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Nginx high response time"
        description: "Nginx instance {{ \$labels.instance }} has high response time"
EOF
prometheusrule.monitoring.coreos.com/nginx-alerts created
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  # this ServiceAccount must already exist in the monitoring namespace
  serviceAccountName: alertmanager
EOF
alertmanager.monitoring.coreos.com/alertmanager created
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  # the operator loads configuration from a secret named
  # alertmanager-<Alertmanager name>, i.e. alertmanager-alertmanager here,
  # unless spec.configSecret points elsewhere
  name: alertmanager-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'default'
      routes:
      - match:
          severity: critical
        receiver: 'critical'
    receivers:
    - name: 'default'
      email_configs:
      - to: 'admin@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
    - name: 'critical'
      email_configs:
      - to: 'critical@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
EOF
secret/alertmanager-alertmanager created
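The first-match routing behaviour configured above (alerts labeled severity: critical go to the 'critical' receiver, everything else falls through to the 'default' receiver) can be sketched as:

```python
def pick_receiver(alert_labels: dict, routes: list, fallback: str = "default") -> str:
    """First-match routing, a simplified model of Alertmanager's route tree."""
    for route in routes:
        # a route matches when every label in its match block equals the alert's
        if all(alert_labels.get(k) == v for k, v in route["match"].items()):
            return route["receiver"]
    return fallback  # the top-level receiver

routes = [{"match": {"severity": "critical"}, "receiver": "critical"}]

assert pick_receiver({"alertname": "NginxDown", "severity": "critical"}, routes) == "critical"
assert pick_receiver({"alertname": "NginxHighResponseTime", "severity": "warning"}, routes) == "default"
```

The real route tree also supports nesting, regex matchers, and per-route grouping; this sketch covers only the flat match/receiver case used in the config above.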
kubectl get prometheusrules -n monitoring
NAME AGE
nginx-alerts 10s
kubectl port-forward -n monitoring svc/alertmanager-operated 9093:9093 &
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093
curl http://localhost:9093/api/v1/alerts
{
  "status": "success",
  "data": []
}
Part05: Lessons Learned and Tips
5.1 Common Problems and Solutions
Problem 1: Prometheus cannot scrape metrics
Symptom: Prometheus shows the target as down
Cause: wrong target address, or no network connectivity to the target
Solution: inspect the target via the Prometheus API, then test connectivity from inside the Prometheus pod:
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "nginx-monitor")'
{
  "discoveryLabels": {},
  "labels": {
    "app": "nginx",
    "endpoint": "metrics",
    "instance": "10.244.1.20:9113",
    "job": "nginx-monitor",
    "namespace": "default",
    "pod": "nginx-6b8d9b8c7f-abcde"
  },
  "scrapePool": "nginx-monitor",
  "scrapeUrl": "http://10.244.1.20:9113/metrics",
  "globalUrl": "http://10.244.1.20:9113/metrics",
  "lastError": "context deadline exceeded",
  "lastScrape": "2024-01-01T00:00:00.000Z",
  "lastScrapeDuration": 30.015,
  "health": "down"
}
kubectl exec -n monitoring prometheus-prometheus-0 -- wget -O- http://10.244.1.20:9113/metrics
Connecting to 10.244.1.20:9113 (10.244.1.20:9113)
nginx_up 1
…
Problem 2: incomplete trace data
Symptom: complete call chains are not visible in Jaeger
Cause: the application is not instrumented correctly, or the sampling rate is too low
Solution: check the Jaeger instance and the application's sampler settings:
kubectl get jaeger simplest -n observability -o yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      query:
        base-path: /jaeger
  storage:
    type: memory
kubectl get configmap app-config -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  JAEGER_AGENT_HOST: simplest-collector.observability.svc
  JAEGER_AGENT_PORT: "6831"
  JAEGER_SAMPLER_TYPE: const
  JAEGER_SAMPLER_PARAM: "1"
Problem 3: alerts do not fire
Symptom: alert rules are configured, but no alerts fire
Cause: the rule is misconfigured, or its condition is never met
Solution: verify the rule, then evaluate its expression manually:
kubectl get prometheusrules nginx-alerts -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: nginx.rules
    rules:
    - alert: NginxDown
      expr: nginx_up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Nginx is down"
        description: "Nginx instance {{ $labels.instance }} is down"
curl http://localhost:9090/api/v1/query?query=nginx_up
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "app": "nginx",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor"
        },
        "value": [1704067200, "1"]
      }
    ]
  }
}
5.2 Best-Practice Recommendations
Recommendation 1: plan metrics deliberately
When planning which metrics to collect:
- Pick the key business metrics
- Pick the key technical metrics
- Avoid collecting metrics you will never query
- Keep label dimensions reasonable (watch cardinality)
- Review and prune metrics regularly
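On label dimensions: each distinct combination of label values becomes a separate time series, so cardinality multiplies across labels. A quick back-of-the-envelope helper (a hypothetical illustration, not part of any Prometheus API):

```python
def series_count(label_values: dict) -> int:
    """Worst-case series count = product of distinct values per label."""
    n = 1
    for values in label_values.values():
        n *= len(set(values))
    return n

# 3 methods x 5 status codes x 200 pods = 3000 series for ONE metric name.
print(series_count({
    "method": ["GET", "POST", "PUT"],
    "code": ["200", "301", "400", "404", "500"],
    "pod": [f"pod-{i}" for i in range(200)],
}))  # 3000
```

This is why unbounded labels such as user IDs or request paths are dangerous: one metric can silently become millions of series.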
Recommendation 2: configure tracing deliberately
When configuring tracing:
- Choose an appropriate sampling rate
- Choose an appropriate storage backend
- Set a sensible retention policy for trace data
- Use sensible span tags
- Review and tune the tracing configuration regularly
Recommendation 3: configure alerting deliberately
When configuring alerting:
- Set sensible alert thresholds
- Set sensible severity levels
- Avoid alert storms (grouping, inhibition, deduplication)
- Choose appropriate notification channels
- Review and tune alert rules regularly
5.3 Performance Tuning Tips
Tip 1: tune Prometheus
Prometheus performance can be improved by:
- Reducing the number of collected series
- Increasing the scrape interval
- Using a sensible storage and retention policy
- Using Prometheus federation
- Using Thanos for long-term storage
Tip 2: tune tracing
Tracing overhead can be reduced by:
- Using an appropriate sampling rate
- Using an efficient storage backend
- Exporting spans asynchronously
- Exporting spans in batches
- Cleaning up old trace data regularly
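On the sampling point above: a fixed sampling rate is often implemented as a deterministic function of the trace ID, so every service in the call chain makes the same keep/drop decision for a given trace and no partial traces are stored. A sketch of that idea (illustrative only, not Jaeger's exact algorithm):

```python
import hashlib

def keep_trace(trace_id: str, rate: float) -> bool:
    """Keep a trace iff a hash of its ID falls below the rate threshold."""
    # same trace_id -> same hash -> same decision on every service
    h = int(hashlib.sha256(trace_id.encode()).hexdigest()[:8], 16)
    return h / 0xFFFFFFFF < rate

ids = [f"trace-{i}" for i in range(10_000)]
kept = sum(keep_trace(t, 0.1) for t in ids)
print(f"kept {kept} of {len(ids)}")  # roughly 1000 at a 10% rate
```

Because the decision depends only on the trace ID, a downstream service never drops spans belonging to a trace an upstream service chose to keep.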
Tip 3: tune alerting
Alerting overhead can be reduced by:
- Reducing the number of alert rules
- Keeping alert expressions simple
- Increasing the rule evaluation interval
- Using alert routing
- Reviewing and tuning rules regularly
Application performance monitoring and distributed tracing are key parts of running KubeSphere in the enterprise and should be planned around your actual workloads. For production, validate every configuration in a test environment first, then roll it out.
Compiled and published by the Fenggo (风哥) tutorial site for learning and testing purposes. Please credit the source when republishing: http://www.fgedu.net.cn/10327.html
