
KubeSphere Tutorial FG036 - Hands-On Application Performance Monitoring and Distributed Tracing in KubeSphere

This tutorial walks through application performance monitoring (APM) and distributed tracing in KubeSphere, covering fundamental concepts, production-environment planning, a concrete implementation plan, and hands-on examples. It draws on the official KubeSphere documentation (the KubeSphere container platform user guide and the KubeSphere observability docs) and the Jaeger tracing documentation.

Table of Contents

Part01 - Fundamental Concepts

1.1 Core Concepts of Application Performance Monitoring

Application performance monitoring (APM) means continuously measuring how an application behaves as it serves traffic. Its key dimensions are:

  • Response time: how long the application takes to answer a request
  • Throughput: the number of requests handled per unit of time
  • Error rate: the fraction of requests that fail
  • Resource utilization: CPU, memory, disk, and network consumption
  • User experience: end-user-facing indicators such as page load time and interaction latency
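
To make these dimensions concrete, here is a minimal sketch in plain Python (the request records are made-up illustrative data, not tied to any monitoring product) that derives throughput, error rate, and a p95 response time from a request log:

```python
# Sketch: deriving APM indicators from raw request records.
requests = [
    # (timestamp_seconds, duration_ms, http_status) - illustrative data
    (0.1, 120, 200), (0.4, 95, 200), (1.2, 310, 500),
    (1.9, 80, 200), (2.5, 150, 200), (3.1, 4000, 504),
    (3.8, 60, 200), (4.2, 110, 200),
]

window_seconds = 5.0
throughput = len(requests) / window_seconds                       # requests per second
error_rate = sum(1 for _, _, s in requests if s >= 500) / len(requests)

# p95 response time: 95% of requests completed at least this fast.
durations = sorted(d for _, d, _ in requests)
p95 = durations[min(len(durations) - 1, int(0.95 * len(durations)))]

print(f"throughput: {throughput:.1f} req/s")   # 8 requests over 5 s
print(f"error rate: {error_rate:.1%}")         # 2 of 8 returned 5xx
print(f"p95 latency: {p95} ms")                # one slow outlier dominates
```

Note how a single slow outlier dominates the p95 here; this is why percentiles, not averages, are the usual response-time indicator.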

1.2 Core Concepts of Distributed Tracing

Distributed tracing follows a request across the services it touches in a distributed system. Its core terms are:

  • Trace: the complete call chain of a single request
  • Span: one step (one operation) within the call chain
  • Trace ID: uniquely identifies a trace
  • Span ID: uniquely identifies a span
  • Parent Span ID: identifies a span's parent span, linking spans into a tree
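
The relationship between these identifiers can be sketched in a few lines of plain Python (a toy illustration of the ID model only, not a real tracing SDK):

```python
import uuid

def new_id():
    # 16-hex-char identifier, in the spirit of tracing span IDs
    return uuid.uuid4().hex[:16]

class Span:
    """One step in a call chain. Every span of a request shares the
    trace_id; parent_span_id links the spans into a tree."""
    def __init__(self, name, parent=None):
        self.name = name
        self.trace_id = parent.trace_id if parent else new_id()
        self.span_id = new_id()
        self.parent_span_id = parent.span_id if parent else None

# A request enters the gateway, which calls two downstream services.
root = Span("gateway")                   # root span: no parent
orders = Span("order-service", root)     # child of the gateway span
payment = Span("payment-service", orders)

# Every span carries the same trace_id; parents chain back to the root.
assert root.trace_id == orders.trace_id == payment.trace_id
assert payment.parent_span_id == orders.span_id
assert root.parent_span_id is None
```

A tracing UI such as Jaeger reconstructs the tree you see on screen from exactly these three fields.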

1.3 Core Concepts of Monitoring Metrics

Monitoring metrics are the measurements used to describe system state. The common metric types are:

  • Counter: a value that only increases (e.g. total requests served)
  • Gauge: a value that can go up or down (e.g. current memory usage)
  • Histogram: records the distribution of observed values in buckets
  • Summary: records streaming statistics (count, sum, quantiles) of observed values
  • Label: a key/value dimension attached to a metric for slicing and filtering
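
The semantics of these types can be illustrated with a toy implementation in plain Python (modeled loosely on how Prometheus client libraries behave; this is not a real client):

```python
class Counter:
    """Only increases; it resets only when the process restarts."""
    def __init__(self): self.value = 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters never decrease")
        self.value += amount

class Gauge:
    """Can go up or down freely."""
    def __init__(self): self.value = 0.0
    def set(self, v): self.value = v

class Histogram:
    """Counts observations in cumulative buckets, plus count and sum."""
    def __init__(self, buckets=(0.1, 0.5, 1.0, float("inf"))):
        self.buckets = buckets
        self.counts = [0] * len(buckets)
        self.total, self.count = 0.0, 0
    def observe(self, v):
        self.total += v
        self.count += 1
        for i, bound in enumerate(self.buckets):
            if v <= bound:
                self.counts[i] += 1   # cumulative: a value lands in every bucket >= it

requests_total = Counter()   # e.g. http_requests_total
in_flight = Gauge()          # e.g. requests currently in flight
latency = Histogram()        # e.g. request duration in seconds

for duration in (0.05, 0.3, 0.7, 2.0):
    requests_total.inc()
    latency.observe(duration)
in_flight.set(3)

print(requests_total.value, in_flight.value, latency.counts)
```

The cumulative bucket counts are what PromQL's histogram_quantile() works from.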

Part02 - Production Environment Planning and Recommendations

2.1 Monitoring Architecture Planning

When rolling out APM and tracing, plan the monitoring architecture first:

  • Component selection: choose monitoring components that fit your scale and team
  • Data collection: decide how metrics are scraped or pushed
  • Data storage: decide where metrics are stored and for how long
  • Data visualization: decide how metrics are dashboarded and queried
  • Alerting: decide how threshold breaches turn into notifications

2.2 Tracing Planning

Tracing deserves the same level of planning:

  • Component selection: choose a tracing backend (e.g. Jaeger or Zipkin)
  • Data collection: decide how applications are instrumented and how spans are reported
  • Data storage: decide on a span storage backend and retention period
  • Data visualization: decide how traces are browsed and searched
  • Data analysis: decide how traces are analyzed (latency breakdowns, dependency graphs)

2.3 Alerting Planning

Alerting planning is an essential part of the rollout:

  • Rule design: write rules that reflect real failure modes
  • Severity levels: assign levels (e.g. warning, critical) consistently
  • Notification channels: choose channels appropriate to each severity
  • Handling workflow: define who responds to an alert and how it escalates
  • Continuous tuning: review and prune alert rules regularly to keep them useful

Part03 - Production Implementation Plan

3.1 Configuring Application Performance Monitoring

Steps to configure APM:

  • Deploy monitoring components: Prometheus, Grafana, and the needed exporters
  • Configure collection: define scrape targets (e.g. via ServiceMonitors)
  • Configure storage: set retention and persistent volumes
  • Configure dashboards: expose the data in Grafana
  • Configure alerting: define alert rules and notification routes

3.2 Configuring Tracing

Steps to configure tracing:

  • Deploy a tracing backend: Jaeger, Zipkin, or similar
  • Configure collection: instrument applications to emit spans
  • Configure storage: choose in-memory, Elasticsearch, or Cassandra storage
  • Configure the UI: expose the trace query service
  • Configure analysis: enable dependency and latency analysis

3.3 Configuring Alerting

Steps to configure alerting:

  • Create alert rules: encode failure conditions as rules
  • Configure notifications: connect Alertmanager to notification channels
  • Configure routing: route alerts by severity and owning team
  • Test alerts: verify that alerts actually fire and reach their receivers
  • Tune alerts: adjust thresholds and durations based on experience
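
One subtlety worth understanding before writing rules: the `for:` duration used in rules such as the NginxDown example later in this tutorial means the condition must hold continuously before the alert fires. The pending-to-firing logic can be sketched like this (a deliberate simplification of what Prometheus actually does):

```python
def alert_state(breached_history, for_duration, interval):
    """Return 'inactive', 'pending', or 'firing' from a history of rule
    evaluations (True = condition breached), newest last. A simplified
    model of Prometheus alert evaluation with a `for:` clause."""
    # How long has the condition held continuously, ending now?
    held = 0
    for breached in reversed(breached_history):
        if not breached:
            break
        held += interval
    if held == 0:
        return "inactive"
    return "firing" if held >= for_duration else "pending"

# Evaluations every 30s; "for: 5m" means 300s of continuous breach.
print(alert_state([False, True, True], 300, 30))   # breached 60s: pending
print(alert_state([True] * 10, 300, 30))           # breached 300s: firing
```

This is why a brief metric blip does not page anyone: the alert goes pending, the condition clears, and it returns to inactive.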

Part04 - Production Cases and Hands-On Walkthrough

4.1 Hands-On: Application Performance Monitoring

Let's walk through application performance monitoring step by step:

# Deploy the Prometheus Operator
kubectl apply -f https://raw.githubusercontent.com/prometheus-operator/prometheus-operator/main/bundle.yaml
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusagents.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
namespace/monitoring created
serviceaccount/prometheus-operator created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
# Create a Prometheus instance
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
  namespace: monitoring
spec:
  serviceAccountName: prometheus
  serviceMonitorSelector:
    matchLabels:
      release: prometheus
  resources:
    requests:
      memory: 400Mi
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: standard
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
EOF
prometheus.monitoring.coreos.com/prometheus created
# Create a Grafana instance
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: monitoring
spec:
  selector:
    app: grafana
  ports:
  - port: 3000
    targetPort: 3000
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
    spec:
      containers:
      - name: grafana
        image: grafana/grafana:latest
        ports:
        - containerPort: 3000
        env:
        - name: GF_SECURITY_ADMIN_PASSWORD
          value: "admin"
        volumeMounts:
        - name: grafana-storage
          mountPath: /var/lib/grafana
      volumes:
      - name: grafana-storage
        emptyDir: {}
EOF
service/grafana created
deployment.apps/grafana created
# Create a ServiceMonitor (the namespaceSelector is required here because
# the nginx Service lives in the default namespace, not in monitoring)
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-monitor
  namespace: monitoring
  labels:
    release: prometheus
spec:
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    path: /metrics
    interval: 30s
EOF
servicemonitor.monitoring.coreos.com/nginx-monitor created
# Create an application that exposes a metrics endpoint
# (note: the nginx config must enable stub_status at /nginx_status
# for the exporter sidecar to scrape it)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  selector:
    app: nginx
  ports:
  - name: http
    port: 80
    targetPort: 80
  - name: metrics
    port: 9113
    targetPort: 9113
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:latest
        ports:
        - containerPort: 9113
        args:
        - -nginx.scrape-uri=http://localhost:80/nginx_status
EOF
service/nginx created
deployment.apps/nginx created
# Forward the Prometheus port to inspect scrape targets
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
# Query the Prometheus targets API
curl http://localhost:9090/api/v1/targets
{
  "status": "success",
  "data": {
    "activeTargets": [
      {
        "discoveredLabels": {},
        "labels": {
          "app": "nginx",
          "endpoint": "metrics",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor",
          "namespace": "default",
          "pod": "nginx-6b8d9b8c7f-abcde"
        },
        "scrapePool": "nginx-monitor",
        "scrapeUrl": "http://10.244.1.20:9113/metrics",
        "globalUrl": "http://10.244.1.20:9113/metrics",
        "lastError": "",
        "lastScrape": "2024-01-01T00:00:00.000Z",
        "lastScrapeDuration": 0.015,
        "health": "up"
      }
    ]
  }
}
# Query an individual metric
curl 'http://localhost:9090/api/v1/query?query=nginx_up'
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "app": "nginx",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor"
        },
        "value": [1704067200, "1"]
      }
    ]
  }
}

4.2 Hands-On: Distributed Tracing

Let's walk through distributed tracing step by step:

# Create a namespace for the tracing components
kubectl create namespace observability
namespace/observability created
# Deploy a Jaeger instance (this assumes the Jaeger Operator is already
# installed in the cluster; install it first from its release manifests)
cat <<EOF | kubectl apply -f -
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      query:
        base-path: /jaeger
  storage:
    type: memory
EOF
jaeger.jaegertracing.io/simplest created
# Check the Jaeger status
kubectl get jaeger -n observability
NAME       STATUS    VERSION   STRATEGY   STORAGE   AGE
simplest   Running   1.50      allInOne   memory    10s
# Create a Jaeger query Service
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: jaeger-query
  namespace: observability
spec:
  selector:
    app: jaeger
    app.kubernetes.io/component: all-in-one
    app.kubernetes.io/instance: simplest
  ports:
  - port: 16686
    targetPort: 16686
    name: query
  - port: 16685
    targetPort: 16685
    name: grpc
  type: LoadBalancer
EOF
service/jaeger-query created
# Create an application instrumented for tracing
# (UDP port 6831 is served by the agent service, not the collector)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  JAEGER_AGENT_HOST: simplest-agent.observability.svc
  JAEGER_AGENT_PORT: "6831"
  JAEGER_SAMPLER_TYPE: const
  JAEGER_SAMPLER_PARAM: "1"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
      - name: demo-app
        image: jaegertracing/vertx-create-span:operator-e2e-tests
        envFrom:
        - configMapRef:
            name: app-config
        ports:
        - containerPort: 8080
EOF
configmap/app-config created
deployment.apps/demo-app created
# Check Pod status
kubectl get pods -l app=demo-app
NAME                        READY   STATUS    RESTARTS   AGE
demo-app-7b8d9b8c7f-abcde   1/1     Running   0          10s
demo-app-7b8d9b8c7f-fghij   1/1     Running   0          10s
# Access the application to generate trace data
# (port-forward takes a resource name, not a label selector)
kubectl port-forward deploy/demo-app 8080:8080 &
Forwarding from 127.0.0.1:8080 -> 8080
Forwarding from [::1]:8080 -> 8080
# Send a test request
curl http://localhost:8080
Hello from Vert.x!
# Open the Jaeger UI to inspect the trace data
kubectl port-forward -n observability svc/jaeger-query 16686:16686 &
Forwarding from 127.0.0.1:16686 -> 16686
Forwarding from [::1]:16686 -> 16686

4.3 Hands-On: Alerting

Let's walk through alerting step by step:

# Create alert rules (quote the heredoc delimiter so the shell
# does not expand the {{ $labels.instance }} templates)
cat <<'EOF' | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: nginx.rules
    rules:
    - alert: NginxDown
      expr: nginx_up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Nginx is down"
        description: "Nginx instance {{ $labels.instance }} is down"
    - alert: NginxHighResponseTime
      expr: nginx_http_request_duration_seconds_sum / nginx_http_request_duration_seconds_count > 1
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Nginx high response time"
        description: "Nginx instance {{ $labels.instance }} has a high average response time"
EOF
prometheusrule.monitoring.coreos.com/nginx-alerts created
# Create an Alertmanager
cat <<EOF | kubectl apply -f -
apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: alertmanager
  namespace: monitoring
spec:
  replicas: 1
  serviceAccountName: alertmanager
EOF
alertmanager.monitoring.coreos.com/alertmanager created
# Create the Alertmanager configuration
# (the operator loads it from a Secret named alertmanager-<instance name>)
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-alertmanager
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname', 'cluster', 'service']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 12h
      receiver: 'default'
      routes:
      - match:
          severity: critical
        receiver: 'critical'
    receivers:
    - name: 'default'
      email_configs:
      - to: 'admin@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
    - name: 'critical'
      email_configs:
      - to: 'critical@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
EOF
secret/alertmanager-alertmanager created
# List the alert rules
kubectl get prometheusrules -n monitoring
NAME           AGE
nginx-alerts   10s
# Check alert status via Alertmanager
kubectl port-forward -n monitoring svc/alertmanager-operated 9093:9093 &
Forwarding from 127.0.0.1:9093 -> 9093
Forwarding from [::1]:9093 -> 9093
# Query the Alertmanager API
curl http://localhost:9093/api/v1/alerts
{
  "status": "success",
  "data": []
}

Part05 - Lessons Learned and Tips

5.1 Common Problems and Solutions

Problem 1: Prometheus cannot scrape metrics

Symptom: Prometheus shows the target state as down

Cause: the target address is misconfigured, or the network path to it is blocked

Solution:

# Check target status
kubectl port-forward -n monitoring svc/prometheus-operated 9090:9090 &
Forwarding from 127.0.0.1:9090 -> 9090
Forwarding from [::1]:9090 -> 9090
# Inspect the failing target in detail
curl http://localhost:9090/api/v1/targets | jq '.data.activeTargets[] | select(.labels.job == "nginx-monitor")'
{
  "discoveredLabels": {},
  "labels": {
    "app": "nginx",
    "endpoint": "metrics",
    "instance": "10.244.1.20:9113",
    "job": "nginx-monitor",
    "namespace": "default",
    "pod": "nginx-6b8d9b8c7f-abcde"
  },
  "scrapePool": "nginx-monitor",
  "scrapeUrl": "http://10.244.1.20:9113/metrics",
  "globalUrl": "http://10.244.1.20:9113/metrics",
  "lastError": "context deadline exceeded",
  "lastScrape": "2024-01-01T00:00:00.000Z",
  "lastScrapeDuration": 30.015,
  "health": "down"
}
# Check network connectivity from inside the Prometheus pod
kubectl exec -n monitoring prometheus-prometheus-0 -- wget -O- http://10.244.1.20:9113/metrics
Connecting to 10.244.1.20:9113 (10.244.1.20:9113)
nginx_up 1

Problem 2: Incomplete trace data

Symptom: Jaeger does not show the complete call chain

Cause: the application is not correctly instrumented for tracing, or the sampling rate is too low

Solution:

# Inspect the Jaeger configuration
kubectl get jaeger simplest -n observability -o yaml
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: simplest
  namespace: observability
spec:
  strategy: allInOne
  allInOne:
    image: jaegertracing/all-in-one:latest
    options:
      query:
        base-path: /jaeger
  storage:
    type: memory
# Inspect the application configuration
kubectl get configmap app-config -o yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  JAEGER_AGENT_HOST: simplest-agent.observability.svc
  JAEGER_AGENT_PORT: "6831"
  JAEGER_SAMPLER_TYPE: const
  JAEGER_SAMPLER_PARAM: "1"

Problem 3: Alerts do not fire

Symptom: alert rules are configured, but no alert ever fires

Cause: the alert rule is misconfigured, or its condition is simply never met

Solution:

# Inspect the alert rule
kubectl get prometheusrules nginx-alerts -n monitoring -o yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: nginx-alerts
  namespace: monitoring
  labels:
    release: prometheus
spec:
  groups:
  - name: nginx.rules
    rules:
    - alert: NginxDown
      expr: nginx_up == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Nginx is down"
        description: "Nginx instance {{ $labels.instance }} is down"
# Check the current metric value
curl 'http://localhost:9090/api/v1/query?query=nginx_up'
{
  "status": "success",
  "data": {
    "resultType": "vector",
    "result": [
      {
        "metric": {
          "app": "nginx",
          "instance": "10.244.1.20:9113",
          "job": "nginx-monitor"
        },
        "value": [1704067200, "1"]
      }
    ]
  }
}

Here nginx_up is 1, so the NginxDown condition (nginx_up == 0) is not met and the alert correctly stays inactive.

5.2 Best Practice Recommendations

Recommendation 1: Plan monitoring metrics deliberately

When planning monitoring metrics, you should:

  • Pick the business indicators that actually matter
  • Pick the key technical indicators (latency, error rate, saturation)
  • Avoid collecting metrics you will never query
  • Keep label dimensions reasonable and low-cardinality
  • Review and prune metrics regularly
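
The "reasonable label dimensions" point is mostly about series cardinality: the number of time series a metric produces is the product of the distinct values of each of its labels, so a single unbounded label (user ID, full request URL) can explode storage. A quick back-of-the-envelope check in plain Python (label values here are hypothetical):

```python
def series_count(label_values):
    """Number of distinct time series one metric can produce:
    the product of the distinct values of each label."""
    n = 1
    for values in label_values.values():
        n *= len(values)
    return n

# Reasonable labels: a handful of values each.
good = {
    "method": ["GET", "POST"],
    "status": ["2xx", "4xx", "5xx"],
    "instance": [f"pod-{i}" for i in range(10)],
}
print(series_count(good))   # 2 * 3 * 10 = 60 series

# One unbounded label (e.g. user_id) multiplies everything.
bad = dict(good, user_id=[str(i) for i in range(100_000)])
print(series_count(bad))    # 6,000,000 series
```

This is why user IDs, session IDs, and raw URLs belong in traces or logs, not in metric labels.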

Recommendation 2: Configure tracing sensibly

When configuring tracing, you should:

  • Choose a sampling rate that balances visibility against overhead
  • Choose a storage backend suited to your trace volume
  • Set a sensible retention policy for trace data
  • Use consistent, meaningful span tags
  • Review and tune the tracing configuration regularly

Recommendation 3: Configure alerting carefully

When configuring alerting, you should:

  • Set thresholds that reflect real user impact
  • Assign severity levels consistently
  • Avoid alert storms (grouping, inhibition, sensible for: durations)
  • Match notification channels to severity
  • Review and tune alert rules regularly

5.3 Performance Tuning Tips

Tip 1: Tuning Prometheus performance

Prometheus performance can be improved by:

  • Reducing the number of collected series (drop unused metrics)
  • Increasing the scrape interval where fine granularity is not needed
  • Choosing a sensible retention and storage strategy
  • Using Prometheus federation to split the load
  • Using Thanos for long-term storage

Tip 2: Tuning tracing performance

Tracing performance can be improved by:

  • Using an appropriate sampling rate
  • Using an efficient storage backend
  • Reporting spans asynchronously
  • Batching span exports
  • Cleaning up old trace data regularly
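
The sampling-rate trade-off can be sketched as a head-based probabilistic sampler (similar in spirit to Jaeger's probabilistic sampler type; this toy version hashes the trace ID so that every service in a call chain makes the same keep/drop decision):

```python
import hashlib

def sampled(trace_id: str, rate: float) -> bool:
    """Deterministic head-based sampling: hash the trace ID into
    [0, 1) and keep the trace if it falls below the sampling rate.
    Because the decision depends only on the trace ID, all spans of
    one trace get the same decision - traces are never half-collected."""
    h = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16)
    return (h % 10_000) / 10_000 < rate

traces = [f"trace-{i}" for i in range(10_000)]
kept = sum(sampled(t, 0.1) for t in traces)
print(f"kept {kept} of {len(traces)} traces (~10% expected)")
```

A 10% rate cuts storage roughly tenfold while still surfacing systematic latency problems; rare one-off errors, however, may be dropped, which is why tail-based sampling exists.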

Tip 3: Tuning alerting performance

Alerting performance can be improved by:

  • Keeping the number of alert rules small
  • Keeping alert expressions simple
  • Increasing the rule evaluation interval
  • Using alert routing to dispatch notifications efficiently
  • Reviewing and pruning alert rules regularly

Application performance monitoring and distributed tracing are essential parts of running enterprise applications on KubeSphere, and they should be planned and configured around your actual business needs. For production, validate the configuration in a test environment first, then roll it out.


Compiled and published by Fenggo Tutorials for learning and testing purposes only; when reposting, credit the source: http://www.fgedu.net.cn/10327.html
