1. SkyWalking概述与环境规划
SkyWalking是一个开源的可观测性平台,用于分布式系统的监控、追踪和诊断。SkyWalking支持多种编程语言和框架,提供了丰富的可视化界面和分析工具。更多学习教程www.fgedu.net.cn
1.1 SkyWalking版本说明
SkyWalking目前主要版本为9.x系列,本教程以SkyWalking 9.6.0为例进行详细讲解。SkyWalking 9.x版本相比之前版本在性能、稳定性和功能方面都有显著提升,支持更多的可观测性特性。
$ java -jar skywalking-oap-server-9.6.0/bin/oapService.sh version
SkyWalking OAP Server version: 9.6.0
# 查看系统版本
$ cat /etc/os-release
NAME=”Oracle Linux Server”
VERSION=”8.9″
ID=”ol”
PRETTY_NAME=”Oracle Linux Server 8.9″
# 查看内核版本
$ uname -r
5.4.17-2136.302.7.2.el8uek.x86_64
# 查看Java版本
$ java -version
openjdk version “11.0.18” 2023-01-17
OpenJDK Runtime Environment (build 11.0.18+10-post-Oracle-10513082)
OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Oracle-10513082, mixed mode, sharing)
1.2 环境规划
本次安装环境规划如下:
skywalking01.fgedu.net.cn (192.168.1.99) – SkyWalking主服务器
skywalking02.fgedu.net.cn (192.168.1.100) – SkyWalking备用服务器
SkyWalking版本:9.6.0
存储后端:Elasticsearch 8.0.0
安装方式:二进制安装
数据存储:Elasticsearch
2. 硬件环境要求
SkyWalking作为可观测性平台,对硬件资源要求根据监控目标数量和数据量而定。学习交流加群风哥微信: itpux-com
2.1 物理主机环境要求
– CPU:至少8核
– 内存:至少32GB
– 磁盘:系统盘120GB SSD + 数据盘1TB SSD
# Elasticsearch服务器要求
– CPU:至少16核
– 内存:至少64GB
– 磁盘:系统盘120GB SSD + 数据盘2TB SSD
# 检查SkyWalking服务器资源
# free -h
total used free shared buff/cache available
Mem: 32G 8.4G 22G 512M 3.6G 23G
Swap: 8G 0B 8G
# 检查磁盘空间
# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 120G 20G 100G 17% /
/dev/sdb1 1TB 50G 950G 5% /data
2.2 vSphere虚拟主机环境要求
– SkyWalking服务器:
– vCPU:8核
– 内存:32GB
– 磁盘:系统盘120GB SSD + 数据盘1TB SSD
– 网络:VMXNET3网卡,10Gbps网络
– Elasticsearch服务器:
– vCPU:16核
– 内存:64GB
– 磁盘:系统盘120GB SSD + 数据盘2TB SSD
– 网络:VMXNET3网卡,10Gbps网络
资源池配置:
– CPU预留:SkyWalking服务器4GHz,Elasticsearch服务器8GHz
– 内存预留:SkyWalking服务器16GB,Elasticsearch服务器32GB
– 内存限制:SkyWalking服务器32GB,Elasticsearch服务器64GB
– CPU份额:正常
– 内存份额:正常
2.3 云平台主机环境要求
– SkyWalking服务器:
– 实例规格:ecs.g6.4xlarge或同等规格
– vCPU:16核
– 内存:64GB
– 系统盘:SSD云盘 120GB
– 数据盘:SSD云盘 1TB
– 网络带宽:10Gbps以上
– Elasticsearch服务器:
– 实例规格:ecs.g6.8xlarge或同等规格
– vCPU:32核
– 内存:128GB
– 系统盘:SSD云盘 120GB
– 数据盘:SSD云盘 2TB
– 网络带宽:10Gbps以上
存储配置:
– OSS对象存储:用于存储配置备份
– NAS文件存储:用于共享配置文件
– 云盘快照:定期备份数据
3. 操作系统环境准备
在安装SkyWalking之前,需要对操作系统进行必要的配置和优化。
3.1 操作系统版本检查
# cat /etc/os-release
NAME=”Oracle Linux Server”
VERSION=”8.9″
ID=”ol”
PRETTY_NAME=”Oracle Linux Server 8.9″
# 检查内核版本
# uname -r
5.4.17-2136.302.7.2.el8uek.x86_64
# 检查SELinux状态
# getenforce
Enforcing
# 检查防火墙状态
# systemctl status firewalld
● firewalld.service – firewalld – dynamic firewall daemon
Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
Active: active (running)
3.2 依赖服务安装
# dnf install -y wget curl tar gzip java-11-openjdk-devel
# 关闭防火墙
# systemctl stop firewalld
# systemctl disable firewalld
# 关闭SELinux
# setenforce 0
# sed -i ‘s/SELINUX=enforcing/SELINUX=disabled/’ /etc/selinux/config
# 创建SkyWalking用户
# useradd -r -s /bin/false skywalking
# 创建目录结构
# mkdir -p /data/skywalking/{config,bin,data}
# chown -R skywalking:skywalking /data/skywalking
3.3 安装Elasticsearch
# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.0.0-linux-x86_64.tar.gz
# 解压文件
# tar -xzf elasticsearch-8.0.0-linux-x86_64.tar.gz
# mv elasticsearch-8.0.0 /data/elasticsearch
# 创建Elasticsearch用户
# useradd -r -s /bin/false elasticsearch
# chown -R elasticsearch:elasticsearch /data/elasticsearch
# 配置Elasticsearch
# vi /data/elasticsearch/config/elasticsearch.yml
cluster.name: skywalking
node.name: node-1
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: [“192.168.1.99”, “192.168.1.100”]
cluster.initial_master_nodes: [“node-1”]
# 启动Elasticsearch
# su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch -d”
# 验证安装
# curl -X GET http://localhost:9200
4. SkyWalking安装配置
完成环境准备后,开始安装SkyWalking。
4.1 安装SkyWalking OAP Server
# wget https://archive.apache.org/dist/skywalking/9.6.0/apache-skywalking-apm-9.6.0.tar.gz
# 解压文件
# tar -xzf apache-skywalking-apm-9.6.0.tar.gz
# mv apache-skywalking-apm-9.6.0 /data/skywalking
# 配置SkyWalking
# vi /data/skywalking/config/application.yml
storage:
selector: elasticsearch
elasticsearch:
clusterNodes: 192.168.1.99:9200,192.168.1.100:9200
protocol: http
connectTimeout: 3000
socketTimeout: 30000
indexShardsNumber: 1
indexReplicasNumber: 1
user: elastic
password: password
# 创建systemd服务文件
# vi /etc/systemd/system/skywalking-oap.service
[Unit]
Description=SkyWalking OAP Server
After=network.target
[Service]
User=skywalking
ExecStart=/data/skywalking/bin/oapService.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
# 启动SkyWalking OAP Server
# systemctl daemon-reload
# systemctl start skywalking-oap
# systemctl enable skywalking-oap
# 验证安装
# systemctl status skywalking-oap
# curl http://localhost:12800
4.2 安装SkyWalking UI
# vi /data/skywalking/webapp/application.yml
server:
port: 8080
spring:
cloud:
gateway:
routes:
– id: oap-route
uri: http://localhost:12800
predicates:
– Path=/graphql/**
– Path=/v3/**
# 创建systemd服务文件
# vi /etc/systemd/system/skywalking-ui.service
[Unit]
Description=SkyWalking UI
After=network.target
[Service]
User=skywalking
ExecStart=/data/skywalking/bin/webappService.sh
Restart=always
RestartSec=10
[Install]
WantedBy=multi-user.target
# 启动SkyWalking UI
# systemctl daemon-reload
# systemctl start skywalking-ui
# systemctl enable skywalking-ui
# 验证安装
# systemctl status skywalking-ui
# curl http://localhost:8080
5. SkyWalking配置优化
为了提高SkyWalking的性能和稳定性,需要进行一些配置优化。
5.1 OAP Server配置优化
# vi /data/skywalking/config/application.yml
core:
default:
restHost: 0.0.0.0
restPort: 12800
gRPCHost: 0.0.0.0
gRPCPort: 11800
downsampling:
– Hour
– Day
– Month
recordDataTTL: 90 # 90 days
metricsDataTTL: 30 # 30 days
profileDataTTL: 3 # 3 days
storage:
selector: elasticsearch
elasticsearch:
clusterNodes: 192.168.1.99:9200,192.168.1.100:9200
protocol: http
connectTimeout: 3000
socketTimeout: 30000
indexShardsNumber: 3
indexReplicasNumber: 2
user: elastic
password: password
receiver:
zipkin:
enabled: true
jaeger:
enabled: true
# 重启SkyWalking OAP Server
# systemctl restart skywalking-oap
5.2 高可用配置
# vi /data/skywalking/config/application.yml
cluster:
selector: kubernetes
kubernetes:
namespace: skywalking
labelSelector: app=skywalking-oap
# 启动SkyWalking OAP Server
# systemctl start skywalking-oap
# 验证集群状态
# curl http://localhost:12800/health
5.3 内存配置
# vi /data/skywalking/bin/oapService.sh
JAVA_OPTS=”-Xms8g -Xmx16g -XX:MaxMetaspaceSize=512m -XX:+UseG1GC -XX:MaxGCPauseMillis=200″
# 编辑SkyWalking UI启动脚本
# vi /data/skywalking/bin/webappService.sh
JAVA_OPTS=”-Xms2g -Xmx4g -XX:MaxMetaspaceSize=256m -XX:+UseG1GC -XX:MaxGCPauseMillis=200″
# 重启SkyWalking服务
# systemctl restart skywalking-oap skywalking-ui
6. SkyWalking Agent配置
SkyWalking Agent用于收集应用的链路追踪数据。
6.1 配置Java Agent
# wget https://archive.apache.org/dist/skywalking/9.6.0/apache-skywalking-java-agent-9.6.0.tgz
# 解压文件
# tar -xzf apache-skywalking-java-agent-9.6.0.tgz
# mv apache-skywalking-java-agent-9.6.0 /data/skywalking/agent
# 配置Java Agent
# vi /data/skywalking/agent/config/agent.config
agent.service_name=demo-application
collector.backend_service=192.168.1.99:11800
logging.level=info
# 启动应用时添加Agent
# java -javaagent:/data/skywalking/agent/skywalking-agent.jar -jar demo-application.jar
6.2 配置其他语言Agent
# npm install skywalking-nodejs –save
// 在应用入口文件中添加
const agent = require(‘skywalking-nodejs’);
agent.start({
serviceName: ‘demo-application’,
collectorAddress: ‘192.168.1.99:11800′
});
# Python Agent配置
# pip install apache-skywalking
# 在应用入口文件中添加
from skywalking import agent
agent.start(
service_name=’demo-application’,
collector_address=’192.168.1.99:11800′
)
# Go Agent配置
# go get github.com/apache/skywalking-go
// 在应用入口文件中添加
import (
“github.com/apache/skywalking-go”
)
func main() {
skywalking.Start(
skywalking.WithServiceName(“demo-application”),
skywalking.WithCollectorAddress(“192.168.1.99:11800”),
)
}
7. SkyWalking UI配置
SkyWalking UI用于可视化链路追踪数据。
7.1 访问SkyWalking UI
# 打开浏览器访问 http://skywalking01.fgedu.net.cn:8080
# 登录SkyWalking UI
# 初始登录不需要用户名和密码
# 查看链路追踪数据
# 1. 点击左侧菜单的”Trace”
# 2. 输入查询条件
# 3. 点击”Query”
# 4. 查看追踪详情
# 查看监控指标
# 1. 点击左侧菜单的”Dashboard”
# 2. 选择服务
# 3. 查看监控指标
7.2 配置UI主题
# 1. 点击右上角的用户图标
# 2. 选择”Settings”
# 3. 选择”Theme”
# 4. 选择主题(Light/Dark)
# 5. 点击”Save”
8. SkyWalking安全配置
SkyWalking提供了多种安全功能,包括认证、授权、TLS加密等。
8.1 认证配置
# vi /data/skywalking/config/application.yml
security:
auth:
active: default
default:
users:
– username: admin
password: admin
roles: admin
# 重启SkyWalking OAP Server
# systemctl restart skywalking-oap
# 访问SkyWalking UI
# 打开浏览器访问 http://skywalking01.fgedu.net.cn:8080
# 输入用户名和密码
8.2 TLS加密配置
# openssl req -newkey rsa:2048 -nodes -keyout /data/skywalking/config/key.pem -x509 -days 365 -out /data/skywalking/config/cert.pem
# 编辑SkyWalking配置
# vi /data/skywalking/config/application.yml
core:
default:
restHost: 0.0.0.0
restPort: 12800
restTLS:
enabled: true
keyPath: /data/skywalking/config/key.pem
certPath: /data/skywalking/config/cert.pem
# 重启SkyWalking OAP Server
# systemctl restart skywalking-oap
# 访问SkyWalking UI
# 打开浏览器访问 https://skywalking01.fgedu.net.cn:8080
9. SkyWalking性能优化
在生产环境中,需要对SkyWalking进行性能优化以提高链路追踪效率。from:www.itpux.com
9.1 OAP Server性能优化
# vi /data/skywalking/config/application.yml
core:
default:
restHost: 0.0.0.0
restPort: 12800
gRPCHost: 0.0.0.0
gRPCPort: 11800
maxConcurrentCallsPerConnection: 100
maxMessageSize: 10485760
# 编辑启动脚本
# vi /data/skywalking/bin/oapService.sh
JAVA_OPTS=”-Xms16g -Xmx32g -XX:MaxMetaspaceSize=1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:ParallelGCThreads=8 -XX:ConcGCThreads=4″
# 重启SkyWalking OAP Server
# systemctl restart skywalking-oap
9.2 Elasticsearch性能优化
# vi /data/elasticsearch/config/elasticsearch.yml
cluster.name: skywalking
node.name: node-1
path.data: /data/elasticsearch/data
path.logs: /data/elasticsearch/logs
network.host: 0.0.0.0
http.port: 9200
discovery.seed_hosts: [“192.168.1.99”, “192.168.1.100”]
cluster.initial_master_nodes: [“node-1”]
# 编辑jvm.options
# vi /data/elasticsearch/config/jvm.options
-Xms32g
-Xmx32g
-XX:MaxMetaspaceSize=512m
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
# 重启Elasticsearch
# su – elasticsearch -c “pkill -f elasticsearch”
# su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch -d”
9.3 Agent性能优化
# vi /data/skywalking/agent/config/agent.config
# 采样率配置
agent.sample_n_per_3_secs=-1 # 不限制采样
# 缓冲区配置
collector.queue_size=10000
collector.batch_size=1000
# 日志级别
logging.level=info
# 启动应用时添加Agent
# java -javaagent:/data/skywalking/agent/skywalking-agent.jar -jar demo-application.jar
10. SkyWalking升级迁移
本节介绍SkyWalking的版本升级和数据迁移方法。
10.1 SkyWalking版本升级
# cp -r /data/skywalking/config /backup/skywalking-config-$(date +%Y%m%d)
# 停止SkyWalking服务
# systemctl stop skywalking-oap skywalking-ui
# 下载新版本SkyWalking
# wget https://archive.apache.org/dist/skywalking/9.6.1/apache-skywalking-apm-9.6.1.tar.gz
# 解压文件
# tar -xzf apache-skywalking-apm-9.6.1.tar.gz
# mv apache-skywalking-apm-9.6.1 /data/skywalking-new
# 复制配置文件
# cp -r /data/skywalking/config/* /data/skywalking-new/config/
# 替换SkyWalking目录
# mv /data/skywalking /data/skywalking-old
# mv /data/skywalking-new /data/skywalking
# 启动SkyWalking服务
# systemctl start skywalking-oap skywalking-ui
# 验证升级
# curl http://localhost:12800/version
10.2 SkyWalking数据迁移
# su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch-dump –input=http://localhost:9200/skywalking* –output=/backup/skywalking-data-$(date +%Y%m%d).json”
# 在新服务器上恢复数据
# su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch-dump –input=/backup/skywalking-data-20230405.json –output=http://localhost:9200/”
# 安装SkyWalking
# 重复安装步骤
# 启动SkyWalking服务
# systemctl start skywalking-oap skywalking-ui
# 验证迁移
# curl http://localhost:12800/health
11. SkyWalking备份恢复
本节介绍SkyWalking的备份和恢复方法。
11.1 SkyWalking备份
# vi /data/skywalking/scripts/backup.sh
#!/bin/bash
BACKUP_DIR=”/backup/skywalking”
DATE=$(date +%Y%m%d)
# 创建备份目录
mkdir -p $BACKUP_DIR
# 停止SkyWalking服务
systemctl stop skywalking-oap skywalking-ui
# 备份配置文件
cp -r /data/skywalking/config $BACKUP_DIR/config-$DATE
# 备份Elasticsearch数据
su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch-dump –input=http://localhost:9200/skywalking* –output=$BACKUP_DIR/skywalking-data-$DATE.json”
# 启动SkyWalking服务
systemctl start skywalking-oap skywalking-ui
# 清理旧备份(保留7天)
find $BACKUP_DIR -type d -mtime +7 -exec rm -rf {} \;
# 添加执行权限
# chmod +x /data/skywalking/scripts/backup.sh
# 添加定时任务
# crontab -e
0 0 * * * /data/skywalking/scripts/backup.sh
11.2 SkyWalking恢复
# systemctl stop skywalking-oap skywalking-ui
# 清理现有数据
# su – elasticsearch -c “curl -X DELETE http://localhost:9200/skywalking*”
# 恢复数据
# su – elasticsearch -c “/data/elasticsearch/bin/elasticsearch-dump –input=/backup/skywalking/skywalking-data-20230405.json –output=http://localhost:9200/”
# 恢复配置文件
# cp -r /backup/skywalking/config-20230405/* /data/skywalking/config/
# 启动SkyWalking服务
# systemctl start skywalking-oap skywalking-ui
# 验证恢复
# systemctl status skywalking-oap skywalking-ui
# curl http://localhost:12800/health
11.3 SkyWalking监控脚本
# vi /data/skywalking/scripts/monitor.sh
#!/bin/bash
LOG_FILE=”/var/log/skywalking_monitor.log”
ALERT_EMAIL=”admin@fgedu.net.cn”
check_oap_status() {
echo “$(date): Checking skywalking oap status…” >> $LOG_FILE
status=$(systemctl status skywalking-oap | grep Active | awk ‘{print $2}’)
if [ “$status” != “active” ]; then
echo “$(date): SkyWalking OAP is not running” >> $LOG_FILE
echo “SkyWalking OAP is not running” | mail -s “SkyWalking Alert” $ALERT_EMAIL
systemctl start skywalking-oap
else
echo “$(date): SkyWalking OAP is running” >> $LOG_FILE
fi
}
check_ui_status() {
echo “$(date): Checking skywalking ui status…” >> $LOG_FILE
status=$(systemctl status skywalking-ui | grep Active | awk ‘{print $2}’)
if [ “$status” != “active” ]; then
echo “$(date): SkyWalking UI is not running” >> $LOG_FILE
echo “SkyWalking UI is not running” | mail -s “SkyWalking Alert” $ALERT_EMAIL
systemctl start skywalking-ui
else
echo “$(date): SkyWalking UI is running” >> $LOG_FILE
fi
}
check_elasticsearch_status() {
echo “$(date): Checking elasticsearch status…” >> $LOG_FILE
status=$(curl -s -o /dev/null -w “%{http_code}” http://localhost:9200)
if [ “$status” = “200” ]; then
echo “$(date): Elasticsearch: OK” >> $LOG_FILE
else
echo “$(date): Elasticsearch: FAILED” >> $LOG_FILE
echo “Elasticsearch failed” | mail -s “SkyWalking Alert” $ALERT_EMAIL
fi
}
main() {
check_oap_status
check_ui_status
check_elasticsearch_status
}
main
# 添加执行权限
# chmod +x /data/skywalking/scripts/monitor.sh
# 添加定时任务
# crontab -e
*/15 * * * * /data/skywalking/scripts/monitor.sh
通过以上步骤,SkyWalking安装配置、性能优化、升级迁移、备份恢复等内容已全部完成。SkyWalking作为开源可观测性平台,能够高效地监控和分析分布式系统,是企业级微服务架构的重要工具。
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
