This document walks through deploying TiDB on Kubernetes with TiDB Operator, covering K8s cluster planning, TiDB Operator installation and configuration, TiDB cluster deployment and management, and monitoring and alerting. It follows the official TiDB on Kubernetes documentation and the TiDB Operator user guide, and is aimed at DBAs and operations engineers deploying and managing TiDB on K8s in production.
Part01-Basic Concepts and Theory
1.1 Overview of TiDB on K8s
TiDB Operator is an automated operations system for TiDB clusters on Kubernetes, covering the full lifecycle: deployment, upgrades, scaling, configuration changes, and backup/restore. With TiDB Operator you can quickly deploy and manage TiDB clusters in a K8s environment and take full advantage of K8s container orchestration.
- Automated deployment: bring up a complete TiDB cluster in one step
- Elastic scaling: scale TiDB/TiKV/PD nodes in and out online
- Self-healing: failed nodes are detected and recovered automatically
- Rolling upgrades: zero-downtime version upgrades
- Resource isolation: multi-tenant isolation via K8s namespaces
1.2 TiDB Operator Core Concepts
TiDB Operator is built on Kubernetes Custom Resource Definitions (CRDs). Its main custom resources are listed below (a quick way to inspect them follows the list):
- TidbCluster: declares the desired state of a TiDB cluster, including version, replica counts, and configuration
- TidbMonitor: declares the monitoring stack, deploying Prometheus and Grafana
- TidbInitializer: declares cluster initialization jobs, such as creating databases and users
- Backup/Restore: declare backup and restore jobs
- TidbNGMonitoring: declares the NGMonitoring component used by TiDB Dashboard
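Once the Operator and its CRDs are installed (see Part03), the available custom resources can be inspected with standard kubectl commands:
# kubectl api-resources --api-group=pingcap.com
# kubectl explain tidbcluster.spec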
1.3 TiDB on K8s Architecture
The deployment architecture of TiDB in K8s:
- PD Pods: run the PD service, which manages cluster metadata and scheduling
- TiKV Pods: run the TiKV storage nodes, on local or cloud storage
- TiDB Pods: run the stateless TiDB SQL layer, handling SQL requests
- TiFlash Pods: optional, run TiFlash columnar storage nodes
- Monitor Pod: runs Prometheus and Grafana
Part02-Production Planning and Recommendations
2.1 K8s Cluster Planning
Suggested K8s cluster sizing for production:
- Master nodes: 3 nodes, 8 cores / 16 GB RAM each
- Worker nodes: at least 3 nodes, 16 cores / 64 GB RAM or more
- Per-component sizing:
  - PD: 4 cores / 8 GB RAM
  - TiKV: 8 cores / 32 GB RAM + local SSD
  - TiDB: 8 cores / 16 GB RAM
Tip:
# K8s version requirements
- Kubernetes >= 1.19
- Versions 1.24-1.28 are recommended
# System requirements
- OS: CentOS 7.9+ / RHEL 8.x / Ubuntu 20.04+
- Kernel: 4.19+
- Docker or containerd runtime
2.2 Storage Planning
Storage options for TiDB on K8s:
1. Local storage (Local PV)
- Pros: best performance, lowest latency
- Cons: bound to one node; volumes cannot migrate across nodes
- Best for: TiKV storage in production
2. Network storage (NFS/Ceph/Rook)
- Pros: flexible; supports dynamic expansion
- Cons: slower than local storage
- Best for: PD storage and monitoring data
3. Cloud storage (EBS / SSD cloud disk)
- Pros: easy to manage, highly available
- Cons: higher cost; performance depends on the cloud provider
- Best for: cloud-platform deployments
# TiKV storage requirements (see the fio sketch after this list for how to verify)
- Type: local SSD or high-performance cloud disk
- IOPS: > 10000
- Throughput: > 500 MB/s
- Capacity: size from expected data volume and keep roughly 30% headroom
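To check a candidate disk against these targets before handing it to TiKV, a quick fio random-write test is a reasonable sketch (assumes fio is installed on the node; /tidb/fgdata is the sample data path used in section 3.1.2):
# fio --name=tikv-disk-check --directory=/tidb/fgdata \
#     --rw=randwrite --bs=4k --size=4G --iodepth=32 --ioengine=libaio \
#     --direct=1 --runtime=60 --numjobs=4 --group_reporting
Compare the reported IOPS and bandwidth with the targets above before labeling the node for TiKV.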
2.3 Network and Resource Planning
Key points for network and resource planning:
- Pod network: a /16 CIDR, to ensure enough IPs
- Service network: a /16 CIDR
- CNI plugin: Calico/Flannel/Cilium
- Network policy: restrict access with NetworkPolicy (see the sketch below)
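A minimal NetworkPolicy sketch for the SQL port; the namespace fgedudb and the component label match the sample cluster in Part03, while the client namespace app-ns and the file name are hypothetical:
# cat > tidb-networkpolicy.yaml << EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: tidb-allow-sql
  namespace: fgedudb
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: tidb
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: app-ns
    ports:
    - protocol: TCP
      port: 4000
EOF
# kubectl apply -f tidb-networkpolicy.yaml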
# Resource quota planning (example sketch below)
- Namespaces: a dedicated namespace per TiDB cluster
- ResourceQuota: cap CPU, memory, and storage usage
- LimitRange: set default resource requests and limits
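As an illustration, a per-cluster quota plus default container limits might look like the following sketch (all values are placeholders, not production sizing):
# cat > tidb-quota.yaml << EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: tidb-quota
  namespace: fgedudb
spec:
  hard:
    requests.cpu: "64"
    requests.memory: 256Gi
    requests.storage: 4Ti
---
apiVersion: v1
kind: LimitRange
metadata:
  name: tidb-defaults
  namespace: fgedudb
spec:
  limits:
  - type: Container
    default:
      cpu: "4"
      memory: 8Gi
    defaultRequest:
      cpu: "1"
      memory: 2Gi
EOF
# kubectl apply -f tidb-quota.yaml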
# Port planning
- TiDB: 4000 (SQL), 10080 (HTTP status)
- PD: 2379 (client), 2380 (peer)
- TiKV: 20160 (server), 20180 (status)
Part03-Production Implementation
3.1 Installing and Configuring the K8s Cluster
3.1.1 Install a K8s Cluster with kubeadm
# 1. Disable the firewall and SELinux
# systemctl stop firewalld && systemctl disable firewalld
# setenforce 0
# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# 2. Configure hostnames and /etc/hosts
# hostnamectl set-hostname k8s-master01
# cat >> /etc/hosts << EOF
192.168.1.101 k8s-master01
192.168.1.102 k8s-master02
192.168.1.103 k8s-master03
192.168.1.111 k8s-worker01
192.168.1.112 k8s-worker02
192.168.1.113 k8s-worker03
EOF
# 3. Install the containerd runtime
# yum install -y containerd.io
# systemctl enable containerd && systemctl start containerd
# 4. Install kubeadm, kubelet, and kubectl
# cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
EOF
# yum install -y kubelet kubeadm kubectl
# systemctl enable kubelet
# 5. Initialize the master node
# kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 \
# --apiserver-advertise-address=192.168.1.101
# 6. Configure kubectl
# mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config
# 7. Install the Calico network plugin
# kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/calico.yaml
# 8. Join the worker nodes to the cluster
# kubeadm join 192.168.1.101:6443 --token xxx --discovery-token-ca-cert-hash sha256:xxx
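As a final check, it is worth confirming that every node registered; all nodes should report Ready once the Calico plugin from step 7 is up:
# 9. Verify the nodes
# kubectl get nodes -o wide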
3.1.2 Configure the Local Storage Class
# cat > local-storage-class.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
EOF
# kubectl apply -f local-storage-class.yaml
# Create a Local PV
# cat > local-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-tikv-1
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /tidb/fgdata/tikv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-worker01
EOF
# kubectl apply -f local-pv.yaml
# Check the storage class
# kubectl get sc
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  10m
# Check the PV
# kubectl get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-tikv-1   500Gi      RWO            Retain           Available           local-storage            5m
3.2 Installing TiDB Operator
3.2.1 Install TiDB Operator
# 1. Create the tidb-admin namespace
# kubectl create namespace tidb-admin
namespace/tidb-admin created
# 2. Install the TiDB Operator CRDs
# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.0/manifests/crd.yaml
customresourcedefinition.apiextensions.k8s.io/tidbclusters.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/backups.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/restores.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/backupschedules.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbmonitors.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbinitializers.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbngmonitorings.pingcap.com created
# 3. Install TiDB Operator
# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.0/manifests/tidb-operator.yaml
deployment.apps/tidb-controller-manager created
service/tidb-controller-manager created
deployment.apps/tidb-scheduler created
service/tidb-scheduler created
# 4. Verify the installation
# kubectl get pods -n tidb-admin
NAME READY STATUS RESTARTS AGE
tidb-controller-manager-7d5f8b9c4-x2v9p 1/1 Running 0 2m
tidb-scheduler-5c8d7f6b9e-y4w5z 2/2 Running 0 2m
# 5. List the CRDs
# kubectl get crd | grep pingcap.com
tidbclusters.pingcap.com 2024-04-09T10:00:00Z
tidbmonitors.pingcap.com 2024-04-09T10:00:00Z
backups.pingcap.com 2024-04-09T10:00:00Z
restores.pingcap.com 2024-04-09T10:00:00Z
3.2.2 Configure TiDB Operator RBAC
# cat > tidb-operator-rbac.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tidb-operator
  namespace: tidb-admin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tidb-operator
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tidb-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tidb-operator
subjects:
- kind: ServiceAccount
  name: tidb-operator
  namespace: tidb-admin
EOF
# kubectl apply -f tidb-operator-rbac.yaml
# Verify the RBAC objects
# kubectl get clusterrolebinding tidb-operator
NAME            ROLE                        AGE
tidb-operator   ClusterRole/tidb-operator   5m
Note: the wildcard rules above are cluster-admin-equivalent; for production, scope the ClusterRole down to the resources TiDB Operator actually manages.
3.3 Deploying the TiDB Cluster
3.3.1 Create the TiDB Cluster
# kubectl create namespace fgedudb
namespace/fgedudb created
# Create the TiDB cluster manifest
# cat > tidb-cluster.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: fgedudb
  namespace: fgedudb
spec:
  version: "v7.5.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pd:
    baseImage: pingcap/pd
    replicas: 3
    requests:
      storage: "50Gi"
    config: |
      [log]
      level = "info"
    storageClassName: standard
  tikv:
    baseImage: pingcap/tikv
    replicas: 3
    requests:
      storage: "500Gi"
    config: |
      [rocksdb]
      max-open-files = 10240
      [raftdb]
      max-open-files = 10240
    storageClassName: local-storage
    nodeSelector:
      dedicated: tikv
    tolerations:
    - key: dedicated
      operator: Equal
      value: tikv
      effect: NoSchedule
  tidb:
    baseImage: pingcap/tidb
    replicas: 2
    service:
      type: NodePort
      externalTrafficPolicy: Local
      annotations:
        "service.beta.kubernetes.io/aws-load-balancer-type": "nlb"
    config: |
      [log]
      level = "info"
      [performance]
      max-procs = 0
    nodeSelector:
      dedicated: tidb
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
    - resources:
        requests:
          storage: "500Gi"
      storageClassName: local-storage
    nodeSelector:
      dedicated: tiflash
EOF
# kubectl apply -f tidb-cluster.yaml
tidbcluster.pingcap.com/fgedudb created
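Note: the spec above schedules TiKV, TiDB, and TiFlash onto dedicated nodes via nodeSelector and tolerations, so those Pods will stay Pending until matching node labels and taints exist. A sketch using the sample worker hostnames from 3.1.1 (adjust to your own topology):
# kubectl label node k8s-worker01 dedicated=tikv
# kubectl taint node k8s-worker01 dedicated=tikv:NoSchedule
# kubectl label node k8s-worker02 dedicated=tidb
# kubectl label node k8s-worker03 dedicated=tiflash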
# Check the cluster creation status
# kubectl get tidbcluster -n fgedudb
NAME      PD    TIKV   TIDB   TIFLASH   AGE
fgedudb   3/3   3/3    2/2    2/2       5m
# Check Pod status
# kubectl get pods -n fgedudb
NAME READY STATUS RESTARTS AGE
fgedudb-discovery-5f8d7b9c4-x2v9p 1/1 Running 0 3m
fgedudb-pd-0 1/1 Running 0 3m
fgedudb-pd-1 1/1 Running 0 3m
fgedudb-pd-2 1/1 Running 0 3m
fgedudb-tikv-0 1/1 Running 0 2m
fgedudb-tikv-1 1/1 Running 0 2m
fgedudb-tikv-2 1/1 Running 0 2m
fgedudb-tidb-0 2/2 Running 0 1m
fgedudb-tidb-1 2/2 Running 0 1m
fgedudb-tiflash-0 1/1 Running 0 1m
fgedudb-tiflash-1 1/1 Running 0 1m
3.3.2 Configure Access to the TiDB Cluster
# kubectl get svc -n fgedudb
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                          AGE
fgedudb-discovery   ClusterIP   10.96.123.45   <none>        10261/TCP,10262/TCP              10m
fgedudb-pd          ClusterIP   10.96.123.46   <none>        2379/TCP                         10m
fgedudb-pd-peer     ClusterIP   None           <none>        2380/TCP                         10m
fgedudb-tidb        NodePort    10.96.123.47   <none>        4000:30036/TCP,10080:30037/TCP   10m
fgedudb-tikv-peer   ClusterIP   None           <none>        20160/TCP                        10m
fgedudb-tiflash     ClusterIP   10.96.123.48   <none>        3930/TCP,20170/TCP               10m
# Connect to TiDB via the NodePort
# mysql -h 192.168.1.111 -P 30036 -u root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.25-TiDB-v7.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> select tidb_version();
+----------------------------------------------------------------------------------------------+
| tidb_version()                                                                               |
+----------------------------------------------------------------------------------------------+
| Release Version: v7.5.0
Edition: Community
Git Commit Hash: 01fe22c99c0c6f7c1f2c3d4e5f6a7b8c9d0e1f2a
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2024-01-15 10:00:00
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
# Create a test database
mysql> create database fgedudb01;
Query OK, 0 rows affected (0.12 sec)
mysql> use fgedudb01;
Database changed
mysql> create table fgedu_users (
-> id int primary key auto_increment,
-> username varchar(50) not null,
-> email varchar(100),
-> created_at timestamp default current_timestamp
-> );
Query OK, 0 rows affected (0.15 sec)
mysql> insert into fgedu_users (username, email) values
-> ('fgedu01', 'fgedu01@fgedu.net.cn'),
-> ('fgedu02', 'fgedu02@fgedu.net.cn'),
-> ('fgedu03', 'fgedu03@fgedu.net.cn');
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> select * from fgedu_users;
+----+----------+----------------------+---------------------+
| id | username | email                | created_at          |
+----+----------+----------------------+---------------------+
|  1 | fgedu01  | fgedu01@fgedu.net.cn | 2024-04-09 10:30:00 |
|  2 | fgedu02  | fgedu02@fgedu.net.cn | 2024-04-09 10:30:00 |
|  3 | fgedu03  | fgedu03@fgedu.net.cn | 2024-04-09 10:30:00 |
+----+----------+----------------------+---------------------+
3 rows in set (0.01 sec)
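If the NodePort is not reachable from your workstation, kubectl port-forward provides a quick local test path without changing the Service:
# kubectl port-forward -n fgedudb svc/fgedudb-tidb 4000:4000
# mysql -h 127.0.0.1 -P 4000 -u root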
Part04-Production Cases and Hands-on Practice
4.1 TiDB Cluster Management Operations
4.1.1 Scale the TiDB Cluster In and Out
# Scale out TiKV (from 3 to 5 nodes)
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tikv":{"replicas":5}}}'
tidbcluster.pingcap.com/fgedudb patched
# Watch the scale-out progress
# kubectl get pods -n fgedudb -w
NAME READY STATUS RESTARTS AGE
fgedudb-tikv-0 1/1 Running 0 30m
fgedudb-tikv-1 1/1 Running 0 30m
fgedudb-tikv-2 1/1 Running 0 30m
fgedudb-tikv-3 0/1 Pending 0 10s
fgedudb-tikv-3 0/1 Init:0/1 0 30s
fgedudb-tikv-3 0/1 Running 0 2m
fgedudb-tikv-3 1/1 Running 0 3m
fgedudb-tikv-4 0/1 Pending 0 0s
fgedudb-tikv-4 1/1 Running 0 5m
# Scale out TiDB (from 2 to 3 nodes)
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tidb":{"replicas":3}}}'
tidbcluster.pingcap.com/fgedudb patched
# Check the TiDB Pods
# kubectl get pods -n fgedudb -l app.kubernetes.io/component=tidb
NAME READY STATUS RESTARTS AGE
fgedudb-tidb-0 2/2 Running 0 35m
fgedudb-tidb-1 2/2 Running 0 35m
fgedudb-tidb-2 2/2 Running 0 2m
# Scale in (proceed with caution; see the store check below)
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tikv":{"replicas":3}}}'
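Before and after a TiKV scale-in, it is worth confirming from PD that the removed stores drain cleanly; they should move through Offline to Tombstone before their Pods disappear. A check using pd-ctl inside a PD Pod, in the same style as section 5.2:
# kubectl exec -it fgedudb-pd-0 -n fgedudb -- /pd-ctl store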
4.1.2 Upgrade the TiDB Cluster
# Check the current cluster version
# kubectl get tidbcluster fgedudb -n fgedudb -o jsonpath='{.spec.version}'
v7.5.0
# Upgrade the cluster to v7.5.1
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"version":"v7.5.1"}}'
tidbcluster.pingcap.com/fgedudb patched
# Watch the rolling upgrade
# kubectl get pods -n fgedudb -w
NAME READY STATUS RESTARTS AGE
fgedudb-pd-0 1/1 Running 0 40m
fgedudb-pd-1 1/1 Running 0 40m
fgedudb-pd-2 1/1 Terminating 0 40m
fgedudb-pd-2 0/1 Terminating 0 40m
fgedudb-pd-2 0/1 ContainerCreating 0 0s
fgedudb-pd-2 1/1 Running 0 2m
fgedudb-tikv-0 1/1 Terminating 0 40m
fgedudb-tikv-0 0/1 Terminating 0 40m
fgedudb-tikv-0 0/1 ContainerCreating 0 0s
fgedudb-tikv-0 1/1 Running 0 3m
# Verify the version after the upgrade completes
# mysql -h 192.168.1.111 -P 30036 -u root -e "select tidb_version();"
+----------------------------------------------------------------------------------------------+
| tidb_version()                                                                               |
+----------------------------------------------------------------------------------------------+
| Release Version: v7.5.1
Edition: Community
Git Commit Hash: 12ab34cd56ef78ab90cd12ef34ab56cd78ef90ab
Git Branch: heads/refs/tags/v7.5.1
UTC Build Time: 2024-02-15 10:00:00
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+----------------------------------------------------------------------------------------------+
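The image actually running in each Pod can also be checked directly, which is a quicker sanity check than connecting over SQL:
# kubectl get pods -n fgedudb \
#   -o custom-columns=NAME:.metadata.name,IMAGE:.spec.containers[0].image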
4.2 Monitoring and Alerting
4.2.1 Deploy the TiDB Monitor
# cat > tidb-monitor.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: fgedudb-monitor
  namespace: fgedudb
spec:
  clusters:
  - name: fgedudb
    namespace: fgedudb
  prometheus:
    baseImage: prom/prometheus
    version: v2.49.0
    service:
      type: NodePort
      portName: http-prometheus
  grafana:
    baseImage: grafana/grafana
    version: 10.2.3
    service:
      type: NodePort
      portName: http-grafana
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v7.5.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.70.0
  imagePullPolicy: IfNotPresent
EOF
# kubectl apply -f tidb-monitor.yaml
tidbmonitor.pingcap.com/fgedudb-monitor created
# Check the monitor Pod
# kubectl get pods -n fgedudb -l app.kubernetes.io/component=monitor
NAME                        READY   STATUS    RESTARTS   AGE
fgedudb-monitor-monitor-0   3/3     Running   0          3m
# Check the monitor services
# kubectl get svc -n fgedudb | grep monitor
fgedudb-monitor-grafana      NodePort   10.96.123.50   <none>   3000:30038/TCP   3m
fgedudb-monitor-prometheus   NodePort   10.96.123.51   <none>   9090:30039/TCP   3m
# Access Grafana
# http://192.168.1.111:30038
# Default credentials: admin/admin
4.2.2 Configure Alert Rules
# cat > tidb-alert-rules.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: tidb-alert-rules
  namespace: fgedudb
data:
  tidb-alert-rules.yml: |
    groups:
    - name: tidb-alerts
      rules:
      - alert: TiDBDown
        expr: up{job="fgedudb-tidb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TiDB instance down"
          description: "TiDB instance {{ \$labels.instance }} is down"
      - alert: TiKVDown
        expr: up{job="fgedudb-tikv"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TiKV instance down"
          description: "TiKV instance {{ \$labels.instance }} is down"
      - alert: PDDown
        expr: up{job="fgedudb-pd"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PD instance down"
          description: "PD instance {{ \$labels.instance }} is down"
      - alert: TiDBHighQPS
        expr: rate(tidb_executor_statement_total[1m]) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TiDB high QPS"
          description: "TiDB QPS is higher than 10000"
      - alert: TiKVHighCPU
        expr: rate(process_cpu_seconds_total{job="fgedudb-tikv"}[1m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TiKV high CPU usage"
          description: "TiKV CPU usage is above 80%"
EOF
# kubectl apply -f tidb-alert-rules.yaml
# Configure Alertmanager
# cat > alertmanager-config.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: fgedudb
data:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'smtp.fgedu.net.cn:587'
      smtp_from: 'alert@fgedu.net.cn'
      smtp_auth_username: 'alert@fgedu.net.cn'
      smtp_auth_password: 'your-password'
    route:
      receiver: 'default-receiver'
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'dba@fgedu.net.cn'
        subject: 'TiDB Alert: {{ .GroupLabels.alertname }}'
        body: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
EOF
# kubectl apply -f alertmanager-config.yaml
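For Prometheus to deliver these alerts, the TidbMonitor must be pointed at Alertmanager. The TidbMonitor CRD exposes an alertmanagerURL field for this; a sketch assuming Alertmanager is exposed by a Service named alertmanager on port 9093 in the same namespace:
# Add to the TidbMonitor spec from 4.2.1, then re-apply tidb-monitor.yaml:
spec:
  alertmanagerURL: "alertmanager.fgedudb:9093"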
4.3 Backup and Restore
4.3.1 Configure Scheduled Backups
# Create the backup Secret (holds the S3 access keys; the name must match secretName below)
# kubectl create secret generic fgedudb-backup-secret -n fgedudb \
#   --from-literal=access_key=your-access-key \
#   --from-literal=secret_key=your-secret-key
# Create the scheduled backup job
# cat > tidb-backup-schedule.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: fgedudb-backup-schedule
  namespace: fgedudb
spec:
  maxBackups: 7
  schedule: "0 2 * * *"
  backupTemplate:
    cluster:
      name: fgedudb
      namespace: fgedudb
    from:
      host: fgedudb-tidb
      port: 4000
      user: root
      secretName: fgedudb-backup-secret
    s3:
      provider: aws
      region: cn-north-1
      bucket: fgedudb-backup
      prefix: tidb-backup
    storageClassName: standard
    storageSize: 100Gi
EOF
# kubectl apply -f tidb-backup-schedule.yaml
# Check the backup schedule
# kubectl get backupschedule -n fgedudb
NAME SCHEDULE MAXBACKUPS LASTBACKUP AGE
fgedudb-backup-schedule 0 2 * * * 7 10h 1d
# List the backup history
# kubectl get backups -n fgedudb
NAME STATUS BACKUPPATH SIZE AGE
fgedudb-backup-20240408 Complete s3://fgedudb-backup/tidb/ 50Gi 10h
fgedudb-backup-20240407 Complete s3://fgedudb-backup/tidb/ 48Gi 1d
fgedudb-backup-20240406 Complete s3://fgedudb-backup/tidb/ 47Gi 2d
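A one-off backup can also be triggered outside the schedule with a standalone Backup resource that reuses the same fields as the backupTemplate above (the name and prefix here are arbitrary):
# cat > tidb-backup-adhoc.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: fgedudb-backup-manual
  namespace: fgedudb
spec:
  from:
    host: fgedudb-tidb
    port: 4000
    user: root
    secretName: fgedudb-backup-secret
  s3:
    provider: aws
    region: cn-north-1
    bucket: fgedudb-backup
    prefix: tidb-backup-manual
  storageClassName: standard
  storageSize: 100Gi
EOF
# kubectl apply -f tidb-backup-adhoc.yaml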
4.3.2 Run a Restore
# cat > tidb-restore.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: fgedudb-restore
  namespace: fgedudb
spec:
  cluster:
    name: fgedudb
    namespace: fgedudb
  to:
    host: fgedudb-tidb
    port: 4000
    user: root
    secretName: fgedudb-backup-secret
  s3:
    provider: aws
    region: cn-north-1
    bucket: fgedudb-backup
    path: tidb-backup/fgedudb-backup-20240408
  storageClassName: standard
  storageSize: 100Gi
EOF
# kubectl apply -f tidb-restore.yaml
# Watch the restore progress
# kubectl get restore -n fgedudb -w
NAME              STATUS     PROGRESS   AGE
fgedudb-restore   Running    50%        5m
fgedudb-restore   Complete   100%       15m
# Inspect the restore details
# kubectl describe restore fgedudb-restore -n fgedudb
Name:         fgedudb-restore
Namespace:    fgedudb
Labels:       <none>
Annotations:  <none>
API Version: pingcap.com/v1alpha1
Kind: Restore
Metadata:
Creation Timestamp: 2024-04-09T12:00:00Z
Spec:
Cluster:
Name: fgedudb
Namespace: fgedudb
S3:
Bucket: fgedudb-backup
Path: tidb-backup/fgedudb-backup-20240408
Provider: aws
Region: cn-north-1
Status:
Commit Ts: 4489234567890123456
Conditions:
Last Transition Time: 2024-04-09T12:15:00Z
Message: Restore completed successfully
Reason: RestoreComplete
Status: True
Type: Complete
Phase: Complete
Progress: 100%
Part05-Experience Summary and Recommendations
5.1 K8s Deployment Best Practices
Best practices for deploying TiDB on K8s:
- Storage: TiKV must run on local SSDs; make sure IO performance meets the targets in 2.2
- Node planning: give TiKV dedicated nodes; do not co-locate other workloads
- Resource limits: set sensible resource requests and limits for every component
- High availability: at least 3 PD, 3 TiKV, and 2 TiDB replicas
- Monitoring and alerting: a complete monitoring and alerting stack is mandatory
- Backup strategy: schedule backups and regularly verify that they restore
5.2 Troubleshooting Common Issues
# Issue 1: TiKV Pod stuck in Pending
# Check volume binding and mounts
# kubectl describe pod fgedudb-tikv-0 -n fgedudb
# Check disk space on the node
# kubectl exec -it fgedudb-tikv-0 -n fgedudb -- df -h
# Issue 2: TiDB connection timeouts
# Check the Service configuration
# kubectl get svc fgedudb-tidb -n fgedudb
# Check the network policies
# kubectl get networkpolicy -n fgedudb
# Issue 3: PD cluster problems
# Check the PD logs
# kubectl logs fgedudb-pd-0 -n fgedudb
# Check PD membership
# kubectl exec -it fgedudb-pd-0 -n fgedudb -- /pd-ctl member
# Issue 4: Cluster upgrade failures
# Check the Operator logs
# kubectl logs -n tidb-admin deployment/tidb-controller-manager
# Roll back to the previous version
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"version":"v7.5.0"}}'
5.3 Operations and Management Recommendations
Recommendations for operating TiDB on K8s:
- Routine inspection: regularly check Pod status, resource usage, and storage capacity
- Log management: centralize logs with a collection stack (EFK/Loki)
- Capacity planning: monitor storage utilization and plan expansion ahead of demand
- Security hardening: configure NetworkPolicy, RBAC, and Secret encryption
- Disaster recovery: rehearse the backup and restore procedure regularly
- Version management: track TiDB and Operator releases and plan upgrades
