
TiDB Tutorial FG101: Deploying TiDB on Kubernetes and Using TiDB Operator in Practice

This document walks through deploying TiDB on Kubernetes and working with TiDB Operator, covering K8s cluster planning, TiDB Operator installation and configuration, TiDB cluster deployment and management, and monitoring/alerting setup. This Fengge tutorial draws on the official TiDB on Kubernetes documentation and the TiDB Operator user guide, and is aimed at DBAs and operations engineers deploying and managing TiDB on K8s in production.

Part01 - Basic Concepts and Theory

1.1 Overview of TiDB on Kubernetes

TiDB Operator is an automated operations system for TiDB clusters on Kubernetes. It provides full lifecycle management: deployment, upgrades, scaling, configuration changes, and backup/restore. With TiDB Operator you can quickly deploy and manage TiDB clusters in a K8s environment and take full advantage of K8s container orchestration.

Advantages of deploying TiDB on K8s:

  • Automated deployment: spin up a complete TiDB cluster from a single manifest
  • Elastic scaling: scale TiDB/TiKV/PD nodes in and out online
  • Self-healing: failed nodes are detected and recovered automatically
  • Rolling upgrades: zero-downtime version upgrades
  • Resource isolation: multi-tenant isolation via K8s namespaces

1.2 TiDB Operator Core Concepts

TiDB Operator is built on Kubernetes Custom Resource Definitions (CRDs). Its main custom resources are:

  • TidbCluster: declares the desired state of a TiDB cluster, including version, replica counts, and configuration
  • TidbMonitor: declares the monitoring stack, deploying Prometheus and Grafana
  • TidbInitializer: declares a cluster initialization job, e.g. creating databases and users
  • Backup/Restore: declare backup and restore jobs
  • TidbNGMonitoring: declares the NG Monitoring component used by TiDB Dashboard
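
To make the declarative model concrete, here is a minimal TidbCluster manifest. This is an illustrative sketch, not a production spec: the name basic, the namespace, and the sizes are placeholders, and a fuller example appears in section 3.3.1.

apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: basic
  namespace: tidb-test
spec:
  version: "v7.5.0"
  pd:
    baseImage: pingcap/pd
    replicas: 3
    requests:
      storage: "10Gi"      # PersistentVolume request per PD member
  tikv:
    baseImage: pingcap/tikv
    replicas: 3
    requests:
      storage: "100Gi"     # PersistentVolume request per TiKV store
  tidb:
    baseImage: pingcap/tidb
    replicas: 2

TiDB Operator watches this object and continuously drives the actual Pods toward the declared state, which is why upgrades and scaling reduce to editing the spec.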

1.3 TiDB Architecture on Kubernetes

The TiDB deployment architecture on K8s:

  • PD Pods: run the PD service, which manages cluster metadata and scheduling
  • TiKV Pods: run TiKV storage nodes, backed by local or cloud storage
  • TiDB Pods: run TiDB compute nodes, which handle SQL requests
  • TiFlash Pods: optional; run TiFlash columnar storage nodes
  • Monitor Pods: run Prometheus and Grafana for monitoring
Fengge's tip: storage is the key decision when deploying TiDB on K8s. In production, use local SSDs or high-performance cloud storage so that TiKV gets the IO performance it needs.

Part02 - Production Planning and Recommendations

2.1 K8s Cluster Planning

Recommended K8s cluster layout for production:

# K8s cluster node planning
- Master nodes: 3 nodes, 8 cores / 16 GB RAM each
- Worker nodes: at least 3 nodes, 16 cores / 64 GB RAM or more
- Suggested sizing per component:
  - PD: 4 cores / 8 GB RAM
  - TiKV: 8 cores / 32 GB RAM + local SSD
  - TiDB: 8 cores / 16 GB RAM

# K8s version requirements
- Kubernetes >= 1.19
- Versions 1.24-1.28 recommended

# System requirements
- OS: CentOS 7.9+ / RHEL 8.x / Ubuntu 20.04+
- Kernel: 4.19+
- Container runtime: Docker or containerd

2.2 Storage Planning

Storage options for TiDB on K8s:

# Storage option comparison
1. Local storage (Local PV)
- Pros: best performance, lowest latency
- Cons: bound to a node; volumes cannot move across nodes
- Use for: TiKV storage in production

2. Network storage (NFS/Ceph/Rook)
- Pros: flexible, supports dynamic expansion
- Cons: slower than local storage
- Use for: PD storage, monitoring data

3. Cloud storage (EBS / SSD cloud disks)
- Pros: easy to manage, highly available
- Cons: higher cost; performance depends on the cloud vendor
- Use for: cloud platform deployments

# TiKV storage requirements
- Type: local SSD or high-performance cloud disk
- IOPS: > 10000
- Throughput: > 500 MB/s
- Capacity: size to your data volume, with roughly 30% headroom
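
To sanity-check a candidate TiKV disk against the IOPS and throughput figures above, a generic fio run can be used. fio is not part of the TiDB toolchain, and the job parameters below are only illustrative:

# Random-write IOPS on the intended TiKV data path
# fio --name=tikv-iops --directory=/tidb/fgdata/tikv --rw=randwrite \
#     --bs=4k --size=4G --iodepth=32 --ioengine=libaio --direct=1 \
#     --runtime=60 --time_based

# Sequential-write throughput
# fio --name=tikv-bw --directory=/tidb/fgdata/tikv --rw=write \
#     --bs=1M --size=4G --iodepth=8 --ioengine=libaio --direct=1 \
#     --runtime=60 --time_based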

2.3 Network and Resource Planning

Key points for network and resource planning:

# Network planning
- Pod network: a /16 range, so there are enough IPs
- Service network: a /16 range
- CNI plugin: Calico/Flannel/Cilium
- Network policy: use NetworkPolicy to restrict access (see the sketch below)
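
As a sketch of the NetworkPolicy item above, the policy below only admits SQL traffic to TiDB pods from pods labeled role=tidb-client. The namespace and client label are assumptions matching the cluster deployed in Part03, and a real deployment would also need rules admitting Operator, monitoring, and inter-component traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-app-to-tidb
  namespace: fgedudb
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/component: tidb   # applies to TiDB pods only
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: tidb-client               # hypothetical client label
    ports:
    - protocol: TCP
      port: 4000                          # TiDB SQL port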

# Resource quota planning
- Namespaces: one dedicated namespace per TiDB cluster
- ResourceQuota: cap CPU, memory, and storage usage
- LimitRange: set default resource limits (see the sketch below)
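
A minimal sketch of those quota objects, with illustrative numbers sized for a small cluster:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: fgedudb-quota
  namespace: fgedudb
spec:
  hard:
    requests.cpu: "64"        # total CPU requested in the namespace
    requests.memory: 256Gi
    requests.storage: 4Ti     # total PVC storage
---
apiVersion: v1
kind: LimitRange
metadata:
  name: fgedudb-limits
  namespace: fgedudb
spec:
  limits:
  - type: Container
    default:              # limits applied when a container sets none
      cpu: "4"
      memory: 8Gi
    defaultRequest:       # requests applied when a container sets none
      cpu: "1"
      memory: 2Gi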

# Port planning
- TiDB: 4000 (SQL), 10080 (HTTP status)
- PD: 2379 (client), 2380 (peer)
- TiKV: 20160 (server), 20180 (status)

Production advice: run at least 3 worker nodes; put TiKV on local SSDs, while PD and TiDB can use network storage. Configure anti-affinity (see the sketch below) so TiKV Pods are spread evenly across nodes.
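
A sketch of that spreading rule as it would appear under spec.tikv in the TidbCluster manifest of section 3.3.1; the component label is the one TiDB Operator applies to its pods:

  tikv:
    affinity:
      podAntiAffinity:
        requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app.kubernetes.io/component: tikv
          topologyKey: kubernetes.io/hostname   # at most one TiKV pod per node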

Part03 - Production Implementation

3.1 K8s Cluster Installation and Configuration

3.1.1 Installing a K8s Cluster with kubeadm

# 1. Prepare the environment (run on all nodes)
# Disable the firewall and SELinux
# systemctl stop firewalld && systemctl disable firewalld
# setenforce 0
# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
# Disable swap (required by kubelet)
# swapoff -a && sed -i '/ swap / s/^/#/' /etc/fstab

# 2. Set hostnames and hosts entries
# hostnamectl set-hostname k8s-master01
# cat >> /etc/hosts << EOF
192.168.1.101 k8s-master01
192.168.1.102 k8s-master02
192.168.1.103 k8s-master03
192.168.1.111 k8s-worker01
192.168.1.112 k8s-worker02
192.168.1.113 k8s-worker03
EOF

# 3. Install the containerd runtime
# yum install -y containerd.io
# systemctl enable containerd && systemctl start containerd

# 4. Install kubeadm, kubelet, and kubectl
# cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.28/rpm/repodata/repomd.xml.key
EOF
# yum install -y kubelet kubeadm kubectl
# systemctl enable kubelet

# 5. Initialize the first master node
# kubeadm init --pod-network-cidr=10.244.0.0/16 --service-cidr=10.96.0.0/12 \
#   --apiserver-advertise-address=192.168.1.101

# 6. Configure kubectl
# mkdir -p $HOME/.kube
# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
# chown $(id -u):$(id -g) $HOME/.kube/config

# 7. Install the Calico network plugin
# kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.0/manifests/calico.yaml

# 8. Join the worker nodes to the cluster
# kubeadm join 192.168.1.101:6443 --token xxx --discovery-token-ca-cert-hash sha256:xxx
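
The TidbCluster manifest in section 3.3.1 schedules components with nodeSelector (dedicated=tikv, dedicated=tidb, dedicated=tiflash) and a toleration on the TiKV nodes. Assuming that layout, the workers must be labeled, and the TiKV workers tainted, along these lines (the node-to-role mapping here is illustrative; with only three workers each role would normally get its own node pool):

# Label workers for each component role
# kubectl label node k8s-worker01 dedicated=tikv
# kubectl label node k8s-worker02 dedicated=tidb
# kubectl label node k8s-worker03 dedicated=tiflash

# Taint TiKV workers so only pods tolerating dedicated=tikv are scheduled there
# kubectl taint node k8s-worker01 dedicated=tikv:NoSchedule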

3.1.2 Configuring a Local StorageClass

# Create a Local PV StorageClass
# cat > local-storage-class.yaml << EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
EOF
# kubectl apply -f local-storage-class.yaml

# Create a Local PV
# cat > local-pv.yaml << EOF
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-tikv-1
spec:
  capacity:
    storage: 500Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /tidb/fgdata/tikv
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - k8s-worker01
EOF
# kubectl apply -f local-pv.yaml

# Check the StorageClass
# kubectl get sc
NAME            PROVISIONER                    RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
local-storage   kubernetes.io/no-provisioner   Delete          WaitForFirstConsumer   false                  10m

# Check the PV
# kubectl get pv
NAME              CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS    REASON   AGE
local-pv-tikv-1   500Gi      RWO            Retain           Available           local-storage            5m

3.2 Installing TiDB Operator

3.2.1 Installing TiDB Operator

# 1. Create the tidb-admin namespace
# kubectl create namespace tidb-admin

namespace/tidb-admin created

# 2. Install the TiDB Operator CRDs
# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.0/manifests/crd.yaml

customresourcedefinition.apiextensions.k8s.io/tidbclusters.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/backups.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/restores.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/backupschedules.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbmonitors.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbinitializers.pingcap.com created
customresourcedefinition.apiextensions.k8s.io/tidbngmonitorings.pingcap.com created

# 3. Install TiDB Operator
# kubectl apply -f https://raw.githubusercontent.com/pingcap/tidb-operator/v1.5.0/manifests/tidb-operator.yaml

deployment.apps/tidb-controller-manager created
service/tidb-controller-manager created
deployment.apps/tidb-scheduler created
service/tidb-scheduler created

# 4. Verify the installation
# kubectl get pods -n tidb-admin
NAME                                      READY   STATUS    RESTARTS   AGE
tidb-controller-manager-7d5f8b9c4-x2v9p   1/1     Running   0          2m
tidb-scheduler-5c8d7f6b9e-y4w5z           2/2     Running   0          2m

# 5. List the installed CRDs
# kubectl get crd | grep pingcap.com
tidbclusters.pingcap.com   2024-04-09T10:00:00Z
tidbmonitors.pingcap.com   2024-04-09T10:00:00Z
backups.pingcap.com        2024-04-09T10:00:00Z
restores.pingcap.com       2024-04-09T10:00:00Z

3.2.2 Configuring TiDB Operator Permissions

# Create a ServiceAccount and RBAC rules
# cat > tidb-operator-rbac.yaml << EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tidb-operator
  namespace: tidb-admin
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: tidb-operator
rules:
- apiGroups: ["*"]
  resources: ["*"]
  verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tidb-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: tidb-operator
subjects:
- kind: ServiceAccount
  name: tidb-operator
  namespace: tidb-admin
EOF
# kubectl apply -f tidb-operator-rbac.yaml

# Verify the RBAC configuration
# kubectl get clusterrolebinding tidb-operator
NAME            ROLE                        AGE
tidb-operator   ClusterRole/tidb-operator   5m
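
Note that this ClusterRole grants wildcard permissions, far more than the Operator needs; the upstream tidb-operator.yaml manifest already ships a scoped ServiceAccount and RBAC rules, so treat the above as a simplified example. Either way, a binding can be spot-checked with kubectl auth can-i:

# Confirm the ServiceAccount can manage StatefulSets (used for PD/TiKV/TiDB pods)
# kubectl auth can-i create statefulsets \
#   --as=system:serviceaccount:tidb-admin:tidb-operator -n fgedudb
yes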

3.3 Deploying a TiDB Cluster

3.3.1 Creating the TiDB Cluster

# Create the namespace
# kubectl create namespace fgedudb

namespace/fgedudb created

# Create the TiDB cluster manifest
# cat > tidb-cluster.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbCluster
metadata:
  name: fgedudb
  namespace: fgedudb
spec:
  version: "v7.5.0"
  timezone: UTC
  configUpdateStrategy: RollingUpdate
  helper:
    image: busybox:1.34.1
  pd:
    baseImage: pingcap/pd
    replicas: 3
    requests:
      storage: "50Gi"
    config: |
      [log]
      level = "info"
    storageClassName: standard
  tikv:
    baseImage: pingcap/tikv
    replicas: 3
    requests:
      storage: "500Gi"
    config: |
      [rocksdb]
      max-open-files = 10240
      [raftdb]
      max-open-files = 10240
    storageClassName: local-storage
    nodeSelector:
      dedicated: tikv
    tolerations:
    - key: dedicated
      operator: Equal
      value: tikv
      effect: NoSchedule
  tidb:
    baseImage: pingcap/tidb
    replicas: 2
    service:
      type: NodePort
      externalTrafficPolicy: Local
      annotations:
        service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    config: |
      [log]
      level = "info"
      [performance]
      max-procs = 0
    nodeSelector:
      dedicated: tidb
  tiflash:
    baseImage: pingcap/tiflash
    replicas: 2
    storageClaims:
    - resources:
        requests:
          storage: "500Gi"
      storageClassName: local-storage
    nodeSelector:
      dedicated: tiflash
EOF
# kubectl apply -f tidb-cluster.yaml
tidbcluster.pingcap.com/fgedudb created

# Check cluster status
# kubectl get tidbcluster -n fgedudb
NAME      PD    TIKV   TIDB   AGE
fgedudb   3/3   3/3    2/2    5m

# Check pod status
# kubectl get pods -n fgedudb
NAME                                READY   STATUS    RESTARTS   AGE
fgedudb-discovery-5f8d7b9c4-x2v9p   1/1     Running   0          3m
fgedudb-pd-0                        1/1     Running   0          3m
fgedudb-pd-1                        1/1     Running   0          3m
fgedudb-pd-2                        1/1     Running   0          3m
fgedudb-tikv-0                      1/1     Running   0          2m
fgedudb-tikv-1                      1/1     Running   0          2m
fgedudb-tikv-2                      1/1     Running   0          2m
fgedudb-tidb-0                      2/2     Running   0          1m
fgedudb-tidb-1                      2/2     Running   0          1m
fgedudb-tiflash-0                   1/1     Running   0          1m
fgedudb-tiflash-1                   1/1     Running   0          1m

3.3.2 Configuring Access to the TiDB Cluster

# List the TiDB services
# kubectl get svc -n fgedudb
NAME                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                          AGE
fgedudb-discovery   ClusterIP   10.96.123.45   <none>        10261/TCP,10262/TCP              10m
fgedudb-pd          ClusterIP   10.96.123.46   <none>        2379/TCP                         10m
fgedudb-pd-peer     ClusterIP   None           <none>        2380/TCP                         10m
fgedudb-tidb        NodePort    10.96.123.47   <none>        4000:30036/TCP,10080:30037/TCP   8m
fgedudb-tikv-peer   ClusterIP   None           <none>        20160/TCP                        9m
fgedudb-tiflash     ClusterIP   10.96.123.48   <none>        9000/TCP,8123/TCP,3930/TCP       7m

# Connect to TiDB through the NodePort
# mysql -h 192.168.1.111 -P 30036 -u root

Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.7.25-TiDB-v7.5.0 TiDB Server (Apache License 2.0) Community Edition, MySQL 5.7 compatible

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> select tidb_version();
+------------------------------------------------------------+
| tidb_version()                                             |
+------------------------------------------------------------+
| Release Version: v7.5.0
Edition: Community
Git Commit Hash: 01fe22c99c0c6f7c1f2c3d4e5f6a7b8c9d0e1f2a
Git Branch: heads/refs/tags/v7.5.0
UTC Build Time: 2024-01-15 10:00:00
GoVersion: go1.21.5
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+------------------------------------------------------------+
1 row in set (0.01 sec)

# Create a test database
mysql> create database fgedudb01;
Query OK, 0 rows affected (0.12 sec)

mysql> use fgedudb01;
Database changed

mysql> create table fgedu_users (
-> id int primary key auto_increment,
-> username varchar(50) not null,
-> email varchar(100),
-> created_at timestamp default current_timestamp
-> );
Query OK, 0 rows affected (0.15 sec)

mysql> insert into fgedu_users (username, email) values
    -> ('fgedu01', 'fgedu01@fgedu.net.cn'),
    -> ('fgedu02', 'fgedu02@fgedu.net.cn'),
    -> ('fgedu03', 'fgedu03@fgedu.net.cn');
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0

mysql> select * from fgedu_users;
+----+----------+----------------------+---------------------+
| id | username | email                | created_at          |
+----+----------+----------------------+---------------------+
|  1 | fgedu01  | fgedu01@fgedu.net.cn | 2024-04-09 10:30:00 |
|  2 | fgedu02  | fgedu02@fgedu.net.cn | 2024-04-09 10:30:00 |
|  3 | fgedu03  | fgedu03@fgedu.net.cn | 2024-04-09 10:30:00 |
+----+----------+----------------------+---------------------+
3 rows in set (0.01 sec)
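
Besides the NodePort, a convenient way to reach the cluster from an admin workstation (for testing, not for production traffic) is kubectl port-forward against the TiDB service:

# Forward local port 4000 to the fgedudb-tidb service
# kubectl port-forward -n fgedudb svc/fgedudb-tidb 4000:4000 &
# mysql -h 127.0.0.1 -P 4000 -u root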

Fengge's tip: once the cluster is up, configure monitoring and a backup policy right away. Managing TiDB on K8s is convenient, but you still need solid command of basic K8s operations and troubleshooting.

Part04 - Production Cases and Hands-on Practice

4.1 TiDB Cluster Management

4.1.1 Scaling the TiDB Cluster

# Scale TiKV out from 3 to 5 nodes
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tikv":{"replicas":5}}}'

tidbcluster.pingcap.com/fgedudb patched

# Watch the scale-out progress
# kubectl get pods -n fgedudb -w
NAME             READY   STATUS     RESTARTS   AGE
fgedudb-tikv-0   1/1     Running    0          30m
fgedudb-tikv-1   1/1     Running    0          30m
fgedudb-tikv-2   1/1     Running    0          30m
fgedudb-tikv-3   0/1     Pending    0          10s
fgedudb-tikv-3   0/1     Init:0/1   0          30s
fgedudb-tikv-3   0/1     Running    0          2m
fgedudb-tikv-3   1/1     Running    0          3m
fgedudb-tikv-4   0/1     Pending    0          0s
fgedudb-tikv-4   1/1     Running    0          5m

# Scale TiDB out from 2 to 3 nodes
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tidb":{"replicas":3}}}'

tidbcluster.pingcap.com/fgedudb patched

# Check TiDB pod status
# kubectl get pods -n fgedudb -l app.kubernetes.io/component=tidb
NAME             READY   STATUS    RESTARTS   AGE
fgedudb-tidb-0   2/2     Running   0          35m
fgedudb-tidb-1   2/2     Running   0          35m
fgedudb-tidb-2   2/2     Running   0          2m

# Scale in (use with caution: the Operator drains one TiKV store at a time through PD before removing it)
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"tikv":{"replicas":3}}}'
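
Before and after a scale-in it is worth confirming store states through pd-ctl (run inside a PD pod): a store being removed should move Up -> Offline -> Tombstone as its regions migrate away, and only a Tombstone store is fully retired:

# Inspect TiKV store states via pd-ctl
# kubectl exec -it fgedudb-pd-0 -n fgedudb -- /pd-ctl store | grep -E '"address"|"state_name"'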

4.1.2 Upgrading the TiDB Cluster

# Check the current version
# kubectl get tidbcluster fgedudb -n fgedudb -o jsonpath='{.spec.version}'
v7.5.0

# Upgrade the cluster to v7.5.1
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"version":"v7.5.1"}}'

tidbcluster.pingcap.com/fgedudb patched

# Watch the rolling upgrade (PD is upgraded first, then TiKV, then TiDB)
# kubectl get pods -n fgedudb -w
NAME             READY   STATUS              RESTARTS   AGE
fgedudb-pd-0     1/1     Running             0          40m
fgedudb-pd-1     1/1     Running             0          40m
fgedudb-pd-2     1/1     Terminating         0          40m
fgedudb-pd-2     0/1     Terminating         0          40m
fgedudb-pd-2     0/1     ContainerCreating   0          0s
fgedudb-pd-2     1/1     Running             0          2m
fgedudb-tikv-0   1/1     Terminating         0          40m
fgedudb-tikv-0   0/1     Terminating         0          40m
fgedudb-tikv-0   0/1     ContainerCreating   0          0s
fgedudb-tikv-0   1/1     Running             0          3m
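
The TidbCluster status also reports a per-component phase (Normal when idle, Upgrade while rolling). Assuming the v1alpha1 status fields, a quick way to watch it:

# Check component phases during the upgrade
# kubectl get tidbcluster fgedudb -n fgedudb \
#   -o jsonpath='{.status.pd.phase} {.status.tikv.phase} {.status.tidb.phase}{"\n"}'
Normal Upgrade Normal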

# Verify the version after the upgrade
# mysql -h 192.168.1.111 -P 30036 -u root -e "select tidb_version();"
+------------------------------------------------------------+
| tidb_version()                                             |
+------------------------------------------------------------+
| Release Version: v7.5.1
Edition: Community
Git Commit Hash: 12ab34cd56ef78ab90cd12ef34ab56cd78ef90ab
Git Branch: heads/refs/tags/v7.5.1
UTC Build Time: 2024-02-15 10:00:00
GoVersion: go1.21.6
Race Enabled: false
Check Table Before Drop: false
Store: tikv |
+------------------------------------------------------------+

4.2 Monitoring and Alerting

4.2.1 Deploying TiDB Monitor

# Create the TidbMonitor manifest
# cat > tidb-monitor.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: TidbMonitor
metadata:
  name: fgedudb-monitor
  namespace: fgedudb
spec:
  clusters:
  - name: fgedudb
    namespace: fgedudb
  prometheus:
    baseImage: prom/prometheus
    version: v2.49.0
    service:
      type: NodePort
      portName: http-prometheus
  grafana:
    baseImage: grafana/grafana
    version: 10.2.3
    service:
      type: NodePort
      portName: http-grafana
  initializer:
    baseImage: pingcap/tidb-monitor-initializer
    version: v7.5.0
  reloader:
    baseImage: pingcap/tidb-monitor-reloader
    version: v1.0.1
  prometheusReloader:
    baseImage: quay.io/prometheus-operator/prometheus-config-reloader
    version: v0.70.0
  imagePullPolicy: IfNotPresent
EOF
# kubectl apply -f tidb-monitor.yaml
tidbmonitor.pingcap.com/fgedudb-monitor created

# Check the monitor pod
# kubectl get pods -n fgedudb -l app.kubernetes.io/component=monitor
NAME                        READY   STATUS    RESTARTS   AGE
fgedudb-monitor-monitor-0   3/3     Running   0          3m

# Check the monitor services
# kubectl get svc -n fgedudb | grep monitor
fgedudb-monitor-grafana      NodePort   10.96.123.50   <none>   3000:30038/TCP   5m
fgedudb-monitor-prometheus   NodePort   10.96.123.51   <none>   9090:30039/TCP   5m

# Access Grafana
# http://192.168.1.111:30038
# Default credentials: admin/admin

4.2.2 Configuring Alert Rules

# Create the alert rules ConfigMap
# cat > tidb-alert-rules.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: tidb-alert-rules
  namespace: fgedudb
data:
  tidb-alert-rules.yml: |
    groups:
    - name: tidb-alerts
      rules:
      - alert: TiDBDown
        expr: up{job="fgedudb-tidb"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TiDB instance down"
          description: "TiDB instance {{ \$labels.instance }} is down"
      - alert: TiKVDown
        expr: up{job="fgedudb-tikv"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "TiKV instance down"
          description: "TiKV instance {{ \$labels.instance }} is down"
      - alert: PDDown
        expr: up{job="fgedudb-pd"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "PD instance down"
          description: "PD instance {{ \$labels.instance }} is down"
      - alert: TiDBHighQPS
        expr: rate(tidb_executor_statement_total[1m]) > 10000
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TiDB high QPS"
          description: "TiDB QPS is higher than 10000"
      - alert: TiKVHighCPU
        expr: rate(process_cpu_seconds_total{job="fgedudb-tikv"}[1m]) > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "TiKV high CPU usage"
          description: "TiKV CPU usage is above 80%"
EOF

# kubectl apply -f tidb-alert-rules.yaml

# Configure Alertmanager
# cat > alertmanager-config.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-config
  namespace: fgedudb
data:
  alertmanager.yml: |
    global:
      smtp_smarthost: 'smtp.fgedu.net.cn:587'
      smtp_from: 'alert@fgedu.net.cn'
      smtp_auth_username: 'alert@fgedu.net.cn'
      smtp_auth_password: 'your-password'
    route:
      receiver: 'default-receiver'
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'dba@fgedu.net.cn'
        subject: 'TiDB Alert: {{ .GroupLabels.alertname }}'
        body: '{{ range .Alerts }}{{ .Annotations.description }}{{ end }}'
EOF
# kubectl apply -f alertmanager-config.yaml
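
For the TidbMonitor's Prometheus to deliver alerts, it must be told where Alertmanager lives. TidbMonitor exposes this as spec.alertmanagerURL; the service address below is an assumption that depends on how Alertmanager itself is deployed:

# Point the TidbMonitor's Prometheus at Alertmanager
# kubectl patch tidbmonitor fgedudb-monitor -n fgedudb --type merge \
#   -p '{"spec":{"alertmanagerURL":"alertmanager.fgedudb:9093"}}'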

4.3 Backup and Restore

4.3.1 Configuring Scheduled Backups


# Create the backup Secret (S3 access credentials)
# kubectl create secret generic fgedudb-backup-secret -n fgedudb \
#   --from-literal=access_key=your-access-key \
#   --from-literal=secret_key=your-secret-key

# Create the scheduled backup
# cat > tidb-backup-schedule.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: BackupSchedule
metadata:
  name: fgedudb-backup-schedule
  namespace: fgedudb
spec:
  maxBackups: 7
  schedule: "0 2 * * *"
  backupTemplate:
    cluster:
      name: fgedudb
      namespace: fgedudb
    from:
      host: fgedudb-tidb
      port: 4000
      user: root
      secretName: fgedudb-backup-secret
    s3:
      provider: aws
      region: cn-north-1
      bucket: fgedudb-backup
      prefix: tidb-backup
      secretName: fgedudb-backup-secret
    storageClassName: standard
    storageSize: 100Gi
EOF
# kubectl apply -f tidb-backup-schedule.yaml

# Check the schedule
# kubectl get backupschedule -n fgedudb
NAME                      SCHEDULE    MAXBACKUPS   LASTBACKUP   AGE
fgedudb-backup-schedule   0 2 * * *   7            10h          1d

# Check backup history
# kubectl get backups -n fgedudb
NAME                      STATUS     BACKUPPATH                  SIZE   AGE
fgedudb-backup-20240408   Complete   s3://fgedudb-backup/tidb/   50Gi   10h
fgedudb-backup-20240407   Complete   s3://fgedudb-backup/tidb/   48Gi   1d
fgedudb-backup-20240406   Complete   s3://fgedudb-backup/tidb/   47Gi   2d
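
A one-off backup uses the same fields as the schedule's backupTemplate, expressed as a standalone Backup resource; a minimal sketch reusing the cluster and Secret names assumed above:

# cat > tidb-backup-adhoc.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: Backup
metadata:
  name: fgedudb-backup-manual
  namespace: fgedudb
spec:
  cluster:
    name: fgedudb
    namespace: fgedudb
  from:
    host: fgedudb-tidb
    port: 4000
    user: root
    secretName: fgedudb-backup-secret
  s3:
    provider: aws
    region: cn-north-1
    bucket: fgedudb-backup
    prefix: tidb-backup/manual
    secretName: fgedudb-backup-secret
EOF
# kubectl apply -f tidb-backup-adhoc.yaml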

4.3.2 Running a Restore

# Create the restore job
# cat > tidb-restore.yaml << EOF
apiVersion: pingcap.com/v1alpha1
kind: Restore
metadata:
  name: fgedudb-restore
  namespace: fgedudb
spec:
  cluster:
    name: fgedudb
    namespace: fgedudb
  to:
    host: fgedudb-tidb
    port: 4000
    user: root
    secretName: fgedudb-backup-secret
  s3:
    provider: aws
    region: cn-north-1
    bucket: fgedudb-backup
    path: tidb-backup/fgedudb-backup-20240408
  storageClassName: standard
  storageSize: 100Gi
EOF
# kubectl apply -f tidb-restore.yaml

# Watch restore progress
# kubectl get restore -n fgedudb -w
NAME              STATUS     PROGRESS   AGE
fgedudb-restore   Running    50%        5m
fgedudb-restore   Complete   100%       15m

# Inspect restore details
# kubectl describe restore fgedudb-restore -n fgedudb
Name:         fgedudb-restore
Namespace:    fgedudb
Labels:       <none>
Annotations:  <none>
API Version:  pingcap.com/v1alpha1
Kind:         Restore
Metadata:
  Creation Timestamp:  2024-04-09T12:00:00Z
Spec:
  Cluster:
    Name:       fgedudb
    Namespace:  fgedudb
  S3:
    Bucket:    fgedudb-backup
    Path:      tidb-backup/fgedudb-backup-20240408
    Provider:  aws
    Region:    cn-north-1
Status:
  Commit Ts:  4489234567890123456
  Conditions:
    Last Transition Time:  2024-04-09T12:15:00Z
    Message:               Restore completed successfully
    Reason:                RestoreComplete
    Status:                True
    Type:                  Complete
  Phase:     Complete
  Progress:  100%

Part05 - Fengge's Experience and Takeaways

5.1 K8s Deployment Best Practices

Best practices for deploying TiDB on K8s:

  • Storage: run TiKV on local SSDs with IO performance that meets the stated requirements
  • Node layout: give TiKV dedicated nodes; do not co-locate it with other workloads
  • Resource limits: set sensible resource requests and limits for every component
  • High availability: at least 3 PD, 3 TiKV, and 2 TiDB replicas
  • Monitoring and alerting: a complete monitoring/alerting stack is mandatory
  • Backup policy: schedule backups and verify them regularly
Fengge's tip: when running TiDB on K8s in production, plan storage and networking carefully up front. Prefer a dedicated K8s cluster for TiDB rather than sharing one with other workloads.

5.2 Troubleshooting Common Problems

# Problem 1: TiKV pods fail to start
# Check the volume mounts
# kubectl describe pod fgedudb-tikv-0 -n fgedudb
# Check node disk space
# kubectl exec -it fgedudb-tikv-0 -n fgedudb -- df -h

# Problem 2: TiDB connection timeouts
# Check the Service configuration
# kubectl get svc fgedudb-tidb -n fgedudb
# Check network policies
# kubectl get networkpolicy -n fgedudb

# Problem 3: PD cluster problems
# Check PD logs
# kubectl logs fgedudb-pd-0 -n fgedudb
# Check PD member status
# kubectl exec -it fgedudb-pd-0 -n fgedudb -- /pd-ctl member list

# Problem 4: a cluster upgrade fails or gets stuck
# Check the Operator logs
# kubectl logs -n tidb-admin deployment/tidb-controller-manager
# Roll back the version
# kubectl patch tidbcluster fgedudb -n fgedudb --type merge -p '{"spec":{"version":"v7.5.0"}}'

5.3 Operations Recommendations

Day-to-day operations recommendations for TiDB on K8s:

  • Routine checks: regularly review pod status, resource usage, and storage capacity
  • Log management: centralize logs with a collection stack (EFK/Loki)
  • Capacity planning: monitor storage utilization and plan expansion ahead of time
  • Security hardening: configure NetworkPolicy, RBAC, and Secret encryption
  • Disaster recovery: rehearse the backup/restore procedure regularly
  • Version management: track TiDB and Operator releases and plan upgrades
Fengge's tip: TiDB Operator greatly simplifies running TiDB on K8s, but you still need to understand the fundamentals of both K8s and TiDB. Official PingCAP training is a good way to stay current on best practices.

