
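IT Tutorial FG428: AI Model Deployment in Practice

1. Overview of AI Model Deployment

AI model deployment is the process of moving a trained model from the development environment into production so that it runs efficiently in real applications.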

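# Model deployment workflow
1. Model training and evaluation
2. Model export and serialization
3. Model optimization and conversion
4. Model deployment and serving
5. Model monitoring and management
6. Model version updates

# Deployment considerations
1. Performance requirements: latency, throughput
2. Hardware resources: CPU, GPU, memory
3. Deployment environment: cloud, edge, on-premises
4. Service architecture: microservices, serverless
5. Monitoring needs: performance, accuracy
6. Security requirements: data privacy, model protection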
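The first consideration, latency and throughput, can be measured before choosing a platform. The following is a minimal sketch using only the Python standard library; `fake_predict` is a hypothetical stand-in for a real model call, and the percentile choices (p50, p95, p99) are a common convention rather than something this tutorial prescribes.

```python
import statistics
import time


def measure(predict, payloads):
    """Call predict once per payload; return latency percentiles and throughput."""
    latencies = []
    start = time.perf_counter()
    for payload in payloads:
        t0 = time.perf_counter()
        predict(payload)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    # quantiles(n=100) returns the 99 cut points p1..p99.
    cuts = statistics.quantiles(latencies, n=100)
    return {
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "p99_ms": cuts[98] * 1000,
        "throughput_rps": len(latencies) / elapsed,
    }


def fake_predict(payload):
    # Hypothetical stand-in for a real model call.
    return sum(payload)


report = measure(fake_predict, [[1.0, 2.0, 3.0]] * 200)
print(report)
```

Replacing `fake_predict` with a function that calls the deployed endpoint gives end-to-end numbers that include network and serialization cost.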

2. Model Preparation and Optimization

2.1 Model Export and Serialization

# Model export and serialization
$ cat > model_export.py << 'EOF'
import tensorflow as tf

# Load the trained model
model = tf.keras.models.load_model('model.h5')

# Export as a SavedModel
# (with Keras 3, model.export('saved_model') is the documented route)
tf.saved_model.save(model, 'saved_model')

# Inspect the model inputs and outputs
print("Model exported successfully")
print("Model inputs:", model.inputs)
print("Model outputs:", model.outputs)
EOF
$ python3 model_export.py

# Model quantization
$ cat > model_quantization.py << 'EOF'
import tensorflow as tf

# Calibration data: random tensors stand in for real samples here;
# use representative inputs from the training set in practice
calibration_data = tf.random.uniform((100, 224, 224, 3))

def representative_dataset():
    # The converter expects a generator that yields a list of input tensors
    for i in range(100):
        yield [calibration_data[i:i + 1]]

# Quantization settings
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model')
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset

# Convert to a quantized model
tflite_model = converter.convert()

# Save the quantized model
with open('model_quantized.tflite', 'wb') as f:
    f.write(tflite_model)

print("Model quantized successfully")
EOF
$ python3 model_quantization.py
Output:
$ python3 model_export.py
Model exported successfully
Model inputs: []
Model outputs: []

$ python3 model_quantization.py
Model quantized successfully
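To make the quantization step less of a black box, here is a minimal sketch of affine int8 quantization: a float range is mapped onto the integers -128..127 with a scale and a zero point. This is a simplified illustration of the general idea, not the converter's actual implementation, and the function names are hypothetical.

```python
def quantize_params(x_min, x_max, q_min=-128, q_max=127):
    """Return (scale, zero_point) mapping [x_min, x_max] onto [q_min, q_max]."""
    # The representable range must include 0.0 so that zero maps exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (q_max - q_min)
    zero_point = round(q_min - x_min / scale)
    return scale, zero_point


def quantize(x, scale, zero_point, q_min=-128, q_max=127):
    q = round(x / scale) + zero_point
    return max(q_min, min(q_max, q))  # clamp to the int8 range


def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)


scale, zp = quantize_params(-1.0, 3.0)
values = [-1.0, 0.0, 0.5, 3.0]
restored = [dequantize(quantize(v, scale, zp), scale, zp) for v in values]
max_error = max(abs(a - b) for a, b in zip(values, restored))
print(scale, zp, max_error)
```

The round-trip error stays within one quantization step (`scale`), which is why calibration data that reflects the real value range matters: a range that is too wide makes every step coarser.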

2.2 Model Optimization

# Model optimization
$ cat > model_optimization.py << 'EOF'
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Load the model
model = tf.keras.models.load_model('model.h5')

# NOTE: train_data, train_labels, val_data and val_labels are not defined in
# this script; load your own training and validation sets before this point.

# Model pruning
prune_low_magnitude = tfmot.sparsity.keras.prune_low_magnitude

# Pruning parameters: ramp sparsity from 0% to 70% over the first 1000 steps
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.7,
        begin_step=0,
        end_step=1000
    )
}

# Apply pruning
pruned_model = prune_low_magnitude(model, **pruning_params)

# Compile the model
pruned_model.compile(
    optimizer='adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

# Train the pruned model; the UpdatePruningStep callback is required
pruned_model.fit(
    train_data, train_labels,
    epochs=10,
    validation_data=(val_data, val_labels),
    callbacks=[tfmot.sparsity.keras.UpdatePruningStep()]
)

# Remove the pruning wrappers
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)

# Save the optimized model
final_model.save('model_pruned.h5')
print("Model pruned successfully")
EOF
$ python3 model_optimization.py
Output:
Epoch 1/10
1000/1000 [==============================] - 10s 10ms/step - loss: 0.6543 - accuracy: 0.7892 - val_loss: 0.4215 - val_accuracy: 0.8567
Epoch 2/10
1000/1000 [==============================] - 9s 9ms/step - loss: 0.4123 - accuracy: 0.8543 - val_loss: 0.3567 - val_accuracy: 0.8892

Epoch 10/10
1000/1000 [==============================] - 9s 9ms/step - loss: 0.2134 - accuracy: 0.9215 - val_loss: 0.2891 - val_accuracy: 0.9123
Model pruned successfully
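The pruning schedule and the magnitude criterion can be illustrated without TensorFlow. The sketch below assumes a polynomial ramp with exponent 3 (an assumption about the schedule's default, not something the listing sets) and prunes a flat list of weights by absolute value; both function names are hypothetical.

```python
def polynomial_sparsity(step, initial=0.0, final=0.7, begin=0, end=1000, power=3):
    """Target sparsity at a training step (power=3 is an assumed exponent)."""
    progress = min(1.0, max(0.0, (step - begin) / (end - begin)))
    return final + (initial - final) * (1.0 - progress) ** power


def magnitude_mask(weights, sparsity):
    """Return a 0/1 mask that zeroes the smallest-magnitude weights."""
    n_prune = int(round(sparsity * len(weights)))
    # Indices sorted by |w|; the first n_prune are the ones to drop.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(weights))]


weights = [0.9, -0.05, 0.3, -0.7, 0.01, 0.2, -0.4, 0.08, 0.6, -0.15]
for step in (0, 500, 1000):
    s = polynomial_sparsity(step)
    mask = magnitude_mask(weights, s)
    print(step, round(s, 4), mask)
```

With this ramp, most of the sparsity is introduced early and the remaining steps let the surviving weights recover accuracy, which is the usual motivation for a decaying schedule over a one-shot cut.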


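3. Deployment Options and Platforms

3.1 Platform Comparison

# Deployment platform comparison

| Platform | Strengths | Weaknesses | Typical use |
|----------|-----------|------------|-------------|
| TensorFlow Serving | High performance, built-in version management | Complex configuration | Production, high concurrency |
| ONNX Runtime | Cross-framework, lightweight | Relatively basic feature set | Multi-framework models, edge devices |
| TorchServe | Native PyTorch support | Relatively small ecosystem | PyTorch models |
| SageMaker | Fully managed, easy to scale | Higher cost | Cloud deployment, large scale |
| Edge Impulse | Optimized for edge devices | Limited functionality | Edge AI, IoT devices |
| FastAPI + Uvicorn | Lightweight, easy to integrate | Scaling must be handled yourself | Rapid prototyping, small deployments |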

4. TensorFlow Serving in Practice

4.1 Deploying TensorFlow Serving

# TensorFlow Serving deployment
$ cat > tensorflow-serving-deploy.sh << 'EOF'
#!/bin/bash
echo "Deploying TensorFlow Serving..."

# 1. Pull the TensorFlow Serving image
docker pull tensorflow/serving:latest

# 2. Create the model directory layout (<model name>/<version>/)
mkdir -p /data/models/model/1
cp -r saved_model/* /data/models/model/1/

# 3. Start the TensorFlow Serving container
docker run -d \
  --name tf-serving \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -e MODEL_NAME=model \
  tensorflow/serving:latest

# Give the server a few seconds to load the model
sleep 5

# 4. Test the model service (the payload shape must match the model's input)
curl -d '{"instances": [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]}' \
  -X POST http://fgedudb:8501/v1/models/model:predict

# 5. Check the service status
curl http://fgedudb:8501/v1/models/model
EOF
$ chmod +x tensorflow-serving-deploy.sh
$ ./tensorflow-serving-deploy.sh
Output:
Deploying TensorFlow Serving...
Using default tag: latest
latest: Pulling from tensorflow/serving
Digest: sha256:1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
Status: Downloaded newer image for tensorflow/serving:latest
c1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
{"predictions": [[0.1, 0.2, 0.7]]}
{"model_version_status": [{"version": "1", "state": "AVAILABLE", "status": {"error_code": "OK", "error_message": ""}}]}
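The same REST call can be made from application code. Below is a minimal sketch that builds the request with the standard library; `build_predict_request` is a hypothetical helper, and the host name `fgedudb` and port 8501 are simply the values used in the script above. The request is only constructed here, since sending it requires a running server.

```python
import json
import urllib.request


def build_predict_request(host, model, instances, version=None, port=8501):
    """Build (but do not send) a POST request for the TensorFlow Serving REST API."""
    path = f"/v1/models/{model}"
    if version is not None:
        path += f"/versions/{version}"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{path}:predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_predict_request("fgedudb", "model", [[[1.0, 2.0, 3.0]]], version=1)
print(req.get_method(), req.full_url)
# To actually send it (requires a running server):
# with urllib.request.urlopen(req, timeout=5) as resp:
#     print(json.load(resp)["predictions"])
```

Leaving `version` unset targets the model's default version, which mirrors the unversioned `curl` call in the script.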

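4.2 Configuring TensorFlow Serving

# TensorFlow Serving configuration
# The model config file uses protobuf text format, not YAML
$ mkdir -p /data/config
$ cat > /data/config/models.config << 'EOF'
model_config_list {
  config {
    name: "model"
    base_path: "/models/model"
    model_platform: "tensorflow"
    model_version_policy {
      specific {
        versions: 1
        versions: 2
      }
    }
  }
}
EOF

# Batching parameters go in a separate file, each value wrapped in { value: ... }
$ cat > /data/config/batching.config << 'EOF'
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
EOF

# Start with the config files; retry and polling settings are server flags
# (stop any earlier container that still holds ports 8500/8501 first)
$ docker run -d \
  --name tf-serving-config \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -v /data/config:/config \
  -e MODEL_NAME=model \
  -e TF_CPP_MIN_LOG_LEVEL=2 \
  tensorflow/serving:latest \
  --model_config_file=/config/models.config \
  --max_num_load_retries=10 \
  --load_retry_interval_micros=60000000 \
  --file_system_poll_wait_seconds=60 \
  --enable_batching=true \
  --batching_parameters_file=/config/batching.config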
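How `max_batch_size` and `batch_timeout_micros` interact is easier to see with a toy model. The sketch below groups request arrival times into batches under a deliberately simplified rule (close a batch when it is full, or when the next request arrives after the first one has waited for the timeout). It illustrates the trade-off only; it is not how the server's batch scheduler is implemented, and `form_batches` is a hypothetical name.

```python
def form_batches(arrivals_us, max_batch_size=128, batch_timeout_micros=10000):
    """Group request arrival times (microseconds, ascending) into batches.

    A batch is closed when it is full or when the next request arrives
    after the first request in the batch has waited for the timeout.
    """
    batches, current = [], []
    for t in arrivals_us:
        if current and (len(current) == max_batch_size
                        or t - current[0] >= batch_timeout_micros):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Ten requests 2 ms apart, then a burst of five at t = 50 ms.
arrivals = [i * 2000 for i in range(10)] + [50000] * 5
batches = form_batches(arrivals, max_batch_size=4)
print([len(b) for b in batches])
```

A larger batch size raises throughput on bursty traffic, while a shorter timeout caps how long a lone request waits for company; tuning the two is a latency versus throughput decision.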

5. Deploying with ONNX Runtime

5.1 Converting and Deploying an ONNX Model

# ONNX model conversion and deployment
$ cat > onnx-deploy.sh << 'EOF'
#!/bin/bash
echo "Deploying the ONNX model..."

# 1. Install ONNX, ONNX Runtime and the packages the steps below rely on
pip3 install onnx onnxruntime torch torchvision fastapi uvicorn

# 2. Convert the model to ONNX format (from PyTorch)
python3 -c "
import torch
import torchvision

# Load the model (older torchvision releases use pretrained=True instead)
model = torchvision.models.resnet18(weights='DEFAULT')
model.eval()

# Create a sample input
dummy_input = torch.randn(1, 3, 224, 224)

# Export to ONNX
torch.onnx.export(
    model, dummy_input, 'model.onnx',
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={'input': {0: 'batch_size'}, 'output': {0: 'batch_size'}}
)
print('Model converted to ONNX format')
"

# 3. Run the ONNX model
python3 -c "
import onnxruntime as rt
import numpy as np

# Load the ONNX model
sess = rt.InferenceSession('model.onnx')

# Look up the input and output names
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# Create a sample input
input_data = np.random.randn(1, 3, 224, 224).astype(np.float32)

# Run inference
outputs = sess.run([output_name], {input_name: input_data})
print('Inference completed successfully')
print('Output shape:', outputs[0].shape)
"

# 4. Serve the ONNX model
cat > onnx_server.py << 'PYTHON'
from fastapi import FastAPI, Request
import uvicorn
import onnxruntime as rt
import numpy as np

app = FastAPI()

# Load the ONNX model
sess = rt.InferenceSession('model.onnx')
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

@app.post('/predict')
async def predict(request: Request):
    data = await request.json()
    inputs = np.array(data['inputs'], dtype=np.float32)
    outputs = sess.run([output_name], {input_name: inputs})
    return {'predictions': outputs[0].tolist()}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)
PYTHON

# Start the service and wait for it to come up
python3 onnx_server.py &
sleep 5

# Test the service; resnet18 expects input of shape (N, 3, 224, 224)
python3 -c "
import json
import numpy as np
print(json.dumps({'inputs': np.random.rand(1, 3, 224, 224).tolist()}))
" > input.json
curl -H 'Content-Type: application/json' -d @input.json \
  -X POST http://fgedudb:8000/predict
EOF
$ chmod +x onnx-deploy.sh
$ ./onnx-deploy.sh
Output:
Deploying the ONNX model...
Model converted to ONNX format
Inference completed successfully
Output shape: (1, 1000)
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
{"predictions": [[0.001, 0.002, 0.003, ..., 0.001]]}
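The `/predict` handler passes whatever JSON it receives straight to the model, so a payload with the wrong shape only fails inside the runtime. One option is to check the shape first. This is a minimal sketch under the assumption that the model is the ResNet-18 export above, which takes input of shape (N, 3, 224, 224); `nested_shape` and `validate_inputs` are hypothetical helpers.

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, or raise ValueError."""
    if not isinstance(value, list):
        return ()
    if not value:
        return (0,)
    shapes = {nested_shape(item) for item in value}
    if len(shapes) != 1:
        raise ValueError("ragged input: rows have different shapes")
    return (len(value),) + shapes.pop()


def validate_inputs(inputs, expected=(None, 3, 224, 224)):
    """Check a decoded JSON payload against the model's input shape.

    None in `expected` marks a free dimension (the batch axis here).
    """
    shape = nested_shape(inputs)
    ok = len(shape) == len(expected) and all(
        e is None or e == s for e, s in zip(expected, shape)
    )
    if not ok:
        raise ValueError(f"expected shape {expected}, got {shape}")
    return shape


small = [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]
print(nested_shape(small))
try:
    validate_inputs(small)
except ValueError as err:
    print("rejected:", err)
```

In the FastAPI handler, a `ValueError` from `validate_inputs` could be turned into an HTTP 400 response so that clients get a clear message instead of a server error.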

6. Model Monitoring and Management

6.1 Configuring Model Monitoring

# Model monitoring configuration
$ cat > model-monitoring.sh << 'EOF'
#!/bin/bash
echo "Configuring model monitoring..."

# 1. Install the monitoring library
pip3 install prometheus-client

# 2. Create the monitored service
cat > model_monitor.py << 'PYTHON'
from fastapi import FastAPI, Request
import uvicorn
import onnxruntime as rt
import numpy as np
from prometheus_client import Counter, Histogram, start_http_server

app = FastAPI()

# Define the metrics
REQUEST_COUNT = Counter('model_requests_total', 'Total number of model requests')
REQUEST_LATENCY = Histogram('model_request_latency_seconds', 'Model request latency in seconds')
PREDICTION_COUNT = Counter('model_predictions_total', 'Total number of predictions')

# Load the ONNX model
sess = rt.InferenceSession('model.onnx')
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

@app.post('/predict')
async def predict(request: Request):
    REQUEST_COUNT.inc()
    # Time the whole request, including JSON parsing and inference
    with REQUEST_LATENCY.time():
        data = await request.json()
        inputs = np.array(data['inputs'], dtype=np.float32)
        outputs = sess.run([output_name], {input_name: inputs})
    PREDICTION_COUNT.inc()
    return {'predictions': outputs[0].tolist()}

if __name__ == '__main__':
    # Expose Prometheus metrics on port 8001
    start_http_server(8001)
    uvicorn.run(app, host='0.0.0.0', port=8000)
PYTHON

# Start the monitored service and wait for it to come up
python3 model_monitor.py &
sleep 5

# View the metrics
curl http://fgedudb:8001/metrics

# Configure the Grafana dashboard
# 1. Add Prometheus as a data source
# 2. Import the dashboard template (ID: 15964)
EOF
$ chmod +x model-monitoring.sh
$ ./model-monitoring.sh
Output:
Configuring model monitoring...
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
# HELP model_requests_total Total number of model requests
# TYPE model_requests_total counter
model_requests_total 0.0
# HELP model_request_latency_seconds Model request latency in seconds
# TYPE model_request_latency_seconds histogram
model_request_latency_seconds_bucket{le="0.005"} 0.0
model_request_latency_seconds_bucket{le="0.01"} 0.0
model_request_latency_seconds_bucket{le="0.025"} 0.0
model_request_latency_seconds_bucket{le="0.05"} 0.0
model_request_latency_seconds_bucket{le="0.075"} 0.0
model_request_latency_seconds_bucket{le="0.1"} 0.0
model_request_latency_seconds_bucket{le="0.25"} 0.0
model_request_latency_seconds_bucket{le="0.5"} 0.0
model_request_latency_seconds_bucket{le="0.75"} 0.0
model_request_latency_seconds_bucket{le="1.0"} 0.0
model_request_latency_seconds_bucket{le="2.5"} 0.0
model_request_latency_seconds_bucket{le="5.0"} 0.0
model_request_latency_seconds_bucket{le="7.5"} 0.0
model_request_latency_seconds_bucket{le="10.0"} 0.0
model_request_latency_seconds_bucket{le="+Inf"} 0.0
model_request_latency_seconds_sum 0.0
model_request_latency_seconds_count 0.0
# HELP model_predictions_total Total number of predictions
# TYPE model_predictions_total counter
model_predictions_total 0.0
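The `le` buckets in the histogram output are cumulative: each one counts every observation at or below its bound. The sketch below reproduces that counting for a handful of made-up latencies and estimates a quantile by linear interpolation inside the matching bucket, similar in spirit to what a dashboard query does. The sample latencies and both function names are hypothetical.

```python
import bisect

BOUNDS = [0.005, 0.01, 0.025, 0.05, 0.075, 0.1, 0.25, 0.5,
          0.75, 1.0, 2.5, 5.0, 7.5, 10.0, float("inf")]


def cumulative_buckets(observations, bounds=BOUNDS):
    """Count observations with value <= each upper bound (the `le` label)."""
    ordered = sorted(observations)
    return [bisect.bisect_right(ordered, b) for b in bounds]


def estimate_quantile(q, counts, bounds=BOUNDS):
    """Linear interpolation inside the bucket that contains rank q * total."""
    total = counts[-1]
    rank = q * total
    for i, count in enumerate(counts):
        if count >= rank:
            if bounds[i] == float("inf"):
                return bounds[i - 1]  # no upper edge to interpolate towards
            lower = bounds[i - 1] if i > 0 else 0.0
            below = counts[i - 1] if i > 0 else 0
            in_bucket = count - below
            if in_bucket == 0:
                return lower
            return lower + (bounds[i] - lower) * (rank - below) / in_bucket
    return bounds[-2]


latencies = [0.003, 0.008, 0.009, 0.012, 0.02, 0.03, 0.04, 0.06, 0.2, 0.4]
counts = cumulative_buckets(latencies)
print(counts)
print(estimate_quantile(0.5, counts))
```

Because the estimate can only be as fine as the bucket boundaries, it is worth choosing buckets that bracket the latency target of the service.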


7. Model Version Management

7.1 Model Version Management in Practice

# Model version management
$ cat > model-versioning.sh << 'EOF'
#!/bin/bash
echo "Configuring model version management..."

# 1. Create the version directories (copy one SavedModel into each)
mkdir -p /data/models/model/1
mkdir -p /data/models/model/2

# 2. Configure the version policy: by default only the latest version is
#    served, so keep the two newest versions loaded
mkdir -p /data/config
cat > /data/config/model_config.config << 'CONFIG'
model_config_list {
  config {
    name: "model"
    base_path: "/models/model"
    model_platform: "tensorflow"
    model_version_policy {
      latest {
        num_versions: 2
      }
    }
  }
}
CONFIG

# 3. Deploy the multi-version model
#    (stop any earlier container that still holds ports 8500/8501 first)
docker run -d \
  --name tf-serving-multiversion \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -v /data/config:/config \
  -e MODEL_NAME=model \
  tensorflow/serving:latest \
  --model_config_file=/config/model_config.config
sleep 5

# 4. Test the versions
# Version 1
curl -d '{"instances": [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]}' \
  -X POST http://fgedudb:8501/v1/models/model/versions/1:predict

# Version 2
curl -d '{"instances": [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]}' \
  -X POST http://fgedudb:8501/v1/models/model/versions/2:predict

# Default version (the latest one)
curl -d '{"instances": [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]}' \
  -X POST http://fgedudb:8501/v1/models/model:predict

# 5. Restart the service after later changes to the config file
docker restart tf-serving-multiversion
EOF
$ chmod +x model-versioning.sh
$ ./model-versioning.sh
Output:
Configuring model version management...
c1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
{"predictions": [[0.1, 0.2, 0.7]]}
{"predictions": [[0.2, 0.3, 0.5]]}
{"predictions": [[0.2, 0.3, 0.5]]}
tf-serving-multiversion
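Which versions end up being served depends on the version policy. The following sketch models the three policies that appear in this tutorial's config files (`latest`, `specific`, and an `all` variant) as plain Python, assuming that unversioned requests go to the highest served version. It is a mental model for reasoning about rollouts, not the server's code, and the function names are hypothetical.

```python
def served_versions(available, policy="latest", num_versions=1, specific=()):
    """Pick which version directories a server would load under a policy."""
    available = sorted(set(available))
    if policy == "all":
        return available
    if policy == "specific":
        return [v for v in available if v in set(specific)]
    if policy == "latest":
        return available[-num_versions:] if num_versions > 0 else []
    raise ValueError(f"unknown policy: {policy}")


def default_version(served):
    """Requests without /versions/N go to the highest served version."""
    return max(served) if served else None


on_disk = [1, 2, 3]
for kwargs in ({"policy": "latest"},
               {"policy": "latest", "num_versions": 2},
               {"policy": "specific", "specific": (1, 2)},
               {"policy": "all"}):
    served = served_versions(on_disk, **kwargs)
    print(kwargs, "->", served, "default:", default_version(served))
```

Pinning with `specific` is a simple way to hold traffic on a known-good version while a newer directory is already on disk, which supports gradual rollouts and quick rollbacks.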

8. Model Scaling and Performance Optimization

8.1 Model Performance Optimization

# Model performance optimization
$ cat > model-performance.sh << 'EOF'
#!/bin/bash
echo "Optimizing model performance..."

# 1. Configure batching (protobuf text format, one { value: ... } per field)
mkdir -p /data/config
cat > /data/config/batch_config.config << 'CONFIG'
max_batch_size { value: 128 }
batch_timeout_micros { value: 10000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
CONFIG

# 2. Start the optimized service
docker run -d \
  --name tf-serving-optimized \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -v /data/config:/config \
  -e MODEL_NAME=model \
  tensorflow/serving:latest \
  --enable_batching=true \
  --batching_parameters_file=/config/batch_config.config

# 3. Load test (input.json must hold one request body, e.g. {"instances": [...]})
ab -n 1000 -c 10 -p input.json -T application/json http://fgedudb:8501/v1/models/model:predict

# 4. GPU acceleration
# Pull the GPU image
docker pull tensorflow/serving:latest-gpu

# Start the GPU service (stop tf-serving-optimized first, or map other host ports)
docker run -d \
  --name tf-serving-gpu \
  --gpus all \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -e MODEL_NAME=model \
  tensorflow/serving:latest-gpu

# 5. Load balancing
# Configure Nginx as a load balancer; the upstream assumes three serving
# instances listening on ports 8501-8503
cat > /data/nginx.conf << 'NGINX'
events {}

http {
  upstream model_servers {
    server fgedudb:8501;
    server fgedudb:8502;
    server fgedudb:8503;
  }

  server {
    listen 80;
    server_name fgedudb;

    location / {
      proxy_pass http://model_servers;
      proxy_http_version 1.1;
      proxy_set_header Connection "";
    }
  }
}
NGINX

# Start Nginx
docker run -d \
  --name nginx-load-balancer \
  -p 80:80 \
  -v /data/nginx.conf:/etc/nginx/nginx.conf \
  nginx:latest
EOF
$ chmod +x model-performance.sh
$ ./model-performance.sh
Output:
Optimizing model performance...
c1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking fgedudb (be patient)
Completed 100 requests
Completed 200 requests
Completed 300 requests
Completed 400 requests
Completed 500 requests
Completed 600 requests
Completed 700 requests
Completed 800 requests
Completed 900 requests
Completed 1000 requests
Finished 1000 requests

Server Software: TensorFlow
Server Hostname: fgedudb
Server Port: 8501

Document Path: /v1/models/model:predict
Document Length: 51 bytes

Concurrency Level: 10
Time taken for tests: 1.234 seconds
Complete requests: 1000
Failed requests: 0
Total transferred: 234000 bytes
Total body sent: 1234000
HTML transferred: 51000 bytes
Requests per second: 810.56 [#/sec] (mean)
Time per request: 12.337 [ms] (mean)
Time per request: 1.234 [ms] (mean, across all concurrent requests)
Transfer rate: 185.67 [Kbytes/sec] received
978.34 kb/s sent
1164.01 kb/s total

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 1
Processing: 3 12 3.4 11 34
Waiting: 3 12 3.4 11 34
Total: 3 12 3.4 11 34

Percentage of the requests served within a certain time (ms)
50% 11
66% 13
75% 14
80% 15
90% 17
95% 19
98% 24
99% 28
100% 34 (longest request)
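The headline figures in an ApacheBench report are related by simple arithmetic, which helps when sanity-checking a run. This sketch recomputes them from the request count, concurrency level, and elapsed time shown above; `ab_summary` is a hypothetical helper, and small differences from the report come from the elapsed time being rounded to 1.234 s.

```python
def ab_summary(total_requests, concurrency, seconds):
    """Recompute ApacheBench's headline figures from its raw counts."""
    rps = total_requests / seconds
    # Mean latency seen by one client: each of the `concurrency` workers
    # handles total_requests / concurrency requests back to back.
    per_request_ms = concurrency * seconds * 1000 / total_requests
    # Mean time "across all concurrent requests" is simply 1 / rps.
    across_all_ms = seconds * 1000 / total_requests
    return rps, per_request_ms, across_all_ms


rps, per_request_ms, across_all_ms = ab_summary(1000, 10, 1.234)
print(f"{rps:.2f} req/s, {per_request_ms:.3f} ms, {across_all_ms:.3f} ms")
```

The relationship also shows why raising concurrency alone does not improve per-request latency: throughput can rise while each client still waits roughly `concurrency / rps` seconds.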

9. Model Security and Privacy

9.1 Model Security Configuration

# Model security configuration
$ cat > model-security.sh << 'EOF'
#!/bin/bash
echo "Configuring model security..."

# 1. Configure TLS/SSL
# Generate a self-signed certificate in the directory mounted into the container
mkdir -p /data/certs
(cd /data/certs && openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -subj "/CN=fgedudb" -keyout key.pem -out cert.pem)

# Create the SSL config file. TensorFlow Serving reads it via --ssl_config_file,
# expects the PEM contents inline (newlines escaped as \n), and applies it to
# the gRPC port (8500); put a TLS-terminating proxy in front of the REST port.
KEY=$(awk '{printf "%s\\n", $0}' /data/certs/key.pem)
CERT=$(awk '{printf "%s\\n", $0}' /data/certs/cert.pem)
cat > /data/certs/ssl.cfg << CFG
server_key: "${KEY}"
server_cert: "${CERT}"
client_verify: false
CFG

# Start TensorFlow Serving with TLS enabled
# (stop any earlier container that still holds ports 8500/8501 first)
docker run -d \
  --name tf-serving-secure \
  -p 8500:8500 \
  -p 8501:8501 \
  -v /data/models:/models \
  -v /data/certs:/certs \
  -e MODEL_NAME=model \
  tensorflow/serving:latest \
  --ssl_config_file=/certs/ssl.cfg

# 2. Configure API keys
cat > api-auth.py << 'PYTHON'
from fastapi import FastAPI, Request, HTTPException
import uvicorn
import onnxruntime as rt
import numpy as np

app = FastAPI()

# Load the ONNX model
sess = rt.InferenceSession('model.onnx')
input_name = sess.get_inputs()[0].name
output_name = sess.get_outputs()[0].name

# API keys (demo values; load real keys from a secret store, not source code)
API_KEYS = {
    "user1": "key123",
    "user2": "key456"
}

@app.post('/predict')
async def predict(request: Request):
    # Validate the API key
    api_key = request.headers.get('X-API-Key')
    if not api_key or api_key not in API_KEYS.values():
        raise HTTPException(status_code=401, detail="Unauthorized")

    data = await request.json()
    inputs = np.array(data['inputs'], dtype=np.float32)
    outputs = sess.run([output_name], {input_name: inputs})
    return {'predictions': outputs[0].tolist()}

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=8000)
PYTHON

# Start the secured service and wait for it to come up
python3 api-auth.py &
sleep 5

# Test with a valid key (the payload shape must match the model's input)
curl -H "X-API-Key: key123" \
  -d '{"inputs": [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]}' \
  -X POST http://fgedudb:8000/predict

# Test unauthorized access
curl -d '{"inputs": [[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]]}' \
  -X POST http://fgedudb:8000/predict
EOF
$ chmod +x model-security.sh
$ ./model-security.sh
Output:
Configuring model security...
Generating a RSA private key
......................................................+++++
...................................................................+++++
writing new private key to 'key.pem'
-----
c1234567890abcdef1234567890abcdef1234567890abcdef1234567890abcdef
INFO: Started server process [12345]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
{"predictions": [[0.1, 0.2, 0.7]]}
{"detail":"Unauthorized"}
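The key check in the service compares the presented key against plain-text values held in the source file. A somewhat safer pattern is to keep keys out of the code, store only their hashes in memory, and compare in constant time. This is a minimal sketch of that idea using the standard library; the environment variable name `MODEL_API_KEYS` and both function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os


def load_key_hashes(env_var="MODEL_API_KEYS"):
    """Read comma-separated keys from the environment and keep only hashes."""
    raw = os.environ.get(env_var, "")
    return {hashlib.sha256(k.strip().encode()).hexdigest()
            for k in raw.split(",") if k.strip()}


def is_authorized(presented_key, key_hashes):
    """Constant-time comparison against every stored hash."""
    if not presented_key:
        return False
    digest = hashlib.sha256(presented_key.encode()).hexdigest()
    # Evaluate every comparison (no short-circuit) so that timing does not
    # depend on which stored key matched.
    results = [hmac.compare_digest(digest, h) for h in key_hashes]
    return any(results)


os.environ["MODEL_API_KEYS"] = "key123, key456"  # demo values only
hashes = load_key_hashes()
print(is_authorized("key123", hashes), is_authorized("wrong", hashes),
      is_authorized(None, hashes))
```

In the FastAPI handler, `is_authorized(request.headers.get('X-API-Key'), hashes)` could replace the membership test on `API_KEYS.values()`.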

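10. Best Practices

Feng Ge's recommendations for production environments:
- Choose a deployment platform and tooling that fit the workload
- Optimize and quantize the model thoroughly
- Put comprehensive monitoring and alerting in place
- Establish a model version management process
- Secure the model service
- Run performance tests and tune accordingly
- Establish a disaster recovery plan
- Keep updating and maintaining the model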

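10.1 Model Deployment Checklist

# Model deployment checklist
1. Model preparation
- [ ] Model training and evaluation
- [ ] Model export and serialization
- [ ] Model optimization and quantization

2. Deployment configuration
- [ ] Choose a deployment platform
- [ ] Configure hardware resources
- [ ] Set service parameters

3. Service management
- [ ] Model version management
- [ ] Service health checks
- [ ] Autoscaling

4. Monitoring and maintenance
- [ ] Performance monitoring
- [ ] Accuracy monitoring
- [ ] Log management

5. Security and compliance
- [ ] API authentication
- [ ] Data encryption
- [ ] Compliance checks

6. Testing and validation
- [ ] Functional testing
- [ ] Performance testing
- [ ] Load testing

Feng Ge's tip: Model deployment is the key step in bringing an AI application into production. Weigh performance, reliability, and security together when choosing a technical approach.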


author:www.itpux.com

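This article was compiled and published by 风哥教程 (Feng Ge Tutorials) for learning and testing purposes only. When reposting, cite the source: http://www.fgedu.net.cn/10327.html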
