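MongoDB Tutorial FG050-MongoDB Production Failure Cases in Practice

Overview

This article covers the failure types commonly seen in MongoDB production environments, the process for handling them, and worked cases, including connection, replication, sharding, and performance failures. The 风哥教程 (Fengge Tutorials) series draws on the official MongoDB documentation and operational best practices to present a complete approach to failure handling.

After working through this article, you should be able to identify, analyze, and resolve MongoDB production failures, respond quickly when they occur, and keep the database running stably.

This article is intended for MongoDB administrators, database operations engineers, and developers who want to build a sound failure-handling practice.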

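Part01-Basic Concepts and Theory

1.1 Failure Type Classification

Common failure types in MongoDB production environments:

  • Connection failures: network problems, authentication failures, too many connections, etc.
  • Replication failures: replication lag, failed elections, oplog problems, etc.
  • Sharding failures: balancer failures, uneven data distribution, mongos failures, etc.
  • Performance failures: slow queries, resource exhaustion, index problems, etc.
  • Storage failures: insufficient disk space, file system errors, hardware faults, etc.
  • Configuration failures: wrong parameter settings, permission problems, version incompatibilities, etc.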

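1.2 Failure Handling Process

  1. Detection: discover the failure through the monitoring system, user reports, or log analysis
  2. Diagnosis: analyze the symptoms and determine the failure type and cause
  3. Isolation: contain the failure so that it does not spread
  4. Resolution: apply the handling procedure that matches the failure type
  5. Verification: confirm that the failure has been resolved
  6. Review: analyze the root cause and define preventive measures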

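1.3 Failure Prevention Mechanisms

MongoDB failure prevention mechanisms:

  • High-availability architecture: deploy replica sets or sharded clusters
  • Monitoring: build a comprehensive monitoring system
  • Backup strategy: back up data regularly
  • Capacity planning: plan storage and other resources sensibly
  • Parameter tuning: tune configuration parameters to the actual workload
  • Regular maintenance: perform system maintenance and checks on a schedule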

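Part02-Production Environment Planning and Recommendations

2.1 High Availability Planning

High availability planning recommendations:

  • Replica set deployment: at least 3 nodes, to ensure data redundancy
  • Network planning: use reliable network equipment and configure network redundancy
  • Hardware planning: use high-quality hardware and configure RAID
  • Power planning: provide UPS units and generators to secure the power supply
  • Geographic distribution: deploy across data centers to improve disaster tolerance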

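2.2 Monitoring and Alerting Planning

Fengge's tip:

Monitoring and alerting are the key to failure prevention. They should cover every MongoDB component and metric so that problems are found promptly.

Monitoring and alerting planning recommendations:

  • Metrics: system status, performance metrics, replication status, sharding status, etc.
  • Tools: Prometheus + Grafana, MongoDB Compass, etc.
  • Alert thresholds: set thresholds that fit the actual workload
  • Alert channels: configure several channels, such as email, SMS, and WeChat
  • Alert levels: define distinct levels so that handling priority is clear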
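
The threshold and level items above can be sketched as a small shell helper. This is only an illustration: the function name `alert_level` and the sample thresholds (80 and 90) are assumptions, not values from this tutorial.

```shell
# Hypothetical helper: map an integer metric value to an alert level.
# Usage: alert_level VALUE WARN_THRESHOLD CRIT_THRESHOLD
alert_level() {
    if [ "$1" -ge "$3" ]; then
        echo "CRITICAL"
    elif [ "$1" -ge "$2" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Example: a metric at 85 against assumed thresholds 80 (warning) and 90 (critical)
alert_level 85 80 90
```

A real setup would normally define such rules in the monitoring tool itself; the helper only shows how two thresholds split a metric into three priority levels.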

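2.3 Disaster Recovery Planning

Disaster recovery planning recommendations:

  • Backup strategy: define a sensible strategy that includes full and incremental backups
  • Backup storage: keep backups in a different geographic location
  • Restore testing: run restore tests regularly to confirm that backups are usable
  • Disaster recovery drills: run drills regularly to improve emergency response
  • Failover: configure automatic failover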

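Part03-Production Implementation Plan

3.1 Failure Response Mechanism

Failure response mechanism:

  1. Response team: form a dedicated failure response team
  2. Response process: define a detailed failure response process
  3. Responsibilities: make each team member's responsibilities explicit
  4. Communication: establish effective communication channels
  5. Documentation: record the failure handling process in detail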

3.2 Failure Handling Tools

A basic failure detection script:

#!/bin/bash
# mongo_fault_detection.sh
# from: www.itpux.com
# web: http://www.fgedu.net.cn

# Check the MongoDB service status
printf '=== MongoDB service status ===\n'
systemctl status mongod --no-pager

# Check that MongoDB accepts connections
printf '\n=== MongoDB connection check ===\n'
mongo -u fgedu -p password123 --eval 'db.runCommand({ ping: 1 })'

# Show the tail of the MongoDB log
printf '\n=== MongoDB log ===\n'
tail -n 50 /mongodb/fgdata/log/mongod.log

# Check system resource usage
printf '\n=== System resource usage ===\n'
top -b -n 1 | head -n 20

# Check disk space
printf '\n=== Disk space ===\n'
df -h

# Check that the MongoDB port is listening
printf '\n=== Network listeners ===\n'
netstat -tuln | grep 27017
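
The disk space check in the script prints every file system and leaves the reading to the operator. As a minimal sketch, the output can be filtered down to the file systems that are nearly full. The function name `disks_over` and the 80% threshold are assumptions for illustration; the filter expects POSIX `df -P` output on standard input.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the mount point
# and usage of every file system at or above the given percentage.
# Usage: df -P | disks_over THRESHOLD
disks_over() {
    awk -v limit="$1" 'NR > 1 {
        use = $5
        sub(/%/, "", use)              # strip the trailing percent sign
        if (use + 0 >= limit) print $6 " " use "%"
    }'
}
```

For example, `df -P | disks_over 80` lists only the mount points at 80% usage or more. Mount points that contain spaces are not handled by this sketch.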

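3.3 Failure Drill Plan

Failure drill plan:

  • Frequency: run a failure drill at least once per quarter
  • Scope: network failures, hardware failures, software failures, etc.
  • Procedure: follow the real failure handling process during the drill
  • Evaluation: assess the drill and improve the failure handling process
  • Documentation: record the drill process and results in detail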

Part04-Production Cases and Walkthroughs

4.1 Handling Connection Failures

Case: too many MongoDB connections make the service unavailable

# Check the MongoDB connection count

mongo -u fgedu -p password123 --eval 'db.serverStatus().connections'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "current" : 65535,
  "available" : 1,
  "totalCreated" : NumberLong(100000)
}
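
The `current` and `available` fields add up to the configured connection limit, so the share of the limit in use can be derived from them. A minimal sketch follows; the function name `conn_usage_pct` is hypothetical.

```shell
# Hypothetical helper: percentage of the connection limit in use, computed from
# the "current" and "available" fields of db.serverStatus().connections.
# Usage: conn_usage_pct CURRENT AVAILABLE
conn_usage_pct() {
    # integer arithmetic, rounded down
    echo $(( $1 * 100 / ($1 + $2) ))
}

# Values from the output above
conn_usage_pct 65535 1
```

With the values shown, the result is 99, which means the instance has effectively run out of connections.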

# Inspect the operations running on current connections

mongo -u fgedu -p password123 --eval 'db.currentOp()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "inprog" : [
    {
      "desc" : "conn12345",
      "threadId" : "140123456789012",
      "connectionId" : 12345,
      "client" : "192.168.1.100:54321",
      "appName" : "MongoDB Shell",
      "active" : true,
      "opid" : 123456,
      "secs_running" : 3600,
      "microsecs_running" : NumberLong(3600000000),
      "op" : "query",
      "ns" : "fgedudb.fgedu_users",
      "query" : { "age" : { "$gt" : 30 } },
      "planSummary" : "COLLSCAN",
      "numYields" : 1000,
      "locks" : {
        "Global" : "r",
        "Database" : "r",
        "Collection" : "r"
      },
      "waitingForLock" : false,
      "lockStats" : {
        "Global" : { "acquireCount" : { "r" : NumberLong(2) } },
        "Database" : { "acquireCount" : { "r" : NumberLong(1) } },
        "Collection" : { "acquireCount" : { "r" : NumberLong(1) } }
      }
    },
  ],
  "ok" : 1
}

# Analyze slow operations recorded by the profiler (system.profile in fgedudb)

mongo -u fgedu -p password123 --eval 'db.getSiblingDB("fgedudb").system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(10)'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "ts" : ISODate("2024-01-01T00:00:00Z"), "op" : "query", "ns" : "fgedudb.fgedu_users", "command" : { "find" : "fgedu_users", "filter" : { "age" : { "$gt" : 30 } }, "lsid" : { "id" : UUID("12345678-1234-1234-1234-123456789012") } }, "keysExamined" : 0, "docsExamined" : 1000000, "cursorExhausted" : true, "numYields" : 1000, "nreturned" : 500000, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(2) } }, "Database" : { "acquireCount" : { "r" : NumberLong(1) } }, "Collection" : { "acquireCount" : { "r" : NumberLong(1) } } }, "flowControl" : { "acquireCount" : NumberLong(0) }, "responseLength" : 10000000, "protocol" : "op_msg", "millis" : 5000, "planSummary" : "COLLSCAN", "execStats" : { "stage" : "COLLSCAN", "filter" : { "age" : { "$gt" : 30 } }, "nReturned" : 500000, "executionTimeMillisEstimate" : 4500, "works" : 1000002, "advanced" : 500000, "needTime" : 500001, "needYield" : 1000, "saveState" : 1000, "restoreState" : 1000, "isEOF" : 1, "invalidates" : 0, "direction" : "forward", "docsExamined" : 1000000 }, "tsms" : NumberLong(1234567890123), "client" : "192.168.1.100", "appName" : "MongoDB Shell", "allUsers" : [ { "user" : "fgedu", "db" : "admin" } ], "user" : "fgedu@admin" }

# Create an index on the age field

mongo -u fgedu -p password123 --eval 'db.getSiblingDB("fgedudb").fgedu_users.createIndex({ age: 1 })'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }

# Raise the MongoDB connection limit

sed -i 's/maxIncomingConnections.*/maxIncomingConnections: 100000/' /etc/mongod.conf

# Restart the MongoDB service

systemctl restart mongod

# Check the connection count again

mongo -u fgedu -p password123 --eval 'db.serverStatus().connections'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "current" : 100,
  "available" : 99900,
  "totalCreated" : NumberLong(100050)
}

4.2 Handling Replication Failures

Case: excessive replication lag in a replica set

# Check the replica set status

mongo -u fgedu -p password123 --eval 'rs.status()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "set" : "rs0",
  "date" : ISODate("2024-01-01T00:00:00Z"),
  "myState" : 1,
  "term" : NumberLong(1),
  "syncSourceHost" : "",
  "syncSourceId" : -1,
  "heartbeatIntervalMillis" : NumberLong(2000),
  "majorityVoteCount" : 2,
  "writeMajorityCount" : 2,
  "votingMembersCount" : 3,
  "writableVotingMembersCount" : 3,
  "optimes" : {
    "lastCommittedOpTime" : {
      "ts" : Timestamp(1234567890, 1),
      "t" : NumberLong(1)
    },
    "lastCommittedWallTime" : ISODate("2024-01-01T00:00:00Z"),
    "readConcernMajorityOpTime" : {
      "ts" : Timestamp(1234567890, 1),
      "t" : NumberLong(1)
    },
    "appliedOpTime" : {
      "ts" : Timestamp(1234567890, 1),
      "t" : NumberLong(1)
    },
    "durableOpTime" : {
      "ts" : Timestamp(1234567890, 1),
      "t" : NumberLong(1)
    },
    "lastAppliedWallTime" : ISODate("2024-01-01T00:00:00Z"),
    "lastDurableWallTime" : ISODate("2024-01-01T00:00:00Z")
  },
  "lastStableRecoveryTimestamp" : Timestamp(1234567880, 1),
  "electionCandidateMetrics" : {
    "lastElectionReason" : "electionTimeout",
    "lastElectionDate" : ISODate("2024-01-01T00:00:00Z"),
    "electionTerm" : NumberLong(1),
    "lastCommittedOpTimeAtElection" : {
      "ts" : Timestamp(0, 0),
      "t" : NumberLong(-1)
    },
    "lastSeenOpTimeAtElection" : {
      "ts" : Timestamp(1234567880, 1),
      "t" : NumberLong(-1)
    },
    "numVotesNeeded" : 2,
    "priorityAtElection" : 1,
    "electionTimeoutMillis" : NumberLong(10000),
    "numCatchUpOps" : NumberLong(0),
    "newTermStartDate" : ISODate("2024-01-01T00:00:00Z"),
    "wMajorityWriteAvailabilityDate" : ISODate("2024-01-01T00:00:00Z")
  },
  "members" : [
    {
      "_id" : 0,
      "name" : "192.168.1.10:27017",
      "health" : 1,
      "state" : 1,
      "stateStr" : "PRIMARY",
      "uptime" : 3600,
      "optime" : {
        "ts" : Timestamp(1234567890, 1),
        "t" : NumberLong(1)
      },
      "optimeDate" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeatRecv" : ISODate("2024-01-01T00:00:00Z"),
      "pingMs" : NumberLong(1),
      "lastHeartbeatMessage" : "",
      "syncSourceHost" : "",
      "syncSourceId" : -1,
      "infoMessage" : "",
      "electionTime" : Timestamp(1234567880, 1),
      "electionDate" : ISODate("2024-01-01T00:00:00Z"),
      "configVersion" : 1,
      "configTerm" : 1,
      "self" : true,
      "lastCommittedOpTime" : {
        "ts" : Timestamp(1234567890, 1),
        "t" : NumberLong(1)
      },
      "lastCommittedWallTime" : ISODate("2024-01-01T00:00:00Z")
    },
    {
      "_id" : 1,
      "name" : "192.168.1.11:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 3600,
      "optime" : {
        "ts" : Timestamp(1234567800, 1),
        "t" : NumberLong(1)
      },
      "optimeDate" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeatRecv" : ISODate("2024-01-01T00:00:00Z"),
      "pingMs" : NumberLong(1),
      "lastHeartbeatMessage" : "",
      "syncSourceHost" : "192.168.1.10:27017",
      "syncSourceId" : 0,
      "infoMessage" : "",
      "configVersion" : 1,
      "configTerm" : 1,
      "lastCommittedOpTime" : {
        "ts" : Timestamp(1234567800, 1),
        "t" : NumberLong(1)
      },
      "lastCommittedWallTime" : ISODate("2024-01-01T00:00:00Z")
    },
    {
      "_id" : 2,
      "name" : "192.168.1.12:27017",
      "health" : 1,
      "state" : 2,
      "stateStr" : "SECONDARY",
      "uptime" : 3600,
      "optime" : {
        "ts" : Timestamp(1234567800, 1),
        "t" : NumberLong(1)
      },
      "optimeDate" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeat" : ISODate("2024-01-01T00:00:00Z"),
      "lastHeartbeatRecv" : ISODate("2024-01-01T00:00:00Z"),
      "pingMs" : NumberLong(1),
      "lastHeartbeatMessage" : "",
      "syncSourceHost" : "192.168.1.10:27017",
      "syncSourceId" : 0,
      "infoMessage" : "",
      "configVersion" : 1,
      "configTerm" : 1,
      "lastCommittedOpTime" : {
        "ts" : Timestamp(1234567800, 1),
        "t" : NumberLong(1)
      },
      "lastCommittedWallTime" : ISODate("2024-01-01T00:00:00Z")
    }
  ],
  "ok" : 1
}
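
Replication lag can be read off this output by comparing the `optime` of the primary with that of each secondary: the first argument of `Timestamp(...)` is seconds since the Unix epoch. A minimal sketch of that subtraction, with a hypothetical function name:

```shell
# Hypothetical helper: lag in seconds between two optime timestamps.
# Usage: repl_lag_secs PRIMARY_OPTIME_SECS SECONDARY_OPTIME_SECS
repl_lag_secs() {
    echo $(( $1 - $2 ))
}

# Primary optime vs. secondary optime from the rs.status() output above
repl_lag_secs 1234567890 1234567800
```

For the sample values, both secondaries are 90 seconds behind the primary.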

# Check replication lag

mongo -u fgedu -p password123 --eval 'rs.printSecondaryReplicationInfo()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
source: 192.168.1.11:27017
syncedTo: Thu Jan 01 2024 00:00:00 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary
source: 192.168.1.12:27017
syncedTo: Thu Jan 01 2024 00:00:00 GMT+0000 (UTC)
0 secs (0 hrs) behind the primary

# Check the oplog size and the time window it covers

mongo -u fgedu -p password123 --eval 'db.getReplicationInfo()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "logSizeMB" : 1024,
  "usedMB" : 512,
  "timeDiff" : 3600,
  "timeDiffHours" : 1,
  "tFirst" : "Thu Jan 01 2024 00:00:00 GMT+0000 (UTC)",
  "tLast" : "Thu Jan 01 2024 01:00:00 GMT+0000 (UTC)",
  "now" : "Thu Jan 01 2024 01:00:00 GMT+0000 (UTC)"
}
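
`usedMB` and `timeDiff` together give a rough write rate, which can be used to estimate how large the oplog must be to cover a desired time window. The sketch below assumes the write rate stays constant; the function name `oplog_mb_for_window` and the 4-hour target are illustrative assumptions.

```shell
# Hypothetical helper: oplog size (MB) needed to cover a target window,
# assuming the write rate observed in the current window stays the same.
# Usage: oplog_mb_for_window USED_MB TIME_DIFF_SECS TARGET_SECS
oplog_mb_for_window() {
    # round up so the estimate never undershoots
    echo $(( ($1 * $3 + $2 - 1) / $2 ))
}

# 512 MB covers 3600 s in the output above; estimate a 4-hour (14400 s) window
oplog_mb_for_window 512 3600 14400
```

Write rates usually vary over the day, so an estimate like this is best taken from a peak period and padded generously.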

# Increase the oplog size (size is given in MB)

mongo -u fgedu -p password123 --eval 'db.adminCommand({ replSetResizeOplog: 1, size: 2048 })'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "ok" : 1 }

4.3 Handling Sharding Failures

Case: the balancer fails to balance chunks

# Check whether the balancer is enabled

mongo -u fgedu -p password123 --eval 'sh.getBalancerState()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
true

# Raise the sharding log verbosity so that balancer activity is logged in more detail

mongo -u fgedu -p password123 --eval 'db.adminCommand({ setParameter: 1, logComponentVerbosity: { sharding: { verbosity: 2 } } })'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "was" : { "sharding" : { "balancer" : 0 } }, "ok" : 1 }

# View the balancer entries in the log

tail -n 100 /mongodb/fgdata/log/mongod.log | grep "Balancer"

2024-01-01T00:00:00.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:00:30.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:01:00.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:01:30.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:02:00.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:02:30.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:03:00.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:03:30.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:04:00.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted
2024-01-01T00:04:30.000+0000 I SHARDING [Balancer] Balancer round completed in 0ms. 0 migrations attempted

# Check the sharding status

mongo -u fgedu -p password123 --eval 'sh.status()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("1234567890abcdef12345678")
  }
  shards:
    { "_id" : "rs0", "host" : "rs0/192.168.1.10:27017,192.168.1.11:27017,192.168.1.12:27017" }
    { "_id" : "rs1", "host" : "rs1/192.168.1.13:27017,192.168.1.14:27017,192.168.1.15:27017" }
    { "_id" : "rs2", "host" : "rs2/192.168.1.16:27017,192.168.1.17:27017,192.168.1.18:27017" }
  active mongoses:
    "6.0.0" : 2
  autosplit:
    enabled: true
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      No recent migrations
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "config", "partitioned" : true, "primary" : "config" }
    { "_id" : "fgedudb", "partitioned" : true, "primary" : "rs0" }
      fgedudb.fgedu_users
        documents per chunk: 100000
        chunks:
          on shard rs0: 10
          on shard rs1: 2
          on shard rs2: 2
        shard key: { "_id" : "hashed" }
        unique: false
        balances: true
# Start the balancer manually

mongo -u fgedu -p password123 --eval 'sh.startBalancer()'

{ "ok" : 1 }

# Check whether the balancer is enabled

mongo -u fgedu -p password123 --eval 'sh.getBalancerState()'

true

# Wait a while, then check the sharding status again

sleep 60 && mongo -u fgedu -p password123 --eval 'sh.status()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("1234567890abcdef12345678")
  }
  shards:
    { "_id" : "rs0", "host" : "rs0/192.168.1.10:27017,192.168.1.11:27017,192.168.1.12:27017" }
    { "_id" : "rs1", "host" : "rs1/192.168.1.13:27017,192.168.1.14:27017,192.168.1.15:27017" }
    { "_id" : "rs2", "host" : "rs2/192.168.1.16:27017,192.168.1.17:27017,192.168.1.18:27017" }
  active mongoses:
    "6.0.0" : 2
  autosplit:
    enabled: true
  balancer:
    Currently enabled: yes
    Currently running: no
    Failed balancer rounds in last 5 attempts: 0
    Migration Results for the last 24 hours:
      10 migrations completed successfully
      0 migrations failed
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "config", "partitioned" : true, "primary" : "config" }
    { "_id" : "fgedudb", "partitioned" : true, "primary" : "rs0" }
      fgedudb.fgedu_users
        documents per chunk: 100000
        chunks:
          on shard rs0: 4
          on shard rs1: 4
          on shard rs2: 4
        shard key: { "_id" : "hashed" }
        unique: false
        balances: true

4.4 Handling Performance Failures

Case: MongoDB performance degradation

# Check MongoDB performance metrics

mongo -u fgedu -p password123 --eval 'db.serverStatus()'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{
  "host" : "fgedu.net.cn",
  "version" : "6.0.0",
  "process" : "mongod",
  "pid" : NumberLong(1234),
  "uptime" : 3600,
  "uptimeMillis" : NumberLong(3600000),
  "uptimeEstimate" : NumberLong(3600),
  "localTime" : ISODate("2024-01-01T00:00:00Z"),
  "asserts" : {
    "regular" : 0,
    "warning" : 0,
    "msg" : 0,
    "user" : 0,
    "rollovers" : 0
  },
  "connections" : {
    "current" : 100,
    "available" : 65436,
    "totalCreated" : NumberLong(1000)
  },
  "extra_info" : {
    "note" : "fields vary by platform",
    "versionString" : "#1 SMP Thu Oct 27 02:04:03 UTC 2022"
  },
  "network" : {
    "bytesIn" : NumberLong(1000000),
    "bytesOut" : NumberLong(2000000),
    "numRequests" : NumberLong(10000)
  },
  "opcounters" : {
    "insert" : NumberLong(1000),
    "query" : NumberLong(5000),
    "update" : NumberLong(500),
    "delete" : NumberLong(100),
    "getmore" : NumberLong(500),
    "command" : NumberLong(3000)
  },
  "opcountersRepl" : {
    "insert" : NumberLong(1000),
    "query" : NumberLong(0),
    "update" : NumberLong(500),
    "delete" : NumberLong(100),
    "getmore" : NumberLong(0),
    "command" : NumberLong(0)
  },
  "mem" : {
    "bits" : 64,
    "resident" : 16384,
    "virtual" : 32768,
    "supported" : true,
    "mapped" : 0,
    "mappedWithJournal" : 0
  },
  "storageEngine" : {
    "name" : "wiredTiger",
    "supportsCommittedReads" : true,
    "readOnly" : false,
    "persistent" : true,
    "backupCursorOpen" : false,
    "logicalSessionRecordCount" : 0,
    "transactionRecordCount" : 0,
    "checkpointGeneration" : NumberLong(1234),
    "maximumConcurrentTransactions" : 128,
    "currentConcurrentTransactions" : {
      "read" : 0,
      "write" : 0
    },
    "blockCompressor" : "snappy"
  },
  "wiredTiger" : {
    "metadata" : {
      "formatVersion" : 1
    },
    "session" : {
      "cursorsInUse" : 0,
      "sessionCount" : 10
    },
    "cache" : {
      "trackedDirtyBytesInCache" : NumberLong(1000000),
      "bytes currently in the cache" : NumberLong(8589934592),
      "maximum bytes configured" : NumberLong(16106127360),
      "bytes read into cache" : NumberLong(1000000000),
      "bytes written from cache" : NumberLong(500000000),
      "pages read into cache" : NumberLong(100000),
      "pages written from cache" : NumberLong(50000)
    }
  },
  "ok" : 1
}
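
Within this output, the `wiredTiger.cache` section shows how full the cache is: divide "bytes currently in the cache" by "maximum bytes configured". A minimal sketch of that calculation, with a hypothetical function name:

```shell
# Hypothetical helper: integer percentage of PART relative to TOTAL, rounded down.
# Usage: pct_of PART TOTAL
pct_of() {
    echo $(( $1 * 100 / $2 ))
}

# "bytes currently in the cache" vs. "maximum bytes configured" from the output above
pct_of 8589934592 16106127360
```

For the sample values the cache is 53% full (8 GiB of a configured 15 GiB). The same helper works for the dirty-bytes figure.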

# Check for slow queries recorded by the profiler

mongo -u fgedu -p password123 --eval 'db.getSiblingDB("fgedudb").system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(5)'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "ts" : ISODate("2024-01-01T00:00:00Z"), "op" : "query", "ns" : "fgedudb.fgedu_users", "command" : { "find" : "fgedu_users", "filter" : { "name" : "fgedu01" }, "lsid" : { "id" : UUID("12345678-1234-1234-1234-123456789012") } }, "keysExamined" : 0, "docsExamined" : 1000000, "cursorExhausted" : true, "numYields" : 1000, "nreturned" : 1, "locks" : { "Global" : { "acquireCount" : { "r" : NumberLong(2) } }, "Database" : { "acquireCount" : { "r" : NumberLong(1) } }, "Collection" : { "acquireCount" : { "r" : NumberLong(1) } } }, "flowControl" : { "acquireCount" : NumberLong(0) }, "responseLength" : 1000, "protocol" : "op_msg", "millis" : 1000, "planSummary" : "COLLSCAN", "execStats" : { "stage" : "COLLSCAN", "filter" : { "name" : "fgedu01" }, "nReturned" : 1, "executionTimeMillisEstimate" : 900, "works" : 1000002, "advanced" : 1, "needTime" : 1000001, "needYield" : 1000, "saveState" : 1000, "restoreState" : 1000, "isEOF" : 1, "invalidates" : 0, "direction" : "forward", "docsExamined" : 1000000 }, "tsms" : NumberLong(1234567890123), "client" : "192.168.1.100", "appName" : "MongoDB Shell", "allUsers" : [ { "user" : "fgedu", "db" : "admin" } ], "user" : "fgedu@admin" }
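
One quick way to read a profiler entry is to compare `docsExamined` with `nreturned`. A large ratio combined with `planSummary` COLLSCAN usually points to a missing index. The helper below is a sketch with a hypothetical name:

```shell
# Hypothetical helper: documents examined per document returned.
# Usage: scan_ratio DOCS_EXAMINED NRETURNED
scan_ratio() {
    if [ "$2" -eq 0 ]; then
        echo "$1"              # nothing returned: report raw docsExamined
    else
        echo $(( $1 / $2 ))
    fi
}

# docsExamined and nreturned from the profiler entry above
scan_ratio 1000000 1
```

Here the server examined one million documents to return a single one, which is why an index on the filtered field is the natural fix.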

# Create an index on the name field

mongo -u fgedu -p password123 --eval 'db.getSiblingDB("fgedudb").fgedu_users.createIndex({ name: 1 })'

MongoDB shell version v6.0.0
connecting to: mongodb://127.0.0.1:27017/?compressors=disabled&gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("12345678-1234-1234-1234-123456789012") }
MongoDB server version: 6.0.0
{ "createdCollectionAutomatically" : false, "numIndexesBefore" : 2, "numIndexesAfter" : 3, "ok" : 1 }

# Adjust the WiredTiger cache size

sed -i 's/cacheSizeGB.*/cacheSizeGB: 24/' /etc/mongod.conf

# Restart the MongoDB service

systemctl restart mongod

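Part05-Fengge's Lessons Learned

5.1 Failure Handling Best Practices

  • Respond quickly: set up a rapid response mechanism so failures are handled promptly
  • Diagnose accurately: use monitoring tools and log analysis to pinpoint the failure
  • Isolate the failure: contain it so that it does not spread
  • Back up first: make sure the data is backed up before attempting a fix
  • Record the process: document each step of the handling for later analysis
  • Review regularly: summarize lessons from past failures to improve future handling

Fengge's tip: failure handling calls for calm analysis and adherence to the established process. Acting blindly can make the failure worse.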

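5.2 Solutions to Common Failures

Connection failures

Problem: too many connections

Solution: raise the maxIncomingConnections parameter, optimize queries, and use connection pooling

Replication failures

Problem: excessive replication lag

Solution: increase the oplog size so that lagging secondaries stay within the oplog window, improve the network, and check hardware resources

Sharding failures

Problem: the balancer fails to balance chunks

Solution: check network connectivity, restart the balancer, and revise the shard key

Performance failures

Problem: slow queries

Solution: create indexes, optimize query statements, and adjust the cache size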

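5.3 Failure Prevention Recommendations

Failure prevention recommendations:

  • Build a monitoring system: monitor MongoDB with tools such as Prometheus + Grafana
  • Define a backup strategy: back up data regularly to keep it safe
  • Tune configuration parameters: adjust the MongoDB configuration to the actual workload
  • Maintain regularly: perform system maintenance and checks on a schedule
  • Train the team: strengthen training to improve failure handling skills
  • Prepare contingency plans: write detailed plans to improve emergency response

A sound failure prevention and handling mechanism reduces the number of MongoDB production failures and improves system stability and reliability. When a failure does occur, the team can respond and resolve it quickly, keeping the impact on the business to a minimum.

This article was compiled and published by 风哥教程 (Fengge Tutorials) for learning and testing purposes only. When reposting, please credit the source: http://www.fgedu.net.cn/10327.html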
