opengauss教程FG069-openGauss锁等待与死锁处理生产实战解析
目录大纲
- Part01-基础概念与理论知识
- 1.1 openGauss数据库锁机制原理
- 1.2 锁等待与死锁的区别
- 1.3 openGauss锁类型详解
- Part02-生产环境规划与建议
- 2.1 锁监控体系设计
- 2.2 死锁预防策略
- Part03-生产环境项目实施方案
- 3.1 锁等待监控配置
- 3.2 死锁检测参数设置
- Part04-生产案例与实战讲解
- 4.1 锁等待查询与分析
- 4.2 死锁模拟与诊断
- 4.3 锁问题解决方案
- 4.4 自动化锁监控脚本
- Part05-风哥经验总结与分享
- 5.1 锁问题处理经验
- 5.2 性能优化建议
Part01-基础概念与理论知识
1.1 openGauss数据库锁机制原理
openGauss数据库采用多版本并发控制(MVCC)机制,在保证数据一致性的同时提供高并发访问能力。锁机制是数据库并发控制的核心,用于协调多个事务对同一资源的访问。
1. 事务在访问数据时会自动获取相应的锁
2. 锁的粒度可以是表级、行级或页级
3. 锁之间存在兼容性规则,不兼容的锁会导致等待
4. 死锁检测器定期检查并解除死锁
1.2 锁等待与死锁的区别
理解锁等待与死锁的区别对于问题诊断至关重要:
– 事务A持有锁,事务B请求不兼容的锁
– 事务B进入等待状态,直到事务A释放锁
– 是正常现象,但长时间等待会影响性能
风哥提示:
死锁(Deadlock):
– 两个或多个事务互相等待对方持有的锁
– 形成循环等待,无法自行解除
– 必须由数据库死锁检测器介入解除
1.3 openGauss锁类型详解
openGauss支持多种锁类型,用于不同的操作场景:
SELECT locktype, mode, granted
FROM pg_locks
WHERE locktype IS NOT NULL
GROUP BY locktype, mode, granted
ORDER BY locktype, mode;
——————–+—————–+———
relation | AccessShareLock | t
relation | RowShareLock | t
relation | RowExclusiveLock| t
relation | ShareLock | t
relation | ShareRowExclusiveLock | t
relation | ExclusiveLock | t
relation | AccessExclusiveLock | t
tuple | ExclusiveLock | t
transactionid | ExclusiveLock | t
学习交流加群风哥微信: itpux-com
virtualxid | ExclusiveLock | t
(10 rows)
主要锁类型说明:
- AccessShareLock:SELECT操作获取,与其他AccessShareLock兼容
- RowShareLock:SELECT FOR UPDATE/FOR SHARE获取
- RowExclusiveLock:INSERT/UPDATE/DELETE获取
- ShareLock:CREATE INDEX获取
- AccessExclusiveLock:ALTER TABLE/DROP TABLE/TRUNCATE获取,最严格的锁
Part02-生产环境规划与建议
2.1 锁监控体系设计
建立完善的锁监控体系是预防锁问题的关键:
1. 实时监控锁等待情况,设置告警阈值
2. 定期分析死锁日志,识别业务模式问题
3. 建立锁等待历史记录,用于趋势分析
4. 配置自动化处理脚本,处理长时间等待
2.2 死锁预防策略
预防死锁比处理死锁更重要:
1. 统一访问顺序:所有事务按照相同的顺序访问资源
2. 缩短事务:减少事务持有锁的时间
3. 使用乐观锁:适合冲突较少的场景
4. 设置超时:使用lock_timeout参数
5. 应用层控制:在应用代码中实现重试机制
Part03-生产环境项目实施方案
学习交流加群风哥QQ113257174
3.1 锁等待监控配置
配置锁等待监控参数:
— 查看当前锁等待相关参数
SHOW lock_timeout;
SHOW deadlock_timeout;
SHOW log_lock_waits;
————–
0
(1 row)
deadlock_timeout
——————
1s
(1 row)
log_lock_waits
—————-
off
(1 row)
ALTER SYSTEM SET lock_timeout = ’30s’;
ALTER SYSTEM SET log_lock_waits = on;
ALTER SYSTEM SET log_min_duration_statement = 1000;
— 重新加载配置
SELECT pg_reload_conf();
—————-
t
(1 row)更多视频教程www.fgedu.net.cn
3.2 死锁检测参数设置
配置死锁检测相关参数:
SHOW deadlock_timeout;
SHOW log_deadlocks;
SHOW log_min_messages;
——————
1s
(1 row)
log_deadlocks
—————
off
(1 row)
log_min_messages
——————
warning
(1 row)
ALTER SYSTEM SET log_deadlocks = on;
ALTER SYSTEM SET log_min_messages = ‘warning’;
— 重新加载配置
SELECT pg_reload_conf();
—————-
t
(1 row)更多学习教程公众号风哥教程itpux_com
Part04-生产案例与实战讲解
4.1 锁等待查询与分析
实战演示如何查询和分析锁等待情况:
— 查询当前锁等待情况
SELECT
w.pid AS waiting_pid,
w.usename AS waiting_user,
w.query AS waiting_query,
w.state AS waiting_state,
b.pid AS blocking_pid,
b.usename AS blocking_user,
b.query AS blocking_query,
b.state AS blocking_state,
l.locktype,
l.mode AS waiting_mode,
l.relation::regclass AS table_name,
now() – w.query_start AS waiting_duration
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
from DB视频:www.itpux.com
AND l.granted = false
AND l2.granted = true
ORDER BY waiting_duration DESC;
————-+————–+—————+—————+————–+—————+—————-+—————-+———-+————–+————+——————
15234 | fgedu01 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001; | active | 15201 | fgedu02 | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
| idle in transaction | relation | RowExclusiveLock | fgedu_orders | 00:02:35
15256 | fgedu03 | DELETE FROM fgedu_inventory WHERE product_id = 5001; | active | 15234 | fgedu01 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;
| active | relation | RowExclusiveLock | fgedu_inventory | 00:01:20
(2 rows)
SELECT
l.locktype,
l.relation::regclass AS table_name,
l.mode,
l.granted,
a.usename,
a.application_name,
a.client_addr,
a.state,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE l.granted = false
ORDER BY l.pid;
———-+————+——+———+———+——————+————-+——-+——-
relation | fgedu_orders | RowExclusiveLock | f | fgedu01 | psql | 192.168.1.20 | active | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;
relation | fgedu_inventory | RowExclusiveLock | f | fgedu03 | psql | 192.168.1.21 | active | DELETE FROM fgedu_inventory WHERE product_id = 5001;
(2 rows)
4.2 死锁模拟与诊断
模拟死锁场景并进行诊断:
CREATE TABLE fgedu_deadlock_test_a (
id INT PRIMARY KEY,
value VARCHAR(50)
);
CREATE TABLE fgedu_deadlock_test_b (
id INT PRIMARY KEY,
value VARCHAR(50)
);
INSERT INTO fgedu_deadlock_test_a VALUES (1, ‘A1’), (2, ‘A2’);
INSERT INTO fgedu_deadlock_test_b VALUES (1, ‘B1’), (2, ‘B2’);
CREATE TABLE
INSERT 0 2
INSERT 0 2
BEGIN;
UPDATE fgedu_deadlock_test_a SET value = ‘Updated A1’ WHERE id = 1;
UPDATE 1
BEGIN;
UPDATE fgedu_deadlock_test_b SET value = ‘Updated B1’ WHERE id = 1;
UPDATE 1
UPDATE fgedu_deadlock_test_b SET value = ‘A updates B’ WHERE id = 1;
UPDATE fgedu_deadlock_test_a SET value = ‘B updates A’ WHERE id = 1;
DETAIL: Process 15234 waits for ShareLock on transaction 12345; blocked by process 15256.
Process 15256 waits for ShareLock on transaction 12346; blocked by process 15234.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,1) in relation “fgedu_deadlock_test_a”
SELECT
pid,
usename,
application_name,
state,
query,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE wait_event_type = ‘Lock’;
——-+———+——————+——–+——-+—————–+————
15234 | fgedu01 | psql | active | UPDATE fgedu_deadlock_test_b SET value = ‘A updates B’ WHERE id = 1;
| Lock | transactionid
(1 row)
4.3 锁问题解决方案
实战演示锁问题的解决方案:
— 首先确认阻塞会话的PID
SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE pid IN (
SELECT DISTINCT blocking_pid
FROM (
SELECT
w.pid AS waiting_pid,
b.pid AS blocking_pid
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
AND l.granted = false
AND l2.granted = true
) t
);
——-+———+——————+——-+——-
15201 | fgedu02 | psql | idle in transaction | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
(1 row)
SELECT pg_terminate_backend(15201);
———————-
t
(1 row)
— 会话级别设置
SET lock_timeout = ’10s’;
— 测试锁等待超时
BEGIN;
UPDATE fgedu_orders SET status = ‘test’ WHERE order_id = 10001;
BEGIN
ERROR: canceling statement due to lock timeout
CONTEXT: while updating tuple (0,15) in relation “fgedu_orders”
SELECT
pid,
usename,
application_name,
state,
query,
xact_start,
now() – xact_start AS xact_duration,
query_start,
now() – query_start AS query_duration
FROM pg_stat_activity
WHERE state = ‘idle in transaction’
AND now() – xact_start > interval ‘5 minutes’
ORDER BY xact_start;
——-+———+——————+——-+——-+————+—————+————-+—————-
15201 | fgedu02 | psql | idle in transaction | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
| 2024-01-15 10:30:25 | 00:15:30 | 2024-01-15 10:30:26 | 00:15:29
15289 | fgedu04 | app_server | idle in transaction | SELECT * FROM fgedu_inventory WHERE product_id = 5001;
| 2024-01-15 10:25:10 | 00:20:45 | 2024-01-15 10:25:11 | 00:20:44
(2 rows)
4.4 自动化锁监控脚本
编写自动化锁监控脚本:
# lock_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# openGauss锁监控脚本
DB_IP=”192.168.1.10″
DB_PORT=”5432″
DB_NAME=”fgedudb”
DB_USER=”fgedu”
LOG_FILE=”/opengauss/log/lock_monitor.log”
ALERT_THRESHOLD=300 # 锁等待告警阈值(秒)
log() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a $LOG_FILE
}
# 检查锁等待
check_lock_waits() {
log “=== 检查锁等待情况 ===”
gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT
w.pid AS waiting_pid,
w.usename AS waiting_user,
b.pid AS blocking_pid,
b.usename AS blocking_user,
l.relation::regclass AS table_name,
EXTRACT(EPOCH FROM (now() – w.query_start))::int AS wait_seconds,
w.query AS waiting_query
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
AND l.granted = false
AND l2.granted = true
ORDER BY wait_seconds DESC;
” | while read line; do
if [ -n “$line” ]; then
log “发现锁等待: $line”
# 提取等待时间
wait_seconds=$(echo $line | awk -F’|’ ‘{print $6}’ | tr -d ‘ ‘)
# 超过阈值则告警
if [ “$wait_seconds” -gt “$ALERT_THRESHOLD” ]; then
blocking_pid=$(echo $line | awk -F’|’ ‘{print $3}’ | tr -d ‘ ‘)
log “告警: 锁等待超过${ALERT_THRESHOLD}秒,阻塞会话PID: $blocking_pid”
fi
fi
done
}
# 检查长时间未提交的事务
check_long_transactions() {
log “=== 检查长时间未提交的事务 ===”
gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT
pid,
usename,
application_name,
EXTRACT(EPOCH FROM (now() – xact_start))::int AS xact_seconds,
query
FROM pg_stat_activity
WHERE state = ‘idle in transaction’
AND now() – xact_start > interval ’10 minutes’
ORDER BY xact_start;
” | while read line; do
if [ -n “$line” ]; then
log “发现长时间未提交事务: $line”
fi
done
}
# 检查死锁
check_deadlocks() {
log “=== 检查死锁统计 ===”
deadlock_count=$(gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT deadlocks FROM pg_stat_database WHERE datname = ‘$DB_NAME’;
” | tr -d ‘ ‘)
log “数据库死锁总数: $deadlock_count”
}
# 主函数
main() {
log “开始锁监控检查…”
check_lock_waits
check_long_transactions
check_deadlocks
log “锁监控检查完成”
}
main
# /opengauss/scripts/lock_monitor.sh
[2024-01-15 14:30:01] === 检查锁等待情况 ===
[2024-01-15 14:30:02] 发现锁等待: 15234 | fgedu01 | 15201 | fgedu02 | fgedu_orders | 185 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;
[2024-01-15 14:30:02] 告警: 锁等待超过300秒,阻塞会话PID: 15201
[2024-01-15 14:30:03] === 检查长时间未提交的事务 ===
[2024-01-15 14:30:04] 发现长时间未提交事务: 15201 | fgedu02 | psql | 930 | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
[2024-01-15 14:30:05] === 检查死锁统计 ===
[2024-01-15 14:30:06] 数据库死锁总数: 15
[2024-01-15 14:30:06] 锁监控检查完成
*/5 * * * * /opengauss/scripts/lock_monitor.sh > /dev/null 2>&1
Part05-风哥经验总结与分享
5.1 锁问题处理经验
风哥提示:锁问题是生产环境中最常见的性能问题之一,需要建立系统化的处理流程。
1. 快速识别:通过监控脚本或pg_stat_activity快速定位锁等待
2. 分析原因:确定是正常等待还是异常阻塞
3. 评估影响:判断影响范围和紧急程度
4. 采取措施:终止阻塞会话或优化SQL
5. 根因分析:事后分析避免再次发生
1. 优先终止idle in transaction状态的会话
2. 避免终止正在执行重要业务的active会话
3. 终止前确认会话的业务影响
4. 记录终止操作用于后续分析
5.2 性能优化建议
1. 索引优化:确保UPDATE/DELETE语句使用索引,减少锁持有时间
2. 事务控制:尽量缩短事务长度,及时提交或回滚
3. 批量操作:大批量操作分批执行,减少单次锁持有时间
4. 锁粒度:根据业务场景选择合适的锁粒度
5. 应用设计:应用层实现锁等待重试机制
6. 监控告警:建立锁等待监控告警体系
1. 所有事务按照固定的顺序访问资源
2. 避免在事务中等待用户输入
3. 使用NOWAIT选项快速失败
4. 设置合理的lock_timeout参数
5. 定期检查死锁日志,优化业务逻辑
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
