1. 首页 > 国产数据库教程 > openGauss教程 > 正文

opengauss教程FG069-openGauss锁等待与死锁处理生产实战解析

内容简介:本文深入讲解openGauss数据库锁等待与死锁处理的原理、监控方法和解决方案。风哥教程参考openGauss官方文档openGauss6系统管理员手册、openGauss6性能调优指南,帮助DBA掌握生产环境中锁问题的诊断与处理技巧,确保数据库稳定运行。

目录大纲

Part01-基础概念与理论知识

1.1 openGauss数据库锁机制原理

openGauss数据库采用多版本并发控制(MVCC)机制,在保证数据一致性的同时提供高并发访问能力。锁机制是数据库并发控制的核心,用于协调多个事务对同一资源的访问。

锁机制核心原理:
1. 事务在访问数据时会自动获取相应的锁
2. 锁的粒度可以是表级、行级或页级
3. 锁之间存在兼容性规则,不兼容的锁会导致等待
4. 死锁检测器定期检查并解除死锁

1.2 锁等待与死锁的区别

理解锁等待与死锁的区别对于问题诊断至关重要:

锁等待(Lock Waiting):
– 事务A持有锁,事务B请求不兼容的锁
– 事务B进入等待状态,直到事务A释放锁
– 是正常现象,但长时间等待会影响性能

风哥提示:
死锁(Deadlock):
– 两个或多个事务互相等待对方持有的锁
– 形成循环等待,无法自行解除
– 必须由数据库死锁检测器介入解除

1.3 openGauss锁类型详解

openGauss支持多种锁类型,用于不同的操作场景:

— 查看openGauss支持的锁类型
SELECT locktype, mode, granted
FROM pg_locks
WHERE locktype IS NOT NULL
GROUP BY locktype, mode, granted
ORDER BY locktype, mode;

locktype | mode | granted
——————–+—————–+———
relation | AccessShareLock | t
relation | RowShareLock | t
relation | RowExclusiveLock| t
relation | ShareLock | t
relation | ShareRowExclusiveLock | t
relation | ExclusiveLock | t
relation | AccessExclusiveLock | t
tuple | ExclusiveLock | t
transactionid | ExclusiveLock | t
学习交流加群风哥微信: itpux-com
virtualxid | ExclusiveLock | t
(10 rows)

主要锁类型说明:

  • AccessShareLock:SELECT操作获取,与其他AccessShareLock兼容
  • RowShareLock:SELECT FOR UPDATE/FOR SHARE获取
  • RowExclusiveLock:INSERT/UPDATE/DELETE获取
  • ShareLock:CREATE INDEX获取
  • AccessExclusiveLock:ALTER TABLE/DROP TABLE/TRUNCATE获取,最严格的锁

Part02-生产环境规划与建议

2.1 锁监控体系设计

建立完善的锁监控体系是预防锁问题的关键:

锁监控体系要点:
1. 实时监控锁等待情况,设置告警阈值
2. 定期分析死锁日志,识别业务模式问题
3. 建立锁等待历史记录,用于趋势分析
4. 配置自动化处理脚本,处理长时间等待

2.2 死锁预防策略

预防死锁比处理死锁更重要:

死锁预防策略:
1. 统一访问顺序:所有事务按照相同的顺序访问资源
2. 缩短事务:减少事务持有锁的时间
3. 使用乐观锁:适合冲突较少的场景
4. 设置超时:使用lock_timeout参数
5. 应用层控制:在应用代码中实现重试机制

Part03-生产环境项目实施方案

学习交流加群风哥QQ113257174

3.1 锁等待监控配置

配置锁等待监控参数:

# gsql -h 192.168.1.10 -d fgedudb -U fgedu -W fgedu_password

— 查看当前锁等待相关参数
SHOW lock_timeout;

SHOW deadlock_timeout;

SHOW log_lock_waits;

lock_timeout
————–
0
(1 row)

deadlock_timeout
——————
1s
(1 row)

log_lock_waits
—————-
off
(1 row)

— 配置锁等待监控参数
ALTER SYSTEM SET lock_timeout = ’30s’;

ALTER SYSTEM SET log_lock_waits = on;

ALTER SYSTEM SET log_min_duration_statement = 1000;

— 重新加载配置
SELECT pg_reload_conf();

pg_reload_conf
—————-
t
(1 row)更多视频教程www.fgedu.net.cn

3.2 死锁检测参数设置

配置死锁检测相关参数:

— 查看死锁检测参数
SHOW deadlock_timeout;

SHOW log_deadlocks;

SHOW log_min_messages;

deadlock_timeout
——————
1s
(1 row)

log_deadlocks
—————
off
(1 row)

log_min_messages
——————
warning
(1 row)

— 启用死锁日志记录
ALTER SYSTEM SET log_deadlocks = on;

ALTER SYSTEM SET log_min_messages = ‘warning’;

— 重新加载配置
SELECT pg_reload_conf();

pg_reload_conf
—————-
t
(1 row)更多学习教程公众号风哥教程itpux_com

Part04-生产案例与实战讲解

4.1 锁等待查询与分析

实战演示如何查询和分析锁等待情况:

# gsql -h 192.168.1.10 -d fgedudb -U fgedu -W fgedu_password

— 查询当前锁等待情况
SELECT
w.pid AS waiting_pid,
w.usename AS waiting_user,
w.query AS waiting_query,
w.state AS waiting_state,
b.pid AS blocking_pid,
b.usename AS blocking_user,
b.query AS blocking_query,
b.state AS blocking_state,
l.locktype,
l.mode AS waiting_mode,
l.relation::regclass AS table_name,
now() – w.query_start AS waiting_duration
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
from DB视频:www.itpux.com
AND l.granted = false
AND l2.granted = true
ORDER BY waiting_duration DESC;

waiting_pid | waiting_user | waiting_query | waiting_state | blocking_pid | blocking_user | blocking_query | blocking_state | locktype | waiting_mode | table_name | waiting_duration
————-+————–+—————+—————+————–+—————+—————-+—————-+———-+————–+————+——————
15234 | fgedu01 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001; | active | 15201 | fgedu02 | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
| idle in transaction | relation | RowExclusiveLock | fgedu_orders | 00:02:35
15256 | fgedu03 | DELETE FROM fgedu_inventory WHERE product_id = 5001; | active | 15234 | fgedu01 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;
| active | relation | RowExclusiveLock | fgedu_inventory | 00:01:20
(2 rows)

— 查询锁等待统计信息
SELECT
l.locktype,
l.relation::regclass AS table_name,
l.mode,
l.granted,
a.usename,
a.application_name,
a.client_addr,
a.state,
a.query
FROM pg_locks l
JOIN pg_stat_activity a ON l.pid = a.pid
WHERE l.granted = false
ORDER BY l.pid;

locktype | table_name | mode | granted | usename | application_name | client_addr | state | query
———-+————+——+———+———+——————+————-+——-+——-
relation | fgedu_orders | RowExclusiveLock | f | fgedu01 | psql | 192.168.1.20 | active | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;

relation | fgedu_inventory | RowExclusiveLock | f | fgedu03 | psql | 192.168.1.21 | active | DELETE FROM fgedu_inventory WHERE product_id = 5001;

(2 rows)

4.2 死锁模拟与诊断

模拟死锁场景并进行诊断:

警告:死锁模拟请在测试环境进行,不要在生产环境执行!

— 会话1:创建测试表
CREATE TABLE fgedu_deadlock_test_a (
id INT PRIMARY KEY,
value VARCHAR(50)
);

CREATE TABLE fgedu_deadlock_test_b (
id INT PRIMARY KEY,
value VARCHAR(50)
);

INSERT INTO fgedu_deadlock_test_a VALUES (1, ‘A1’), (2, ‘A2’);

INSERT INTO fgedu_deadlock_test_b VALUES (1, ‘B1’), (2, ‘B2’);

CREATE TABLE
CREATE TABLE
INSERT 0 2
INSERT 0 2

— 会话1:开始事务并锁定表A
BEGIN;
UPDATE fgedu_deadlock_test_a SET value = ‘Updated A1’ WHERE id = 1;

BEGIN
UPDATE 1

— 会话2:开始事务并锁定表B
BEGIN;
UPDATE fgedu_deadlock_test_b SET value = ‘Updated B1’ WHERE id = 1;

BEGIN
UPDATE 1

— 会话1:尝试更新表B(将等待会话2)
UPDATE fgedu_deadlock_test_b SET value = ‘A updates B’ WHERE id = 1;

— 此时会话1进入等待状态

— 会话2:尝试更新表A(将形成死锁)
UPDATE fgedu_deadlock_test_a SET value = ‘B updates A’ WHERE id = 1;

ERROR: deadlock detected
DETAIL: Process 15234 waits for ShareLock on transaction 12345; blocked by process 15256.
Process 15256 waits for ShareLock on transaction 12346; blocked by process 15234.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,1) in relation “fgedu_deadlock_test_a”

— 查看死锁信息
SELECT
pid,
usename,
application_name,
state,
query,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE wait_event_type = ‘Lock’;

pid | usename | application_name | state | query | wait_event_type | wait_event
——-+———+——————+——–+——-+—————–+————
15234 | fgedu01 | psql | active | UPDATE fgedu_deadlock_test_b SET value = ‘A updates B’ WHERE id = 1;
| Lock | transactionid
(1 row)

4.3 锁问题解决方案

实战演示锁问题的解决方案:

— 方案1:终止阻塞会话
— 首先确认阻塞会话的PID
SELECT pid, usename, application_name, state, query
FROM pg_stat_activity
WHERE pid IN (
SELECT DISTINCT blocking_pid
FROM (
SELECT
w.pid AS waiting_pid,
b.pid AS blocking_pid
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
AND l.granted = false
AND l2.granted = true
) t
);

pid | usename | application_name | state | query
——-+———+——————+——-+——-
15201 | fgedu02 | psql | idle in transaction | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;

(1 row)

— 终止阻塞会话
SELECT pg_terminate_backend(15201);

pg_terminate_backend
———————-
t
(1 row)

— 方案2:设置锁等待超时
— 会话级别设置
SET lock_timeout = ’10s’;

— 测试锁等待超时
BEGIN;
UPDATE fgedu_orders SET status = ‘test’ WHERE order_id = 10001;

SET
BEGIN
ERROR: canceling statement due to lock timeout
CONTEXT: while updating tuple (0,15) in relation “fgedu_orders”

— 方案3:查询长时间未提交的事务
SELECT
pid,
usename,
application_name,
state,
query,
xact_start,
now() – xact_start AS xact_duration,
query_start,
now() – query_start AS query_duration
FROM pg_stat_activity
WHERE state = ‘idle in transaction’
AND now() – xact_start > interval ‘5 minutes’
ORDER BY xact_start;

pid | usename | application_name | state | query | xact_start | xact_duration | query_start | query_duration
——-+———+——————+——-+——-+————+—————+————-+—————-
15201 | fgedu02 | psql | idle in transaction | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;
| 2024-01-15 10:30:25 | 00:15:30 | 2024-01-15 10:30:26 | 00:15:29
15289 | fgedu04 | app_server | idle in transaction | SELECT * FROM fgedu_inventory WHERE product_id = 5001;
| 2024-01-15 10:25:10 | 00:20:45 | 2024-01-15 10:25:11 | 00:20:44
(2 rows)

4.4 自动化锁监控脚本

编写自动化锁监控脚本:

# cat /opengauss/scripts/lock_monitor.sh

#!/bin/bash
# lock_monitor.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: `http://www.fgedu.net.cn`
# openGauss锁监控脚本

DB_IP=”192.168.1.10″
DB_PORT=”5432″
DB_NAME=”fgedudb”
DB_USER=”fgedu”
LOG_FILE=”/opengauss/log/lock_monitor.log”
ALERT_THRESHOLD=300 # 锁等待告警阈值(秒)

log() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a $LOG_FILE
}

# 检查锁等待
check_lock_waits() {
log “=== 检查锁等待情况 ===”

gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT
w.pid AS waiting_pid,
w.usename AS waiting_user,
b.pid AS blocking_pid,
b.usename AS blocking_user,
l.relation::regclass AS table_name,
EXTRACT(EPOCH FROM (now() – w.query_start))::int AS wait_seconds,
w.query AS waiting_query
FROM pg_stat_activity w
JOIN pg_locks l ON w.pid = l.pid
JOIN pg_locks l2 ON l.locktype = l2.locktype
AND l.relation = l2.relation
AND l.pid != l2.pid
JOIN pg_stat_activity b ON l2.pid = b.pid
WHERE w.waiting = true
AND l.granted = false
AND l2.granted = true
ORDER BY wait_seconds DESC;
” | while read line; do
if [ -n “$line” ]; then
log “发现锁等待: $line”

# 提取等待时间
wait_seconds=$(echo $line | awk -F’|’ ‘{print $6}’ | tr -d ‘ ‘)

# 超过阈值则告警
if [ “$wait_seconds” -gt “$ALERT_THRESHOLD” ]; then
blocking_pid=$(echo $line | awk -F’|’ ‘{print $3}’ | tr -d ‘ ‘)
log “告警: 锁等待超过${ALERT_THRESHOLD}秒,阻塞会话PID: $blocking_pid”
fi
fi
done
}

# 检查长时间未提交的事务
check_long_transactions() {
log “=== 检查长时间未提交的事务 ===”

gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT
pid,
usename,
application_name,
EXTRACT(EPOCH FROM (now() – xact_start))::int AS xact_seconds,
query
FROM pg_stat_activity
WHERE state = ‘idle in transaction’
AND now() – xact_start > interval ’10 minutes’
ORDER BY xact_start;
” | while read line; do
if [ -n “$line” ]; then
log “发现长时间未提交事务: $line”
fi
done
}

# 检查死锁
check_deadlocks() {
log “=== 检查死锁统计 ===”

deadlock_count=$(gsql -h $DB_IP -p $DB_PORT -d $DB_NAME -U $DB_USER -t -c “
SELECT deadlocks FROM pg_stat_database WHERE datname = ‘$DB_NAME’;

” | tr -d ‘ ‘)

log “数据库死锁总数: $deadlock_count”
}

# 主函数
main() {
log “开始锁监控检查…”

check_lock_waits
check_long_transactions
check_deadlocks

log “锁监控检查完成”
}

main

# chmod +x /opengauss/scripts/lock_monitor.sh
# /opengauss/scripts/lock_monitor.sh

[2024-01-15 14:30:00] 开始锁监控检查…
[2024-01-15 14:30:01] === 检查锁等待情况 ===
[2024-01-15 14:30:02] 发现锁等待: 15234 | fgedu01 | 15201 | fgedu02 | fgedu_orders | 185 | UPDATE fgedu_orders SET status = ‘processing’ WHERE order_id = 10001;

[2024-01-15 14:30:02] 告警: 锁等待超过300秒,阻塞会话PID: 15201
[2024-01-15 14:30:03] === 检查长时间未提交的事务 ===
[2024-01-15 14:30:04] 发现长时间未提交事务: 15201 | fgedu02 | psql | 930 | UPDATE fgedu_orders SET status = ‘shipped’ WHERE order_id = 10001;

[2024-01-15 14:30:05] === 检查死锁统计 ===
[2024-01-15 14:30:06] 数据库死锁总数: 15
[2024-01-15 14:30:06] 锁监控检查完成

# crontab -l

# 每5分钟执行一次锁监控
*/5 * * * * /opengauss/scripts/lock_monitor.sh > /dev/null 2>&1

Part05-风哥经验总结与分享

5.1 锁问题处理经验

风哥提示:锁问题是生产环境中最常见的性能问题之一,需要建立系统化的处理流程。

锁问题处理流程:
1. 快速识别:通过监控脚本或pg_stat_activity快速定位锁等待
2. 分析原因:确定是正常等待还是异常阻塞
3. 评估影响:判断影响范围和紧急程度
4. 采取措施:终止阻塞会话或优化SQL
5. 根因分析:事后分析避免再次发生

终止会话注意事项:
1. 优先终止idle in transaction状态的会话
2. 避免终止正在执行重要业务的active会话
3. 终止前确认会话的业务影响
4. 记录终止操作用于后续分析

5.2 性能优化建议

锁性能优化建议:
1. 索引优化:确保UPDATE/DELETE语句使用索引,减少锁持有时间
2. 事务控制:尽量缩短事务长度,及时提交或回滚
3. 批量操作:大批量操作分批执行,减少单次锁持有时间
4. 锁粒度:根据业务场景选择合适的锁粒度
5. 应用设计:应用层实现锁等待重试机制
6. 监控告警:建立锁等待监控告警体系

死锁预防最佳实践:
1. 所有事务按照固定的顺序访问资源
2. 避免在事务中等待用户输入
3. 使用NOWAIT选项快速失败
4. 设置合理的lock_timeout参数
5. 定期检查死锁日志,优化业务逻辑

总结:本文详细介绍了openGauss数据库锁等待与死锁处理的原理、监控方法和解决方案。通过建立完善的锁监控体系、配置合理的参数、编写自动化脚本,可以有效预防和处理锁问题,确保数据库稳定高效运行。

本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html

联系我们

在线咨询:点击这里给我发消息

微信号:itpux-com

工作日:9:30-18:30,节假日休息