PostgreSQL教程FG115-PG常用报错案例实战教程
本文档详细介绍PostgreSQL数据库常用报错的诊断和处理方法,包括连接错误、权限错误、存储错误、性能错误等,风哥教程参考PostgreSQL官方文档Error Codes内容,适合DBA在生产环境中快速定位和解决问题。
Part01-基础概念与理论知识
1.1 PostgreSQL错误类型
PostgreSQL错误分为多个类别,每个类别有对应的三位错误码。了解错误类型有助于快速定位问题原因。PostgreSQL错误码遵循SQL标准,由五位字符组成,前两位表示错误类别,后三位表示具体错误。更多视频教程www.fgedu.net.cn
- 00 – 成功完成
- 01 – 警告
- 02 – 无数据
- 03 – SQL语句未完成
- 08 – 连接异常
- 09 – 触发器动作异常
- 0A – 特性不支持
- 20 – 用例未找到
- 21 – 基数违规
- 22 – 数据异常
- 23 – 完整性约束违规
- 24 – 无效游标状态
- 25 – 无效事务状态
- 26 – 无效SQL语句名称
- 27 – 触发数据更改违规
- 28 – 无效授权规范
- 2B – 依赖特权描述符仍存在
- 2D – 无效事务终止
- 2F – SQL例程异常
- 34 – 无效游标名称
- 38 – 外部例程异常
- 39 – 外部例程调用异常
- 3B – 保存点异常
- 3D – 无效目录名称
- 3F – 无效模式名称
- 40 – 事务回滚
- 42 – 语法错误或访问规则违规
- 44 – WITH CHECK OPTION违规
- 53 – 资源不足
- 54 – 程序限制超出
- 55 – 对象不处于先决条件状态
- 57 – 操作员干预
- 58 – 系统错误
- 72 – 快照失败
- F0 – 配置文件错误
- HV – 外部数据包装器错误
- P0 – PL/pgSQL错误
- XX – 内部错误
1.2 PostgreSQL错误结构
# 1. 错误消息格式
# ERROR: 错误描述
# DETAIL: 详细信息
# HINT: 提示信息
# CONTEXT: 上下文信息
# STATEMENT: 导致错误的SQL语句
# 2. 错误示例
ERROR: relation “fgedu_table” does not exist
LINE 1: SELECT * FROM fgedu_table;
^
DETAIL: The table does not exist in the current schema.
HINT: Please check the table name or create the table first.
# 3. 错误码查询
# 查看所有错误码
$ psql -U fgedu -d fgedudb -c “SELECT * FROM pg_error ORDER BY code;”
# 4. 错误字段
# SQLSTATE: 标准错误码
# MESSAGE: 主要错误消息
# DETAIL: 详细错误信息
# HINT: 建议提示
# CONTEXT: 错误上下文
# SCHEMA: 相关模式名
# TABLE: 相关表名
# COLUMN: 相关列名
# DATATYPE: 相关数据类型
# CONSTRAINT: 相关约束名
1.3 PostgreSQL错误处理机制
PostgreSQL错误处理机制:
- 错误捕获:在PL/pgSQL中使用EXCEPTION块捕获错误
- 错误日志:错误信息记录到日志文件
- 错误传播:错误向上传播到客户端
- 事务回滚:错误导致事务回滚
Part02-生产环境规划与建议
2.1 PostgreSQL错误预防
# 1. 连接类错误预防
# – 合理配置max_connections
# – 使用连接池
# – 配置pg_hba.conf正确
# – 监控连接数使用率
# 2. 权限类错误预防
# – 遵循最小权限原则
# – 定期审计用户权限
# – 使用角色管理权限
# – 避免直接使用超级用户
# 3. 存储类错误预防
# – 监控磁盘空间
# – 配置表空间配额
# – 定期清理历史数据
# – 合理设置WAL保留
# 4. 性能类错误预防
# – 优化查询语句
# – 创建合适的索引
# – 定期更新统计信息
# – 监控锁等待
# 5. 复制类错误预防
# – 监控复制延迟
# – 配置复制槽
# – 定期验证复制状态
# – 保持网络稳定
2.2 PostgreSQL错误监控
PostgreSQL错误监控:
- 日志监控:实时监控日志文件中的错误
- 告警配置:配置关键错误的告警
- 统计收集:收集错误统计信息
- 趋势分析:分析错误发生趋势
2.3 PostgreSQL错误日志
# 1. 日志配置参数
$ psql -U fgedu -d fgedudb -c ”
SELECT name, setting, short_desc
FROM pg_settings
WHERE name LIKE ‘log_%’
ORDER BY name;
”
name | setting | short_desc
—————————-+—————+————————————————–
log_checkpoints | on | Logs each checkpoint.
log_connections | on | Logs each successful connection.
log_destination | csvlog | Sets the destination for server log output.
log_directory | pg_log | Sets the directory where log files are created.
log_disconnections | on | Logs end of a session, including duration.
log_duration | off | Logs the duration of each completed SQL statement.
log_error_verbosity | verbose | Sets the verbosity of logged messages.
log_filename | postgresql-%Y | Sets the file name pattern for log files.
log_hostname | off | Logs the host name in the connection logs.
log_line_prefix | %t [%p] %u@%d | Controls information prefixed to each log line.
log_lock_waits | on | Logs long lock waits.
log_min_duration_statement | 1000 | Sets the minimum execution time for logged stateme
log_min_error_statement | error | Sets the minimum message level for logged statemen
log_min_messages | warning | Sets the message levels that are logged.
log_rotation_age | 1440 | Automatic log file rotation will occur after N min
log_rotation_size | 10240 | Automatic log file rotation will occur after N kil
log_statement | ddl | Sets the type of statements logged.
log_temp_files | -1 | Log the use of temporary files larger than N kilob
log_timezone | Asia/Shanghai | Sets the time zone to use in log messages.
log_truncate_on_rotation | on | Truncate existing log files of same name during lo
# 2. 日志文件位置
$ ls -la /postgresql/fgdata/pg_log/
total 102400
drwx—— 2 pgsql fgedudb 4096 Apr 7 10:00 .
drwx—— 3 pgsql fgedudb 4096 Apr 1 10:00 ..
-rw——- 1 pgsql fgedudb 51200 Apr 7 10:00 postgresql-2026-04-07.csv
-rw——- 1 pgsql fgedudb 10240 Apr 7 10:00 postgresql-2026-04-07.log
# 3. 查看错误日志
$ grep -i error /postgresql/fgdata/pg_log/postgresql-2026-04-07.log
2026-04-07 10:00:00.123 CST [12345] fgedu@fgedudb ERROR: relation “fgedu_table” does not exist
2026-04-07 10:05:00.456 CST [12346] fgapp_usr@fgedudb ERROR: duplicate key value violates unique constraint
# 4. 日志分析工具
# 使用pgBadger分析日志
$ pgbadger /postgresql/fgdata/pg_log/postgresql-2026-04-07.log -o /tmp/report.html
Part03-生产环境项目实施方案
3.1 PostgreSQL连接类错误处理
3.1.1 连接数超限错误
# 错误信息
FATAL: sorry, too many clients already
DETAIL: There are already 300 clients connected.
HINT: You might need to increase max_connections.
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT
setting::int as max_connections,
(SELECT count(*) FROM pg_stat_activity) as current_connections
FROM pg_settings
WHERE name = ‘max_connections’;
”
max_connections | current_connections
—————–+———————
300 | 300
(1 row)
# 解决方案1:释放空闲连接
$ psql -U fgedu -d fgedudb -c ”
SELECT pg_terminate_backend(pid)
FROM pg_stat_activity
WHERE state = ‘idle’
AND now() – state_change > interval ’10 minutes’
AND usename NOT IN (‘fgedu’, ‘pgsql’);
”
pg_terminate_backend
———————-
t
t
(2 rows)
# 解决方案2:增加最大连接数
$ psql -U fgedu -d fgedudb -c “ALTER SYSTEM SET max_connections = 400;”
# 重启数据库生效
$ pg_ctl -D /postgresql/fgdata restart
# 解决方案3:使用连接池
# 安装PgBouncer
$ yum install -y pgbouncer
# 配置PgBouncer
$ vi /etc/pgbouncer/pgbouncer.ini
[databases]
fgedudb = host=127.0.0.1 port=5432 dbname=fgedudb
[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session
max_client_conn = 1000
default_pool_size = 25
# 启动PgBouncer
$ systemctl start pgbouncer
3.1.2 认证失败错误
# 错误信息
FATAL: password authentication failed for user “fgedu”
DETAIL: Role “fgedu” does not exist.
Connection matched pg_hba.conf line 95: “host all all 192.168.1.0/24 scram-sha-256”
# 问题分析
# 1. 检查用户是否存在
$ psql -U pgsql -d postgres -c “\du fgedu”
List of roles
Role name | Attributes | Member of
———–+————————————————————+———–
fgedu | Superuser, Create role, Create DB | {}
# 2. 检查pg_hba.conf配置
$ cat /postgresql/fgdata/pg_hba.conf | grep -v “^#” | grep -v “^$”
local all all trust
host all all 127.0.0.1/32 scram-sha-256
host all all 192.168.1.0/24 scram-sha-256
host replication all 192.168.1.0/24 scram-sha-256
# 解决方案1:重置密码
$ psql -U pgsql -d postgres -c “ALTER USER fgedu WITH PASSWORD ‘Fgedu@New2026’;”
ALTER ROLE
# 解决方案2:修改认证方式
$ vi /postgresql/fgdata/pg_hba.conf
# 将scram-sha-256改为md5或trust(不推荐生产环境)
host all all 192.168.1.0/24 md5
# 重载配置
$ pg_ctl -D /postgresql/fgdata reload
# 解决方案3:检查密码加密方式
$ psql -U fgedu -d fgedudb -c ”
SELECT rolname, rolpassword
FROM pg_authid
WHERE rolname = ‘fgedu’;
”
rolname | rolpassword
———+—————————————-
fgedu | SCRAM-SHA-256$4096:…
(1 row)
3.2 PostgreSQL权限类错误处理
3.2.1 权限不足错误
# 错误信息
ERROR: permission denied for table fgedu_orders
STATEMENT: SELECT * FROM fgedu_orders;
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT
grantee,
table_schema,
table_name,
privilege_type
FROM information_schema.table_privileges
WHERE table_name = ‘fgedu_orders’
AND grantee = ‘fgapp_usr’;
”
grantee | table_schema | table_name | privilege_type
———+————–+—————-+—————-
fgapp_usr | public | fgedu_orders | SELECT
(1 row)
# 解决方案1:授予SELECT权限
$ psql -U fgedu -d fgedudb -c “GRANT SELECT ON fgedu_orders TO fgapp_usr;”
GRANT
# 解决方案2:授予所有权限
$ psql -U fgedu -d fgedudb -c “GRANT ALL PRIVILEGES ON fgedu_orders TO fgapp_usr;”
GRANT
# 解决方案3:授予模式权限
$ psql -U fgedu -d fgedudb -c “GRANT USAGE ON SCHEMA public TO fgapp_usr;”
GRANT
# 解决方案4:授予默认权限
$ psql -U fgedu -d fgedudb -c ”
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT ON TABLES TO fgapp_usr;
”
ALTER DEFAULT PRIVILEGES
3.2.2 模式不存在错误
# 错误信息
ERROR: schema “fgedu_schema” does not exist
LINE 1: SELECT * FROM fgedu_schema.fgedu_table;
^
# 解决方案1:创建模式
$ psql -U fgedu -d fgedudb -c “CREATE SCHEMA fgedu_schema;”
CREATE SCHEMA
# 解决方案2:检查搜索路径
$ psql -U fgedu -d fgedudb -c “SHOW search_path;”
search_path
—————–
“$user”, public
(1 row)
# 解决方案3:设置搜索路径
$ psql -U fgedu -d fgedudb -c “SET search_path TO fgedu_schema, public;”
SET
# 解决方案4:永久设置搜索路径
$ psql -U fgedu -d fgedudb -c “ALTER USER fgapp_usr SET search_path TO fgedu_schema, public;”
ALTER ROLE
3.3 PostgreSQL存储类错误处理
3.3.1 磁盘空间不足错误
# 错误信息
ERROR: could not extend file “base/16384/12345″: No space left on device
HINT: Check free disk space.
# 问题分析
$ df -h /postgresql
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 2.0T 1.9T 50G 98% /postgresql
# 解决方案1:清理历史数据
$ psql -U fgedu -d fgedudb -c ”
— 删除旧数据
DELETE FROM fgedu_logs WHERE created_at < now() - interval '30 days';
-- 执行VACUUM
VACUUM FULL fgedu_logs;
"
DELETE 10000000
VACUUM
# 解决方案2:清理WAL日志
$ psql -U fgedu -d fgedudb -c "
-- 检查WAL文件数量
SELECT count(*) FROM pg_ls_waldir();
"
count
-------
50
(1 row)
-- 调整WAL保留
ALTER SYSTEM SET max_wal_size = '2GB';
ALTER SYSTEM SET wal_keep_size = '1GB';
# 解决方案3:清理临时文件
$ rm -rf /postgresql/fgdata/base/pgsql_tmp/*
# 解决方案4:扩展存储
# 添加新磁盘
$ fdisk /dev/sdc
$ mkfs.xfs /dev/sdc1
$ mount /dev/sdc1 /postgresql/new_storage
# 创建新表空间
$ psql -U fgedu -d fgedudb -c "
CREATE TABLESPACE fgedutbs_new LOCATION '/postgresql/new_storage';
"
# 移动表到新表空间
$ psql -U fgedu -d fgedudb -c "
ALTER TABLE fgedu_large_table SET TABLESPACE fgedutbs_new;
"
3.3.2 表空间不足错误
# 错误信息
ERROR: could not create file “pg_tblspc/12345/PG_18_202307071/16384/12345″: No space left on device
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT
spcname,
pg_size_pretty(pg_tablespace_size(oid)) as size,
pg_tablespace_location(oid) as location
FROM pg_tablespace;
”
spcname | size | location
—————+——–+——————————–
pg_default | 150 GB |
pg_global | 500 MB |
fgedutbs | 50 GB | /postgresql/fgedutbs
fgedutbs_data | 100 GB | /postgresql/fgedutbs_data
# 检查表空间所在磁盘
$ df -h /postgresql/fgedutbs_data
Filesystem Size Used Avail Use% Mounted on
/dev/sdc1 100G 95G 5G 95% /postgresql/fgedutbs_data
# 解决方案1:清理表空间
$ psql -U fgedu -d fgedudb -c ”
— 查看表空间中的表
SELECT
schemaname || ‘.’ || relname as table_name,
pg_size_pretty(pg_total_relation_size(schemaname || ‘.’ || relname)) as size
FROM pg_class c
JOIN pg_namespace n ON c.relnamespace = n.oid
JOIN pg_tablespace t ON c.reltablespace = t.oid
WHERE t.spcname = ‘fgedutbs_data’
ORDER BY pg_total_relation_size(schemaname || ‘.’ || relname) DESC;
”
# 解决方案2:扩展表空间
# 扩展底层存储或添加新位置
$ psql -U fgedu -d fgedudb -c ”
— 创建新表空间
CREATE TABLESPACE fgedutbs_new LOCATION ‘/postgresql/new_storage’;
— 移动表
ALTER TABLE fgedu_large_table SET TABLESPACE fgedutbs_new;
”
Part04-生产案例与实战讲解
4.1 PostgreSQL常见错误案例
# 案例1:唯一约束冲突
# 错误信息
ERROR: duplicate key value violates unique constraint “fgedu_orders_pkey”
DETAIL: Key (order_id)=(12345) already exists.
# 解决方案
— 方案1:使用ON CONFLICT处理
INSERT INTO fgedu_orders (order_id, customer_id, amount)
VALUES (12345, 100, 1000.00)
ON CONFLICT (order_id) DO UPDATE SET amount = EXCLUDED.amount;
— 方案2:检查后插入
INSERT INTO fgedu_orders (order_id, customer_id, amount)
SELECT 12345, 100, 1000.00
WHERE NOT EXISTS (SELECT 1 FROM fgedu_orders WHERE order_id = 12345);
# 案例2:外键约束冲突
# 错误信息
ERROR: insert or update on table “fgedu_orders” violates foreign key constraint “fk_customer”
DETAIL: Key (customer_id)=(99999) is not present in table “fgedu_customers”.
# 解决方案
— 方案1:先插入父表数据
INSERT INTO fgedu_customers (customer_id, name) VALUES (99999, ‘New Customer’);
— 方案2:使用DEFERRABLE约束
ALTER TABLE fgedu_orders DROP CONSTRAINT fk_customer;
ALTER TABLE fgedu_orders ADD CONSTRAINT fk_customer
FOREIGN KEY (customer_id) REFERENCES fgedu_customers(customer_id)
DEFERRABLE INITIALLY DEFERRED;
# 案例3:死锁错误
# 错误信息
ERROR: deadlock detected
DETAIL: Process 12345 waits for ShareLock on transaction 12346; blocked by process 12346.
Process 12346 waits for ShareLock on transaction 12345; blocked by process 12345.
HINT: See server log for query details.
CONTEXT: while updating tuple (0,1) in relation “fgedu_orders”
# 解决方案
— 方案1:调整事务顺序,确保一致的锁定顺序
— 方案2:减少事务持有锁的时间
— 方案3:使用SELECT FOR UPDATE NOWAIT
# 案例4:序列耗尽
# 错误信息
ERROR: nextval: reached maximum value of sequence “fgedu_orders_order_id_seq” (2147483647)
# 解决方案
— 方案1:重置序列
ALTER SEQUENCE fgedu_orders_order_id_seq RESTART WITH 1;
— 方案2:修改序列为BIGINT
ALTER SEQUENCE fgedu_orders_order_id_seq AS bigint;
ALTER TABLE fgedu_orders ALTER COLUMN order_id TYPE bigint;
4.2 PostgreSQL性能类错误案例
# 案例1:内存不足错误
# 错误信息
ERROR: out of memory
DETAIL: Failed on request of size 12345678 in memory context “HashBatchContext”.
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT name, setting, unit, short_desc
FROM pg_settings
WHERE name IN (‘work_mem’, ‘maintenance_work_mem’, ‘shared_buffers’);
”
name | setting | unit | short_desc
——————-+———-+——+————————————————–
shared_buffers | 16384 | 8kB | Sets the number of shared memory buffers.
work_mem | 262144 | kB | Sets the maximum memory for query workspaces.
maintenance_work_mem | 1048576 | kB | Sets the maximum memory for maintenance operations.
# 解决方案
— 方案1:降低work_mem
ALTER SYSTEM SET work_mem = ‘256MB’;
SELECT pg_reload_conf();
— 方案2:优化查询
— 避免大型排序和哈希操作
— 添加合适的索引
CREATE INDEX idx_fgedu_orders_customer ON fgedu_orders(customer_id);
— 方案3:分批处理
— 将大查询拆分为多个小查询
# 案例2:锁等待超时
# 错误信息
ERROR: canceling statement due to lock timeout
STATEMENT: UPDATE fgedu_orders SET status = ‘completed’ WHERE order_id = 12345;
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT
blocked.pid,
blocked.query,
blocking.pid as blocking_pid,
blocking.query as blocking_query
FROM pg_locks blocked
JOIN pg_stat_activity blocked ON blocked.pid = blocked.pid
JOIN pg_locks blocking ON blocked.locktype = blocking.locktype
AND blocked.relation = blocking.relation
AND blocked.pid != blocking.pid
JOIN pg_stat_activity blocking ON blocking.pid = blocking.pid
WHERE NOT blocked.granted;
”
# 解决方案
— 方案1:终止阻塞会话
SELECT pg_terminate_backend(12346);
— 方案2:增加锁等待超时
SET lock_timeout = ’30s’;
— 方案3:使用NOWAIT
UPDATE fgedu_orders SET status = ‘completed’ WHERE order_id = 12345 NOWAIT;
# 案例3:事务超时
# 错误信息
ERROR: terminating connection due to idle-in-transaction timeout
# 解决方案
— 方案1:调整超时参数
ALTER SYSTEM SET idle_in_transaction_session_timeout = ‘600000’; — 10分钟
SELECT pg_reload_conf();
— 方案2:及时提交或回滚事务
— 方案3:使用连接池管理事务
4.3 PostgreSQL复制类错误案例
# 案例1:复制连接失败
# 错误信息(备库日志)
ERROR: could not connect to the primary server: connection to server at “192.168.1.100”, port 5432 failed: Connection refused
# 问题分析
# 检查主库状态
$ pg_isready -h 192.168.1.100 -p 5432
# 检查pg_hba.conf
$ grep replication /postgresql/fgdata/pg_hba.conf
host replication fgedu_repl 192.168.1.0/24 scram-sha-256
# 解决方案
— 方案1:检查网络连通性
$ ping 192.168.1.100
$ telnet 192.168.1.100 5432
— 方案2:检查复制用户
$ psql -U fgedu -d fgedudb -c “\du fgedu_repl”
— 方案3:检查主库监听
$ netstat -tlnp | grep 5432
# 案例2:复制槽不存在
# 错误信息
ERROR: replication slot “fgedu_slot” does not exist
# 解决方案
— 方案1:创建复制槽
SELECT pg_create_physical_replication_slot(‘fgedu_slot’);
— 方案2:修改primary_slot_name参数
ALTER SYSTEM SET primary_slot_name = ‘fgedu_slot’;
SELECT pg_reload_conf();
# 案例3:WAL文件缺失
# 错误信息
ERROR: requested WAL segment 000000010000000000000065 has already been removed
# 问题分析
$ psql -U fgedu -d fgedudb -c ”
SELECT
pg_walfile_name(pg_current_wal_lsn()) as current_wal,
pg_walfile_name(replay_lsn) as replayed_wal
FROM pg_stat_replication;
”
# 解决方案
— 方案1:重新搭建备库
pg_basebackup -h 192.168.1.100 -U fgedu_repl -D /postgresql/fgdata -P -R
— 方案2:增加WAL保留
ALTER SYSTEM SET wal_keep_size = ‘2GB’;
SELECT pg_reload_conf();
— 方案3:使用WAL归档
ALTER SYSTEM SET archive_mode = on;
ALTER SYSTEM SET archive_command = ‘cp %p /postgresql/archive/%f’;
SELECT pg_reload_conf();
Part05-风哥经验总结与分享
5.1 PostgreSQL错误处理最佳实践
PostgreSQL错误处理最佳实践:
- 预防为主:通过监控和告警预防错误发生
- 快速定位:熟悉错误类型,快速定位问题原因
- 规范处理:建立标准化的错误处理流程
- 记录风哥教程风哥教程总结:记录错误处理过程,建立知识库
- 持续改进:分析错误原因,优化系统配置
- 培训提升:定期培训,提高团队处理能力
5.2 PostgreSQL错误诊断脚本
# pg_error_diagnosis.sh
# from:www.itpux.com.qq113257174.wx:itpux-com
# web: http://www.fgedu.net.cn
# PostgreSQL错误诊断脚本
PGHOME=/postgresql/fgapp
PGHOST=fgedu.localhost
PGPORT=5432
PGUSER=fgedu
PGDATABASE=fgedudb
LOG_FILE=/postgresql/scripts/logs/error_diagnosis.log
log_message() {
echo “[$(date ‘+%Y-%m-%d %H:%M:%S’)] $1” | tee -a ${LOG_FILE}
}
# 检查连接错误
check_connection_errors() {
log_message “=== 检查连接错误 ===”
local conn_count=$(${PGHOME}/bin/psql -h ${PGHOST} -p ${PGPORT} -U ${PGUSER} -d ${PGDATABASE} -t -c ”
SELECT count(*) FROM pg_stat_activity;
” 2>/dev/null | tr -d ‘ ‘)
local max_conn=$(${PGHOME}/bin/psql -h ${PGHOST} -p ${PGPORT} -U ${PGUSER} -d ${PGDATABASE} -t -c ”
SELECT setting FROM pg_settings WHERE name = ‘max_connections’;
” 2>/dev/null | tr -d ‘ ‘)
local usage=$(echo “scale=2; $conn_count * 100 / $max_conn” | bc)
if [ $(echo “$usage > 80” | bc) -eq 1 ]; then
log_message “WARNING: 连接数使用率过高: ${usage}%”
else
log_message “连接数使用率: ${usage}%”
fi
}
# 检查锁等待
check_lock_waits() {
log_message “=== 检查锁等待 ===”
local lock_count=$(${PGHOME}/bin/psql -h ${PGHOST} -p ${PGPORT} -U ${PGUSER} -d ${PGDATABASE} -t -c ”
SELECT count(*) FROM pg_locks WHERE NOT granted;
” 2>/dev/null | tr -d ‘ ‘)
if [ “$lock_count” -gt 10 ]; then
log_message “WARNING: 锁等待数量过多: ${lock_count}”
else
log_message “锁等待数量: ${lock_count}”
fi
}
# 检查磁盘空间
check_disk_space() {
log_message “=== 检查磁盘空间 ===”
local disk_usage=$(df -h /postgresql | tail -1 | awk ‘{print $5}’ | tr -d ‘%’)
if [ “$disk_usage” -gt 90 ]; then
log_message “ERROR: 磁盘使用率过高: ${disk_usage}%”
elif [ “$disk_usage” -gt 80 ]; then
log_message “WARNING: 磁盘使用率较高: ${disk_usage}%”
else
log_message “磁盘使用率: ${disk_usage}%”
fi
}
# 检查错误日志
check_error_log() {
log_message “=== 检查错误日志 ===”
local error_count=$(grep -c “ERROR:” /postgresql/fgdata/pg_log/postgresql-$(date +%Y-%m-%d).log 2>/dev/null || echo 0)
if [ “$error_count” -gt 100 ]; then
log_message “WARNING: 今日错误数量过多: ${error_count}”
else
log_message “今日错误数量: ${error_count}”
fi
}
# 主函数
main() {
log_message “==========================================”
log_message “PostgreSQL错误诊断”
log_message “==========================================”
check_connection_errors
check_lock_waits
check_disk_space
check_error_log
log_message “==========================================”
log_message “诊断完成”
log_message “==========================================”
}
main
5.3 PostgreSQL错误处理清单
# 连接类错误
– [ ] 检查max_connections配置
– [ ] 检查pg_hba.conf配置
– [ ] 检查用户密码
– [ ] 检查网络连通性
– [ ] 检查连接池配置
# 权限类错误
– [ ] 检查用户权限
– [ ] 检查模式权限
– [ ] 检查表权限
– [ ] 检查列权限
– [ ] 检查函数权限
# 存储类错误
– [ ] 检查磁盘空间
– [ ] 检查表空间
– [ ] 检查WAL空间
– [ ] 检查临时文件
– [ ] 检查归档空间
# 性能类错误
– [ ] 检查内存配置
– [ ] 检查锁等待
– [ ] 检查长事务
– [ ] 检查慢查询
– [ ] 检查索引使用
# 复制类错误
– [ ] 检查主库状态
– [ ] 检查网络连通性
– [ ] 检查复制用户
– [ ] 检查WAL文件
– [ ] 检查复制槽
本文由风哥教程整理发布,仅用于学习测试使用,转载注明出处:http://www.fgedu.net.cn/10327.html
