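Monitoring multiple Oracle RAC database instances: when an instance goes down unexpectedly, an alert is sent by email.
Create a log directory to hold the database status logs: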
mkdir -p /home/oracle/scripts/logs
Create a directory for the configuration files:
mkdir -p /home/oracle/scripts/conf
Configure the recipient email addresses, separating multiple addresses with "," as shown here:
vi /home/oracle/scripts/conf/dbadmin.txt
sj1@163.com,sj2@163.com
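Since the alerts depend entirely on this file, it can be worth checking it before relying on it. The following is only a minimal sketch: read_recipients is a hypothetical helper that is not part of the original script. It splits the comma-separated file into one address per line, and it fails if the file is unreadable or contains no addresses.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): print one recipient per line.
# Returns non-zero if the file cannot be read or holds no addresses.
read_recipients() {
    local file=$1
    [[ -r "${file}" ]] || return 1
    # Turn commas into newlines, drop blanks and carriage returns, skip empty entries.
    # grep -v exits non-zero when nothing is left, which signals "no recipients".
    tr ',' '\n' < "${file}" | tr -d ' \t\r' | grep -v '^$'
}

# Example use (path taken from the step above):
#   read_recipients /home/oracle/scripts/conf/dbadmin.txt || echo "no recipients configured" >&2
```

Running the example against the file above should list sj1@163.com and sj2@163.com on separate lines.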
Write the script, check_instance.sh:
#!/bin/bash
###################################################################
# Oracle instance monitoring script
# Purpose: check the status of several Oracle instances and send
#          an email alert when one of them is not healthy
###################################################################
# Configuration
BASE_DIR=/home/oracle/scripts
LOG_DIR="${BASE_DIR}/logs"
CONF_DIR="${BASE_DIR}/conf"
RECIPIENTS=$(cat "${CONF_DIR}/dbadmin.txt" 2>/dev/null)
# Instances to check on this node
INSTANCE_LIST=(orcl1 his1 emr1 pacs1)
# Initialize the environment
source ~/.bash_profile >/dev/null 2>&1
mkdir -p "${LOG_DIR}"
LOG_FILE="${LOG_DIR}/instance_check_$(date +%Y%m%d).log"
# Logging function: prints a timestamped line and appends it to the log file
log() {
    local level=$1
    local message=$2
    echo "$(date '+%Y-%m-%d %H:%M:%S') [${level}] ${message}" | tee -a "${LOG_FILE}"
}
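# Check the status of one instance
# Return codes: 0 = OPEN, 1 = no PMON process, 2 = MOUNTED, 3 = STARTED, 4 = anything else
check_instance_status() {
    local instance=$1
    local status pslist
    pslist=$(ps -ef | grep pmon)
    # Check for the PMON process; -w keeps orcl1 from matching orcl10
    if ! grep -qw "ora_pmon_${instance}" <<< "${pslist}"; then
        log "ERROR" "${instance} PMON process not found"
        return 1
    fi
    # Query the instance status; whitespace is stripped so the case patterns match exactly
    export ORACLE_SID=${instance}
    status=$(sqlplus -s '/ as sysdba' <<EOF | tr -d '[:space:]'
set pagesize 0 feedback off verify off heading off echo off
select status from v\$instance;
exit
EOF
    )
    case "${status}" in
        "OPEN") return 0 ;;
        "MOUNTED") return 2 ;;
        "STARTED") return 3 ;;
        *) return 4 ;;
    esac
}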
# Main loop
for instance in "${INSTANCE_LIST[@]}"; do
    check_instance_status "${instance}"
    case $? in
        0) log "INFO" "${instance} status normal (OPEN)" ;;
        1)
            mailx -s "CRITICAL: ${instance} PMON process stopped (Host: $(hostname))" \
                "${RECIPIENTS}" <<< "$(date) ${instance} instance PMON process not found"
            ;;
        2)
            mailx -s "WARNING: ${instance} abnormal status (Host: $(hostname))" \
                "${RECIPIENTS}" <<< "$(date) ${instance} current status: MOUNTED"
            ;;
        3)
            mailx -s "WARNING: ${instance} abnormal status (Host: $(hostname))" \
                "${RECIPIENTS}" <<< "$(date) ${instance} current status: STARTED"
            ;;
        4)
            mailx -s "WARNING: ${instance} abnormal status (Host: $(hostname))" \
                "${RECIPIENTS}" <<< "$(date) ${instance} current status: unknown, DBA please verify"
            ;;
    esac
done
# Log rotation (keep 7 days)
find "${LOG_DIR}" -name "instance_check_*.log" -mtime +7 -delete
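The main loop repeats almost the same mailx call for return codes 1 to 4. One possible way to keep the severity and message in a single place is sketched below. alert_for_code is a hypothetical helper, not part of the original script, and the "SEVERITY|detail" output format is an assumption made for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: map a return code of check_instance_status to an alert line.
# Prints "SEVERITY|detail" and returns 0 when an alert is needed;
# prints nothing and returns 1 for code 0 (instance is OPEN).
alert_for_code() {
    local instance=$1 code=$2
    case "${code}" in
        0) return 1 ;;
        1) echo "CRITICAL|${instance} PMON process not found" ;;
        2) echo "WARNING|${instance} current status: MOUNTED" ;;
        3) echo "WARNING|${instance} current status: STARTED" ;;
        *) echo "WARNING|${instance} current status: unknown, DBA please verify" ;;
    esac
}

# Sketch of how the main loop could use it:
#   check_instance_status "${instance}"; code=$?
#   if alert=$(alert_for_code "${instance}" "${code}"); then
#       mailx -s "${alert%%|*}: ${instance} (Host: $(hostname))" \
#           "${RECIPIENTS}" <<< "$(date) ${alert#*|}"
#   fi
```

The benefit is that a new status only needs one extra line in the case statement; the trade-off is that the subject lines become uniform rather than hand-written per case.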
Database not started:
SQL> shutdown immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
Check the status; an alert email is sent at the same time:
[oracle@rac1 scripts]$ sh check_instance.sh
2025-07-18 09:09:55 [ERROR] orcl1 PMON process not found
Database started in NOMOUNT mode:
SQL> startup nomount
ORACLE instance started.
Total System Global Area 1258290752 bytes
Fixed Size 8896064 bytes
Variable Size 905969664 bytes
Database Buffers 335544320 bytes
Redo Buffers 7880704 bytes
Running the check ([oracle@rac1 scripts]$ sh check_instance.sh) now sends an alert email:
Fri Jul 18 09:12:38 CST 2025 orcl1 current status: STARTED
Database in MOUNT state:
SQL> alter database mount;
Database altered.
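Running the check ([oracle@rac1 scripts]$ sh check_instance.sh) again sends an alert email, this time reporting the MOUNTED status.
Normal state: no email is sent, and the result is written to the log file: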
SQL> alter database open;
Database altered.
[oracle@rac1 scripts]$ sh check_instance.sh
2025-07-18 09:19:04 [INFO] orcl1 status normal (OPEN)
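The four cases walked through above (no PMON, STARTED, MOUNTED, OPEN) require shutting a database down to reproduce. The status-to-return-code mapping itself can be tried without a database, as in the sketch below. status_to_code is a hypothetical helper that mirrors the case statement in check_instance_status, and the whitespace trimming is an assumption added to tolerate stray blanks or newlines in the sqlplus output.

```shell
#!/bin/bash
# Hypothetical helper mirroring the case statement in check_instance_status.
# Takes the raw text returned by sqlplus; returns 0 (OPEN), 2 (MOUNTED),
# 3 (STARTED) or 4 (anything else, including ORA- error text).
status_to_code() {
    local status
    # Remove blanks and newlines that may surround the value.
    status=$(printf '%s' "$1" | tr -d '[:space:]')
    case "${status}" in
        OPEN)    return 0 ;;
        MOUNTED) return 2 ;;
        STARTED) return 3 ;;
        *)       return 4 ;;
    esac
}

# Example:
#   status_to_code "MOUNTED"; echo $?
```

Any text that is not one of the three known states, such as an ORA- error message, falls through to code 4, which is the case where the script asks the DBA to confirm.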
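The log file records both the instance-down event and the normal state (the STARTED and MOUNTED alerts are only emailed, not logged):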
[oracle@rac1 scripts]$ cat logs/instance_check_20250718.log
2025-07-18 09:09:55 [ERROR] orcl1 PMON process not found
2025-07-18 09:19:04 [INFO] orcl1 status normal (OPEN)
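Because every log line has the same "date time [LEVEL] message" layout, a daily log can be summarized quickly. The sketch below counts lines per level; summarize_log is a hypothetical helper name and is not part of the original script.

```shell
#!/bin/bash
# Hypothetical helper: count log lines per level in one instance_check log file.
summarize_log() {
    local file=$1
    # Field 1 is the date, field 2 the time, field 3 the bracketed level, e.g. [ERROR].
    awk '{ count[$3]++ } END { for (level in count) print level, count[level] }' "${file}" \
        | LC_ALL=C sort
}

# Example:
#   summarize_log /home/oracle/scripts/logs/instance_check_20250718.log
```

For the two-line log shown above, this should print "[ERROR] 1" followed by "[INFO] 1".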
The check script can be added to crontab to run once a minute, so that an email arrives promptly when something goes wrong.
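A possible crontab entry is sketched below. The script path /home/oracle/scripts/check_instance.sh is inferred from the prompts shown earlier and may differ on your system, and cron_line is a hypothetical helper used only to build the line. The entry calls /bin/bash explicitly because the script uses bash arrays and here-strings.

```shell
#!/bin/bash
# Hypothetical helper: print a crontab line that runs the check every minute.
# The default path is an assumption based on the directory used in this article.
cron_line() {
    local script=${1:-/home/oracle/scripts/check_instance.sh}
    # Five "*" fields mean "every minute". Output is discarded because
    # the script already writes its own log file.
    printf '* * * * * /bin/bash %s >/dev/null 2>&1\n' "${script}"
}

# To install it as the oracle user, append the line to the current crontab:
#   ( crontab -l 2>/dev/null; cron_line ) | crontab -
```

Note that with a one-minute schedule the script as written sends an email on every run for as long as the instance stays down.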