Oracle 11gR2 CRS: case study of a grid cluster that fails to start on one node

Published by: 风哥  Category: ITPUX技术网  Updated: 2022-02-12  Views: 305

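Environment:

Oracle 11.2.0.2 RAC on AIX 6.1

Problem description:

The grid cluster fails to start on one of the nodes.

Analysis:

To track down the problem we checked the operating system logs and the cluster configuration environment,
including the host entries in /etc/hosts, disk group permissions and attributes, ssh connectivity, and /etc/inittab.
The detailed analysis follows: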

1.1 Operating system log: errpt reports no errors

1.2 /etc/hosts entries: the private-network IP entries were found to be commented out
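
A quick way to see which address entries in a hosts file are commented out is to filter for lines that start with `#` followed by something that looks like an IPv4 address. The sketch below is only illustrative; the function name `commented_host_entries` is an assumption and is not part of the original case.

```shell
#!/bin/sh
# List hosts-file lines that look like commented-out IPv4 entries.
# commented_host_entries is a hypothetical helper name.
commented_host_entries() {
    # $1: path to a hosts file (defaults to /etc/hosts)
    # -n prefixes every hit with its line number in the file.
    grep -n -E '^[[:space:]]*#[[:space:]]*[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+[[:space:]]' "${1:-/etc/hosts}"
}
```

Calling `commented_host_entries /etc/hosts` on each node and comparing the results makes it easier to tell whether the private interconnect names are still resolvable on both sides.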

1.3 Check the attributes and permissions of the ASM disk group devices
ls -ltr /dev/rhdiskpower* and lsattr -El hdiskpower*: all normal
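
With many devices, reading `ls -l` output by eye is error-prone. The following sketch filters `ls -l` output and prints only the devices whose mode, owner or group differ from what is expected. The helper name `check_dev_perms` is hypothetical, and the expected values are passed as arguments because the original case does not state them.

```shell
#!/bin/sh
# Print device entries from `ls -l` output (read on stdin) whose mode,
# owner or group differ from the expected values.
# check_dev_perms is a hypothetical helper name.
check_dev_perms() {
    # $1: expected mode string, $2: expected owner, $3: expected group
    awk -v m="$1" -v o="$2" -v g="$3" '
        NF >= 5 && ($1 != m || $3 != o || $4 != g) {
            print $NF ": " $1 " " $3 ":" $4
        }'
}
```

A possible call is `ls -l /dev/rhdiskpower* | check_dev_perms crw-rw---- grid asmadmin`, where the mode, owner and group shown are assumptions to be replaced with the values used at your site. For the attribute check, note that `lsattr -El` takes one device name per call, so a loop such as `for d in hdiskpower0 hdiskpower1; do lsattr -El "$d"; done` (device names hypothetical) is likely needed rather than a wildcard.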
1.4 Check /etc/inittab on both nodes
db18 (node 1):
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
rcnsr:2:wait:sh /etc/rc.nsr
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
rcnetwlm:23456789:wait:/etc/rc.netwlm start> /dev/console 2>&1 # Start netwlm
dt:2:wait:/etc/rc.dt
orapw:2:wait:/etc/loadext -L /etc

h1:2:respawn:/etc/init.ohasd run >/dev/null 2>&1 /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd

db27 (node 2):
init:2:initdefault:
brc::sysinit:/sbin/rc.boot 3 >/dev/console 2>&1 # Phase 3 of system boot
powerfail::powerfail:/etc/rc.powerfail 2>&1 | alog -tboot > /dev/console # Power Failure Detection
powermig:2:wait:/etc/rc.powermig transition >/dev/null 2>&1 # powermig startup
mkatmpvc:2:once:/usr/sbin/mkatmpvc >/dev/console 2>&1
atmsvcd:2:once:/usr/sbin/atmsvcd >/dev/console 2>&1
tunables:23456789:wait:/usr/sbin/tunrestore -R > /dev/console 2>&1 # Set tunables
securityboot:2:bootwait:/etc/rc.security.boot > /dev/console 2>&1
rc:23456789:wait:/etc/rc 2>&1 | alog -tboot > /dev/console # Multi-User checks
powermig2:2:wait:/etc/rc.powermig recover >/dev/null 2>&1 # powermig recover
powermt:2:wait:/usr/sbin/powermt load >/dev/null 2>&1 # powermt load
fbcheck:23456789:wait:/usr/sbin/fbcheck 2>&1 | alog -tboot > /dev/console # run /etc/firstboot
srcmstr:23456789:respawn:/usr/sbin/srcmstr # System Resource Controller
platform_agent:2:once:/usr/bin/startsrc -s platform_agent >/dev/null 2>&1
rctcpip:23456789:wait:/etc/rc.tcpip > /dev/console 2>&1 # Start TCP/IP daemons
rcemcp_mond:2:wait:/etc/rc.emcp_mond start > /dev/console 2>&1
sniinst:2:wait:/var/adm/sni/sniprei > /dev/console 2>&1
rcnfs:23456789:wait:/etc/rc.nfs > /dev/console 2>&1 # Start NFS Daemons
cron:23456789:respawn:/usr/sbin/cron
piobe:2:wait:/usr/lib/lpd/pioinit_cp >/dev/null 2>&1 # pb cleanup
install_assist:2:wait:/usr/sbin/install_assist /dev/console 2>&1
qdaemon:23456789:wait:/usr/bin/startsrc -sqdaemon
writesrv:23456789:wait:/usr/bin/startsrc -swritesrv
uprintfd:23456789:respawn:/usr/sbin/uprintfd
shdaemon:2:off:/usr/sbin/shdaemon >/dev/console 2>&1 # High availability daemon
l2:2:wait:/etc/rc.d/rc 2
l3:3:wait:/etc/rc.d/rc 3
l4:4:wait:/etc/rc.d/rc 4
l5:5:wait:/etc/rc.d/rc 5
l6:6:wait:/etc/rc.d/rc 6
l7:7:wait:/etc/rc.d/rc 7
l8:8:wait:/etc/rc.d/rc 8
l9:9:wait:/etc/rc.d/rc 9
naudio2::boot:/usr/sbin/naudio2 > /dev/null
naudio::boot:/usr/sbin/naudio > /dev/null
ntbl_reset:2:once:/usr/bin/ntbl_reset_datafiles
rcml:2:once:/usr/ml/aix61/rc.ml > /dev/console 2>&1
rcwpars:2:once:/etc/rc.wpars > /dev/console 2>&1 # Corrals autostart
logsymp:2:once:/usr/lib/ras/logsymptom # for system dumps
perfstat:2:once:/usr/lib/perf/libperfstat_updt_dictionary >/dev/console 2>&1
diagd:2:once:/usr/lpp/diagnostics/bin/diagd >/dev/console 2>&1
artex:2:wait:/usr/sbin/artexset -c -R /etc/security/artex/config/master_profile.xml > /dev/console 2>&1
cimservices:2:once:/usr/bin/startsrc -s cimsys >/dev/null 2>&1
pconsole:2:once:/usr/bin/startsrc -s pconsole > /dev/null 2>&1
xmdaily:2:once:/usr/bin/topasrec -L -s 300 -R 1 -r 6 -o /etc/perf/daily/ -ypersistent=1 2>&1 >/dev/null #Start local binary recording
ctrmc:2:once:/usr/bin/startsrc -s ctrmc > /dev/console 2>&1
ha_star:h2:once:/etc/rc.ha_star >/dev/console 2>&1
dt:2:wait:/etc/rc.dt
rcemcpower:2:wait:/etc/rc.emcpower set_ipldevice > /dev/console 2>&1
cons:0123456789:respawn:/usr/sbin/getty /dev/console
xntpd00:2:boot:/usr/bin/startsrc -s xntpd
sshdstart:2:boot:/usr/bin/startsrc -s sshd

#h1:35:respawn:/etc/init.ohasd run >/dev/null 2>&1 /dev/null 2>&1
[...]
121C\0\0".., 512) = 512
0.0002: lseek(3, 153600, 0) = 153600
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154112, 0) = 154112
kread(3, " 0\0\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 154624, 0) = 154624
kread(3, "\0\0\0\0\0\0\0\b\0\0\0\0".., 512) = 512
0.0001: close(3) = 0
0.0002: kopen("/grid/product/11.2.0/crs/mesg/crsus.msb", O_RDONLY) = 3
0.0002: kfcntl(3, F_SETFD, 0x0000000000000001) = 0
0.0001: lseek(3, 0, 0) = 0
kread(3, "1513 "011303\t\t\0\0\0\0".., 256) = 256
0.0001: lseek(3, 512, 0) = 512
kread(3, "1F D '94\0\0\0\0\0\0\0\0".., 512) = 512
0.0001: lseek(3, 1024, 0) = 1024
kread(3, "\096\0 , 512) = 512
0.0002: lseek(3, 107008, 0) = 107008
kread(3, "\0\t121B\001\0 >121C\0\0".., 512) = 512
0.0001: lseek(3, 153600, 0) = 153600

2.2 Try starting the grid cluster manually
cd
nohup ./init.ohasd run &

-- the grid cluster starts normally this way,
but the regular CRS start command, crsctl start crs, still fails.

2.3 Searching Metalink for /tmp/.oracle/sOHASD_UI_SOCKET turned up the following:
Seems the AIX post-installation was not complete on this box. Which resulted below leftover entries in /etc/inittab
# grep /etc/inittab
That is, /etc/inittab contains one extra line related to install. The /etc/inittab on node 2 did indeed contain such a line (the install_assist entry in the node 2 listing above), and once it was removed, crsctl start crs succeeded.
Cause: the AIX post-installation was never completed on this box, which left the leftover entry in /etc/inittab.
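
The leftover entry in the node 2 listing uses the `wait` action (`install_assist:2:wait:...`). A likely explanation for the symptom, stated here as an assumption rather than something the Metalink note spells out, is that init runs a `wait` entry and does not move on until that command exits, so an assistant that never finishes can hold up everything that init would start after it. The sketch below lists the `wait` entries that apply to a given run level so they can be reviewed one by one; `wait_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# List inittab entries with action "wait" that apply to a given run level.
# wait_entries is a hypothetical helper name.
wait_entries() {
    # $1: inittab file, $2: run level (single character, e.g. 2)
    awk -F: -v rl="$2" '
        # skip comment lines, blank lines and wrapped fragments
        /^[:#]/ || NF < 4 { next }
        $3 == "wait" && index($2, rl) > 0 {
            print NR ": " $1
        }' "$1"
}
```

For example, `wait_entries /etc/inittab 2` prints the line number and identifier of each candidate. On AIX, `lsitab install_assist` shows a single entry and `rmitab install_assist` removes it, which is generally safer than editing the file by hand.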

It looks as though the AIX graphical installation was interrupted midway and never finished.
A big pitfall, and we very nearly fell into it.
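
In hindsight, comparing the two /etc/inittab files from step 1.4 mechanically, rather than by eye, would probably have surfaced the extra entry sooner. The sketch below compares only the entry identifiers (the first colon-separated field) of two inittab copies and reports those present on one node but not the other. The function names `inittab_ids` and `inittab_id_diff` are assumptions made for illustration.

```shell
#!/bin/sh
# Compare the entry identifiers (first colon-separated field) of two
# inittab files and report the ones present in only one of them.
# inittab_ids and inittab_id_diff are hypothetical helper names.
inittab_ids() {
    # Skip commented lines (leading ':' or '#'), blank lines and
    # wrapped fragments that do not have the four inittab fields.
    awk -F: '/^[:#]/ || NF < 4 { next } { print $1 }' "$1" | sort -u
}

inittab_id_diff() {
    # $1, $2: copies of /etc/inittab taken from the two nodes
    a=/tmp/itab_a.$$
    b=/tmp/itab_b.$$
    inittab_ids "$1" > "$a"
    inittab_ids "$2" > "$b"
    # comm needs sorted input, which inittab_ids already provides.
    comm -23 "$a" "$b" | awk -v f="$1" '{ print "only in " f ": " $0 }'
    comm -13 "$a" "$b" | awk -v f="$2" '{ print "only in " f ": " $0 }'
    rm -f "$a" "$b"
}
```

A possible call, after copying both files to one host, is `inittab_id_diff inittab.db18 inittab.db27` (file names hypothetical). Not every difference is a fault, since nodes can legitimately carry different entries, but each one is worth explaining.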
