Problem Description
On a Linux x86-64 system, while installing an Oracle Grid Infrastructure RAC cluster, running $GRID_HOME/root.sh succeeds on the first node but fails on the second node while attempting to start 'ora.cssd'.
The following log output shows the failure.
CRS-2674: Start of 'ora.cssd' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac2'
CRS-2681: Clean of 'ora.cssd' on 'rac2' succeeded
CRS-2673: Attempting to stop 'ora.diskmon' on 'rac2'
CRS-2677: Stop of 'ora.diskmon' on 'rac2' succeeded
CRS-4000: Command Start failed, or completed with errors.
CRS-2672: Attempting to start 'ora.cssd' on 'rac2'
CRS-2672: Attempting to start 'ora.diskmon' on 'rac2'
CRS-2674: Start of 'ora.diskmon' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.diskmon' on 'rac2'
CRS-5016: Process "/u01/oracle/11.2.0/grid/bin/diskmon" spawned by agent "/u01/oracle/11.2.0/grid/bin/orarootagent.bin" for action "clean" failed: details at "(:CLSN00010:)" in "/u01/oracle/11.2.0/grid/log/rac2/agent/ohasd/orarootagent_root/orarootagent_root.log"
CRS-2681: Clean of 'ora.diskmon' on 'rac2' succeeded
CRS-2674: Start of 'ora.cssd' on 'rac2' failed
CRS-2679: Attempting to clean 'ora.cssd' on 'rac2'
CRS-2681: Clean of 'ora.cssd' on 'rac2' succeeded
CRS-4000: Command Start failed, or completed with errors.
Command return code of 1 (256) from command: /u01/oracle/11.2.0/grid/bin/crsctl start resource ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE
Start of resource "ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE" failed
Failed to start CTSS
Failed to start Oracle Clusterware stack
Cause of the Problem
The CSS daemon on RAC node 2 failed to start because it either could not establish a network connection to the first node or could not synchronize time with the first node. To confirm which, review the messages in the CSS daemon log ($GRID_HOME/log/{nodename}/cssd/ocssd.log) on the second node. For example, if the second node's hostname is rac2, the log is at $GRID_HOME/log/rac2/cssd/ocssd.log.
The following is an excerpt from ocssd.log.
2010-03-13 10:59:36.581: [ CSSD][1246480704]clssnmLocalJoinEvent: Node rac1, number 1, is in an existing cluster with disk state 3
2010-03-13 10:59:36.582: [ CSSD][1246480704]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2010-03-13 10:59:36.685: [ CSSD][1162561856]clssnmvDHBValidateNCopy: node 1, rac1, has a disk HB, but no network HB, DHB has rcfg 157026738, wrtcnt, 1507, LATS 66284084, lastSeqNo 1507, uniqueness 1261524838, timestamp 1261526376/66279084
2010-03-13 10:59:37.110: [ CSSD][1215011136]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2010-03-13 10:59:37.513: [ CSSD][1235990848]clssnmSendingThread: sending join msg to all nodes
2010-03-13 10:59:37.513: [ CSSD][1235990848]clssnmSendingThread: sent 5 join msgs to all nodes
Although connectivity on the cluster interconnect appeared to work (the nodes could be pinged via their private node names and IP addresses), a firewall blocked traffic on certain ports and so disrupted communication between the CRS daemon processes. A time synchronization problem between the nodes can also cause root.sh to fail on the second node.
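To check whether your own ocssd.log shows the same signature (a disk heartbeat without a network heartbeat), something like the following sketch may help. The default GRID_HOME path and the node name rac2 are taken from the log output in this post and are assumptions for any other installation; the function name is only illustrative.

```shell
#!/bin/bash
# Sketch: count ocssd.log lines that report a disk heartbeat but no network heartbeat.
# GRID_HOME and NODE defaults are assumptions taken from this post; override as needed.

count_missing_network_hb() {
    local logfile="$1"
    # grep -c prints the number of matching lines (prints 0 when there are none)
    grep -c "has a disk HB, but no network HB" "$logfile"
}

GRID_HOME="${GRID_HOME:-/u01/oracle/11.2.0/grid}"
NODE="${NODE:-rac2}"
LOG="$GRID_HOME/log/$NODE/cssd/ocssd.log"

if [ -r "$LOG" ]; then
    echo "Lines reporting a missing network heartbeat: $(count_missing_network_hb "$LOG")"
else
    echo "Cannot read $LOG"
fi
```

A non-zero count suggests that the nodes see each other through the voting disk but not over the private network, which points toward a firewall or interconnect problem rather than time synchronization.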
Solution of the Problem
1) Disable Firewall:
Disable the firewall on all nodes. On Linux, log in as root and run the following commands:
$ su
# service iptables stop
# service ip6tables stop
To keep the firewall disabled after a reboot, run:
# chkconfig iptables off
# chkconfig ip6tables off
If you need to keep the firewall enabled, configure it so that all traffic on the private network is excluded from filtering.
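As a rough illustration of excluding the private network from filtering, the sketch below prints iptables rules that accept all traffic on one interface. The interface name eth1 is an assumption (this post does not name the private interface), and the helper function is hypothetical; review the printed commands before running them as root on each node.

```shell
#!/bin/bash
# Sketch: accept all traffic on the private interconnect interface instead of
# disabling the firewall. PRIVATE_IF=eth1 is an assumption; use your own interface.

PRIVATE_IF="${PRIVATE_IF:-eth1}"

# Print the iptables commands for the given interface so they can be reviewed
# before being run as root.
private_if_rules() {
    local ifname="$1"
    echo "iptables -I INPUT -i $ifname -j ACCEPT"
    echo "iptables -I OUTPUT -o $ifname -j ACCEPT"
}

private_if_rules "$PRIVATE_IF"
# After running the printed commands as root, the rules can be persisted on
# the same kind of system used in this post with: service iptables save
```

Printing the rules rather than applying them directly keeps the sketch safe to run as a normal user and makes it easy to compare the output across nodes.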
2) Synchronize Time between the nodes:
Set up an NTP server to keep the time synchronized between all nodes; the post How to setup NTP Server discusses how to do this. After synchronizing the time, deconfigure and reconfigure your grid infrastructure installation by following http://arjudba.blogspot.com/2010/03/what-to-do-after-failure-of-oracle.html.
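Before rerunning root.sh, it can be useful to confirm that the clocks really agree. The following is a minimal sketch that compares the epoch time reported by two nodes over ssh. The node names are passed as arguments (rac1 and rac2 in this post), passwordless ssh between the nodes is assumed, and the MAX_SKEW threshold of 1 second is an illustrative value rather than an Oracle-documented limit.

```shell
#!/bin/bash
# Sketch: compare the clocks of two nodes given as arguments, e.g. rac1 rac2.
# MAX_SKEW is an illustrative threshold, not an Oracle-documented value.

# Absolute difference between two epoch timestamps, in seconds.
clock_skew() {
    local a="$1" b="$2"
    local d=$(( a - b ))
    echo $(( d < 0 ? -d : d ))
}

MAX_SKEW="${MAX_SKEW:-1}"

# Only contact the nodes when two node names are supplied.
if [ "$#" -eq 2 ]; then
    # The two ssh calls run one after the other, so the result includes
    # a small amount of connection latency.
    if t1=$(ssh -o BatchMode=yes "$1" date +%s) &&
       t2=$(ssh -o BatchMode=yes "$2" date +%s); then
        skew=$(clock_skew "$t1" "$t2")
        echo "Clock skew between $1 and $2: ${skew}s"
        [ "$skew" -le "$MAX_SKEW" ] || echo "Skew exceeds ${MAX_SKEW}s; check NTP"
    else
        echo "Could not read the time from one of the nodes"
    fi
fi
```

Saved under a name of your choice, it would be invoked with the two node names as arguments, for example `rac1 rac2`.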
Related Documents
http://arjudba.blogspot.com/2010/03/cluvfy-fails-with-prvf-5436-prvf-9652.html
http://arjudba.blogspot.com/2010/03/in-11gr2-grid-rootsh-fails-with-crs.html
http://arjudba.blogspot.com/2010/03/what-to-do-after-failure-of-oracle.html
http://arjudba.blogspot.com/2009/12/enable-archive-log-mode-for-rac.html
http://arjudba.blogspot.com/2008/09/list-of-parameters-that-must-have.html
http://arjudba.blogspot.com/2008/08/oracle-rac-software-components.html
http://arjudba.blogspot.com/2008/08/oracle-clusterware-processes-on-unix.html
http://arjudba.blogspot.com/2008/08/configure-raw-devices-for-asm-in-rac.html
http://arjudba.blogspot.com/2008/08/crs-stack-fails-to-start-after-reboot.html
http://arjudba.blogspot.com/2008/08/configure-network-for-oracle-rac.html
http://arjudba.blogspot.com/2008/08/pre-installation-rac-environement-setup.html
http://arjudba.blogspot.com/2008/08/configure-server-to-install-oracle-rac.html
Sunday, March 21, 2010
1 comment:
Hi,
I encountered the same issue.
I have already done the following prior to installation
1. Disabled firewall and SELINUX on both nodes.
2. Disabled NTPD on both nodes, assuming the Oracle Cluster Time Synchronization Service (ctssd) can synchronize the times of the RAC nodes.
3. Established user equivalence for user oracle and passed prerequisite check.
My questions are
1. Do I need to establish user equivalence for user root separately, as root.sh is run as user root?
2. Can I rely on Clusterware time sync rather than ntpd?
3. If I don't want to deinstall, how do I fix it and proceed?
Thanks,
Sha