Saturday, June 26, 2010

Dynamic change of CPU_COUNT causes ORA-600 [Ksrpubwait:Deadlock] ORA-2097 Instance crash

Problem Description
Dynamic change of CPU_COUNT parameter causes oracle bug ORA-600 [ksrpubwait:deadlock]. For example, after issuing command "alter system set cpu_count=5;" it fails with following errors.

ORA-600 [ksrpubwait:deadlock], [parameters to cluster db instances - broadcast channel]
ORA-2097: parameter cannot be modified because specified value is invalid

And then instance crashes.

If you look at stack trace you will see,
ksrpubwait ksrpublish kspbcast kspset0 kxfpCPUCountUpdate ksbcpcb

Cause of the Problem
This errors are caused by Oracle Bug 7535429

Solution of the Problem
Solution 01: Bug 7535429 is fixed in version 11.2. So upgrade your oracle database to 11gR2 will solve the problem.
There is one-off patch which also help you to solve the problem. Check for the availability of one-off Patch 7535429 for your platform on MetaLink.

Solution 02: Explicitely setting CPU_COUNT to new value.

Solution 03: If you start the instance from pfile instead of using spfile then it will also solve the problem.

Solution 04: Set explicitly PARALLEL_MAX_SERVERS to the current value instead of relying default value.

While startup, instance terminated after ORA-600 [6006] ORA-600 [6856]

Problem Description
While starting up the instance it fails with following errors:

ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6856], [0], [60], [], [], []

and then SMON abnormally crashes the database instance.

Cause of the Problem
These are oracle bugs and these ORA-600 [6006] and ORA-600 [6856] bug occurred whenever undo segments are trying to rollback a failed transaction and cannot.

For error "ORA-00600: internal error code, arguments: [6006], [1], [], [], [], [], [], []" ,
Oracle is undoing an index leaf key operation. If the key is not found, ORA-00600 [6006] is logged.

For error "ORA-00600: internal error code, arguments: [6856], [0], [60], [], [], []" , SMON is trying to recover a dead transaction. But the undo application runs into an internal error (trying to delete a row that is already deleted).

Solution of the Problem

1) Review the trace files and look for the object(s) involved. If the trace file does not have a SQL statement, search on the following: "block dump header"

2) In the block header there will be a seg/obj = hex value. Convert the hex to dec and this will give you the data_object_id.

3) The alert.log may also show the affected object, for example:

ORACLE Instance ORCL (pid = 8) - Error 600 encountered while recovering transaction (9999, 36) on object 45879.

SQL>select owner, object_name from dba_objects where data_object_id = object#;

This will be the object you need to work with.


To implement the solution:

1. shutdown the oracle instance.
SQL> shut immediate

2. set event - event="10513 trace name context forever, level 2" (this event and setting disables transaction recovery which was initiated by SMON). Open the pfile and add the following line inside pfile.

event="10513 trace name context forever, level 2"

3. startup instance
SQL> startup

4.
- If the object is an index - drop and recreate.
- If the object is a table - drop / export / Create Table as Select (CTAS) to change the object ID

5. shutdown the instance
SQL> shut immediate

6. remove the event
Open the pfile and remove the event line that was added.

7. startup the instance
SQL> startup

8. recreate the affected object(s)

ORA-600 [17147] ORA-600 [Kghlkrem1] ORA-7445 kghlkremf() database crash

During normal database activity, in alert log the following ORA-00600 error occurred and it causes database crashed.
ORA-00600: internal error code, arguments: [17147], [0xC0000003E4CE10A5], [], [], [], [], [], [] 
ORA-00600: internal error code, arguments: [KGHLKREM1], [0xC0000003E4CE10B0], [], [], [], [] 
ORA-7445: exception encountered: core dump [kghlkremf()+33] [SIGSEGV] 
If you look at the trace file you no longer will see any sql that causes this bug.

The call stack is as follows:

kghfrmrg kghfre qsmqktcc ktcdso ktcrcm ktdcmt k2lcom k2send xctctl xctcom_with_options opicom

or

k2send xctctl xctcom_with_options opicom opiodr

or

kghalo ktcccadd kwqidracbk kwqidcpmc kwqidafm0 kwqididqx kpoaqdq

or

kghadd_reserved_extent kghget_reserved_extent kghgex kghfnd

Cause of the Problem
The problem occurred due to oracle bug. The bug involves abnormal termination of queries involving bloom filters. This manifests itself as memory corruptions. The memory was freed, reallocated to some other client, and corrupted but slaves were still writing into it.

Solution of the Problem
Solution 01:
This bug is fixed in oracle database version 10.2.0.4. So apply 10.2.0.4 patchset to solve this problem.

Solution 02:
Workaround, you can set the oracle hidden parameter _bloom_filter_enabled to false.
This parameter can be set dynamically:

If you are using spfile then just set as,
SQL> connect / as sysdba
SQL> alter system set "_bloom_filter_enabled"=false scope=both;

If you use pfile to startup your database remove the scope parameter that is use,
SQL> alter system set "_bloom_filter_enabled"=false;

ORA-00600 [730] [SPACE LEAK] OR ORA-600 [LIBRARYCACHENOTEMPTYONCLOSE] during shutdown

Problem Description
While shutting down oracle database the following error occurred.

In oracle 10g and 11g,
ORA-00600: internal error code, arguments: [730], [4100736], [space leak], [], [], [], [], [], [], [], [], []

If you observe stack trace it will look like,
ksesic2 <- ksmshu <- opistp_real <- opistp <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- main <- start

 In oracle 10g the error looks like,
ORA-00600: internal error code, arguments: [LibraryCacheNotEmptyOnClose], [], [] ,[], [], [], [], []

 If you observe stack trace it will look like, kglshu <- kqlnfy <- kscnfy <- ksmshu <- opistp_real <- opistp <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real


Cause of the Problem
The error occurred due to oracle bug. Oracle named this bug as BUG 7572335. This happens during database shutdown when child cursors had been previously marked as kept in the shared pool. Note that this is a non-corruptive error.

Solution of the Problem
Solution 01: This bug is solved in Oracle database 11gR2. So upgrade to Oracle database 11.2 or higher is one solution.

Solution 02: If you are using Oracle database 10gR2 then up to patchset version 10.2.0.4 this bug remained and it is solved in version 10.2.0.5. You can upgrade to oracle version 10.2.0.5 by applying the Patchset, Patch 8202632.

Solution 03: If available for your platform and version, you can apply one off patch 7572335. Download and apply the fix, Patch 7572335

Friday, June 25, 2010

In RAC flashback transaction query fails with ORA-600[kcbgtcr_13]

Problem Description
In Oracle database 11gR1 RAC environment flashback transaction queries fails with Oracle bug ORA-600[kcbgtcr_13].
For example,

SQL> SELECT UNDO_SQL FROM FLASHBACK_TRANSACTION_QUERY WHERE XID = '08001C00ACD20901';
ORA-00600: internal error code, arguments: [kcbgtcr_13], [], [], [], [], [], [], [], [], [], [], []

If you investigate into trace file you will see error stack as,

kcbgtcr <- ktuq_get_urec <- ktuqup_get_startslot <- ktuqup_init <- ktuqqp_fetch_next_rec <- ktuqtpgv <- krvxInvokeCallbacks <- krvxrfro_FlashReadOne <- krvxrporr_ProcessOneRedoRecord <- krvxread <- krvxgtsp_GetTxnSingleProcess <- krvxgt <- ktuqfcbk <-qerfxFetch <- opifch2 <- kpoal8 <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- main


Cause of the Problem
This is oracle bug and oracle named this bug as bug 7484261.

Solution of the Problem
This issue is fixed in Oracle database 11gR2. So upgrade your database to 11.2g will solve the problem.

If you are still using 11gR1 then download and apply Patch 7484261 from Metalink. Based on your platform and oracle version download from link https://updates.oracle.com/ARULink/PatchDetails/process_form?patch_num=7484261

ORA-00600: internal error code, arguments: [4819]

Problem Description
While reviewing database alert log files it was report an ORA-600 [4819] error followed by ORA-00308 and ORA-27037 errors like below.

ORA-00600: internal error code, arguments: [4819], [], [], [], [], [], [], []
ORA-00308: cannot open archived log '/home/oracle/archive/arc_ARJU_001.arc'
ORA-27037: unable to obtain file status

This problem started to happen because recently database has been switched into NOARCHIVELOG mode from ARCHIVELOG mode.

Cause of the Problem
The problem arises because the LOG_ARCHIVE_DEST_n parameter is still set to point to an archive log destination and the default value for the LOG_ARCHIVE_DEST_STATE_n parameter is 'ENABLE'. Based on these two settings database still attempts to open the required archive log file during transaction recovery.

Solution of the Problem
To resolve this problem, mark the destination as deferred and null out the archive destination, That is
1) Login as sysdba.
SQL> connect / as sysdba

2) Set the archive_dest_state_1 to defer.
SQL> alter system set log_archive_dest_state_1 = defer scope=both;

3) Set log_archive_dest_1 to null.
SQL> alter system set log_archive_dest_1 = '' scope=both;

You are done!