通过 Oracle ORA-00600 [krccfl_chunk] 问题处理来看如何快速恢复业务
最近3天遇到3次故障,记录其中一个先。
一大早就连续收到两条短信告警,一条是节点1数据库状态不为不可用,另一条是节点2数据库状态变为不可用,嗯,就是rac 俩节点都不可用了。快速登录服务器,发现主机可以登录,只是库挂了。
登录服务器查看日志如下:
2025-02-25T08:39:18.568741+08:00
Errors in file /oracle/app/oracle/diag/rdbms/liups/liups2/trace/liups2_ctwr_18172.trc (incident=1528836):
ORA-00600: internal error code, arguments: [krccfl_chunk], [0x7FFFF2998760], [236160], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/app/oracle/diag/rdbms/liups/liups2/incident/incdir_1528836/liups2_ctwr_18172_i1528836.trc
2025-02-25T08:39:20.430291+08:00
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
2025-02-25T08:39:20.430465+08:00
Block change tracking service error 600.
2025-02-25T08:39:20.560265+08:00
Dumping diagnostic data in directory=[cdmp_20250225083920], requested by (instance=2, osid=18172 (CTWR)), summary=[incident=1528836].
2025-02-25T08:40:21.569711+08:00
Errors in file /oracle/app/oracle/diag/rdbms/liups/liups2/trace/liups2_ctwr_18172.trc:
ORA-00600: internal error code, arguments: [krccfl_chunk], [0x7FFFF2998760], [236160], [], [], [], [], [], [], [], [], []
Errors in file /oracle/app/oracle/diag/rdbms/liups/liups2/trace/liups2_ctwr_18172.trc (incident=1528837):
ORA-487 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /oracle/app/oracle/diag/rdbms/liups/liups2/incident/incdir_1528837/liups2_ctwr_18172_i1528837.trc
2025-02-25T08:40:23.665753+08:00
USER (ospid: 18172): terminating the instance due to ORA error 487
2025-02-25T08:40:23.682724+08:00
opiodr aborting process unknown ospid (2942) as a result of ORA-1092
2025-02-25T08:40:23.682966+08:00
opiodr aborting process unknown ospid (121968) as a result of ORA-1092
2025-02-25T08:40:23.683246+08:00
opiodr aborting process unknown ospid (21423) as a result of ORA-1092
2025-02-25T08:40:23.683459+08:00
可以看到是 ora 600的错误,这个通常是要查询mos,按照常规经验就是登录mos:https://support.oracle.com/signin
输入 ora 600. 找到 ORA-600/ORA-7445/ORA-700 Error Look-up Tool 打开,输入 Error Code First Argument,并选择数据库的版本,然后点击 Look-up an error
,可能会弹出一篇或者几篇文章,也可能找不到相关的文章,比如
An Error document for ORA-600 [krccfl_chunk] is not registered with the tool.
An Error document for ORA-600 [krccfl_chunk] is not registered with the tool.
Your request for information on this error has been recorded and will be used for publishing prioritization.
Things to try:
Check the error message and confirm that this is an ORA-600 error and not an ORA-700 or ORA-7445 error.
Use ‘Do a general Search for Knowledge’ to begin a search for any published documents and bugs that mention the error.
然后我们再点击close,再点击 do a general search for knowkedge
,也就是整个知识库搜索,然后根据结果进行匹配查找。
一通操作之后 几分钟过去了,在关键数据库宕机面前,争分夺秒恢复业务是多么的重要。这里呢,我也是一顿mos搜索,但是事后我去google上搜索 ora 600 krccfl_chunk
第一篇文章就是:Database doesn’t open after crash ORA-00600 [krccfl_chunk] (Doc ID 2967548.1),点击登录之后登录mos账号,就直接可以看到解决方案了。写的很明确跟 bct有关,然后仔细看alert日志,上面写的也很清楚:Block change tracking service error 600.
trc的进程也是 CTWR,基本上完美匹配了,如果不放心可以在看下 call stack。
CTWR crashed with ORA-600 while starting DB due to BCT Enabled.
Stack Trace:
Error Stack: ORA-600[krccfl_chunk]
Main Stack:
1 | krccfl <- krcchg <- krcccr <- kcratr_apply <- kcratr <- kctrec <- kcvirv <- ktmmon |
The Issue caused by unpulished bugs which fixed in 19.17 and above.
解决方案也有3个:
Option 1:
Disable Block Change Tracking and Open Database.
1 | SQL> alter database disable block change tracking; |
Option 2:
If BCT Needs to be enabled for Backup performance , tune BCT Parameter as below and Enable BCT.
1 | alter system set "_bct_public_dba_buffer_size"=200000000 scope=both sid='*'; |
Once Re-enabled take new Level 0 backup and followed by Level 1.
Option 3:
Upgrade to 19.17 and above if affected version.
这里通过 alter database disable block change tracking;
禁用 bct解决。
现在 mos 的seo 足够智能,通过Look-up an error
竟然找不到的内容,可以通过google引擎直接搜索到。同时目前在ai的加持下,老DBA的经验感觉越来越不吃香了,学会搜索,学会 ai ,可能处理起问题会更快速。且干且珍重。
原文作者: liups.com
原文链接: http://liups.com/posts/b2c46938/
许可协议: 知识共享署名-非商业性使用 4.0 国际许可协议