关于Oracle数据库latch: cache buffers chains等待事件
关于Oracle数据库latch: cache buffers chains等待事件
latch: cache buffers chains等待事件的原理[color=purple]当一个数据块读入到sga中时,该块的块头(buffer header)会放置在一个hash bucket的链表(hash chain)中。该内存结构由一系列cache buffers chains子latch保护(又名hash latch或者cbc latch)。对Buffer cache中的块,要select或者update、insert,delete等,都得先获得cache buffers chains子latch,以保证对chain的排他访问。若在过程中发生争用,就会等待latch:cache buffers chains事件。
latch: cache buffers chains等待事件产生原因我们先看看Oracle官方对latch:cache buffers chains等待事件的说明:latch: cache buffers chains:"latch: cache buffers chains" contention is typically encountered because SQL statements read more buffers than they need to and multiple sessions are waiting to read the same block.If you have high contention, you need to look at the statements that perform the most buffer gets and then look at their access paths to determine whether these are performing as efficiently as you would like.Typical solutions are:-•Look for SQL that accesses the blocks in question and determine if the repeated reads are necessary. This may be within a single session or across multiple sessions.•Check for suboptimal SQL (this is the most common cause of the events) - look at the execution plan for the SQL being run and try to reduce the gets per executions which will minimize the number of blocks being accessed and therefore reduce the chances of multiple sessions contending for the same block.
[color=darkred]1. 低效率的SQL语句(主要体现在逻辑读过高) 在某些环境中,应用程序打开执行相同的低效率SQL语句的多个并发会话,这些SQL语句都设法得到相同的数据集,每次执行都带有高 BUFFER_GETS(逻辑读取)的SQL语句是主要的原因。相反,较小的逻辑读意味着较少的latch get操作,从而减少锁存器争用并改善性能。注意v$sql中BUFFER_GETS/EXECUTIONS大的语句。
[color=darkred]2.Hot block热点块当多个会话重复访问一个或多个由同一个子cache buffers chains锁存器保护的块时,热块就会产生。当多个会话争用cache buffers chains子锁存器时,就会出现这个等待事件。有时就算调优了SQL,但多个会话同时执行此SQL,那怕只是扫描特定少数块,也是也会出现HOT BLOCK的。如果是存在热点块,那么介绍两种找出热点块的方法。
找出热点块方法一:--首先找出p1raw [mw_shl_code=sql,true]select p1, p1raw
from v$session_wait
where event = 'latch: cache buffers chains';[/mw_shl_code]
--再根据p1raw找到对象 [mw_shl_code=sql,true]SELECT /*+ RULE */
E.OWNER || '.' || E.SEGMENT_NAME SEGMENT_NAME,
E.PARTITION_NAME,
E.EXTENT_ID EXTENT#,
X.DBABLK - E.BLOCK_ID + 1 BLOCK#,
X.TCH,
L.CHILD#
FROM SYS.V$LATCH_CHILDREN L, SYS.X$BH X, SYS.DBA_EXTENTS E
WHERE X.HLADDR = '00000003576EE324' --p1raw
AND E.FILE_ID = X.FILE#
AND X.HLADDR = L.ADDR
AND X.DBABLK BETWEEN E.BLOCK_ID AND E.BLOCK_ID + E.BLOCKS - 1
ORDER BY X.TCH DESC;
[/mw_shl_code]
找出热点块方法二:--直接找出热点块[mw_shl_code=sql,true] SELECT OBJECT_NAME, SUBOBJECT_NAME
FROM DBA_OBJECTS
WHERE DATA_OBJECT_ID IN
(SELECT DATA_OBJECT_ID
FROM (SELECT OBJ DATA_OBJECT_ID, FILE#, DBABLK, CLASS, STATE, TCH
FROM X$BH
WHERE HLADDR IN (SELECT ADDR
FROM (SELECT ADDR
FROM V$LATCH_CHILDREN
ORDER BY (GETS + MISSES + SLEEPS) DESC)
WHERE ROWNUM < 10)
ORDER BY TCH DESC)
WHERE ROWNUM < 10);
[/mw_shl_code]
分析latch: cache buffers chains,查找逻辑读较多的SQL进行分析:
Problem: Database is slow and 'latch: cache buffers chains' is high in the waits in AWR.Start with Top 5 Waits:[color=gray]Top 5 Timed Events Avg %Total[color=gray]~~~~~~~~~~~~~~~~~~ wait Call[color=gray]Event Waits Time (s) (ms) Time Wait Class[color=gray]------------------------------ ------------ ----------- ------ ------ ----------[color=gray]latch: cache buffers chains 74,642 35,421 475 6.1 Concurrenc[color=gray]CPU time 11,422 2.0[color=gray]log file sync 34,890 1,748 50 0.3 Commit[color=gray]latch free 2,279 774 340 0.1 Other[color=gray]db file parallel write 18,818 768 41 0.1 System I/O-------------------------------------------------------------High cache buffers chains latch indicates that there is likely to be something reading a lot of buffers. Typically the SQL with the most gets is likely to be that which is contending:[color=gray]SQL ordered by Gets DB/Inst: Snaps: 1-2[color=gray]-> Resources reported for PL/SQL code includes the resources used by all SQL[color=gray]statements called by the code.[color=gray]-> Total Buffer Gets: 265,126,882[color=gray]-> Captured SQL account for 99.8% of Total[color=gray] Gets CPU Elapsed[color=gray]Buffer Gets Executions per Exec %Total Time (s) Time (s) SQL Id[color=gray]-------------- ------------ ------------ ------ -------- --------- -------------[color=gray] 256,763,367 19,052 13,477.0 96.8 ######## ######### a9nchgksux6x2[color=gray]Module: JDBC Thin Client[color=gray]SELECT * FROM SALES ....[color=gray] 1,974,516 987,056 2.0 0.7 80.31 110.94 ct6xwvwg3w0bv[color=gray]SELECT COUNT(*) FROM ORDERS ....
The Query with SQL_ID a9nchgksux6x2 is reading 100x more buffers than the 2nd most 'hungry' statement and CPU and Elapsed are off the 'scale' of the report. This is a prime candidate for the cause of the CBC latch issues.You can also link this information to the Top Segments by Logical Reads:
[color=gray]Segments by Logical Reads [color=gray]-> Total Logical Reads: 265,126,882[color=gray]-> Captured Segments account for 98.5% of Total[color=gray] Tablespace Subobject Obj. Logical[color=gray]Owner Name Object Name Name Type Reads %Total[color=gray]---------- ---------- -------------------- ---------- ----- ------------ -------[color=gray]DMSUSER USERS SALES TABLE 212,206,208 80.04[color=gray]DMSUSER USERS SALES_PK INDEX 44,369,264 16.74[color=gray]DMSUSER USERS SYS_C0012345 INDEX 1,982,592 .75[color=gray]DMSUSER USERS ORDERS_PK INDEX 842,304 .32[color=gray]DMSUSER USERS INVOICES TABLE 147,488 .06[color=gray] -------------------------------------------------------------
The top object read is SALES and the top SQL is a select from SALES which appears to correlate towards this being a potential problem select.This SQL should be investigated to see if the Gets per Exec or the Executions figure per hour has changed in any way (comparison to previous reports would show this) and if so the reasons for that change investigated and resolved.In this case the statement is reading > 10,000 buffers per execution and executing > 15,000 timesso both of these may need to be adjusted to get better performance.Note: This is a simple example where there is a high likelihood that the 'biggest' query is the culprit but it is not always the 'Top' SQL that causes the problem. For example, contention may occur on a statement with a smaller total if it is only executed a small number of times so that it may not appear as the top sql. It may still make millions of buffer gets, but will appear lower in the list because other sqls are performing many times, just not contending.So, if the first SQL is not the culprit then look at the others.