JAS-MIN


Alright, buckle up, buttercup. JAS-MIN is here to dissect this mess of numbers you call a performance report. You want comprehensive? You got it. Don't worry, I won't hold back, because apparently, you want all the data I can find. Let's see what horrors lie within.


📈 Load Profile Summary - First Impressions from the Pretty Pictures

Well, aren't these just delightful visual aids. Let's dig into the actual story these pretty boxes are trying to tell us, rather than just admiring the squiggles.

Overall Load Profile Synopsis: Your database is a wait-bottlenecked beast. The consistently high Average Active Sessions with low CPU utilization, coupled with astronomical logical I/O, tells me your sessions are spending most of their time fighting over memory resources or waiting for Oracle to process already-cached blocks. This isn't about slow disks; it's about inefficient data access patterns and contention within the SGA. The high commit rate and block changes contribute significantly to the internal contention.


📉 DBCPU/DBTime Ratio Analysis: Confirmed Bottleneck

Your analysis confirms my initial suspicion from the load profile. The default ratio for finding peaks is 0.666, meaning CPU should account for at least two-thirds of the DB Time. Your actual ratios? Abysmally low, ranging from 0.01 to 0.19.

This isn't a "peak," darling, this is your database choking on waits for everything except CPU. When DB CPU is 15.00 and DB Time is 1196.10 (ratio 0.01 on 25-Jun-25 13:00:32, snap 1665), it means that, on average, about 1196 sessions were active, but only 15 of them were actually crunching numbers on the CPU. The rest were cooling their heels, waiting for I/O, locks, latches, or some other internal Oracle resource.

The lowest ratios, indicating the most severe non-CPU bottlenecks, consistently occur during your business hours (08:00-15:00 on weekdays):

The periods around 28-Jun-25 00:00:55 (snap 1724) to 05:00:58 (snap 1729), 29-Jun-25 00:00:44 (snap 1748) to 05:00:29 (snap 1753), and 05-Jul-25 00:00:02 (snap 1892) to 06:00:05 (snap 1898) have slightly higher ratios (up to 0.19), but still indicate significant non-CPU waits. This suggests your database is always waiting, just more so during peak hours.
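If you'd rather recompute these ratios yourself than trust my pretty boxes, a minimal sketch against the AWR time model looks like this. It assumes Diagnostics Pack licensing and a single-instance database (partition the LAG by instance_number otherwise):

  -- Per-snapshot DB CPU / DB Time ratio from AWR time model data.
  -- Values in DBA_HIST_SYS_TIME_MODEL are cumulative microseconds, so diff consecutive snaps.
  SELECT snap_id,
         ROUND(db_cpu_s, 2)                        AS db_cpu_s,
         ROUND(db_time_s, 2)                       AS db_time_s,
         ROUND(db_cpu_s / NULLIF(db_time_s, 0), 2) AS cpu_to_dbtime_ratio
  FROM (
    SELECT snap_id,
           (MAX(CASE WHEN stat_name = 'DB CPU'  THEN value END)
            - LAG(MAX(CASE WHEN stat_name = 'DB CPU'  THEN value END))
                OVER (ORDER BY snap_id)) / 1e6 AS db_cpu_s,
           (MAX(CASE WHEN stat_name = 'DB time' THEN value END)
            - LAG(MAX(CASE WHEN stat_name = 'DB time' THEN value END))
                OVER (ORDER BY snap_id)) / 1e6 AS db_time_s
    FROM   dba_hist_sys_time_model
    WHERE  stat_name IN ('DB CPU', 'DB time')
    GROUP  BY snap_id
  )
  WHERE  db_time_s > 0
  ORDER  BY cpu_to_dbtime_ratio;

Snapshots at the top of that list should line up with the windows flagged above.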


⏳ Heaviest Wait Events Impacting Database Performance

Let's break down what's got your database tied up in knots. The problem isn't CPU, it's waiting for stuff. A lot of stuff.

💔 Foreground Wait Events

These are the events that directly impact user session response times. They're the bane of any application user's existence.

🌌 Background Wait Events

These affect the overall health and performance of the database's internal operations.

Summary of Heaviest Wait Events:

Your database is primarily I/O and concurrency bound.


🔍 SQL IDs Requiring Further Performance Analysis

Let's name and shame some of these SQLs. The distinction between "short but many executions" and "inherently slow" is crucial.

SQL ID Summaries:


📊 Comparing AVG and STDDEV, and Correlated Events

A high STDDEV relative to AVG implies inconsistent performance. Let's look at the worst offenders:

Interpretation: The high STDDEV values, especially for enq: TX - row lock contention and direct path write temp, highlight that the problem isn't just consistent slowness but unpredictable spikes that likely cause significant user complaints. db file parallel write and log file parallel write are more consistently high but also show periods of increased struggle.
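If you want to verify any of these AVG/STDDEV pairs straight from AWR, here is a minimal sketch, assuming a single instance and using enq: TX - row lock contention as the example event (swap in any other event name):

  -- Per-snapshot wait-time deltas for one event, then AVG vs STDDEV of those deltas.
  -- DBA_HIST_SYSTEM_EVENT holds cumulative counters, hence the LAG().
  WITH deltas AS (
    SELECT snap_id,
           time_waited_micro
             - LAG(time_waited_micro) OVER (ORDER BY snap_id) AS waited_us
    FROM   dba_hist_system_event
    WHERE  event_name = 'enq: TX - row lock contention'
  )
  SELECT ROUND(AVG(waited_us)    / 1e6, 2) AS avg_wait_s_per_snap,
         ROUND(STDDEV(waited_us) / 1e6, 2) AS stddev_wait_s_per_snap
  FROM   deltas
  WHERE  waited_us >= 0;  -- discard negative deltas from instance restarts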


🚨 Anomaly Clusters (Median Absolute Deviation)

Your MAD parameters are threshold = 7 and window size = 10% (32 probes out of 327). This is a reasonable setup for flagging significant deviations from the median as "anomalies".
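For context, the rule behind these clusters is standard median-absolute-deviation outlier detection. The sketch below is purely illustrative, not JAS-MIN's actual implementation: it uses a hypothetical metric_probes(snap_id, value) staging table and one global median instead of the sliding 10% window, and flags any probe more than threshold x MAD away from the median:

  -- Illustrative MAD outlier rule: |value - median| > threshold * MAD.
  WITH med AS (
    SELECT MEDIAN(value) AS med_val FROM metric_probes
  ),
  mad AS (
    SELECT m.med_val,
           MEDIAN(ABS(p.value - m.med_val)) AS mad_val
    FROM   metric_probes p CROSS JOIN med m
    GROUP  BY m.med_val
  )
  SELECT p.snap_id, p.value
  FROM   metric_probes p CROSS JOIN mad a
  WHERE  ABS(p.value - a.med_val) > 7 * NULLIF(a.mad_val, 0)  -- threshold = 7, as configured
  ORDER  BY p.snap_id;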

The Anomaly Summary table is gold. It ties together problematic events, statistics, and even SQL IDs by specific dates and snap IDs.

The Periods with the Highest Number of Anomalies: Looking at the Count column in the Anomaly Summary, the most problematic periods (dates with the highest number of co-occurring anomalies) are:

These dates (especially June 29th and 30th, around business hours) clearly represent periods of severe, widespread performance degradation affecting multiple areas of the database.

Patterns of Occurring Latches, Statistics, and Wait Events:

Many clusters are interconnected, showing systemic problems:


📅 Most Problematic Dates for Performance

Based on the highest counts of co-occurring anomalies and the lowest DB CPU / DB Time ratios, the most problematic periods consistently fall on weekdays, especially from June 29th through July 4th, within typical business hours.

Specifically,

These periods are characterized by high enq: TX - row lock contention, massive db file sequential read events, and widespread dictionary cache (row cache mutex) and redo allocation (latch: redo allocation) contention, all contributing to the severely low DB CPU / DB Time ratio.
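To see which SQL sat in that row-lock queue and who was blocking it during those windows, the sampled ASH data in AWR can be sliced as below. A rough sketch, assuming Diagnostics Pack licensing; the time range is the June 29th-30th business-hours window called out above:

  -- SQL and blockers behind 'enq: TX - row lock contention' in the bad window.
  -- Each row in DBA_HIST_ACTIVE_SESS_HISTORY represents roughly 10 seconds of wait time.
  SELECT sql_id,
         blocking_session,
         blocking_session_serial#,
         COUNT(*) AS ash_samples
  FROM   dba_hist_active_sess_history
  WHERE  event = 'enq: TX - row lock contention'
  AND    sample_time BETWEEN TIMESTAMP '2025-06-29 08:00:00'
                         AND TIMESTAMP '2025-06-30 15:00:00'
  GROUP  BY sql_id, blocking_session, blocking_session_serial#
  ORDER  BY ash_samples DESC
  FETCH FIRST 20 ROWS ONLY;

If the anomaly clusters are right, dvtr3uwuqbppa should show up near the top.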


🧩 SQLs in the Same Anomaly Clusters as Heaviest Wait Events

Let's cross-reference the top wait events with the anomalous SQLs:


⛓️ Chained Rows in Statistical Anomalies

The statistic table fetch continued row shows up in several anomaly clusters, specifically:

Meaning: table fetch continued row counts row fetches that had to follow a pointer into another block, which happens with chained or migrated rows. Chained rows occur when a row is too large to fit in a single data block and must be stored across multiple blocks. Migrated rows occur when a row grows on update and no longer fits in its original block, so it is moved to a new block, leaving a "pointer" behind.

Impact: Both chained and migrated rows increase logical I/O (more block reads to retrieve a single row) and can degrade performance, especially for full table scans or queries accessing many rows. This aligns with your very high Logical Reads (MB)/s in the Load Profile.

Further AWR Sections to check: To find the probable segments responsible for chained/migrated rows, you should look into the "Segments by Logical Reads" or "Segments by Physical Reads" sections of the full AWR report. Specifically, you'd look for:

Action: For tables with significant chained/migrated rows, consider the following (a SQL sketch follows this list):

  1. ALTER TABLE MOVE: Rebuild the table (and its indexes) to defragment space and ensure rows fit in single blocks. This usually requires an outage or online redefinition.
  2. Adjust PCTFREE: Increase PCTFREE for tables with growing rows to leave more space in blocks for future updates, reducing migration.
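A minimal sketch of both steps, using a hypothetical app_owner.big_table purely for illustration (the chained_rows table is created by $ORACLE_HOME/rdbms/admin/utlchain.sql):

  -- 1. Count chained/migrated rows without disturbing optimizer statistics.
  ANALYZE TABLE app_owner.big_table LIST CHAINED ROWS INTO chained_rows;
  SELECT COUNT(*) AS chained_or_migrated
  FROM   chained_rows
  WHERE  table_name = 'BIG_TABLE';

  -- 2. Fix a confirmed offender (plain MOVE needs an outage and leaves indexes UNUSABLE).
  ALTER TABLE app_owner.big_table PCTFREE 20;  -- affects only blocks formatted from now on
  ALTER TABLE app_owner.big_table MOVE;        -- rewrites the rows, honoring the new PCTFREE
  ALTER INDEX app_owner.big_table_pk REBUILD;  -- repeat for every index on the table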

📁 IO STATS by Function Analysis

This section gives us a breakdown of I/O activity by internal Oracle functions.

Cross-reference Conclusion from IO Stats by Function: Your storage system seems reasonably fast for both sequential and random I/O at the individual operation level. The bottleneck is not primarily disk latency, but the volume of I/O requests and redo generated by the application workload. The database is churning through data and changes at an incredible rate, and while the I/O system is efficient, it's struggling to keep up with the sheer demand. This leads to queues and waits, despite fast individual operations. The log file parallel write is a prime example: many fast small writes still stack up.
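If you want to watch this breakdown live instead of waiting for the next snapshot, the same per-function figures are exposed, cumulatively since instance startup, in V$IOSTAT_FUNCTION; a quick sketch:

  -- Cumulative I/O volume and request counts per Oracle function since instance start.
  -- For per-snapshot deltas, diff DBA_HIST_IOSTAT_FUNCTION between snap_ids instead.
  SELECT function_name,
         small_read_reqs  + large_read_reqs            AS read_reqs,
         small_write_reqs + large_write_reqs           AS write_reqs,
         small_read_megabytes  + large_read_megabytes  AS read_mb,
         small_write_megabytes + large_write_megabytes AS write_mb,
         number_of_waits,
         wait_time                                     AS wait_time_ms
  FROM   v$iostat_function
  ORDER  BY small_read_megabytes + large_read_megabytes
          + small_write_megabytes + large_write_megabytes DESC;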


📋 TOP 10 Segments Analysis

This section helps pinpoint which database objects are at the heart of your problems.
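If the top 10 here isn't enough, the same segment-level counters live in AWR and can be ranked over this report's snapshot range (1665-1898); a sketch, assuming Diagnostics Pack and 12c+ FETCH FIRST syntax:

  -- Top segments by logical reads across the analysed snapshot range,
  -- together with the contention counters that matter for this workload.
  SELECT o.owner, o.object_name, o.object_type,
         SUM(s.logical_reads_delta)     AS logical_reads,
         SUM(s.buffer_busy_waits_delta) AS buffer_busy_waits,
         SUM(s.row_lock_waits_delta)    AS row_lock_waits
  FROM   dba_hist_seg_stat s
  JOIN   dba_hist_seg_stat_obj o
    ON   o.dbid     = s.dbid
   AND   o.ts#      = s.ts#
   AND   o.obj#     = s.obj#
   AND   o.dataobj# = s.dataobj#
  WHERE  s.snap_id BETWEEN 1665 AND 1898  -- snap range covered by this report
  GROUP  BY o.owner, o.object_name, o.object_type
  ORDER  BY logical_reads DESC
  FETCH FIRST 10 ROWS ONLY;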


💡 Suggestions for MAD Algorithm and MOS Notes

Suggested other window size or threshold for MAD algorithm:

Oracle MOS notes describing the problems:


🚀 Conclusion and Recommendations

Alright, here’s the bottom line, spelled out like it's for kindergarteners, but with more sarcasm: Your database isn't slow because it's thinking too hard. It's slow because it's waiting for everything.

The Core Problems:

  1. I/O & Redo Saturation: log file parallel write and db file parallel write are killing you. Your disk system isn't slow per operation, but it's overwhelmed by the sheer volume of changes (Redo MB/s, Phy Writes MB/s). Your application is a DML-generating machine.
  2. Concurrency Hotspots: enq: TX - row lock contention (driven heavily by dvtr3uwuqbppa), read by other session, and buffer busy waits indicate sessions are constantly bumping into each other on frequently accessed data blocks and records.
  3. Dictionary Cache Nightmare: row cache mutex waits, buffer busy waits on OBJ$, TAB$, COL$, and CDEF$, plus high Hard Parses/s, mean Oracle's internal metadata structures are a major bottleneck. This is exacerbated by RECYCLEBIN$ activity and potentially frequent DDL or grant changes.
  4. Inefficient Data Access: High db file sequential read (millions of small reads) and massive Logical Reads (MB)/s suggest queries are doing way too much work in memory, visiting unnecessary blocks, or suffering from chained rows.

Top Priority Action Plan (Because nobody wants to hear "it's complicated"):

  1. Conquer dvtr3uwuqbppa (INSERT) and enq: TX - row lock contention:
  2. Tame the Redo Monster:
  3. Address Dictionary Contention:
  4. Tune Logical I/O Hogs:
  5. PGA Tuning: Given the direct path write temp activity and PGA memory operation anomalies, re-evaluate PGA_AGGREGATE_TARGET (a quick advisory query follows this list). If queries are frequently spilling to disk, they might benefit from more PGA, or the queries themselves need tuning to reduce memory requirements.
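For item 5, Oracle's built-in advisory is the cheapest sanity check before touching the parameter; a minimal sketch:

  -- PGA target advisory: look for the size where ESTD_PGA_CACHE_HIT_PERCENTAGE flattens out
  -- and ESTD_OVERALLOC_COUNT drops to zero.
  SELECT ROUND(pga_target_for_estimate / 1024 / 1024) AS pga_target_mb,
         pga_target_factor,
         estd_pga_cache_hit_percentage,
         estd_overalloc_count
  FROM   v$pga_target_advice
  ORDER  BY pga_target_factor;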

Don't just stare at the pretty graphs. Get to work. The database isn't going to fix itself while you're busy making coffee.


Your source code: https://github.com/ora600pl/jas-min
Need more expert analysis? The good performance tuning experts are at ora-600.pl.

ORA-600