First step - identify the snapshots that cover the time period of the testing by looking in dba_hist_snapshot (I've not put in the SQL for this as it's fairly obvious)
My two snapshots are 541 and 542.
Next do something with a dual purpose, check the number of waits associated with each class (why chase things that won't have a big impact) and get the wait_class_id..
SQL >col wait_class for a20
SQL> select wait_class_id, wait_class, count(*) cnt
from dba_hist_active_sess_history
where snap_id between 541 and 542
group by wait_class_id, wait_class
order by 3;
------------- -------------------- ----------
2000153315 Network 1
3290255840 Configuration 1
1740759767 User I/O 2
1893977003 Other 22
3875070507 Concurrency 27
4108307767 System I/O 47
4217450380 Application 48
3386400367 Commit 265
1045
3871361733 Cluster 1273
10 rows selected.
Well, the majority of my waits are cluster related (though next on my hit list will be the commit related events). Let's find out which cluster waits are giving me problems
SQL> col event for a30
SQL> select event_id, event, count(*) cnt from dba_hist_active_sess_history
where snap_id between 541 and 542 and wait_class_id=3871361733
group by event_id, event
order by 3;
EVENT_ID EVENT CNT
---------- ------------------------------ ----------
1742950045 gc current retry 1
661121159 gc cr multi block request 1
3905407295 gc current request 1
2705335821 gc cr block congested 3
3897775868 gc current multi block request 3
105117041 gc buffer busy release 3
1520064534 gc cr block busy 5
3785617759 gc current block congested 7
2685450749 gc current grant 2-way 12
1912606394 gc buffer busy acquire 62
2277737081 gc current grant busy 73
2701629120 gc current block busy 112
737661873 gc cr block 2-way 360
111015833 gc current block 2-way 630
14 rows selected.
So most of my waits are 'gc cr block 2-way' and 'gc current block 2-way'. Take their event_id and see which SQL was doing most of the waiting for those events.
SQL> select sql_id, count(*) cnt from dba_hist_active_sess_history
where snap_id between 541 and 542
and event_id in ( 737661873,111015833 )
group by sql_id
having count(*)>200
order by 2
SQL_ID CNT
------------- ----------
3dh7u84crfyw6 526
So now we have the SQL ID, we can get the text associated with it, and hopefully come up with a fix on the SQL.
SQL> select sql_text from dba_hist_sqltext where sql_id='3dh7u84crfyw6';
SQL_TEXT
--------------------------------------------------------------------------------
INSERT INTO TRANSACTION_LOG( TRL_ID, TRL_BUSINESS_DATE, TRL_SYSTEM_TIMESTAMP, TR
So the majority of our cluster waits is caused by the insert into the transaction log...
The two waits are related to transferring and requesting blocks in memory from the other instance. The Current block is where we're competing for blocks to update. If a block has fewer rows, then the chance that both instances will need the same block is reduced.
This leaves me with 2 simple fixes that doesn't need the application fiddling with:-
1. Reduce row density (tweak PCTFREE)
2. Reduce my blocksize from 8k to 2k.
I chose the 2k blocksize option by creating a new tablespace with the smaller blocksize and a new buffer pool to support it. It increased throughput by 10%