Wednesday, 28 April 2010
Moving platforms
Find me at http://technicalsanctuary.wordpress.com/
Friday, 23 April 2010
Changes to ASM and other complicated jobs
Run ocrcheck to see where your existing OCR location
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2744
Available space (kbytes) : 259376
ID : 277541681
Device/File Name : +DATA
Device/File integrity check succeeded
Device/File Name : +LOG1
Device/File integrity check succeeded
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
# ocrconfig -delete +LOG1
# ocrconfig -delete +LOG2
This leaves us with a single copy of the OCR in the +DATA diskgroup.
ASM Parameter file stored in +LOG1
This initially seemed to be the most tricky - there is no $ORACLE_HOME/dbs/initASM.ora to point to an ifile and I couldn't see how this was configured in the clusterware.
Look at the existing location
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +LOG1/b-rac-cluster/ASMPARAMETERFILE/REGISTRY.253.713418645
There are some new commands that allow you to look after the ASM spfile in 11gR2 http://download.oracle.com/docs/cd/E11882_01/server.112/e10500/asm_util002.htm#CIHDDJAF
The one I'm going to use is spmove
ASMCMD> spmove +LOG1/b-rac-cluster/ASMPARAMETERFILE/REGISTRY.253.713418645 +DATA/b-rac-cluster/ASMPARAMETERFILE/REGISTRY.253.713418645
ORA-15056: additional error message
ORA-17502: ksfdcre:4 Failed to create file +DATA/b-rac-cluster/ASMPARAMETERFILE/REGISTRY.253.713418645
ORA-15177: cannot operate on system aliases
ORA-06512: at line 7 (DBD ERROR: OCIStmtExecute)
ORA-15032: not all alterations performed
ORA-15028: ASM file '+LOG1/b-rac-cluster/ASMPARAMETERFILE/REGISTRY.253.713418645' not dropped; currently being accessed (DBD ERROR: OCIStmtExecute)
hmmm, I guess since ASM is open, it will be accessed by someone.
So, next steps, stop and restart the ASM instances and check they are using the new spfile
SQL*Plus: Release 11.2.0.1.0 Production on Fri Apr 23 03:30:14 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> show parameter spfile
NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
spfile string +DATA/spfileasm.ora
Disks
First step, remove all the logfiles from LOG1
alter database drop logfile group XX;
Remember the minimum amount of logfiles for a thread is 2, so I needed to temporarily create logfiles for thread 1 in +LOG2. (not described)
Login as sysasm to the ASM2 instance and dismount the +LOG1 diskgroup
SQL*Plus: Release 11.2.0.1.0 Production on Fri Apr 23 03:30:14 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL*Plus: Release 11.2.0.1.0 Production on Fri Apr 23 03:30:14 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
Then, using format as root, repartition the disks
Current partition table (original):
Total disk cylinders available: 36644 + 2 (reserved cylinders)
Part Tag Flag Cylinders Size Blocks
0 usr wm 1 - 5462 80.01GB (5462/0/0) 167792640
1 usr wm 5463 - 10924 80.01GB (5462/0/0) 167792640
2 backup wu 0 - 36643 536.78GB (36644/0/0) 1125703680
3 usr wm 10925 - 16386 80.01GB (5462/0/0) 167792640
4 usr wm 16387 - 21848 80.01GB (5462/0/0) 167792640
5 usr wm 21849 - 27310 80.01GB (5462/0/0) 167792640
6 usr wm 27311 - 32772 80.01GB (5462/0/0) 167792640
7 unassigned wu 0 0 (0/0/0) 0
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000102d0s1 RAWLOG102s1
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000102d0s3 RAWLOG102s3
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000102d0s4 RAWLOG102s4
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000102d0s5 RAWLOG102s5
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000102d0s6 RAWLOG102s6
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s0 RAWLOG101s0
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s1 RAWLOG101s1
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s3 RAWLOG101s3
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s4 RAWLOG101s4
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s5 RAWLOG101s5
#ln -s /dev/rdsk/c10t60060E8005B182000000B18200000101d0s6 RAWLOG101s6
SQL> alter database add logfile thread 1 group 11 ('/ORACLE/RAW/RAWLOG102s0') size 5g reuse;
Do the usual alter system switch logfile until you're into the new logs. Drop any temporary logs you've created for the instance.
Friday, 26 March 2010
Which SQL is causing my Cluster Waits?
First step - identify the snapshots that cover the time period of the testing by looking in dba_hist_snapshot (I've not put in the SQL for this as it's fairly obvious)
My two snapshots are 541 and 542.
Next do something with a dual purpose, check the number of waits associated with each class (why chase things that won't have a big impact) and get the wait_class_id..
------------- -------------------- ----------
2000153315 Network 1
3290255840 Configuration 1
1740759767 User I/O 2
1893977003 Other 22
3875070507 Concurrency 27
4108307767 System I/O 47
4217450380 Application 48
3386400367 Commit 265
1045
3871361733 Cluster 1273
10 rows selected.
Well, the majority of my waits are cluster related (though next on my hit list will be the commit related events). Let's find out which cluster waits are giving me problems
EVENT_ID EVENT CNT
---------- ------------------------------ ----------
1742950045 gc current retry 1
661121159 gc cr multi block request 1
3905407295 gc current request 1
2705335821 gc cr block congested 3
3897775868 gc current multi block request 3
105117041 gc buffer busy release 3
1520064534 gc cr block busy 5
3785617759 gc current block congested 7
2685450749 gc current grant 2-way 12
1912606394 gc buffer busy acquire 62
2277737081 gc current grant busy 73
2701629120 gc current block busy 112
737661873 gc cr block 2-way 360
111015833 gc current block 2-way 630
14 rows selected.
So most of my waits are 'gc cr block 2-way' and 'gc current block 2-way'. Take their event_id and see which SQL was doing most of the waiting for those events.
So now we have the SQL ID, we can get the text associated with it, and hopefully come up with a fix on the SQL.
So the majority of our cluster waits is caused by the insert into the transaction log...
The two waits are related to transferring and requesting blocks in memory from the other instance. The Current block is where we're competing for blocks to update. If a block has fewer rows, then the chance that both instances will need the same block is reduced.
This leaves me with 2 simple fixes that doesn't need the application fiddling with:-
1. Reduce row density (tweak PCTFREE)
2. Reduce my blocksize from 8k to 2k.
I chose the 2k blocksize option by creating a new tablespace with the smaller blocksize and a new buffer pool to support it. It increased throughput by 10%
11g R2 - Moving your voting disk and cluster registry.
They're very simple to move, for example to move them from a cluster filesystem to ASM
OCR
Login as root
Run ocrcheck to see where your existing OCR location (so you can delete it later).
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2744
Available space (kbytes) : 259376
ID : 277541681
Device/File Name : /OCR/b-rac-cluster/ocr
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
# ocrconfig -add +DATA
# ocrconfig -delete /OCR/b-rac-cluster/ocr
Run ocrcheck to check it's all ok
# ocrcheck
Status of Oracle Cluster Registry is as follows :
Version : 3
Total space (kbytes) : 262120
Used space (kbytes) : 2744
Available space (kbytes) : 259376
ID : 277541681
Device/File Name : +DATA
Device/File integrity check succeeded
Cluster registry integrity check succeeded
Logical corruption check succeeded
Voting
Login as Oracle
Find your existing voting disk
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 96953eb1cac64fcabf4d0f60d0012ef9 (/OCR/b-rac-cluster/vdsk) []
Change your voting disk to the +DATA diskgroup
b-rac1 $ crsctl replace votedisk +DATA
CRS-4256: Updating the profile
Successful addition of voting disk 3ddb5231538c4fc3bf2d0053e34a75f0.
Successful deletion of voting disk 96953eb1cac64fcabf4d0f60d0012ef9.
Successfully replaced voting disk group with +DATA.
CRS-4256: Updating the profile
CRS-4266: Voting file(s) successfully replaced
Check it's moved ok
b-rac1 $ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 3ddb5231538c4fc3bf2d0053e34a75f0 (/ORACLE/ASM/ASM0103) [DATA]
Located 1 voting disk(s).
Thursday, 18 February 2010
Nokia - Free Navigation for all?
However, it isn't that simple, unless you are on the short list of phones that support Ovi Maps 3.0.3, you don't get the free navigation at all. This means if you have a smartphone such as the E71 you don't get the free navigation, and there are some rumours that you cannot buy new navigation licenses.
Next time, no nokia for me.
Wednesday, 27 January 2010
Formatting a 2540 using SSCS
Sometimes to speed up the setup of identical disk systems you might decide to use SSCS rather than the Common Array Manager GUI to configure them.
All syntax mentioned here can be found in http://dlc.sun.com/pdf/820-4192-12/820-4192-12.pdf
First login to your CAMS server using sccs
root@c14-48 # sscs login -h c14-48 -u root
This only tells you if the login has failed, if you don't get a response, everything is ok and you're now connected to your CAMS server until it times out.
Select the profile for the storage you are setting up. You can get a list of profiles on your array (esal-2540-2 in my example) using
root@c14-48 # sscs list -a esal-2540-2 profile
Profile: Oracle_OLTP_HA
Profile: Oracle_DSS
Profile: Oracle_9_VxFS_HA
Profile: Sun_SAM-FS
Profile: High_Performance_Computing
Profile: Oracle_9_VxFS
Profile: Oracle_10_ASM_VxFS_HA
Profile: Random_1
Profile: Sequential
Profile: Sun_ZFS
Profile: Sybase_OLTP_HA
Profile: Sybase_DSS
Profile: Oracle_8_VxFS
Profile: Mail_Spooling
Profile: Microsoft_NTFS_HA
Profile: Microsoft_Exchange
Profile: Sybase_OLTP
Profile: Oracle_OLTP
Profile: VxFS
Profile: Default
Profile: NFS_Mirroring
Profile: NFS_Striping
Profile: Microsoft_NTFS
Profile: High_Capacity_Computing
You can get more detail on a profile using
root@c14-48 # sscs list -a esal-2540-2 profile Oracle_9_VxFS_HA
Profile: Oracle_9_VxFS_HA
Profile In Use: no
Factory Profile: yes
Description:
Oracle 9 over VxFS (High Availability)
RAID Level: 1
Segment Size: 128 KB
Read Ahead: on
Optimal Number of Drives: variable
Disk Type: SAS
Dedicated Hot Spare: no
The next step is to create my pool 'bt-poc'
The syntax for the command is
sscs create -a-p pool
root@c14-48 # sscs create -a esal-2540-2 -p Oracle_9_VxFS_HA pool bt-poc
Logically, at this point you want to create a logical disk, however you can't create one directly but you can create one implicitly as part of the volume creation
sscs create -a-p -s
-nvolume
root@c14-48 # sscs create -a esal-2540-2 -p bt-poc -s 15gb -n 6 volume vol1
Check the name of you new virtual disk using
root@c14-48 # sscs list -a esal-2540-2 vdisk
Virtual Disk: 1
root@c14-48 # sscs list -a esal-2540-2 vdisk 1
Virtual Disk: 1
Status: Optimal
State: Ready
Number of Disks: 6
RAID Level: 1
Total Capacity: 836.690 GB
Configured Capacity: 15.000 GB
Available Capacity: 821.690 GB
Array Name: esal-2540-2
Array Type: 2540
Disk Type: SAS
Maximal Volume Size: 821.690 GB
Associated Disks:
Disk: t85d01
Disk: t85d02
Disk: t85d03
Disk: t85d04
Disk: t85d05
Disk: t85d06
Associated Volumes:
Volume: vol1
Since I need another 4 identical volumes on this virtual disk I can be lazy just script up the creation.
root@c14-48 # for i in 2 3 4 5
do
sscs create -a esal-2540-2 -p bt-poc -s 15gb -v 1 volume vol${i}
done
The sccs commands are asynchronous - they return before the action is completed, so long running tasks like creating a RAID5 volume will still be running while you create your volumes on it.
Can delete any mix ups simply
root@c14-48 # sscs delete -a esal-2540-2 volume vol3
At this point, you can either map your volumes to the default storage domain, and all hosts connected to the storage will be able to see all the volumes, or you can do LUN mapping and limit which hosts can see which volumes.
Map to the default storage domain
for i in 1 2 3 4 5
do
sscs map -a esal-2540-2 volume vol${i}
done
Create host based mappings
Create your hosts
root@c14-48 # sscs create -a esal-2540-2 host dingo
root@c14-48 # sscs create -a esal-2540-2 host chief
Create the initators that map to the World Wide Number (WWN) for the Host Bus Adaptor (HBA) of each machine.
First find your WWN - you can do this either by looking on the storage switch if you have one, or on the hosts that will be accessing the storage.
Looking on the host you issue the command fcinfo hba-port, and look for the HBA port WWN associated with the correct fibre channel devices. I've highlighted the entries in red for clarity.
dingo # fcinfo hba-port
HBA Port WWN: 21000003ba9b3679
OS Device Name: /dev/cfg/c1
Manufacturer: QLogic Corp.
Model: 2200
Firmware Version: 2.01.145
FCode/BIOS Version: ISP2200 FC-AL Host Adapter Driver: 1.15 04/03/22
Type: L-port
State: online
Supported Speeds: 1Gb
Current Speed: 1Gb
Node WWN: 20000003ba9b3679
HBA Port WWN: 210000e08b09965e
OS Device Name: /dev/cfg/c3
Manufacturer: QLogic Corp.
Model: 375-3108-xx
Firmware Version: 3.03.27
FCode/BIOS Version: fcode: 1.13;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200000e08b09965e
HBA Port WWN: 210100e08b29965e
OS Device Name: /dev/cfg/c4
Manufacturer: QLogic Corp.
Model: 375-3108-xx
Firmware Version: 3.03.27
FCode/BIOS Version: fcode: 1.13;
Type: N-port
State: online
Supported Speeds: 1Gb 2Gb
Current Speed: 2Gb
Node WWN: 200100e08b29965e
root@c14-48 # sscs create -a esal-2540-2 -w 210000e08b09965e -h dingo initiator dingo-1
root@c14-48 # sscs create -a esal-2540-2 -w 210000e08b29965e -h dingo initiator dingo-2
root@c14-48 # sscs map -a esal-2540-2 -v vol1,vol3 host dingo
And there you have it - you've formatted and mapped your volumes without needing the web interface.
How to find out if you're accessing a RAC database
How can you tell if you're accessing a RAC database? Simple!
You can tell if it is a cluster database by looking to see if the cluster database parameter is set:-
SQL> select name, value from v$parameter where name='cluster_database';
NAME VALUE
--------------------- ---------------------
cluster_database TRUE
or
set serveroutput on
BEGIN
IF dbms_utility.is_cluster_database THEN
dbms_output.put_line('Running in SHARED/RAC mode.');
ELSE
dbms_output.put_line('Running in EXCLUSIVE mode.');
END IF;
END;
/
You can tell how many instances are active by:-
SQL> SELECT * FROM V$ACTIVE_INSTANCES;
INST_NUMBER INST_NAME
----------- -----------------------
1 c1718-6-45:AXIOSS1
2 c1718-6-46:AXIOSS2
VIPCA fails when installing Oracle 10g CRS
Running vipca(silent) for configuring nodeapps
The given interface(s), "nxge0" is not public. Public interfaces should be used to configure virtual IPs.
This is can be caused by your public network interface being 10.x.x.x - vipca assumes that this is a private network.
Login as root and run vipca from $CRS_HOME/bin - this will allow you to set the interface types correctly.
Using Oracle 10.2.0.3 on SPARC?
The patch is rolled into 10.2.0.4, but it's not documented as part of the patch notes.