Along with a basic description of the architecture, this article points to detailed step-by-step guides for an Oracle 11g RAC installation on a Linux server.
The basic concept of RAC is allowing multiple instances to access the same database.
These instances are connected to each other through a high-speed private network called the “Interconnect”. In a RAC setup each instance has its own SGA and set of background processes, but all instances work against the same datafiles and control files on shared storage. The redo log files, archived redo log files and flash recovery logs typically sit on shared storage as well, although each instance writes to its own redo thread and keeps its own alert log and trace files.
So how is data that has been modified in one instance, but not yet written to disk, accessed by another instance?
This is where the RAC architecture comes into play. A controlling mechanism called Global Cache Management maintains read consistency of data blocks across nodes. This is known as Cache Fusion.
Global Cache Management ensures consistency of multiple copies of the buffer cache across instances. For this there are additional background processes, beyond the normal ones, that ensure cache coherency across nodes. If one instance is modifying a data block, the Global Cache Services (GCS) process keeps track of this and takes an exclusive lock on that block on behalf of Node 1. If a second node then wants to modify the same data block, Node 1 releases the lock on that block and GCS ensures that Node 2 has the latest version of the block, including the modifications from Node 1. GCS then takes an exclusive lock on this block on behalf of Node 2. The Global Resource Manager (GRM) process coordinates the lock requests between multiple RAC instances. This process of managing multiple copies of data blocks across instances is also known as Parallel Cache Management. Similar to GCS, there is a Global Enqueue Service (GES) monitor which manages all non-data-block resources.
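The lock handoff described above can be illustrated with a small toy model. This is only a sketch of the idea, not how Oracle implements it: the class name GlobalCacheService, the modify method and the list-of-changes representation of a block are all made up for illustration.

```python
class GlobalCacheService:
    """Toy stand-in for GCS (hypothetical): tracks the exclusive holder of each block."""

    def __init__(self):
        self.holder = {}  # block id -> node that currently holds the exclusive lock
        self.caches = {}  # node -> {block id: list of changes applied to that block}

    def modify(self, node, block_id, change):
        cache = self.caches.setdefault(node, {})
        previous = self.holder.get(block_id)
        if previous is not None and previous != node:
            # The previous holder gives up its lock and its current copy of the
            # block is copied to the requester's cache, without going via disk.
            cache[block_id] = list(self.caches[previous][block_id])
        elif block_id not in cache:
            # First access anywhere: pretend the block was read from disk, unmodified.
            cache[block_id] = []
        # The requesting node now holds the exclusive lock and applies its change.
        self.holder[block_id] = node
        cache[block_id].append(change)
        return cache[block_id]


gcs = GlobalCacheService()
gcs.modify("node1", 42, "update by node1")
block = gcs.modify("node2", 42, "update by node2")
print(gcs.holder[42], block)
```

After the second call, node2 holds the lock and its copy of block 42 contains both changes, which is the behaviour the paragraph above describes for Node 1 and Node 2.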
Some of the key Oracle background processes in a RAC system are:
LMS – Lock Manager Server process – GCS
LMD – Lock Manager Daemon – GES
LCK0 – Lock Process – GES
LMON – Lock Monitor – GES
LMS – Lock Manager Server process is used in Cache Fusion. It enables consistent copies of blocks to be transferred from a holding instance’s buffer cache to a requesting instance’s buffer cache without a disk write. It rolls back any uncommitted transactions for any blocks that are being requested for a consistent read by the remote instance. It also handles global deadlock detection and monitors for lock conversion timeouts.
LMON – Lock Monitor process is responsible for managing Global Enqueue Services (GES). It maintains consistency of GCS memory in case of any process death. LMON is also responsible for cluster reconfiguration when an instance joins or leaves the cluster. It also checks for instance death and listens for local messaging.
LMD – Lock Manager Daemon process manages enqueue manager service requests for GCS. It also handles deadlock detection and remote resource requests from other instances.
LCK – Lock Process manages instance resource requests and cross-instance call operations for shared resources. It builds a list of invalid lock elements and validates lock elements during recovery.
DIAG – Diagnostic Daemon is a lightweight process that uses the DIAG framework to monitor the health of the cluster. It captures information for later diagnosis in the event of failures, and performs any necessary recovery if an operational hang is detected.
The major components of an Oracle RAC system are:
Shared disk system
Oracle Kernel Components
There are many good sites on this topic that give complete step-by-step guidance on installing RAC on Linux. The sites I have found useful when doing the RAC installation are:
This site is very good if you are going to build your RAC on VMware Workstation.
It also provides solutions to various issues that may be faced during the setup.
This site is very good for the installation of Linux on the virtual machine, as it gives a detailed list of all the RPMs and the steps required for the OS setup. Following the steps from this link for installing RAC on VirtualBox, everything went through successfully up to the second-to-last step, running root.sh on Node 1. When running root.sh on Node 2, I got a SCAN IP error, due to which the cluster services did not start successfully. Some step may have been executed incorrectly during my setup.
The machine configuration and software I used for this Oracle 11g RAC installation are listed below:
CPU AMD FX-8350, 8 cores, 4 GHz
Motherboard Gigabyte 78LMT-USB3
Virtual Machine VMware-Workstation-Full-10.0.6-2700073.x86_64.bundle
OS Linux 22.214.171.124(2.6.18-371.el5)
Database Version Oracle 11g R2 (126.96.36.199)
The issues faced during the setup are:
1. CPU showing 100% usage during the grid installation
This slowed down the grid installation considerably, and mouse clicks were not getting registered on the virtual machine. To overcome this, I disabled 2 cores per CPU.
The following changes were made at the BIOS level:
CPU core control changed from Auto -> Manual
One core per unit changed to Enabled
Core performance boost changed from Enabled -> Disabled
CPU clock ratio changed from Auto -> x21
Virtualisation changed from Disabled -> Enabled
I also assigned 2 CPUs to the node from which the grid and database setup is run, and 1 CPU to the other node.
With these changes the grid installation and database installation moved ahead smoothly on the virtual machine.
2. Unable to access the shared disk from the 2nd Node
When doing the ASM setup, a shared disk is created which is used by both nodes.
After the setup is completed and both VMs are restarted, the shared disk may not be visible to the 2nd node.
If you type
# fdisk /dev/sdb
and get the following error
Unable to open /dev/sdb
then the following lines need to be added to the .vmx file of the 2nd VM.
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
scsi1:0.fileName = "C:UsersDocumentsVirtualMachinesSharedDiskasmdisk1.vmdk"
scsi1:0.mode = "independent-persistent"
scsi1:0.present = "TRUE"
The scsi1:0.fileName value is the location of the shared disk. The path will change according to your setup.
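Before restarting the 2nd VM, it can help to confirm that its .vmx file really contains all five entries. The following is a minimal sketch of such a check; the function names parse_vmx and missing_shared_disk_keys are made up for this article, and the parser assumes the simple key = "value" layout shown above.

```python
# Keys from the list above that the 2nd VM needs in order to see the shared disk.
REQUIRED_KEYS = [
    "scsi1.present",
    "scsi1.virtualDev",
    "scsi1:0.fileName",
    "scsi1:0.mode",
    "scsi1:0.present",
]


def parse_vmx(text):
    """Parse key = "value" lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def missing_shared_disk_keys(text):
    """Return the required shared-disk keys that are absent from the .vmx text."""
    settings = parse_vmx(text)
    return [key for key in REQUIRED_KEYS if key not in settings]


# Hypothetical .vmx fragment with only part of the shared-disk configuration.
sample = '''
scsi0.present = "TRUE"
scsi1.present = "TRUE"
scsi1.virtualDev = "lsilogic"
'''
print(missing_shared_disk_keys(sample))
```

For a real check, pass the contents of the 2nd VM's .vmx file instead of the sample string. An empty list means all five entries are present; it does not verify that the disk path itself is correct.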
3. Virtual IP entered is invalid
During the grid installation, when adding the nodes with their corresponding VIPs, you may get the following errors
[INS-40910] Virtual IP: orpheus-vip entered is invalid
Cause – The virtual IP did not resolve to an IP address
Action – Enter a valid VIP that resolves to an IP address
[INS-40718] Single Client Access Name(SCAN) name:orpheus-scan could not be resolved
To fix this, add the VIPs to the /etc/hosts file on both nodes.
A typical /etc/hosts file as per the gruffdba installation setup would look like this
[oracle@orpheus ~]$ cat /etc/hosts
# Do not remove the following line, or various programs
# that require network functionality will fail.
127.0.0.1 localhost.localdomain localhost
::1 localhost6.localdomain6 localhost6
10.10.1.10 orpheus orpheus.hadesnet
10.10.1.20 eurydice eurydice.hadesnet
10.10.2.10 orpheus-priv orpheus-priv.hadesnet
10.10.2.20 eurydice-priv eurydice-priv.hadesnet
10.10.1.11 orpheus-vip orpheus-vip.hadesnet
10.10.1.21 eurydice-vip eurydice-vip.hadesnet
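A quick way to catch a missing entry before starting the grid installer is to check that every node has its public, private and VIP names in the hosts file. The sketch below assumes the naming convention used above (node, node-priv, node-vip); the function names parse_hosts and missing_rac_entries are made up for this article.

```python
def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    names = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *hostnames = line.split()
        for name in hostnames:
            names[name] = address
    return names


def missing_rac_entries(text, nodes, suffixes=("", "-priv", "-vip")):
    """Return the expected node names that have no entry in the hosts text."""
    names = parse_hosts(text)
    return [node + suffix
            for node in nodes
            for suffix in suffixes
            if node + suffix not in names]


# Sample with the eurydice-vip line deliberately left out.
sample = """
127.0.0.1 localhost.localdomain localhost
10.10.1.10 orpheus orpheus.hadesnet
10.10.1.20 eurydice eurydice.hadesnet
10.10.2.10 orpheus-priv orpheus-priv.hadesnet
10.10.2.20 eurydice-priv eurydice-priv.hadesnet
10.10.1.11 orpheus-vip orpheus-vip.hadesnet
"""
print(missing_rac_entries(sample, ["orpheus", "eurydice"]))
```

On a real node, the same check can be run against the contents of /etc/hosts on each machine in turn. It only looks at the file, so it will not detect names that resolve through DNS instead.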
With the steps given in the two sites above, the machine and software configuration as specified, and the above issues resolved as described, your Oracle 11g RAC installation on Linux should complete successfully without further hiccups.