Creating DataGuard database in production

(This is public version of document. Use internal version, if it is available; it contains few additional details).

 

Production uses DataGuard oracle cluster for redundancy (and can use Logical Standby for reporting). Creating these databases require very careful configuration, which fail after a single error in the configuration.

 

Here is step-by-step process to plan and create such database.

 

Step 1. Prerequisites.

 

2 Linux servers required, both with SLES9 Sp3 Linux.

Names: traindb01, traindb02 – for staging; xxoddb01, xxoddb02 – for production.

Each server should have file systems for:

 

File systems:

File System

Relative Size %

Goal

/db01/db0

100

DataFiles

/db01/redo0

2

REDO files

/db01/flash0

150

Flash recovery Area

/db01/local0

4

Local copy of redo

/db01/backup0

50

Primary backup area

/db01/arch

300NFS, big area

Long term storage for backups

/local/apps

10 GB

Applications

 

 

 

 

Software.

SLES9 SP3 Linux x86_64;

ISCSI or another SAN storage access system;

Oracle 10.2.0.3 (with patchset)

Oracle Enterprise Manager agent.

 

Step 2. Database Naming (an example),

DataGuard success depends on the careful name selection and following all defaults. We select for the databases names (for production, replace ‘tr’ to ‘pr’):

<pr or tr > - production or training (keep it 2 symbols long);

<xx for xxxx, wf for yyyy, and so on>

<u> for utf-8, nothing for ascii;

<number>

p – primary, s – physical standby, l – logical standby.  7 characters max (we can use 8 maximum in SID).

 

Name

Server1

Server2

Role

DB name

trxx1

Database name

SID

trxx1p

trxx1s

Oracle SID

UNIQ name

trxx1p

trxx1s

Db_uniq_name

Service

trxx1_s

Universal service name

 

 

 

 

DB name

trxx2

Database name

SID

trxx2s

trxx2p

SID

UNIQ name

trxx2s

trxx2p

UNIQ name

Service

trxx2_s

Universal service name

 

 

 

 

Names for other databases (add suffixes p for primary instance, s for standby instance, l for logical standby):

 

And so on. Names must be shorter then 6 characters before adding suffixes, because Unix SID have 8 char limitation in some parts of the DataGuard.

Step 3. Creating primary database (suffix p) – shown on example of trxx1.

 

Slogin to the server (slogin –X oracle@server) with X11.

Start dbca.

 

  1. Select Create Database.
  2. Select Custom Database.
  3. Enter :
    1. Global Database name: trxx1 ;
    2. SID: trxx1p ; !attention. This is not default!
  4. Keep selected ‘Configured the Database with Enterprise Manager / Use Grid…’. !if you don’t see it, it means that you did not installed OEM agent!.
  5. Enter sys password (follow rules);
  6. Keep ‘Use File system’;
  7. Now change settings:
    1. Select ‘Use Oracle-Managed Files’;
    2. Change Database Area to: /db01/db0 ;
    3. Click on ‘Multiplex Redo Logs and Control FIles’:

                                                    i.     First location: /db01/redo0

                                                  ii.     Second one: /db01/local0

    1. OK
  1. Specify Flash Recovery area:
    1. Location: /db01/flash0 ;
    2. Size: 30 GB for 20 GB database, and so on;
    3. Enable Archiving;
    4. Edit Archive Mode parameters:

                                                    i.     Change .dbf onto .arc in file name format;

  1. Create new file location variable: DB_UNAME_UPPER, set to uppercase of {db_unique_name}
  2. Next, and now unselect all options (Data Mining, text and so on) except Enterprise Manager Repository.
  3. Next and now edit configuration:
    1. Change Memory setting. Depending on the # of databases, it should be 20 – 30%. You can change it later.
    2. Click on ‘All initialization parameters’ and then ‘Show Advanced parameters’. Verify and change:

                                                    i.     db_unique_name = trxx1p (see table) – use SID name here;

                                                  ii.     filesystem_options = ‘setall’;

                                                iii.     standby_file_management = ‘AUTO’.

                                                iv.     control_files =
(”/db01/redo0/{DB_UNAME_UPPER}/{DB_UNAME_UPPER}-1.ctl”, “/db01/local0/{DB_UNAME_UPPER}/{DB_UNAME_UPPER}-2.ctl”,/db01/db0/{DB_UNAME_UPPER}/{DB_UNAME_UPPER}-3.ctl “)

                                                  v.     sga_max_size = 2000000000 (2 GB)

                                                vi.     CLOSE

    1. Verify Character Set – keep defaults for ASCII databases and Use Unicode for UTF-8 databases;
  1. Next. Better add 2 more redo files, each 51200 KB, for better reliability.
  2. Create directories  (this step is questionable, but it is harmless).
     /db01/redo0/{DB_UNAME_UPPER}
     /db01/local0/{DB_UNAME_UPPER}
     /db01/redo0 /{DB_UNAME_UPPER}
    (replace {DB_UNAME_UPPER} by the real name).
  3. Check Create Database, Create Scripts and Save as a Template
  4. Next and Next, then Finish – it starts database creating.

 

After primary database is created and started:

  1. Edit /etc/oratab and allow autostart for it (replace N to Y);
  2. Find database in Oracle Enterprise Manager and verify that you can login into Management and Administration pages and that database have not any problems.
  3. Don’t forget, that you have a database with archive logs, which require regular backups, or it will run out of disk space on /db01/flash0 disk because of piled up archive logs. Configure backups in 1 – 2 days (do it after creating physical standby if you plan automated standby failovers).

 

Step 4. Creating Physical Standby.

Now it’s time to create a Physical Standby on the second server. The only way to do it without extra DBA skills and headache is using OEM web interface and following rules. You need a working Oracle Enterprise Manager (web application), version 10.2 or higher.

  1. Verify database status using sqlplus from the command line (and user ‘system’);
  2. Login to Oracle Enterprise Manager, select Targets, then Databases, then <primary database> (you must see it with suffix p);
  3. Select Administration page and then click All Initialization Parameters and login as SYS to verify options which you changed in the previous step (db_unique_name, filesystem_options, standby_file_management);
  4. If this is your first database on the server, verify performance using HAMMERORA, SWINGBENCH or other open-source performance tools;
  5. Now start creating Physical StandBy: Select ‘Maintenance’ and then ‘DataGuard: Setup and Manage’ (very left column);
    1. Click Add Standby Database;
    2. Select ‘Create a new Physical Database’, Continue;
    3. Perform a live backup’, Continue;
    4. Edit here:

                                                    i.     Backup Files Directory: /db01/backup0 (or use disk with enough space);

                                                  ii.     Enter Oracle user name and password (user Oracle on primary DB server);

                                                iii.     Next;

    1. Database Location Page:

                                                    i.     Instance Name (for example, trxx1s); <Database Name + s for physical standby, l for logical standby);

                                                  ii.     Select server (traindb02 for example) other then primary host;

                                                iii.     Next.

    1. File Location Page:

                                                    i.     Backup File Directory Area: /db01/backup0

                                                  ii.     Method = HTTP

                                                iii.     Database Area: /db01/db0 (!don’t forget to change!);

                                                iv.     Flash Recovery Area: /db01/flash0 (!don’t forget to change);

                                                  v.     Flash Recovery Size (the same as for primary DB). (!don’t forget to change!).

                                                vi.     Multiplex REDO and Controls. Use:

1.     /db01/redo0

2.     /db01/local0

                                              vii.     Check everything here!

                                            viii.     Next.

    1. Configuration.

                                                    i.     Database Unique Name – use the same as sid, for example trxx1s ;

                                                  ii.     Target name: use the same name again – trxx1s (for example);

    1. FINISH!

 

Now you can drink coffee and watch how system creates standby database. If everything was done correct (names, disk locations, enough space on disks, listeners configured and so on) you will see DataGuard page with ‘Creating in progress’ status. It takes about 10 – 15 minutes for the system to complete database cloning and change status to Normal (if there are not errors).

 

Step 5. Correcting issues related to the Physical StandBy.

When you see Normal DataGuardStatus, you should verify configuration and fix few possible glitches. The only problem, known for me, rely to ‘standby file management’ – it must be configured as AUTO, but OEM configure it as manual.

 

To correct it, do a few steps (I describe using OEM, but you can use sqlplus instead):

  1. Open standby database in OEM;
  2. Select Administration;
  3. Select All Initialization Parameters;
  4. Login as SysDba (Notice – when working with OEM, if you see errors when trying to open database, logout from database and login again –sometimes it helps). don’t forget to apply changes to SPFile too.
    1. Verify filesystemio_options (should be setall);
    2. Change standby_file_management = ‘AUTO’; (Apply change to both, memory and SPFile);
    3. APPLY
  5. Last change caused inconsistency between a DataGuard and DataBase (error ORA-16792). To fix it, login (by slogin) into the standby server as Oracle and run (sid name for standby database – for example, trxx1s):

    export ORACLE_SID=trxx1s
    dgmgrl
    connect
    (sys login and password here)
    show database verbose <sid-name>
    edit database <sid-name> set property STANDBYFILEMANAGEMENT=’AUTO’;


Step 6. Configuring FastFailover and Observer.

FastFailover and Observer must be configured because we don’t want to start databases manually, and the only mode when they are auto-started and auto-failed-over is this mode.

  1. Open primary Database in OEM; select maintenance -> Data Guard;
  2. Be sure (absolutely) that DataGuard is in Normal status before doing any changes.
  3. Change protection Mode to Maximum Availiability:
    1. Click on ‘Maximum Performance’ mode;
    2. Select ‘Maximum Availiability’, Continue;
    3. Continue; enter Oracle user login and password;
    4. Confirm restart (Yes);
    5. Wait when DataGuard complete reconfiguration;
  4. When system returns to the DataGuard menu, check status again - it must be Normal and mode must be Maximum Availiability;
  5. Now set up Observer and turn FastFailover on:
    1. Click on FastFailover: Disabled ;
    2. Set Observer (select 3-d server – OEMSERVER for staging and development), OK;
    3. Set Timeout (longer then any normal reboot – usually something about 10 minutes);
    4. Allow Automatic restart of the observer; Continue;
    5. Enter Oracle login for observer’s host;
    6. Set up reasonable Flash Recovery time for both servers (I set up 6 hours); Continue;
    7. Confirm (YES);
  6. When system returns you to the DataGuard, check results. Here is a status screen example:

    (Picture removed from public document because of the size).

  7. Configure autostart for standby database (now you can do it) – edit /etc/oratab on standby server and replace N to Y for this database.
  8. WARNING: Fast-start failover requires that a specific service be statically registered with the local listener for each instance (primary and standby) in the configuration. The GLOBAL_DBNAME for this service must be of the form <db_unique_name>_DGMGRL.<db_domain_name>. If the standby database is created without using Grid Control this service will need to be added manually to the listener.ora file. Otherwise Grid Control adds the entry on standby creation to both the primary and standby servers listener.ora file. OEM do it automatically, but please verify.

STEP 7. Creating trigger for the client failover.

When DataGuard is created (and reasonable before testing), you should configure automated service trigger, which set starts service on primary server and stops it on standby server (so allowing client to select active server). I use such script to do it:

  1. Login (using slogin) onto the primary server;
  2. Login into sqlplus as sysdba.
  3. Run script below, substituting &1 by the service name (we use names trxx1_s , see table in the beginning):

oracle@traindb02:/INST/SQL.d> cat set_trigger.sql
#
 
BEGIN
DBMS_SERVICE.CREATE_SERVICE('&1','&1');
END;
/
BEGIN
DBMS_SERVICE.START_SERVICE('&1');
END;
/
 
create or replace trigger managed_service
   after DB_ROLE_CHANGE on database
declare
    role varchar(30);
begin
   select database_role into role from v$database;
   if role = 'PRIMARY' then
       DBMS_SERVICE.START_SERVICE('&1');
   else
       DBMS_SERVICE.STOP_SERVICE('&1');
   end if;
end;
/
create or replace trigger managed_service_start
   after startup on database
declare
    role varchar(30);
begin
   select database_role into role from v$database;
   if role = 'PRIMARY' then
       DBMS_SERVICE.START_SERVICE('&1');
   else
       DBMS_SERVICE.STOP_SERVICE('&1');
   end if;
end;
/

 

For example (example is about trwf1 database):

oracle@traindb02:/INST/SQL.d> export ORACLE_SID=trwf1p
oracle@traindb02:/INST/SQL.d> sqlplus sys as sysdba
 
SQL*Plus: Release 10.2.0.3.0 - Production on Mon Aug 6 19:24:02 2007
 
Copyright (c) 1982, 2006, Oracle.  All Rights Reserved.
 
Enter password:
 
Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, OLAP and Data Mining options
 
SQL> @set_trigger.sql trwf1_s
old   2: DBMS_SERVICE.CREATE_SERVICE('&1','&1');
new   2: DBMS_SERVICE.CREATE_SERVICE('trwf1_s','trwf1_s');
 
PL/SQL procedure successfully completed.
 
old   2: DBMS_SERVICE.START_SERVICE('&1');
new   2: DBMS_SERVICE.START_SERVICE('trwf1_s');
 
PL/SQL procedure successfully completed.
 
old   8:        DBMS_SERVICE.START_SERVICE('&1');
new   8:        DBMS_SERVICE.START_SERVICE('trwf1_s');
old  10:        DBMS_SERVICE.STOP_SERVICE('&1');
new  10:        DBMS_SERVICE.STOP_SERVICE('trwf1_s');
 
Trigger created.
 
old   8:        DBMS_SERVICE.START_SERVICE('&1');
new   8:        DBMS_SERVICE.START_SERVICE('trwf1_s');
old  10:        DBMS_SERVICE.STOP_SERVICE('&1');
new  10:        DBMS_SERVICE.STOP_SERVICE('trwf1_s');
 
Trigger created.
 
SQL>

 

To verify, run lsnrctl services | grep <db_name> on both servers and verify, that your service name exists on primary server only.

 

STEP 8. Configuring tnsnames.ora

 

Now you can configure tnsnames.ora on the clients (and use similar string in jdbc configurations). For example, here is configuration for trxx1 database:

trxx1 =
  (DESCRIPTION =
    (ADDRESS_LIST = (LOAD_BALANCE=OFF)(FAILOVER=ON)
      (ADDRESS = (PROTOCOL = TCP)(HOST = traindb01.stg2.mycompany.com)(PORT =1521))
      (ADDRESS = (PROTOCOL = TCP)(HOST = traindb02.stg2.mycompany.com)(PORT =1521))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = trxx1_s)
        (FAILOVER_MODE=
                (TYPE=session)
                (METHOD=basic)
                (RETRIES=180)
                (DELAY=5)
        )
    )
  )
 

STEP 9. Run tests.

Here is what to test:

  1. Create tablespace on primary server, verify that standby created it too;
  2. Drop tablespace on primary server, verify that secondary server dropped it;
  3. Run switchover (using OEM). Verify that both servers switched their roles successfully;
  4. After switchover, verify sqlplus connection again (using name above);
  5. Decrease observer timeout; abort Oracle on the primary server; wait until timeout happen on observer. Verify that:
    1. Standby system became primary;
    2. Service switched to standby and clients can connect using string above;
    3. When you restart primary system (or just oracle) it will be instantiated as a new standby automatically;
    4. Run high performance test (TPCC) and verify that standby do’t lost synchronization with primary;
    5. Switch systems over and run again; performance should not be very different;

 

STEP 10. Backups.

You can’t keep these systems without backup because they will run out of archive space. COnfigure periodic backups with removing all archive logs periodically. Full backup script is in development yet.