Oracle Data Guard
Causes of Data Loss
Hardware & system errors 49%
Human errors 36%
Computer viruses 7%
Software corruption 4%
Natural disasters 3%
Source: Disaster Recovery Journal
System Real Application Clusters
Failures Continuous Availability for all Applications
Site Data Guard
Guaranteed Zero Data Loss
Failures
Unplanned
Downtime Human Flashback
Error Guaranteed Zero Data Loss
Storage/Net ASM Mirroring
Failures Storage Failure Protection
System Dynamic Reconfiguration
Maintenance Capacity on Demand without Interruption
Planned
Downtime Database Online Redefinition
Maintenance Adapt to Change Online
What is a
Standby Database ?
A copy of a production database that you can use for disaster
protection. You can update the standby database with redo logs from
the production database in order to keep it current. If a disaster
destroys the production database, you can activate the standby
database and make it the new production database.
You can maintain the standby data in one of the following modes:
For physical standby databases
Redo Apply
For logical standby databases
SQL Apply
A Standby Database is NOT Data Guard
Why Data Guard?
Data Guard helps you protect your Data.
Takes your data and automatically puts it elsewhere
Makes it available for Failover in case of failure.
The apply process also revalidates the log records to prevent
application of any log corruptions
Geographically dispersed sites
Useful for logical data corruptions if lag behind used
Flexible configuration options for protection level
Reporting and backups can be diverted to standby
Automatic resync for failed primary
Switchover for Maintenance
Traditional Physical Standby Databases
Investment in Disaster Recovery
Active Data Guard 11g
Investment in Improved Quality of Service
Requirements
Data Guard 11g has several options for deploying different
CPU architectures, O.S. binaries and Oracle database
binaries, on primary and standby systems.
For example, the primary database may be on Windows,
and the standby database may be on Linux.
See MetaLink Note 413484.1 for latest capabilities and
restrictions
Bandwidth Requirements
Depends on Redo generation
Find peak redo in AWR report
Load Profile Per Second Per Transaction
~~~~~~~~~~~~ --------------- ---------------
Redo size: 51,944.64 5,177.09
Bandwidth in MBPS =
(redo bytes per sec /0.7)8)/1,000,000
Physical Standby
Protection Modes
Physical Standby Architecture
Standby Redo Logs
Real Time Apply
Automatic Resynchronization
Database Protection Modes
Maximum Protection
No Data Loss and No data divergence
Arch_dest: mandatory, lgwr, sync, affirm
Primary db shutdown when unable to access stdby
Maximum Availability
Arch_dest: mandatory, lgwr, sync, affirm
Protection auto lowered when stdby is unavailable
Maximum Performance
Arch_dest: lgwr/arch, sync/async, mandatory/optional
Minimal performance impact
Maximum Availability Mode
Protection Mode Failure Protection Redo Shipping
Maximum Availability Protects Against Primary LGWR using SYNC
Zero Data Loss Failure
Zero Data Loss as long as the network stays up!
Enforces protection of every transaction
Configuration: LGWR SYNC
If last standby is unavailable, processing continues at primary
When the standby becomes available again, synchronization with the primary is automatic
ALTER DATABASE SET STANDBY TO MAXIMIZE AVAILABILITY;
Architecture
Primary
database MRP or Standby
transactions LSP database
(MRP only)
LGWR RFS
Oracle net
Online
Standby
redo Backup
redo logs
logs
FAL Reports
ARC0
ARC0
Archived redo logs Archived redo logs
Standby Redo Logs
Standby Archived
Redo from primary redo logs redo logs
database
RFS ARC0
MRP/LSP
Standby database
Real Time Apply
Redo data is applied to the standby database as soon as it is received from
the primary database
In Oracle9i Data Guard this apply has to wait till an archivelog is
created on the standby database
For Redo Apply:
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE
USING CURRENT LOGFILE
When real time apply is enabled, RECOVERY_MODE column in
V$ARCHIVE_DEST_STATUS displays “MANAGED REAL TIME APPLY”
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=tmstby
2> OPTIONAL LGWR SYNC AFFIRM
3> VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)
4> DB_UNIQUE_NAME=tmstby';
SQL> ALTER SYSTEM SET
LOG_ARCHIVE_CONFIG='DG_CONFIG=(tmtst,tmstby)'
SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE
AVAILABILITY;
SQL> SELECT PROTECTION_MODE, PROTECTION_LEVEL FROM
V$DATABASE;
PROTECTION_MODE PROTECTION_LEVEL
--------------------- ---------------------
MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY
Real-Time Apply Architecture
Physical
Oracle Net Logical
Transactions
Standby
Database
LGWR RFS MRP/ LSP
Standby
Redo
Logs
Online
Redo Real Time
Logs
Primary Apply
ARCH
Database
ARCH
Archived
Redo Logs
Archived
Redo Logs
Real Time Apply
Benefits
Standby databases now more closely synchronized with the
primary
More up-to-date, real-time reporting
Faster switchover and failover times
Reduces planned and unplanned downtime
Better Recovery Time Objective (RTO) for DR
Real Time Apply
-Tuning Media Recovery
Monitor via
Data Dictionary, OEM, Standby Statspack in 11G
Big performance boost in Oracle 11G
Up to 100% increase in redo apply performance
New standby statspack in Oracle 11G
See MetaLinkNote 454848.1
Includes information specific to a standby
Output from V$RECOVERY_PROGRESS
Output from V$MANAGED_STANDBY
Data Dictionary
V$DATABASE
DATABASE_ROLE: LOGICAL STANDBY, PHYSICAL STANDBY
or PRIMARY
PROTECTION_LEVEL: current protection mode setting.
FS_FAILOVER_STATUS: synchronization status
V$DATAGUARD_STATS
V$DATAGUARD_STATUS
V$LOG & V$STANDBY_LOG: Redo log changed.
V$MANAGED_STANDBY : Recovery progress
Determining Query Latency
From Primary (requires database link)
select scn_to_timestamp(
(select current_scn from v$database)
)-scn_to_timestamp(
(select current_scn from v$database@dg)
) from dual;
If you do not wish to connect to the Primary -determine the value for
APPLY LAG for a “best estimate”
Use Enterprise Manager monitoring
QueryV$DATAGUARD_STATS
select value,unit,time_computed from v$dataguard_stats
where name='apply lag';
Automatic Resynchronization
Network connectivity problems may occur
Data Guard automatically resynchronizes standbys after
network connectivity restored
Implicit
ARCH process idling away on the primary ‘pings’ all
standbys on a regular basis to see if they are missing
any redo data
If so it sends them the missing redo data
Explicit
Gap discovered during apply process in physical
standby
Based on FAL_SERVER and FAL_CLIENT settings,
primary notified, and it sends missing redo data
Data Guard Role Transitions
Switchover
Planned role reversal
Used for OS or hardware maintenance
Failover
Unplanned role reversal
Use in emergency
Zero or minimal data loss depending on choice of data protection mode
Different steps for Physical and Logical Standby
Switchover using Enterprise Manager is literally two mouse clicks
We’ll do a Physical Standby Failover via the command line using the Broker
Data Guard Broker
Fast-Start Failover
Fast-Start Failover Demo
Client Failover
Oracle 11G Active Data Guard
Fast-Start failover
Makes Data Guard more than a Standby Database. Enables automatic failover with no
data loss.
A feature of Oracle Database Enterprise Edition.
Only supports up to Maximum Availability Mode.
Requires 3rd server. Install DGMGRL client part of Oracle client administrator
software. Observer process continuously monitors primary and standby databases.
If the listener is not running on port 1521, local_listener must be set in the spfile.
Observer detects failure.
Observer automatically executes database failover once threshold has been exceeded.
DB_ROLE_CHANGE trigger fires: enables primary service. This trigger can be
customized to restart JDBC mid-tier clients and calls any other OCI enabled
application.
Fast-Start Failover
1. Data Guard in steady state – transmitting redo
2. Observer monitoring state of the configuration
Fast-Start Failover
3. Disaster strikes the primary – connections lost
Fast-Start Failover
4. Observer times out
5. Observer validates connection with target standby
6. Observer begins Fast-Start Failover
Fast-Start Failover
7. Target standby automatically becomes new primary (DB_ROLE_CHANGE trigger
fires)
Fast-Start Failover
8. After old primary is repaired, Observer re-establishes connection
9. Observer automatically reinstates old primary to be a new standby
10. Redo transmission starts from new primary to new standby
Events that trigger Fast-Start Failover
Database conditions:
Server crash or shutdown (without db shutdown)
Database instance failure (or last instance failure in a RAC configuration)
Shutdown abort (or shutdown abort of the last instance in a RAC
configuration)
Datafiles taken offline due to I/O errors
Network conditions:
When both the Observer and the standby database lose their network
connection to the primary database, and when the standby database
confirms that it is in a “synchronized” state.
Fast-Start Failover Conclusion
Fast
Site failover time measured in seconds, not minutes
Failover is automatic, no manual intervention
Reliable
Eliminates human error
Zero data loss failover
Simple
Automatically determines if failover criteria is met
Original primary database is automatically reinstated as a new standby
database following failover
Fast-Start Failover Conclusion
Prevention of "Split Brain" due to accidental startup of former primary
database
Reduced downtime through automatic activation of the standby database
A failover solution without a shared disk system
with additional advantages (enhanced data availibity)
and even reduced failover time compared to HA cluster
Many technical prerequisites (Flashback database, special Maximum
Availability Mode)
No automatic failover to a second standby database possible
Fast-Start failover
Requirements
Fast-Start Failover is a feature of Oracle Data Guard, and can't run
without a Data Guard Broker configuration!
Observer machine and configuration
Special entry in Data Guard Broker configuration
Maximum Availability Mode (mandatory)
but: special startup behaviour
but: primary stalls in certain situations
Flashback database must be activated
Demo: Switchover
1. Configure Broker and Fast_Start Failover
2. Configure Observer
3. Shutdown abort on the primary database TMTST
4. Wait until Fast_Start occurs on TMSTBY
5. Restart the old primary TMTST
6. Verify that observer reinstates database TMTST
Demo: Configure Fast_Start Failover
Flash-Recovery areas are setup on both sides
SQL> show parameter DB_RECOVERY_FILE_DEST
NAME TYPE VALUE
db_recovery_file_dest string /tst/dump/oracle/fra
db_recovery_file_dest_size big integer 2G
Setup Flashback Database (on both):
SQL> select FLASHBACK_ON from v$database;
FLASHBACK_ON
NO
SQL> SHUTDOWN IMMEDIATE;
SQL> STARTUP MOUNT;
SQL> ALTER DATABASE FLASHBACK ON;
SQL> ALTER DATABASE OPEN;
SQL> show parameter flash
NAME TYPE VALUE
db_flashback_retention_target integer 1440 (1 day)
Broker Parameters:
SQL> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE1 ='+TVBSDG1/dr1tmstby.dat‘;
SQL> ALTER SYSTEM SET DG_BROKER_CONFIG_FILE2 ='+TVBSDG1/dr2tmstby.dat‘;
SQL> ALTER SYSTEM SET DG_BROKER_START=TRUE;
Demo: Configure Fast_Start Failover
Listener.ora on Primary
SID_LIST_LSNR_DGTEST =
(SID_LIST=
(SID_DESC =
(GLOBAL_DBNAME = tmtst.vodacom.co.za)
(ORACLE_HOME = /tst/opt/apps/oracle/database/10.2.0.4)
(SID_NAME = tmtst)
)
(SID_DESC =
(GLOBAL_DBNAME = tmtst_DGMGRL.vodacom.co.za)
(ORACLE_HOME = /tst/opt/apps/oracle/database/10.2.0.4)
(SID_NAME = tmtst)
)
)
Demo: Configure Fast_Start Failover
Tnsnames on both:
TMTST.VODACOM.CO.ZA =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = prab03.vodacom.co.za)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = tmtst_DGMGRL.vodacom.co.za)
)
)
TMSTBY.VODACOM.CO.ZA =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = tvbs01.vodacom.co.za)(PORT = 1521))
)
(CONNECT_DATA =
(SERVICE_NAME = tmstby_DGMGRL.vodacom.co.za)
)
)
Demo: Configure Observer
#> dgmgrl
DGMGRL> connect sys/xxx@tmtst
Connected.
DGMGRL> CREATE CONFIGURATION TMDRTEST AS
> PRIMARY DATABASE IS tmtst -> SHOW PARAMETER DB_UNIQUE_NAME
> CONNECT IDENTIFIER IS tmtst; -> tns entry
Configuration "tmdrtest" created with primary database "tmtst"
DGMGRL> ADD DATABASE tmstby AS
> CONNECT IDENTIFIER IS tmstby
> MAINTAINED AS PHYSICAL;
Database "tmstby" added
DGMGRL> ENABLE CONFIGURATION;
DGMGRL> SHOW CONFIGURATION;
DGMGRL> SHOW DATABASE VERBOSE tmtst;
DGMGRL> EDIT DATABASE tmtst SET PROPERTY 'LogXptMode'='SYNC';
DGMGRL> EDIT DATABASE tmtst SET PROPERTY
FastStartFailoverTarget='tmstby';
DGMGRL> EDIT DATABASE tmstby SET PROPERTY
FastStartFailoverTarget='tmtst';
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER; --> warning, prompt will not be returned!
Demo: Switchover
DGMGRL> SWITCHOVER TO tmstby; ------ duration 90 seconds!
Performing switchover NOW, please wait...
Operation requires shutdown of instance "tmtst" on database "tmtst"
Shutting down instance "tmtst"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires shutdown of instance "tmstby" on database "tmstby"
Shutting down instance "tmstby"...
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "tmtst" on database "tmtst"
Starting instance "tmtst"...
ORACLE instance started.
Database mounted.
Operation requires startup of instance "tmstby" on database "tmstby"
Starting instance "tmstby"...
ORACLE instance started.
Database mounted.
Switchover succeeded, new primary is "tmstby"
Verify on both
SELECT DATABASE_ROLE,STATUS,DB_UNIQUE_NAME, PROTECTION_MODE,
PROTECTION_LEVEL, SWITCHOVER_STATUS, checkpoint_change#,
current_scn ,STANDBY_BECAME_PRIMARY_SCN,
FS_FAILOVER_STATUS, FS_FAILOVER_CURRENT_TARGET,
FS_FAILOVER_THRESHOLD, FS_FAILOVER_OBSERVER_PRESENT,
FS_FAILOVER_OBSERVER_HOST
FROM V$DATABASE
/
Client Failover Best Practices
SQL> exec DBMS_SERVICE.CREATE_SERVICE (
service_name => 'tmOCI.vodacom.co.za',
network_name => 'tmOCI.vodacom.co.za ',
aq_ha_notifications => true,
failover_method => 'BASIC',
failover_type => 'SELECT',
failover_retries => 180,
failover_delay => 1);
SQL> exec DBMS_SERVICE.START_SERVICE('tmOCI.vodacom.co.za');
SQL> select value from v$parameter where name = 'service_names';
VALUE
------------------------------------------------
tmtst_DGMGRL.vodacom.co.za, tmOCI.vodacom.co.za
Client Failover Best Practices
Configure startup trigger for service
SQL> CREATE OR REPLACE TRIGGER manage_OCIservice
after startup on database
DECLARE
role VARCHAR(30);
BEGIN
SELECT DATABASE_ROLE INTO role FROM V$DATABASE;
IF role = 'PRIMARY' THEN
DBMS_SERVICE.START_SERVICE('tmOCI.vodacom.co.za');
ELSE
DBMS_SERVICE.STOP_SERVICE('tmOCI.vodacom.co.za');
END IF;
END;
Client Failover Best Practices
Client tns entry Configuration
TMOCI=(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)
(HOST = prab03.vodacom.co.za)
(PORT = 1521))
(ADDRESS = (PROTOCOL = TCP)
(HOST = tvbs01.vodacom.co.za)
(PORT = 1521))
(LOAD_BALANCE = yes))
(CONNECT_DATA= SERVICE_NAME=tmOCI))
)
Active Data Guard 11g
Investment in Improved Quality of Service
Active Data Guard
Begin with a Data Guard 11g physical standby database
If redo apply is running, stop redo apply
Open the standby database read-only
Start redo apply
Data Guard Broker &
Enterprise Manager
Data Guard Broker CLI
Stop redo apply with the following command
EDIT DATABASE ‘TMSTBY' SET STATE=‘APPLY-OFF’
Open standby read-only via SQL*Plus
SQL> alter database open read only;
Restart redo apply via broker CLI
EDIT DATABASE ‘TMSTBY' SET STATE=‘APPLY-ON’
Oracle Enterprise Manager 10g
Stop redo apply within Data Guard GUI
Open standby in read-only mode in Advanced Startup Options
Restart redo apply within Data Guard GUI
Supported Operations
for Read Only
When connected to an Active Data Guard standby database, read-only
applications can perform/use:
Selects
Alter session / system
Set role
Lock table
Call stored procedures
DBlinks to write to remote databases
Stored procedures to call remote procedures via DBlinks
SET TRANSACTION READ ONLY for transaction level read consistency
Complex queries e.g. grouping set queries and with clause queries
THANK
YOU
[email protected]
http://martinmeyer.blogspot.com