صفحه 1:
MICROSOFT SQL SERVER
HIGH AVAILABILITY
AND DISASTER RECOVERY
صفحه 2:
Database HA & DR
لسلس بير FE XRELIEICE سب
۱ Work with business to determine HA or DR
requirements for applications and data?
© Design HA or DR solutions?
" Administer HA or DR process?
5 Still learning MS SQL Server HA & DR
capabilities?
صفحه 3:
Scope of this Presentation
Beyond Scope of
مماغوغمعوعمم
= In-depth how-to
(available elsewhere)
Partitioned views (federated)
Advanced DBA techniques
Custom application logic
3'-party software solutions
Alternate DBMS engines
(e.g. Oracle; DB2)
HA on virtual machines
Complex scenarios &
solutions
© Load balancing
aC lay نينا
© Data Availability
* Data recovery
* High availability
* Disaster recovery
© Technology Focus
» MS SQL Server
* Physical servers
» SANS
صفحه 4:
So, you need to make your
production database bulletproof...
صفحه 5:
Data Availability Continuum
7” eee
Degrees of protection for information
TE
Data Data loss Redundant data
Recovery
High Downtime of Redundant
Availability database service system
components
Disaster Downtime of Redundant
Recovery business systems
operations and facilities
صفحه 6:
Business Case for
an AVailability
5 Keep business- © Protect against
critical loss of data center
applications © Secondary:
available * Application
5 Secondary: upgrades
* Server * Infrastructure
maintenance upgrades
صفحه 7:
Service Level Agreement
لل “يا
“ Permitted downtime (planned vs. unplanned?)
اهنت این رت
طغمدة ععم_ مدعلا مهم
99.9% 8.76 hours 43.8 minutes
99.99% 52.6 minutes 4.38 minutes
99.999% 5.26 minutes 0.438
ao | minutes
© Application response times
© Mean time to recovery
Note: Database uptime is not equivalent to application
availability
* Failures of other application services
Network outages
صفحه 8:
Protect What?
05 ۳ ccc 0 000000000000027
© Application data stores
* Databases
» Files
* Other data repositories
© Database services
* DBMS availability for applications
" Application services
* Application availability for users and external
systems
Databases are the heart of most information
systems;
they deserve the highest affordable protection.
صفحه 9:
Database Failure Scenarios
2 ET
ع 3 ۱۲۲۵۵۲۵۵ ۳۳۷۱۵
لك الها ا الا
25030 ۲7 Operator errors
subsystem * DBMS interruption
* Disk * Drops / deletes
* Controller © Application
5 Network defects
5 Server 7 DBMS defects
[| Power 5 Data corruption
صفحه 10:
Service Recovery Strategies
Stand |Failover Behavior SQL Server
by لحتنا
Mode
Cold = * Manual intervention + Backup and
stand required to restore restore
by offline data copy
Warm : Data copy online and = Transaction log
stand ready shipping
by * Manual failover + Database
required mirroring
Hot + Automatic failover * Database
stand mirroring
by * Failover
صفحه 11:
Data Recovery—
سسا وهام مع -
Terminology varies for source vs. copy
High Availability توت تن Data Copy
hte tte hg
Backup and Database Backup
Restore
Log Shipping Primary Secondary
Standby
Database Mirroring Principal Mirror
Failover Clustering Primary Secondary
Active Passive
Standby
Inactive
صفحه 12:
Data Recovery هد
[Briefly...]
صفحه 13:
Database Backups
3 [i
° Traditional backup types
* Full backup
* Differential backup
* Transaction log backup
© Disk is better than tape
* First backup to disk (separate physical disk
volume)
* Detect exceptions encountered during backup
* Verify backup files
* Copy backup files to tape or remote disk
" Data retention policy for backup files
صفحه 14:
Database Backup ۷
6
Backup of user databases not sufficient for
recovery
| System database
© Master database
7 MSDB database
Model database
External data stores...
Oo
Oo
صفحه 15:
Synch with External Data
لم20
Synchronize recovered database with
external data stores:
“Identity column seeds
Full-text indexes
(SQL Server 2000)
SLDAP entries
OFile system objects
“Other databases
صفحه 16:
Backup Retention Policy
7 سم
© Location of backup files
© Duration of retention
© Protection of sensitive data
* Sarbanes/Oxley (SOX)
* HIPAA
* Internal policies for data management and
protection
5 Access to backups from offsite data
storage
صفحه 17:
Data Recovery Process
قاض ۰
“ Backup file sets © Recovery strategy
Full baseline, depends on failure
differential, and scenario
transaction logs Create comprehensive
a Retrieving backup files __ ا لك
Offsite storage Devise recovery strategy
for each scenario
* Tape * Does worst-case
Network copy recovery scenario fit
Dependency on within SLA parameters?
multiple people to get 5 Recovery time; SLA
access to backup files © Include future data
growth in recovery plan
~ Fully test recovery
strategies—practice is
essential
صفحه 18:
a reeeesy
صفحه 19:
High Availability
مصعم سه نه 1 0000 17170 ا سب
© Minimize or avoid service downtime
» Whether planned or unplanned
= When components fail,
service interruption is brief or non-existent
* Automatic failover
Eliminate single points of failure (as ك5
affordable)
Redundant components *
Fault-tolerant servers »
صفحه 20:
Redundant Components
7و وماك
Objective: Avoid single points of failure (where affordable)
Approach: Use redundant components for database service
Database server nodes
Server components
ECC RAM; failure-tolerant HW & OS.
DBMS instance
User databases
Storage devices
Storage unit components
MPIO: Interfaces; paths; switches; controllers
RAID: Disks
Networking
MPIO: Interfaces; paths; switches
Data copies
E.g. Recovering torn page from mirror in SQL Server 2008
صفحه 21:
Transaction Log Shipping
2) (Ee
" Warm standby solution
۲ Duplicate user database
* Copy transaction logs to standby server &
restore
" Database available for read-only access
» Users must disconnect for logs to be applied
* Two database licenses required if querying
standby
© Manual application failover
© Supported on standard hardware
" Possible data loss (unapplied transactions)
صفحه 22:
3
Database Mirroring
ude
Redundancy at user database level
Duplicate copy of user database
independent storage devices
Multiple copies of instance databases
Mirrored over private network channel
Mirror always redoing transactions from principal
Negligible impact on transaction throughput
Multiple mirroring modes:
High-availability: commit @ log on mirror;
Sutomnatic Tallover
commit @ log on mirror; manual
High-protecti
failover so
High-performance: commit when logged on
prepa عقف
Very fast automatic failover—seconds
Requires witness server
Mirror-aware application client connection
Provided by client library
Database connection string must specify both
servers
Mirror may be available for read-only access
(snapshots)
Works with standard hardware
صفحه 23:
Mirror Witness
مم ا DE
© With mirroring, more than one server is required
to decide on failover
© Witness automates failover from primary to mirror
* Watches database availability
» Reports observations back to principal and mirror
© Runs in separate SQL Server instance (Express is
OK)
© Prevents “split brain” scenario
© Very low resource consumption
* Can be witness for multiple databases
© Not a single point of failure
صفحه 24:
SQL Server Failover
=a
© Two clustered nodes
* Active/Passive config
© MS SQL services
* Running on virtual =o =a!
server
© Shared storage device
» User databases
» System databases
* Quorum drive
» Redundant internal
components
صفحه 25:
weds @
Active/Passive Failover
دع
Redundancy at database instance
level
All databases fail over together
Shared copy of system databases
Single data copy on shared storage
device
No 1/0 overhead reducing throughput
Storage unit is single point of failure
for cluster
All database services are clustered
SQL Agent; Analysis Services; Full-Text
engine, MS DTC
Automatic failover (up to minutes)
DBMS accessed over virtual IP
Database not available from inactive
node for DB client connections
Storage is controlled by one cluster,
node at atime
Requires hardware certified by
Microsoft for Microsoft Cluster
Service
صفحه 26:
HA Comparison
26
Database Mirroring Failover Clustering
Scope: DBMS instance
” Certified hardware
One SQL license
(only one node can access
on mirror) 9 م fail t
: Very fast failover (seconds) minutes) ellqver (Up tS:
OS flexible (e.g. 32/64) Enterprise OS
Shared storage
Clustered services
Standby not available
Servers are usually co-
located
86888 هم
Scope: user DB
Standard hardware
One SQL license
(unless querying snapshots
Independent storage
Independent services
Reporting on mirror
Geographic separation OK
o
oo
Ht اقا اقا
8
صفحه 27:
Considerations for HA
1 كك 1ر1
= HA complements backup and recovery strategy
* Does not replace data recovery plan
" Application service availability is often
determined by a network of interdependent
services
Availability can be difficult to define (e.g. partial
failures)
* Failure probability difficult to measure or compute
" Increased system complexity could lead to
lower service availability!
* Operator error a leading cause of availability issues
* Increased number/types of system components
* More complex to configure and administer
صفحه 28:
Data Recovery
- وأ معوع ع ع
Cost Low low/Med Medium Medium Medium High
Relative complexity Low low Medium Medium High High
Data loss Possible Latestlog Possible None. None None
Scope of duplication Database Database Database Database Database DBMS.
Fallove Downtime Downtime Manual Manual Seconds Up to minutes
Client redirect Manual Manual Automatic Automatic «Automatic = Automattic
Rolling upgrades & maint. No No OS&DB OS&DB OS&DB OS
‘Access data on secondary Restore Read-only Snapshot Snapshot Snapshots No
Geographic separation OK 9 9 Latency? Latency? Latency?
صفحه 29:
7 Disaster Recovery
صفحه 30:
Disaster Recovery
20 000000000000070
5 Minimize downtime of business
operations
* Redundant systems and facilities
5 SQL Server features:
* Transaction log shipping
* Database mirroring
* Failover clustering
© Other technologies
* Storage-based mirroring
صفحه 31:
Disaster Recovery Planning
2) [i
© Data security requirements
© Clarify SLA, data loss allowance
© Evaluate system cost vs. data protection
© Failure analysis
© System redundancy
© Process validation
© Training for personnel
» Prevention practices
* Executing disaster recovery and business
continuity
© Practice, practice, practice
صفحه 32:
Business Continuity Facility
0۳ 9 م ا
© System redundancy
* Systems: Web servers app servers; database, etc.
* Data: Databases; data files on OS; security info,
etc.
* Networking: Domain, routing, subnet, VIPs, etc.
5 Alternate facilities
* Network bandwidth
* Physical or network access by operations staff
° Failover
* Often a deliberate decision, using manual failover
صفحه 33:
Data Redundancy
3 [i
© Synchronous redundancy
* Network bandwidth cost
* Network latency and application performance
* Network reliability
© Asynchronous redundancy
* Risk of data loss
* More cost-effective
* Resilient to network latency issues
© Candidate Technologies
» SQL Server database mirroring
* Failover clustering with SAN-based mirroring
صفحه 34:
DR Using Database
© Two sites: Primary and DR location
" Separate failover clusters at each site
= SQL Server database mirroring between
sr
(eptowal)
aul
datubase 7 0
sites
Raloverclusterat site ®
صفحه 35:
DR Using SAN-Based
OO _ للسلصص
© Two sites: Primary and DR location
© Four-node failover cluster; one virtual IP
address
" SAN-based mirroring between sites
wine Manual,cluster failover سس سا سم @
صفحه 36:
[Skip if time is running short.]
صفحه 37:
SAN-Based Data Mirroring
2 TE
= Data blocks duplicated at storage level
* Similar to transaction log shipping
© Copy performed in sequence and coordinated with
database checkpoint
» Ensures consistency of mirrored data files
© Synchronous or asynchronous mirroring
© Co-located or geographically dispersed—both are
OK
* SAN link bandwidth must support database I/O rate
= May require extra feature support from SAN
vendor
© Could rely on Failover Clustering for HA
صفحه 38:
SQL Server Database
لطع صسسيو 2022500
© Read-only point-in-time database snapshot
" No data is copied—instantaneous
* Historical snapshot pages tracked separately
from changing pages
© Snapshots can be maintained indefinitely
* Limited only by available storage
= Snapshot copy can be used for reporting
* Read-only, so no locking issues
صفحه 39:
SQL Server Replication
© Subscriber
databases available
for reporting
Replicate data
subsets
= Some data loss is
possible
° Periodically validate
replicated data
+ Transactional replication
* High transaction volume
* Low data latency required
» Mixed technologies:
Integrates with other DBMS
© Merge replication
* Bi-directional data changes
* Typically server-to-client
“ Snapshot replication
* Large, infrequent data
changes
* Data change latency OK
* Best for smaller data sets
صفحه 40:
صفحه 41:
Considerations for App
wom EV EIOD CLS
~ App services tolerant to database service interruptions
“ Application transactions must be handled in code—data
consistency
Exception handling for transaction retry, connection
recovery
~ Requires coding standards, code reviews, and testing
“ Bulk data operations
“ Transaction volume impacts rollback time during failover
Batch jobs must be run on alternate nodes
" Don't bypass transaction logging
© Synchronization with external data sources?
* Be aware of database recovery model
2 Mirroring uses FailoverPartner in connection string
© Use TCP/IP as client protocol
صفحه 42:
Considerations for Admins
BEM (cases
Use identical server hardware, when possible
= Design network redundancies, when feasible
Consider network latency for geographic separation
= Always manage through virtual cluster, not individual cluster
nodes
" Retest failover/failback after HA maintenance
" Diagnose after failover
Repair alternate node
Resynchronize data, as necessary
Be aware of primary/secondary locations
Ensure application services are connected and functioning properly
= Keep server node configurations synchronized:
Service pack and patch levels
Duplicate non-redundant resources
* Jobs; logins and permissions; OS & sys objects
صفحه 43:
HA Risks
۳5 لل
© System performance degradation
" HA system complexity leads to availability
issues
= Some system failures not planned for
© Backup and recovery planning incomplete
© Administrators not fully trained or informed
© User databases not synchronized with
other data sources
صفحه 44:
Common Admin Use Cases
ا030ا-_-ِ_ف3300700ت0006000ااا ا 0
© Maintain HA nodes
» Hardware maintenance
* Rolling upgrades and software patches
= Resynchronize the redundant copy
* Re-synch mirror
* Restart log shipping
9 Diagnose and repair
* Diagnose cause of failover
* Repair failed node and restore failover
capabilities
* Test failover and failback
صفحه 45:
Common Admin Actions
eee
Train and practice administrators to:
5 Initiate a database mirror
© Manually failover mirror database or
cluster node
" Add/remove passive node from mirror or
cluster
5 Upgrade/patch servers nodes
© Restart or redirect application services
صفحه 46:
3 More Information
صفحه 47:
References—Books
<< -_ت_س_ _ص_« «ثح«حطح« ح« « 7
High Availability Related Topics
4 Microsoft SQL Server 2008 | Pro SQL Server 2005
High Availability with Replication
Clustering & Database by Sujoy Paul, 2006.
Mirroring | Pro SQL Server 2005 Servic:
by Michael Otey, 2009. posal 9584 ©
“Microsoft SQL Server High by Klaus Aschenbrenner,
Availability 2007.
by Paul Bertucci, 2004. © The Rational Guide to SQL
“ Pro SQL Server 2005 High Server 2005 Service Broker
Availability by Roger Wolter, 2006.
by Allan Hirt, 2007.
صفحه 48:
References—Presentations
[ff OOO
Microsoft Load Balancing and Clustering
1 ik hi 8 Pa 1 / re 1
۵5۵1۲۶۸۵ 003 0۵20 ۵1306150۳۵۶۱۵۱37 ۵2 تن
SQL Server 2005 High Availability
http://www.atlantamaf.com/Presentations/AtlantaMDF_111207HA.ppt
High Availability Technologies In SQL Server 2000 And SQL Server 2005
http://202.181.238.2/hk/teched2004/ppt/Day_2_Rm407/DAT431(1330-
1445).ppt
Meeting the Availability Challenge
http://download. microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9FC4-
4F7D46FCEAAG/HighAvailability.ppt
Disaster Recovery Mistakes
http://www.sqlsig.org/Oct%2011%20DASSUG%20-%20Jason%20Hall%2010-11-
07%20MM.ppt
SQL Server 2005 High Availability
http://blogs.msdn.com/sql2005event/attachment/564303.ashx
Effective Usage of SQL Server 2005 Database Mirroring
http:/www.sqlserver-qa.net/SSQA-Effective%20Usage%200f%20SQL
%20Server%202005%20Database%20Mirroring_show.ppt
صفحه 49:
References—Articles
3 متيمايميييا3309090909000ت000006009ا
© Achieve High Availability for SQL Server
http://technet.microsoft.com/en-us/magazine/cc162477.aspx
© Geographically Dispersed Clusters in Windows
Server 2003
http://www. microsoft.com/windowsserver2003/techinfo/overview/
clustergeo.mspx
© Restoring file and filegroup backups
http://support.microsoft.com/kb/281122/en-us
© Restoring specific tables or rows from backups
http://support.microsoft.com/kb/321836/en-us
© Maintaining Availability During Upgrades
http://msdn.microsoft.com/en-us/library/ms191449.aspx