Microsoft SQL Server
اسلاید 1: Microsoft SQL Server High Availability and Disaster RecoveryMichael Poremba // October 2008
اسلاید 2: Database HA & DR Experience…Work with business to determine HA or DR requirements for applications and data?Design HA or DR solutions?Administer HA or DR process?Still learning MS SQL Server HA & DR capabilities?2
اسلاید 3: Scope of this PresentationData AvailabilityData recoveryHigh availabilityDisaster recoveryTechnology FocusMS SQL ServerPhysical serversSANsIn-depth how-to (available elsewhere)Partitioned views (federated)Advanced DBA techniquesCustom application logic3rd-party software solutionsAlternate DBMS engines (e.g. Oracle; DB2)HA on virtual machinesComplex scenarios & solutionsLoad balancingPresentation FocusBeyond Scope of Presentation3
اسلاید 4: So, you need to make your production database bulletproof…Introduction to Data Availability4
اسلاید 5: Data Availability ContinuumDegrees of protection for information systems:Business RiskSolutionData RecoveryData lossRedundant dataHigh AvailabilityDowntime of database serviceRedundant system componentsDisaster RecoveryDowntime of business operationsRedundant systems and facilities5
اسلاید 6: Business Case for AvailabilityKeep business-critical applications availableSecondary:Server maintenanceProtect against loss of data centerSecondary:Application upgradesInfrastructure upgradesHigh AvailabilityDisaster Recovery6
اسلاید 7: Service Level Agreement (SLA)Permitted downtime (planned vs. unplanned?)Acceptable data/transaction lossApplication response timesMean time to recoveryNote: Database uptime is not equivalent to application availabilityFailures of other application servicesNetwork outagesUptime SLADowntime per YearDowntime per Month99.9%8.76 hours43.8 minutes99.99%52.6 minutes4.38 minutes99.999%5.26 minutes0.438 minutes7
اسلاید 8: Protect What?Application data storesDatabasesFilesOther data repositoriesDatabase servicesDBMS availability for applicationsApplication servicesApplication availability for users and external systemsDatabases are the heart of most information systems; they deserve the highest affordable protection.8
اسلاید 9: Database Failure ScenariosStorage subsystemDiskControllerNetworkServerPowerOperator errorsDBMS interruptionDrops / deletesApplication defectsDBMS defectsData corruptionPhysical Infrastructure FailuresLogical Data Failures9
اسلاید 10: Service Recovery StrategiesStandby ModeFailover BehaviorSQL Server FeatureCold standbyManual intervention required to restore offline data copyBackup and restoreWarm standbyData copy online and readyManual failover requiredTransaction log shippingDatabase mirroringHot standbyAutomatic failoverDatabase mirroringFailover clustering10
اسلاید 11: Data Recovery—TerminologyTerminology varies for source vs. copyHigh Availability StrategyData SourceData CopyBackup and RestoreDatabaseBackupLog ShippingPrimarySecondary StandbyDatabase MirroringPrincipalMirrorFailover ClusteringPrimary ActiveSecondary Passive Standby Inactive11
اسلاید 12: [Briefly…]Data Recovery12
اسلاید 13: Database BackupsTraditional backup typesFull backupDifferential backupTransaction log backupDisk is better than tapeFirst backup to disk (separate physical disk volume)Detect exceptions encountered during backupVerify backup filesCopy backup files to tape or remote diskData retention policy for backup files13
اسلاید 14: Database Backup StrategyBackup of user databases not sufficient for recoverySystem databaseMaster databaseMSDB databaseModel databaseExternal data stores…14
اسلاید 15: Synch with External Data StoresSynchronize recovered database with external data stores:Identity column seedsFull-text indexes (SQL Server 2000)LDAP entriesFile system objectsOther databases15
اسلاید 16: Backup Retention PolicyLocation of backup filesDuration of retentionProtection of sensitive dataSarbanes/Oxley (SOX)HIPAAInternal policies for data management and protectionAccess to backups from offsite data storage16
اسلاید 17: Data Recovery ProcessBackup file setsFull baseline, differential, and transaction logsRetrieving backup filesOffsite storageTapeNetwork copyDependency on multiple people to get access to backup filesRecovery strategy depends on failure scenarioCreate comprehensive failure matrixDevise recovery strategy for each scenarioDoes worst-case recovery scenario fit within SLA parameters?Recovery time; SLAInclude future data growth in recovery planFully test recovery strategies—practice is essential17
اسلاید 18: High Availability18
اسلاید 19: High AvailabilityMinimize or avoid service downtimeWhether planned or unplannedWhen components fail, service interruption is brief or non-existentAutomatic failoverEliminate single points of failure (as affordable)Redundant componentsFault-tolerant servers19
اسلاید 20: Redundant ComponentsObjective: Avoid single points of failure (where affordable)Approach: Use redundant components for database serviceDatabase server nodesServer componentsECC RAM; failure-tolerant HW & OSDBMS instanceUser databasesStorage devicesStorage unit componentsMPIO: Interfaces; paths; switches; controllersRAID: DisksNetworkingMPIO: Interfaces; paths; switchesData copiesE.g. Recovering torn page from mirror in SQL Server 200820
اسلاید 21: Transaction Log ShippingWarm standby solutionDuplicate user databaseCopy transaction logs to standby server & restoreDatabase available for read-only accessUsers must disconnect for logs to be appliedTwo database licenses required if querying standbyManual application failoverSupported on standard hardwarePossible data loss (unapplied transactions)21
اسلاید 22: Database MirroringRedundancy at user database levelDuplicate copy of user databaseIndependent storage devicesMultiple copies of instance databasesMirrored over private network channelMirror always redoing transactions from principalNegligible impact on transaction throughputMultiple mirroring modes:High-availability: commit @ log on mirror; automatic failoverHigh-protection: commit @ log on mirror; manual failoverHigh-performance: commit when logged on principalVery fast automatic failover—secondsRequires witness serverMirror-aware application client connectionProvided by client libraryDatabase connection string must specify both serversMirror may be available for read-only access (snapshots)Works with standard hardware22
اسلاید 23: Mirror WitnessWith mirroring, more than one server is required to decide on failoverWitness automates failover from primary to mirrorWatches database availabilityReports observations back to principal and mirrorRuns in separate SQL Server instance (Express is OK)Prevents “split brain” scenarioVery low resource consumptionCan be witness for multiple databasesNot a single point of failure23
اسلاید 24: SQL Server Failover ClusteringTwo clustered nodesActive/Passive configMS SQL servicesRunning on virtual serverShared storage deviceUser databasesSystem databasesQuorum driveRedundant internal components24
اسلاید 25: Active/Passive Failover ClusteringRedundancy at database instance levelAll databases fail over togetherShared copy of system databasesSingle data copy on shared storage deviceNo I/O overhead reducing throughputStorage unit is single point of failure for clusterAll database services are clusteredSQL Agent; Analysis Services; Full-Text engine, MS DTCAutomatic failover (up to minutes)DBMS accessed over virtual IPDatabase not available from inactive node for DB client connectionsStorage is controlled by one cluster node at a timeRequires hardware certified by Microsoft for Microsoft Cluster Service25
اسلاید 26: HA ComparisonScope: user DBStandard hardwareOne SQL license (unless querying snapshots on mirror)Very fast failover (seconds)OS flexible (e.g. 32/64)Independent storageIndependent servicesReporting on mirrorGeographic separation OKScope: DBMS instanceCertified hardwareOne SQL license (only one node can access database)Automatic failover (up to minutes)Enterprise OSShared storageClustered servicesStandby not availableServers are usually co-locatedDatabase MirroringFailover Clustering26
اسلاید 27: Considerations for HAHA complements backup and recovery strategyDoes not replace data recovery planApplication service availability is often determined by a network of interdependent servicesAvailability can be difficult to define (e.g. partial failures)Failure probability difficult to measure or computeIncreased system complexity could lead to lower service availability!Operator error a leading cause of availability issuesIncreased number/types of system componentsMore complex to configure and administer27
اسلاید 28: Data Recovery RequirementsRequirementsBackup and RecoveryLog ShippingDB Mirroring – High-PerformanceDB Mirroring – High-ProtectionDB Mirroring – High-AvailabilityFailover ClusteringCostLowLow/MedMediumMediumMediumHighRelative complexityLowLowMediumMediumHighHighData lossPossibleLatest logPossibleNoneNoneNoneScope of duplicationDatabaseDatabaseDatabaseDatabaseDatabaseDBMSFailoverDowntimeDowntimeManualManualSecondsUp to minutesClient redirectManualManualAutomaticAutomaticAutomaticAutomaticRolling upgrades & maint.NoNoOS & DBOS & DBOS & DBOSAccess data on secondaryRestoreRead-onlySnapshotSnapshotSnapshotNoGeographic separationOKOKOKLatency?Latency?Latency?28
اسلاید 29: Disaster Recovery29
اسلاید 30: Disaster RecoveryMinimize downtime of business operationsRedundant systems and facilitiesSQL Server features:Transaction log shippingDatabase mirroringFailover clusteringOther technologiesStorage-based mirroring30
اسلاید 31: Disaster Recovery PlanningData security requirementsClarify SLA, data loss allowanceEvaluate system cost vs. data protectionFailure analysisSystem redundancyProcess validationTraining for personnelPrevention practicesExecuting disaster recovery and business continuityPractice, practice, practice31
اسلاید 32: Business Continuity FacilitySystem redundancySystems: Web servers app servers; database, etc.Data: Databases; data files on OS; security info, etc.Networking: Domain, routing, subnet, VIPs, etc.Alternate facilitiesNetwork bandwidthPhysical or network access by operations staffFailoverOften a deliberate decision, using manual failover32
اسلاید 33: Data RedundancySynchronous redundancyNetwork bandwidth costNetwork latency and application performanceNetwork reliabilityAsynchronous redundancyRisk of data lossMore cost-effectiveResilient to network latency issuesCandidate TechnologiesSQL Server database mirroringFailover clustering with SAN-based mirroring33
اسلاید 34: DR Using Database MirroringTwo sites: Primary and DR locationSeparate failover clusters at each siteSQL Server database mirroring between sites34
اسلاید 35: DR Using SAN-Based MirroringTwo sites: Primary and DR locationFour-node failover cluster; one virtual IP addressSAN-based mirroring between sitesManual cluster failover35
اسلاید 36: [Skip if time is running short.]Complimentary Technologies36
اسلاید 37: SAN-Based Data MirroringData blocks duplicated at storage levelSimilar to transaction log shippingCopy performed in sequence and coordinated with database checkpointEnsures consistency of mirrored data filesSynchronous or asynchronous mirroringCo-located or geographically dispersed—both are OKSAN link bandwidth must support database I/O rateMay require extra feature support from SAN vendorCould rely on Failover Clustering for HA37
اسلاید 38: SQL Server Database SnapshotsRead-only point-in-time database snapshotNo data is copied—instantaneousHistorical snapshot pages tracked separately from changing pagesSnapshots can be maintained indefinitelyLimited only by available storageSnapshot copy can be used for reportingRead-only, so no locking issues38
اسلاید 39: SQL Server ReplicationTransactional replicationHigh transaction volumeLow data latency requiredMixed technologies: Integrates with other DBMSMerge replicationBi-directional data changesTypically server-to-clientSnapshot replicationLarge, infrequent data changesData change latency OKBest for smaller data setsSubscriber databases available for reportingReplicate data subsetsSome data loss is possiblePeriodically validate replicated data39
اسلاید 40: App Development and Admin40
اسلاید 41: Considerations for App DevelopersApp services tolerant to database service interruptionsApplication transactions must be handled in code—data consistencyException handling for transaction retry, connection recoveryRequires coding standards, code reviews, and testingBulk data operationsTransaction volume impacts rollback time during failoverBatch jobs must be run on alternate nodesDon’t bypass transaction loggingSynchronization with external data sources?Be aware of database recovery modelMirroring uses FailoverPartner in connection stringUse TCP/IP as client protocol41
اسلاید 42: Considerations for AdminsUse identical server hardware, when possibleDesign network redundancies, when feasibleConsider network latency for geographic separationAlways manage through virtual cluster, not individual cluster nodesRetest failover/failback after HA maintenanceDiagnose after failoverRepair alternate nodeResynchronize data, as necessaryBe aware of primary/secondary locationsEnsure application services are connected and functioning properlyKeep server node configurations synchronized:Service pack and patch levelsDuplicate non-redundant resourcesJobs; logins and permissions; OS & sys objects42
اسلاید 43: HA RisksSystem performance degradationHA system complexity leads to availability issuesSome system failures not planned forBackup and recovery planning incompleteAdministrators not fully trained or informedUser databases not synchronized with other data sources43
اسلاید 44: Common Admin Use CasesMaintain HA nodesHardware maintenanceRolling upgrades and software patchesResynchronize the redundant copyRe-synch mirrorRestart log shippingDiagnose and repairDiagnose cause of failoverRepair failed node and restore failover capabilitiesTest failover and failback44
اسلاید 45: Common Admin ActionsTrain and practice administrators to:Initiate a database mirrorManually failover mirror database or cluster nodeAdd/remove passive node from mirror or clusterUpgrade/patch servers nodesRestart or redirect application services45
اسلاید 46: More Information46
اسلاید 47: References—BooksMicrosoft SQL Server 2008 High Availability with Clustering & Database Mirroring by Michael Otey, 2009.Microsoft SQL Server High Availability by Paul Bertucci, 2004.Pro SQL Server 2005 High Availability by Allan Hirt, 2007.Pro SQL Server 2005 Replication by Sujoy Paul, 2006.Pro SQL Server 2005 Service Broker by Klaus Aschenbrenner, 2007.The Rational Guide to SQL Server 2005 Service Broker by Roger Wolter, 2006.High AvailabilityRelated Topics47
اسلاید 48: References—Presentations48Microsoft Load Balancing and Clustering http://ce.sharif.edu/courses/84-85/2/ce317/resources/root/lecture%20slides/ 14.%20Microsoft%20Load%20Balancing%20and%20Clustering.pptSQL Server 2005 High Availability http://www.atlantamdf.com/Presentations/AtlantaMDF_111207HA.pptHigh Availability Technologies In SQL Server 2000 And SQL Server 2005 http://202.181.238.2/hk/teched2004/ppt/Day_2_Rm407/DAT431(1330-1445).pptMeeting the Availability Challenge http://download.microsoft.com/download/E/D/C/EDCF54DB-19CD-4882-9FC4-4F7D46FCEAA6/HighAvailability.pptDisaster Recovery Mistakes http://www.sqlsig.org/Oct%2011%20DASSUG%20-%20Jason%20Hall%2010-11-07%20MM.pptSQL Server 2005 High Availability http://blogs.msdn.com/sql2005event/attachment/564303.ashxEffective Usage of SQL Server 2005 Database Mirroring http://www.sqlserver-qa.net/SSQA-Effective%20Usage%20of%20SQL%20Server%202005%20Database%20Mirroring_show.ppt
اسلاید 49: References—ArticlesAchieve High Availability for SQL Server http://technet.microsoft.com/en-us/magazine/cc162477.aspxGeographically Dispersed Clusters in Windows Server 2003 http://www.microsoft.com/windowsserver2003/techinfo/overview/clustergeo.mspxRestoring file and filegroup backups http://support.microsoft.com/kb/281122/en-usRestoring specific tables or rows from backups http://support.microsoft.com/kb/321836/en-usMaintaining Availability During Upgrades http://msdn.microsoft.com/en-us/library/ms191449.aspx49
نقد و بررسی ها
هیچ نظری برای این پاورپوینت نوشته نشده است.