SQL Server High Availability and Disaster Recovery for Azure, AWS, and GCP: A Guide

The public cloud offers a variety of options for providing high availability and disaster recovery protection for SQL Server database applications. on the contrary, some of the rational choice available in a private cloud are not available in the public cloud. Given the numerous options and limitations, the challenge faced by system and database administrators is conclude the best applicable options for each application running in hybrid and purely public clouds.

All cloud service providers (CSPs) have service level agreements (SLAs) with money-back guarantees for when uptime falls below specified levels, usually ranging from 95.00% to 99.99%. Four nines of uptime is generally accepted as constituting HA, and to be eligible for these 99.99% SLAs, the configurations need to meet certain requirements.

But be forewarned: The SLAs only guarantee “dial tone” at the server level, and explicitly excluded many causes of downtime at the database and application levels. These exclusions inevitably include natural disasters, the customer’s actions (or inactions), and the customer’s system or application software. There may also be a separate SLA for storage that is lower than the one for servers. So while it is advantageous to leverage various aspects of a CSP’s infrastructure, additional provisions are needed to ensure adequate uptime for mission-critical SQL Server databases.


For More Details Visit Our Website


Differences between HA and DR

Properly leveraging the cloud’s resilient infrastructure requires understanding key differences between “failures” and “disasters” because those differences affect the choice of provisions used for HA and DR protections. Failures are small in scale and short in duration, affecting a server, rack, or the power or cooling in a single datacenter. Disasters have more widespread and enduring impacts, and can affect multiple data centers in ways that preclude rapid recovery.

The most consequential effect involves the location of the redundant resources (systems, software and data), which can be local—on a Local Area Network—for recovering from a localized failure. However, the redundant resources needed to recover from a major disaster must extend over a Wide Area Network.

For database applications that require high transactional capacity, the ability to synchronously replicate the data from the active instance over the LAN ensures that the standby instance is "hot" and ready to take over instantly in the event of a failure. Such a rapid recovery should be the goal of all HA requirements



SQL Server High Availability and Disaster Recovery for Azure, AWS, and GCP: A Guide


Data must be replicated asynchronously in DR configurations to prevent the latency inherent in the WAN from adversely impacting on the throughput performance in the active instance. This means that updates being made to the standby instance always lag behind updates being made to the active instance, making it “warm” and resulting in an unavoidable delay during the manual recovery process.

All three major CSPs accommodate these differences with redundancies both within and across datacenters. Of particular interest is the variously named “availability zone” that makes it possible to combine the synchronous replication available on a LAN with the geographical separation afforded by the WAN. These zones connect two or more regional data centers via a low-latency, high-throughput network to facilitate synchronous data replication. With latencies around one millisecond, the use of multi-zone configurations has become a best practice for HA.

For DR, all CSPs have offerings that span multiple regions to afford additional protection against major disasters that could affect multiple zones. For example, Google has what could be called DIY (Do-It-Yourself) DR guided by templates, cookbooks and other tools. Amazon and Microsoft have managed DRaaS (DR-as-a-Service) offerings: CloudEndure Disaster Recovery, and Azure Site Recovery respectively.

For all three CSPs it is important to note that data replication across regions must be asynchronous, so the recovery will need to be performed manually to ensure minimal or no data loss. 

However, the resulting delay in recovery is tolerable as disasters across the region are rare.

Making SQL Server “always on”

SQL Server offers two of its own HA/DR features: Always On Failover Cluster Instances and Always On Availability Groups. FCIs afford three notable advantages: inclusion in the less expensive Standard Edition; protection of the entire SQL Server instance; and support in all versions since SQL Server 7. A significant disadvantage is the need for a storage area network (SAN) or other form of shared storage, which is unavailable in the cloud. The lack of shared storage was addressed in Windows Server 2016 Datacenter Edition with the introduction of Storage Spaces Direct. But S2D also has limitations; most notably its inability to span availability zones.

SQL Server’s other HA/DR feature, Always On Availability Groups, is a more robust solution capable of providing rapid recovery with no data loss. Among its other advantages are inclusion in SQL Server 2017 for Linux, no need for shared storage, and readable secondaries for queries (with appropriate licensing). But for Windows it requires licensing the substantially more expensive Enterprise Edition and it lacks protection for the entire SQL Server instance.

It is worth noting that SQL Server also offers a Basic Availability Groups feature, but it supports only a single database per Availability Group, making it suitable for only the smallest of environments.

The limitations associated with both options have created a need for third-party failover clustering solutions purpose-built to provide HA/DR protection for virtually all Windows and Linux applications in private, public and hybrid cloud environments.
These software solutions at least facilitate real-time data replication, continuous monitoring that can detect application-level errors and configurable failover and failback policies. Most also offer a variety of value-added capabilities, including some specific to popular applications like SQL Server.


Related Article -  5 Major Security Objectives for Cloud Computing