Ensuring High Availability at the Edge

By configuring your critical infrastructures for HA — at the edge and in the cloud — you can ensure that your systems remain as responsive and accessible as your organization’s requirements demand.

Edge computing systems are intended to ensure that critical applications and analytics can respond immediately to real-time data feeds. But that ability to respond immediately presumes that the edge systems and databases are available and can act on the real-time data feeds — which is not necessarily a given, particularly if those edge systems are running unattended in the harsh environment of a shop floor or remote oil rig.

If you need to ensure that your key edge systems and databases are running no less than 99.99% of the time, which means that your applications experience no more than about four and a half minutes of downtime per month, then you need to configure those systems and databases for high availability (HA). At the heart of an HA edge configuration, you’ll have redundant resources supporting your databases or core applications. If the edge system actively supporting your database or analytics system suddenly stops responding, for any reason, the workload will fail over to the secondary system, which, having a duplicate copy of the database with which to work, can immediately continue performing data collection and analytics to support your edge processing requirements. In an HA configuration, this failover takes only seconds, ensuring that the applications and databases that you’ve placed on the edge for high performance can continue to operate in the manner your requirements demand.
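The downtime figure above follows directly from the availability target. As a quick check, the short Python sketch below converts an availability percentage into a downtime budget. The function name and the 30-day default are illustrative choices, not part of any HA product.

```python
def downtime_budget_minutes(availability_pct: float, period_days: float = 30.0) -> float:
    """Return the maximum downtime, in minutes, allowed over the period."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    # The downtime budget is whatever fraction of the period is NOT covered
    # by the availability target.
    return total_minutes * (1.0 - availability_pct / 100.0)


if __name__ == "__main__":
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_budget_minutes(target)
        print(f"{target}% availability -> {budget:.2f} minutes per 30-day month")
```

For a 30-day month, 99.99% works out to about 4.32 minutes; a slightly longer month pushes the figure closer to four and a half.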

A Double-Edged Approach

The question becomes, how best to configure for HA in an edge environment? The answer to that question depends partly on the nature of the resources you are using. Some databases, such as SQL Server, provide built-in services that can facilitate data replication between two instances of a database. In SQL Server, that service is called an Availability Group (AG). In an AG, one instance of SQL Server is active, and whenever data is written to that database, the write is replicated to a secondary instance of that database, which resides on a separate system in a physically separate location. If the physical locations are close enough (within several miles of one another and linked by a reliable low-latency network connection), you can use synchronous replication, which ensures that the two instances of the database are perfectly synchronized because no update written to the primary database is fully committed until it has also been written to the secondary database. If the instances of SQL Server are too far apart to use synchronous replication, you can use asynchronous replication, which writes all updates from the primary database to the secondary database as quickly as possible, but the primary database considers a transaction complete without waiting for confirmation that the update has reached the secondary system.
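The difference between the two replication modes comes down to when the primary treats a write as committed. The Python sketch below is a toy model of that ordering only. The `Primary` and `Replica` classes and their methods are hypothetical names invented for illustration; they do not represent SQL Server’s implementation or API.

```python
from collections import deque
from typing import Optional


class Replica:
    """A secondary copy of the database, modeled as an ordered log of writes."""

    def __init__(self) -> None:
        self.log = []

    def apply(self, record: str) -> None:
        self.log.append(record)


class Primary:
    """A primary database that replicates each write to one secondary."""

    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self.replica = replica
        self.synchronous = synchronous
        self.log = []           # writes committed on the primary
        self.pending = deque()  # committed locally, not yet shipped (async only)

    def write(self, record: str) -> None:
        if self.synchronous:
            # Synchronous: the secondary applies the write first, and only
            # then does the primary treat the transaction as committed.
            self.replica.apply(record)
            self.log.append(record)
        else:
            # Asynchronous: commit locally right away and ship the write later.
            self.log.append(record)
            self.pending.append(record)

    def ship_pending(self, max_records: Optional[int] = None) -> None:
        """Send queued writes to the secondary, oldest first."""
        limit = len(self.pending) if max_records is None else max_records
        for _ in range(min(limit, len(self.pending))):
            self.replica.apply(self.pending.popleft())


if __name__ == "__main__":
    replica = Replica()
    primary = Primary(replica, synchronous=False)
    for i in range(3):
        primary.write(f"txn-{i}")
    primary.ship_pending(max_records=1)
    print("primary:", primary.log)
    print("secondary:", replica.log)
```

In synchronous mode the two logs never diverge, at the cost of waiting on the secondary for every write. In asynchronous mode the primary never waits, but anything still sitting in `pending` exists only on the primary.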

If the primary SQL Server database were to fail in this scenario, or even appear to fail, the AG would fail over to the secondary instance of SQL Server and start using that instance of the database to support your edge processing requirements. If you’ve been using synchronous replication, failover can be automatic, and there will be no loss of data because the secondary database will be identical to the primary database. If you’ve been using asynchronous replication, though, failover will take more time because it must be initiated manually. Moreover, there may be discrepancies between the primary and secondary databases: a few seconds’ worth of transactions committed in the primary database may not have been written to the secondary database before it is brought online as the new primary database.
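The failover rules described above can be summarized in a few lines of logic. The following Python sketch is a simplified model that assumes the primary is declared failed once no heartbeat has arrived within a timeout. The function, its parameters, and the sequence-number bookkeeping are hypothetical and are not how SQL Server or any clustering product actually decides.

```python
from dataclasses import dataclass


@dataclass
class FailoverDecision:
    primary_suspected_down: bool
    automatic: bool            # True when failover can proceed without an operator
    transactions_at_risk: int  # committed on the primary, not yet on the secondary


def evaluate_failover(
    seconds_since_heartbeat: float,
    heartbeat_timeout: float,
    synchronous: bool,
    primary_commit_seq: int,
    secondary_applied_seq: int,
) -> FailoverDecision:
    """Decide how a failover would proceed under the rules described in the text."""
    # A primary that merely appears to have failed (for example, because of a
    # network partition) looks the same to this check as one that really has.
    suspected = seconds_since_heartbeat > heartbeat_timeout
    if not suspected:
        return FailoverDecision(False, False, 0)
    if synchronous:
        # Every committed write already exists on the secondary, so failover
        # can be automatic and nothing is lost.
        return FailoverDecision(True, True, 0)
    # Asynchronous: an operator initiates failover, and any writes the
    # secondary has not yet applied are at risk.
    at_risk = max(0, primary_commit_seq - secondary_applied_seq)
    return FailoverDecision(True, False, at_risk)


if __name__ == "__main__":
    print(evaluate_failover(12.0, 5.0, True, 1000, 1000))
    print(evaluate_failover(12.0, 5.0, False, 1000, 988))
```

The second call models an asynchronous pair in which the secondary was 12 transactions behind when the primary stopped responding, which is the discrepancy an operator would need to reconcile after a manual failover.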

One issue when using AGs to enable HA is cost vs. functionality. The Standard Edition of SQL Server supports only Basic AGs, and a Basic AG can only be configured to pair a single SQL Server database with a single secondary instance of the database. If you have multiple SQL Server databases to replicate, you’ll either have to create and manage multiple AGs, which may not fail over in a coordinated manner in a failover situation, or upgrade to the Always On AG functionality found in SQL Server Enterprise Edition, which is much more costly to license (particularly if you have no other need for the features of SQL Server Enterprise Edition).

SANless Clusters

As an alternative to using the HA functionality built into a database product, you can configure your edge systems as what is known as a SANless cluster. In a SANless cluster, you configure a primary and one or more secondary edge systems in a Windows- or Linux-based failover cluster. Instead of the compute nodes relying on a shared storage area network (SAN) resource for data storage, though, each node in the cluster is configured with local storage, and the SANless clustering software replicates the data written to the database on the primary system to each of the instances of the database residing on the secondary systems.

SANless clustering can rely on either synchronous or asynchronous replication, but one distinction between the SANless clustering approach and the AG approach is that SANless clustering is database agnostic. Where an approach like AG will only replicate a SQL Server database, SANless clustering replicates whatever is in storage at the block level. It doesn’t matter whether the data is part of a database, a text file, or any other format; if it’s part of a disk that is to be replicated to the secondary infrastructure, it will be replicated.
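To see why block-level replication is indifferent to what the data means, consider the Python sketch below. It is a toy in-memory model, and `BlockDevice`, `MirroredVolume`, and the 4,096-byte block size are assumptions made for illustration. Real SANless clustering products work inside the operating system’s storage stack, not in application code like this.

```python
BLOCK_SIZE = 4096  # bytes per block; an illustrative value


class BlockDevice:
    """A disk modeled as a fixed number of equally sized blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) != BLOCK_SIZE:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data


class MirroredVolume:
    """Forwards every block written on the source disk to a target disk."""

    def __init__(self, source: BlockDevice, target: BlockDevice) -> None:
        self.source = source
        self.target = target

    def write(self, offset: int, payload: bytes) -> None:
        """Write payload at a byte offset, replicating each block it touches."""
        if not payload:
            return
        end = offset + len(payload)
        first = offset // BLOCK_SIZE
        last = (end - 1) // BLOCK_SIZE
        for index in range(first, last + 1):
            block_start = index * BLOCK_SIZE
            block = bytearray(self.source.blocks[index])
            lo = max(offset, block_start)
            hi = min(end, block_start + BLOCK_SIZE)
            block[lo - block_start:hi - block_start] = payload[lo - offset:hi - offset]
            new_block = bytes(block)
            # The replication layer sees only block numbers and bytes; it
            # never needs to know which application produced them.
            self.source.write_block(index, new_block)
            self.target.write_block(index, new_block)


if __name__ == "__main__":
    primary_disk = BlockDevice(num_blocks=4)
    secondary_disk = BlockDevice(num_blocks=4)
    volume = MirroredVolume(primary_disk, secondary_disk)
    volume.write(4090, b"database page bytes")  # spans two blocks
    volume.write(9000, b"plain text log line")
    print(primary_disk.blocks == secondary_disk.blocks)
```

The same `write` path handles the database bytes and the text bytes, which is the property that lets this style of replication protect anything stored on the replicated disk.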

The SANless clustering approach offers its own cost vs. functionality calculus. You’ll have to license whatever database product you are using and a third-party product to create and manage the SANless cluster. However, if you are using a database like SQL Server and relying on multiple edge databases all running on SQL Server Standard Edition, you may find it far less costly to use SANless clustering than to migrate all your databases to SQL Server Enterprise Edition. Similarly, if you want more than just the contents of your databases to be available to your secondary infrastructure in a failover situation, you’ll want to use SANless clustering because that will replicate anything in storage. A database-based replication system may ensure HA only for the contents of the database.

Ensuring End-to-End Availability

In an isolated edge environment, such as an oil rig or a ship, the best you can do to ensure the integrity of your primary and secondary infrastructures may be to place them as far apart as you can and to isolate them electrically as best you can. That way, a physical or electrical problem impacting the primary infrastructure is less likely to impact the secondary infrastructure.

But most edge infrastructures ultimately communicate with a back-end infrastructure, often one residing in the cloud. If your back-end infrastructure needs to remain as available and responsive as your edge infrastructure, ensuring the HA of your edge environment isn’t enough. However, you can configure your back-end infrastructure for HA using the same approach that you take to ensure the HA of your edge infrastructure — and it’s even easier in the cloud. Instead of locating your primary and secondary infrastructures in different parts of the same physical structure, you’ll locate them in two separate cloud availability zones (AZs) and take the same approach to data replication that you’ve taken at the edge. Whether you’ve used a database- or SANless cluster-based approach to ensuring HA at the edge, you can use that same approach to ensure the HA of your data and key applications in the cloud. If you’ve deployed your primary and secondary infrastructures in AZs within a single region, you can rely on synchronous replication to ensure that the data on your secondary systems is identical to that on your primary system — and in the event of a failover situation, your secondary systems can be online and operational in seconds.

By configuring your critical infrastructures for HA — at the edge and in the cloud — you can ensure that your systems remain as responsive and accessible as your organization’s requirements demand.

Dave Bermingham

About Dave Bermingham

Dave Bermingham is the Senior Technical Evangelist at SIOS Technology. He is recognized within the technology community as a high availability expert and has been honored by his peers by being elected a Microsoft MVP in Clustering six times and a Cloud and Datacenter MVP seven times. Dave is a frequent speaker at technical conferences, including SQL Saturdays, PASS Summit, and MSSQL Tips, and is the author of the Clustering for Mere Mortals blog. Dave holds numerous technical certifications and has more than thirty years of IT experience, including in finance, healthcare, and education.
