Cloud SLAs: What Have We Learned Already in 2021?


When it comes to cloud SLAs, the devil is in the details. It’s vital to know what metrics to track and to assess what monetary impact these metrics have on your bottom line.

In the wake of the attack on the U.S. Capitol, Amazon Web Services stopped hosting Parler, the right-wing microblogging alternative to Twitter. According to Amazon, Parler violated their bidirectional service level agreement (SLA) by failing to have an adequate moderation system in place to address illegal, harmful, or offensive content. The fact that AWS was able to remove Parler from its cloud services so quickly has drawn attention to the immense power that cloud providers wield over the content they host in 2021. Not only does this case reveal the power that many service providers hold through cloud SLAs, but it also highlights the need to pay attention to details and maintain a good working relationship with your cloud provider.

Do not rush into a cloud SLA without doing proper due diligence

Cloud SLAs detail the circumstances in which a service provider is and isn’t liable for the services it renders. These agreements are living documents with many different provisions. Importantly, SLAs cover not only service provider responsibilities, but also customer responsibilities, penalties for contract breaches, and performance metrics, such as response time and resolution time. It’s important that the organization’s general counsel is familiar with the fine print of the document; neither the customer nor the vendor should be caught off guard down the line. To begin with, be sure that the SLA lists the cloud provider’s security standards in detail. For example, even if the cloud provider is not mandated to do so by law, the customer may want the provider to adhere to HIPAA standards; if so, these terms should be agreed upon up front. Additionally, the agreement should affirm that the customer organization retains ownership of the data in storage. Lastly, it’s wise to ensure that the customer organization retains the right to audit the service provider’s compliance down the road.

Make sure both parties are on the same page

Cloud SLAs are generally bidirectional, meaning there are (hopefully clear-cut) expectations for both the customer and the vendor. Each party needs to be cognizant of its own responsibilities. At a minimum, this includes knowing the acceptable performance parameters, as well as what penalties, remedies, and service credits apply in the event of a breach. Both parties should be on the same page regarding performance metrics, responsibilities, and expected service levels during prime-time and non-prime-time hours. There should be an exact schedule for preventive maintenance, and a dispute resolution process in place should things go sideways.

Determine your maximum tolerance levels for downtime, latency, and incident response

A 99.9% uptime guarantee may sound adequate; however, it permits roughly 8.8 hours of downtime a year, which breaks down to about 44 minutes a month. It’s no wonder global e-commerce companies generally demand 99.999% uptime from their service providers, which allows only about 5 minutes of downtime a year. Latency and incident response are important metrics as well. At a minimum, it’s vital that you track how long it takes to respond to any given incident. Organizations should also know what latency levels are acceptable. Latency, usually measured in milliseconds, is the time that elapses between a request and its response; it can include the time it takes to process information, handle a request, or complete any given unit of work. For increased clarity, businesses often group their latency metrics into percentiles. For example, the “median latency,” or 50th percentile latency, is the maximum latency among the fastest 50% of requests. Another popular metric, the 99th percentile latency, is the maximum latency among the fastest 99% of all requests. Whenever possible, the business impact of latency should be quantified. Organizations should always have clear-cut KPIs that measure the impact of downtime, latency, and lost data.
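
The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the year is assumed to have 365 days, and the percentile uses the simple nearest-rank method, which is one of several common definitions and may not match the one in your SLA.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; assumes a 365-day year


def downtime_budget_minutes(uptime_percent: float) -> tuple[float, float]:
    """Return (annual, monthly) permitted downtime in minutes for an uptime target."""
    annual = MINUTES_PER_YEAR * (1 - uptime_percent / 100)
    return annual, annual / 12


def percentile_latency(latencies_ms: list[float], percentile: float) -> float:
    """Nearest-rank percentile: the slowest latency among the fastest p% of requests."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(latencies_ms)
    # Rank of the last request that still falls within the fastest p%.
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


if __name__ == "__main__":
    for target in (99.9, 99.999):
        annual, monthly = downtime_budget_minutes(target)
        print(f"{target}% uptime: {annual:.1f} min/year, {monthly:.1f} min/month")

    # Made-up sample: ten request latencies in milliseconds.
    samples = [12, 15, 11, 250, 14, 13, 16, 18, 17, 900]
    print("median:", percentile_latency(samples, 50), "ms")
    print("p99:", percentile_latency(samples, 99), "ms")
```

In the made-up sample, two slow requests barely move the median but dominate the 99th percentile, which is why the two metrics are usually tracked together.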

Carefully define your metrics

Once your organization has chosen the right metrics to focus on, it’s important to define these metrics appropriately within the cloud SLA. Metrics such as uptime percentage, mean time to resolution (MTTR), and incident response time need to be defined with particular care. As a quick example, depending on the wording in the SLA, an automated email response could qualify as fulfilling the cloud service provider’s incident response time metric. In all likelihood, an automated email won’t address the problem at hand; hence, it’s vital to dig into the details and assess exactly how the metrics are defined. Additionally, whenever possible, be sure to use metrics that can be monitored automatically and can trigger alerts. For example, if latency crosses an agreed threshold, IT personnel should be alerted automatically so they can troubleshoot whichever network issues are causing it. Generally speaking, the more automation in place, the better.
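
To see how much the wording matters, consider a minimal Python sketch that measures incident response time two ways: counting any reply, or counting only replies from a person. The `Response` class, the function names, the one-hour limit, and the timestamps are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Response:
    sent_at: datetime
    automated: bool  # True for auto-acknowledgement emails


def response_time(
    opened_at: datetime, responses: list[Response], count_automated: bool
) -> Optional[timedelta]:
    """Time from incident open to the first response that qualifies under the SLA wording."""
    eligible = [r.sent_at for r in responses if count_automated or not r.automated]
    if not eligible:
        return None  # nothing that qualifies has been sent yet
    return min(eligible) - opened_at


def breaches_sla(elapsed: Optional[timedelta], limit: timedelta) -> bool:
    """No qualifying response, or one slower than the limit, counts as a breach."""
    return elapsed is None or elapsed > limit


if __name__ == "__main__":
    opened = datetime(2021, 1, 11, 9, 0)
    replies = [
        Response(datetime(2021, 1, 11, 9, 1), automated=True),    # auto-reply
        Response(datetime(2021, 1, 11, 11, 30), automated=False),  # engineer
    ]
    limit = timedelta(hours=1)  # assumed SLA response-time limit
    for count_automated in (True, False):
        elapsed = response_time(opened, replies, count_automated)
        print(count_automated, elapsed, "breach" if breaches_sla(elapsed, limit) else "ok")
```

The same incident passes under the loose definition and breaches under the strict one. A check like `breaches_sla` is also the kind of rule that can run automatically and page IT staff when it fails.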

Be sure to maintain a good relationship with your cloud provider

Should agreed-upon service levels not be met, the customer will likely receive remedial services as well as service credits, which are financial credits deducted from the customer’s bill. It’s important to have service penalties in the SLA to keep the cloud provider honest; however, many customers also consider offering their cloud service provider monetary bonuses as a reward for good service. Earn-back provisions can elicit some goodwill from providers as well. Earn-backs are SLA provisions through which service providers regain service-level credits if they perform effectively for a given period of time. As mentioned before, SLAs are living documents; it’s important to renegotiate them periodically, especially if your company is growing quickly. That said, such negotiations can be tumultuous, so it’s wise to limit these renegotiations to once or twice a year.
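
The interaction between service credits and earn-backs can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the credit tiers, the 99.9% target, the three-month earn-back window, and the function names are hypothetical, and a real SLA will define its own schedule.

```python
# Hypothetical credit schedule: (minimum uptime %, credit as % of the monthly bill).
CREDIT_TIERS = [(99.9, 0), (99.0, 10), (95.0, 25), (0.0, 50)]


def credit_percent(uptime_percent: float) -> int:
    """Credit owed for one month, as a percentage of that month's bill."""
    for floor, credit in CREDIT_TIERS:
        if uptime_percent >= floor:
            return credit
    return CREDIT_TIERS[-1][1]


def credits_after_earn_back(
    monthly_uptime: list[float],
    monthly_bill: float,
    target: float = 99.9,
    earn_back_months: int = 3,
) -> float:
    """Total credit owed, waiving any credit the provider earns back by
    meeting the target in each of the following `earn_back_months` months."""
    if earn_back_months < 1:
        raise ValueError("earn_back_months must be at least 1")
    total = 0.0
    for i, uptime in enumerate(monthly_uptime):
        pct = credit_percent(uptime)
        if pct == 0:
            continue  # target met, no credit for this month
        following = monthly_uptime[i + 1 : i + 1 + earn_back_months]
        earned_back = len(following) == earn_back_months and all(
            u >= target for u in following
        )
        if not earned_back:
            total += monthly_bill * pct / 100
    return total


if __name__ == "__main__":
    # Made-up six-month history with a $1,000 monthly bill.
    history = [99.5, 99.95, 99.95, 99.95, 98.0, 99.95]
    print(credits_after_earn_back(history, monthly_bill=1000.0))
```

In this made-up history, the first miss is earned back by three good months in a row, while the second miss is still owed because the earn-back window has not yet been completed.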

Put simply, when it comes to cloud SLAs, the devil is in the details. It’s vital to know what metrics to track (e.g., median latency, uptime percentage, incident response time, MTTR), and to assess exactly what monetary impact these metrics have on your bottom line. Also, maintaining a good relationship with your cloud provider is key; after all, cloud SLAs should foster a symbiotic relationship that benefits both parties.

Rajesh Ganesan

About Rajesh Ganesan

Rajesh Ganesan is the Vice President of Product at ManageEngine, a division of Zoho Corporation. He has over 20 years' experience in building enterprise IT products around security, access management, and service management. He spends as much time as possible interacting with thousands of customers around the world and is passionate about solving IT problems with a simple, yet effective, approach. He has built many successful products at ManageEngine, focusing on delivering enterprise IT management solutions as SaaS.
