Navigating Cloud Outages: Ensuring Operational Resilience

How can you mitigate the risks associated with the outage of a major cloud provider?

During a major cloud provider outage, such as the recent incident with Microsoft, businesses encounter significant operational and compliance risks that can severely disrupt their operations.

The outage left banks without visibility into their internal processes, a situation akin to "flying a plane without instruments."

"Critical surveillance systems went dark, while other monitoring tools experienced partial disruptions. This resulted in a situation where, even if some systems were partially operational, businesses couldn’t tell if they were functioning properly," says Don McElligott, VP of Compliance Supervision at Global Relay.


The scenario raises serious concerns about the reliability and effectiveness of the entire surveillance infrastructure. Without proper visibility, companies are at risk of non-compliance, as they cannot ensure regulatory requirements are being met or that necessary data is being captured.

Mitigate operational risks during major cloud provider outages

Contrary to popular belief, utilising a cloud vendor's services does not automatically guarantee redundancy and reliability: "If a business outsources its compliance functions to a vendor that operates on the same cloud infrastructure, that business remains vulnerable to the same risks – all your eggs stay in the same basket," says Don McElligott.

"Businesses should prioritise partners that use different cloud services from their own to spread the risk so, if one cloud provider goes down, the entire compliance system isn’t compromised."

Additionally, businesses need to scrutinise the robustness of the partner’s disaster recovery and business continuity plans. This includes assessing their capability to maintain operations during outages, their transparency regarding system status, and their ability to provide continuous coverage.

Best practices

"One key strategy is to work with vendors that use private clouds or their own independent infrastructure. These vendors won’t go down when other major cloud providers experience outages, spreading the risk and insulating the vendor’s systems from public cloud disruptions. Another critical practice is demanding transparency and accountability from vendors," says Don.

"Businesses must ensure that vendors provide clear and honest visibility into system status and operational coverage. This includes real-time insights into which systems are fully operational and which are experiencing issues. Vendors must also be held accountable for maintaining this transparency and be able to demonstrate their ability to deliver continuous service. Finally, businesses need to conduct regular audits of their systems and processes to verify the reliability of their disaster recovery plans and the effectiveness of their monitoring tools."

Quantify cloud downtime costs for better risk and insurance plans

"The significant fines imposed by the SEC in recent years, amounting to billions of dollars for off-channel communications and poor recordkeeping, highlight the severe financial consequences of inadequate coverage. These penalties often result from the inability to detect and record all communications."

An outage like the recent Microsoft one may have caused monitoring systems to break without alerting the businesses relying on those systems. This means that businesses may not be capturing the information they should be, without realising it.

"Months down the road, businesses might suddenly discover they haven't been capturing the required data and will be required to report to the regulator, facing potential fines and reputational damage."

To combat this, businesses must conduct thorough audits to ensure comprehensive coverage and identify any gaps that could lead to non-compliance. This involves verifying that all systems are operational and capturing the necessary data. Additionally, businesses should work closely with their insurance providers to understand how cloud service downtime impacts their coverage and to ensure their policies adequately reflect these risks.
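One simple form such a coverage audit could take is a scan of capture timestamps for unusually long silences. The Python sketch below is illustrative only: the function name `find_capture_gaps`, the 30-minute threshold, and the sample timestamps are all assumptions made for this example, not details from Global Relay or any specific surveillance product.

```python
from datetime import datetime, timedelta


def find_capture_gaps(timestamps, max_silence):
    """Return (start, end) pairs where consecutive captured records
    are further apart than max_silence."""
    ordered = sorted(timestamps)
    gaps = []
    # Compare each record with the one that follows it.
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_silence:
            gaps.append((earlier, later))
    return gaps


# Hypothetical capture log: made-up times at which records were archived.
captured = [
    datetime(2024, 1, 1, 8, 0),
    datetime(2024, 1, 1, 8, 5),
    datetime(2024, 1, 1, 11, 40),
    datetime(2024, 1, 1, 11, 45),
]

# Assumed threshold: flag any silence longer than 30 minutes.
gaps = find_capture_gaps(captured, timedelta(minutes=30))
for start, end in gaps:
    print(f"No records captured between {start} and {end}")
```

A check like this only shows that nothing was recorded during a window. Whether that window reflects a genuine lull or a silent monitoring failure still has to be confirmed against the source systems.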

"Proactive audits are essential," says Don. "By regularly reviewing and updating systems, businesses can better prepare for and mitigate associated risks, ensuring they remain compliant and financially secure during significant cloud service disruptions."
