Why system resilience should mainly be the job of the OS, not just third-party applications


Let’s Talk About Building Resilience in the Digital Ecosystem

Hey there! Last week, a US congressional hearing examined the CrowdStrike incident of July 2024, when a faulty update to the company's Falcon sensor crashed millions of Windows computers. One point raised at the hearing was the case for automated recovery systems that could limit similar large-scale disruptions in the future.

But here’s the thing: should responsibility for automated recovery fall solely on the third-party software vendor, or should it be a collaborative effort between the operating system (OS) and third-party applications?

Let’s Dive Deeper

Picture this: your device hits a catastrophic error during startup, resulting in the infamous blue screen of death (BSOD). This typically happens when a critical component that loads early in the boot process, such as a kernel-mode driver, fails and takes the whole system down with it. It's a bit like a car engine: a faulty spark plug can keep the engine from running, however sound the rest of it is. In the same way, a single faulty software component can bring down the entire system.

So, should the onus be on the spark plug manufacturer (the software vendor) to build auto-recovery mechanisms, or should the OS manage recovery for all third-party software?

In my view, the key is a standardized recovery process that works regardless of the software involved. Imagine if, whenever a software update left a system unable to boot, the OS could automatically revert to the last working state, giving users a seamless way to recover. A collaborative approach like this between the OS and third-party vendors could significantly enhance system resilience.
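To make the idea more concrete, here is a minimal Python sketch of how an OS might count failed boots and fall back to a last-known-good version of each component. It assumes a hypothetical `BootManager` that the OS consults on every startup, a made-up driver name (`vendor_driver.sys`), and an arbitrary threshold of two failed boots. It illustrates the concept only; it is not how Windows or any real OS implements recovery.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """A third-party component (e.g. a driver) that loads at boot."""
    name: str
    active_version: str
    last_known_good: str  # version that last completed a successful boot


@dataclass
class BootManager:
    """Hypothetical OS-side manager that counts failed boots and rolls back."""
    components: dict[str, Component]
    max_failed_boots: int = 2  # assumed threshold before automatic rollback
    failed_boots: int = 0

    def record_boot(self, succeeded: bool) -> None:
        if succeeded:
            # A clean boot promotes the running versions to last-known-good.
            for c in self.components.values():
                c.last_known_good = c.active_version
            self.failed_boots = 0
            return
        self.failed_boots += 1
        if self.failed_boots >= self.max_failed_boots:
            self.roll_back()

    def roll_back(self) -> None:
        # Revert every component to its last-known-good version, whoever the vendor is.
        for c in self.components.values():
            c.active_version = c.last_known_good
        self.failed_boots = 0


# Usage: a hypothetical vendor update breaks the boot twice in a row.
mgr = BootManager({"vendor_driver.sys": Component("vendor_driver.sys", "1.0", "1.0")})
mgr.components["vendor_driver.sys"].active_version = "1.1"  # update installed
mgr.record_boot(succeeded=False)
mgr.record_boot(succeeded=False)
print(mgr.components["vendor_driver.sys"].active_version)  # 1.0
```

Because the rollback logic lives on the OS side, it behaves the same way for every component, no matter which vendor shipped it.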

The Road to Resilience

By building OS-managed recovery for third-party software, we could streamline recovery and reduce the burden on individual software developers. This approach is complex to implement, but it has the potential to limit the impact of widespread outages like the one triggered by the faulty CrowdStrike update.
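The "reduced burden" argument can also be illustrated with a small Python sketch. In this hypothetical `RecoveryRegistry`, vendors only register their components and versions, and the OS alone decides what to roll back: anything that changed since the last successful boot is treated as a suspect, while everything else keeps running untouched. The class, component names and vendors are all invented for illustration and do not correspond to any real OS interface.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    vendor: str
    version: str
    known_good: str  # version recorded at the last successful boot


class RecoveryRegistry:
    """Hypothetical OS-level registry: vendors register, the OS owns rollback."""

    def __init__(self) -> None:
        self.entries: dict[str, Entry] = {}

    def register(self, name: str, vendor: str, version: str) -> None:
        # A newly registered component starts out trusted at its installed version.
        self.entries[name] = Entry(vendor, version, version)

    def update(self, name: str, version: str) -> None:
        self.entries[name].version = version

    def suspects(self) -> list[str]:
        # Components changed since the last good boot are the likely culprits.
        return sorted(n for n, e in self.entries.items() if e.version != e.known_good)

    def recover(self) -> list[str]:
        # Roll back only the suspects; no vendor has to write its own recovery code.
        reverted = self.suspects()
        for n in reverted:
            self.entries[n].version = self.entries[n].known_good
        return reverted


# Usage: two hypothetical vendors, only one of which shipped an update.
reg = RecoveryRegistry()
reg.register("edr_driver", "VendorA", "1.0")
reg.register("vpn_filter", "VendorB", "3.2")
reg.update("edr_driver", "1.1")
reverted = reg.recover()
print(reverted)  # ['edr_driver']
```

Targeting only the components that changed keeps the rollback narrow, so one vendor's bad update doesn't undo another vendor's working software.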
