This byline, created by BridgeView Marketing’s PR Services for Greg Sullivan, Founding Partner at CIOSO Global, brings a fresh perspective to the CrowdStrike incident. This major cybersecurity event impacted millions of systems globally. It offers insightful advice on the often-overlooked elements of the failure, including the critical need to rethink the “trusted provider” concept and the importance of more rigorous update testing. By highlighting the real-world response and lessons learned, the byline provides practical guidance for IT and cybersecurity professionals to avoid similar future pitfalls.
Initially Posted By Solutions Review
The CrowdStrike failure, a watershed moment in cybersecurity, stands as the most significant story of the year and potentially one of the most impactful of the decade. The flawed update CrowdStrike pushed to Windows systems worldwide crashed an estimated 8.5 million critical machines and sparked a global IT outage that will reverberate for months or even years to come.
And while fountains of ink have already been spilled on this subject, as a cybersecurity consultant with 30 years of experience in the field who helped clients through the CrowdStrike incident in real time, I find that many articles focus on the same few points over and over again. Yes, we are now finding out what happened, how it happened, and what CrowdStrike and its competitors will do to prevent such a situation from happening again.
But many interesting aspects of the failure, and its fallout, aren't widely known or being discussed in any depth. They illuminate facets of what happened, and because our industry will be reacting to the CrowdStrike incident for years to come, executives and cybersecurity/IT professionals will need to apply the lessons we take away from it to reduce the impact of future incidents.
The "Trusted Provider" Is Finished
First, the risk calculus for this type of event will be different for cybersecurity and risk management professionals from this point forward. We must elevate this kind of risk: damaging software that comes not from a hack or a social engineering attack but from a relied-on provider. The days of trusting partners implicitly, even those in charge of security, are over.
Instead, we must treat cybersecurity updates like any other software update. This means building greater resiliency into the end of the IT process and conducting more thorough, mandated testing before deployment. We can no longer give any partner a free pass and push its updates straight through. As much as we hope it won't, we must assume that the CrowdStrike scenario could happen again.
Prepare for Opportunistic Exploiters
While there is no evidence (at the time I'm writing this) that the CrowdStrike failure was caused in any part by a hack or malicious actions, the fact remains that when things go wrong, bad actors will be on the scene trying to take advantage of the situation. The confusion, frustration, and desperation the average person feels when dealing with their work computer that has been "blue-screened" is real. Very quickly after this incident, a malicious file started making the rounds, claiming to be a quick fix to the problem, but the so-called "CrowdStrike hotfix" was simply malware that was reaching more people than usual as desperation replaced sensible action.
While this type of situation, a trusted provider being the source of a widespread outage, may not have been on our radar before, it is now. All employees must be prepared and know what is and is not protocol during these incidents. There is always room for more cybersecurity training for employees at every level, and it should incorporate lessons from the CrowdStrike failure. Start now; don't let the next incident be the beginning of your team's learning curve.
A View Into The Cure: Implementing The Fix
Some may be wondering what the breadth of responses and fixes looked like. Sure, we have all read countless retellings of the scramble that airlines and major healthcare organizations went through in the aftermath of the outage. However, small to mid-sized businesses and local governments worldwide experienced just as much pain. Even in centralized offices, getting machines back up and running was manual and laborious because the servers that would have run a fix and pushed it out were also down.
Given the number of devices deployed remotely (we've all enjoyed a more flexible approach to work in recent years), fixes were more complicated. In some cases, those fixes were unwieldy or borderline bizarre. IT personnel went door to door in some locations, fixing machines one at a time. In cities, some organizations' employees were told to bring their work computers to a central location to be worked on. Sometimes, IT was forced to walk people through the fixes over the phone. Anyone who has ever tried to help a relative troubleshoot a laptop over the phone knows how far from efficient that process is.
In some cases, we even saw organizations breaking cardinal rules of basic security hygiene. I heard of flash drives carrying the fix script being mailed nationwide. CrowdStrike knocked us back a few decades in the technical sophistication of our fix implementations.
Cybersecurity Tools: N+1 Redundancy
We must reevaluate our processes and tools closely because we can't rely solely on our "trusted provider" anymore. I suggest making the most of all the cybersecurity tools at your disposal. Some IT departments turn off overlapping software security features, but this is a mistake. It is additional work to run more processes, but that extra security insight can serve as a check and balance to confirm an exploit or even pinpoint an entirely different attack surface not identified by other cybersecurity tools.
For example, I've seen organizations run multiple vulnerability checks on the same data set or database from different software programs; without fail, the results are never the same. Cybersecurity tools are imperfect, but using multiple tools can help mitigate risks. It's essential to gather information through the lens of different cybersecurity systems and focus that data onto a single-pane-of-glass dashboard where a holistic analysis can yield more informed decisions.
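To make that single-pane-of-glass idea concrete, here is a minimal sketch, in Python, of merging findings from two scanners and flagging where they agree or disagree. The scanner names, data layout, and field names are illustrative assumptions, not the output format of any real tool.

```python
# Minimal sketch: merge findings from two hypothetical vulnerability scanners
# into one view so a single dashboard can show where the tools agree.
# Scanner names, data layout, and field names are illustrative assumptions.
from collections import defaultdict

# Pretend output from two different scanners for the same asset.
scanner_a = [
    {"host": "db01", "cve": "CVE-2024-1111", "severity": "high"},
    {"host": "db01", "cve": "CVE-2024-2222", "severity": "medium"},
]
scanner_b = [
    {"host": "db01", "cve": "CVE-2024-1111", "severity": "high"},
    {"host": "db01", "cve": "CVE-2024-3333", "severity": "critical"},
]

def merge_findings(*sources):
    """Group findings by (host, CVE) and record which tools reported each one."""
    merged = defaultdict(set)
    for tool_name, findings in sources:
        for finding in findings:
            merged[(finding["host"], finding["cve"])].add(tool_name)
    return merged

merged = merge_findings(("scanner_a", scanner_a), ("scanner_b", scanner_b))
for (host, cve), tools in sorted(merged.items()):
    status = "confirmed by both tools" if len(tools) > 1 else f"seen only by {', '.join(tools)}"
    print(f"{host} {cve}: {status}")
```

Findings confirmed by more than one tool can be prioritized with higher confidence, while single-tool findings point to gaps in coverage worth investigating.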
Organizations should look to enable N+1 redundancy in their security measures. "N+1 redundancy" is a system design principle in which a system keeps at least one spare component (the "+1") beyond the minimum number of components (the "N") required for continued operation. In cybersecurity, this means having a backup cybersecurity solution that is as effective as the primary one.
That said, we may see some companies adapt their update policy and move to an n-1, or in rare cases an n-2, model for incoming patches. With these updates representing possible risks, there is an argument for waiting one update cycle so that any similarly disastrous bugs are discovered by others rather than by one's own company. This means living longer with known vulnerabilities, but that exposure is weighed against avoiding the potential disruption of a bad update. Which patches can wait an n-1 cycle and which are so critical that they must be applied immediately will likely have to be decided case by case; each company should set its own policies and decide on the right approach.
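As a rough sketch of how an n-1 hold policy might be expressed, the snippet below holds most updates for one release cycle but deploys fixes for critical or actively exploited issues immediately. The Update structure, severity levels, and thresholds are assumptions for illustration, not any vendor's actual API.

```python
# Minimal sketch of an n-1 update policy: hold most updates for one release
# cycle, but deploy fixes for critical or actively exploited issues right away.
# The Update structure and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Update:
    name: str
    release_index: int        # 1 = newest release, 2 = one cycle old, ...
    severity: str             # "low" | "medium" | "high" | "critical"
    actively_exploited: bool

def should_deploy(update: Update, hold_cycles: int = 1) -> bool:
    """Deploy immediately for critical/exploited issues; otherwise wait n-1."""
    if update.actively_exploited or update.severity == "critical":
        return True
    return update.release_index > hold_cycles

updates = [
    Update("content-update-7.42", 1, "medium", False),  # newest: held one cycle
    Update("auth-bypass-fix", 1, "critical", True),      # deploy now
    Update("content-update-7.41", 2, "low", False),      # past the hold window
]
for u in updates:
    print(u.name, "deploy" if should_deploy(u) else "hold")
```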
What's Next? Review Your Incident Response Plan
Once your company has recovered from this outage, there's no better time to reassess your incident response plan. All IT departments should hold a meeting to review the effectiveness of their response to the CrowdStrike incident and ask questions such as:
- Were clear roles, responsibilities, and policies established?
- Was the current plan executed as it was written up? If not, why?
- When was the last time the communication plan was updated?
- Were the proper individuals notified in the correct order?
- Who is responsible for triage to assess the severity and impact?
- Who is monitoring the restored systems and documenting the effectiveness of all actions?
- Is the incident response plan updated to properly account for the "trusted provider" scenario discussed here?
With every new incident, reassess the plan and keep it fresh by performing tabletop exercises to expose any weaknesses or outdated information. Remember, a tabletop exercise aims to test an organization's preparedness and response against realistic scenarios. It's crucial to conduct a debriefing, a.k.a. a hotwash, where an after-action review is performed to analyze strengths and weaknesses.
Conclusion: Pivot and Prepare
The CrowdStrike incident may have been unexpected, but a similar one in the future should not catch organizations off guard. We need to operate under the assumption that such a failure from a "trusted" partner will happen again, somewhere, sometime. To prepare for the next occurrence, IT needs to treat cybersecurity updates the way it treats standard software patches. Following those established procedures means reviewing the update notes, applying the update in a test environment, getting the necessary approval for deployment, staggering the deployment, and informing all relevant stakeholders.
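To illustrate what "staggering the deployment" can look like in practice, here is a minimal sketch of a ring-based rollout plan; the ring names, fleet shares, and soak times are assumptions for illustration rather than any particular vendor's feature.

```python
# Minimal sketch of a ring-based (staggered) rollout plan for a security update.
# Ring names, fleet shares, and soak times are illustrative assumptions.
ROLLOUT_RINGS = [
    {"ring": "test-lab",   "share_of_fleet": 0.01, "soak_hours": 24},  # isolated test machines
    {"ring": "it-pilot",   "share_of_fleet": 0.05, "soak_hours": 24},  # IT staff devices
    {"ring": "early-prod", "share_of_fleet": 0.25, "soak_hours": 48},  # one business unit
    {"ring": "broad-prod", "share_of_fleet": 1.00, "soak_hours": 0},   # everyone else
]

def next_ring(current_ring, health_ok):
    """Advance to the next ring only if the current ring stayed healthy."""
    names = [r["ring"] for r in ROLLOUT_RINGS]
    if not health_ok:
        return None  # halt the rollout: roll back and investigate instead
    idx = names.index(current_ring)
    return names[idx + 1] if idx + 1 < len(names) else None

print(next_ring("it-pilot", health_ok=True))   # -> early-prod
print(next_ring("it-pilot", health_ok=False))  # -> None (rollout halted)
```

The key design choice is that each ring must stay healthy for its full soak period before the update advances, and any failure halts the rollout rather than letting it proceed automatically.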
Our digital ecosystem is too interconnected. Trusting software updates with impunity and circumventing the procedural steps taken for other patches and updates will continue to thrust our fragile IT habitat into chaos. Apply the "commonsense filter" to software updates; the devil is in the details, and chaos is in the code.