CIO OPINION
Post-event reactions and recommendations
James Maude, Field CTO, BeyondTrust
While any piece of software can be unstable or have bugs, this is a particular issue for security vendors such as CrowdStrike, because their products integrate very deeply with the operating system in order to monitor and protect the endpoint. This means that any bug or instability can crash the entire operating system, which appears to be what we have unfortunately experienced.
There are a few strategies to mitigate the risks of unstable software updates, but ultimately it starts with the vendor conducting rigorous QA in test environments that are as representative of customer environments as possible. The next step is a phased deployment process: rolling the update out gradually, in stages, to groups of real users, to confirm the software is stable in real-world environments before deploying it to all users. In this case, it appears that the vendor was confident in the update and deployed it at scale.
Rick Vanover, VP Product Strategy, Veeam
This CrowdStrike outage highlights how critical services depend on the hyperscale public clouds, the Internet, and more. In this era of software-as-a-service (SaaS) offerings powered by the cloud, this is a risk that we take. Generally speaking, hyperscale public cloud services offer better availability than most organisations can achieve in their own data centres. While a good track record is comforting, it is important to have a tested process in place to handle scenarios such as this and limit business disruption.
Ray Umerley, CISO, Coveware by Veeam
This potential issue can arise regardless of the product. Applications such as Endpoint Detection and Response (EDR) tools are intricately connected with the operating system through APIs and system calls in order to perform their functions effectively. This dependency allows EDR tools to access critical system logs, processes, and network activities, enabling the comprehensive threat detection and response capabilities they are designed to deliver.
Any change in the operating system or the EDR tool can disrupt this dependency, potentially leading to system crashes or disruptions. Updates or modifications in the OS that alter APIs or system behaviours can cause compatibility issues, while changes in the EDR tool’s mechanisms may fail to align with the OS, affecting overall system stability and security functions. Organisations should recognise this dependency and the potential for future disruption.
• Check agent automatic update settings for your endpoint protection tool.
• Ensure the settings are consistent with your existing organisational change control policy and that the desired state matches your organisation’s risk tolerance.
• Ensure any patching of vulnerabilities is thoroughly tested prior to deployment.
• As a best practice, stage updates in increments to avoid 100% failure.
• Check with vendors to ensure all updates honour the staged update policy.
• Actively manage burnout and fatigue in your team, because fatigue increases the risk of error.
• Consider rotating operational staff, and provide resources to alleviate stress in collaboration with HR.
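The recommendation above to stage updates in increments can be illustrated with a minimal Python sketch. It is not any vendor's actual mechanism: the function name `staged_rollout`, the `apply_update` callback and the stage fractions are all hypothetical, chosen only to show how a rollout that halts at the first unhealthy stage limits the number of affected machines.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class StageResult:
    stage: int            # 1-based stage number
    hosts: List[str]      # hosts updated in this stage
    failures: List[str]   # hosts reported unhealthy after the update


def staged_rollout(
    hosts: Sequence[str],
    stage_fractions: Sequence[float],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> List[StageResult]:
    """Apply an update in cumulative stages and halt on an unhealthy stage.

    stage_fractions are cumulative shares of the fleet, e.g. (0.05, 0.25, 1.0).
    apply_update is a hypothetical callback that returns True when the host
    is healthy after the update.
    """
    results: List[StageResult] = []
    done = 0  # number of hosts already updated
    for stage, fraction in enumerate(stage_fractions, start=1):
        # Each stage reaches at least one new host and never exceeds the fleet.
        target = min(len(hosts), max(done + 1, round(len(hosts) * fraction)))
        batch = list(hosts[done:target])
        if not batch:
            break  # the whole fleet has already been updated
        failures = [host for host in batch if not apply_update(host)]
        results.append(StageResult(stage, batch, failures))
        done = target
        if len(failures) / len(batch) > max_failure_rate:
            break  # halt: the remaining hosts are never touched
    return results
```

With a fleet of 100 machines and stages of 5%, 25% and 100%, a faulty update that is caught in the second stage reaches 25 machines rather than all 100. The threshold and stage sizes are assumptions that each organisation would set according to its own risk tolerance.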
Long-term: Actions to be taken over eight to 12 weeks
• The primary focus for long-term actions is to mitigate or reduce the risk of the same level of business impact or exposure caused by the CrowdStrike outage:
• Review prevention, response and support procedures for large-scale outages.
• Many organisations report they are unable to handle the sudden large volume of support requests.
• Check and update downtime procedures for key operations, and revise crisis communication plans, incident response processes, business continuity management and IT disaster recovery plans.
• Ensure key employees with response and recovery responsibilities have the necessary competencies and are involved in testing enterprise systems.
• The CrowdStrike outage reinforces the need to focus on resilience.
• Use a top-down approach to connect resilience efforts to overall strategic objectives.
• Assess the operational hit before deploying a security agent by weighing the impact against the expected security benefit.
• Endpoint agents have unavoidable consequences for performance and are vulnerable to updates to other applications.
• Protect against threats by selecting endpoint security tools that use end-to-end user behaviour analytics, containment, machine learning, and endpoint detection and response, as well as legacy techniques such as the use of signature-based antivirus software.
• Evaluate the efficacy of current endpoint protection mechanisms to identify areas requiring improvement to forestall recurrence of a similar incident.
Source: Minimize Disruption From the CrowdStrike Windows Outage by Gartner, 19 July
INTELLIGENTCIO AFRICA www.intelligentcio.com