Did CrowdStrike Learn the Lesson? I Don’t Think So

Beyond pledges of better testing, I did not see a real engineering solution in CrowdStrike’s Preliminary Post Incident Review.

Zhimin Zhan


https://www.bbc.com/news/articles/ce58p0048r0o.amp

Five days after causing the horrific global IT outage, CrowdStrike released a detailed review of the incident on 2024-07-24.

From the review: “The sensor release process begins with automated testing, both prior to and after merging into our code base. This includes unit testing, integration testing, performance testing and stress testing.”

Its “How Do We Prevent This From Happening Again?” section promises to:

  • Improve Rapid Response Content testing by using testing types such as:
      • Local developer testing
      • Content update and rollback testing
      • Stress testing, fuzzing and fault injection
      • Stability testing
      • Content interface testing
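To make one of these terms concrete: fuzzing and fault injection come down to feeding malformed input to the code that reads content files and checking that it rejects bad input cleanly instead of crashing. The sketch below is a minimal illustration under stated assumptions: the comma-separated record format, `EXPECTED_FIELDS`, `parse_record` and `ContentError` are all hypothetical and are not CrowdStrike's actual file format or code.

```python
import random

# Hypothetical: the number of fields the (hypothetical) content reader expects.
EXPECTED_FIELDS = 20


class ContentError(ValueError):
    """Raised when a content record is rejected, instead of crashing the reader."""


def parse_record(raw: bytes) -> list[str]:
    """Hypothetical parser for a comma-separated content record."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ContentError("record is not valid UTF-8") from exc
    fields = text.split(",")
    if len(fields) != EXPECTED_FIELDS:
        raise ContentError(f"expected {EXPECTED_FIELDS} fields, got {len(fields)}")
    return fields


def fuzz_parser(iterations: int = 2000, seed: int = 0) -> dict[str, int]:
    """Fuzzing: feed random byte strings to parse_record.

    A clean rejection (ContentError) is fine; any other exception type
    propagates out of this function and fails the fuzz run.
    """
    rng = random.Random(seed)
    stats = {"accepted": 0, "rejected": 0}
    for _ in range(iterations):
        length = rng.randint(0, 64)
        raw = bytes(rng.randint(0, 255) for _ in range(length))
        try:
            parse_record(raw)
            stats["accepted"] += 1
        except ContentError:
            stats["rejected"] += 1
    return stats


def fault_injection_check() -> None:
    """Fault injection: deliberately supply records with one field too few
    or too many, and confirm the parser rejects them."""
    good = ",".join(["x"] * EXPECTED_FIELDS).encode()
    assert len(parse_record(good)) == EXPECTED_FIELDS
    for n in (EXPECTED_FIELDS - 1, EXPECTED_FIELDS + 1):
        bad = ",".join(["x"] * n).encode()
        try:
            parse_record(bad)
        except ContentError:
            continue
        raise AssertionError(f"record with {n} fields was accepted")


if __name__ == "__main__":
    fault_injection_check()
    print(fuzz_parser())
```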

Wow, that is a lot of testing terms, including a fancy new name: “Rapid Response Content testing”. To non-IT people, it may look as if the company is finally getting serious about testing.

Judging from this review document, I don’t think CrowdStrike has fully learned the lesson. Let’s revisit the basics.
