ISMG Editors: The CrowdStrike Outage - One Week Later
The Recovery Progress, Impact on Commercial and Public Sectors, and Lessons Learned
Anna Delaney • July 26, 2024
In the latest weekly update, Information Security Media Group editors discussed the massive CrowdStrike IT outage that crashed 8.5 million Windows systems and severely affected the healthcare, finance and transportation sectors. Here's what you need to know one week later about the recovery, impact and lessons learned.
The panelists - Anna Delaney, director, productions; Mathew Schwartz, executive editor of DataBreachToday and Europe; Marianne Kolbasuk McGee, executive editor, HealthcareInfoSecurity; and Michael Novinson, managing editor, ISMG business - discussed:
- How buggy testing software led to the CrowdStrike outage and the lessons that can be learned to improve system resilience and security;
- The overall impact of the outage on the healthcare sector - as well as other industries - and comparisons to recent ransomware and other cyber incidents;
- The potential business impact of the incident on CrowdStrike and other endpoint security vendors.
The ISMG Editors' Panel runs weekly. Don't miss our previous installments, including the July 5 edition on AT&T's ransom payment in the Snowflake breach and the July 12 edition on what the CrowdStrike outage taught us so far.
Transcript
This transcript has been edited and refined for clarity.
Anna Delaney: Hello and welcome to the ISMG Editors' Panel. I'm Anna Delaney, and today we're focusing on the recent massive IT disruption caused by a faulty CrowdStrike update. To recap, on July 19, this outage led to 8.5 million Windows systems crashing, severely impacting critical sectors like healthcare, finance and transportation. So today, we'll be discussing the current state of recovery, the impact on various sectors and the lessons learned from this significant incident. To help us do so, I am joined by Marianne Kolbasuk McGee, executive editor of HealthcareInfoSecurity; Mathew Schwartz, executive editor of DataBreachToday and Europe; and Michael Novinson, managing editor for ISMG business. Brilliant to see you all.
Mathew Schwartz: Thanks for having us.
Delaney: Mat, you've been keeping on top of this story from the get-go. You did a fine job of bringing us up to speed on initial events on Friday evening on our previous Emergency Panel with Ian Thornton-Trump. Nearly a week on, we've seen that the recovery has proven to be slow and labor-intensive. But, despite these challenges, over 90% of affected systems have been restored - thanks to the proactive efforts of CrowdStrike, Microsoft and many IT teams out there. So where are we now, Mat?
Schwartz: By all accounts, it was a very long weekend for a great many IT teams tasked with cleaning up the mess. To give a brief history: early Friday, shortly after midnight East Coast time in the United States, CrowdStrike pushed a faulty file - we'll get into that later - and for a period of 78 minutes, that file got downloaded to every system that was online and running its Falcon EDR, what we used to call antivirus software. Any system that received it crashed. The computer rebooted, started up CrowdStrike, crashed again, rebooted and got stuck in an endless loop. Once CrowdStrike excised the file, some systems were able to reboot and were okay, but we know that for many systems, that didn't happen. To help, CrowdStrike as well as Microsoft have released recovery tools. Most of these need to be loaded onto a USB drive and plugged into a system. Logistically, this creates complications. If you are a remote employee, you've got to come back to the mothership, or maybe courier your computer back to headquarters to get it fixed. That has taken a lot of time, and the process of recovering a single computer can take 10 to 20 minutes, especially if you need to input a BitLocker key. Best practice is to encrypt your hard drive, and a lot of companies use Microsoft's BitLocker, but when you try to recover from this faulty update, you first have to unlock the hard drive. Getting those keys out to users and the IT team is another logistical challenge. So, a few challenges here. We've seen some help being provided in the form of utilities, and other people have been documenting more rapid approaches that they've been able to use. We heard late Tuesday that a fix previewed by CrowdStrike had come to pass: the company has added the bad file it pushed to its own known-bad file list, so if its software finds the file, it removes it, which CrowdStrike says will recover some systems on their own. There is a whole range of Windows hosts affected here - laptops and desktops, servers, virtual machines, all sorts of things. As you mentioned, according to some estimates from IT asset tracking firms, it looks like we were at about 93% recovery as of Monday morning, which is great. Microsoft estimated that about 8.5 million hosts were affected, so that leaves maybe half a million to go, but still, that's great progress. As we saw, a lot of the organizations using CrowdStrike aren't mom and pop shops. They are some of the largest organizations in some of the most sensitive sectors, as you mentioned in your introduction. That's why we've seen such massive knock-on effects - Delta Air Lines in particular has been hit hard. On Tuesday, CrowdStrike published a preliminary report into what happened, which gave great detail, and we'll get into this. The transparency has been pretty good. The firm also recommitted to publishing a full root cause analysis once its investigation is complete, and it has promised to put fixes in place to make sure this sort of thing never happens again - all the sorts of things you would want to hear not that many days after this whole big debacle went down.
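For readers curious what the manual fix looked like in practice, here is a minimal sketch of the widely publicized workaround: boot the affected machine into Safe Mode (unlocking BitLocker first, if the drive is encrypted) and delete the faulty Channel File 291. The directory path and filename pattern come from CrowdStrike's public guidance; running Python in Safe Mode with administrator rights is an assumption made purely for illustration - most admins used the Safe Mode command prompt or the vendor-supplied tools.

```python
# Minimal sketch of the publicized manual fix: from Safe Mode, delete the
# faulty Channel File 291 that triggered the crash loop. Assumes Python is
# available and running with administrator rights - an illustrative
# assumption; real-world remediation typically used cmd or vendor tools.
from pathlib import Path

CROWDSTRIKE_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
BAD_FILE_PATTERN = "C-00000291*.sys"  # faulty channel file, per CrowdStrike guidance

def remove_faulty_channel_files() -> int:
    """Delete any copies of the bad channel file; return how many were removed."""
    removed = 0
    for bad_file in CROWDSTRIKE_DIR.glob(BAD_FILE_PATTERN):
        print(f"Removing {bad_file}")
        bad_file.unlink()
        removed += 1
    return removed

if __name__ == "__main__":
    count = remove_faulty_channel_files()
    print(f"Removed {count} file(s); reboot normally to complete recovery.")
```

The BitLocker step is what made this so labor-intensive at scale: each encrypted machine needed its recovery key retrieved and entered by hand before any file could be touched.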
Delaney: Mat, what are the lessons that we can learn from this in terms of improving system resilience and security?
Schwartz: There are some big-picture issues that need to be addressed here - for example, the ability for third-party software to work directly with the kernel in a Windows system. The macOS ecosystem, in particular, is much more closed and does not allow this, and neither Mac nor Linux endpoints running CrowdStrike had a problem here. There's a bigger-picture question too. A lot of people are asking: when Windows failed, why did it fail into an endless loop? Why did it not say, "Bad CrowdStrike EDR software! You're not allowed to run. Into the timeout box you go"? These are things that hopefully will start to get addressed, and Microsoft will have some impetus for doing so, and also a responsibility to make it happen. Other people have been saying we need to be able to apply zero trust to our cybersecurity software. It shouldn't have these God-mode rights to the system. It's too easy for things to go wrong, and this access can also be abused - for example, by nation-states, if they're able to weasel their way into this sort of software. So, there are big-picture resiliency questions here, and CrowdStrike has caused them to be posed. We've seen this sort of thing before, not just with CrowdStrike, so ideally it wouldn't have happened. But accidents do happen, as we were discussing on Friday in our Emergency Editors' Panel. Hopefully, we'll see some big fixes as a result of this. Oftentimes that doesn't happen, but hopefully we will see some follow-up here, especially from governments, to make that happen.
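To make the "timeout box" idea concrete, here is a purely illustrative sketch of the kind of crash-loop circuit breaker the panelists wished the operating system had applied to a repeatedly faulting third-party driver. This is not how Windows or CrowdStrike actually behaves; the driver name, threshold and function are all invented for the example.

```python
# Illustrative only: a crash-loop circuit breaker that quarantines a driver
# blamed for several consecutive failed boots, instead of looping forever.
# Names and the threshold are invented; no real OS mechanism is depicted.
MAX_CONSECUTIVE_CRASHES = 3

def next_boot_action(crash_log: list[str], driver: str) -> str:
    """Decide whether to keep loading a driver that keeps crashing the system.

    crash_log holds the driver blamed for each recent failed boot, newest last.
    """
    consecutive = 0
    for blamed in reversed(crash_log):
        if blamed != driver:
            break
        consecutive += 1
    if consecutive >= MAX_CONSECUTIVE_CRASHES:
        return f"quarantine {driver} and boot without it"  # the 'timeout box'
    return f"load {driver} normally"

if __name__ == "__main__":
    log = ["example_edr.sys"] * 3  # hypothetical driver blamed for 3 straight crashes
    print(next_boot_action(log, "example_edr.sys"))
```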
Delaney: What about how we test and deploy software updates and security patches? Do you think there's going to be change on that front, more generally, in the industry?
Schwartz: There will definitely be change from CrowdStrike - or so it has pledged. There are a number of things the company can do, and a number of things it probably should already have been doing, in terms of testing and also not rolling everything out to everybody all at once. It's going to move to more of a canary-in-the-coal-mine approach, which is what companies like Microsoft, with its endpoint protection platform, had already been doing. There's a lot the company can and should be doing, and hopefully we'll see other companies learn these lessons. But we have seen this sort of thing before, even with antivirus software, where a bad update has a massive fallout. So, we are going to need some bigger-picture changes here.
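For illustration, here is a minimal sketch of the staged, canary-style rollout CrowdStrike has pledged to adopt: an update goes to progressively larger rings of hosts, and the rollout halts at the first ring that reports failures. The ring sizes, health check and function names are all invented; this is a sketch of the general technique, not of any vendor's actual pipeline.

```python
# Sketch of a staged ("canary") rollout: deploy to a small ring first, check
# health, then widen. Ring fractions and the health check are invented.
import random

ROLLOUT_RINGS = [0.01, 0.10, 0.50, 1.00]  # fraction of the fleet per stage

def host_is_healthy(host: str) -> bool:
    """Stand-in health check; a real one would verify the host stayed online."""
    return random.random() > 0.001  # assume a tiny failure rate for the demo

def staged_rollout(hosts: list[str]) -> bool:
    deployed = 0
    for ring in ROLLOUT_RINGS:
        target = int(len(hosts) * ring)
        batch = hosts[deployed:target]
        print(f"Deploying to {len(batch)} hosts ({ring:.0%} ring)")
        if not all(host_is_healthy(h) for h in batch):
            print("Failures detected - halting rollout before the next ring")
            return False
        deployed = target
    return True

if __name__ == "__main__":
    fleet = [f"host-{i}" for i in range(10_000)]
    print("Rollout completed" if staged_rollout(fleet) else "Rollout halted")
```

The design point is simply blast-radius control: a faulty update caught at the 1% ring disrupts a few dozen machines instead of 8.5 million.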
Delaney: Mat, excellent work. We'll be talking about this for a while longer though. Thank you. Marianne, unfortunately, the faulty update disrupted healthcare operations worldwide, forcing hospitals to cancel procedures. Here in the U.K., many NHS services faced operational issues, which meant we were back to the old days of manual processing. What can you tell us about the overall impact on the healthcare sector?
Marianne McGee: The CrowdStrike-Microsoft outage was the last thing the healthcare sector needed to be dealing with, considering that it was and remains a big target for all sorts of other disruptions, such as ransomware attacks. In the U.K., NHS facilities were affected, but in the U.S., hundreds of organizations were also reportedly affected, and the impact spanned a variety of patient services - lab collections, secure file transfers, transcription services, manufacturing, phone systems, electronic medical records, pharmacy orders, insurance billing and many other processes, according to the Health Information Sharing and Analysis Center last week. Some medical devices that rely on Windows were also impacted, among the various healthcare tech affected. Electronic health record vendor Epic was affected for a while as well. The company told me that the video client for its telehealth features was unavailable in the early hours of the outage, but it quickly restored access. Epic said that while the CrowdStrike update didn't directly affect the company's EHR software and services, the incident did cause technical issues that prevented some healthcare organizations that use Epic from accessing their systems. In Massachusetts alone, about 40 hospitals were affected, including some of Boston's largest medical centers, such as Mass General Hospital. Right now, it seems that most services have been restored at many of the affected healthcare facilities, which were forced to either cancel or reschedule patient appointments and procedures last Friday and resort to paper processes for charting and other work. Massachusetts' Department of Public Health told me last night that it had been helping affected healthcare organizations in the state with downtime procedures on Friday, but as of Tuesday evening, the hospitals and healthcare providers that were previously affected appeared to have no reported ongoing disruptions. Some of the affected entities have dealt with these kinds of IT disruptions before, whether from ransomware attacks on their own facilities or from the impact of a neighboring hospital being hit and disrupted - in which case hospitals in the region get stuck dealing with the overflow of patients who can't be treated at the hospitals that were hit. The bottom line: this outage is the latest reminder for the healthcare sector to expect the unexpected and to have business continuity and disaster recovery plans ready. That also means having staff - especially younger employees, who are so much more used to working with electronic health records and other IT tools - prepared to handle clinical processes manually, on paper, in case the EHRs go down. Unfortunately, thousands of entities in the U.S. have already suffered due to a vendor IT service outage this year, especially during the Change Healthcare attack, and this outage, although it wasn't a cyber incident, was the latest reminder of the unwelcome IT disruptions they'll have to contend with. It was another dress rehearsal for these entities.
Delaney: Interesting point about the younger generation there. But, you made the comparison with what the sector's been dealing with in terms of ransomware attacks and other cyber incidents. Are there any lessons learned from them that are being applied now?
McGee: Industry groups such as the Health-ISAC and the American Hospital Association are advising hospitals and other healthcare providers to use this latest incident as another opportunity to review and practice business and clinical continuity procedures, including identifying and rehearsing downtime procedures for all internal and third-party life-critical and mission-critical technology services and supply chain dependencies. This includes testing cyber incident response and emergency preparedness plans, as well as communication channels, in case a major source of communication such as phone systems or email goes down. Experts are also recommending that healthcare entities plan for technology disruptions and cyber incidents on a regional basis, and in the near term, on the heels of this incident, that hospitals and other healthcare organizations be on the alert for increased email phishing scams and other schemes that relate to this disruption. We know that so many breaches in healthcare have been linked to phishing and business email compromise, which needs to be watched for.
Delaney: Thanks, Marianne. It's always concerning when healthcare entities are disrupted. Michael, you've been exploring the potential business impact for CrowdStrike and other endpoint security vendors. What insights have you gained so far?
Michael Novinson: As Mat highlighted, people have generally been pretty pleased with CrowdStrike's response. Okta has had a handful of security incidents in recent years, and certainly with some of the earlier ones, there was a feeling that the company was not sufficiently forthcoming and took too long to divulge information. People do have higher expectations for security vendors. Of course, this was not a security incident per se - it was an IT incident - but there's still the expectation that if you are in this industry, you should be setting the standard. For a lot of people, what was then FireEye, now Mandiant, set the gold standard for disclosure with what turned out to be the SolarWinds incident back in November and December of 2020. If that's the gold standard, this is probably a very strong silver - CrowdStrike came out with details in just a few hours. There was a heartfelt message from the CSO and more factual updates from the CEO. George Kurtz, the CEO, was on The Today Show the morning the incident occurred. He looked very tired, but he was there. Mat talked about the preliminary description of what happened coming out within a week - five days later, a good report was provided. This is the type of information people were looking for to understand what specifically happened and what's going to be different. Getting at least that level of understanding within a week of a massive global incident, one that has had people working 24/7, is very reassuring for customers and prospects. Financially, CrowdStrike did file an 8-K with the U.S. Securities and Exchange Commission on Monday. The long and the short of it was essentially "to be determined" - the company said it continues to evaluate the impact of the event on its business and operations. So essentially, "we have no idea yet," which is understandable; they've been a little busy. But they're reporting earnings in early September, and certainly at that point people are going to want a lot more transparency around the financial impact. That's going to take multiple forms. First, in terms of the business itself: Are we seeing delays in deal closure? Is there additional due diligence by prospects in particular, or by customers facing renewals or considering upsells? Certainly, there are going to be additional questions around quality assurance and testing processes, so one would expect that, at least for the next quarter or two, it may take longer for deals to close. The other factor would be competitive win rates. There has been blowback against companies that have tried to capitalize too publicly. Notably, Cybereason set up a so-called emergency hotline with the phone number 1-833-NO-CROWD and was very harshly criticized - the hotline was essentially its salespeople - and it has since removed traces of the press release. But competitors are certainly emphasizing the risk of a monoculture and the risk of platformization, trying to make the case that you don't want too many eggs in one basket and that diversification provides assurance. So, do we see an uptick in win rates for competitors in the next quarter or two? That could certainly slow things down. Then, the other side of this is the cost to CrowdStrike - certainly the direct remediation cost, which I'm sure will be a high number, plus the supplementary costs of the lawsuits that are expected.
From an investor standpoint, that's not necessarily a huge deal. Investors are looking to turn a quick buck, and any lawsuit is going to take years to resolve - we saw with SolarWinds that the settlements came years later. So, in our world of quarterly capitalism, we can expect people looking to make quick, easy money by suing in the coming weeks, but there won't be resolution for years, and I'd assume the suits will get bundled into a class action. That's further down the pike. Obviously, there are questions around insurance and warranties at this point. Since it is not a cyber incident, all indications seem to be that cyber insurance will not cover this, meaning organizations are on the hook. As for CrowdStrike's warranty, my understanding from analysts I have spoken to is that it does not cover this type of IT incident - it covers a security incident, but not an outage caused by a faulty software update. Certainly for folks like Delta Air Lines and other organizations that were severely disrupted, there's going to be a lot of pressure on CrowdStrike to make things right, whatever form that takes, because obviously this was a massive business disruption. Depending on the size of the account, if you just tell people, "Well, you're on the hook for it; you signed on the dotted line," that can lead to a whole lot of customer attrition as contracts come up. People do point out that EDR is fairly sticky - it's not the easiest thing in the world to change. There are harder things to change, like secure web gateways, but it's still not an easy switch to make. So there's certainly not going to be a mass exodus, but on a macro level, there are going to be some questions. The first half of 2024 was in some ways the year of platformization - Palo Alto Networks coined the term - and when you look at stock prices, you can see how well CrowdStrike's and Palo Alto's stocks were doing compared to many of the other publicly traded security companies that were narrower in scope. This incident provides a decently strong counterweight: you don't want to standardize too much on a single vendor, and you do want different vendors doing different things to provide some redundancy. There has been some talk about whether well-resourced organizations should use multiple EDRs. Allie Mellen at Forrester was saying that's pretty hard to do on the same systems. If you're a manufacturer with different systems in different parts of the company - say, manufacturing systems and standard corporate systems - maybe you can use separate EDRs, but it's not a realistic option for most organizations. So, the long and the short of it: the impact will not be zero; it never is. The impact will probably be concentrated in the next three to six months, and heading into 2025, the impact on stock price, revenue growth and outlook will probably be minimal. Overall, the impact to date seems like it's going to be fairly limited, because all indications are that the response has been good. At some level, people judge you more for how you respond to a bad thing than for the bad thing itself.
Schwartz: Yeah, and I'll underline what Michael just said. A lot of the IT people I have been interacting with - or whose middle-of-the-night missives about all this I've been reading - have said, "I wish it hadn't happened, but I do need to give them credit for being transparent, and likewise for coming out, not even five days later, with a preliminary report outlining how they screwed up." One other thing I was thinking about as we were talking is the transparency aspect. It's interesting, because another EDR company, Kaspersky, when it got into hot water, tried to get out of it by creating so-called transparency centers, where you could come in and look at its code to assure yourself that nothing untoward was going to happen. That made me think about how no other AV company said, "Oh, that's a great idea. We should do that too." There was a resounding silence from the likes of Symantec and the other big players at that point. I would love to see that now. If a company can't get its act together, or if you're looking for assurance, you should be able to go in and audit its code. Right now, this software functions very much as a black box, and that doesn't cut it - especially because, if a nation-state could get in and cause some mischief, how would you know? That was one of the questions with Kaspersky. So, I do think some unanswered questions are coming home to roost here.
Delaney: Fantastic points. Yeah, I'd love to see those answered. But, a question for Michael: Do you think these events could benefit CrowdStrike in the long term? Could it leverage its response to the outage to eventually strengthen its brand and market position?
Novinson: That may be a bit far for me. CrowdStrike was worth $93 billion before this, so I don't think it needs the boost - the company is doing quite well. In particular, we have seen a bifurcation in the economic recovery between the haves and the have-nots, and public cyber companies such as CrowdStrike are very much on the have side. So, it would be hard for this to be a net positive when things were already so positive; there will more likely be a slight negative in the long run. Obviously, they've been extraordinarily critical of Microsoft. George Kurtz has been one of Microsoft's most vocal critics, pointing to its lax approach to security, its insecure architecture and the fact that it tries to productize and upsell everything. This was not a security incident, but it's a bit of glass-houses-and-stones territory: everybody has bad days, and maybe the reason Microsoft has a lot of bad days is that it is so broadly used - people go after it because an attack can have such a broad impact. Criticizing Microsoft has been a pretty consistent part of CrowdStrike's strategy, from the CEO on down the chain, and Microsoft is its biggest long-term competitor across endpoint and cloud. It makes it a little harder to engage in that type of criticism when you've had such a high-profile incident yourself. So, I imagine some of that might get toned down, which maybe isn't that positive for them, because they certainly felt it was helpful to draw that contrast in the past.
Delaney: Great stuff. It's amazing to see how many different angles exist for this incident. Thank you, Michael. Finally, just for fun: If you could travel back in time to the early days of the internet, what one piece of cybersecurity advice would you give to the pioneers?
Schwartz: I'm going to come right out of the gate here, Anna, and say, "Please, for the love of all that is human, don't bolt on security as an afterthought, because we're still dealing with the repercussions of so many protocols that didn't have security in mind."
Delaney: And here I thought you were going to say, "Ban the password '123'," or "Don't use your name."
Schwartz: It would definitely be in the top 10, man.
McGee: Mine is similar to Mat's, but of course more healthcare-focused. Around 2000, with the rise of the internet, security became important for the healthcare industry. In 2004, with George W. Bush as president, a goal was set for all Americans to have electronic health records by 2014. Then the Obama administration passed the HITECH Act in 2009, which paid hospitals and doctors to implement electronic health records. When you look back, you wish there had been more emphasis early on on resiliency and redundancy. Everyone takes it for granted that these patient records are available - but what do you do when they're not? So now we end up bolting on new policies and procedures. They didn't think it through, maybe because they never dreamed these systems could just go down and everybody would panic.
Delaney: So true. Michael?
Novinson: Mine is a bit more consumer-oriented: multifactor authentication from the onset. We all work in this industry, and we've all had family and friends ask us, "What should I be doing to be more secure? What's your biggest piece of advice?" That's almost always what I come back to. Now it's almost the default, but I've been in the industry long enough to remember when it wasn't. I'd say always opt in, even if they give you the option not to. Yes, it's an inconvenience, but it's absolutely worth paying - it's that additional layer. The conversation has obviously evolved now toward phishing-resistant MFA, but that second factor prevents the vast majority of impersonation attempts from succeeding. I still can't believe it when I see security questions. My credit union still asks me security questions - I can't believe we're still doing this in 2024, when we live our lives online. Maybe people didn't anticipate how much of our lives would move online, but the answers to security questions are all publicly available information. United Airlines is still using security questions when you set up a frequent flyer account, which I couldn't believe. I was setting up my account a few weeks ago, and I sidestepped the questions by using randomly generated strings as the answers. Why are we still doing security questions when that information is probably already living on the dark web?
Schwartz: I make the answers up, Michael, and I track what I've made up in my password manager.
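For readers who want to see what that "additional layer" looks like mechanically, here is a minimal sketch of a time-based one-time password (TOTP), the common second factor in authenticator apps, using only Python's standard library per RFC 6238. The point of contrast with static security questions: the code changes every 30 seconds, so a leaked value is useless minutes later. The secret below is a made-up demo value; real deployments share a per-user secret at enrollment, typically via a QR code.

```python
# Minimal RFC 6238 TOTP sketch, standard library only. A made-up demo secret
# is used; real systems provision a per-user secret at enrollment.
import base64, hmac, struct, time

def totp(secret_b32: str, step: int = 30, digits: int = 6) -> str:
    key = base64.b32decode(secret_b32)
    counter = int(time.time()) // step            # current 30-second window
    msg = struct.pack(">Q", counter)              # 8-byte big-endian counter
    digest = hmac.new(key, msg, "sha1").digest()
    offset = digest[-1] & 0x0F                    # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10**digits).zfill(digits)

if __name__ == "__main__":
    DEMO_SECRET = "JBSWY3DPEHPK3PXP"  # demo value only, not a real credential
    print("Current one-time code:", totp(DEMO_SECRET))
```

Unlike "What was your first pet's name?", the server and the user's device each derive the code independently from the shared secret and the clock, so there is nothing static for an attacker to scrape from public records.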
Delaney: I'm with you, Michael. I was going to say: Build in MFA from the start - think of all those incidents we could have avoided. So, there you go. Thank you so much for playing along, and thank you for the excellent discussion. Lots to unpack with this story, so thank you for your thorough analysis.
Schwartz: Thanks for having us on.
Novinson: Thank you, Anna.
McGee: Thanks, Anna.
Delaney: Thanks so much for watching. Until next time.