What the CrowdStrike outage really did was demonstrate bad IT practices, or IT shops that put entirely too much faith in their vendors (I could name a couple....)
The best practice for deploying an update is to have a computer lab that is isolated from your user/production network. Push the patch there, see what happens. Have a mix of machines in that environment. And with the proliferation of using virtual machines, it's not hard to do. You can have a mix of servers and workstations and different operating systems. THEN if everything works well there, push it out to a SUBSET of your production network.
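For the curious, here is a minimal sketch of that staged approach in Python. The group names, `deploy_to()`, and `health_check()` are all hypothetical stand-ins for whatever your patch-management tooling actually provides; this is the shape of the process, not any vendor's real API.

```python
# Hypothetical staged-rollout sketch: isolated lab first, then a small canary
# slice of production, then everyone else. deploy_to() and health_check() are
# stand-ins for your real patch-management tooling (SCCM, Ansible, a vendor
# console, etc.).
import time

RING_ORDER = [
    ("lab",        ["lab-win11", "lab-win10", "lab-server2022", "lab-linux-vm"]),
    ("canary",     ["prod-ws-001", "prod-ws-002", "prod-srv-01"]),  # small production subset
    ("production", ["prod-everything-else"]),
]

SOAK_SECONDS = 24 * 60 * 60  # placeholder soak period: let each ring run a day

def deploy_to(hosts, patch_id):
    """Stand-in for the real push."""
    print(f"pushing {patch_id} to {hosts}")

def health_check(hosts):
    """Stand-in: return False if any host blue-screens, fails to boot, etc."""
    return True

def staged_rollout(patch_id):
    for ring_name, hosts in RING_ORDER:
        deploy_to(hosts, patch_id)
        time.sleep(SOAK_SECONDS)  # watch for crashes before widening the blast radius
        if not health_check(hosts):
            print(f"halting rollout: {ring_name} ring failed its health check")
            return False
    return True
```

The whole point of the ring order is that a bad update stops at the lab or the canary subset instead of taking out the entire fleet.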
Clearly that isn't what a lot of people did. They trusted CrowdStrike and just blasted it out. After all, it wasn't a code update, it was just a content update, like a virus-definition update. What could possibly go wrong?
The problem was that the update crashed the CrowdStrike driver, resulting in a blue screen of death that came right back on every reboot. The fix was to delete one little bitty file, but getting at said little bitty file took manual intervention by IT boffins, particularly if the machine had an encrypted hard drive.
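Just to show how small that "little bitty file" fix was: per CrowdStrike's published workaround at the time, you booted into Safe Mode or the recovery environment and deleted the bad channel file(s) from the driver directory. The sketch below assumes the standard Windows install path from that guidance; in practice this was done by hand, not by running a script, and on a BitLocker-encrypted drive you needed the recovery key before the volume was even readable.

```python
# Rough illustration of the published manual workaround: from Safe Mode or
# the recovery environment, delete the bad channel file(s). Path and filename
# pattern follow CrowdStrike's public remediation guidance; on encrypted
# machines the recovery key is needed before this directory is accessible.
from pathlib import Path

CS_DRIVER_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")

def delete_bad_channel_files():
    for f in CS_DRIVER_DIR.glob("C-00000291*.sys"):
        print(f"deleting {f}")
        f.unlink()

if __name__ == "__main__":
    delete_bad_channel_files()
```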
Everything at the university seemed fine yesterday when I got in to work; no emails from main campus about subsystems being down, so that was nice. And it only affected Windows machines. Linux and Mac were safe.
To compound matters, Microsoft had some problems with their Azure cloud service, unrelated to the ClownStrike problem.
https://krebsonsecurity.com/2024/07/global-microsoft-meltdown-tied-to-bad-crowstrike-update/
no subject
Date: 2024-07-21 01:35 am (UTC)
So, someone made a decision to push the update without enough testing? Quelle fucking surprise.
no subject
Date: 2024-07-21 04:32 am (UTC)
Oh, how interesting! Personally, I never cared much for McAfee or Symantec. For PCs I use ZoneAlarm for firewall and antivirus/malware.
no subject
Date: 2024-07-21 04:18 am (UTC)
Based on these pages (https://www.crowdstrike.com/blog/statement-on-falcon-content-update-for-windows-hosts/ and https://www.crowdstrike.com/blog/falcon-update-for-windows-hosts-technical-details/), CrowdStrike pushed the bad update out at 12:09am Eastern U.S. time (0409 UTC) and pushed the fixed version out at 1:27am (0527 UTC), and the problem only affected computers that were online during that roughly 78-minute window.
I did not encounter a problem on Friday, though several of my colleagues (including ones in India) did. I may have just missed it: per my notes, I finished my work at 12:10am and must have put my laptop to sleep right around then.
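Checking the arithmetic on the two timestamps cited above (a quick sketch, using the UTC times from the linked CrowdStrike pages):

```python
# Quick arithmetic on the exposure window described above (all times UTC).
from datetime import datetime

bad_update_pushed = datetime(2024, 7, 19, 4, 9)     # 0409 UTC, bad channel file goes out
fixed_update_pushed = datetime(2024, 7, 19, 5, 27)  # 0527 UTC, corrected version goes out

window = fixed_update_pushed - bad_update_pushed
print(window)  # prints 1:18:00, i.e. about 78 minutes
```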
no subject
Date: 2024-07-21 04:34 am (UTC)
Interesting! Amazing how timing can be the deciding factor. I prevented our network at the police department from getting slammed by the ILOVEYOU worm back in 2000 by reading a Slashdot post on it at 7am as part of my 'get to work' routine; we pulled our firewall connection to the City to isolate ourselves and had pretty much zero infection from it as a result. Sometimes you get lucky, sometimes you get slammed.
no subject
Date: 2024-07-21 08:48 am (UTC)
Hugs, Jon
no subject
Date: 2024-07-21 04:00 pm (UTC)
According to one timeline in a comment on the previous post regarding the incident, a fixed update was released about an hour after the buggy release. If your computer was on when the bugged release was distributed, it was bombed and would have to be fixed. If your PC was off until after the fixed release got out, it would be okay when you turned it on. The problem is, servers are usually on and connected 24/7/365, so unless someone had a maintenance window and their network was down, or they quarantined updates before releasing them, they were hosed.
Me, my work PC is normally on 24/7: I only turn it off when I'm going to be out of town or when I'm told to by IT. My home PC is off when I'm not using it, though it turns itself on for backups at 7am.
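The logic described in that comment boils down to a simple interval-overlap test. A sketch, with made-up uptime intervals purely for illustration:

```python
# Sketch of the "was the machine online during the bad-update window?" logic
# described above. The uptime intervals below are invented for illustration.
from datetime import datetime

BAD_WINDOW = (datetime(2024, 7, 19, 4, 9),   # bad channel file published (UTC)
              datetime(2024, 7, 19, 5, 27))  # fixed version published (UTC)

def was_exposed(online_from, online_until):
    """True if the machine was online (and pulling updates) at any point
    while the bad content was being served."""
    return online_from < BAD_WINDOW[1] and online_until > BAD_WINDOW[0]

# A server that never sleeps: exposed.
print(was_exposed(datetime(2024, 7, 18, 0, 0), datetime(2024, 7, 20, 0, 0)))  # True

# A laptop put to sleep at 0410 UTC: barely exposed.
print(was_exposed(datetime(2024, 7, 19, 3, 0), datetime(2024, 7, 19, 4, 10)))  # True

# A desktop that stayed off until 0700 UTC: fine.
print(was_exposed(datetime(2024, 7, 19, 7, 0), datetime(2024, 7, 19, 9, 0)))   # False
```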
no subject
Date: 2024-07-21 04:08 pm (UTC)
no subject
Date: 2024-07-21 04:30 pm (UTC)
I hibernate my PC, which is a deeper sleep.
no subject
Date: 2024-07-21 09:53 pm (UTC)
no subject
Date: 2024-07-21 05:36 pm (UTC)
Also, of course, by the time some youngster starts an IT career, they've been personally subject to forced updates for at least a decade, on their cell phone and personal computer. Unlike old fogies like me, they don't regard those updates as an imposed game of Russian roulette; they regard them as not just normal, but in some sense the only way to distribute software, or at least the only modern way.
Remember, computers are not supposed to be reliable. Of course they crash regularly, etc. etc. ad nauseam. I'm glad to be out of that industry.
Edited to add: I see above that the mechanism used for the update gave customers no control. Surprise!
no subject
Date: 2024-07-21 06:00 pm (UTC)
From the beginning I've found it interesting that CrowdStrike's own testing didn't see the crash. To me, this says they either rushed this out the door or there's a bad flaw in their testing process. Me, none of my systems update automatically: I install updates and reboot when I want to! And like you, I'm glad to be out of that industry; it's a lot more pleasant spending my remaining years as a librarian. And normally licensing is on production systems, not test or development systems. But it varies.
no subject
Date: 2024-07-22 04:40 am (UTC)