Many of you may recognize that I work for Broadcom and handle a lot of VMware security, compliance, and what I like to call “operational resilience” topics. However, these are my thoughts on this matter with CVE-2024-37085, done on my own time and site, and do not reflect Broadcom’s stance.
In fact, Broadcom’s stance, like VMware’s before it, is always: apply the patches or the workaround in the VMSA as your organization requires. This is good advice, if a bit terse. Lack of verbosity isn’t unique to Broadcom; no vendor can have a more nuanced stance because your environment is unique, and the context of your environment matters deeply. If a vendor tells you something like “oh you’re safe if you do X and Y” and there’s something about your environment that makes that untrue, you might suffer a breach. So it’s on you to figure out whether you’re safe or not, and if you need help with that, get a security consultant or vendor professional services to take a closer look.
Frankly, the only way to be truly safe in this modern age is to not have computers. No joke. All the rest is a spectrum of risk you’re willing to accept, or not.
Anyhow, the core issue in this kerfuffle is trust, and specifically the choice to trust an identity provider (IdP) without realizing the implications of that choice. In the Microsoft post the main character looks like ESXi, but it really isn’t. It’s actually Active Directory, and specifically the fact that the admins of any IdP are implicit admins of any system that trusts that IdP.
Anyone saying otherwise should be considered dangerously uninformed.
Background
On June 25, 2024, Broadcom released a VMware Security Advisory (VMSA-2024-0013). There are two issues there: what is titled an “authentication bypass” (CVE-2024-37085) and a separate denial-of-service condition around snapshots. More on that in the next section.
On July 29, 2024, Microsoft posted about their observations related to this advisory, as security researchers do. They noted three scenarios where they were worried that threat actors could move laterally from a compromised Active Directory to ESXi. Understandably, they didn’t dwell on the compromised Active Directory idea, and that’s a very important part of this. They also noted that they’d actually only observed one of those scenarios in action.
The Vulnerabilities
The denial-of-service condition is straightforward: someone with administrator/root privileges on a VM can do something that causes issues on the host. There are patches for ESXi 7 and 8. A denial-of-service is not good but better than a data breach, in my opinion.
The “authentication bypass” is more complicated. In my opinion, it’s a stretch to call this a vulnerability. It’s more of a weakness in a feature that has been present since vSphere 5, but it’s been assigned a CVE (CVE-2024-37085) and so it’s a vulnerability now. Whatever. The feature is the ability to join ESXi to a Microsoft Active Directory domain for centralized authentication and authorization. By default, when you do this, ESXi looks for a pre-configured group named “ESX Admins” and makes the members of that group into ESXi root-level admins.
The problem here is that anyone who can create a group in Active Directory could create the “ESX Admins” group themselves and be granted root-level access to any domain-joined ESXi host. Another problem is that someone could modify, rename, or replace the ESX Admins group, too. Unfortunately, in a typical Active Directory deployment lots of people can create groups, including people who are not Domain Admins, and attackers have figured this out.
More evidence that anything helpful to legitimate admins is also a tool for attackers. As the saying goes, “this is why we can’t have nice things.”
The “Fixes” for CVE-2024-37085
VMSA-2024-0013 discloses that in ESXi 8.0.3 the defaults for three advanced parameters changed, to tighten this up a bit. This is documented in KB 369707.
- Config.HostAgent.plugins.hostsvc.esxAdminsGroup controls what group ESXi looks for. The old default was “ESX Admins” and in ESXi 8.0.3 and newer it will be null, or “”.
- Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd controls whether members of that group get automatically added as root-level admins. The old default was “true” and in ESXi 8.0.3 and newer it will be “false.”
- Config.HostAgent.plugins.vimsvc.authValidateInterval controls how often ESXi checks to see if the group membership changed, in minutes. The old default was 1440, the new default in ESXi 8.0.3 and beyond is 90. You can set this lower, too, but remember that this might be a lot of new traffic to your AD DCs if you have many hosts.
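If you want to see where your hosts stand today, PowerCLI can read these back without changing anything. A minimal sketch, assuming VMware PowerCLI is installed and you’ve already run Connect-VIServer against your vCenter Server:

# Read the three settings on every connected host; no changes are made
$names = @(
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroup",
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd",
    "Config.HostAgent.plugins.vimsvc.authValidateInterval"
)
Get-VMHost | Get-AdvancedSetting -Name $names |
    Select-Object Entity, Name, Value |
    Sort-Object Entity, Name |
    Format-Table -AutoSize

Anything still showing “ESX Admins” and “true” is running the old behavior.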
Here’s the deal, though: none of these changes fix the fact that, if you’ve trusted Active Directory, that trust can be abused. As I noted above, when you trust an identity provider, you implicitly trust the admins of that provider. That means that when the IdP has rogue, unauthorized, or compromised admins, those rogue admins can grant themselves privileges to the systems that trust the AD by creating or modifying the group that ESXi looks for.
If I’m not attached to Active Directory am I safe?
From CVE-2024-37085, yes. Remember that your vCenter Server can be attached to AD, too.
Can I just change the name of the group ESXi looks for?
If you change the name of the group ESXi looks for you’re still trusting AD. Someone who has broken into AD will simply modify that group instead… they are admins in AD, they can do what they want. Besides, you’re only going to name it “VMware Admins” or something equally obvious to both them and you. Security through obscurity (hiding stuff) doesn’t really work, anyhow.
I have 10000 hosts, can I change these programmatically?
You can retrofit these parameters with PowerCLI, and it’s easy:
Get-VMHost | Get-AdvancedSetting -Name Config.HostAgent.plugins.hostsvc.esxAdminsGroup | Set-AdvancedSetting -Value "" -Confirm:$false
and so on with the other parameters. Test it on a single host first, please. If you want to do it cluster by cluster try:
Get-Cluster -Name ClusterName | Get-VMHost | Get-AdvancedSetting -Name Config.HostAgent.plugins.hostsvc.esxAdminsGroup | Set-AdvancedSetting -Value "" -Confirm:$false
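If you’d rather stage all three of the new defaults on one test host before touching anything else, here’s a rough sketch of how I’d do it. The host name is a placeholder, and the value types can be picky depending on the build, so compare against KB 369707 before running anything at scale:

# Apply the ESXi 8.0.3 default values to a single test host
$newDefaults = @{
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroup"        = ""      # stop looking for a group
    "Config.HostAgent.plugins.hostsvc.esxAdminsGroupAutoAdd" = $false  # stop auto-adding members as admins
    "Config.HostAgent.plugins.vimsvc.authValidateInterval"   = 90      # re-check group membership every 90 minutes
}
$testHost = Get-VMHost -Name "esx01.example.com"   # placeholder name, use your own
foreach ($name in $newDefaults.Keys) {
    $testHost | Get-AdvancedSetting -Name $name |
        Set-AdvancedSetting -Value $newDefaults[$name] -Confirm:$false
}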
You may want to tell your AD admins that the traffic is going to go up, as all your hosts query that group more. Security is always a tradeoff in some way.
What about older versions of ESXi?
A common question floating about on Reddit and Twitter is “why isn’t there a patch for ESXi 7?” I cannot speak directly to that but I will say that something I learned from an AIX engineer at IBM a long time ago, when I was a customer, is that it’s very tricky for a vendor to change defaults in the middle of a major version. There’s a lot of pressure from the partner ecosystem and customers because changes require testing and requalification. This is still broadly true throughout the industry.
The inertia of the whole ecosystem gets exponentially worse for an older version of a product, like ESXi 7. Organizations run vSphere 7 explicitly to gain stability, waiting for vSphere 8 to reach Update 3 and stop having features added or changed. Those organizations tend not to appreciate changes as much as others might. People like to quip that test is never like production, and there’s truth to that. A vendor testing something is also never the same as customers running it for real. And given that the real threat is that someone might come through AD to the hosts, which is a design decision and not a flaw in the product, it makes more sense to leave changes to older products as a customer decision.
If my hosts are joined in AD should I make these changes?
I wouldn’t change the name of the group ESXi is looking for; that’ll mean you can’t get in anymore. And if you rely on the auto-add, I might not shut that off until you have another method for handling it.
Either way, I’d try it on a single host to see what the implications are for logging into ESXi, as well as what happens when you add or remove people in that AD group, and how you’ll deal with it moving forward.
The next question you’ll have is: “But if I don’t change these am I still vulnerable?” to which I’d mention, again, that the problem is actually your trust in AD, not these settings, and that I think it’s definitely a stretch to call it a vulnerability.
Can’t I just create the group in Active Directory and lock it down?
Yes, you can, and it’s probably a good idea. That doesn’t change the fact that you’re still susceptible to compromise through that AD, because of that trust. No way around that without disconnecting. The admins of that AD instance, legitimate, rogue, or otherwise, are all implicit admins of all the downstream systems, and can grant themselves access, reset credentials, and so on.
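If you go this route, pre-creating the group is a few lines with the ActiveDirectory PowerShell module. A sketch only; the OU path here is made up, and how you restrict who can modify the group’s membership is a conversation to have with your AD admins:

# Create "ESX Admins" yourself, in a tightly controlled OU, so an attacker can't create it for you
Import-Module ActiveDirectory

New-ADGroup -Name "ESX Admins" `
    -SamAccountName "ESX Admins" `
    -GroupCategory Security `
    -GroupScope Global `
    -Path "OU=Infrastructure Admin Groups,DC=example,DC=com" `
    -Description "ESXi root-level admins. Membership changes require change control."

# Optional: protect the group object from accidental deletion
Get-ADGroup "ESX Admins" | Set-ADObject -ProtectedFromAccidentalDeletion $true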
How do I secure my Active Directory?
Depends on which part you’re talking about. If it’s the insides of Active Directory, that’s out of scope for me and for the folks at Broadcom.
If it’s about using vSphere 8 to protect Active Directory I do have a video talking about all that: https://www.youtube.com/watch?v=RqRK6Qg__vo from last fall. Everything there is still valid.
Can I use the “Red Forest” design to protect against this?
“Red Forest” is a colloquial name for the now-retired Microsoft Enhanced Security Admin Environment (ESAE) design pattern for Active Directory. It does not remove trust between systems, but instead adds more layers to it. From a vSphere perspective the observed attacker behavior would still be a concern in this model, as well as with their follow-on design patterns involving cloud authentication.
I’d also note, apropos of nothing and on a completely unrelated topic, that zero trust means removing trust, not adding more.
People on the Internet say this is a very serious vulnerability…
It’s a very serious issue, but not directly with ESXi as it’s portrayed. Lots of ransomware attacks involve compromised identity providers, because the attackers know it’s a fast way to gain access to a whole enterprise. But as for stuff on the Internet, remember two things: people commenting have a variety of motivations and levels of knowledge, and fear, uncertainty, and doubt gets clicks.
If anything, though, I’m happy that this issue is a platform to call attention to the important system design topic of trust between systems like these.
I think you’re wrong and bad.
Thank you for your feedback.
So what would you do?
I said earlier that these are my thoughts, and your organization is different and unique and needs to make its own decision about what is really a design problem in a lot of IT systems (not just vSphere). I don’t know what YOU should do, but if I were doing this stuff I’d follow the advice my friend and then-colleague Viviana Miranda and I presented at VMware Explore 2023 for better access control. Here’s my current version of it:
- Isolate from corporate/enterprise IdPs so that the trust you have is explicit and well-scoped. A way forward that many organizations are choosing is to create an “infrastructure only” IdP or cloud IdP tenant, that only has the same admins as the systems that trust it (so it’s identical scope, or even more restrictive), and is very locked down, often inside the management network perimeter. Remember that if you hook a new IdP up to user provisioning systems and monitoring and whatnot, you’re recreating some potential attack vectors. Keep it simple, keep it very protected.
- Isolate your management policies for administrator accounts. Newsworthy high-profile attacks in the autumn of 2023 were possible because the threat actor called the Help Desk and had them reset the cloud IdP administrator’s password. That should never be possible. If an infrastructure admin in your organization gets locked out or needs their password reset, you need to address that in-person or through a process with a much higher level of verification. In fact, if someone with admin privileges calls the Help Desk for authentication issues, it should probably trigger an immediate incident response. Also remember that deepfakes exist, impersonating people like CEOs. If you cannot reach out and physically poke the admin needing a password reset, you should be very skeptical about who you’re talking to.
- If you can, do authorization inside your systems, not inside your IdP (don’t rely on IdP groups, for example). This makes auditing access easier, and helps prevent an IdP admin (rogue or legitimate) from quietly adding and removing themselves to gain access. It does not protect against IdP admins resetting passwords of admins, or removing controls like MFA and such, but those are more noticeable actions. The scale of some implementations might make this impractical. Not a hill you should die on, but nice if you can achieve it.
- Use Identity Federation to introduce MFA. MFA is an absolute must-have for all day-to-day access to all computers. Local accounts in vCenter Server or ESXi cannot do MFA, so those should be treated as break-glass. Use federated identity through vCenter for all access.
- Restrict access to vCenter Server to only those who absolutely need it. Do this in your permission model, as well as with the VCSA and perimeter firewalls. Your CEO does not need access to vCenter “just in case.” Nor does your Infosec team — least privilege applies to them, too.
- Severely restrict direct access to ESXi. Drive day-to-day management through the vCenter & RBAC model. If you need constant access to ESXi then there’s something amiss and it should be addressed. Remember, though, that you may still need a way to get to the host client at times, to take a snapshot of the VCSA before an upgrade, or to turn the VCSA back on after a power outage. Find a balance here.
- If you use Privileged Access Workstations (PAWs), examine their dependencies. Have you tied them into your corporate auth systems? Was that a good idea? Also, make sure that the workstation can get to host clients and whatnot in times of trouble.
- Severely restrict access to other infrastructure systems’ management interfaces. None of this stuff is unique to vSphere, your firewalls and storage arrays and network equipment and monitoring systems and so on are all susceptible to this same stuff. And God help you if you connect your backup & restore systems to any other enterprise systems in a read/write manner. That’s your last line of defense.
- Reduce permissions for service accounts to the minimum needed. This is tedious work but important, and PowerCLI can help (there’s a sketch after this list). One way to reduce this work is to just not connect systems together in the first place. Why does your storage system need access to vSphere, anyhow?
- Enable IdP advanced features like conditional access, geographic restrictions, phishing-resistant MFA, number matching, and device hygiene checks. This is another reason to run your own IdP or IdP tenant, as you can turn up the intensity of these things for infrastructure admins without terrorizing all your corporate users by making them re-auth every minute.
- Ensure access logs are being retained in your IdP, for as long as possible. Some of these breaches go on for months. It’s also important that the logs cannot be deleted by the IdP admins, and are reviewed by, or at least accessible to, people who are not the IdP admins (separation of duties). You do not want rogue admins able to cover their own tracks.
- Ensure the management interfaces of your IdP are well-protected, especially with cloud IdPs, where there’s always a public “front door” to attack. This is not me saying cloud IdPs are bad, just that there are different things to think about with them. Find their best practices and turn it all on.
- Keep it simple. The best access control policies are simple and easy to audit by looking at them. Same for firewall rule sets. ESXi hosts all need to talk to each other, so let them. vCenter needs to talk to ESXi, and vice-versa. Let them. Admins need to talk HTTPS to vCenter Server, and sometimes HTTPS to ESXi. Let them. You need DNS and NTP, but beyond that, deny all both in and out. Don’t get so bogged down in the specifics of which ports and protocols you need that you don’t have time left to do better security elsewhere, or that you set a trap for yourself if something changes in the future.
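And on the service-account point above: PowerCLI makes carving out a minimal role fairly painless. This is a sketch with made-up names — the “svc-backup” account, the “Production VMs” folder, and the privilege IDs are all illustrative, so substitute whatever your backup (or other) product actually documents:

# Build a narrow role instead of granting the service account Administrator
$privilegeIds = @(
    "VirtualMachine.State.CreateSnapshot",
    "VirtualMachine.State.RemoveSnapshot",
    "VirtualMachine.Provisioning.DiskRandomRead"
)
$role = New-VIRole -Name "Backup Service (minimal)" -Privilege (Get-VIPrivilege -Id $privilegeIds)

# Grant the role only where it's needed, not at the vCenter Server root
New-VIPermission -Entity (Get-Folder -Name "Production VMs") `
    -Principal "VSPHERE.LOCAL\svc-backup" `
    -Role $role `
    -Propagate:$true

Auditing it later is then as simple as running Get-VIPermission against that folder and seeing exactly what the account can touch.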
That’s what I’d do, based on my decades of experience doing this stuff both as an IT infrastructure admin and architect, augmented by my time working for a vendor and hearing from customers who are trying to level their security up. You do you. Don’t get stuck on one thing, and don’t get stuck in the weeds.
Beyond that, apply the hardening guidance, subscribe to the VMSA mailing list if you haven’t, and take what you see on the Internet with the proverbial grain of salt. As everything is moving around with the VMware acquisition I’m building a list of links into resources, if for no other reason than it helps me find things. The VMSA mailing list and links are at the top:
https://bob.plankers.com/vmware-security/
Good luck and stay safe.