Build a VM Template for Rocky Linux 9

While anyone can create a VM template, only a few know tips and tricks for making a reliable, small, and secure VM template on VMware vSphere and VMware Cloud Foundation using Rocky Linux 9. Those tips & tricks are below!

Prerequisites

To succeed while following these instructions you will need:

The installation media for your guest OS as an ISO, mountable on the virtual machine either from a Content Library, a datastore, or through the Remote Console application.
Enough capacity in your vSphere/VCF environment to run the template and a test clone.
Network connectivity for your template and a separate test clone or two. I recommend assigning your templates permanent static IP addresses on a separate, templates-only network segment, and including real FQDNs for them. One IP address per OS variant should be enough (one IP for Rocky Linux 8, one for Rocky Linux 9, one for Windows Server 2016, one for Windows 11, etc.).
Firewall rules that are enough to develop, test, and support these templates long-term. You likely don’t need ingress/incoming rules beyond SSH, because you won’t be monitoring these systems or running services from them, but you will need DNS, NTP, access to updates, and access to configuration management systems. You will also need the ability to SSH into the template as root, because doing all this via the console is painful without cut & paste.
Knowledge of whether your storage supports UNMAP/TRIM commands. VMware vSAN 8 ESA does by default, and you can enable it on OSA, too. Check to see where you stand on it. If your storage does not support them, ask what happens if those commands are issued (the array will probably ignore them, but it’s worth asking), and when your storage vendor will add support. It’s a very effective cost-savings tool.

Create a New VM

First, we need a fresh VM. Make the new VM the latest virtual hardware version you can. See “Upgrade VM Hardware Versions” for more discussion on this.

Choose the right operating system. In this case, Rocky Linux is in the list. Alternately, you could choose Red Hat Enterprise Linux 9 for EL-family Linux distributions.

I create all my template VMs with 2 CPUs, 4 GB of RAM, and a 50 GB disk. Once we are done partitioning the storage there will be 36 GB free, which is often enough space for my needs. If I need more I just add a second virtual disk and add it to the LVM volume group.

Ensure the SCSI controller is VMware Paravirtual and the NIC is VMXNET3 and on the correct port group. Linux has had support for pvscsi and vmxnet3 in the kernel for well over a decade, so there is no reason to use other virtual peripheral types.

In VM Options, ensure that the Firmware is set to EFI, and Secure Boot is enabled. You may want to set a boot delay as well. I set mine to 10 seconds (10000 milliseconds) so that my remote console connects in time.

Don’t be concerned about applying the Security Configuration Guide controls at this point; we’ll do that at the end.

Check the Installation Media

Get the Rocky Linux installation ISO from their site. I used the Minimal ISO, figuring I’d get the rest from their online repositories. If that’s not something you can do then make accommodations for yourself.

When you boot you have the option of testing the media. Better yet, you should check to make sure that what you downloaded is the same as what they’re serving. To do that, Rocky Linux supplies a CHECKSUM file with the SHA256 hash of the ISO right on the download page:

# Rocky-x86_64-minimal.iso: 1829634048 bytes

SHA256 (Rocky-x86_64-minimal.iso) = ee3ac97fdffab58652421941599902012179c37535aece76824673105169c4a2

In PowerShell you can compute the SHA256 hash of a file with this command:

PS C:\Users\plankers\Downloads> Get-FileHash -Algorithm SHA256 .\Rocky-9.4-x86_64-minimal.iso

Algorithm       Hash                                                                   Path
---------       ----                                                                   ----
SHA256          EE3AC97FDFFAB58652421941599902012179C37535AECE76824673105169C4A2

These two hashes match so I know I have a good copy of the installation media.

Use the checksum for the file YOU downloaded, not this one! This is just an example.
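If you downloaded the ISO on a Linux machine instead, the same check can be sketched in shell. This is a minimal example that assumes the coreutils sha256sum command is available; verify_sha256 is a helper name I made up for illustration. It lowercases the expected value so you can paste the hash in either case.

```shell
#!/bin/sh
# Compare a file's SHA-256 digest against the value published in CHECKSUM.
# Usage: verify_sha256 <file> <expected-hex-digest>
# (verify_sha256 is an illustrative helper, not a standard command.)
verify_sha256() {
    file=$1
    # Accept upper- or lowercase hex, as PowerShell prints uppercase.
    expected=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

Call it as verify_sha256 followed by the ISO file name and the hash from the CHECKSUM file for the file you downloaded. A non-zero exit status means the download should not be trusted.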

Install Rocky Linux on the Template VM

Mount the ISO and boot. If you checked the SHA256 hash you probably don’t need to test the media. Select your language and keyboard.

Configure the Network

I want to be able to configure network time and other things from this installation process, so I configure the network first. I give all my templates a known IP address of their own. Because I am not using IPv6 I disable it. Toggle the switch to turn the connection on.

Set the Time & Date

Configure the time zone for your preferences. If you want to use NTP and you configured the network you can toggle this on.

Software Selection

In general, I recommend people do a minimal installation on their templates, then install the packages they need later. Only add things to a template that are absolutely necessary for all the deployed VMs to connect to however you configure your systems. In my case I will almost immediately attach the system to Puppet, which will take care of everything else.

In this case I have the minimal ISO, so the only option for me is “Minimal Install.” If you have the full DVD ISO you will have other options. Choose what you want but keep it small.

Installation Destination

I am not a fan of the cloud-esque “one big root filesystem” approach. For starters, when it fills up someday, and it will, you won’t be able to get into the machine. Another reason is that common security and compliance guidelines require us to isolate certain filesystems, like /var/log/audit. Might as well do that in the template and save some time, especially since mount points are hard to change later.

Select “Custom” under Storage Configuration, then click Done. You will be taken to the Manual Partitioning dialog.

Ensure that new mount points will use the LVM partitioning scheme. Do not check “Encrypt my data” without further consideration. It is often better to use VM Encryption or vSAN Data-at-Rest encryption to protect virtual machine storage.

Click “Click here to create them automatically” to give you a starting point.

Configure Partitions

Once you have some sample partitions, you can edit their parameters. One of the beauties of using LVM partitions is that you can grow them on the fly, dynamically, so you can create smaller template filesystems and allocate space where you need it for workloads later.
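To make "grow them on the fly" concrete, here is a hedged sketch of what that looks like on a deployed VM later. The volume group name (Volume00) and logical volume name (var) are examples only, so check yours with vgs and lvs first. The LVEXTEND override is my own addition so you can preview the command with echo before running it for real.

```shell
#!/bin/sh
# Grow a logical volume and its filesystem in one step.
# Usage: grow_lv <volume-group> <logical-volume> <size-to-add, e.g. 2G>
# LVEXTEND can be overridden (for example with "echo") to preview the command.
grow_lv() {
    vg=$1
    lv=$2
    add=$3
    # -r (--resizefs) resizes the filesystem along with the logical volume.
    ${LVEXTEND:-lvextend} -r -L "+$add" "/dev/$vg/$lv"
}
```

For example, LVEXTEND=echo grow_lv Volume00 var 2G prints the lvextend arguments without touching anything, and dropping the override performs the resize while the filesystem stays mounted.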

Edit and add filesystems so that you have something like what I have:

/ = 4 GiB
/boot/efi = Leave the default
/boot = Leave the default
/var = Add, 2 GiB
/opt = Add, 1 GiB
/tmp = Add, 1 GiB
/home = Add, 1 GiB
/var/log = Add, 1 GiB
/var/log/audit = Add, 500 MiB
/var/tmp = Add, 500 MiB

I also reduce swap to 1 GiB. Some swap is good for an OS, but if you’re doing a lot of paging to disk you should add memory.

One other thing to consider before you leave this screen: changing the name of the volume group on your Linux VM template. Each Linux distribution seems to rename the default volume group to its own thing, like “rl” for Rocky Linux. My installations from 20+ years ago used the name “Volume00” and I have lots of scripts that assume that’s the name. So I modify the volume group and change its name. You may want to, too.

When you’re done, you should show 36+ GiB of available space in the volume group. That’s good, it’ll allow you to grow and add filesystems for most workloads without having to resize the VM, and with thin provisioning and UNMAP support that space won’t be wasted.
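If you want to sanity-check that number, the arithmetic is easy to sketch. The sizes below come from the list above plus the 1 GiB of swap; the 1024 MiB for /boot and 600 MiB for /boot/efi are my assumptions about the installer defaults, so substitute whatever your installer shows. LVM metadata overhead is ignored, so treat the result as approximate.

```shell
#!/bin/sh
# Subtract each filesystem size (in MiB) from the disk size and print the rest.
# Usage: vg_free_mib <disk-size-MiB> <size-MiB>...
vg_free_mib() {
    remaining=$1
    shift
    for size in "$@"; do
        remaining=$((remaining - size))
    done
    echo "$remaining"
}

# 50 GiB disk, then: / /var /opt /tmp /home /var/log /var/log/audit /var/tmp,
# swap, and the ASSUMED defaults for /boot (1024) and /boot/efi (600).
free_mib=$(vg_free_mib $((50 * 1024)) 4096 2048 1024 1024 1024 1024 500 500 1024 1024 600)
echo "$((free_mib / 1024)) GiB free (${free_mib} MiB)"
# prints: 36 GiB free (37312 MiB)
```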

Clicking Done will get you a list of all the changes. Check them, then accept.

Kdump

Are you planning for your VMs to crash a lot? I’m not. Do you have a specific need to gather the crash dump information when it does? No, I don’t, either. Can we turn this on later if we need it? Yes.

Shut off kdump so it doesn’t consume RAM we can use in more productive ways.

Root Password

Yes, leaving root enabled long-term is a bad idea. However, someone is going to have to configure this Linux VM template after it’s deployed, and it’s a lot easier to do it via SSH as root. If you don’t use root you’ll have to create another user to do it, and now you have two privileged accounts to protect. Plus, you don’t want to create an actual user in the template, because that doesn’t sit well with least privilege. Create them later after deployment.

My advice here is to set a root password you know, and then use your post-deployment processes to change it to something else after you add local users that can use sudo to elevate their privileges.

Set a root password you can type or paste (because you’ll need to, a few times) and check the “Allow root SSH login with password” box.

User Creation

Don’t. Just use root for right now, as discussed above.

Security Profile

Similar to the root password discussion, I would not do this to a template, for a couple reasons. One, not all workloads have the same requirements, so which one do you pick? Two, these sorts of things break applications, and having the ability to deploy a clean VM to see if it works without the security settings is helpful. Three, once you set it you can’t undo it. Make it part of your deployment process instead.

Begin Install

Click Begin Install and go have lunch. When you get back, click Reboot and let the VM come back up.

Post-Install Configuration

At this point you can SSH into the VM as root. You can also log into the console as root, but it’s easier to paste commands into an SSH terminal.

Remove Unneeded Software

A minimal installation is pretty sparse, but you don’t need these hardware firmware packages for a Linux VM template.

yum erase -y linux-firmware-whence linux-firmware microcode_ctl

Add Needed Software

We are going to need a few things, and unfortunately installing Perl will also install hundreds of CPAN modules. We need them for the VMware Tools customization, though.

yum install -y nano bind-utils perl open-vm-tools dbus-tools bc util-linux

Nano is a great editor and it’s simple. Editor holy wars are stupid. Judge yourself and others on the things you get done, not on the tools you used. Bind-utils gets you nslookup. Perl is needed for open-vm-tools, and dbus-tools contains the dbus-uuidgen tool that recreates /etc/machine-id during the VMware Tools customization process. You’ll need bc for my sample scripts later in this document, and util-linux is probably already there but we’ll need it for fstrim.

These are not set up as dependencies in open-vm-tools because some places want to use cloud-init instead. However, what cloud-init needs is complicated in a different way, so we’ll stick with what works. If you’re a Kubernetes person you’ll likely have a different opinion.

Enable LVM Discards

We want to enable LVM to send TRIM and UNMAP discard information down to vSAN, to help keep our VMs space-efficient over time. We want to do this in the template because, as part of the template prep, we are also going to zero out the space so the template is as small as possible.

nano /etc/lvm/lvm.conf

Ctrl-W to search for “issue_discards” and change the line from:

# issue_discards = 0

to:

issue_discards = 1

Make sure you remove the octothorpe (hash) for the comment.

Ctrl-X to save, Y, hit enter to save it to the same file.
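If you would rather script that edit than do it by hand in nano, something like the following should work. It is a sketch, not a tested part of my template scripts: enable_issue_discards is a made-up helper name, and it assumes the setting appears on its own line, commented or not, the way it does in the stock file. Look at the result before trusting it.

```shell
#!/bin/sh
# Set issue_discards = 1 in an lvm.conf-style file, uncommenting it if needed.
# Usage: enable_issue_discards [path]   (defaults to /etc/lvm/lvm.conf)
enable_issue_discards() {
    conf=${1:-/etc/lvm/lvm.conf}
    tmp=$(mktemp) || return 1
    # Match the setting line with or without a leading "#", keep indentation.
    sed -E 's/^([[:space:]]*)#?[[:space:]]*issue_discards[[:space:]]*=[[:space:]]*[01].*$/\1issue_discards = 1/' \
        "$conf" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through cat so the original file keeps its ownership and mode.
    cat "$tmp" > "$conf"
    rm -f "$tmp"
    # Show the resulting setting line so the change can be eyeballed.
    grep -E '^[[:space:]]*issue_discards[[:space:]]*=' "$conf"
}
```

Running enable_issue_discards with no arguments edits /etc/lvm/lvm.conf and prints the resulting line. Pass a path to try it on a copy first.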

Enable the periodic fstrim operation with:

systemctl enable fstrim.timer

This will cause the VM to TRIM its filesystems once a week, on a random timer. This is good, it will keep your VM small on disk.
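It is worth confirming that the virtual disk actually advertises discard support inside the guest, because fstrim has nothing to do otherwise. Here is a small sketch that filters lsblk output; discard_capable is a hypothetical helper name, and it assumes a DISC-MAX of 0 (or 0B) means the device does not accept discards. Whether the storage underneath reclaims the space is a separate question for your storage vendor.

```shell
#!/bin/sh
# Read "NAME DISC-GRAN DISC-MAX" rows (lsblk output) on stdin and print the
# names of devices that advertise discard support (non-zero DISC-MAX).
discard_capable() {
    awk 'NF >= 3 && $3 != "0" && $3 != "0B" { print $1 }'
}
```

Feed it with lsblk -dn -o NAME,DISC-GRAN,DISC-MAX | discard_capable. If your template's disk (often sda) is missing from the output, check the datastore and virtual disk type before expecting TRIM to shrink anything.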

Update the Linux VM Template

yum -y update

Shut the VM down

shutdown -h now

Take a VM snapshot so you can go back to this state if you need to.

Remove Unnecessary Virtual Hardware

Edit the VM and remove the following items if present. You most likely do not need them for ongoing operations, and they represent additional attack surface, as occasionally there will be a vulnerability disclosed against these types of devices. If you don’t have them on your VMs then you are protected.

CD/DVD Drive 1
SATA Controller 0 (have to remove the CD/DVD drive first, though)
USB Controller

Save the changes. If you get errors about concurrent operations do the CD/DVD first, save, then do the SATA/AHCI controller, and save again.

Edit the VM again. This time, in VM Options:

VMware Remote Console Options, Maximum number of sessions: 1
Encryption, Encrypted vMotion: Required
Encryption, Encrypted FT: Required
Boot Options, Boot Delay: 10000 milliseconds (helps when trying to attach to the console)
Power Management, Standby Response: Suspend the virtual machine

This guidance comes from the VMware vSphere 8 Security Configuration & Hardening Guide, which I maintain. Save again.

Add Advanced Parameters for Security

By default the virtual machine has its advanced parameters configured securely; however, I want to go a step further and prevent network booting. To do this I edit the VM and add:

bios.bootDeviceClasses with the value "allow:hd"

bios.bootDeviceClasses has the format “allow:XXXX” or “deny:XXXX” where XXXX is a comma-delimited list of boot classes. Boot classes are net (network PXE boot), usb (from attached USB devices), pcmcia (PCMCIA expansion cards, not used nowadays), cd (from attached virtual CD/DVD devices), hd (from attached virtual hard disks), fd (from attached virtual floppy devices), reserved (from unknown devices), efishell (into the EFI shell), all, or any (same as all).

Use of allow or deny also implicitly states the opposite. For example, deny:all disallows all boot, deny:net disallows network boot but allows all others, allow:hd allows only hd boot and denies all others, and allow:hd,cd allows hd then cd device boot and denies all others.
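If the implicit-opposite rule is hard to keep straight, this little shell function models it as described above. To be clear, it is only an illustration of the rules in this section, not how ESXi parses the setting, and boot_class_allowed is a name I invented for the example.

```shell
#!/bin/sh
# Decide whether a boot class is permitted by a bios.bootDeviceClasses value.
# Usage: boot_class_allowed <policy, e.g. allow:hd,cd> <class, e.g. net>
# Returns 0 if the class may boot, 1 if not, 2 on an unrecognized policy.
boot_class_allowed() {
    policy=$1
    class=$2
    mode=${policy%%:*}   # text before the colon: allow or deny
    list=${policy#*:}    # text after the colon: comma-delimited classes
    listed=1             # 0 once the class (or all/any) is found in the list
    for item in $(printf '%s' "$list" | tr ',' ' '); do
        case $item in
            "$class"|all|any) listed=0 ;;
        esac
    done
    case $mode in
        allow) return "$listed" ;;        # permitted only if listed
        deny)  [ "$listed" -ne 0 ] ;;     # permitted only if NOT listed
        *)     echo "unknown policy: $policy" >&2; return 2 ;;
    esac
}
```

For example, boot_class_allowed allow:hd net fails, which is exactly the "no PXE boot" behavior we want from the template, while boot_class_allowed deny:net cd succeeds.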

Save the changes. Take a snapshot. If something happens and the VM does not deploy correctly it’s easier to revert to a working state and try again than having to troubleshoot.

Test Cloning & Customization

From here you should test cloning and guest OS customization. The Event Log under “Monitor -> Tasks and Events -> Events” in the vSphere Client will show you if the customization succeeded, and if not, where the error log is (often at /var/log/vmware-imc/toolsDeployPkg.log). Common problems stem from forgetting to install the bind-utils, perl, open-vm-tools, or dbus-tools packages as directed above!
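Since missing packages are the usual culprit, a quick check before you clone can save a failed deployment. This is a sketch under a couple of assumptions: missing_packages is a helper name I made up, and it compares exact package names, so it expects the perl metapackage rather than just perl-interpreter.

```shell
#!/bin/sh
# Print each required package that is absent from the list of installed
# package names read on stdin (one name per line).
# Usage: rpm -qa --qf '%{NAME}\n' | missing_packages perl open-vm-tools ...
missing_packages() {
    installed=$(cat)
    for pkg in "$@"; do
        # -x matches the whole line, -F treats the name as a literal string.
        printf '%s\n' "$installed" | grep -qxF -- "$pkg" || echo "$pkg"
    done
}
```

On the template, pipe rpm -qa --qf '%{NAME}\n' into missing_packages bind-utils perl open-vm-tools dbus-tools. No output means everything the customization process needs is present.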

Template Cleanup

Once your first cloning and guest customization succeeds we can work to optimize the template a little bit. First, we remove logs and other things that might be unique to the template that we don’t need on a deployed VM. Second, we zero-out the disk space so the template is as small as possible.

I have sample Linux VM template scripts in my GitHub repository (https://github.com/plankers/virtualization-security-compliance/tree/main/templates/linux):

setup.sh can be called during Guest OS Customization (see below)
vmprep.sh will remove log files and whatnot to make sure deployed VMs are unique
zero-out.sh will write zeroes to the filesystems, which the storage can clean up for us (see below)

I put these files in /root/setup on the template. Use curl to retrieve them:

curl https://raw.githubusercontent.com/plankers/virtualization-security-compliance/main/templates/linux/vmprep.sh > vmprep.sh

You might need to customize them a bit for your environment (if you use a different configuration management tool than Puppet, for instance). You will need to “chmod +x” them so they’re executable:

cd /root/setup
chmod +x *.sh
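To grab all three scripts in one go, a loop like this is one way to do it. It is a sketch: the base URL comes from the curl example above, but the assumption that setup.sh and zero-out.sh sit next to vmprep.sh at that same raw path is mine, and both function names are invented for the example.

```shell
#!/bin/sh
# Base URL taken from the curl example above; that setup.sh and zero-out.sh
# live beside vmprep.sh is an ASSUMPTION based on the repository listing.
BASE_URL="https://raw.githubusercontent.com/plankers/virtualization-security-compliance/main/templates/linux"

# Print the raw download URL for one script name.
template_script_url() {
    printf '%s/%s\n' "$BASE_URL" "$1"
}

# Download the three scripts into a directory and mark them executable.
# Usage: fetch_template_scripts [directory]   (defaults to /root/setup)
fetch_template_scripts() {
    dest=${1:-/root/setup}
    mkdir -p "$dest" || return 1
    for name in setup.sh vmprep.sh zero-out.sh; do
        # -f fails on HTTP errors instead of saving an error page as a script.
        curl -fsSL -o "$dest/$name" "$(template_script_url "$name")" || return 1
        chmod +x "$dest/$name"
    done
}
```

Run fetch_template_scripts on the template, then read each script before you execute it. You should never run something as root that you pulled from the internet without looking at it first.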

Once you have taken a snapshot of the VM, run vmprep.sh, and then shut the VM down. The vmprep.sh script will probably throw a bunch of errors, but that’s alright, it’s generic and will try things that aren’t valid on your template.

Repeat your cloning and customization test from before to ensure that the VM continues to come up properly. If it doesn’t succeed, note the error in the guest customization log and troubleshoot it. In particular, I’ve seen some weirdness with the removal of /etc/machine-id, which is why the script recreates it with /bin/systemd-machine-id-setup.

VM Customization Script

One of the powerful things about vSphere is the guest customization options that are available. I use this to run a script on the newly-deployed VM that does some additional cleanup, registers it with my Puppet instance, and sets the VMware Tools security options according to the Security Configuration Guide. In the VM Customization Specification I use this as the customization script:

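#!/bin/sh
# Guest customization runs this script twice, passing the phase name as $1.
if [ "$1" = "precustomization" ]; then
    # Nothing to do before customization; echo is just a placeholder.
    echo ""
elif [ "$1" = "postcustomization" ]; then
    # Run the template's setup script, then reboot into the finished VM.
    /root/setup/setup.sh
    /sbin/shutdown -r now
fi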

This calls my “setup.sh” script that’s in my GitHub repository as a sample. You’ll see that my sample script removes “echoes” of the network connection (so you’ll want to customize that UUID with the one from your template). I don’t know if that’s intentional or a bug but I don’t like having two network connections appearing in NetworkManager, and it’s easy enough to fix at deployment time. I also disable IPv6 because I’m not using it, and don’t want the potential backdoor if IPv6 becomes available on my network suddenly.

If you don’t want to put everything (or anything) in the setup.sh script you can also move it out:

#!/bin/sh
if [ "$1" = "precustomization" ]; then
    echo ""
elif [ "$1" = "postcustomization" ]; then
    /bin/nmcli connection delete ens33
    /bin/nmcli connection modify 'VMware customization ens33' connection.id "ens33"
    /usr/bin/yum -y update
    /usr/bin/dnf -y remove --oldinstallonly --setopt installonly_limit=2 kernel
    /usr/bin/dnf -y clean all
    /usr/bin/find /var/cache/dnf -delete -print 2>&1
    /usr/bin/find / -name \*.rpmnew -delete -print
    /usr/bin/find /etc/puppetlabs/puppet/ssl -type f -delete
    /usr/bin/find /opt/puppetlabs/puppet/cache/ -delete
    /usr/bin/find /etc/puppetlabs/ -delete
    /opt/puppetlabs/puppet/bin/puppet config set server your.puppet.server --section agent
    /opt/puppetlabs/puppet/bin/puppet config set environment rhel9 --section agent
    /opt/puppetlabs/puppet/bin/puppet agent -t --environment rhel9
    /sbin/fstrim -a
    /usr/bin/vmware-toolbox-cmd config set deployPkg enable-customization false
    /usr/bin/vmware-toolbox-cmd config set deployPkg enable-custom-scripts false
    /usr/bin/vmware-toolbox-cmd config set autoupgrade allow-add-feature false
    /usr/bin/vmware-toolbox-cmd config set autoupgrade allow-msi-transforms false
    /usr/bin/vmware-toolbox-cmd config set appinfo disabled true
    /usr/bin/vmware-toolbox-cmd config set containerinfo poll-interval 0
    /usr/bin/vmware-toolbox-cmd config set guestoperations disabled true
    /usr/bin/vmware-toolbox-cmd config set gueststoreupgrade policy off
    /usr/bin/vmware-toolbox-cmd config set servicediscovery disabled true
    /usr/bin/vmware-toolbox-cmd config set logging log true
    /usr/bin/vmware-toolbox-cmd config set logging vmsvc.handler syslog
    /usr/bin/vmware-toolbox-cmd config set globalconf enabled false
    /usr/bin/vmware-toolbox-cmd config set autoupgrade allow-remove-feature false
    /usr/bin/vmware-toolbox-cmd config set autoupgrade allow-upgrade true
    /sbin/shutdown -r now
fi

This patches the VM, cleans up the RPM cache, attaches the VM to Puppet, runs Puppet to configure the VM, TRIMs the filesystems, sets the VMware Tools security options from the Security Configuration Guide, and then reboots. If you do this sort of thing in the Customization Specification you can customize it easily, versus having to edit the setup.sh script.

Whether you move it to a customization spec or not will depend on your organizational structure and how vSphere/VCF admins work with the operating system support teams.

Test a deployment with these parameters so you know it works. Yes, this will be your third or fourth test deployment. Spending time getting it right means time saved later. Linux boxes are really small, too, so they provision quickly. Quit complaining. 🙂

Keep in mind that when you set “deployPkg enable-customization” to false you will not be able to re-customize the VM without setting it back to true. This is good for security, but might jam you up later. Don’t set that on your templates.
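If you do need to re-customize a deployed VM later, flipping that setting back is a one-liner. Here is a hedged wrapper around it; tools_customization is a name I made up, and the TOOLBOX_CMD override exists only so you can preview the command with echo.

```shell
#!/bin/sh
# Turn guest customization support in VMware Tools on or off.
# Usage: tools_customization on|off
# TOOLBOX_CMD can be overridden (for example with "echo") to preview the call.
tools_customization() {
    case $1 in
        on)  value=true ;;
        off) value=false ;;
        *)   echo "usage: tools_customization on|off" >&2; return 2 ;;
    esac
    ${TOOLBOX_CMD:-/usr/bin/vmware-toolbox-cmd} config set deployPkg enable-customization "$value"
}
```

Run tools_customization on before re-customizing a VM, and tools_customization off afterward to put the control back in place.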

Template Storage Optimization

If you are using storage that honors UNMAP and TRIM commands, like VMware vSAN, you can make your template “thin” on disk anytime you want. That saves space and keeps costs low. If you aren’t using vSAN check with your storage vendor for support. Using this approach we’d fill up the VM with zeroes, delete them, and run fstrim to tell the storage to discard all of what we just deleted.

You may also be able to use Storage vMotion to “re-thin” a VM. Storage vMotion, when moving a VM between two different arrays (like from a fibre channel array to a local datastore and back), will remove the zeroes from the storage if you choose thin provisioning. So, if we write zeroes to all the free space on the VM, the VM will temporarily become its full size, but once you Storage vMotion it, it’s back to being tiny. Note that this doesn’t work if you move it between two datastores on the same array. If the array supports VAAI (and they all should at this point) it’ll do the copy on the array itself, and won’t actually strip out the zeroed blocks. I keep a single host with a local datastore to use for this sort of thing.

The “zero-out.sh” script in my GitHub repository is a sample of how to do this. It has comments so you can understand what it’s doing.
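For readers who just want the general shape of the technique, here is a minimal sketch. It is not zero-out.sh itself, and zero_fill is a name invented for this example. The optional size cap is my addition so you can try it safely; without the cap it fills the filesystem until it runs out of space, which is the point, but do not do that on a running production system.

```shell
#!/bin/sh
# Write zeroes into the free space of the filesystem holding <dir>, then
# delete them. Prints the number of bytes written.
# Usage: zero_fill <dir> [max-MiB]
zero_fill() {
    dir=$1
    max_mib=${2:-}
    fill="$dir/zero.fill"
    if [ -n "$max_mib" ]; then
        # Capped run: write at most max-MiB mebibytes of zeroes.
        dd if=/dev/zero of="$fill" bs=1048576 count="$max_mib" 2>/dev/null
    else
        # Uncapped run: dd exits non-zero when the filesystem fills up,
        # which is expected here.
        dd if=/dev/zero of="$fill" bs=1048576 2>/dev/null || true
    fi
    sync
    bytes=$(wc -c < "$fill")
    rm -f "$fill"
    echo "$((bytes))"
}
```

You would call zero_fill once per mounted filesystem (/, /var, /tmp, and so on) and then run fstrim -av so the storage learns that the space is free again.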

Delete your snapshots first, or you will get huge snapshots during this process! Make sure you are able to deploy the VM correctly before you delete your snapshots, too.

Final VM Template Preparation

Once this is all done, and you know that your deployments work correctly:

Delete any remaining snapshots
Ensure the VM template is on the correct datastore, if you used Storage vMotion to make it thin again
Ensure the VM template has the correct VM name in vSphere/VCF
Add it to your backup system so you can recover it if something bad happens
Add it to your DR replications so you have a copy if something even worse happens
Import it into a Content Library to replicate to all your other vSphere/VCF instances
Change your workflows in your vRealize/Aria/VCF Automation system to use the new version

If your Infosec people use a tool like Aide or Tripwire to do forensic analysis this may be a good time to run it and store the results as a baseline. Run it from a cloned copy, not from the template. You never want to pollute the template with extra stuff; all extra stuff should be pushed from your configuration management system or at customization time.

This also goes for troubleshooting. If you do have to troubleshoot something, once you figure out what the problem was you should go back to a clean snapshot or copy and fix it. You don’t want any garbage from troubleshooting sitting around on the template.

Ongoing Template Support

Periodically, Linux distributions have a “point” release where there are hundreds of updates. This is a good time to clone the template and update it. How I do that is:

All my templates are named like TEMPLATE-LINUX-ROCKY-93, so I clone it to TEMPLATE-LINUX-ROCKY-94 (or whatever the target version is). I do not customize the clone, so that the UUID and IP address of the network connection stay the same.
Start the new VM. If you are like me and give your templates static IPs it should just come up on the network.
SSH into the template. Update it via yum/dnf/apt/whatever. I also update other tools, like Puppet, if needed.
Reboot so we’re using the new kernel and can delete the old one.
Run vmprep.sh.
Run zero-out.sh.
Shut the VM down with shutdown -h now.

Then I try a deployment. If the deployment fails I troubleshoot it, then take a clean copy and start over with the fix. When everything works I follow the steps in “Final VM Template Preparation” above. I keep a copy of the old template around for a while in case someone needs it. If your backup/DR system is licensed per-VM you could export it as an OVA, too, or just stop the backup and un-license that VM, letting it age out in the backup repository.
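If you like keeping the naming convention mechanical, the new template name can be derived from the values in /etc/os-release. This is just a sketch of my naming scheme; template_name is a hypothetical helper, and it assumes ID and VERSION_ID look like rocky and 9.4.

```shell
#!/bin/sh
# Build a template name such as TEMPLATE-LINUX-ROCKY-94 from a distribution
# ID and version, in the style of /etc/os-release (ID=rocky, VERSION_ID=9.4).
# Usage: template_name <id> <version-id>
template_name() {
    distro=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    version=$(printf '%s' "$2" | tr -d '.')
    printf 'TEMPLATE-LINUX-%s-%s\n' "$distro" "$version"
}
```

After updating the clone, source /etc/os-release with the dot command and run template_name "$ID" "$VERSION_ID" to confirm the name you gave the VM matches what is actually installed.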

If you’ve set your environment up to have static IPs for the templates and test clone VMs, and have all the right firewall rules, this will take you less than an hour. You can do all your templates in parallel, too. I used to do ~10 of them in a morning once a quarter.

Every time you turn the template on you need to re-run vmprep.sh! And yes, every time you do that it’ll erase the SSH host keys, which will make SSH complain. Try using “ssh-keygen -R” followed by the template’s hostname or IP address to remove the old keys from your known_hosts file.

Thank You & Feedback

Thanks for sticking with this guide to the end. I’d appreciate it if you’d take a second to share it online with your friends.

On occasion, things I say or write have been wrong, had typographical errors, or had things change that aren’t reflected in the document. Should you encounter such a circumstance feel free to reach out on Twitter or something and I’ll see about fixing it.

Dedication

This page is dedicated to the memory of Michael White. He and I had a lot of the information about VM template building on our blogs in the early VMware days. I’m sure that, wherever he is now, he’s got better things going on than enabling LVM discards. 🙂 I hope I run into him again some day.