Finding the host that locks the VM

We’ve been experiencing some issues with file locks lately, caused by vMotions gone wrong and VMs that were somehow deleted while still running (?).
So here is a little manual on how to troubleshoot this.

First off, enable SSH on a host within the cluster. It doesn’t matter which one, but if you suspect a particular host of holding the lock, I would recommend picking that one.
Then log on to the host (yes, using SSH) and cd to the VM folder (cd /vmfs/volumes/<datastore name>/<vm name>)

Now let’s find out who is holding the lock

vmkfstools -D <vmname>.vmx

The “-D” parameter is one of the hidden parameters, used to reveal metadata.
The output will be something like this

Lock [type 10c00001 offset 131880960 v 79, hb offset 3571712
gen 333, mode 1, owner 5873bf49-950f200a-23bc-0017a4776430 mtime 564987
num 0 gblnum 0 gblgen 0 gblbrk 0]
Addr <4, 280, 11>, gen 12, links 1, type reg, flags 0, uid 0, gid 0, mode 100755
len 6017, nb 1 tbz 0, cow 0, newSinceEpoch 1, zla 2, bs 8192

The really interesting part is the owner field on the second line from the top

gen 333, mode 1, owner 5873bf49-950f200a-23bc-0017a4776430 mtime 564987

The last segment of the owner ID (0017a4776430 here) is the MAC address of one of the vmnics of the vSphere host holding the lock. Use an RVTools export or check the NICs on all hosts to find the culprit.
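To compare the owner against a NIC listing, it helps to have the MAC in the usual colon-separated form. Here is a minimal sketch, assuming the MAC is the last 12 hex digits of the owner ID and that the vmkfstools output arrives on stdin. The function name owner_to_mac is made up for this post, it is not part of any VMware tool.

```shell
# Hypothetical helper (not a VMware command): read `vmkfstools -D` output
# on stdin and print the last segment of the lock owner ID as a
# colon-separated MAC address.
owner_to_mac() {
    # Grab the 12 hex digits after the third dash of the owner ID ...
    sed -n 's/.*owner [0-9a-f]*-[0-9a-f]*-[0-9a-f]*-\([0-9a-f]\{12\}\).*/\1/p' |
        head -n 1 |
        # ... then put a colon after every pair and drop the trailing one.
        sed 's/../&:/g; s/:$//'
}
```

You would use it as vmkfstools -D <vmname>.vmx | owner_to_mac (add 2>&1 before the pipe if your build prints the lock info on stderr, which I have not checked on every version). On each candidate host, esxcli network nic list shows the MAC addresses to compare against.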

Once you have found the host, enable SSH on it and log on as root.
Now run the following command to find the process locking the files

esxcli vm process list

Scroll through the output until you find the VM, then note its World ID value
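On a host with many VMs, scrolling gets tedious. Here is a small sketch that picks the World ID out of the listing for you. It assumes each VM block contains an indented "World ID: <number>" line followed later by a "Display Name: <name>" line, which is the layout I have seen; check it against your own output first. The function name world_id_for_vm is made up for this post.

```shell
# Hypothetical helper: read `esxcli vm process list` output on stdin and
# print the World ID of the VM whose Display Name equals the first argument.
# Assumes "World ID:" appears before "Display Name:" within each VM block.
world_id_for_vm() {
    awk -v vm="$1" '
        /^ *World ID:/ { wid = $3 }               # remember the latest World ID
        /^ *Display Name:/ {
            sub(/^ *Display Name: */, "")         # keep only the name itself
            if ($0 == vm) print wid               # exact match on the VM name
        }
    '
}
```

Usage would be esxcli vm process list | world_id_for_vm <vm name>. The match is exact on purpose, so that a VM called web01 does not also pick up web01-old.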

Then kill that process

esxcli vm process kill --type soft --world-id <worldID>

And all is well again 🙂
