Fixing Non-Persistent GPU Passthrough on ESXi
📔 Intro #
Recently I have been experiencing some odd behaviour in my new whitebox ESXi server build. I had a GTX 1080 in the server, passed through to an Ubuntu VM for GPU-accelerated decoding/encoding, with modified drivers to work around Nvidia’s ‘fuck you’ limitations.
This worked well. However, I managed to snag a pretty good deal on an RTX A2000, which I bought to replace the GTX 1080. Immediately I noticed a frustrating issue…
😤 The Problem #
After every reboot of the host, the A2000 would be disabled for passthrough, and the attached VM would refuse to start as a result. I was able to re-enable the device manually every time, but this isn’t great for a remote server.
After some investigation, I discovered behaviour that I had never seen with ESXi before: the VMkernel (VMK) was claiming the GPU for the server's own graphical console on every boot, which in turn disabled passthrough.
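If you want to confirm you are hitting the same thing, here is a minimal sketch for checking the passthrough state from the ESXi shell after a reboot. It assumes ESXi 7.0 or later, where (as far as I know) the `esxcli hardware pci pcipassthru` namespace is available. The `ESXCLI` variable and the `passthru_status` function name are my own inventions for illustration, not part of ESXi.

```shell
#!/bin/sh
# Illustrative helper; ESXCLI and passthru_status are hypothetical names.
# ESXCLI can be overridden, which is handy for dry runs off the host.
ESXCLI="${ESXCLI:-esxcli}"

# List PCI devices together with their passthrough state.
# Assumption: the pcipassthru namespace exists (ESXi 7.0 or later).
passthru_status() {
    "$ESXCLI" hardware pci pcipassthru list
}
```

Running `passthru_status` right after a reboot should show whether the GPU's device ID is still enabled for passthrough.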
Thankfully, the fix for this is rather simple.
🔧 Permanent Fix #
The solution is to log in to your host via SSH and execute the following command. Your GPU should then no longer be claimed by the VMK at boot, and passthrough will be persistent.
esxcli system settings kernel set -s vga -v FALSE
The GPU passthrough state should now persist across reboots of the host. To reverse this change, simply run the command again with TRUE.
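For convenience, the check, the fix and the rollback can be wrapped in a small script. This is only a sketch: the function names and the `ESXCLI` variable are made up for illustration, and I am assuming that `esxcli system settings kernel list -o vga` reports the current value of the setting. As far as I can tell, kernel settings like this one only take effect at the next boot.

```shell
#!/bin/sh
# Illustrative helpers; ESXCLI and the function names are hypothetical.
ESXCLI="${ESXCLI:-esxcli}"

# Show the current value of the vga kernel setting.
vga_status() {
    "$ESXCLI" system settings kernel list -o vga
}

# Stop the VMK from claiming the GPU at boot (the fix from this post).
vga_disable() {
    "$ESXCLI" system settings kernel set -s vga -v FALSE
}

# Reverse the change and give the console its display back.
vga_enable() {
    "$ESXCLI" system settings kernel set -s vga -v TRUE
}
```

A typical run would be `vga_status`, then `vga_disable`, then a reboot of the host, and `vga_status` again to confirm the value stuck.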
🚧 DCUI Access #
Losing GUI access to the server shouldn’t be a big deal after the initial setup. If you do need to access it again for some reason, however, it is very easy to do so via SSH.
Simply log in via SSH and execute the following. You will then be presented with the GUI in the terminal.
dcui
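If you would rather open the DCUI in one step from your workstation, something like the sketch below should work with a standard OpenSSH client. The `-t` flag forces a terminal to be allocated, which a full-screen program needs. The `open_dcui` function, the `SSH` variable and the host name in the usage note are all made up for illustration.

```shell
#!/bin/sh
# Illustrative helper; SSH and open_dcui are hypothetical names.
SSH="${SSH:-ssh}"

# Open the DCUI of a remote ESXi host in the current terminal.
# $1 is the SSH target, for example root@your-esxi-host.
open_dcui() {
    "$SSH" -t "$1" dcui
}
```

Usage would be `open_dcui root@esxi.example.lan`, with your own host in place of that placeholder. As far as I know, Ctrl+C drops you back out of the DCUI.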
🎊 Fin. #
I haven’t quite been able to work out why this behaviour occurs.
My current theory is that I just haven’t seen this before because my other systems have had some other form of video device, usually provided by the BMC/IPMI or the CPU itself these days. My 5900X on an X570 board has neither.
This doesn’t explain why the GTX 1080 never had the problem on the same system, though.
I have recently heard of this exact issue from other people, so it may be a bug, but I can’t say for sure.
Anyway, I hope this saves you from the hours of frustration I had with this. 😄