
Run Ollama with NVIDIA GPU in Proxmox VMs and LXC containers

Learn how to run Ollama with an NVIDIA GPU in Proxmox for an enhanced AI experience in your home lab and great chat performance

I have been having tons of fun working with local LLMs in the home lab over the last few days, and I wanted to share a few steps and tweaks for running Ollama with an NVIDIA GPU in Proxmox, in both VMs and LXC containers, and then exposing it to clients on your network. GPU processing makes the AI experience SO much better than CPU-only inference. However, you don't have to have the latest and greatest. I will show you that even with something as old as an NVIDIA GTX 1060, you can have a pleasant experience running models in the home lab.

Why Run Ollama with GPU in Proxmox?

Proxmox is an awesome hypervisor for the home lab, and for production for that matter. I definitely recommend virtualization for anyone looking to get into a home lab, even with the added complexity up front. Hypervisors make learning easier with all the tools they give you, like snapshots.

I wish I could say GPU passthrough in Proxmox was super easy. However, the steps are definitely convoluted. I think the developers could make GPU passthrough easier than it currently is, and hopefully that is something being worked on.

Once you enable GPU passthrough, though, it is easy to pass these PCI devices to your virtual machines or LXC containers.

Ollama supports GPU acceleration using CUDA. When a GPU is available, the processing of your LLM chats is offloaded to it. This way, you can run high-performance LLM inference locally without needing a cloud service.

This setup is really good if you are:

  • Using a headless VM for AI model serving
  • Running other services in LXC containers (like OpenWebUI)
  • Looking to serve models over your local network for your home lab

Let’s take a look at setting up GPU passthrough in Proxmox.

1. Set Up GPU Passthrough in Proxmox

First things first, you will need to enable IOMMU (often listed as Intel VT-d or AMD-Vi) in your BIOS. Every BIOS is different, so consult your motherboard manufacturer's documentation for turning this setting on. You will also need to enable IOMMU in Proxmox:

##Edit this file
nano /etc/default/grub

##Add or alter this line to include the below
##(use intel_iommu=on on Intel CPUs; on AMD CPUs the IOMMU is usually enabled by default)
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt video=efifb:off video=vesafb:off"

##Run these commands
update-grub
reboot

You should be able to check the IOMMU status with the command below (this example is for an AMD CPU; on Intel systems, grep for DMAR instead of AMD-Vi):

dmesg | grep -e IOMMU -e AMD-Vi
checking the status of IOMMU in Proxmox
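
Before passing the GPU through, it is also worth confirming that it sits in its own IOMMU group. Here is a small sketch (not part of the original steps) that lists each IOMMU group and the devices in it:

##List IOMMU groups and the devices in each one
for g in /sys/kernel/iommu_groups/*; do
  echo "IOMMU group ${g##*/}:"
  for d in "$g"/devices/*; do
    lspci -nns "${d##*/}"
  done
done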

Next, identify your GPU and its audio function using:

lspci -nn | grep -i nvidia
Grepping for your devices

You should see entries like:

01:00.0 VGA compatible controller [10de:1c03]
01:00.1 Audio device [10de:10f1]

Next, tell vfio-pci to claim these IDs:

nano /etc/modprobe.d/vfio.conf

options vfio-pci ids=10de:1c03,10de:10f1 disable_vga=1

You will now need to blacklist NVIDIA drivers to keep the Proxmox host from using them:

cat <<EOF > /etc/modprobe.d/blacklist-nvidia.conf
blacklist nouveau
blacklist nvidia
blacklist nvidiafb
blacklist rivafb
EOF

Now, we need to add the vfio modules to /etc/modules (on newer kernels, such as those shipped with Proxmox 8, vfio_virqfd is built into vfio and can be omitted):

echo -e "vfio\nvfio_iommu_type1\nvfio_pci\nvfio_virqfd" >> /etc/modules

update-initramfs -u -k all

reboot
More proxmox setup for gpu passthrough

After the reboot, you can confirm vfio-pci is now bound to the GPU (the commands below use the example IDs from earlier):

##video portion
lspci -nnk | grep -iA 3 '10de:1c03'

##audio portion
lspci -nnk | grep -iA 3 '10de:10f1'

2. Add the PCI passthrough device to a Proxmox VM

In the Proxmox web UI, add 01:00.0 and 01:00.1 to your VM using PCI Passthrough:

  • Select “Raw Device”
  • Check “All Functions” (this includes both GPU and audio)
  • Enable “PCI-Express”
  • Do not check “Primary GPU” if you’re not using this VM for display output

Many of the guides out there say you need to use the OVMF (UEFI) BIOS. However, I didn't find this to be the case as long as I wasn't using the GPU as the main display adapter for the VM. For AI workloads, the GPU is fully dedicated to token processing and not used for display. Do keep this in mind and test in your environment.

Below is a look at the Add menu > PCI Device.

Adding pci device to a virtual machine

Here I am selecting the Raw Device radio button, then clicking the Device drop down. You should see your GPU listed. Don’t select Primary GPU.

Selecting the pci device for your nvidia gpu

Below you can see the result after adding the PCI device. Also, note that if you select the All Functions checkbox, you don't have to separately add the audio device that goes along with the GPU.

Gpu device added to proxmox
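
If you prefer the command line over the web UI, the same passthrough can be configured with qm set on the Proxmox host. A quick sketch, assuming a VM ID of 100 (a placeholder) and the 01:00 device from earlier:

##On the Proxmox host: attach the whole 01:00 device (all functions) to VM 100
##(pcie=1 requires the q35 machine type)
qm set 100 -hostpci0 0000:01:00,pcie=1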

3. Install NVIDIA Drivers Inside the VM

If you have passed through the NVIDIA GPU into your Proxmox virtual machine, you will need to install the driver as you would normally. This will allow the VM to have full access to the latest driver for your card.

In Debian/Ubuntu, you can install the driver using the commands below. Substitute whatever driver version is current for your card; 570 is just an example.

sudo apt update
sudo apt install -y nvidia-driver-570
sudo reboot
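
If you are not sure which driver version to install, the ubuntu-drivers tool on Ubuntu can suggest one. A quick sketch, assuming the ubuntu-drivers-common package is available:

##Let Ubuntu recommend a driver for the detected GPU
sudo apt install -y ubuntu-drivers-common
ubuntu-drivers devices

##Or install the recommended driver automatically
sudo ubuntu-drivers autoinstall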

After reboot, run:

nvidia-smi

You should see your GPU listed, indicating the driver is correctly installed.

4. Install Ollama

Install Ollama via the official script:

curl -fsSL https://ollama.com/install.sh | sh

This will:

  • Download the latest Ollama release for your platform
  • Install Ollama
  • Start it as a systemd service

You can now test it locally:

ollama run llama3
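
You can also exercise the REST API directly. A minimal sketch against the documented /api/generate endpoint (the prompt is just an illustration):

##Quick local API test; stream:false returns a single JSON response
curl http://127.0.0.1:11434/api/generate -d '{
  "model": "llama3",
  "prompt": "Say hello in one sentence.",
  "stream": false
}'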

You can also install it in Docker: Self-Hosting LLMs with Docker and Proxmox: How to Run Your Own GPT

5. Expose the Ollama API to LAN Clients

One thing you will most likely want to change is that Ollama only listens on 127.0.0.1:11434 by default. To expose it to the LAN, you need to set the OLLAMA_HOST environment variable in its systemd service.

Edit the systemd service

Run:

##Edit the ollama service file
sudo vi /etc/systemd/system/ollama.service

##Make sure the [Service] section includes the following
[Service]
Environment="OLLAMA_HOST=0.0.0.0:11434"

##Run commands
sudo systemctl daemon-reload
sudo systemctl restart ollama
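
Alternatively, you can leave the installer's unit file untouched and use a systemd drop-in override instead. A sketch of that approach:

##Create a drop-in override rather than editing the unit directly
sudo systemctl edit ollama

##In the editor that opens, add these two lines and save:
##  [Service]
##  Environment="OLLAMA_HOST=0.0.0.0:11434"

##Apply the change
sudo systemctl restart ollama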

Confirm it worked:

ss -tulnp | grep 11434

You should now see something similar to the following:

tcp LISTEN 0 4096 0.0.0.0:11434
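
From another machine on the LAN, a quick check that the API is reachable (replace <ollama-vm-ip> with your VM's address):

##From a LAN client: list the models the Ollama server has pulled
curl http://<ollama-vm-ip>:11434/api/tags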

6. Allow Network Access (UFW or LAN Firewall)

If you’re using UFW in your Ubuntu installation, you can allow the Ollama API port using the following:

sudo ufw allow 11434/tcp

Or restrict access to a subnet:

sudo ufw allow from 10.3.33.0/24 to any port 11434 proto tcp
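
You can verify the rule is in place with UFW's status output:

##Confirm the Ollama port is allowed
sudo ufw status numbered | grep 11434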

7. Use OpenWebUI in a container

I like to run OpenWebUI inside an LXC container. One of the easiest ways to do this is to spin up the LXC container and then install Docker inside it, since Docker makes spinning up the OpenWebUI interface super easy.
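
If Docker is not installed in the LXC container yet, the upstream convenience script is a quick way to get it. A sketch (review the script first if you prefer not to pipe it straight into a shell):

##Inside the LXC container: install Docker via the official convenience script
curl -fsSL https://get.docker.com | sh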

To spin up an OpenWebUI container, run the following docker run command, replacing <your ollama host URL> with the address of your Ollama instance. Note that the :cuda image tag and the --gpus all flag assume the container itself has GPU access; if it does not, you can drop --gpus all and use the standard (non-CUDA) image tag.

##OpenWebUI
docker run -d -p 3000:8080 --gpus all \
  --add-host=host.docker.internal:host-gateway \
  -e OLLAMA_BASE_URL=http://<your ollama host URL>:11434 \
  -v open-webui:/app/backend/data \
  --name open-webui --restart always \
  ghcr.io/open-webui/open-webui:cuda

Access OpenWebUI in your browser via the container’s IP on port 3000.

Wrapping up

This setup gives you a powerful, GPU-accelerated Ollama backend running in a Proxmox VM, fully accessible to any client on your local network, including LXC containers. Whether you're using OpenWebUI, building your own AI tools, or just experimenting with models like LLaMA 3, this gives you control and flexibility, all on your own infrastructure.

With this guide, you're now able to:

  • Run Ollama with full GPU acceleration
  • Serve LLMs to any device in your lab
  • Integrate with frontends like OpenWebUI or custom scripts

Let me know if you’d like to expand this with authentication, reverse proxies, or remote TLS access!





