<?xml version="1.0" encoding="UTF-8"?>        <rss version="2.0"
             xmlns:atom="http://www.w3.org/2005/Atom"
             xmlns:dc="http://purl.org/dc/elements/1.1/"
             xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
             xmlns:admin="http://webns.net/mvcb/"
             xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:content="http://purl.org/rss/1.0/modules/content/">
        <channel>
            <title>Artificial Intelligence (AI) - VHT Forum</title>
            <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/</link>
            <description>Virtualization Howto Discussion Board</description>
            <language>en-US</language>
            <lastBuildDate>Fri, 01 May 2026 13:45:54 +0000</lastBuildDate>
            <generator>wpForo</generator>
            <ttl>60</ttl>
							                    <item>
                        <title>What AI tools are you running in your home lab?</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/what-ai-tools-are-you-running-in-your-home-lab/</link>
                        <pubDate>Sun, 02 Nov 2025 23:00:15 +0000</pubDate>
                        <description><![CDATA[Hey all! Just wanted to start a good thread here on what AI tools you are running in your home lab. Here is my shortlist:

Ollama
OpenWebUI
AnythingLLM
WhisperX
Stable Diffusion
Comf...]]></description>
                        <content:encoded><![CDATA[<p>Hey all! Just wanted to start a good thread here on what AI tools you are running in your home lab. Here is my shortlist:</p>
<ul>
<li>Ollama</li>
<li>OpenWebUI</li>
<li>AnythingLLM</li>
<li>WhisperX</li>
<li>Stable Diffusion</li>
<li>ComfyUI</li>
</ul>
<p>Extending the conversations from my last video to this forum thread:</p>
<p>&nbsp;</p>
<a href="https://youtu.be/illvibK_ZmY" target="_blank" rel="noopener">https://youtu.be/illvibK_ZmY</a>
<p>&nbsp;</p>
<p>I'm running some of these in Docker Desktop locally and also in a Proxmox LXC container with a GPU passed through. What about you?</p>
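<p>If anyone wants a quick starting point for a couple of these, something like the following will get Ollama and OpenWebUI up as containers. Treat it as a minimal sketch, since container names, ports, and the GPU flag will vary with your setup:</p>
<pre contenteditable="false"># Ollama with GPU access, persisting models in a named volume
docker run -d --name ollama --gpus=all -v ollama:/root/.ollama -p 11434:11434 ollama/ollama

# OpenWebUI pointed at the Ollama API above (host.docker.internal works on Docker Desktop)
docker run -d --name open-webui -p 3000:8080 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ghcr.io/open-webui/open-webui:main</pre>]]></content:encoded>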
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/what-ai-tools-are-you-running-in-your-home-lab/</guid>
                    </item>
				                    <item>
                        <title>Docker launches MCP Catalog for secure MCP server discovery and usage</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/docker-launches-mcp-catalog-for-secure-mcp-server-discovery-and-usage/</link>
                        <pubDate>Thu, 25 Sep 2025 15:44:03 +0000</pubDate>
                        <description><![CDATA[Just saw this pop up on the Docker blog and figured it was worth sharing. Docker just launched something called the MCP Catalog, and it’s basically a secure way to search out and run Model C...]]></description>
                        <content:encoded><![CDATA[<p data-start="262" data-end="516">Just saw this pop up on the Docker blog and figured it was worth sharing. Docker just launched something called the MCP Catalog, and it’s basically a secure way to search out and run Model Context Protocol (MCP) servers. So, think of it like a Docker registry for MCP servers in a sense.</p>
<p data-start="518" data-end="837">For those who haven’t been following the MCP space, it’s an open standard that lets tools like AI models, dev tools, or other services talk to each other. Instead of hacking together random APIs and hoping permissions are locked down, MCP force secure communication and sandboxing. Although there has been plenty of news lately about the insecurities or vulnerability potential for MCP servers.</p>
<p data-start="839" data-end="1225">Now Docker is getting in on it with this catalog that works almost like Docker Hub. You can browse servers, pull them down, and install them into your environment without worrying about sketchy configs or whether it was packaged correctly. The whole point seems to be security and trust. So you know what you’re running hasn’t been tampered with.</p>
<p data-start="839" data-end="1225">Here is a link directly to the MCP catalog from Docker: <a href="https://hub.docker.com/mcp">Docker Hub</a></p>
<p data-start="1227" data-end="1266">A couple of highlights from their official blog:</p>
<ul data-start="1267" data-end="1665">
<li data-start="1267" data-end="1397">
<p data-start="1269" data-end="1397"><strong data-start="1269" data-end="1289">Secure discovery -</strong> Docker’s catalog gives you a place to find trusted MCP servers, so you’re not just hunting random repos</p>
</li>
<li data-start="1267" data-end="1397">
<p data-start="1269" data-end="1397"><strong data-start="1400" data-end="1415">Easy to run -</strong> Once you find a server, running it is as simple as pulling a container. You don't have to mess around with setup scripts</p>
</li>
<li data-start="1530" data-end="1665">
<p data-start="1532" data-end="1665"><strong data-start="1532" data-end="1552">Future potential -</strong> With MCP use growing this feels like Docker making itself an “app store” if you will for MCP servers</p>
</li>
</ul>
<p data-start="1667" data-end="1966">Personally, I think this is a step in the right direction. If MCP takes off the way people think it will, having Docker as the front to deploy these servers makes a lot of sense. It also gives home labbers and devs an easy way to experiment with AI locally without diving too deep into complicated setups.</p>
<p data-start="1968" data-end="2207">I'm curious what you all think. Is this going to be the start of an MCP ecosystem in Docker the same way Docker Hub made container adoption snowball? </p>
<p data-start="2209" data-end="2363">Link for anyone that wants to read the news straight from Docker:<br data-start="2268" data-end="2271" /><a class="decorated-link" href="https://www.docker.com/blog/docker-mcp-catalog-secure-way-to-discover-and-run-mcp-servers/" target="_new" rel="noopener" data-start="2271" data-end="2361">https://www.docker.com/blog/docker-mcp-catalog-secure-way-to-discover-and-run-mcp-servers/</a></p>]]></content:encoded>
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/docker-launches-mcp-catalog-for-secure-mcp-server-discovery-and-usage/</guid>
                    </item>
				                    <item>
                        <title>Ollama adds new memory management and scheduling improvements to the latest release</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/ollama-adds-new-memory-management-and-scheduling-improvements-to-the-latest-release/</link>
                        <pubDate>Wed, 24 Sep 2025 16:42:28 +0000</pubDate>
                        <description><![CDATA[Just saw the latest Ollama update and thought it was worth sharing with the community. In this latest release, they have reworked how model scheduling and memory management works. It looks l...]]></description>
                        <content:encoded><![CDATA[<p data-start="295" data-end="497">Just saw the latest Ollama update and thought it was worth sharing with the community. In this latest release, they have reworked how model scheduling and memory management works. It looks like it will be a step in the right direction for stability and performance.</p>
<p data-start="499" data-end="637">Now, instead of estimating memory needs like it did before, Ollama now measures exactly how much memory a model needs before it runs it. That means:</p>
<ul data-start="639" data-end="940">
<li data-start="639" data-end="686">
<p data-start="641" data-end="686">Way fewer crashes from out-of-memory errors</p>
</li>
<li data-start="687" data-end="777">
<p data-start="689" data-end="777">Better GPU utilization (more memory goes where it’s needed, which boosts token speeds)</p>
</li>
<li data-start="778" data-end="849">
<p data-start="780" data-end="849">Smarter scheduling for multiple GPUs, even if they’re mismatched</p>
</li>
<li data-start="850" data-end="940">
<p data-start="852" data-end="940">Reporting in <strong>nvidia-smi</strong> now lines up with <strong>ollama ps</strong>, so the tracking usage there makes sense</p>
</li>
</ul>
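<p>If you want to sanity check that last point on your own box, load one of the enhanced models and compare the two views. Nothing fancy here, just the obvious commands:</p>
<pre contenteditable="false"># Load a model (gemma3 is one of the models on the enhanced list), then compare reports
ollama run gemma3 "hello"
ollama ps      # shows loaded models and how much memory each one is using
nvidia-smi     # GPU memory usage should now line up with the ollama ps numbers</pre>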
<p data-start="942" data-end="1014">A couple of example benchmarks they shared seem to show the difference:</p>
<p data-start="1016" data-end="1068"><strong data-start="1016" data-end="1066">gemma3:12b on a single RTX 4090 (128k context)</strong></p>
<ul data-start="1069" data-end="1166">
<li data-start="1069" data-end="1110">
<p data-start="1071" data-end="1110">Old: 52 tokens/s → New: 85 tokens/s</p>
</li>
<li data-start="1111" data-end="1166">
<p data-start="1113" data-end="1166">Old: 48/49 layers on GPU → New: 49/49 layers on GPU</p>
</li>
</ul>
<p data-start="1168" data-end="1222"><strong data-start="1168" data-end="1220">mistral-small3.2 with 2× RTX 4090s (32k context)</strong></p>
<ul data-start="1223" data-end="1332">
<li data-start="1223" data-end="1279">
<p data-start="1225" data-end="1279">Old: 127 tokens/s prompt eval → New: 1380 tokens/s</p>
</li>
<li data-start="1280" data-end="1332">
<p data-start="1282" data-end="1332">Old: 43 tokens/s generation → New: 55 tokens/s</p>
</li>
</ul>
<p data-start="1334" data-end="1482">These enhancements are not included in all models as of yet. But the following ones have been enhanced: gpt-oss, llama4, gemma3, qwen3, mistral-small3.2, and the all-minilm embeddings. They mention that more are on the way.</p>
<p data-start="1484" data-end="1624">From the numbers they have posted, this looks like a pretty huge leap forward, especially for anyone running long contexts or trying to squeeze performance out of multiple GPUs. You can read more about it here: <a href="https://ollama.com/blog/new-model-scheduling">New model scheduling · Ollama Blog</a></p>
<p data-start="1626" data-end="1735">Anyone here tried this update yet? Curious if you’re seeing the same gains they’ve shown in the benchmarks.</p>]]></content:encoded>
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/ollama-adds-new-memory-management-and-scheduling-improvements-to-the-latest-release/</guid>
                    </item>
				                    <item>
                        <title>What is Docker Model Runner for Local AI Models?</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/what-is-docker-model-runner-for-local-ai-models/</link>
                        <pubDate>Thu, 18 Sep 2025 22:32:13 +0000</pubDate>
                        <description><![CDATA[If you have been following the news from Docker, they have introduced something called Docker Model Runner. I think this is a pretty big deal if you care about running AI locally in either a...]]></description>
                        <content:encoded><![CDATA[<p data-start="154" data-end="218">If you have been following the news from Docker, they have introduced something called <a href="https://www.docker.com/blog/introducing-docker-model-runner/" target="_blank" rel="noopener">Docker Model Runner</a>. I think this is a pretty big deal if you care about running AI locally in either a home lab or a corporate environment. Docker has been making a lot of strides along these lines and is incorporating some really cool tools in Docker Desktop that can be used for AI purposes. </p>
<p data-start="361" data-end="696">At a high level, Docker Model Runner (DMR), gives you a way to pull, run, and distribute large language models (LLMs) using the same Docker tools and probably workflows that you already know using Docker. Instead of learning an entirely new ecosystem or setting up complex Python environments, you can treat models like OCI artifacts. You can then run them much like containers.</p>
<p data-start="698" data-end="1558">For home lab environments, I think this helps to lower the barrier to experimenting with AI in several ways. You can pull a model from Docker Hub or HuggingFace and be consuming an OpenAI-compatible API in just a couple of minutes. You can do this with just a simple command, which is what has popularized Docker in the first place. It comes with hardware acceleration support, whether you are on Apple Silicon, or if you have an NVIDIA GPU on Windows, or even ARM and Qualcomm devices.</p>
<p data-start="698" data-end="1558">It means older gaming rigs or modern mini PCs with decent GPUs instantly become useful for AI workloads. And let's face it, we all probably have these types of machines around the home lab that we are looking for a role they can fill. Since everything runs inside Docker, you are not adding bloat to your operating system with dependencies, and you can spin up and tear down models in a way that doesn't break anything. On top of that, Docker Hub already has a curated catalog of models packaged as OCI artifacts. With this being the case, you don't have to sort through repositories trying to figure out which ones are maintained and compatible.</p>
<p data-start="1560" data-end="2658">I think this will be of value in the enterprise as well. The first is security and compliance. DMR runs inside the existing Docker infrastructure. So you don't need a totally new tool, just use docker. It respects guardrails like Registry Access Management and fits into enterprise policies that most already have in place any way. Treating models as OCI artifacts means you can store, scan, and distribute them through the same registries you already use.</p>
<p data-start="1560" data-end="2658">DMR also integrates with tools developers are already using, like Docker Compose, Testcontainers, and CI/CD pipelines. This will allow models to be wired into workflows without coming up with something totally new. Debugging and observability are also baked in, with tracing and inspect features that make it easier to understand token usage and framework behaviors. For enterprises tracking costs and performance the visibility they will get with this tool and framework will be very helpful. Looking ahead, Docker has made it clear that support for new inference engines like vLLM or MLX is coming. So teams are not locked into llama.cpp as the only option.</p>
<p data-start="2660" data-end="3142">What is the big picture here? It is impossible to know the direction of AI and local AI models. However, we can see that Docker can be very helpful in solving many of the problems that AI models have today. Things like packaging, distribution, and compatibility. This is what Docker is purpose-built for. This will create a standardized approach using OCI formats, which could make local AI far more accessible because it fits into the workflows developers already know. For home labbers it means frictionless experimentation, while for enterprises. I think this will mean a consistent way to let teams start building with AI.</p>
<p data-start="3144" data-end="3304">I am curious if anyone has already tested the Beta or GA release of Docker Model Runner? How does performance compare to running llama.cpp or Ollama directly if you have used it?</p>
<p data-start="154" data-end="218"> </p>
<p data-start="3567" data-end="3721"> </p>]]></content:encoded>
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/what-is-docker-model-runner-for-local-ai-models/</guid>
                    </item>
				                    <item>
                        <title>Upgrading Ollama to the latest version in Linux</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/upgrading-ollama-to-the-latest-version-in-linux/</link>
                        <pubDate>Sun, 17 Aug 2025 18:53:34 +0000</pubDate>
                        <description><![CDATA[If you are running Ollama in an LXC container in Proxmox or Linux in general, how do you update it? Well, the process as it turns out is the same as installing it. You can just simply run th...]]></description>
                        <content:encoded><![CDATA[<p>If you are running Ollama in an LXC container in Proxmox or Linux in general, how do you update it? Well, the process as it turns out is the same as installing it. You can just simply run the command below to perform the upgrade as well.</p>
<pre contenteditable="false">curl https://ollama.ai/install.sh | sh</pre>
<p><img src="https://www.virtualizationhowto.com/wp-content/uploads/wpforo/attachments/2/707-upgrading-ollama-to-the-latest-version-in-Linux.jpg" /></p>
<p>After running the command, it will overwrite the existing version of Ollama you have installed with the latest.</p>
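<p>You can confirm the upgrade took by checking the version afterwards:</p>
<pre contenteditable="false"># Print the installed Ollama version
ollama -v</pre>]]></content:encoded>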
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/upgrading-ollama-to-the-latest-version-in-linux/</guid>
                    </item>
				                    <item>
                        <title>Adding Terraform MCP server to Claude Desktop</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/adding-terraform-mcp-server-to-claude-desktop/</link>
                        <pubDate>Sat, 19 Jul 2025 14:23:38 +0000</pubDate>
                        <description><![CDATA[I have been experimenting with a lot of MCP servers lately. How about you? One of the interesting ones I wanted to add was the Terraform MCP server since I work with a lot of Terraform code....]]></description>
                        <content:encoded><![CDATA[<p>I have been experimenting with a lot of MCP servers lately. How about you? One of the interesting ones I wanted to add was the Terraform MCP server since I work with a lot of Terraform code. You can easily add this to your Claude Desktop environment with just a simple file edit.</p>
<p>However, you will need to have Docker installed and running locally. In Windows I am using Docker Desktop to spin up the local Docker container. Just make sure you have Docker running and healthy before working with the Terraform MCP server.</p>
<p>In Windows, navigate to "<strong>c:\users\&lt;your user&gt;\AppData\Roaming\Claude\claude_desktop_config.json</strong>" and edit this file. Add the following code to the file and <strong>save it</strong>.</p>
<pre contenteditable="false">{
  "mcpServers": {
    "terraform": {
      "command": "docker",
      "args": 
    }
  }
}</pre>
<p>Exit out of Claude Desktop and make sure you quit it from the task tray as well, since it will keep running in the background by default. Now, relaunch Claude Desktop.</p>
<p>In Docker Desktop, you should see the following container launch:</p>
<p>Also, if you navigate to Claude &gt; <strong>File &gt; Settings &gt; Developer</strong> you will see the following:</p>
]]></content:encoded>
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/adding-terraform-mcp-server-to-claude-desktop/</guid>
                    </item>
				                    <item>
                        <title>Microsoft Phi-4 Download and Run a Self-hosted Small Language Model Locally</title>
                        <link>https://www.virtualizationhowto.com/community/artificial-intelligence-ai/microsoft-phi-4-download-and-run-a-self-hosted-small-language-model-locally/</link>
                        <pubDate>Thu, 09 Jan 2025 15:37:23 +0000</pubDate>
                        <description><![CDATA[If you are like me, are you tinkering around with self-hosted small language models and running these locally? I noticed that Microsoft open-sourced their Phi-4 language model. You can get t...]]></description>
                        <content:encoded><![CDATA[<p>If you are like me, are you tinkering around with self-hosted small language models and running these locally? I noticed that Microsoft open-sourced their Phi-4 language model. You can get this up and running fairly quickly with LM Studio. </p>
<p>Just in case you didn't know, LM Studio is a free tool that provides a really nice GUI for working with local language models and allows you to easily download and run them on your own hardware.</p>
<p>You can download LM Studio here: <a href="https://lmstudio.ai/">LM Studio - Discover, download, and run local LLMs</a></p>
<p>Once you have it downloaded, you can browse for new models. When I pulled it up, the phi-4 model was actually listed at the top of the list (probably because it is new). However, if you don't see it there, you can search for it.</p>
<p>It is a little hefty at a 9 GB download, but it is still amazing to me how much power these models pack; 9 GB is not that large considering the training that went into it. Once you have the model downloaded, you will see the button to the side to <strong>Load Model</strong>.</p>
<p>This will automatically load the model into LM Studio. If you don't load it here, you can still do it later: when you go to your Chat screen, it will give you the option to load a model, where you can select the phi-4 model. Once the model is loaded, simply navigate to the chat screen on the left that looks like a "text" icon and start chatting; otherwise, load the model first and then start your chat! So far it looks pretty cool and powerful.</p>
<p>You can take a look at the Hugging Face download here: </p>
]]></content:encoded>
						                            <category domain="https://www.virtualizationhowto.com/community/artificial-intelligence-ai/">Artificial Intelligence (AI)</category>                        <dc:creator>Brandon Lee</dc:creator>
                        <guid isPermaLink="true">https://www.virtualizationhowto.com/community/artificial-intelligence-ai/microsoft-phi-4-download-and-run-a-self-hosted-small-language-model-locally/</guid>
                    </item>
							        </channel>
        </rss>
		