Just saw the latest Ollama release and thought it was worth sharing with the community. They've reworked how model scheduling and memory management work, and it looks like a real step forward for stability and performance.
Instead of estimating memory requirements like it did before, Ollama now measures exactly how much memory a model needs before running it. That means:
- Way fewer crashes from out-of-memory errors
- Better GPU utilization (more memory goes where it’s needed, which boosts token speeds)
- Smarter scheduling across multiple GPUs, even if they’re mismatched
- Memory reporting in nvidia-smi now lines up with ollama ps, so usage tracking actually makes sense
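One easy way to sanity-check that last point is to compare what Ollama says it has loaded against what the GPU reports. Here's a rough Python sketch, assuming Ollama is running on the default localhost:11434 and nvidia-smi is on your PATH; the field names come from the /api/ps endpoint as I understand it, so treat this as a sketch rather than gospel:

```python
import json
import subprocess
import urllib.request

OLLAMA_PS_URL = "http://localhost:11434/api/ps"  # default Ollama endpoint

def ollama_vram_usage():
    """Return (model name, VRAM bytes) pairs for models Ollama currently has loaded."""
    with urllib.request.urlopen(OLLAMA_PS_URL) as resp:
        data = json.load(resp)
    return [(m["name"], m.get("size_vram", 0)) for m in data.get("models", [])]

def nvidia_smi_usage_mib():
    """Return per-GPU memory usage in MiB as reported by nvidia-smi."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return [int(line) for line in out.stdout.splitlines() if line.strip()]

if __name__ == "__main__":
    for name, vram in ollama_vram_usage():
        print(f"{name}: {vram / 1024**2:.0f} MiB reported by Ollama")
    for i, used in enumerate(nvidia_smi_usage_mib()):
        print(f"GPU {i}: {used} MiB used according to nvidia-smi")
```

With the new scheduling, the two numbers should be much closer than they used to be (minus whatever other processes are using the GPU).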
A couple of benchmarks they shared illustrate the difference:
gemma3:12b on a single RTX 4090 (128k context)
- Old: 52 tokens/s → New: 85 tokens/s
- Old: 48/49 layers on GPU → New: 49/49 layers on GPU
mistral-small3.2 with 2× RTX 4090s (32k context)
- Old: 127 tokens/s prompt eval → New: 1380 tokens/s
- Old: 43 tokens/s generation → New: 55 tokens/s
These enhancements don't cover every model yet. So far the new scheduling applies to gpt-oss, llama4, gemma3, qwen3, mistral-small3.2, and the all-minilm embedding models, and they mention that more are on the way.
From the numbers they have posted, this looks like a pretty huge leap forward, especially for anyone running long contexts or trying to squeeze performance out of multiple GPUs. You can read more about it here: New model scheduling · Ollama Blog
Anyone here tried this update yet? Curious if you’re seeing the same gains they’ve shown in the benchmarks.
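For anyone who wants to compare, here's a minimal Python sketch of how I'd measure speeds locally. It assumes the standard Ollama REST API on localhost:11434 and uses the eval_count/eval_duration style timing fields that /api/generate returns (durations in nanoseconds, as far as I know); swap in your own model name and context size:

```python
import json
import urllib.request

OLLAMA_GENERATE = "http://localhost:11434/api/generate"  # default Ollama endpoint

def measure_tokens_per_second(model: str, prompt: str, num_ctx: int = 32768):
    """Run one non-streaming generation and compute prompt-eval and generation speed."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context window for the run
    }).encode()
    req = urllib.request.Request(
        OLLAMA_GENERATE, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)
    # Durations are reported in nanoseconds; guard against zero/missing values.
    prompt_ns = result.get("prompt_eval_duration", 0) or 1
    gen_ns = result.get("eval_duration", 0) or 1
    prompt_tps = result.get("prompt_eval_count", 0) / (prompt_ns / 1e9)
    gen_tps = result.get("eval_count", 0) / (gen_ns / 1e9)
    return prompt_tps, gen_tps

if __name__ == "__main__":
    prompt_tps, gen_tps = measure_tokens_per_second(
        "gemma3:12b", "Summarize the history of the transistor in a few paragraphs."
    )
    print(f"prompt eval: {prompt_tps:.1f} tokens/s, generation: {gen_tps:.1f} tokens/s")
```

Running that before and after the update (same model, same context setting) should make it pretty obvious whether you're getting the kind of gains they're reporting.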