Crown Citadel

AMD Ryzen AI Max+ 395 128 GiB local LLM benchmarks

Host ciru • NixOS • Linux 6.19.9 • llama.cpp Vulkan • 16C / 32T • 64 / 64 split

Latest • 128 GiB UMA • llama.cpp Vulkan • -ngl 999 -fa 1

Size / Speed Frontier

Model size on the x-axis, generation speed on the y-axis, bubble area scaled by VRAM use. This is the main “what actually wins here?” chart.

Comparison context: sets the active context referenced by the charts below.

Context Retention

Prompt-processing retention versus each model’s own 4096 baseline. Brighter is more stable as context grows.
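
For reference, the retention metric is just each context's prompt-processing throughput divided by the same model's own 4096-token result. A minimal sketch; the numbers in the example are illustrative, not measurements from this table:

```python
def pp_retention(pp_by_ctx: dict[int, float], baseline_ctx: int = 4096) -> dict[int, float]:
    """Prompt-processing retention relative to the model's own baseline context."""
    baseline = pp_by_ctx[baseline_ctx]
    return {ctx: pp / baseline for ctx, pp in sorted(pp_by_ctx.items())}

# Illustrative numbers only:
print(pp_retention({4096: 900.0, 16384: 720.0, 32768: 540.0}))
# {4096: 1.0, 16384: 0.8, 32768: 0.6}
```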

Memory Envelope

VRAM, system RAM, and GTT at the active context, sorted by VRAM pressure.

Prompt vs Generate

Raw throughput view at the active context. Symbol size tracks model size, not memory.

Scaling Curves

Prompt-processing throughput across all tested contexts, one line per model family entry.

Setup and caveats
Machine details, runtime settings, and the benchmark notes that meaningfully affect interpretation.

System

  • Host ciru, NixOS, Linux 6.19.9
  • AMD Ryzen AI Max+ 395, 16 cores / 32 threads
  • AMD Radeon 8060S-class Strix Halo iGPU
  • 128 GiB total RAM, currently split 64 / 64
  • Reported pools: 64 GiB VRAM, 31.2 GiB GTT, about 62 GiB system RAM (see the query sketch below)
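
If you want to check these pools on your own Strix Halo box, the amdgpu driver exposes them through sysfs. A minimal sketch, assuming the iGPU enumerates as card0 (adjust the path otherwise):

```python
from pathlib import Path

GIB = 1024 ** 3
DEV = Path("/sys/class/drm/card0/device")  # assumption: the iGPU is card0

def sysfs_bytes(name: str) -> int:
    """Read one of amdgpu's mem_info_* counters (reported in bytes)."""
    return int((DEV / name).read_text())

def mem_total_bytes() -> int:
    """MemTotal from /proc/meminfo, converted from KiB to bytes."""
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) * 1024
    raise RuntimeError("MemTotal not found")

print(f"VRAM total: {sysfs_bytes('mem_info_vram_total') / GIB:.1f} GiB")
print(f"GTT total:  {sysfs_bytes('mem_info_gtt_total') / GIB:.1f} GiB")
print(f"System RAM: {mem_total_bytes() / GIB:.1f} GiB")
```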

Runtime

  • llama.cpp with Vulkan via llama-cpp-vulkan
  • Pinned commit 08f21453aec846867b39878500d725a05bd32683
  • Common benchmark flags: -ngl 999 -fa 1 -n 128 -r 2 -o md
  • Guard reserves (memory kept free as a safety margin): 8 GiB RAM, 2 GiB VRAM, 4 GiB GTT
  • Typical contexts: 4096, 16384, 32768, with long checks to 100000+ (see the sweep sketch below)
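
A minimal sweep driver in the spirit of the flags above, assuming the llama-bench tool from the pinned build and that context size is swept through its -p (prompt tokens) flag; the model path is a placeholder, and guard-reserve checks are omitted:

```python
import subprocess

MODEL = "model.gguf"  # placeholder, not a path from this repo
CONTEXTS = [4096, 16384, 32768]
COMMON = ["-ngl", "999", "-fa", "1", "-n", "128", "-r", "2", "-o", "md"]

for ctx in CONTEXTS:
    # One llama-bench run per context size, with the common flags above.
    cmd = ["llama-bench", "-m", MODEL, "-p", str(ctx), *COMMON]
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)
```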

Important notes

  • Gemma 4 required a newer local Vulkan-enabled build because the older system build lacked gemma4 support.
  • Gemma 4 31B Dense was manually stopped at 32768, so it is excluded from charted comparison points beyond its measured contexts.
  • Qwen3-Coder-Next is plotted as 80B-class on the size axis to reflect the benchmark notes in this repo.
Raw result matrix
Every benchmark point used in the curated charts, including the larger long-context runs.
| Date | Model | Family | Size | Context | PP t/s | TG t/s | VRAM | RAM | GTT |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |