Leaderboard result

Canonical score: 247.1 tok/s
Class: B (public class)

Placed #9 / 27 globally under the canonical benchmark profile.

Best used as a baseline before moving on to larger local AI targets.

GPU: AMD Radeon(TM) Graphics
Model: TinyLlama 1.1B (Q4_K_M)
Quant: Q4_K_M
Date: Apr 30, 2026

Workload & profile: Measured benchmark configuration
Model: TinyLlama 1.1B (Q4_K_M)
Method: Q4_K_M
Profile: Canonical profile
Recommended workloads: Current envelope is still tight
Base inference only

Use the sections below if you want the full fit matrix and planner-backed next step.

Details & explanation: Context and scoring rules
Run meaning: Official scored run

This run sets an official score, class, and public ranking under the canonical benchmark profile.

Trust & system context: Verified GPU (Leaderboard Eligible)

This is the canonical number the leaderboard uses for this run.

Vulkan backend · Discrete GPU memory · First limit: Memory headroom

Step 01

Measured output

The official score plus the few supporting numbers that explain how this run behaved.

Decode throughput (scored pass): 247.1 tok/s
The official decoded token rate from the scored pass. This is the canonical number the leaderboard uses.

Profile: TinyLlama 1.1B (Q4_K_M)
Standing: #9 / 27 global
Prefill throughput: 11232.9 tok/s
How fast the model processes the prompt before it starts generating tokens. Higher is better because setup latency drops.
TTFT p50: 5 ms
Time to first token at the 50th percentile. Lower is better because the first visible response arrives sooner.
Effective model memory: 0.0 GB
The practical memory footprint seen by the benchmark for this run. This is the memory number that matters most for fit planning.
Scored pass duration: 4 s
Length of the strict pass that produced the official score shown on this page.
Stability range (decode / TTFT): 0.32% / 6.47%
Variation across strict passes. Smaller ranges mean the run was more stable and repeatable.
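
The metrics above can be related to raw per-pass measurements with a little arithmetic. The following is a minimal sketch, not the benchmark's own code: the function names, the sample values, and the stability formula (spread as a percentage of the mean) are assumptions, since the report does not publish its exact definitions.

```python
from statistics import mean, median


def decode_throughput(tokens_generated: int, decode_seconds: float) -> float:
    """Tokens generated per second of decode time."""
    if decode_seconds <= 0:
        raise ValueError("decode_seconds must be positive")
    return tokens_generated / decode_seconds


def ttft_p50_ms(ttft_samples_ms: list[float]) -> float:
    """50th percentile (median) of time-to-first-token samples, in ms."""
    return median(ttft_samples_ms)


def stability_range_pct(values: list[float]) -> float:
    """Spread across passes as a percentage of the mean.

    Assumption: computed as (max - min) / mean * 100. The report does not
    state its formula, so this is only one plausible definition.
    """
    return (max(values) - min(values)) / mean(values) * 100.0


if __name__ == "__main__":
    # Hypothetical per-pass samples, chosen only to illustrate the arithmetic.
    decode_passes = [246.8, 247.1, 247.6]
    ttft_passes_ms = [4.9, 5.0, 5.2]

    print(f"decode: {decode_throughput(988, 4.0):.1f} tok/s")
    print(f"TTFT p50: {ttft_p50_ms(ttft_passes_ms):.1f} ms")
    print(f"decode stability: {stability_range_pct(decode_passes):.2f}%")
    print(f"TTFT stability: {stability_range_pct(ttft_passes_ms):.2f}%")
```

Expressing the spread relative to the mean keeps the decode and TTFT figures comparable even though one is measured in tok/s and the other in milliseconds.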

Step 02

What this run supports

This is the plain-language support map for the current build: what runs well now, what is close, and what still needs bigger hardware.

Ready now: No strong lane yet

This workload still needs more memory or speed before it becomes practical.

Base inference only
Closest stretch: Nothing is one small step away

The current build already covers the best near-term lanes, so there is no obvious single stretch target.

No clear stretch lane
Still outside envelope: Coding · General chat · Math / reasoning · Embeddings

These workloads still need more memory or speed than this machine can comfortably provide.

Using the local model catalog for this guidance.

Coding: Not a good fit yet
No strong fits in the tracked set yet
Works well now: Coding still needs more hardware

This workload still needs more memory or speed before it becomes practical.

What more headroom changes: A targeted upgrade opens more options

The clearest next unlocks are Granite Code 3B · StarCoder2 3B · Qwen3 4B Instruct.

Main limit under review · Based on local model catalog

General chat: Not a good fit yet
No strong fits in the tracked set yet
Works well now: General chat still needs more hardware

This workload still needs more memory or speed before it becomes practical.

What more headroom changes: A targeted upgrade opens more options

The clearest next unlocks are Llama 3.2 1B Instruct · TinyLlama 1.1B · Falcon3 1B Instruct.

Main limit under review · Based on local model catalog

Math / reasoning: Not a good fit yet
No strong fits in the tracked set yet
Works well now: Math / reasoning still needs more hardware

This workload still needs more memory or speed before it becomes practical.

What more headroom changes: A targeted upgrade opens more options

The clearest next unlocks are Llama 3.2 1B Instruct · TinyLlama 1.1B · Falcon3 1B Instruct.

Main limit under review · Based on local model catalog

Embeddings: Not a good fit yet
No strong fits in the tracked set yet
Works well now: Embeddings still needs more hardware

This workload still needs more memory or speed before it becomes practical.

What more headroom changes: A targeted upgrade opens more options

The clearest next unlocks are all-minilm · Nomic Embed Text v1.5 · mxbai-embed-large.

Main limit under review · Based on local model catalog

Step 03

Primary constraint

This is the clearest limit in the current build, how we know it, and why the recommendation points there.

Primary constraint: Memory is the first thing this build runs out of

This run had 0.0 GB of effective model memory. That is enough for smaller local AI workloads, but larger coding and reasoning models run out of room before raw speed becomes the problem.

What hits first: 0.0 GB effective model memory

Bigger coding and reasoning families run out of room before speed becomes the main issue.

What still works: This build is best used for smaller local AI targets

Smaller local AI workloads are already comfortable on this build today.

Why we recommend this: Prioritize more memory headroom

That is the fastest way to move from smaller models to larger ones without changing the whole character of the build.

Why this recommendation: More memory headroom unlocks the next tier of models

That is why the recommendation points to more memory headroom first. It unlocks larger model families sooner than chasing speed alone.
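
To make the memory argument concrete, here is a rough sketch of how a quantized model's weight footprint can be estimated and compared against available memory. It is an illustration under stated assumptions, not the planner's actual logic: the figure of about 4.85 bits per weight for Q4_K_M is an approximation, and the `overhead_gb` allowance for KV cache and runtime buffers is a hypothetical parameter.

```python
# Assumption: approximate average bits per weight for Q4_K_M quantization.
Q4_K_M_BITS_PER_WEIGHT = 4.85


def estimated_weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8.0


def fits(required_gb: float, available_gb: float, overhead_gb: float = 0.5) -> bool:
    """True if weights plus an assumed runtime overhead fit in memory.

    overhead_gb is a hypothetical allowance for KV cache and buffers.
    """
    return required_gb + overhead_gb <= available_gb


if __name__ == "__main__":
    # Parameter counts are nominal sizes; the budgets are hypothetical.
    for label, params_b in [("1.1B model", 1.1), ("3B model", 3.0)]:
        need = estimated_weight_gb(params_b, Q4_K_M_BITS_PER_WEIGHT)
        for budget in (2.0, 4.0):
            verdict = "fits" if fits(need, budget) else "does not fit"
            print(f"{label}: ~{need:.2f} GB weights, {verdict} in {budget:.1f} GB")
```

Because the weight footprint grows linearly with parameter count, moving from a 1B-class to a 3B-class model roughly triples the memory needed, which is why memory headroom tends to become the limit before decode speed does.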

Coding · General chat · Math / reasoning · Embeddings
Step 04 is still rebuilding the ladder from the measured envelope of this run.

Trust and system context

Compact evidence behind why this run should be believed and how the machine was identified.

Execution profile: Verified GPU benchmark
Validation status: Verified
Run quality: Stable
Methodology: v1.1
GPU: AMD Radeon(TM) Graphics
CPU: AMD Ryzen 5 7640HS w/ Radeon 760M Graphics
VRAM / RAM: 0.4 / 31.3 GB
OS: Windows 11 24H2 (Build 26200)
Client version: 0.1.0
Driver: Not reported

Optional depth

Planner, catalog, and audit trail

The report above stays focused on the outcome. Open this workspace when you want to compare model families, browse the tracked catalog, or inspect the pass evidence behind the score.

Planner snapshot: Current envelope is the clearest signal

0% of tracked coding fits clear on this machine today.

Coding · 0% clear now · Based on local model catalog
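
One way a "percent clear" figure like this could be computed is sketched below. This assumes a model clears when its required memory is at or below the machine's effective model memory; the report does not state its actual rule. The catalog entries and their memory requirements are hypothetical placeholders, not values from the tracked catalog.

```python
def percent_clear(required_gb_by_model: dict[str, float], budget_gb: float) -> float:
    """Share of models (in percent) whose memory need is within the budget."""
    if not required_gb_by_model:
        return 0.0
    clear = sum(1 for need in required_gb_by_model.values() if need <= budget_gb)
    return 100.0 * clear / len(required_gb_by_model)


if __name__ == "__main__":
    # Hypothetical catalog: names and memory needs are illustrative only.
    catalog = {
        "coding-model-a": 2.0,
        "coding-model-b": 2.5,
        "coding-model-c": 3.0,
        "coding-model-d": 4.0,
    }
    # With the 0.0 GB effective model memory reported for this run,
    # no entry with a positive requirement can clear.
    print(percent_clear(catalog, budget_gb=0.0))  # 0.0
    print(percent_clear(catalog, budget_gb=2.5))  # 50.0
```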
Open this only when you need the deeper planning layer
