Leaderboard result

Canonical score: 152.7 tok/s
Class: C

Public class

Placed #23 / 37 globally under the canonical benchmark profile.

Best fit today: Coding · General chat · Math / reasoning · Embeddings.

GPU: NVIDIA GeForce GTX 1080
Model: TinyLlama 1.1B (Q4_K_M)
Quant: Q4_K_M
Date: Mar 29, 2026
Workload & profile: measured benchmark configuration
Model: TinyLlama 1.1B (Q4_K_M)
Method: Q4_K_M
Profile: Canonical profile
Recommended workloads: best fit today
Coding
General chat
Math / reasoning
Embeddings

Use the sections below if you want the full fit matrix and planner-backed next step.

Details & explanation: context and scoring rules
Run meaning: official scored run

This run sets an official score, class, and public ranking under the canonical benchmark profile.

Trust & system context: Verified GPU (Leaderboard Eligible)

This is the canonical number the leaderboard uses for this run.

Vulkan backend · Discrete GPU memory · First limit: memory headroom

Step 01

Measured output

The official score plus the few supporting numbers that explain how this run behaved.

Decode throughput: 152.7 tok/s (scored pass, -45.9% vs previous)
The official decoded token rate from the scored pass. This is the canonical number the leaderboard uses.
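The decoded token rate is simply tokens generated divided by decode wall time. A minimal sketch (the token count below is hypothetical, back-computed from the reported rate and pass duration):

```python
def decode_throughput(tokens_generated: float, decode_seconds: float) -> float:
    """Decoded tokens per second over a scored pass."""
    return tokens_generated / decode_seconds

# An 8 s scored pass at roughly 152.7 tok/s implies about 1220 generated tokens.
rate = decode_throughput(1222, 8.0)
```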


Profile: TinyLlama 1.1B (Q4_K_M)
Standing: #23 / 37 global
Prefill throughput: 2353.2 tok/s
How fast the model processes the prompt before it starts generating tokens. Higher is better because setup latency drops.
TTFT p50: 336 ms
Time to first token at the 50th percentile. Lower is better because the first visible response arrives sooner.
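TTFT is dominated by prompt prefill, so a rough model is prompt tokens divided by prefill throughput plus a fixed overhead term. A hedged sketch (the overhead term and prompt length are assumptions, not reported values):

```python
def estimate_ttft_ms(prompt_tokens: int, prefill_tok_s: float,
                     overhead_ms: float = 0.0) -> float:
    """Rough time-to-first-token: prompt prefill time plus fixed overhead."""
    return prompt_tokens / prefill_tok_s * 1000.0 + overhead_ms

# At 2353.2 tok/s prefill, a 336 ms TTFT is consistent with a prompt of
# roughly 0.336 s * 2353.2 tok/s ≈ 790 tokens, ignoring overhead.
implied_ttft = estimate_ttft_ms(790, 2353.2)
```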
Effective model memory: 6.4 GB
The practical memory footprint seen by the benchmark for this run. This is the memory number that matters most for fit planning.
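For fit planning, the question is whether the effective footprint fits under VRAM with some safety margin. A minimal sketch; the 10% headroom fraction is an assumption, not part of the benchmark's methodology:

```python
def fits_in_vram(effective_model_gb: float, vram_gb: float,
                 headroom_fraction: float = 0.1) -> bool:
    """True if the model's effective footprint leaves the requested headroom."""
    return effective_model_gb <= vram_gb * (1.0 - headroom_fraction)

# This run: 6.4 GB effective memory on a 7.9 GB card.
print(fits_in_vram(6.4, 7.9))  # prints True
```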
Scored pass duration: 8 s
Length of the strict pass that produced the official score shown on this page.
Stability range (decode / TTFT): 0.49% / 0.32%
Variation across strict passes. Smaller ranges mean the run was more stable and repeatable.
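Assuming "stability range" means the spread of a metric across strict passes relative to its mean (the exact definition is not stated on this page), it can be computed as:

```python
def stability_range_pct(samples: list[float]) -> float:
    """Spread across passes as a percentage of the mean (assumed definition)."""
    mean = sum(samples) / len(samples)
    return (max(samples) - min(samples)) / mean * 100.0

# Three hypothetical decode-throughput passes clustered near 152.7 tok/s:
spread = stability_range_pct([152.3, 152.7, 153.05])
```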

Step 02

What this run supports

This is the plain-language support map for the current build: what runs well now, what is close, and what still needs bigger hardware.

Ready now: 4 workload lanes run well

Coding · General chat · Math / reasoning · Embeddings are the clearest comfortable fits on this machine today.

Closest stretch: nothing is one small step away

The current build already covers the best near-term lanes, so there is no obvious single stretch target.

Still outside envelope: no lane is fully off-limits

Nothing here is completely out of reach at the workload-lane level.


Using the local model catalog for this guidance.

Coding: runs well now
25% of tracked models fit now
Works well now: coding works well on this build

Good starting points: Granite Code 3B · StarCoder2 3B · Qwen3 4B Instruct.

What more headroom changes: an upgrade adds breathing room

More headroom would make this lane feel safer, but it would not unlock a new class of models yet.

Main limit under review · Based on local model catalog

General chat: runs well now
5% of tracked models fit now
Works well now: general chat works well on this build

Good starting points: Llama 3.2 1B Instruct · TinyLlama 1.1B · Falcon3 1B Instruct.

What more headroom changes: an upgrade adds breathing room

More headroom would make this lane feel safer, but it would not unlock a new class of models yet.

Main limit under review · Based on local model catalog

Math / reasoning: runs well now
4% of tracked models fit now
Works well now: math / reasoning works well on this build

Good starting points: Llama 3.2 1B Instruct · TinyLlama 1.1B · Falcon3 1B Instruct.

What more headroom changes: an upgrade adds breathing room

More headroom would make this lane feel safer, but it would not unlock a new class of models yet.

Main limit under review · Based on local model catalog

Embeddings: runs well now
100% of tracked models fit now
Works well now: embeddings work well on this build

Good starting points: all-minilm · Nomic Embed Text v1.5 · mxbai-embed-large.

What more headroom changes: an upgrade adds breathing room

More headroom would make this lane feel safer, but it would not unlock a new class of models yet.

Main limit under review · Based on local model catalog
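The per-lane "X% of tracked models fit now" figures can be reproduced by filtering a model catalog against the run's effective memory budget. A sketch with a hypothetical catalog; the real entries and footprints come from the local model catalog the planner uses:

```python
def lane_fit_pct(catalog: list[dict], lane: str, budget_gb: float) -> float:
    """Share of tracked models in a lane whose footprint fits the memory budget."""
    lane_models = [m for m in catalog if m["lane"] == lane]
    if not lane_models:
        return 0.0
    fitting = [m for m in lane_models if m["mem_gb"] <= budget_gb]
    return 100.0 * len(fitting) / len(lane_models)

# Hypothetical entries and footprints, for illustration only.
catalog = [
    {"name": "Granite Code 3B", "lane": "coding",     "mem_gb": 2.5},
    {"name": "StarCoder2 3B",   "lane": "coding",     "mem_gb": 2.4},
    {"name": "Qwen3 32B",       "lane": "coding",     "mem_gb": 20.0},
    {"name": "all-minilm",      "lane": "embeddings", "mem_gb": 0.1},
]
pct = lane_fit_pct(catalog, "coding", 6.4)  # ~66.7 for this toy catalog
```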

Step 03

Primary constraint

This is the clearest limit in the current build, how we know it, and why the recommendation points there.

Primary constraint: memory is the first thing this build runs out of

This run had 6.4 GB of effective model memory. That is enough for smaller local AI workloads, but larger coding and reasoning models run out of room before raw speed becomes the problem.

What hits first: 6.4 GB effective model memory

Bigger coding and reasoning families run out of room before speed becomes the main issue.
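Why bigger families hit memory first: Q4_K_M stores weights at roughly 4.5 bits each, so a weight-only footprint estimate is parameters × 4.5 / 8 bytes, with KV cache and runtime overhead on top. The bits-per-weight figure is an approximation, not a value reported by this run:

```python
def q4_k_m_weights_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate weight-only footprint of a Q4_K_M model, in GB."""
    return params_billion * bits_per_weight / 8.0

# Weight-only estimates against this run's 6.4 GB effective budget:
# a 7B model leaves room, while 13B weights alone already exceed it.
for size in (1.1, 7.0, 13.0):
    print(f"{size}B -> ~{q4_k_m_weights_gb(size):.1f} GB")
```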

What still works: Coding · General chat · Math / reasoning · Embeddings

Smaller local AI workloads are already comfortable on this build today.

Why we recommend this: prioritize more memory headroom

That is the fastest way to move from smaller models to larger ones without changing the whole character of the build. More memory headroom unlocks the next tier of model families sooner than chasing speed alone.

No workload lane is fully blocked. This is why Step 04 centers the NVIDIA ladder on "Keep current GPU".
Trust and system context

Compact evidence behind why this run should be believed and how the machine was identified.

Execution profile: Verified GPU benchmark
Validation status: Verified
Run quality: Stable
Methodology: v1.1
GPU: NVIDIA GeForce GTX 1080
CPU: Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
VRAM / RAM: 7.9 / 63.9 GB
OS: Windows 11 24H2 (Build 26300)
Client version: 0.1.0
Driver: Not reported

Optional depth

Planner, catalog, and audit trail

The report above stays focused on the outcome. Open this workspace when you want to compare model families, browse the tracked catalog, or inspect the pass evidence behind the score.

Planner snapshot: 5 tracked coding fits clear now

25% of tracked coding fits clear on this machine today.

Coding · 25% clear now · Based on local model catalog
Open this only when you need the deeper planning layer
