Methodology

The public benchmark policy behind verified VRAM Check results.

This page explains the shared benchmark profile, score construction, trust states, class system, and leaderboard eligibility rules so users can understand exactly what a result means before they compare or share it.

Canonical profile: TinyLlama 1.1B Q4_K_M

The current public benchmark profile is intentionally fixed so verified runs can be compared consistently.

Verification path: official release, published checksum, local verification

Official releases can be verified locally with `vramcheck verify-release --file .\vramcheck-windows.exe`.

Score flow

How the public score is constructed.

A VRAM Check score is not a synthetic composite. It is a real inference result governed by a canonical profile and a public trust model.

Canonical profile

VRAM Check uses one fixed TinyLlama 1.1B Q4_K_M profile so verified runs stay comparable across systems and over time.

Primary score

Decode throughput is the canonical score. Prefill throughput and time to first token (TTFT) remain visible because they describe different parts of the inference experience.

Strict pass selection

The benchmark runs multiple passes and selects the representative pass closest to the multi-metric median.
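A minimal sketch of this selection, assuming the multi-metric median is the per-metric median of decode throughput, prefill throughput, and TTFT, and that "closest" means the smallest sum of relative deviations from those medians. The `Pass` type, its field names, and the distance function are hypothetical; the actual selection rule may weight the metrics differently.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Pass:
    # Hypothetical per-pass measurements.
    decode_tps: float   # decode throughput, tokens/s
    prefill_tps: float  # prefill throughput, tokens/s
    ttft_ms: float      # time to first token, milliseconds


def select_representative(passes: list[Pass]) -> Pass:
    """Return the pass closest to the per-metric medians (assumed distance rule)."""
    medians = {
        "decode_tps": median(p.decode_tps for p in passes),
        "prefill_tps": median(p.prefill_tps for p in passes),
        "ttft_ms": median(p.ttft_ms for p in passes),
    }

    def distance(p: Pass) -> float:
        # Relative deviations keep metrics with different units on the same scale.
        # Assumes every median is positive, which holds for real timings.
        return sum(abs(getattr(p, name) - m) / m for name, m in medians.items())

    return min(passes, key=distance)


passes = [
    Pass(decode_tps=101.0, prefill_tps=2050.0, ttft_ms=41.0),
    Pass(decode_tps=98.0, prefill_tps=1990.0, ttft_ms=44.0),
    Pass(decode_tps=100.0, prefill_tps=2010.0, ttft_ms=42.0),
]
print(select_representative(passes))
# → Pass(decode_tps=100.0, prefill_tps=2010.0, ttft_ms=42.0)
```

Picking a real pass near the median, rather than reporting the median of each metric separately, keeps the published decode, prefill, and TTFT numbers consistent with one actual run.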

Stability gating

Decode, prefill, and TTFT variance are measured so unstable or noisy runs do not enter the competitive board as if they were equally trustworthy.
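One way to express such a gate, assuming variance is measured as the coefficient of variation (standard deviation divided by mean) across strict passes. The threshold values in `MAX_CV` are invented for illustration and are not the active policy thresholds.

```python
from statistics import mean, stdev

# Hypothetical thresholds: maximum coefficient of variation allowed per metric.
MAX_CV = {"decode_tps": 0.03, "prefill_tps": 0.05, "ttft_ms": 0.10}


def coefficient_of_variation(values: list[float]) -> float:
    # Sample standard deviation relative to the mean; needs at least two passes.
    return stdev(values) / mean(values)


def is_stable(samples: dict[str, list[float]]) -> bool:
    """True only if every gated metric stays within its variance threshold."""
    return all(coefficient_of_variation(samples[name]) <= limit for name, limit in MAX_CV.items())


steady = {
    "decode_tps": [101.0, 99.0, 100.0],
    "prefill_tps": [2050.0, 1990.0, 2010.0],
    "ttft_ms": [41.0, 44.0, 42.0],
}
noisy = dict(steady, decode_tps=[120.0, 80.0, 100.0])  # same run, erratic decode
print(is_stable(steady), is_stable(noisy))
# → True False
```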

Metrics

What each visible metric is trying to tell you.

Decode throughput

The primary leaderboard score. It reflects sustained token generation speed after the model is already running.

Prefill throughput

How quickly the model processes prompt tokens before generation begins. This is critical for long prompts and larger contexts.

TTFT p50

Median time-to-first-token across the canonical prompts. Lower is better because it reflects interaction latency, not just throughput.
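A sketch of how these three numbers could be derived from per-prompt timings. The `PromptTiming` fields are hypothetical, and throughput is assumed to be aggregated as total tokens over total time rather than as an average of per-prompt rates.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PromptTiming:
    # Hypothetical per-prompt measurements.
    prompt_tokens: int   # tokens processed during prefill
    prefill_s: float     # wall time spent on prefill
    ttft_s: float        # time from request to first generated token
    decode_tokens: int   # tokens generated after the first token
    decode_s: float      # wall time spent generating those tokens


def summarize(timings: list[PromptTiming]) -> dict[str, float]:
    return {
        # Sustained generation speed: total decode tokens over total decode time.
        "decode_tps": sum(t.decode_tokens for t in timings) / sum(t.decode_s for t in timings),
        # Prompt processing speed: total prompt tokens over total prefill time.
        "prefill_tps": sum(t.prompt_tokens for t in timings) / sum(t.prefill_s for t in timings),
        # Median latency to the first token across prompts, in milliseconds.
        "ttft_p50_ms": median(t.ttft_s for t in timings) * 1000,
    }


timings = [
    PromptTiming(prompt_tokens=512, prefill_s=0.25, ttft_s=0.3125, decode_tokens=125, decode_s=1.25),
    PromptTiming(prompt_tokens=256, prefill_s=0.125, ttft_s=0.1875, decode_tokens=125, decode_s=1.25),
]
print(summarize(timings))
# → {'decode_tps': 100.0, 'prefill_tps': 2048.0, 'ttft_p50_ms': 250.0}
```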

Session total vs canonical pass time

Total session time includes setup, warmup, and every strict pass. Canonical pass time covers only the selected pass that supplies the scored result.
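The distinction reduces to simple arithmetic. The phase durations and the selected pass index below are hypothetical values, not measurements.

```python
# Hypothetical phase durations for a single benchmark session, in seconds.
setup_s = 4.0
warmup_s = 2.5
strict_pass_s = [1.5, 1.75, 1.5]  # every strict pass that ran
selected_index = 0                # the representative pass chosen for scoring

# Session total counts everything that ran; canonical pass time counts one pass.
session_total_s = setup_s + warmup_s + sum(strict_pass_s)
canonical_pass_s = strict_pass_s[selected_index]
print(session_total_s, canonical_pass_s)
# → 11.25 1.5
```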

Trust states

Why not every useful run becomes a competitive rank.

Verified GPU

Acceleration is active, the canonical profile ran, and the run carries the verification metadata expected by the backend.

Verified provisional

The run is real and verified but did not satisfy the active leaderboard gate, usually because of stability or eligibility policy.

Compatibility

The run is useful for practical guidance but does not represent an officially ranked accelerated benchmark.

Estimated

A fast exploratory fit result that never enters the leaderboard and should never be interpreted as verified throughput.

Leaderboard eligibility

What a run must satisfy before it can claim competitive position.

  • Runtime acceleration must be active and the execution profile must be verified GPU.
  • The canonical profile must complete successfully with the expected verification metadata.
  • Strict stability checks must stay within the active decode, prefill, and TTFT policy thresholds.
  • The backend must classify the run as eligible, not provisional or compatibility.
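The checklist and the trust states can be read together as a small decision procedure. This is a hypothetical sketch: the `RunReport` fields are invented, and mapping a run with missing acceleration or verification metadata to Compatibility is an assumption rather than documented backend behavior.

```python
from dataclasses import dataclass
from enum import Enum


class TrustState(Enum):
    VERIFIED_GPU = "Verified GPU"
    VERIFIED_PROVISIONAL = "Verified provisional"
    COMPATIBILITY = "Compatibility"
    ESTIMATED = "Estimated"


@dataclass
class RunReport:
    # Hypothetical fields mirroring the eligibility checklist.
    estimated_only: bool               # fast exploratory fit, no real benchmark run
    acceleration_active: bool          # runtime acceleration active, verified GPU profile
    canonical_profile_completed: bool  # canonical profile finished successfully
    verification_metadata_ok: bool     # expected verification metadata is present
    stable: bool                       # decode, prefill, and TTFT within stability thresholds


def classify(run: RunReport) -> TrustState:
    if run.estimated_only:
        return TrustState.ESTIMATED
    if not (run.acceleration_active and run.canonical_profile_completed
            and run.verification_metadata_ok):
        # Assumption: unaccelerated or unverified runs are treated as Compatibility.
        return TrustState.COMPATIBILITY
    if not run.stable:
        # Real and verified, but it fails the active leaderboard gate.
        return TrustState.VERIFIED_PROVISIONAL
    return TrustState.VERIFIED_GPU


def leaderboard_eligible(run: RunReport) -> bool:
    return classify(run) is TrustState.VERIFIED_GPU


noisy_run = RunReport(False, True, True, True, stable=False)
print(classify(noisy_run).value, leaderboard_eligible(noisy_run))
# → Verified provisional False
```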

Class system

Class describes capability. Rank describes field position.

Class

Class is the absolute capability band of the machine under the active policy.

Current public scale: S, A, B, C, D, E.

Rank

Rank is competitive context inside the active leaderboard pool or board segment.

A machine can have a modest class and still place well in a narrow field, or a high class and a weaker competitive position in a crowded frontier board.
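A sketch of the difference, assuming class is derived from decode throughput against fixed floors and rank is a 1-based position among eligible scores. The floor values in `CLASS_FLOORS` are hypothetical; this page does not publish the band boundaries or state which metric defines class.

```python
from bisect import bisect_right

# Hypothetical class floors in decode tokens/s, ascending. Not the published policy.
CLASS_FLOORS = [(0.0, "E"), (20.0, "D"), (50.0, "C"), (100.0, "B"), (200.0, "A"), (400.0, "S")]


def capability_class(decode_tps: float) -> str:
    """Absolute band: depends only on this machine's own score."""
    if decode_tps < 0:
        raise ValueError("throughput cannot be negative")
    floors = [floor for floor, _ in CLASS_FLOORS]
    return CLASS_FLOORS[bisect_right(floors, decode_tps) - 1][1]


def rank_in_pool(decode_tps: float, pool: list[float]) -> int:
    """Field position: 1 + number of eligible scores strictly higher (ties share a rank)."""
    return 1 + sum(1 for score in pool if score > decode_tps)


narrow_pool = [95.0, 60.0, 120.0]
crowded_pool = [450.0, 310.0, 260.0, 180.0, 150.0, 120.0]
print(capability_class(120.0), rank_in_pool(120.0, narrow_pool), rank_in_pool(120.0, crowded_pool))
# → B 1 6
```

The same 120 tokens/s machine keeps class B in both pools, yet ranks first in the narrow field and sixth in the crowded one.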

Verification path

What users can validate locally before they trust the binary.

Download official release -> compare published .sha256 -> run `vramcheck verify-release --file .\vramcheck-windows.exe` -> benchmark
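The checksum comparison step can also be reproduced with a general-purpose tool. This Python sketch assumes the published `.sha256` file uses the common `sha256sum` layout (`<hex digest>  <filename>`); that layout is an assumption. The sketch complements `vramcheck verify-release` rather than replacing it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(binary: str, checksum_file: str) -> bool:
    """Compare a file's digest with the first token of a sha256sum-style file."""
    # Assumed layout: "<64 hex chars>  <filename>"; only the digest is used.
    expected = Path(checksum_file).read_text(encoding="utf-8").split()[0].lower()
    return sha256_of(binary) == expected
```

For example, `matches_published_checksum("vramcheck-windows.exe", "vramcheck-windows.exe.sha256")` (hypothetical checksum file name) should return `True` before the binary is run; a mismatch means the download should not be trusted.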

The public supported installation path is the official binary release. npm-style global installers are not offered yet because the product still prioritizes verified binary distribution and release clarity over package-manager surface area.