Some metrics can scream cheat
Rage behavior, impossible-looking snaps, or repeated obvious abuse can light up a demo quickly. Those are real signals, but they are not the whole problem.
NullCS is a behavioral review project for Counter-Strike 2 demos. It ranks suspicious players from structured demo signals and returns evidence that can be reviewed, especially in the harder cases where subtle cheating and strong legitimate play start to look closer than they should.

Some demo metrics do scream that something is wrong. The harder problem is the quieter behavior that only starts to separate once timing, context, and process are modeled together.
NullCS analyzes CS2 demos, builds behavior signals, and returns ranked review output that can still be explained under scrutiny.
The project studies whether suspicious behavior can be surfaced more reliably in real demos without flattening the problem into a single loud metric. Some cases are obvious. The harder ones are the lobbies where subtle assistance, strange timing, and strong legitimate play start to sit uncomfortably close together.
Cheater, normal, and pro slices in the current CS2 training stack.
One match expands into many encounter windows and control-path measurements.
The difficult cases are the ones where the evidence has to stay readable even when suspicious behavior and strong legitimate play start to overlap.
Demos are normalized into comparable event and encounter data before any ranking is produced.
The stack can follow usercmd-derived mouse behavior and how those inputs translate into view-angle movement, crosshair acceleration, aim collapse, and post-acquire settling.
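To make those channels concrete, here is a minimal Python sketch of deriving angular speed, acceleration, jerk, and a post-acquire settling measure from tick-aligned view-angle samples. The tick rate, window length, and function names are assumptions for illustration, not the project's actual code.

```python
import numpy as np

TICK_RATE = 64          # assumed tick rate for the example
DT = 1.0 / TICK_RATE

def control_path_channels(yaw: np.ndarray, pitch: np.ndarray) -> dict:
    """Derive basic control-path channels from tick-aligned view angles (degrees)."""
    # Unwrap yaw so 359 -> 1 degree crossings do not look like huge swings.
    yaw = np.degrees(np.unwrap(np.radians(yaw)))

    # Angular velocity, acceleration, and jerk of the combined view-angle path.
    dyaw = np.diff(yaw) / DT
    dpitch = np.diff(pitch) / DT
    speed = np.hypot(dyaw, dpitch)          # deg/s
    accel = np.diff(speed) / DT             # deg/s^2
    jerk = np.diff(accel) / DT              # deg/s^3

    return {"speed": speed, "accel": accel, "jerk": jerk}

def post_acquire_settling(speed: np.ndarray, acquire_tick: int,
                          window_ticks: int = 16) -> float:
    """Mean angular speed in a short window after target acquisition.

    Lower values mean the crosshair goes quiet quickly after the snap.
    """
    window = speed[acquire_tick: acquire_tick + window_ticks]
    return float(window.mean()) if len(window) else float("nan")
```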
The goal is a useful review surface: ranked players, evidence, and context that stays interpretable under scrutiny.
Obvious abuse is not the full problem. The harder review task is telling strong legitimate play apart from lower-visibility cheating without pretending one score can settle it.
High-ELO and pro players produce uncomfortable rounds too. A useful system has to stay quieter there than a noisy model would, or the output stops being actionable.
Aim assist, recoil assist, and information abuse often try to stay close enough to normal play to avoid obvious signatures. That is the gap NullCS is starting to break into.
The public version is meant to show the real system shape: demo input, behavior signals, ranked output, and evidence for review.
Counter-Strike 2 demos are parsed into event, engagement, and encounter structure instead of being judged from clip-level moments or surface stats.
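As a rough picture of that structure, the sketch below shows how parsed events might expand into padded encounter windows. The field names and the 64-tick padding are illustrative assumptions, not the real schema.

```python
from dataclasses import dataclass, field

@dataclass
class DemoEvent:
    """A single parsed game event (illustrative fields, not the real schema)."""
    tick: int
    kind: str                 # e.g. "player_hurt", "player_death", "weapon_fire"
    attacker: str
    victim: str | None = None

@dataclass
class Encounter:
    """A window of ticks around one engagement between two players."""
    attacker: str
    victim: str
    start_tick: int
    end_tick: int
    events: list[DemoEvent] = field(default_factory=list)

def build_encounters(events: list[DemoEvent], pad_ticks: int = 64) -> list[Encounter]:
    """Expand each death event into a padded encounter window."""
    encounters = []
    for ev in events:
        if ev.kind != "player_death" or ev.victim is None:
            continue
        window = Encounter(
            attacker=ev.attacker,
            victim=ev.victim,
            start_tick=ev.tick - pad_ticks,
            end_tick=ev.tick + pad_ticks,
        )
        window.events = [e for e in events
                         if window.start_tick <= e.tick <= window.end_tick]
        encounters.append(window)
    return encounters
```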
The current stack builds hundreds of player-level signals plus deeper encounter timing channels from usercmd-style mouse deltas, view-angle response, aim collapse, angular jerk, and recoil-settling behavior.
Models rank players inside a demo so standouts can be surfaced in context instead of pretending one metric can settle the case alone.
Scores are paired with reasons and benchmark context so the output can support review, especially when strong legitimate play and subtle cheats start to overlap.
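A hedged sketch of what ranked, review-facing output could look like: scores from a placeholder linear model, the features driving each score, and percentile context against a benchmark slice. None of the names here are the project's actual API.

```python
import pandas as pd

def rank_with_reasons(player_features: pd.DataFrame,
                      weights: pd.Series,
                      benchmark: pd.DataFrame,
                      top_reasons: int = 3) -> pd.DataFrame:
    """Rank players inside one demo and attach the features driving each score.

    player_features: one row per player, engineered feature columns.
    weights:         per-feature weights from some trained model (placeholder).
    benchmark:       reference distribution used for percentile context.
    """
    contributions = player_features * weights          # per-feature contribution
    scores = contributions.sum(axis=1)

    rows = []
    for player, contrib in contributions.iterrows():
        top = contrib.sort_values(ascending=False).head(top_reasons)
        reasons = [
            f"{feat}: {player_features.loc[player, feat]:.2f} "
            f"(p{(benchmark[feat] < player_features.loc[player, feat]).mean() * 100:.0f} "
            f"vs benchmark)"
            for feat in top.index
        ]
        rows.append({"player": player, "score": scores[player], "reasons": reasons})

    return pd.DataFrame(rows).sort_values("score", ascending=False)
```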
These plots are there to show how suspicious slices behave against legit and pro baselines, not to decorate the page.
This benchmark example is built from encounter-level mouse and crosshair-process aggregates. Normal players tend to be coarser and noisier, pros tend to be more efficient, and suspicious slices can start looking efficient in a different way: less corrective burst, less manual oversteer, and cleaner settling than expected for the difficulty of the encounter. That is the kind of control-path evidence NullCS is trying to surface.
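To make "corrective burst" and "oversteer" concrete, the sketch below counts direction reversals in yaw velocity and measures how far a flick swings past the target. Thresholds and definitions are illustrative assumptions, not the production feature definitions.

```python
import numpy as np

def direction_reversals(angular_velocity: np.ndarray, min_speed: float = 5.0) -> int:
    """Count sign flips in yaw velocity above a noise floor.

    Frequent flips suggest manual corrective bursts; very few flips on a hard
    encounter suggest an unusually clean control path.
    """
    active = angular_velocity[np.abs(angular_velocity) > min_speed]
    if len(active) < 2:
        return 0
    return int(np.sum(np.sign(active[:-1]) != np.sign(active[1:])))

def overshoot_ratio(angle_to_target: np.ndarray) -> float:
    """How far the crosshair swings past the target relative to the initial error.

    Human flicks tend to overshoot and correct; a ratio near zero on difficult
    encounters is the kind of 'too clean' settling worth surfacing.
    """
    initial_error = abs(angle_to_target[0])
    # A sign change means the crosshair crossed the target angle.
    crossed = np.where(np.sign(angle_to_target) != np.sign(angle_to_target[0]))[0]
    if len(crossed) == 0 or initial_error == 0:
        return 0.0
    worst_overshoot = np.max(np.abs(angle_to_target[crossed]))
    return float(worst_overshoot / initial_error)
```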


The strongest player signal in suspicious demos shifts upward, while held-out legit and pro slices stay compressed near zero. That is the core credibility test: separation without broad false-positive drift.
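Read as code, that test is simple: fix a review threshold and compare how often suspicious players cross it against how often held-out legit and pro players do. A minimal sketch, assuming per-player scores and the cheater/normal/pro labels are already available.

```python
import numpy as np

def separation_report(scores: np.ndarray, labels: np.ndarray,
                      review_threshold: float) -> dict:
    """Recall on suspicious players vs. false-positive rate on legit/pro holdouts.

    labels: "cheater", "normal", or "pro" per player (assumed labeling scheme).
    """
    suspicious = scores[labels == "cheater"]
    legit = scores[np.isin(labels, ["normal", "pro"])]
    return {
        "suspicious_flagged": float((suspicious >= review_threshold).mean()),
        "legit_flagged": float((legit >= review_threshold).mean()),
    }
```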

Suspicious demos move up and to the right, while legit and pro demos stay near the origin. That matters because the signal is not just one loud outlier; the top of the lobby is coherently louder.

These panels come from mouse-delta and crosshair-process aggregates built out of encounter windows. They show why control-path telemetry matters: suspicious slices are not just louder in score space, they behave differently in input and aim process too.

This is not a one-metric story. Usercmd-derived mouse behavior, quiet-after-acquire behavior, and angular-jerk features all show measurable lift by themselves before they are folded into the full ranking model.
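One way to measure that standalone lift is a per-feature ROC AUC against the cheater label, before any model combines channels. A minimal sketch, assuming scikit-learn and a labeled player-level feature table; the labeling and column layout are assumptions for the example.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def single_feature_lift(features: pd.DataFrame, is_cheater: pd.Series) -> pd.Series:
    """ROC AUC of each feature on its own against the cheater label.

    0.5 means no lift; values well above 0.5 mean the channel separates
    suspicious slices by itself, before any model combines it with others.
    """
    lifts = {}
    for col in features.columns:
        auc = roc_auc_score(is_cheater, features[col])
        lifts[col] = max(auc, 1.0 - auc)   # direction-agnostic lift
    return pd.Series(lifts).sort_values(ascending=False)
```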

The data scale matters because a single CS2 match is not one row. It becomes many encounter windows, control-path sequences, player aggregates, and benchmark slices. That is what allows the model to study hard cases instead of just memorizing clips.
Raw demos are turned into structured events, behavior signals, and ranked review output.
Raw demos are turned into event, engagement, and encounter data. A single match expands into hundreds of encounter windows and thousands of tick-aligned measurements before ranking begins.
The current stack uses 449 player-level engineered features, plus encounter timing and control-path channels built from mouse delta, aim process, visibility transitions, and crosshair movement.
Models rank standouts inside a match and export evidence meant to support careful review, especially when the case is not obvious.
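As a rough picture of the aggregation step between those two stages, the sketch below collapses encounter-level control-path rows into the per-player feature table a ranking step would consume. Column names and aggregation choices are illustrative, not the actual 449-feature set.

```python
import pandas as pd

def player_feature_table(encounter_rows: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-encounter control-path channels into per-player features.

    encounter_rows is assumed to have one row per encounter with columns like
    'player', 'jerk_peak', 'settle_speed', 'overshoot_ratio', 'reaction_ms'.
    """
    channels = ["jerk_peak", "settle_speed", "overshoot_ratio", "reaction_ms"]
    aggregated = encounter_rows.groupby("player")[channels].agg(
        ["mean", "std", "median", "min"]
    )
    # Flatten the MultiIndex columns: ('jerk_peak', 'mean') -> 'jerk_peak_mean'.
    aggregated.columns = [f"{c}_{stat}" for c, stat in aggregated.columns]
    return aggregated
```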
The current public release is the local Windows desktop app. Load a Counter-Strike .dem file, run the pipeline on your PC, and review ranked players with supporting evidence.
Local demo intake, ranked players, review-facing reasons, score context, and supporting evidence panels.
Public beta. The desktop client is live, while the model and review workflow continue to be trained and polished.
NullCS is public as a real technical project now. The repository and benchmark pages are the best entry points into the current state of the work.
Some demo metrics can scream that something is wrong, but that does not automatically settle the case on its own.
The real pressure test is whether suspicious slices rise without inflating strong legitimate and pro-level play at the same time.
Research is still ongoing. The current result is serious progress, not a claim that the problem is solved.