OPEN SOURCE · v3.2.0 · MIT / APACHE-2.0

The engine that powers
Nyquest. Open source.

Drop-in compression proxy for LLMs. 532 regex rules, bounded LRU cache, local LLM semantic stage, 15–75% token savings. Same Rust engine that runs in production behind app.nyquest.ai — now with a published Docker image.

🦀 Full Rust ⚡ <1ms p50 🪨 532 rules 🧠 Local LLM stage ♻️ ~3,730× cache 🐳 Docker / GHCR
git clone https://github.com/Nyquest-ai/nyquest-rust-fullstack-3.2.0 ~/nyquest && bash ~/nyquest/install.sh
Star on GitHub ★ 4 README → Quickstart ↓

No telemetry · No phone-home · Privacy mode by default

Nyquest Engine
532Rules
<1msp50 Hot Path
8,266req/s single
18,287req/s c=20
75%Max Savings
~3,730×Cache Speedup
48.5 MBRSS
47 / 47Tests Passing
// Same Engine in Production

The exact code running behind app.nyquest.ai.

No "lite" version. No fork that lags behind. The compression engine that processes every request on the production app is the one in this repository — sanitized for public release with hardcoded keys and internal paths removed. v3.2.0 adds a RegexSet prefilter, zero-allocation rule misses via Cow<str>, a bounded LRU compression cache, per-rule hit telemetry, a regression suite of 47 passing tests, and a multi-stage Dockerfile auto-published to GHCR.

// Quickstart

Three ways to install

All three converge on a running proxy at localhost:5400. Health check, web dashboard, OpenAI-compatible endpoint, Anthropic-native endpoint — all there.

# Multi-stage Dockerfile — Rust builder + Debian-slim runtime, non-root user, tini PID-1
docker pull ghcr.io/nyquest-ai/nyquest-engine:3.2.0
docker run --rm --network host \
  -e NYQUEST_CACHE_CAPACITY=2048 \
  ghcr.io/nyquest-ai/nyquest-engine:3.2.0

# Or via docker-compose (includes healthcheck, log volume, host networking)
git clone https://github.com/Nyquest-ai/nyquest-rust-fullstack-3.2.0 ~/nyquest \
  && cd ~/nyquest \
  && bash scripts/run-local.sh
# Hardware check → deps → Rust → build → semantic stage → preflight → wizard → start
git clone https://github.com/Nyquest-ai/nyquest-rust-fullstack-3.2.0 ~/nyquest \
  && bash ~/nyquest/install.sh
cargo build --release
./target/release/nyquest preflight -v    # System check + tier recommendation
./target/release/nyquest install         # Interactive setup wizard
./target/release/nyquest serve           # Start proxy on :5400
nyquest install --defaults \
  --set port=8080 \
  --set compression_level=0.7 \
  --set semantic_enabled=true

After install: curl localhost:5400/health · dashboard at localhost:5400/dashboard · cache snapshot at localhost:5400/admin/v2/compression/cache

// Pipeline

Six stages, one binary

Every request flows through this pipeline. Stages 2 and 5 are toggleable; stages 1, 3, 4, 6 always run.

01Normalizededup, conflict resolution, speculation boundaries
02OpenClaw Agent Mode7-strategy agentic optimization (toggleable)
03Cache Reordersort for provider prefix-cache hits
04Rule Compress532 regex rules · RegexSet prefilter · LRU cache · telegraph · code minify · format optimizer
05Semantic LLMQwen 2.5 1.5B via Ollama · 56% system · 75% history (toggleable)
06Auto-scale + Forwarddynamic level based on context window · provider routing

Stages run in microseconds (1, 3, 4) to a few hundred ms (5, GPU). Hot path stays under 1ms (p50 <1ms measured) · cached identical content returns in ~19µs (~3,730× speedup vs cold).

// Compression Levels

Dial it from off to aggressive

A single 0.0–1.0 knob. Server default is 0.7 (balanced). Override per-request with x-nyquest-level.

LevelStrategyRules ActiveTypical Savings
0.0Pass-through (metrics only)0%
0.2Filler removal~95 rules5–12%
0.5Structural compression~250 rules15–28%
0.7Default · balanced~390 rules18–33%
1.0Aggressive + format + minify532 rules23–35% rules · up to 75% with semantic
// Hardware Tiers

Run anywhere from a Pi to a workstation

The preflight command (nyquest preflight -v) inspects your machine and recommends a tier. You can always force a different one in nyquest.yaml.

TIER 1

Rules only

15–37% savings

Pure regex pipeline. No semantic stage. Ridiculously fast hot path — sub-millisecond p50.

  • 2+ CPU cores
  • 512 MB RAM
  • No GPU required
  • Runs on a Raspberry Pi 4
TIER 3

CPU semantic

15–75% savings

No GPU? CPU runs the semantic stage at 1–4 seconds per call. Same savings, slower path.

  • 4+ CPU cores
  • 8 GB RAM
  • No GPU
  • Ollama installed
// Benchmarks

Production numbers, not synthetic

Measured live on the v3.2.0 Docker container (Ubuntu 24.04 host, --network host). Full breakdown in CHANGELOG.md.

TestResultDetail
Reference prompt (358 tokens) @ L0.326.6% saved319 → 234 tokens
Reference prompt @ L0.528.2% saved319 → 229 tokens
Reference prompt @ L1.034.8% saved319 → 208 tokens
Aggregate across 8 realistic scenarios @ L1.022.6% mean2,201 → 1,703 tokens · Haiku 4.5 conservative profile
Semantic stage — system prompts55.9% saved200–350ms on GPU
Semantic stage — conversation history75% saved200–350ms on GPU
Health throughput (ab -n 5000 -c 1)8,266 req/sp50 <1ms · host networking
Concurrent (ab -n 5000 -c 20)18,287 req/sp50 1ms · p95 1ms · p99 2ms
Cache hit (warm replay)~3,730× speedup~70.9 ms cold → ~19 µs hit on ~500-char prompt
Failed requests0 / 10,000across single-thread + concurrent runs
Memory footprint48.5 MB RSS15 threads · Docker container
Test suite47 / 47 passing16 safety-invariant snapshots · 6 measurement reports · 25 role personas

Compression rows measured against Claude Haiku 4.5, which uses the engine's conservative profile (raised in v3.2.0 with specific-marker matching so -mini / -flash / -nano / -light always win over family-name substrings). Frontier models (Opus, Sonnet, GPT-5.5, Grok-3) use the aggressive profile and typically see 6–10 percentage points higher savings on the same prompts.

Lifetime stats from the live engine's /metrics endpoint, processing real traffic behind app.nyquest.ai:

18,085
Requests
6.1M
Tokens In
1.54M
Tokens Saved
10.9%
Avg Mixed
77.7%
Single-Req Max

Average is low because production traffic skews toward short prompts — the long-context savings are where the engine earns its keep.

Per-prompt compression on Haiku 4.5 (conservative profile). Each scenario is a verbose, professional system prompt of the kind real users send. Re-measured on v3.2.0 (tests/v320_compression_report.rs).

Scenario@ 0.5@ 0.7@ 1.0
Customer Support28.1%33.4%34.7%
Legal Review14.1%16.0%27.5%
Data Science14.6%16.5%18.1%
Travel Planning22.9%23.3%25.9%
Code Review11.7%13.8%20.7%
Financial Advisor12.5%10.9%17.5%
HR Policy11.6%13.6%18.2%
Medical Education9.3%10.8%14.3%
AGGREGATE (mean)15.9%17.7%22.6%
// Cost Impact at Scale

What 100M tokens/month looks like

Savings calculated against published provider rates. With BYOK + the engine in front of your app, these come out of your bill, not Nyquest's.

ModelPrice / 1M input100M tok/mo costSaved @ 0.7Saved @ 1.0
Claude Haiku 4.5$1.00$100$17.70$22.60
Claude Sonnet 4.6$3.00$300$53.10$67.80
Claude Opus 4.8$5.00$500$88.50$113.00
GPT-5.5$5.00$500$88.50$113.00
Grok-3$3.00$300$53.10$67.80

Standard provider rates as of May 2026. Savings calculated against v3.2.0 aggregate mean (8-scenario suite, conservative profile). Add the semantic stage and the savings double or triple on long-context workloads.

// Integration

Drop-in. Nothing to change in your code.

Point your existing OpenAI or Anthropic client at localhost:5400 instead of the cloud endpoint. Done. The proxy auto-detects which provider you're calling and translates between formats as needed.

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5400/v1",
    api_key="your-api-key",
)

# Use any model — Nyquest auto-routes to the right provider
response = client.chat.completions.create(
    model="claude-haiku-4-5-20251001",  # or gpt-5-5, grok-3, gemini-3-flash
    messages=[{"role": "user", "content": "Hello!"}],
)
import anthropic

client = anthropic.Anthropic(
    base_url="http://localhost:5400",
    api_key="your-anthropic-key",
)

response = client.messages.create(
    model="claude-haiku-4-5-20251001",
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello!"}],
)
# Anthropic native
curl -X POST http://localhost:5400/v1/messages \
  -H "x-api-key: $ANTHROPIC_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -d '{"model":"claude-haiku-4-5-20251001","max_tokens":100,
       "messages":[{"role":"user","content":"Hello!"}]}'

# OpenAI-compatible
curl -X POST http://localhost:5400/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" \
  -d '{"model":"claude-haiku-4-5-20251001",
       "messages":[{"role":"user","content":"Hello!"}]}'
# Override compression level for this request
curl -H "x-nyquest-level: 1.0" ...

# Route to a specific provider
curl -H "x-nyquest-base-url: https://api.openai.com" ...

# Enable OpenClaw agent mode (tool pruning, schema min, sliding window)
curl -H "x-nyquest-openclaw: true" ...

# Override response-compression age (turns left untouched)
curl -H "x-nyquest-response-age: 2" ...
// Features

What ships in v3.2.0

532 rules, 19 categories
Filler removal, verbose phrases, imperative conversion, clause collapse, code minify (Python/JS/Bash), JSON→CSV, AI-output noise, and more. Tier-gated by compression level. Every rule has an atomic hit counter exposed via the admin API.
RegexSet prefilter
Each category builds a regex::RegexSet at startup. On every call, only candidate rules run — rules that can't match are skipped without a full text scan. Combined with Cow<str> zero-allocation misses, the hot path stays well under 1ms.
Bounded LRU cache
Compression results keyed by sha256(content) + level + profile + mode. Default 2048 entries / 64KB max entry size. Tunable via NYQUEST_CACHE_CAPACITY / NYQUEST_CACHE_MAX_ENTRY_SIZE. Warm replay: ~3,730× speedup on a ~500-char prompt.
Local LLM stage
Qwen 2.5 1.5B via Ollama. Condenses long system prompts (~56%) and conversation history (~75%) with provider-aware fallback to extractive compression on timeout.
OpenClaw Agent Mode
7-strategy pipeline for autonomous agents: tool result pruning, schema minimization, thought-block compression, error dedup, sliding window, cache injection, file-view condensation.
Per-model profiles
Aggressive (Opus, Sonnet, GPT-5), Balanced (unknown), Conservative (Haiku, Mini, Flash, Nano, Light). v3.2.0 fixed selector ordering so specific markers (-mini, -flash, -nano, -light) always win over family-name substrings.
Safety invariants
tool_use and image blocks pass through byte-identical. Python tuples preserved. Four-digit dates preserved. Role declarations rewritten as Role: X. instead of stripped. Fail-closed length contract: returns original if compression produced a longer string.
Admin observability
/admin/v2/compression/rules exposes per-category and per-rule hit counts (no prompt content). /admin/v2/compression/cache exposes cache snapshot (entries, hits, misses, evictions). /dashboard for live request stats.
Docker + GHCR
Multi-stage Dockerfile (Rust builder + Debian-slim runtime, non-root user, tini PID-1). docker-compose.yml with --network host and a prod profile (read-only root FS, resource caps). Auto-published to ghcr.io/nyquest-ai/nyquest-engine on every v* tag.
Test suite
47 / 47 passing. 16 regression snapshots for safety invariants (Python tuple syntax, four-digit dates, profile-detection ordering, byte-identical tool_use pass-through, non-English text). 6 measurement reports. 25 role personas across 7 categories.
Privacy mode
No telemetry. No phone-home. Prompt content is never logged — admin endpoints expose hit counts and regex sources, not data. Run it air-gapped.
Encrypted key vault
Optional AES-256-GCM encryption for API keys at rest. Keys never logged, never written to request logs, only decrypted in memory at dispatch.
// CLI

Operate from the command line

CommandWhat it does
nyquest installInteractive 11-section setup wizard
nyquest install --defaultsHeadless mode with defaults (CI / Docker)
nyquest preflight -vOS, CPU, RAM, disk, GPU, Ollama, network — verbose with tier rec
nyquest doctor10-point health check (config, port, keys, dashboard, logs, systemd)
nyquest config showDisplay all resolved config values
nyquest config set <key> <value>Set a single value (dot-notation)
nyquest serveStart the proxy server (default if no command)

Run it. Star it. Open an issue.

v3.2.0 is current. main is what runs in production. Pull the Docker image from ghcr.io/nyquest-ai/nyquest-engine:3.2.0 or build from source. PRs welcome — see CONTRIBUTING.md.

View on GitHub Read the README Issues All Products →