Architecture
This page is the mental model you need to debug anything in Softprobe. No CLI flags, no SDK signatures — just what-talks-to-what and why.
The topology
Softprobe sits under your application, not inside it. Your app and its HTTP dependencies are unchanged; a single sidecar proxy sees every request and response in both directions.
┌───────────────┐ ┌───────────────┐
│ Test client │ │ HTTP │
│ (Jest/pytest/ │ │ dependency │
│ CLI / curl) │ │ (Stripe, …) │
└──────┬────────┘ └──────▲────────┘
│ (1) ingress │ (4) egress
▼ │
┌─────────────────┐ (2) forward ┌────────────────┐ │ forward
│ Envoy + Softprobe│ ───────────────► │ Application │───┘
│ WASM filter │ │ under test │
└──────┬──────────┘ └────────────────┘
│ (3) OTLP /v1/inject + /v1/traces
▼
┌─────────────────┐ JSON HTTP ┌────────────────┐
│ softprobe- │ ◄────────────────► │ Your test code │
│ runtime │ │ + CLI │
│ (Go service) │ └────────────────┘
└─────────────────┘
│
▼
     *.case.json (OTLP traces on disk)

Each numbered edge in the diagram corresponds to one of these flows:
- Ingress (1) — the test client hits the proxy, which forwards (2) to the app.
- Egress (4) — the app makes an outbound call, which the same proxy intercepts.
- Control channel (3) — the proxy asks the runtime, per hop, "is this mocked? should I capture it?"
- Forward / not forward — on a miss, the proxy forwards to the real dependency. On a hit (mock), it returns the canned response without touching the dependency.
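In pseudocode, the per-hop egress decision looks roughly like this. This is a self-contained Python sketch; `inject`, the rule-store shape, and the fingerprint tuple are illustrative, not the real wire format:

```python
# Toy simulation of the proxy's per-hop decision: ask the runtime
# whether this hop is mocked; on a hit return the canned response,
# on a miss forward to the real upstream.

RULES = {
    # fingerprint -> canned response (hypothetical shapes)
    ("POST", "api.stripe.com", "/v1/charges"): {"status": 200, "body": '{"id": "ch_test"}'},
}

def inject(fingerprint):
    """Stand-in for POST /v1/inject: (200, rule) = mock hit, (404, None) = miss."""
    rule = RULES.get(fingerprint)
    return (200, rule) if rule else (404, None)

def handle_egress(method, host, path, forward):
    code, rule = inject((method, host, path))
    if code == 200:
        return rule                          # hit: never touch the dependency
    return forward(method, host, path)       # miss: forward to the real upstream

# A miss falls through to the (fake) real dependency:
real = lambda m, h, p: {"status": 502, "body": "real upstream"}
assert handle_egress("POST", "api.stripe.com", "/v1/charges", real)["body"] == '{"id": "ch_test"}'
assert handle_egress("GET", "api.stripe.com", "/v1/balance", real)["status"] == 502
```

The point of the sketch: the hit/miss decision is entirely data-driven, which is why the proxy can stay "deliberately dumb" as described below.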
One proxy, two directions
In a real Istio mesh, ingress and egress are the same sidecar — the routing layer just invokes it twice. In the local Docker Compose harness we model this with a single Envoy that has two listeners (:8082 for ingress, :8084 for egress), avoiding the need for iptables redirection.
The four moving parts
1. The application under test (SUT)
Ordinary HTTP service. It knows nothing about Softprobe. The only requirement is that outbound HTTP calls propagate standard W3C traceparent / tracestate — which any OpenTelemetry HTTP client does by default.
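The propagation requirement is small: a traceparent header is four dash-separated hex fields. A minimal builder/parser (the trace id below is the W3C spec's example value):

```python
# Minimal W3C Trace Context helper: a traceparent header is
# "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex trace-flags>".
import re

def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"

def parse_traceparent(header: str):
    m = re.fullmatch(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})", header)
    if not m:
        raise ValueError(f"malformed traceparent: {header}")
    return {"trace_id": m.group(1), "parent_id": m.group(2), "flags": m.group(3)}

hdr = make_traceparent("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7")
assert parse_traceparent(hdr)["trace_id"] == "4bf92f3577b34da6a3ce929d0e0e4736"
```

If your HTTP client is instrumented with OpenTelemetry, it already emits this header; nothing here needs to be written by hand.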
You do not add Softprobe imports, mock wrappers, or test hooks to your application code.
2. The proxy (data plane)
Envoy with the Softprobe WASM filter. It does three things:
- Observes ingress and egress HTTP headers, bodies, and status codes.
- For every hop, issues `POST /v1/inject` to the runtime with an OTLP span describing the request. The runtime answers `200` (mock hit — use this response) or `404` (miss — forward to the real upstream).
- Asynchronously ships observed exchanges to the runtime via `POST /v1/traces` for capture.
The proxy is deliberately dumb. It doesn't know what a "case" is, what a "rule" is, or what session the traffic belongs to beyond what's in the OTLP span attributes. All policy lives in the runtime.
3. The runtime (control plane + OTLP handler)
One Go binary. It serves two API surfaces from the same process, backed by a single in-memory session store:
| Surface | Called by | Spec |
|---|---|---|
| HTTP control API (JSON) | tests, CLI, SDKs | http-control-api.md |
| OTLP trace API | proxy only | proxy-otel-api.md |
Because both handlers read from the same store, any rule registered by a test is visible to the proxy on the very next inject lookup — no cache, no sync, no database (v1).
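The shared-store behavior is easy to picture as two handlers closing over one dict. This is a toy Python sketch, not the runtime's actual Go code; names like `control_put_rules` are invented for illustration:

```python
# One in-memory store shared by both API surfaces: a rule written by the
# control handler is visible to the very next inject lookup.
store = {}  # sessionId -> {"revision": int, "rules": {fingerprint: response}}

def control_put_rules(session_id, rules):
    """JSON control API side: replace rules, bump the revision."""
    s = store.setdefault(session_id, {"revision": 0, "rules": {}})
    s["rules"] = dict(rules)
    s["revision"] += 1          # every mutating call bumps sessionRevision

def otlp_inject(session_id, fingerprint):
    """OTLP side, called by the proxy: hit -> (200, rule), miss -> (404, None)."""
    s = store.get(session_id)
    if s and fingerprint in s["rules"]:
        return 200, s["rules"][fingerprint]
    return 404, None

control_put_rules("sess-1", {"GET /v1/balance": {"status": 200, "body": "{}"}})
assert otlp_inject("sess-1", "GET /v1/balance") == (200, {"status": 200, "body": "{}"})
assert otlp_inject("sess-1", "GET /v1/other")[0] == 404
```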
The runtime's responsibilities:
- Own session state (`sessionId`, `sessionRevision`, mode, policy, rules, loaded case bytes).
- Match OTLP inject spans against stored rules and return `200` + response attrs on hit or `404` on miss.
- Buffer extracted spans during capture mode and flush them to a `*.case.json` file on session close.
The runtime is not a routing control plane. It does not push config to Envoy. It reacts to the proxy's OTLP requests, nothing more.
4. The SDKs and the CLI
Both are clients of the runtime's HTTP control API. They never speak OTLP to the proxy.
- SDKs (`softprobe-js`, `softprobe-python`, `softprobe-java`, `softprobe-go`) expose ergonomic test-authoring APIs (`findInCase`, `mockOutbound`, `loadCaseFromFile`, `clearRules`, `close`). They compile those calls into JSON payloads against `/v1/sessions/{id}/...`.
- CLI (`softprobe`) is a single static Go binary. It is language-agnostic and is the preferred interface for humans, CI pipelines, and AI agents. It covers orchestration (sessions, suites, capture, export, doctor) without you writing code.
Control plane vs. data plane
┌─────────────────────────────────────────────┐
│ Control plane │
│ ───────────── │
│ Tests, CLI, SDKs ─── JSON HTTP ───► Runtime
│ (sessions, load-case, rules, policy) │
└─────────────────────────────────────────────┘
│ (shared in-memory store)
┌──────────────────┴──────────────────────────┐
│ Data plane │
│ ────────── │
│ Proxy ───── OTLP /v1/inject + /v1/traces ───► Runtime
│ (per-request lookup; async capture) │
└─────────────────────────────────────────────┘

Three invariants to remember:
- Tests never call `/v1/inject` directly. Only the proxy does.
- The proxy never calls the control API on the request path. Only OTLP.
- Both halves share one in-memory store. A rule posted by a test is visible to the proxy's next inject call with no latency.
The session model
A session is a test-time scope. Creating a session gives you a UUID (sessionId) that you attach to HTTP requests via x-softprobe-session-id. The runtime uses that id to look up the session's policy, rules, and loaded case when the proxy asks.
POST /v1/sessions → session created, sessionRevision = 1
POST /v1/sessions/$ID/load-case → case loaded, sessionRevision = 2
POST /v1/sessions/$ID/rules → rules replaced, sessionRevision = 3
─── proxy sees rev 3 on the next /v1/inject ───
POST /v1/sessions/$ID/close        → state deleted

Every mutating control call bumps `sessionRevision`. Proxy-side inject caches, if any, key on `(sessionId, sessionRevision, requestFingerprint)` so stale hits can never survive a rule change.
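The cache-key claim can be sanity-checked in a few lines: bumping the revision changes every key, so a stale entry is simply never consulted again. A Python toy, not the proxy's implementation:

```python
# Proxy-side inject cache keyed on (sessionId, sessionRevision, fingerprint):
# a revision bump makes every old key cold, so stale hits cannot survive
# a rule change.
cache = {}

def cached_lookup(session_id, revision, fingerprint, miss):
    key = (session_id, revision, fingerprint)
    if key not in cache:
        cache[key] = miss()      # cold key: fall through to the runtime
    return cache[key]

calls = []
lookup = lambda: calls.append(1) or len(calls)
assert cached_lookup("s", 1, "GET /x", lookup) == 1
assert cached_lookup("s", 1, "GET /x", lookup) == 1   # warm: no new runtime call
assert cached_lookup("s", 2, "GET /x", lookup) == 2   # revision bump: cold again
```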
Session lifetime and teardown: always call `close()` in an afterAll (Jest / JUnit) or teardown (pytest / go test). Closing removes all session state from the runtime. Between cases, you can call `clearRules()` to drop just the mock rules while keeping the session alive.
See Sessions and cases for the full lifecycle.
The capture artifact
A case file is one JSON document on disk, typically named cases/<scenario>.case.json. Its top level looks like:
{
"version": "1.0.0",
"caseId": "checkout-happy-path",
"createdAt": "2026-04-15T10:00:00Z",
"traces": [ /* array of OTLP ExportTraceServiceRequest payloads */ ],
"rules": [ /* optional: ship with default rules */ ],
"fixtures":[ /* optional: auth tokens, metadata */ ]
}Each entry in traces[] is an OTLP-compatible JSON trace describing a single HTTP hop (one request + its response). The schema is defined in spec/schemas/case.schema.json and exported to OpenTelemetry collectors when you want to feed replay data into your observability pipeline.
Because the file is plain JSON, you can:
- diff two captures in a code review,
- edit a span by hand (e.g. to redact a token),
- regenerate cases from an LLM prompt,
- ship example cases in `spec/examples/cases/` for tutorials.
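For example, redacting a token before committing a capture is a few lines of ordinary JSON handling. The top-level keys follow the snippet above; the `authToken` fixture key is hypothetical:

```python
import json, os, tempfile

# A minimal case document shaped like the snippet above.
case = {
    "version": "1.0.0",
    "caseId": "checkout-happy-path",
    "createdAt": "2026-04-15T10:00:00Z",
    "traces": [],
    "fixtures": [{"authToken": "sk_live_secret"}],  # hypothetical fixture key
}

def redact(case_doc, key="authToken"):
    """Replace a sensitive fixture value before the file is committed."""
    for fixture in case_doc.get("fixtures", []):
        if key in fixture:
            fixture[key] = "REDACTED"
    return case_doc

path = os.path.join(tempfile.mkdtemp(), "checkout-happy-path.case.json")
with open(path, "w") as f:
    json.dump(redact(case), f, indent=2)

with open(path) as f:
    assert json.load(f)["fixtures"][0]["authToken"] == "REDACTED"
```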
Where decisions are made
One design principle drives the rest of the system: the proxy is a dumb mirror; the runtime is a dumb interpreter; the SDK is where cleverness lives.
| Decision | Made by | Why |
|---|---|---|
| Is the incoming HTTP traffic tagged with a session? | Proxy | It's the only thing that sees the header. |
| Is there a matching rule for this hop? | Runtime | It stores the rules. |
| What response bytes should the mock return? | SDK (at authoring time) | The test author is the only one who knows what the test needs. |
| Is this outbound call allowed under strict policy? | Runtime | Policy is just a synthesized rule. |
| Should `sessionRevision` bump? | Runtime | It owns the store. |
Replay selection used to happen in the runtime ("walk traces to find the next matching response"). It no longer does. The SDK's `findInCase` runs a synchronous, in-memory lookup over the loaded case and produces a concrete response that the SDK hands to the runtime as an explicit mock rule. This keeps the runtime deterministic and the SDK expressive — you can mutate the captured response before mocking (bump a timestamp, swap a test card, fix a date).
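A sketch of that flow in Python (the hop shape and attribute names are illustrative): `find_in_case` scans the loaded traces, the test mutates the copy, and the result becomes an explicit mock rule:

```python
# SDK-side replay selection: a plain in-memory scan over the loaded case.
# The matched response is copied, optionally mutated by the test, then
# registered with the runtime as an explicit mock rule.
loaded_case = {
    "traces": [
        {"request": {"method": "POST", "url": "https://api.stripe.com/v1/charges"},
         "response": {"status": 200, "body": '{"card": "4242", "ts": "2026-04-15"}'}},
    ]
}

def find_in_case(case, method, url):
    for hop in case["traces"]:
        if hop["request"]["method"] == method and hop["request"]["url"] == url:
            return dict(hop["response"])   # copy: the test may mutate it freely
    return None

resp = find_in_case(loaded_case, "POST", "https://api.stripe.com/v1/charges")
resp["body"] = resp["body"].replace("2026-04-15", "2026-05-01")  # fix a date before mocking
mock_rule = {"match": {"method": "POST", "url": "https://api.stripe.com/v1/charges"},
             "respond": resp}
assert "2026-05-01" in mock_rule["respond"]["body"]
```

Because the lookup happens in the test process, the runtime only ever sees the final, concrete rule, which is what keeps it deterministic.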
Why proxy-first, not framework-patching?
Softprobe's predecessor monkey-patched Node.js HTTP clients. Every new framework (Fastify, Koa, Undici, Postgres driver, Redis client) was a new patch to write and maintain — and none of it worked for Python or Java.
Moving interception below the app swaps that cost for a modest one-time operational setup:
| Approach | Upfront cost | Ongoing cost | Cross-language? |
|---|---|---|---|
| Framework patches | Low | High (every dependency upgrade) | No |
| Proxy-first (Softprobe) | Moderate (run a sidecar) | Low | Yes |
If your team already runs Istio or Linkerd, the upfront cost is near-zero — you add a WasmPlugin to your existing mesh config.
Next: Sessions and cases →