The stack · open · EU-hosted · auditable

Every layer on GitHub. Every byte in the EU.

Anthropic's and OpenAI's browser agents (Claude for Chrome, Operator) share one shape: an opaque, US-hosted runtime at the bottom of the stack. We built the opposite. Every component is open-source software your engineers can read, hosted by a German operator on infrastructure your security review can reach, with cross-tenant learning gated by a scrubbing pipeline you can audit per tenant. Sovereignty is the architecture, not a checkbox.

The four layers

From your Chrome down to the metal — nothing proprietary, nothing offshore.

Each layer is a known open-source primitive, deployed on Hetzner Cloud or in your VPC. The diagram below is the actual deployment shape — not a marketing simplification. License badges link to the repos; the audit trail starts there.

Browser runtime

your Chrome — never our sandbox

SiteBridge MCP · MIT

Chrome extension + MCP server · ~70 tools

↗ github.com/agiliton/sitebridge

Model gateway + observability

every prompt routed, every call traced, every cent counted

LiteLLM · MIT

model router · failover · cost · 100+ providers

↗ github.com/BerriAI/litellm

Langfuse · MIT

trace store · datasets · scoring · PII masking

↗ github.com/langfuse/langfuse

Storage + learning

patterns up · data down · per-tenant by construction

PostgreSQL · PostgreSQL License

relational store · per-tenant schemas

↗ postgresql.org

pgvector · PostgreSQL License

skill library · cross-tenant pattern recall · k-anonymity

↗ github.com/pgvector/pgvector

Infrastructure

German operator · EU-only data residency · no US data transfer

Hetzner Cloud · operator

Helsinki (hel1) · Nuremberg (nbg1) · Falkenstein (fsn1)

↗ hetzner.com/legal/dpa

✓ GDPR / BDSG by construction

✓ EU-only data residency · no SCCs required

✓ DPA / AVV available on request

✓ Self-hostable end-to-end (drop into your VPC)

✓ Optional on-prem inference (Llama or Mistral, served via vLLM)

✓ Audit log for every model call and every MCP tool call

Every line of LiteLLM, Langfuse, pgvector, and the SiteBridge MCP server is public, under the MIT or PostgreSQL License. You don't need our permission to read it. You don't need our permission to host it.

Request lifecycle

How a single agent action traverses the stack.

A browser action — clicking Save in HubSpot, dropping an HTTP node in n8n, rescuing a stuck Base44 chat — flows through the stack in a fixed shape. The interesting part is how often it short-circuits before reaching the model.

User in Chrome

real session · already logged in · no sandbox

SiteBridge MCP server

tool dispatch · pre-click guard · undo snapshot ring

↳ audit log entry

LiteLLM gateway

vector recall · skill library lookup · routing decision

↳ Langfuse trace · PII masked

↙ cache HIT (~70%)

Replay cached MCP sequence

deterministic execution · €0 inference · ~50 ms

↘ MISS — ask the model

Routed model

MiniMax · Sonnet · Opus · GPT-5 · Llama (self-host)

Verifier · pre-click guard

did the world look like the recipe expected?

pgvector skill library

scrubber → fingerprint → k-anonymity ≥ 5 → cross-tenant pool

↳ tenant-only overlay
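The "pre-click guard" and "undo snapshot ring" in the diagram can be pictured with a small sketch. Everything here is hypothetical: `RecipeStep`, `pre_click_guard`, `SnapshotRing`, and `run_step` are illustrative names rather than SiteBridge's API, and a page is reduced to a dict of selector to visible text.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical, simplified page model: CSS selector -> visible text.
PageState = Dict[str, str]


@dataclass
class RecipeStep:
    """One cached action: click `selector`, expecting it to read `expected_text`."""
    selector: str
    expected_text: str


def pre_click_guard(state: PageState, step: RecipeStep) -> bool:
    """True only if the page looks the way the recipe expects."""
    return state.get(step.selector) == step.expected_text


class SnapshotRing:
    """Keeps the last `size` page states; older ones fall off automatically."""

    def __init__(self, size: int = 3) -> None:
        self._ring: deque = deque(maxlen=size)

    def push(self, state: PageState) -> None:
        self._ring.append(dict(state))  # copy, so later clicks can't rewrite history

    def undo(self) -> Optional[PageState]:
        return self._ring.pop() if self._ring else None


def run_step(state: PageState, step: RecipeStep, ring: SnapshotRing,
             click: Callable[[PageState, str], None]) -> bool:
    if not pre_click_guard(state, step):
        return False        # world doesn't match the recipe: hand back to the model
    ring.push(state)        # snapshot before acting, so the click can be undone
    click(state, step.selector)
    return True
```

Returning `False` instead of clicking blindly is the design point: a page that no longer matches the recipe is treated like a cache miss, not an error to push through.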

The point

Most browser-agent calls are repeats. The first time the agent works out "click the green Save in HubSpot's stage dropdown", it costs a full model call. The next 4,200 times, across all tenants, it costs ~€0 and ~50 ms.
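The hit/miss split can be sketched as a nearest-neighbour lookup with a similarity cutoff. This is an assumption-laden illustration: the real system uses pgvector, while this sketch keeps recipes in a Python list, computes cosine similarity by hand, and uses a made-up threshold of 0.9. `Recipe` and `handle` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Recipe:
    intent_vec: List[float]   # embedding of the intent, e.g. "click Save in the stage dropdown"
    steps: List[str]          # cached MCP tool sequence to replay


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def handle(intent_vec: List[float], library: List[Recipe],
           ask_model: Callable[[List[float]], List[str]],
           threshold: float = 0.9) -> Tuple[str, List[str]]:
    """Replay a close-enough cached recipe; otherwise pay for one model call."""
    best = max(library, key=lambda r: cosine(intent_vec, r.intent_vec), default=None)
    if best is not None and cosine(intent_vec, best.intent_vec) >= threshold:
        return "cache", best.steps             # HIT: deterministic replay, no inference
    steps = ask_model(intent_vec)              # MISS: full model call
    # The diagram stores a recipe only after the verifier passes; skipped here.
    library.append(Recipe(intent_vec, steps))
    return "model", steps
```

The first call pays for inference; a second, near-identical intent replays the stored steps without touching the model.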

Why Langfuse here

Every LiteLLM call writes to Langfuse via a 2-line callback. PII masking runs before the trace lands. Datasets feed the offline distiller that turns successful runs into recipes. One trace store, per-tenant by design.
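On the LiteLLM side, the Python SDK wiring is typically `litellm.success_callback = ["langfuse"]` plus `litellm.failure_callback = ["langfuse"]`. The masking step is sketched below with stdlib regexes only; the patterns, placeholder tokens, and the `to_trace` record shape are hypothetical, and where the mask actually runs (gateway, SDK, or proxy hook) is a deployment choice this sketch doesn't settle.

```python
import re

# Hypothetical masking rules; a real deployment would tune these per tenant.
_PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"), "<EMAIL>"),
    (re.compile(r"\+?\d[\d ()-]{7,}\d"), "<PHONE>"),
    (re.compile(r"\b(?:sk|pk)-[A-Za-z0-9]{16,}\b"), "<SECRET>"),
]


def mask_pii(text: str) -> str:
    """Replace anything matching a rule with its placeholder token."""
    for pattern, token in _PATTERNS:
        text = pattern.sub(token, text)
    return text


def to_trace(prompt: str, completion: str, tenant: str) -> dict:
    """Build the record a trace callback would ship, masked before it leaves."""
    return {"tenant": tenant, "input": mask_pii(prompt), "output": mask_pii(completion)}
```

Masking before the record is built means unmasked text never reaches the trace store, so there is nothing to purge later.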

The wall

pgvector holds two namespaces: tenant overlay (your selectors, your conventions, your data) and pattern library (cross-tenant, scrubbed, k-anonymity ≥ 5). Patterns flow up. Data stays down. Audit log shows every contribution.
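The k-anonymity gate between the two namespaces can be sketched as "promote a pattern only once at least k distinct tenants have produced it." The class, the SHA-256 fingerprint scheme, and the in-memory dicts are hypothetical stand-ins for the pgvector tables; only k = 5 comes from the text.

```python
import hashlib
from collections import defaultdict
from typing import Tuple

K = 5  # k-anonymity threshold from the text


def fingerprint(pattern: Tuple[str, ...]) -> str:
    """Stable ID for an already-scrubbed tool sequence (hypothetical scheme)."""
    return hashlib.sha256("\x1f".join(pattern).encode()).hexdigest()[:16]


class SkillLibrary:
    def __init__(self, k: int = K) -> None:
        self.k = k
        self.overlay = defaultdict(list)       # tenant -> own patterns, never shared
        self._contributors = defaultdict(set)  # fingerprint -> tenants that produced it
        self.pool = {}                         # fingerprint -> pattern, cross-tenant
        self.audit = []                        # (tenant, fingerprint) per contribution

    def contribute(self, tenant: str, pattern: Tuple[str, ...]) -> bool:
        """Record a pattern; return True only when it is promoted to the pool."""
        fp = fingerprint(pattern)
        self.overlay[tenant].append(pattern)
        self._contributors[fp].add(tenant)
        self.audit.append((tenant, fp))
        if fp not in self.pool and len(self._contributors[fp]) >= self.k:
            self.pool[fp] = pattern
            return True
        return False
```

Counting distinct tenants rather than raw contributions matters: one tenant repeating a pattern a thousand times still never crosses the wall.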

The headline

Models + context + MCP — delivered as one product.

Anthropic's pitch is "subscribe to Claude, install our extension." That sounds simple until you multiply it by every user, every team, every procurement cycle, every regional billing constraint. Delivered through this stack, SiteBridge lets you sell the outcome (a working browser agent) instead of three components, each with its own contract.

Without us

User signs up for Anthropic Pro / Max / Team / Enterprise (per-user, per-tier).
Procurement signs Anthropic's DPA, reviews EU data-residency clauses, and signs SCCs anyway because the data plane is in the US.
Org admins manage Anthropic console: usage, members, billing.
Outage = downtime. No second provider in the path.
Model gets deprecated mid-quarter — the team scrambles.
No insight into how the agent is reasoning — Anthropic owns the trace store.

With SiteBridge

User signs in via your SSO. Done.
Procurement signs one Agiliton contract. EU data residency by construction — no SCCs.
Admins use one dashboard for users, usage, budgets, audit.
Outage on any single provider = transparent failover.
Model deprecation = config swap on our side. Customers don't notice.
Every tool call traced in your Langfuse instance — you own the data, not us.

Twelve concrete benefits

What this stack actually buys you.

1. Open source end-to-end

LiteLLM (MIT), Langfuse (MIT), pgvector (PostgreSQL license), the SiteBridge MCP server (MIT). Every layer is on GitHub — your engineers can read, audit, fork, and self-host before procurement asks. No black boxes in the data plane.

2. Hosted in EU · no US data plane

Compute and storage on Hetzner Cloud (Helsinki hel1, Nuremberg nbg1, Falkenstein fsn1). German operator, EU-only data residency, no Standard Contractual Clauses required. Anthropic and OpenAI can offer a Frankfurt subsidiary; their data plane still terminates in us-east-1.

3. No third-party subscription per user

Users don't sign up for Anthropic, OpenAI, or anyone else. We deliver the model, the agent runtime, and the MCP context layer in one product. Onboarding is SSO — not a five-step API-key dance per user.

4. Every major model, one gateway

Claude (Anthropic), GPT (OpenAI), Gemini (Google), Grok (xAI), Llama (Meta), Mistral, MiniMax, Z.ai's GLM — all available through one product. Pick the model that wins on your task; switch any time without changing tools, contracts, or onboarding flows.

5. One contract, one bill, one DPA

Procurement signs once. Finance reconciles one invoice. Legal reviews one DPA / AVV. We handle the upstream multi-vendor mess so the customer never has to negotiate eight provider relationships.

6. Cost: complexity-routed

Browser tasks are mostly cheap: find element, click element, read result. LiteLLM auto-routes those to MiniMax M2.5 / GLM 5.1 at a fraction of the price and reserves Claude / GPT-5 / Opus for hard reasoning. We capture the spread; the customer sees a flat price.
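A complexity router with a per-key bypass can be sketched in a few lines. This is not LiteLLM's API or Agiliton's production router: the scoring heuristic, the 0.4 threshold, the hint words, and the deployment names are invented for illustration. Only the `bypass_complexity_router` flag name comes from this page.

```python
from dataclasses import dataclass

CHEAP = "minimax-m2.5"    # hypothetical deployment names
PREMIUM = "claude-sonnet"

HARD_HINTS = ("plan", "analyze", "compare", "why", "refactor")  # illustrative only


@dataclass
class VirtualKey:
    team: str
    bypass_complexity_router: bool = False  # flag name taken from the text


def complexity(prompt: str) -> float:
    """Crude 0..1 score: longer prompts and reasoning verbs score higher."""
    words = prompt.lower().split()
    length_score = min(len(words) / 400, 1.0)
    hint_score = 0.5 if any(h in words for h in HARD_HINTS) else 0.0
    return min(length_score + hint_score, 1.0)


def route(requested: str, prompt: str, key: VirtualKey, threshold: float = 0.4) -> str:
    if key.bypass_complexity_router:
        return requested        # honour the caller's model choice verbatim
    if complexity(prompt) < threshold:
        return CHEAP            # simple find/click/read step: downgrade
    return requested
```

The bypass check runs first on purpose: a key that opted out should never be downgraded, however simple the prompt looks.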

7. Failover built in

Anthropic outage? LiteLLM falls back to OpenAI / xAI / Gemini within the same request. Your fleet doesn't notice. Claude for Chrome customers get a 502 and a tweet from Anthropic Status. Resilience is a config knob, not a re-architecture.
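LiteLLM configures this declaratively (its `fallbacks` setting); the control flow underneath amounts to "try providers in order, move on when one fails." A minimal stdlib sketch, where `ProviderError` and the provider callables are hypothetical stand-ins for upstream 5xx errors and real clients:

```python
from typing import Callable, List, Sequence, Tuple


class ProviderError(Exception):
    """Stand-in for an upstream 5xx or timeout."""


def complete_with_failover(
    prompt: str, providers: Sequence[Tuple[str, Callable[[str], str]]]
) -> Tuple[str, str]:
    """Return (provider_name, completion) from the first provider that succeeds."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            errors.append(f"{name}: {exc}")  # record, then try the next provider
    raise ProviderError("all providers failed; " + "; ".join(errors))
```

Returning the provider name alongside the completion keeps the failover visible in traces even though the caller never sees an error.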

8. Privacy: route to self-hosted

Regulated workload? Route to on-prem Llama 4 / Qwen 3 / Mistral behind the same gateway and the same MCP tool surface. Inference never leaves the VPC, and the audit log records that the model was self-hosted. One product, two compliance postures.

9. Patterns up · data down

Cross-tenant pattern library makes the n8n / WordPress / Stripe helpers smarter for everyone. The wall: selectors, tool sequences, recovery strategies pool with k-anonymity ≥ 5. Form values, IDs, secrets, customer names — never. Audit log per tenant shows exactly what crossed.
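The scrubbing side of "the wall" can be sketched as an allow-list: copy only structural fields and generalize anything that looks like a record ID. The field names (`tool`, `selector`, `value`) and the "3+ digits means ID" rule are assumptions for illustration, not SiteBridge's schema.

```python
import re
from typing import Dict, List, Tuple

POOLABLE = ("tool", "selector")  # hypothetical allow-list of structural fields
_ID = re.compile(r"\d{3,}")      # long digit runs look like record IDs


def scrub_step(step: Dict[str, str]) -> Dict[str, str]:
    """Keep allow-listed fields only, with record IDs generalized."""
    out = {}
    for key in POOLABLE:
        if key in step:
            out[key] = _ID.sub("<ID>", str(step[key]))
    return out  # form values, typed text, secrets are never copied


def scrub_sequence(steps: List[Dict[str, str]]) -> Tuple[Tuple[Tuple[str, str], ...], ...]:
    """Hashable, order-preserving form of a scrubbed tool sequence."""
    return tuple(tuple(sorted(scrub_step(s).items())) for s in steps)
```

An allow-list fails closed: a new field added to tool calls stays out of the pool until someone deliberately marks it poolable.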

10. Per-team budgets & quotas

LiteLLM virtual keys give every team / project / agent its own spend cap, rate limit, and metadata bag. "Which team's automation burned €400 yesterday?" — one query. Anthropic's console gives you one bill, one number, no answer.
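LiteLLM's own virtual keys carry settings such as `max_budget`; the sketch below only mimics the idea with stdlib types. `Key`, `charge`, `spend_by_team`, and the `(date, team, cost_eur)` log rows are hypothetical, chosen to show the cap check and the "which team burned €400 yesterday?" query.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass
class Key:
    key_id: str
    team: str
    max_budget_eur: float
    spent_eur: float = 0.0


class BudgetExceeded(Exception):
    pass


def charge(key: Key, cost_eur: float) -> None:
    """Refuse the call if it would push the key past its cap."""
    if key.spent_eur + cost_eur > key.max_budget_eur:
        raise BudgetExceeded(f"{key.key_id} would exceed its cap")
    key.spent_eur += cost_eur


def spend_by_team(log: Iterable[Tuple[date, str, float]], day: date) -> Dict[str, float]:
    """Sum one day's spend per team from (date, team, cost_eur) rows."""
    totals: Dict[str, float] = defaultdict(float)
    for d, team, cost in log:
        if d == day:
            totals[team] += cost
    return dict(totals)
```

Because every key carries its team, the per-team question is a single group-by rather than a reconciliation exercise across invoices.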

11. Single-pane observability

Every model call across the fleet — Claude, GPT, Grok, Llama — and every MCP tool call lands in Langfuse as a tagged trace. Cost per task, latency per provider, error rate per model, full tool sequence. One dashboard for the team, not eight vendor dashboards stitched together.
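The per-provider and per-model rollups above can be sketched over exported trace rows. The row fields (`provider`, `model`, `latency_ms`, `ok`, `cost_eur`, `task_id`) are hypothetical and do not reflect Langfuse's export schema; the point is that one tagged record per call is enough for all three metrics.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, Iterable


def summarize(traces: Iterable[dict]) -> Dict[str, dict]:
    """Median latency per provider, error rate per model, cost per task."""
    latencies = defaultdict(list)
    calls = defaultdict(int)
    errors = defaultdict(int)
    cost_per_task = defaultdict(float)
    for t in traces:
        latencies[t["provider"]].append(t["latency_ms"])
        calls[t["model"]] += 1
        if not t["ok"]:
            errors[t["model"]] += 1
        cost_per_task[t["task_id"]] += t["cost_eur"]
    return {
        "p50_latency_ms": {p: median(v) for p, v in latencies.items()},
        "error_rate": {m: errors[m] / calls[m] for m in calls},
        "cost_per_task": dict(cost_per_task),
    }
```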

12. Future-proof model layer

Sonnet 5 ships? We add the deployment, customers benefit silently. GPT-6? Same. Open-weights leapfrog? Same. Tool surface and safety stack stay stable; the model layer is hot-swappable. Claude-for-Chrome customers wait for Anthropic to ship.

Cost math, concrete

Per-user / per-month — and why the bundled price beats Anthropic-direct.

Anthropic charges per user, per tier. Agiliton via this stack charges per outcome, with the spread between routed-cheap inference and a reasonable seat price as the margin. For a moderately active user (~50 browser tasks/day, 30k tokens each, 22 working days/month):

Configuration	What user pays	Underlying inference cost	Vendor exposure
Claude for Chrome (Pro)	$20/mo per seat	(Anthropic captures all margin)	Anthropic only · per-seat sub required · US data plane
Claude for Chrome (Max)	$100–200/mo per seat	(Anthropic captures all margin)	Anthropic only · per-seat sub required · US data plane
SiteBridge · bundled routed	Customer's negotiated seat / metered price	~$5–10/seat/mo (MiniMax/GLM majority + bypass minority)	Multi · failover ready · we manage · EU-hosted
SiteBridge · self-hosted models	Customer's negotiated price	~$0 marginal (their compute)	Zero · in customer VPC · open-source verified

Numbers illustrative as of May 2026. The ratio is the point: per seat, routed inference runs roughly 2–4× below Claude Pro's $20 list price and 10–40× below Max, and the customer never sees the spread. They see a single bundled price for "browser agent that works."
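The usage profile above works out as follows. The 70% cache-hit rate comes from the lifecycle diagram; the $0.50 per million tokens is an assumed blended price for a MiniMax/GLM-class model, not a quoted rate.

```python
def monthly_tokens(tasks_per_day: int = 50, tokens_per_task: int = 30_000, days: int = 22) -> int:
    """Tokens a moderately active user generates per month."""
    return tasks_per_day * tokens_per_task * days


def monthly_inference_usd(cache_hit_pct: int = 70, usd_per_million: float = 0.50) -> float:
    """Only cache misses reach a model; hits replay at ~zero inference cost."""
    tokens = monthly_tokens()
    model_tokens = tokens * (100 - cache_hit_pct) // 100  # integer math, no float drift
    return model_tokens / 1_000_000 * usd_per_million


print(monthly_tokens())                          # 33000000
print(round(monthly_inference_usd(), 2))         # 4.95
print(round(monthly_inference_usd(0), 2))        # 16.5
```

Under these assumptions a seat lands near the bottom of the table's $5–10 band with the replay cache, and at roughly three times that without it.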

Already operational

This isn't roadmap — it's how Agiliton runs today.

The full stack (LiteLLM gateway, Langfuse trace store, pgvector skill library, SiteBridge MCP server) is core infrastructure at Agiliton. Routing decisions, virtual keys, complexity routing, the bypass-flag mechanism, and the scrubbing pipeline are in production behind multiple internal agents, including the Matrix bot, customer-facing automations, and SiteBridge sessions themselves. The integration is battle-tested, not theoretical.

Complexity router live. Auto-downgrades sonnet/haiku/opus aliases to MiniMax/GLM when prompt complexity allows.
Bypass flag proven. The per-key bypass_complexity_router=true flag shipped after complexity-routed replies showed morphology errors in non-English text; matrix-ai-agent uses it daily.
Virtual keys with metadata. Per-tenant, per-agent keys with spend caps and audit metadata are the norm, not the exception.
Self-hosted path validated. LiteLLM routes to Anthropic, OpenAI, xAI, Google, and self-hosted Ollama-served models in production.
Langfuse trace ingest live. Every model call + every MCP tool call captured, PII-masked, queryable per-tenant.

Three objections, three answers

Anticipated pushback.

“Adding a proxy adds latency and a failure point.”

LiteLLM adds ~5–20 ms of routing overhead. Against the 800–3,000 ms of model latency in a browser task, that is roughly 0.2–2.5% (5/3,000 at best, 20/800 at worst): noise. The failure-point concern is real but inverted: without LiteLLM, your single point of failure is Anthropic. With it, the failover chain is N providers deep. You trade one critical dependency for a routable mesh.

For self-hosted LiteLLM the gateway runs on the same network as your agents — no extra public-internet hop.

“My team standardized on Anthropic. Why complicate things?”

Your team doesn't have to work any differently. SiteBridge speaks MCP; the day-1 config can be "everything to Claude." The value is the option to route. Without it, you can't react when Anthropic raises prices, deprecates a model mid-migration, or has a multi-hour outage. With LiteLLM in the path, you keep the option; without it, you're vendor-coupled and the vendor knows it.

The flexibility is there when you need it — and procurement still signs one contract.

“We already have an LLM gateway / observability stack.”

Good, bring it. LiteLLM is one component: if you already run Portkey, Helicone, or your own gateway, point SiteBridge at it. Langfuse is one component too: if you already ship traces to Datadog or OpenTelemetry (and on to Grafana), LiteLLM's callbacks support those as well. The stack is composable, not coupled.

What you can't bring: the cross-tenant skill library, the pre-click guards, the Chrome-MCP toolkit. Those are SiteBridge proper.