Agiliton CRM — Architecture for Data Sovereignty

Agiliton CRM — Architecture

What the AI reads, when a customer writes

Layers

Input
1. Customer messages
2. Customer profile & narrative
3. Health data & lab values
4. Knowledge chunks
5. Coach voice & prompts
6. Coach role & rules
7. Cross-tenant patterns

Scope

Single customer
Tenant-wide
Anonymised cross-tenant

Input

The message

What the customer just sent. The five seconds the AI spends on this single sentence are why every layer around it exists.

Seven layers of context, each with its own scope rule. The geometry is the proof.

Layer 1 of 7 · single customer

Customer messages

Recent messages plus semantic-search hits from this customer's full message history. Every read is filtered by customerId in the database, so a query never reaches another customer's messages.

context-assembler.ts:863–884

Layer 2 of 7 · single customer

Customer profile & narrative

Lifecycle status, enrolment, due date, plus an AI-maintained narrative summary of this customer's journey. The narrative regenerates as new data arrives and stays inside this one record.

context-assembler.ts:771–832

Layer 3 of 7 · single customer

Health data & lab values

Per-person blood panels, hair mineral analyses, supplements, symptoms, treatment plans. Phone numbers are AES-256 encrypted at rest. Every read is row-level filtered.

context-assembler.ts:833–862

Layer 4 of 7 · tenant-wide · mode-filtered · PII-redacted

Knowledge chunks

Retrieved chunks from the coach's library: programme materials, protocols, references. PII is stripped before chunks ever enter the index, and the customer's lifecycle status pre-filters the chunks by mode (sales vs. coaching).

context-assembler.ts:751–769

Layer 5 of 7 · tenant-wide

Coach voice & prompts

The coach's tone and the mode-specific system prompts (sales for leads, coaching for active care). No customer data; identical across every reply this coach generates.

context-assembler.ts:726–749

Layer 6 of 7 · tenant-wide

Coach role & rules

The hard rules the AI inherits: what role it plays, what it must never say, how it must cite sources, and which professional-ethics (deontological) constraints apply. These rules are tenant-wide; the customer cannot see or edit them.

context-assembler.ts:712–724

Layer 7 of 7 · anonymised cross-tenant

Cross-tenant patterns

Coaching patterns extracted daily from anonymised feedback signals. No names, no specific lab values, no conversation snippets, only general claims weighted by confidence. This is the one cross-tenant artefact, and by construction it cannot be inverted back to a name.

context-assembler.ts:886–896

Boundaries enforced in the query, not the prompt.

What you just scrolled through is not a metaphor. Every layer's scope is a WHERE clause on a database query — a misbehaving prompt cannot widen the boundary. Below, the same rules made explicit: how the mode is selected, how PII is stripped before indexing, how cross-tenant learning happens without cross-tenant data.
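A minimal sketch of what "scope as a WHERE clause" can look like. The table and column names (messages, tenant_id, customer_id) and the buildMessageQuery helper are hypothetical and not taken from context-assembler.ts. The point is only that the scope values come from server-side state and are bound as query parameters, so nothing in the prompt text can change them.

```typescript
// Hypothetical types and names, for illustration only.
interface Scope {
  tenantId: string;
  customerId: string;
}

interface SqlQuery {
  text: string;
  values: unknown[];
}

// Builds a parameterised query for one customer's recent messages.
// The scope is bound as $1 and $2; prompt text is never part of the SQL.
function buildMessageQuery(scope: Scope, limit: number): SqlQuery {
  if (!scope.tenantId || !scope.customerId) {
    throw new Error("refusing to build an unscoped query");
  }
  return {
    text:
      "SELECT body, created_at FROM messages " +
      "WHERE tenant_id = $1 AND customer_id = $2 " +
      "ORDER BY created_at DESC LIMIT $3",
    values: [scope.tenantId, scope.customerId, limit],
  };
}

const q = buildMessageQuery({ tenantId: "t-1", customerId: "c-42" }, 20);
console.log(q.text, q.values);
```

Refusing to build a query without both identifiers is one way to make an unscoped read a hard failure rather than a silent widening.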

Mode selection

The mode is picked from data, not from the LLM.

A pre-purchase lead and an active coaching client need fundamentally different conversations. We don't ask the model to decide which one this is — we read the customer's lifecycle status from the database and pick the mode there.

Mode: Sales

Pre-enrolment touchpoints

Only sales-tagged knowledge chunks are searchable (consultation scripts, programme description, objection handling). The AI uses the sales prompt from coach settings.

Lead · Consultation booked · Consultation completed

Mode: Coaching

Active care

Only coaching chunks are searchable (nutrition, supplements, thyroid, cycle, etc.). The AI uses the coaching prompt — different voice, different scope.

Enrolled · Active · Pregnant · Paused · Completed

Edge cases fall back to unfiltered retrieval. For Churned and Anonymised statuses, no mode filter is applied, so the AI can retrieve all of the tenant's chunks regardless of mode. For an undefined state, we prefer to widen retrieval rather than narrow it.
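The status-to-mode mapping above can be sketched as a pure lookup with no model call involved. The string spellings of the statuses and the selectMode name are assumptions; only the grouping follows the lists above.

```typescript
// Status spellings are assumed; the grouping mirrors the two mode cards above.
type LifecycleStatus =
  | "lead" | "consultation_booked" | "consultation_completed"
  | "enrolled" | "active" | "pregnant" | "paused" | "completed"
  | "churned" | "anonymised";

type Mode = "sales" | "coaching";

const SALES_STATUSES: LifecycleStatus[] = [
  "lead", "consultation_booked", "consultation_completed",
];
const COACHING_STATUSES: LifecycleStatus[] = [
  "enrolled", "active", "pregnant", "paused", "completed",
];

// Returns the retrieval mode, or null when no mode filter should be applied.
function selectMode(status: LifecycleStatus): Mode | null {
  if (SALES_STATUSES.indexOf(status) !== -1) return "sales";
  if (COACHING_STATUSES.indexOf(status) !== -1) return "coaching";
  return null; // churned / anonymised: retrieval is not narrowed by mode
}

console.log(selectMode("lead"), selectMode("pregnant"), selectMode("churned"));
```

Because the function takes only the stored status, the same customer always gets the same mode for a given state, whatever the incoming message says.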

Personal data protection

Personal data never enters the knowledge base in the first place.

The textbook approach is to encrypt PII at rest and rely on access control. We go further: by the time a chunk is embedded, every name, address, phone number and date of birth has been replaced with a placeholder. The original is never indexed.

Regex layer (fast)

Email addresses, phone numbers (DE/AT/CH formats), street addresses, dates of birth, insurance numbers — all replaced with placeholders like [EMAIL], [PHONE], [ADDRESS].

AI name detection

A name-extraction model finds personal names (first/last, with titles like Dr., Prof.) and replaces them with [NAME]. It catches names the regex layer misses, including German compound names and academic titles.

Filename sanitisation

A document called 'Application Plan Maria Schmidt.pdf' is renamed to 'Application Plan [NAME].pdf' before its title even reaches the chunk index.

Chat-export awareness

WhatsApp transcripts are recognised on import and sender names are redacted automatically. The text the AI sees never carries the original sender's name.

Only the cleaned text is embedded and stored. The original document is read once, redacted, then forgotten by the knowledge base. The vector store contains 1,536-dimensional embeddings of redacted text — no path back to the customer's name.
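A rough sketch of the regex layer and the filename step described above. The two patterns are deliberately simplified stand-ins (the phone pattern only covers numbers written with a +49/+43/+41 or 00-prefixed country code), and the function names are hypothetical. Name detection is done by a model in the real pipeline, so here the detected names are simply passed in.

```typescript
// Simplified illustrative patterns; the production rules are not shown in the source.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// International DE/AT/CH numbers only (+49/+43/+41 or 0049/0043/0041).
const PHONE = /(?:\+|00)(?:49|43|41)[\s\-\/]?\d(?:[\s\-\/]?\d){5,12}/g;

// Replaces e-mail addresses and phone numbers with placeholders.
function redactText(text: string): string {
  return text.replace(EMAIL, "[EMAIL]").replace(PHONE, "[PHONE]");
}

// Replaces every already-detected personal name in a filename with [NAME].
function sanitiseFilename(filename: string, detectedNames: string[]): string {
  return detectedNames
    .filter((person) => person.length > 0)
    .reduce((name, person) => name.split(person).join("[NAME]"), filename);
}

console.log(redactText("Mail maria@example.com or call +49 170 1234567."));
// prints "Mail [EMAIL] or call [PHONE]."
console.log(sanitiseFilename("Application Plan Maria Schmidt.pdf", ["Maria Schmidt"]));
// prints "Application Plan [NAME].pdf"
```

Running the redaction before embedding means the index only ever receives the placeholder text, which matches the ordering described above.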

Cross-tenant learning

The system learns from every coach. No coach learns from another coach's customers.

Coaches benefit from system-wide experience patterns. A pattern is only ever extracted from anonymised feedback signals — never raw conversations, never customer names, never specific health values.

What patterns contain

A general claim (e.g. 'Iron bisglycinate is better tolerated than ferrous sulphate')
Confidence (0–1, derived from feedback signal count)
Evidence count (how many anonymised events support the pattern)
Recency weight (newer patterns are preferred)
Conflict group (so contradictory patterns can't both surface)

What patterns never contain

Customer names or any identifier
Specific lab values or dosages
Conversation snippets
Tenant identifiers — patterns are stored once, queried by every tenant
Anything that could be inverted to a person

A coach in tenant A learns from tenant A's history only. Patterns are the one cross-tenant artefact — and they carry no person-level data, by construction.
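One possible shape for a pattern record, with a sketch of how a conflict group could keep contradictory patterns from surfacing together. The field names, the resolveConflicts helper and the score (confidence multiplied by recency weight) are assumptions for illustration; the source does not state how patterns are ranked. Note that the record has no tenant or customer field at all.

```typescript
// Hypothetical record shape; it carries no tenant or customer identifier.
interface CoachingPattern {
  claim: string;
  confidence: number;           // 0 to 1
  evidenceCount: number;        // number of anonymised supporting events
  recencyWeight: number;        // assumed 0 to 1, higher means newer
  conflictGroup: string | null; // patterns sharing a group contradict each other
}

// Assumed ranking score, for illustration only.
const patternScore = (p: CoachingPattern): number => p.confidence * p.recencyWeight;

// Keeps at most one pattern per conflict group (the highest-scoring one),
// then returns all surviving patterns ordered by score, best first.
function resolveConflicts(patterns: CoachingPattern[]): CoachingPattern[] {
  const bestPerGroup = new Map<string, CoachingPattern>();
  const ungrouped: CoachingPattern[] = [];
  for (const p of patterns) {
    if (p.conflictGroup === null) {
      ungrouped.push(p);
      continue;
    }
    const current = bestPerGroup.get(p.conflictGroup);
    if (!current || patternScore(p) > patternScore(current)) {
      bestPerGroup.set(p.conflictGroup, p);
    }
  }
  return ungrouped
    .concat(Array.from(bestPerGroup.values()))
    .sort((a, b) => patternScore(b) - patternScore(a));
}
```

Leaving identifiers out of the type itself, rather than filtering them at read time, is what makes the "by construction" claim checkable in a schema review.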

Data pipeline

From document upload to AI suggestion, one path.

Every document a coach drops into Google Drive runs through the same pipeline. Every step is observable and reversible, and nothing is chunked or indexed before it has passed the redaction step.

Sync

The Google Drive folder is polled every 4 hours (or on demand). A content-hash diff means unchanged files are not re-indexed.
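A small sketch of a content-hash diff. SHA-256, the DriveFile shape and the function names are assumptions; the source only states that a content hash is compared.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a file fetched from the synced folder.
interface DriveFile {
  id: string;
  content: string;
}

// Hex-encoded SHA-256 of the file content (the hash choice is an assumption).
function contentHash(content: string): string {
  return createHash("sha256").update(content, "utf8").digest("hex");
}

// Returns only the files that are new or whose content hash has changed
// compared with the hashes stored from the previous sync.
function filesToReindex(
  files: DriveFile[],
  knownHashes: Map<string, string>,
): DriveFile[] {
  return files.filter((f) => knownHashes.get(f.id) !== contentHash(f.content));
}
```

Comparing hashes rather than modification timestamps means a file that was touched but not changed is skipped.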

Extract

PDF, DOCX, Google Docs, Sheets, CSV. Scanned PDFs are routed through OCR (Gemini Flash). Empty pages are dropped.

Redact

The two-layer PII pass described above. Filename is sanitised here too.

Chunk

500-token chunks with 50-token overlap, cut at sentence boundaries. Section headings are preserved as metadata on each chunk.
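A simplified sketch of sentence-boundary chunking with overlap. Token counts are approximated here by whitespace-separated words, which is an assumption; a real tokenizer counts differently. The function name and the sentence-splitting regex are illustrative, and heading metadata is left out.

```typescript
// Rough token estimate: whitespace-separated words (an approximation).
const countTokens = (s: string): number => s.split(/\s+/).filter(Boolean).length;

// Packs whole sentences into chunks of roughly maxTokens, carrying trailing
// sentences (up to overlapTokens) from one chunk into the next.
// A single sentence longer than the budget is kept whole, not split.
function chunkBySentence(text: string, maxTokens = 500, overlapTokens = 50): string[] {
  const sentences = (text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [])
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  let current: string[] = [];
  let size = 0;
  for (const sentence of sentences) {
    const n = countTokens(sentence);
    if (current.length > 0 && size + n > maxTokens) {
      chunks.push(current.join(" "));
      // Carry whole trailing sentences forward until the overlap budget is used.
      const carry: string[] = [];
      let carried = 0;
      for (let i = current.length - 1; i >= 0; i--) {
        const c = countTokens(current[i]);
        if (carried + c > overlapTokens) break;
        carry.unshift(current[i]);
        carried += c;
      }
      current = carry;
      size = carried;
    }
    current.push(sentence);
    size += n;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```

The overlap repeats the end of one chunk at the start of the next, so a statement that straddles a boundary can still be retrieved with its context.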

Embed

text-embedding-3-large (1,536-dim) via OpenRouter. Embeddings stored in pgvector + a German-language full-text index (tsvector).

Retrieve

Per message: scope filter → vector search + full-text search → Reciprocal Rank Fusion → optional cross-encoder rerank → top 5–10 chunks to the AI.
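Reciprocal Rank Fusion, the step that merges the vector and full-text result lists, can be sketched as follows. Each chunk scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is a commonly used default and an assumption here, as are the function name and the use of plain string ids.

```typescript
// Fuses several ranked lists of chunk ids into one ranking.
// rankings[i][0] is the best hit of list i; ranks are 1-based in the formula.
function reciprocalRankFusion(rankings: string[][], k = 60, topN = 10): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .slice(0, topN)
    .map((entry) => entry[0]);
}

const vectorHits = ["a", "b", "c"];
const fullTextHits = ["b", "d", "a"];
console.log(reciprocalRankFusion([vectorHits, fullTextHits]));
// prints [ 'b', 'a', 'd', 'c' ]
```

Because only ranks are used, the fusion does not need the vector distances and the full-text scores to be on a comparable scale.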

Infrastructure

Where the data lives, and where it doesn't go.

EU-only datacenters

Hosted at Hetzner Cloud (Frankfurt / Helsinki)
No US datacenters in the request path
PostgreSQL, pgvector, Redis — all in-region
Backups encrypted, EU-resident

Encryption & access

Phone numbers AES-256 encrypted at rest
Per-tenant credentials via OpenBao (HashiCorp Vault fork)
Row-level filtering by tenantId + customerId on every query
Audit log of every read/write, retained 7 years
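A hedged sketch of field-level encryption for a phone number using Node's built-in crypto module. AES-256-GCM, the storage format (IV, auth tag and ciphertext joined with dots) and the function names are assumptions; the source states only "AES-256" and does not describe the mode or key handling.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypts one value with AES-256-GCM (mode is an assumption).
// key must be 32 bytes. Output format: base64(iv).base64(tag).base64(ciphertext)
function encryptPhone(plain: string, key: Buffer): string {
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plain, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(".");
}

// Reverses encryptPhone; throws if the key is wrong or the data was tampered with.
function decryptPhone(stored: string, key: Buffer): string {
  const [iv, tag, ciphertext] = stored.split(".").map((s) => Buffer.from(s, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const demoKey = randomBytes(32); // demo only; a real key would come from a secret store
const stored = encryptPhone("+49 170 1234567", demoKey);
console.log(decryptPhone(stored, demoKey));
```

An authenticated mode such as GCM also detects modification of the stored value, which plain confidentiality-only modes do not.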

LLM access

Routed through OpenRouter under enterprise DPA
No-training-on-customer-data terms
Default model: Claude Sonnet 4.6 (Anthropic)
No human at any provider routinely reviews data

Compliance

GDPR you can run, not just talk about.

Most privacy promises are policies. These are endpoints — they exist in the code, can be invoked by the customer, and produce a verifiable result.

Art. 17

Right to erasure

A single request anonymises every personal-data field for the customer. Identifiers are replaced with ANONYMIZED and message bodies are deleted. Audit log entries remain, as required for compliance, but carry no path back to the person.
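The effect of an erasure request on a single record might look like the sketch below. The record shape, the field names and the choice to keep the internal row id are assumptions made for illustration; only the ANONYMIZED placeholder and the removal of message bodies come from the description above.

```typescript
// Hypothetical record shape, for illustration only.
interface StoredMessage {
  id: string;
  body: string | null;
}

interface CustomerRecord {
  id: string; // internal row key, assumed to carry no personal data
  name: string;
  email: string;
  phone: string;
  messages: StoredMessage[];
}

const PLACEHOLDER = "ANONYMIZED";

// Returns a copy of the record with identifiers replaced and message bodies removed.
function anonymiseCustomer(record: CustomerRecord): CustomerRecord {
  return {
    id: record.id,
    name: PLACEHOLDER,
    email: PLACEHOLDER,
    phone: PLACEHOLDER,
    messages: record.messages.map((m) => ({ id: m.id, body: null })),
  };
}
```

Returning a new object instead of mutating the input keeps the sketch easy to test; a real endpoint would apply the same changes inside one database transaction.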

Art. 20

Data portability

A full export of profile, messages, sessions, health data, insights and audit log entries — generated on demand, delivered as a structured archive.

Limited Use

Google API compliance

Customer Drive data is used only for the feature the customer enabled. Not for advertising, not for training general-purpose models, not subject to routine human review.

The full legal text — including OAuth scopes, sub-processor list, and retention periods — lives in the Agiliton CRM privacy policy.

Want to talk?

We're happy to walk a technical buyer or a DPO through the architecture, run live queries against a sandbox tenant, or share the schema diagram.

Reach us at service@agiliton.eu.