Many enterprise AI products follow the same pattern: you upload your documents to a cloud service, pay per page or per query, and receive answers from a system you cannot inspect.
For regulated industries including Law, Insurance, and Healthcare, this model creates three specific problems:
- Data Control: Documents leave your control. For organizations subject to data residency or confidentiality requirements, this is a compliance exposure.
- Scalability Costs: Costs scale with every document. Growth becomes an expense multiplier rather than an efficiency gain.
- Auditability: When the system produces an incorrect answer on critical data, you have no visibility into why, and no way to trace or verify the logic.
We built a reference architecture to demonstrate a different approach: a document intelligence pipeline where all critical data processing runs locally, using open-source components. No data leaves your infrastructure. Full visibility into every step.
The Architecture: Precision Logic
Each component was selected for a specific function: parsing, embedding, retrieval, and reasoning. All data processing runs locally on your hardware.
1. Document Parsing: Docling (IBM)
Standard OCR treats a document as a flat collection of text. It reads words, but it does not recognize structure.
- The Problem: If you process a complex regulatory PDF through a standard text extraction pipeline, it merges headers, footnotes, and tables. The original document hierarchy is lost.
- The Solution: We used Docling, an open-source IBM project that performs layout analysis. It recognizes that a bold line is a section header, and that the paragraph below it belongs to that section.
- The Result: Structured, hierarchy-aware document chunks. The system understands a clause's meaning based on its position in the document, not just because specific words appear on the page.
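To make this concrete, here is a minimal parsing sketch using the open-source docling package (the input file name is hypothetical):

```python
# Minimal Docling sketch. Assumes `pip install docling`;
# the input file name is hypothetical.
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker

converter = DocumentConverter()
doc = converter.convert("regulatory_filing.pdf").document

# Layout-aware chunks: each chunk carries the heading path it
# sits under, so downstream retrieval keeps the hierarchy.
for chunk in HybridChunker().chunk(dl_doc=doc):
    print(chunk.meta.headings, "->", chunk.text[:80])
```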
2. Local Embedding: EmbeddingGemma (Google)
Most RAG systems depend on external embedding APIs. Every document you embed is transmitted to a third-party server.
- The Model: We used EmbeddingGemma, Google's open embedding model.
- The Implementation: We access the model via ONNX Runtime, which allows it to run locally on CPU.
- Why it Matters: Your proprietary data stays on your own server. The mathematical "fingerprint" of your data never leaves your infrastructure.
- The Effect: High-quality semantic understanding without the latency or privacy exposure of an external API dependency.
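A minimal local-embedding sketch with ONNX Runtime is shown below; the model path, tokenizer ID, and tensor names are assumptions that depend on how the model was exported:

```python
# Local embedding sketch with ONNX Runtime on CPU.
# ASSUMPTIONS: the .onnx path, the tokenizer ID, and the
# input/output tensor layout depend on your export.
import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/embeddinggemma-300m")  # assumed ID
session = ort.InferenceSession("embeddinggemma.onnx",                    # assumed path
                               providers=["CPUExecutionProvider"])

def embed(text: str) -> np.ndarray:
    enc = tokenizer(text, return_tensors="np")
    outputs = session.run(None, {
        "input_ids": enc["input_ids"],
        "attention_mask": enc["attention_mask"],
    })
    # Assumes outputs[0] is (batch, tokens, hidden): mean-pool into
    # one vector, then L2-normalize so cosine similarity is a dot product.
    vec = outputs[0].mean(axis=1)[0]
    return vec / np.linalg.norm(vec)

print(embed("indemnification clause").shape)  # never leaves your server
```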
3. Hybrid Search: PostgreSQL with pgvector
Vector search finds "conceptually similar" text. But in Law or Insurance, you often need an exact clause number or a specific regulatory term.
- The Stack: PostgreSQL with the pgvector extension.
- The Method: We implemented Reciprocal Rank Fusion (RRF). This technique merges multiple ranked lists into a single, more accurate result set. Each search simultaneously produces two rankings:
  - Semantic ranking: Cosine similarity on vector embeddings.
  - Keyword ranking: PostgreSQL full-text search relevance.
- The Result: Exact phrase matches are prioritized when they exist, while the system still benefits from semantic understanding. This grounds every answer in real document excerpts.
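Here is a minimal sketch of the RRF merge itself; k = 60 is the conventional smoothing constant, and the chunk IDs are hypothetical:

```python
# Reciprocal Rank Fusion: each list contributes 1 / (k + rank)
# per document; scores are summed across lists.
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical output of the two PostgreSQL queries:
semantic = ["chunk_12", "chunk_07", "chunk_33"]   # pgvector cosine ranking
keyword  = ["chunk_07", "chunk_33", "chunk_98"]   # full-text ts_rank ranking
print(rrf([semantic, keyword]))  # chunk_07 wins: it ranks well in both lists
```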
4. The Interface: Pluggable AI Agent
The three components above form the pipeline. The AI Agent is the reasoning layer that turns retrieved document chunks into answers.
- The Design Principle: The agent is the only component that can be swapped without affecting your data.
- Why This is True: In a RAG architecture, your "Memory" (the indexed documents and vector embeddings) is decoupled from your "Reasoning Engine" (the LLM). Because the parsing and embedding logic is standardized, you can swap a cloud-based model (like Gemini) for a local one (like DeepSeek) without re-processing a single page.
- The Benefit: Your data asset remains permanent. The model is just a replaceable utility on top. If a model is deprecated or pricing changes, you swap the agent. Your processed data is unaffected.
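A sketch of the swap point, with an interface of our own invention rather than any specific framework's API:

```python
# Sketch of the pluggable reasoning layer. The interface and
# names here are illustrative, not a specific framework's API.
from typing import Protocol

class ReasoningEngine(Protocol):
    def answer(self, question: str, context: list[str]) -> str: ...

def hybrid_search(question: str) -> list[str]:
    # Stand-in for the pgvector + full-text retrieval described above.
    return ["retrieved chunk A", "retrieved chunk B"]

def ask(engine: ReasoningEngine, question: str) -> str:
    # Parsing, embedding, and retrieval are fixed infrastructure;
    # only the object behind `answer` varies.
    return engine.answer(question, context=hybrid_search(question))

# Swapping Gemini for a local DeepSeek deployment means passing a
# different `engine`; the indexed documents are never re-processed.
```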
The Commercial Case: Rent vs. Own
Why invest the engineering effort instead of subscribing to a managed platform?
1. Fixed Cost vs. Variable Expense
SaaS vendors charge per page. This is a variable cost that increases with your document volume.
With a locally-hosted pipeline, the economics shift from OpEx to CapEx. You pay for the infrastructure and the initial build. After that, processing one million additional pages has near-zero marginal software cost. You own the throughput.
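As a toy illustration with purely hypothetical figures (neither number is a vendor quote), the break-even point is simple arithmetic:

```python
# Toy break-even: per-page SaaS pricing vs. a fixed build cost.
# Both numbers are hypothetical, for illustration only.
saas_price_per_page = 0.01      # assumed $/page
build_cost = 50_000             # assumed one-time build cost ($)

break_even_pages = build_cost / saas_price_per_page
print(f"{break_even_pages:,.0f} pages")  # 5,000,000 pages
```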
2. Auditable by Design
Every LLM can produce inaccurate results. The difference is how much visibility you have when it happens.
- With a managed service: You typically see the answer but not the full retrieval logic, such as which documents were ranked or why certain passages were selected.
- With a custom pipeline: You control the retrieval layer. You can inspect retrieved chunks, filter weak matches before they reach the model, and log every query for compliance review. You control the quality of what goes in.
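A sketch of what that control point can look like; the score threshold and log format are illustrative choices, not a prescribed configuration:

```python
# Retrieval-layer auditing sketch: filter weak matches and log
# every query for compliance review. Threshold and log format
# are illustrative choices.
import json, logging, time

audit_log = logging.getLogger("rag.audit")

def audited_retrieve(query: str, results: list[tuple[str, float]],
                     min_score: float = 0.02) -> list[str]:
    kept = [(doc, score) for doc, score in results if score >= min_score]
    audit_log.info(json.dumps({
        "ts": time.time(),
        "query": query,
        "retrieved": results,                # full ranking, including rejects
        "kept": [doc for doc, _ in kept],    # what the model actually sees
    }))
    return [doc for doc, _ in kept]
```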
3. Vendor Independence
APIs change and models are deprecated.
This architecture separates the data layer (your indexed documents) from the reasoning layer (the LLM). Your asset — your processed, indexed data — remains under your control regardless of which model you use.
The Trade-offs
- Engineering Expertise: This is not a self-service platform. It requires specialists who understand document processing and system architecture.
- Operational Responsibility: You own the system's performance, reliability, and security.
- Hardware Requirements: Full local inference (including the LLM) requires GPU infrastructure. Our reference builds often use a hybrid approach: local document processing with a cloud-hosted model for the reasoning layer.
Conclusion
Managed AI services are effective for general-purpose tasks.
But if your organization operates in a regulated industry, where data sovereignty and long-term cost predictability are requirements, the architecture of your AI pipeline is a strategic decision. You can rent that capability and accept the constraints. Or you can own it — and build an asset that increases in value with every document you add.
Ready to evaluate a sovereign architecture?
See how a sovereign RAG system works in our Interactive Knowledge Base Demo.
Our Strategic Blueprint is a fixed-price, 4-week diagnostic. We analyze your document workflows, map the commercial case, and deliver a technical roadmap.
Reach out at info@enblock.net to start the conversation.
Enblock — Business Strategy. Engineered.