docstack-ocr

Dual-VLM OCR + tenant-authored extraction. Self-hosted, model-agnostic, audit-first.

docstack-ocr is a self-hosted document understanding platform designed to plug into a larger system — not to be the system. Submit a document, poll for completion, retrieve a rich CanonicalOcrDocument plus structured fields conforming to a tenant-authored template.
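The submit / poll / retrieve flow can be sketched as a small polling helper. This is a minimal sketch, not the documented client: `wait_for_job`, the status strings `"completed"` and `"failed"`, and the `get_status` callable are all hypothetical stand-ins for whatever the integration guide specifies. The HTTP call is injected so the loop can be exercised without a running stack.

```python
import time
from typing import Callable

# Assumed terminal status names; check the integration guide for the real ones.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(
    get_status: Callable[[str], str],
    job_id: str,
    interval_s: float = 1.0,
    timeout_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status(job_id) until it returns a terminal state or time runs out."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status!r} after {timeout_s}s")
        sleep(interval_s)


# Usage with a fake status source standing in for the real status request.
fake_statuses = iter(["queued", "processing", "completed"])
final_status = wait_for_job(lambda _job_id: next(fake_statuses), "job-123",
                            sleep=lambda _seconds: None)
```

In a real integration, `get_status` would wrap the authenticated status request, and the document and extracted fields would be fetched only after a `"completed"` result.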

Dual-VLM OCR

Every region is OCR’d by PaddleOCR-VL-1.6 and GLM-OCR in parallel. The Paddle result takes priority when the two engines agree; GLM serves as fallback and cross-check. Per-block risk flags are surfaced as advisory data.
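One way to picture the per-block reconciliation is the sketch below. It is an illustration under assumptions, not the platform's actual merge logic: the `BlockResult` type, the flag names, the whitespace- and case-insensitive comparison, and the choice to keep the Paddle text on disagreement are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BlockResult:
    text: str
    source: str                      # which engine supplied the text
    risk_flags: List[str] = field(default_factory=list)


def _normalize(text: str) -> str:
    # Compare ignoring case and whitespace runs (an assumed notion of "agreement").
    return " ".join(text.split()).casefold()


def merge_block(paddle_text: Optional[str], glm_text: Optional[str]) -> BlockResult:
    paddle = (paddle_text or "").strip()
    glm = (glm_text or "").strip()
    if paddle and glm:
        if _normalize(paddle) == _normalize(glm):
            return BlockResult(paddle, "paddle")            # agreement: Paddle wins
        # Assumption: keep Paddle on disagreement, but flag it for reviewers.
        return BlockResult(paddle, "paddle", ["engine_disagreement"])
    if paddle:
        return BlockResult(paddle, "paddle", ["no_cross_check"])
    if glm:
        return BlockResult(glm, "glm", ["fallback_used"])   # GLM as fallback
    return BlockResult("", "none", ["no_text"])
```

The flags are advisory only in this sketch: they annotate the block and never block the pipeline.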

Tenant-authored templates

Document types are runtime-defined. JSON Schema (Draft 2020-12) plus a declarative validation DSL (required, regex, arithmetic, date_plausible, enum, checksum, cross_agreement). Zero hardcoded templates.
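To make the rule kinds concrete, here is a toy evaluator for four of them (required, regex, enum, arithmetic). The rule shape (`kind`, `field`, `pattern`, `values`, `sum_of`, `equals`, `tolerance`) is invented for illustration and is not the platform's DSL syntax; the full integration guide is the reference for the real format.

```python
import re
from typing import Any, Dict, List

# Hypothetical rule shape, for illustration only.
RULES: List[Dict[str, Any]] = [
    {"kind": "required", "field": "invoice_number"},
    {"kind": "regex", "field": "invoice_number", "pattern": r"INV-\d{6}"},
    {"kind": "enum", "field": "currency", "values": ["EUR", "USD"]},
    {"kind": "arithmetic", "sum_of": ["net", "tax"], "equals": "total",
     "tolerance": 0.01},
]


def validate(fields: Dict[str, Any], rules: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable failures; an empty list means every rule passed."""
    failures: List[str] = []
    for rule in rules:
        kind = rule["kind"]
        if kind == "required":
            if fields.get(rule["field"]) in (None, ""):
                failures.append(f"{rule['field']}: required")
        elif kind == "regex":
            value = fields.get(rule["field"])
            if value is not None and not re.fullmatch(rule["pattern"], str(value)):
                failures.append(f"{rule['field']}: does not match pattern")
        elif kind == "enum":
            value = fields.get(rule["field"])
            if value is not None and value not in rule["values"]:
                failures.append(f"{rule['field']}: not an allowed value")
        elif kind == "arithmetic":
            names = rule["sum_of"] + [rule["equals"]]
            if any(fields.get(name) is None for name in names):
                failures.append(f"{rule['equals']}: missing operand")
                continue
            total = sum(float(fields[name]) for name in rule["sum_of"])
            if abs(total - float(fields[rule["equals"]])) > rule.get("tolerance", 0.0):
                failures.append(f"{rule['equals']}: sum mismatch")
        else:
            raise ValueError(f"unknown rule kind: {kind}")
    return failures
```

Keeping rules as plain data is what lets tenants author them at runtime: the evaluator stays fixed while the rule list changes per document type.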

Two auth modes

bearer (built-in users + API keys + sessions + CSRF) or trusted_headers (your gateway forwards X-Tenant-Id / X-Actor-Id / X-Actor-Roles).
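In trusted_headers mode the service reads identity from headers set by the gateway. The sketch below shows roughly what that parsing could look like. The `Actor` type and `actor_from_headers` function are hypothetical, and the comma-separated format for X-Actor-Roles is an assumption, not something this page specifies.

```python
from dataclasses import dataclass
from typing import FrozenSet, Mapping


@dataclass(frozen=True)
class Actor:
    tenant_id: str
    actor_id: str
    roles: FrozenSet[str]


def actor_from_headers(headers: Mapping[str, str]) -> Actor:
    """Build an Actor from gateway-forwarded headers (names are case-insensitive)."""
    lowered = {name.lower(): value for name, value in headers.items()}
    tenant_id = lowered.get("x-tenant-id", "").strip()
    actor_id = lowered.get("x-actor-id", "").strip()
    if not tenant_id or not actor_id:
        raise PermissionError("missing X-Tenant-Id or X-Actor-Id")
    # Assumption: roles arrive as a comma-separated list.
    raw_roles = lowered.get("x-actor-roles", "")
    roles = frozenset(r.strip() for r in raw_roles.split(",") if r.strip())
    return Actor(tenant_id, actor_id, roles)
```

This mode is only safe when the gateway strips any client-supplied copies of these headers before forwarding; otherwise a caller could impersonate any tenant.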

Model-agnostic LLM

vLLM, Ollama, OpenAI, or native Gemini. Swap via runtime infrastructure config; no code changes.

Review-first

Validation failures route to a queue with approve / reprocess / field-level override. Every override re-runs validation and logs an audit row.

Auditable + recoverable

Every mutation is audited. Soft-delete is the default; restore is one idempotent endpoint away.
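The soft-delete and idempotent-restore behaviour can be illustrated with an in-memory model. This is a sketch under assumptions: `DocumentStore` and its method names are hypothetical, and treating a repeated restore as a no-op that writes no extra audit row is one plausible reading of "idempotent", not confirmed behaviour.

```python
from datetime import datetime, timezone
from typing import Dict, List, Optional, Tuple


class DocumentStore:
    """Toy store: deletion sets a timestamp instead of removing the record."""

    def __init__(self) -> None:
        self._deleted_at: Dict[str, Optional[datetime]] = {}
        self.audit_log: List[Tuple[str, str, str]] = []  # (action, doc_id, actor)

    def add(self, doc_id: str, actor: str) -> None:
        self._deleted_at[doc_id] = None
        self.audit_log.append(("create", doc_id, actor))

    def soft_delete(self, doc_id: str, actor: str) -> None:
        if self._deleted_at[doc_id] is None:
            self._deleted_at[doc_id] = datetime.now(timezone.utc)
            self.audit_log.append(("delete", doc_id, actor))

    def restore(self, doc_id: str, actor: str) -> None:
        # Idempotent: restoring an active document changes nothing.
        if self._deleted_at[doc_id] is not None:
            self._deleted_at[doc_id] = None
            self.audit_log.append(("restore", doc_id, actor))

    def is_active(self, doc_id: str) -> bool:
        return self._deleted_at[doc_id] is None
```

Because restore leaves the same end state however often it is called, a client can safely retry it after a timeout.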

  • Quickstart — boot the stack, mint a key, submit your first document. ~5 minutes.
  • Infrastructure setup — configure OCR and LLM endpoints across vLLM, Ollama, OpenAI, and Gemini.
  • Full integration guide — every endpoint, every parameter, every failure mode. The single canonical markdown reference.
  • API reference — interactive OpenAPI browser with a “Try it” panel.
  • Contract testing — keep your integration honest by fuzz-testing the API against its own OpenAPI spec.

Every page exposes a Copy as Markdown action and an Open in ChatGPT/Claude dropdown. The full docs surface is also exported in machine-friendly form: