Glossary
Terms used in the Docira API, dashboard, and documentation. Each term has a linkable anchor so you can cite it in support tickets, design reviews, or compliance documents.
- Bounding box (bbox)
- A rectangle on the document page defined by {x, y, width, height} coordinates. Docira returns a bbox alongside every extracted block so you can render highlights, crop the source region, or train downstream classifiers on the spatial layout.
- Circuit breaker
- Per-provider state machine that opens (skips that provider) after a threshold of consecutive failures and closes (resumes traffic) after a cooldown. Prevents one degraded upstream from cascading into a global failure.
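The open/close behavior described above can be sketched as a small state machine. The threshold and cooldown values here are illustrative defaults, not Docira's actual configuration:

```python
import time

class CircuitBreaker:
    """Minimal per-provider breaker sketch: opens after N consecutive
    failures, closes again once a cooldown period has elapsed."""

    def __init__(self, failure_threshold=3, cooldown_seconds=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self.consecutive_failures = 0
        self.opened_at = None  # None means the breaker is closed

    def allow_request(self, now=None):
        now = time.monotonic() if now is None else now
        if self.opened_at is None:
            return True
        if now - self.opened_at >= self.cooldown_seconds:
            # Cooldown elapsed: close the breaker and resume traffic.
            self.opened_at = None
            self.consecutive_failures = 0
            return True
        return False  # still open: skip this provider

    def record_success(self):
        self.consecutive_failures = 0

    def record_failure(self, now=None):
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            self.opened_at = time.monotonic() if now is None else now
```

One breaker instance per provider is enough to keep a single degraded upstream from absorbing traffic meant for healthy ones.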
- Classifier (page classifier)
- Stage 2 of the pipeline. Inspects the page image and returns a feature vector and complexity score that drives routing.
- Complexity score
- A number in [0, 1] computed from page features (table presence, handwriting, math, scan quality, layout density). Drives tier selection: low scores route to Fast, medium to Pro, high to Premium.
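The score-to-tier mapping can be sketched as a pair of thresholds. The 0.35 / 0.75 cutoffs below are illustrative assumptions, not Docira's actual values:

```python
def select_tier(complexity_score: float,
                fast_max: float = 0.35,
                pro_max: float = 0.75) -> str:
    """Map a complexity score in [0, 1] to a routing tier.
    The cutoff values are illustrative, not Docira's real thresholds."""
    if not 0.0 <= complexity_score <= 1.0:
        raise ValueError("complexity score must be in [0, 1]")
    if complexity_score <= fast_max:
        return "Fast"
    if complexity_score <= pro_max:
        return "Pro"
    return "Premium"
```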
- Confidence
- Per-page output from the verifier (Stage 5). Combines OCR-baseline agreement with VLM-output coherence into a single number in [0, 1]. Low confidence may trigger a re-route to a higher tier.
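One simple way to combine two [0, 1] signals into one is a weighted mean. This is an illustration only; Docira's actual verifier formula is not documented here:

```python
def combine_confidence(ocr_agreement: float, vlm_coherence: float,
                       agreement_weight: float = 0.5) -> float:
    """Illustrative combination only: a weighted mean of the two signals.
    The real verifier formula may differ."""
    for v in (ocr_agreement, vlm_coherence):
        if not 0.0 <= v <= 1.0:
            raise ValueError("signals must be in [0, 1]")
    return agreement_weight * ocr_agreement + (1 - agreement_weight) * vlm_coherence
```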
- Forced escalation
- A routing decision where the router promotes a page above its score-based tier, typically because of a domain rule (e.g., handwriting always Premium, multi-column tables always Pro+). Surfaced in the routing trace with an explicit flag.
- Grounding
- Linking each piece of extracted output back to its source location on the page via bounding boxes. Lets users verify where text was read from, build redaction tools, or render side-by-side overlays.
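Given the {x, y, width, height} bbox shape described above, converting it to a crop region is a one-liner. The tuple layout shown matches the (left, upper, right, lower) convention used by libraries such as Pillow's `Image.crop`:

```python
def bbox_to_crop_box(bbox: dict) -> tuple:
    """Convert an {x, y, width, height} bbox into a
    (left, upper, right, lower) crop tuple."""
    left = bbox["x"]
    upper = bbox["y"]
    return (left, upper, left + bbox["width"], upper + bbox["height"])
```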
- HMAC signature
- SHA-256 keyed-hash message authentication used to verify webhook payloads. The signature header is computed as HMAC-SHA256(secret, body), and your endpoint must verify it before trusting the event.
- Idempotency key
- A client-supplied unique string that lets the server detect retries. Sending the same parse request twice with the same key returns the same result instead of double-charging.
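The server-side behavior described above can be sketched as a result cache keyed by the idempotency string; this is a generic sketch, not Docira's implementation:

```python
class IdempotencyCache:
    """Sketch of server-side idempotency: remember the first result seen
    for each key so a retried request replays the stored result instead
    of re-running (and re-charging) the operation."""

    def __init__(self):
        self._results = {}

    def execute(self, idempotency_key, operation):
        if idempotency_key in self._results:
            # Retry detected: return the stored result unchanged.
            return self._results[idempotency_key]
        result = operation()
        self._results[idempotency_key] = result
        return result
```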
- Markdown output
- Default output mode. Returns CommonMark-compatible Markdown with headings, lists, tables, and code blocks preserved from the source layout. Pairs naturally with chunking for RAG.
- OCR (optical character recognition)
- Traditional approach to extracting text from images. Operates at the character level via bounding-box detection, then groups characters into words and lines. Strong on plain prose; weak on tables, math, and handwriting.
- Page classification
- See classifier.
- Pipeline (six stages)
- ingest → classify → route → VLM → verify → deliver. Every parse runs all six stages; the routing trace records the latency and outcome of each.
- Provider
- Upstream vision-language-model service in Docira's pool. Current list: AIMLAPI, Anthropic, Fireworks, Google, Groq, NVIDIA, OpenAI, Together, vLLM (self-hosted).
- RAG (retrieval-augmented generation)
- LLM application pattern that retrieves relevant document chunks at query time and injects them into the prompt. Docira's Markdown output is designed to be chunked cleanly for RAG indexing.
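Because the Markdown output keeps headings, a naive RAG chunker can split on them so each chunk stays a coherent section. A minimal sketch (the 1000-character limit is an arbitrary assumption):

```python
def chunk_markdown(markdown: str, max_chars: int = 1000) -> list:
    """Naive chunker: start a new chunk at each Markdown heading,
    then hard-split any section that is still too large."""
    chunks, current = [], []
    for line in markdown.splitlines():
        if line.startswith("#") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    # Hard-split oversized sections on the character limit.
    out = []
    for chunk in chunks:
        while len(chunk) > max_chars:
            out.append(chunk[:max_chars])
            chunk = chunk[max_chars:]
        out.append(chunk)
    return out
```

Production chunkers usually also track heading paths and overlap windows; this shows only the heading-split idea.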
- Re-route
- When the verifier flags low confidence on a tier-N output, the router re-runs the page on tier N+1. Re-route doubles cost on that page but is logged in the trace so you can decide whether to disable it for a given workload.
- Routing trace
- Per-page record returned with every parse: tier, provider, model, complexity score, latency, cost, confidence, and any forced escalations or re-routes. The trace is the audit log.
- Schema-guided extraction
- Output mode where the caller passes a JSON Schema and Docira returns JSON conforming to it. Works on any tier; output is validated against the schema before return.
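To illustrate the validate-before-return step, here is a toy check of just two JSON Schema keywords (`required` and per-property `type`); a real deployment would use a full JSON Schema validator library:

```python
def validates_against(instance: dict, schema: dict) -> bool:
    """Toy JSON Schema check covering only `required` and
    `properties.*.type` -- for illustration, not a real validator."""
    type_map = {"string": str, "number": (int, float), "integer": int,
                "boolean": bool, "object": dict, "array": list}
    for key in schema.get("required", []):
        if key not in instance:
            return False
    for key, rule in schema.get("properties", {}).items():
        if key in instance and "type" in rule:
            if not isinstance(instance[key], type_map[rule["type"]]):
                return False
    return True
```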
- Tier (Fast / Pro / Premium)
- Routing tier selected per page. Fast = small VLMs, low cost, simple pages. Pro = mid-size VLMs, balanced. Premium = large VLMs, complex pages (tables, math, handwriting). Distinct from pricing plan tiers.
- VLM (vision-language model)
- Multimodal model that accepts both image and text input and produces text output. Reads documents the way a human reader does — as a visual composition rather than a sequence of character bounding boxes.
- Webhook
- HTTP POST sent from Docira to a customer-supplied URL when an asynchronous event happens (batch completion, parse completion, failure). Signed with HMAC-SHA256.
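Verifying the HMAC-SHA256 signature looks like the sketch below. The hex encoding and the exact header name are assumptions; check the webhook docs for the precise format:

```python
import hashlib
import hmac

def verify_webhook(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Recompute HMAC-SHA256(secret, body) and compare to the received
    signature in constant time. Hex encoding is an assumption here."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

`hmac.compare_digest` avoids timing side channels that a plain `==` comparison would leak.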
Missing a term? Tell us and we'll add it.
Related: Agentic routing · Grounding & bboxes · API reference