Wednesday, June 24, 2026

Mistral OCR + Sparrow: Document to JSON

I integrated Mistral OCR into Sparrow, an open-source document extraction platform, as a new cloud inference backend. This gives Sparrow a full cloud option alongside its existing local backends (MLX, vLLM), so users without GPU infrastructure can still run enterprise-grade document extraction.

Pipeline: Mistral OCR converts the document to structured HTML, then Mistral Small extracts and transforms the data into JSON based on a defined schema with field-level hints.

In this video, I extract a bonds portfolio table using hint-driven rules:

  • Instrument name normalization (extracting issuer brand from full fund names)
  • European number formatting (period as thousands separator, comma as decimal)
  • Percentage formatting with sign preservation
  • Derived risk classification computed from profit/loss percentage
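The formatting and classification rules listed above can be illustrated as plain Python. This is only a sketch of what such rules compute, not Sparrow's hint format or implementation: the function names, the two-decimal precision, the use of a decimal comma in percentages, and the risk thresholds are all assumptions made for illustration. Instrument name normalization is left out because it depends on the document's naming conventions.

```python
from decimal import Decimal

# Swap "," and "." in one pass to turn 1,234.56 into 1.234,56.
_SWAP_SEPARATORS = str.maketrans(",.", ".,")


def format_european(value: Decimal) -> str:
    """Period as thousands separator, comma as decimal mark (two decimals assumed)."""
    return f"{value:,.2f}".translate(_SWAP_SEPARATORS)


def format_percent(value: Decimal) -> str:
    """Always emit an explicit sign so gains and losses stay distinguishable."""
    return f"{value:+,.2f}".translate(_SWAP_SEPARATORS) + "%"


def classify_risk(profit_loss_pct: Decimal) -> str:
    """Derive a risk label from profit/loss; the thresholds are hypothetical."""
    if profit_loss_pct <= Decimal("-5"):
        return "high"
    if profit_loss_pct < 0:
        return "medium"
    return "low"


if __name__ == "__main__":
    print(format_european(Decimal("1234567.891")))
    print(format_percent(Decimal("-3.4")))
    print(classify_risk(Decimal("-7.2")))
```

In Sparrow these rules are expressed as hints and applied by the model; a deterministic version like this can serve as a reference when checking the model's output.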

The Sparrow API, schema, and hint format are the same as for the local backends; just switch the backend flag to run on Mistral Cloud instead of MLX or vLLM.

Sparrow is open source and local-first by design — documents never leave your infrastructure unless you choose the cloud backend.

⭐ GitHub: github.com/katanaml/sparrow
🌐 Live demo: sparrow.katanaml.io 

Monday, June 15, 2026

Sparrow 0.6.0: New Production-Ready UI for Local Document AI

Sparrow just got a complete UI overhaul — rebuilt from the ground up with Next.js and shadcn for a production-grade experience.

What's new in this release:

- Faster document upload and extraction workflow
- Real-time analytics dashboard with usage metrics, model distribution, and geographical reach
- Built-in feedback collection
- Dark mode support
- Fully responsive mobile layout

Sparrow remains fully local — your documents are processed on-device with Vision LLMs, with nothing stored on disk and no cloud dependencies.

Wednesday, June 10, 2026

Gemma 4 12B vs Ministral 14B: Who Wins at Structured Table Extraction?

In this video, I run a head-to-head test of Gemma 4 12B (8-bit and bf16) against Ministral 14B (8-bit) on structured table extraction: a 5-row, two-column table, extracted as a JSON array against a JSON schema.

Results:

  • Gemma 4 12B (both 8-bit and bf16): fails to return a proper JSON array
  • Ministral 14B 8-bit: extracts all rows correctly
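A check along the following lines could be used to decide whether a model's response counts as a proper JSON array. It is a minimal sketch, not the test harness from the video: the function name and the column names in the usage example are hypothetical.

```python
import json


def validate_table_output(raw: str, columns: tuple[str, ...], expected_rows: int) -> list[str]:
    """Return a list of problems; an empty list means the output is acceptable."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    # The schema asks for an array, so a bare object is already a failure.
    if not isinstance(data, list):
        return [f"top-level value is {type(data).__name__}, expected a JSON array"]

    problems = []
    if len(data) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(data)}")
    for index, row in enumerate(data):
        if not isinstance(row, dict):
            problems.append(f"row {index} is not an object")
        elif set(row) != set(columns):
            problems.append(f"row {index} has keys {sorted(row)}, expected {sorted(columns)}")
    return problems


if __name__ == "__main__":
    # Hypothetical two-column table with five rows.
    good = json.dumps([{"description": "Item", "amount": "10"}] * 5)
    print(validate_table_output(good, ("description", "amount"), 5))
    print(validate_table_output('{"description": "Item"}', ("description", "amount"), 5))
```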

Monday, June 1, 2026

Building Agentic AI Pipelines for Document Analysis

In this video, I show how to build a local agentic AI pipeline using Sparrow to extract and analyze data from financial documents. 

The agent runs two steps:

- Extract structured data from a bonds table image using the Sparrow Parse pipeline and the Ministral 3 14B model
- Analyze portfolio risk using the Sparrow Instructor pipeline and the Gemma 4 31B model, classifying each position as low, medium, or high risk
 
Both steps run as Prefect tasks inside a single flow, fully locally — no data leaves your machine.
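The shape of that two-step flow can be sketched as follows. In the video both steps are Prefect tasks inside one flow; this sketch uses plain functions so it runs without Prefect installed. The step functions are stand-ins that return canned data, and every name here (including the field names and the placeholder risk rule) is hypothetical rather than part of Sparrow's API.

```python
from typing import Callable

Rows = list[dict[str, str]]


def run_two_step_pipeline(
    image_path: str,
    extract: Callable[[str], Rows],
    analyze: Callable[[Rows], Rows],
) -> Rows:
    """Step 1 extracts rows from the document, step 2 annotates them."""
    rows = extract(image_path)
    if not rows:
        # Stop early instead of asking the analysis model to reason about nothing.
        raise ValueError(f"no rows extracted from {image_path}")
    return analyze(rows)


def fake_extract(image_path: str) -> Rows:
    """Stand-in for the extraction step; returns canned data."""
    return [{"instrument": "Bond A", "profit_loss_pct": "-6.1"}]


def fake_analyze(rows: Rows) -> Rows:
    """Stand-in for the analysis step; the rule is a placeholder, not the real prompt."""
    return [
        {**row, "risk": "high" if float(row["profit_loss_pct"]) < 0 else "low"}
        for row in rows
    ]


if __name__ == "__main__":
    print(run_two_step_pipeline("bonds.png", fake_extract, fake_analyze))
```

Passing the steps in as callables keeps the orchestration independent of which model or pipeline backs each step.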

Monday, May 18, 2026

Instruction-Based Data Analysis with Sparrow and Local LLM

In this video, I show how to use the Sparrow instruction processing pipeline to analyze a bond portfolio JSON extracted from a financial document, all running locally with no external APIs.

I run three different analysis cases using Gemma 4 31B on an Apple Silicon Mac Mini (M4 Pro):

  • Risk classification — categorize each position into low, medium, or high risk based on loss percentage
  • Concentration risk — flag overweight positions above 20% portfolio weighting
  • Portfolio aggregation — total valuation, weighted average P&L, best and worst performer
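For the concentration and aggregation cases, a deterministic baseline is a handy way to check the LLM's answers. The sketch below is not what the sparrow-instructor pipeline runs; it only shows the arithmetic the instructions describe, and the field names (`name`, `valuation`, `pl_pct`) and sample figures are assumptions.

```python
def concentration_flags(positions: list[dict], limit: float = 0.20) -> list[str]:
    """Names of positions whose share of total valuation exceeds the limit."""
    total = sum(p["valuation"] for p in positions)
    return [p["name"] for p in positions if p["valuation"] / total > limit]


def aggregate(positions: list[dict]) -> dict:
    """Total valuation, valuation-weighted average P&L, best and worst performer."""
    total = sum(p["valuation"] for p in positions)
    weighted_pl = sum(p["valuation"] * p["pl_pct"] for p in positions) / total
    best = max(positions, key=lambda p: p["pl_pct"])
    worst = min(positions, key=lambda p: p["pl_pct"])
    return {
        "total_valuation": total,
        "weighted_avg_pl_pct": weighted_pl,
        "best": best["name"],
        "worst": worst["name"],
    }


if __name__ == "__main__":
    # Made-up sample portfolio, for illustration only.
    portfolio = [
        {"name": "Bond A", "valuation": 500, "pl_pct": 2.0},
        {"name": "Bond B", "valuation": 300, "pl_pct": -4.0},
        {"name": "Bond C", "valuation": 200, "pl_pct": 6.0},
    ]
    print(concentration_flags(portfolio))
    print(aggregate(portfolio))
```

Note that the weighted average uses valuation as the weight, which is one reasonable reading of "weighted average P&L"; the instruction given to the model should state the weighting explicitly.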

All three cases use the same sparrow-instructor pipeline, demonstrating how different instruction types — classification, rule-based flagging, and aggregation — are handled by a single local LLM.

Monday, May 11, 2026

Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral

In this video I show how Sparrow hints work — a powerful feature that goes beyond simple field extraction. Using a bank bonds portfolio document, I demonstrate how to define business rules directly in the hints file: formatting rules for European number standards, short name normalization, and risk classification logic derived from extracted fields. I test the same hints across three local vision models — Gemma 4 31B Dense, Qwen 3.6 27B Dense, and Ministral 3 14B. All processing runs locally with no cloud dependencies.

Monday, May 4, 2026

Large Table Extraction to JSON with dots.ocr — No Vision LLM Hallucinations

Sparrow now supports a dedicated table mode for extracting large, complex tables into structured JSON — without Vision LLM hallucinations. 

Vision LLMs struggle with dense tabular data: they hallucinate values, misalign rows, and lose precision at scale. Sparrow's table mode solves this by using dots.ocr to capture the full table structure as HTML, then applying a generic Sparrow template to convert that HTML into clean, structured JSON.
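The second half of that approach, turning table HTML into JSON records, can be sketched with the Python standard library alone. This is an illustration of the idea, not Sparrow's template mechanism: it assumes a simple table whose first row is the header, and it ignores colspan, rowspan, and nested tables. The class and function names are hypothetical.

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text chunks of the cell being read, or None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def html_table_to_records(html: str) -> list[dict[str, str]]:
    """Use the first row as field names and map every other row onto them."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    sample = (
        "<table><tr><th>Instrument</th><th>Value</th></tr>"
        "<tr><td>Bond A</td><td>1.000,00</td></tr></table>"
    )
    print(json.dumps(html_table_to_records(sample), indent=2))
```

Because the cell values are copied from the HTML rather than regenerated, this conversion step cannot introduce new values of its own; any errors would have to come from the OCR stage.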