Monday, May 11, 2026

Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral

In this video I show how Sparrow hints work. Hints go beyond simple field extraction: using a bank bonds portfolio document, I demonstrate how to define business rules directly in the hints file, including formatting rules for European number standards, short name normalization, and risk classification logic derived from extracted fields. I test the same hints across three local vision models: Gemma 4 31B Dense, Qwen 3.6 27B Dense, and Ministral 3 14B. All processing runs locally with no cloud dependencies.
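To make the three kinds of rules more concrete, here is a minimal Python sketch of what they could look like if applied as plain post-processing code. This is not the Sparrow hints file format, and in Sparrow the rules are written in the hints file rather than in Python. All function names, the short-name rule, and the rating-based risk rule are hypothetical and chosen only for illustration.

```python
from decimal import Decimal


def parse_european_number(text: str) -> Decimal:
    """Parse '1.234.567,89' (dot thousands, comma decimal) into a Decimal."""
    cleaned = text.strip().replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def format_european_number(value: Decimal) -> str:
    """Render a Decimal with dot thousands separators and a comma decimal mark."""
    us_style = f"{value:,.2f}"  # e.g. '1,234,567.89'
    # Swap the two separators in a single pass.
    return us_style.translate(str.maketrans(",.", ".,"))


def normalize_short_name(name: str) -> str:
    """Hypothetical normalization: collapse whitespace and uppercase."""
    return " ".join(name.split()).upper()


# Hypothetical rule: derive a risk label from an extracted credit rating.
INVESTMENT_GRADE = {"AAA", "AA", "A", "BBB"}


def classify_risk(rating: str) -> str:
    """Return 'low' for investment-grade ratings, otherwise 'high'."""
    base = rating.strip().upper().rstrip("+-")  # 'bbb+' -> 'BBB'
    return "low" if base in INVESTMENT_GRADE else "high"
```

For example, `classify_risk("BB-")` falls outside the investment-grade set and is labeled `high`. A real portfolio would need whatever thresholds the business actually uses.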

Monday, May 4, 2026

Large Table Extraction to JSON with dots.ocr — No Vision LLM Hallucinations

Sparrow now supports a dedicated table mode for extracting large, complex tables into structured JSON without Vision LLM hallucinations.

Vision LLMs struggle with dense tabular data: they hallucinate values, misalign rows, and lose precision at scale. Sparrow's table mode solves this by using dots.ocr to capture the full table structure as HTML, then applying a generic Sparrow template to convert that HTML into clean, structured JSON.
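The second step, turning table HTML into JSON, can be sketched with the Python standard library alone. This is only an illustration of the idea and not Sparrow's template mechanism: the `TableParser` class, the `table_html_to_records` helper, and the sample table are all hypothetical. It also assumes the first row is the header and that there are no merged cells (`colspan`/`rowspan`).

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_html_to_records(html: str) -> list:
    """Treat the first row as the header and map each later row to a dict."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample table, standing in for OCR output.
sample_html = """<table>
<tr><th>Instrument</th><th>Amount</th></tr>
<tr><td>Bond A</td><td>1.000,00</td></tr>
<tr><td>Bond B</td><td>2.500,50</td></tr>
</table>"""

records = table_html_to_records(sample_html)
print(json.dumps(records, indent=2))
```

Because the cell values are copied straight from the HTML, nothing is regenerated by a language model at this stage, which is the point of splitting structure capture from JSON conversion.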