Monday, May 18, 2026

Instruction-Based Data Analysis with Sparrow and Local LLM

In this video, I show how to use the Sparrow instruction processing pipeline to analyze a bond portfolio JSON extracted from a financial document, all running locally with no external APIs.

I run three analysis cases using Gemma 4 31B on an Apple Silicon Mac mini M4 Pro:

  • Risk classification — categorize each position into low, medium, or high risk based on loss percentage
  • Concentration risk — flag overweight positions above 20% portfolio weighting
  • Portfolio aggregation — total valuation, weighted average P&L, best and worst performer

All three cases use the same sparrow-instructor pipeline, demonstrating how different instruction types — classification, rule-based flagging, and aggregation — are handled by a single local LLM.
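To sanity-check what the model returns for these three cases, the same rules can be written out in plain Python. This is only a rough sketch and not Sparrow code: the field names (valuation, pnl_pct), the sample positions, and the loss thresholds for low/medium/high risk are my assumptions for illustration. Only the 20% concentration limit comes from the description above.

```python
# Hypothetical sample positions; field names are illustrative, not Sparrow's schema.
positions = [
    {"name": "Bond A", "valuation": 50000.0, "pnl_pct": 2.0},
    {"name": "Bond B", "valuation": 30000.0, "pnl_pct": -4.0},
    {"name": "Bond C", "valuation": 20000.0, "pnl_pct": -12.0},
]


def classify_risk(pnl_pct):
    """Map a P&L percentage to a risk bucket based on the size of the loss."""
    # Assumed thresholds: loss under 2% is low, under 10% is medium, otherwise high.
    loss = max(0.0, -pnl_pct)
    if loss < 2.0:
        return "low"
    if loss < 10.0:
        return "medium"
    return "high"


def analyze(positions, max_weight=20.0):
    """Run classification, concentration flagging, and aggregation in one pass."""
    total = sum(p["valuation"] for p in positions)
    rows = []
    for p in positions:
        weight = 100.0 * p["valuation"] / total
        rows.append({
            "name": p["name"],
            "risk": classify_risk(p["pnl_pct"]),
            "weight_pct": weight,
            # Flag only positions strictly above the limit.
            "overweight": weight > max_weight,
        })
    # Weighted average P&L uses valuation as the weight.
    weighted_pnl = sum(p["valuation"] * p["pnl_pct"] for p in positions) / total
    return {
        "positions": rows,
        "total_valuation": total,
        "weighted_avg_pnl_pct": weighted_pnl,
        "best": max(positions, key=lambda p: p["pnl_pct"])["name"],
        "worst": min(positions, key=lambda p: p["pnl_pct"])["name"],
    }


result = analyze(positions)
for row in result["positions"]:
    print(row)
print(result["total_valuation"], result["best"], result["worst"])
```

Comparing a deterministic result like this against the LLM answer is a quick way to spot arithmetic slips in the aggregation case.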

Monday, May 11, 2026

Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral

In this video, I show how Sparrow hints work. Hints go beyond simple field extraction: using a bank bonds portfolio document, I demonstrate how to define business rules directly in the hints file, including formatting rules for European number standards, short name normalization, and risk classification logic derived from extracted fields. I test the same hints across three local vision models: Gemma 4 31B Dense, Qwen 3.6 27B Dense, and Ministral 3 14B. All processing runs locally with no cloud dependencies.
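As an illustration of what a European number formatting rule means in practice, here is a small Python sketch that converts between "1.234.567,89" style strings and numeric values. It does not show the Sparrow hints file syntax. The function names are hypothetical, and I assume the dot is the thousands separator and the comma is the decimal separator.

```python
from decimal import Decimal


def parse_european_number(text):
    """Parse a string such as '1.234.567,89' into a Decimal."""
    # Drop regular and non-breaking spaces that sometimes appear in scans.
    cleaned = text.strip().replace("\u00a0", "").replace(" ", "")
    # Assumption: '.' groups thousands and ',' marks the decimal part.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)


def format_european_number(value):
    """Format a Decimal with two decimals using European separators."""
    us_style = f"{value:,.2f}"  # e.g. '1,234,567.89'
    # Swap the separators through a temporary placeholder.
    return us_style.replace(",", "#").replace(".", ",").replace("#", ".")


amount = parse_european_number("1.234.567,89")
print(amount)
print(format_european_number(amount))
```

A helper like this can be used to verify that the values a model returns follow the format requested in the hints.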

Monday, May 4, 2026

Large Table Extraction to JSON with dots.ocr — No Vision LLM Hallucinations

Sparrow now supports a dedicated table mode for extracting large, complex tables into structured JSON without Vision LLM hallucinations.

Vision LLMs struggle with dense tabular data: they hallucinate values, misalign rows, and lose precision at scale. Sparrow's table mode solves this by using dots.ocr to capture the full table structure as HTML, then applying a generic Sparrow template to convert that HTML into clean, structured JSON.
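The HTML-to-JSON step can be pictured with a short Python sketch based on the standard library parser. This is not Sparrow's template mechanism, only a simplified stand-in: the class and function names are hypothetical, the sample table is made up, and I assume a flat table whose first row holds the column headers (no merged cells).

```python
import json
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every table row into a list of lists."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (such as newlines between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_html_to_records(html):
    """Turn a simple HTML table into a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample standing in for HTML captured from a document page.
sample_html = """
<table>
  <tr><th>Instrument</th><th>Quantity</th></tr>
  <tr><td>Bond A</td><td>100</td></tr>
  <tr><td>Bond B</td><td>250</td></tr>
</table>
"""

records = table_html_to_records(sample_html)
print(json.dumps(records, indent=2))
```

Because the cell values are copied from the HTML rather than generated, this kind of conversion cannot invent or shift values, which is the point of separating structure capture from JSON mapping.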