Andrej Baranovskij Blog: 2026

Monday, June 29, 2026

Building an AI Agent That Searches the Web and Makes Investment Decisions

In this video I build a local agentic AI pipeline that analyzes a bond portfolio and makes sell/hold decisions based on risk analysis and live web search data.

The agent runs four steps: load portfolio positions from JSON, classify each position as low/medium/high risk, search the web per position via Tavily API for historical performance and current outlook, then make a final sell/hold decision with reasoning — all powered by Gemma 4 31B running locally on Apple Silicon via mlx-vlm. No data leaves your machine.

All steps orchestrated with Prefect.

🔗 GitHub: https://github.com/katanaml/sparrow
🌐 Live: https://sparrow.katanaml.io
📧 Enterprise inquiries: abaranovskis@redsamuraiconsulting.com

Wednesday, June 24, 2026

Mistral OCR + Sparrow: Document to JSON

Integrated Mistral OCR as a new cloud inference backend into Sparrow, an open-source document extraction platform. This gives Sparrow a full cloud option alongside its existing local backends (MLX, vLLM), so users without GPU infrastructure can still run enterprise-grade document extraction.

Pipeline: Mistral OCR converts the document to structured HTML, then Mistral Small extracts and transforms the data into JSON based on a defined schema with field-level hints.

In this video, extracting a bonds portfolio table with hint-driven rules:

Instrument name normalization (extracting issuer brand from full fund names)
European number formatting (period as thousands separator, comma as decimal)
Percentage formatting with sign preservation
Derived risk classification computed from profit/loss percentage

Same Sparrow API, same schema and hint format as local backends — just switch the backend flag to run on Mistral Cloud instead of MLX or vLLM.

Sparrow is open source and local-first by design — documents never leave your infrastructure unless you choose the cloud backend.

⭐ GitHub: github.com/katanaml/sparrow
🌐 Live demo: sparrow.katanaml.io

Monday, June 15, 2026

Sparrow 0.6.0: New Production-Ready UI for Local Document AI

Sparrow just got a complete UI overhaul — rebuilt from the ground up with Next.js and shadcn for a production-grade experience.

What's new in this release:

- Faster document upload and extraction workflow

- Real-time analytics dashboard with usage metrics, model distribution, and geographical reach

- Built-in feedback collection

- Dark mode support Fully responsive mobile layout

Sparrow remains fully local — your documents are processed on-device with Vision LLMs, with nothing stored on disk and no cloud dependencies.

Wednesday, June 10, 2026

Gemma 4 12B vs Ministral 14B: Who Wins at Structured Table Extraction?

Head-to-head test: Gemma 4 12B vs Ministral 14B on structured table extraction.

In this video, I run a head-to-head test: Gemma 4 12B (8-bit and bf16) vs Ministral 14B (8-bit), extracting data from a 5-row table — two columns, JSON schema, array output.

Results:

Gemma 4 12B (both quantizations): fails to return a proper JSON array
Ministral 14B 8-bit: extracts all rows correctly

Monday, June 1, 2026

Building Agentic AI Pipelines for Document Analysis

In this video, I show how to build a local agentic AI pipeline using Sparrow to extract and analyze data from financial documents.

The agent runs two steps:

- Extract structured data from a bonds table image using Sparrow Parse pipeline and Ministral 3B 14B model

- Analyze portfolio risk using Sparrow Instructor pipeline and Gemma 4 31B model — classifying each position as low, medium, or high risk

Both steps run as Prefect tasks inside a single flow, fully locally — no data leaves your machine.

Monday, May 18, 2026

Instruction-Based Data Analysis with Sparrow and Local LLM

In this video, I show how to use Sparrow instruction processing pipeline to analyze a bond portfolio JSON extracted from a financial document — all running locally, no external APIs.

I run three different analysis cases using Gemma 4 31B on Apple Silicon Mac Mini M4 Pro:

Risk classification — categorize each position into low, medium, or high risk based on loss percentage
Concentration risk — flag overweight positions above 20% portfolio weighting
Portfolio aggregation — total valuation, weighted average P&L, best and worst performer

All three cases use the same sparrow-instructor pipeline, demonstrating how different instruction types — classification, rule-based flagging, and aggregation — are handled by a single local LLM.

Monday, May 11, 2026

Smart Document Extraction with Business Rules — Gemma vs Qwen vs Ministral

In this video I show how Sparrow hints work — a powerful feature that goes beyond simple field extraction. Using a bank bonds portfolio document, I demonstrate how to define business rules directly in the hints file: formatting rules for European number standards, short name normalization, and risk classification logic derived from extracted fields. I test the same hints across three local vision models — Gemma 4 31B Dense, Qwen 3.6 27B Dense, and Ministral 3 14B. All processing runs locally with no cloud dependencies.

Monday, May 4, 2026

Large Table Extraction to JSON with dots.ocr — No Vision LLM Hallucinations

Sparrow now supports a dedicated table mode for extracting large, complex tables into structured JSON — without Vision LLM hallucinations.

Vision LLMs struggle with dense tabular data: they hallucinate values, misalign rows, and lose precision at scale. Sparrow's table mode solves this by using dots.ocr to capture the full table structure as HTML, then applying a generic Sparrow template to convert that HTML into clean, structured JSON.

Monday, April 27, 2026

MoE vs Dense Models for Structured Data Extraction — Who Wins?

MoE or Dense — which model architecture wins for structured data extraction from documents? It depends on document complexity. In this video, I test MoE vs Dense models on real extraction tasks and show that MoE struggles with larger tables, while Dense models handle them reliably. In my benchmark, Qwen 3.6 Dense comes out on top.

Tuesday, April 21, 2026

Gemma 4 for Structured Data Extraction: Can It Beat Qwen 3.5?

In this video, I put Gemma 4 to the test on a real-world task — extracting structured data from bank statements — and benchmark it head-to-head against Mistral's Ministral and Qwen 3.5.

I run both the MoE and Dense variants of Gemma 4 to see how architecture affects accuracy on financial documents, then compare the results side-by-side.

My takeaway: Gemma 4 holds its own and performs on par with Qwen 3.5 — a strong result for local structured extraction workflows.

Thursday, April 2, 2026

Running Multiple Models on One GPU with vLLM and GPU Memory Utilization

In this video I show how to run multiple vLLM model instances on the same GPU (Nvidia) in parallel by adjusting the --gpu-memory-utilization flag.

You'll see:

- How to launch separate vLLM servers for different models

- How to split GPU memory between them without running out of VRAM

This approach works when you want to serve several smaller models concurrently on limited hardware.

Tuesday, March 24, 2026

How to Cache vLLM Model in FastAPI for Faster Inference

I show you how to keep your vLLM model loaded in FastAPI cache for much faster inference — without reloading it on every request.

Monday, March 16, 2026

Qwen 3.5 Test for JSON Structured Data Extraction

Quick test of the new Qwen 3.5 models on JSON structured data extraction from images. Testing and comparing results for 9B FP16, 27B Q8, and A3B 35B Q8. The 35B Q8 model wins in terms of both speed and accuracy. Test was run on MLX-VLM using a Mac Mini M4 Pro with 64GB RAM

Thursday, March 12, 2026

Fast Large Table Extraction: Sparrow + dots.ocr to JSON

Sparrow provides table processing mode. It is optimized to handle large tables, it comes with separate template script (new templates can be easily added) to process dots.ocr markdown output into structure JSON with field mapping.

Wednesday, March 4, 2026

Local OCR Comparison: dots.ocr More Accurate, DeepSeek-OCR 2 Faster (Sparrow + MLX)

I run local tests with Sparrow to compare DeepSeek OCR2 and dots.ocr (by RedNote), both run on MLX-VLM in FP16 precision. Dots.ocr consistently beats DeepSeek OCR2 in accuracy, but DeepSeek OCR2 deliveres much better inference performance.

Monday, February 16, 2026

GLM-OCR vs DeepSeek OCR 2: Which One Wins at Markdown Extraction?

I compare two OCR models using real test cases: GLM OCR and DeepSeek OCR2. Both are evaluated on their ability to extract document content and convert it into well-structured Markdown. I demonstrate which model performs better and which one is faster.

Monday, February 9, 2026

Get Vision LLMs to Follow Your Rules: Prompt-Guided JSON Formatting

JSON query helps to fetch structured output with Vision LLM and extract document data. I describe how to improve such output with additional rules provided through LLM prompt. In this video I share example of number formatting, based on applied rule LLM will output values in requested format.

Tuesday, January 27, 2026

Vision LLM Output Control for Better OCR with Prompt Hints

I explain my approach to enforce better OCR output from vision LLMs with prompt hints. This allows to set rules for output data validation and formatting.

Thursday, January 22, 2026

DeepSeek OCR Markdown Processing in Sparrow for Large Tables

I describe new functionality in Sparrow, where DeepSeek OCR is used to extract text data in markdown format and in the next step instruction LLM inference is utilized to convert data into structured JSON format. This approach helps to improve large table processing and avoid vision LLM hallucinations.