Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud

FastAPI File Upload and Temporary Directory for Stateless API (2024-03-17)

I explain how to handle file uploads with FastAPI and how to process the uploaded file in a Python temporary directory. Files placed into the temporary directory are removed automatically once the request completes, which is very convenient for a stateless API.
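A minimal sketch of the pattern, assuming a single-file endpoint (the /upload route name and the size calculation are illustrative stand-ins for real processing):

```python
import shutil
import tempfile
from pathlib import Path

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")  # hypothetical route name
async def upload(file: UploadFile):
    # TemporaryDirectory deletes itself and its contents when the block
    # exits, so nothing persists between requests.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir) / (file.filename or "upload.bin")
        with tmp_path.open("wb") as out:
            shutil.copyfileobj(file.file, out)
        size = tmp_path.stat().st_size  # stand-in for real processing
    return {"filename": file.filename, "size": size}
```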
Video: https://www.youtube.com/embed/SQjMDG7pPb8?si=zetEf-D0R_FY_OZn

Optimizing Receipt Processing with LlamaIndex and PaddleOCR (2024-03-10)

The LlamaIndex text completion function executes an LLM request that combines custom data with the question, without using a vector DB. This is very useful when processing OCR output, since it simplifies the RAG pipeline. In this video I explain how OCR can be combined with an LLM to process image documents in Sparrow.
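A rough sketch of the idea, assuming llama-index 0.10+ with the llama-index-llms-ollama package (the model name, sample OCR text, and prompt wording are illustrative):

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", request_timeout=120.0)

ocr_text = "RECEIPT\nMilk 2.49\nBread 1.99\nTOTAL 4.48"  # stand-in for PaddleOCR output
question = "What is the receipt total?"

# Text completion: the OCR output and the question go straight into one
# prompt, so no vector DB is involved.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{ocr_text}\n\n"
    f"Question: {question}\nAnswer:"
)
print(llm.complete(prompt).text)
```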
Video: https://www.youtube.com/embed/tQuybOG24y8?si=KZVCHnfyMfE4-en1

LlamaIndex Multimodal with Ollama [Local LLM] (2024-03-03)

I describe how to run LlamaIndex multimodal with a local LLaVA LLM through Ollama. The advantage of this approach is that you can process image documents with the LLM directly, without an OCR step, which should lead to better results. This functionality is integrated into Sparrow as a separate LLM agent.
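A minimal sketch, assuming the llama-index-multi-modal-llms-ollama package and a llava model already pulled into Ollama (the file name and prompt are illustrative):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(model="llava")

# SimpleDirectoryReader loads image files as image documents.
image_docs = SimpleDirectoryReader(input_files=["invoice.png"]).load_data()

response = mm_llm.complete(
    prompt="Extract the invoice number and total as JSON.",
    image_documents=image_docs,
)
print(response.text)
```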
Video: https://www.youtube.com/embed/5M5qiDJvuv0?si=mYDOmy1A9tlEr8uZ

LLM Agents with Sparrow (2024-02-26)

I explain new functionality in Sparrow: LLM agent support. You can implement independently running agents and invoke them from the CLI or the API, which makes it easier to run various LLM-related processing within Sparrow.
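The post doesn't show Sparrow's actual agent API; purely as an illustration of the pattern, here is a generic registry that lets named agents be invoked from a CLI (all names are hypothetical):

```python
import argparse

AGENTS = {}

def agent(name):
    """Register a callable under an agent name."""
    def wrap(fn):
        AGENTS[name] = fn
        return fn
    return wrap

@agent("echo")
def echo_agent(payload: str) -> str:
    return payload

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("agent_name", choices=AGENTS)
    parser.add_argument("payload")
    args = parser.parse_args()
    # Dispatch to the selected agent; an API endpoint could reuse the
    # same registry.
    print(AGENTS[args.agent_name](args.payload))
```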
Video: https://www.youtube.com/embed/t3XpVUCNLwM?si=--pk5onCyLFTembi

Extracting Invoice Structured Output with Haystack and Ollama Local LLM (2024-02-20)

I implemented a Sparrow agent with Haystack's structured output functionality to extract invoice data. It runs locally through Ollama, using an LLM to retrieve key/value pairs.
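A stripped-down sketch of this kind of extraction, assuming Haystack 2.x with the ollama-haystack integration (the Invoice fields and sample text are illustrative):

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.generators.ollama import OllamaGenerator
from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_number: str
    total: float

template = """Extract the fields as a JSON object matching this schema:
{{ schema }}

Invoice text:
{{ document }}"""

pipe = Pipeline()
pipe.add_component("prompt", PromptBuilder(template=template))
pipe.add_component("llm", OllamaGenerator(model="mistral"))
pipe.connect("prompt", "llm")

result = pipe.run({
    "prompt": {
        "schema": Invoice.model_json_schema(),
        "document": "Invoice INV-42, total 99.90 EUR",
    }
})
print(result["llm"]["replies"][0])
```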
Video: https://www.youtube.com/embed/-q_NUOIzDXc?si=hYbCqi7uYM8oG9FO

Local LLM RAG Pipelines with Sparrow Plugins [Python Interface] (2024-02-04)

There are many LLM tools and frameworks, evolving and improving daily. I added plugin support to Sparrow to run different pipelines through the same Sparrow interface. Each pipeline can be implemented with a different stack (LlamaIndex, Haystack, etc.) and run independently. The main advantage is that you can test various RAG implementations from a single app with a unified API and choose the one that works best for the specific use case.
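Again, this is not Sparrow's actual plugin API, only a sketch of the general shape: a shared interface plus a name-to-implementation map (all identifiers are hypothetical):

```python
from abc import ABC, abstractmethod

class RagPipeline(ABC):
    """Interface every pipeline plugin implements."""
    @abstractmethod
    def run(self, query: str, file_path: str) -> str: ...

class LlamaIndexPipeline(RagPipeline):
    def run(self, query: str, file_path: str) -> str:
        return f"[llamaindex] {query} over {file_path}"  # real RAG call goes here

class HaystackPipeline(RagPipeline):
    def run(self, query: str, file_path: str) -> str:
        return f"[haystack] {query} over {file_path}"

PIPELINES = {"llamaindex": LlamaIndexPipeline, "haystack": HaystackPipeline}

def run_pipeline(name: str, query: str, file_path: str) -> str:
    # The caller selects the stack by name; the interface stays the same.
    return PIPELINES[name]().run(query, file_path)

print(run_pipeline("llamaindex", "invoice total?", "invoice.pdf"))
```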
Video: https://www.youtube.com/embed/0rNlck_ZBzs?si=tPo-I1fI6y5pNnog

LLM Structured Output with Local Haystack RAG and Ollama (2024-01-29)

Haystack 2.0 provides functionality to process LLM output and enforce a proper JSON structure based on a predefined Pydantic class. I show how to run this on your local machine with Ollama, using the OllamaGenerator class available in Haystack.
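The validate-and-retry idea, reduced to a sketch independent of the Haystack wiring (the City class and the hard-coded reply are placeholders for a real generator call):

```python
from pydantic import BaseModel, ValidationError

class City(BaseModel):
    name: str
    population: int

def fetch_reply(attempt: int) -> str:
    # Placeholder for an OllamaGenerator call; on retries the prompt would
    # normally include the previous validation error.
    return '{"name": "Vilnius", "population": 588412}'

parsed = None
for attempt in range(3):
    reply = fetch_reply(attempt)
    try:
        parsed = City.model_validate_json(reply)
        break
    except ValidationError:
        continue  # re-ask the model, up to the attempt cap

print(parsed)
```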
Video: https://www.youtube.com/embed/rsSRQm1aREc?si=pImFBPD0_BSvSgU7

JSON Output with Notus Local LLM [LlamaIndex, Ollama, Weaviate] (2024-01-23)

In this video, I show how to get JSON output from the Notus LLM running locally with Ollama. The JSON output is generated with LlamaIndex using the dynamic Pydantic class approach.
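The dynamic part relies on Pydantic's create_model, which builds a class at runtime from field definitions; a minimal sketch (the field names are illustrative):

```python
from pydantic import create_model

# Field definitions could come from user input or a config file:
# name -> (type, default); ... marks the field as required.
fields = {
    "invoice_number": (str, ...),
    "total": (float, ...),
    "currency": (str, "EUR"),
}
Invoice = create_model("Invoice", **fields)

# The generated JSON schema can be embedded in the LLM prompt.
print(Invoice.model_json_schema())
```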
Video: https://www.youtube.com/embed/6EXgark9GpA?si=tkYZeRvz9iHkVAe8

FastAPI and LlamaIndex RAG: Creating Efficient APIs (2024-01-15)

FastAPI works great with LlamaIndex RAG. In this video, I show how to build a POST endpoint that executes inference requests against LlamaIndex. The RAG implementation is part of the Sparrow data extraction solution. I show how FastAPI handles multiple concurrent requests that initiate the RAG pipeline. I'm using Ollama to execute LLM calls within the pipeline; Ollama processes requests sequentially, so API requests are served in queue order. Hopefully Ollama will support concurrent requests in the future.
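A sketch of such an endpoint, assuming the RAG call is a blocking function (run_rag and the route name are hypothetical placeholders for the LlamaIndex query engine):

```python
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    query: str

def run_rag(query: str) -> str:
    return f"answer for: {query}"  # placeholder for the real RAG pipeline

@app.post("/inference")  # hypothetical route name
async def inference(req: InferenceRequest):
    # A worker thread keeps the event loop free to accept more requests,
    # even though Ollama itself still serves the LLM calls one at a time.
    answer = await run_in_threadpool(run_rag, req.query)
    return {"answer": answer}
```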
Video: https://www.youtube.com/embed/vntNI33wrcI?si=alvrwneBcYBXITh2

Transforming Invoice Data into JSON: Local LLM with LlamaIndex & Pydantic (2024-01-08)

This is Sparrow, our open-source solution for document processing with local LLMs. I'm running the Starling LLM locally with Ollama. I explain how to get structured JSON output with LlamaIndex and a dynamic Pydantic class, which supports the use case of data extraction from invoice documents. Thanks to Ollama, the solution runs on a local machine; I'm using a MacBook Air M1 with 8GB RAM.
Video: https://www.youtube.com/embed/VKeYaIEk82s?si=3G79R1bEgGTjdGuY

From Text to Vectors: Leveraging Weaviate for Local RAG Implementation with LlamaIndex (2023-12-17)

Weaviate provides vector storage and plays an important part in a RAG implementation. I'm using local embeddings from the Sentence Transformers library to create vectors for text-based PDF invoices and store them in Weaviate. I explain how the integration with LlamaIndex manages the data ingestion and LLM inference pipeline.
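An ingestion sketch, assuming llama-index with the llama-index-vector-stores-weaviate and llama-index-embeddings-huggingface packages plus a local Weaviate instance (the embedding model, collection name, and data path are illustrative):

```python
import weaviate
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Local Sentence Transformers embeddings, no external API needed.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

client = weaviate.connect_to_local()  # weaviate-client v4
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Invoices")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("data/invoices").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```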
Video: https://www.youtube.com/embed/ROm0R2EdQqg?si=NHJ8dDox3St955BI

Enhancing RAG: LlamaIndex and Ollama for On-Premise Data Extraction (2023-12-11)

LlamaIndex is an excellent choice for a RAG implementation. It provides a clean API for working with different data sources and extracting data, and it also offers an API for Ollama integration, which means we can easily use LlamaIndex with on-premise LLMs through Ollama. I explain a sample app where LlamaIndex works with Ollama to extract data from PDF invoices.
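An end-to-end sketch under similar assumptions to the previous one (local embeddings, an Ollama model already pulled; the file name and question are illustrative):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

docs = SimpleDirectoryReader(input_files=["invoice.pdf"]).load_data()
index = VectorStoreIndex.from_documents(docs)  # in-memory store for the demo

engine = index.as_query_engine()
print(engine.query("What is the invoice total?"))
```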
Video: https://www.youtube.com/embed/tiYQiWWd7rE?si=Suy8Jy5sN1EuoYx8

Secure and Private: On-Premise Invoice Processing with LangChain and Ollama RAG (2023-12-05)

The Ollama desktop tool helps run LLMs locally on your machine. This tutorial explains how I implemented a pipeline with LangChain and Ollama for on-premise invoice processing. Running an LLM on-premise provides many advantages in terms of security and privacy. Ollama works similarly to Docker; you can think of it as Docker for LLMs. You can pull and run multiple LLMs, which allows you to switch between models without changing the RAG pipeline.
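A small illustration of that flexibility, assuming the langchain-community package and models already pulled into Ollama:

```python
from langchain_community.llms import Ollama

# The pipeline references the model only by name, so swapping models is a
# one-line change once they are pulled with Ollama.
llm = Ollama(model="mistral")
# llm = Ollama(model="llama2")  # same pipeline, different model

print(llm.invoke("Return the total from: Invoice INV-7, total 45.60 EUR"))
```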
Video: https://www.youtube.com/embed/mONpftuo02M?si=InFucuqFsUl6nX0E

Easy-to-Follow RAG Pipeline Tutorial: Invoice Processing with ChromaDB & LangChain (2023-11-27)

I explain the implementation of a pipeline that processes invoice data from PDF documents. The data is loaded into Chroma's vector store, and through the LangChain API it is ready to be consumed by the LLM as part of the RAG infrastructure.
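A loading sketch, assuming langchain-community with the chromadb and pypdf packages installed (the file name, chunk sizes, and query are illustrative):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

docs = PyPDFLoader("invoice.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# Embed the chunks locally and persist the vector store on disk.
db = Chroma.from_documents(chunks, HuggingFaceEmbeddings(), persist_directory="db")

retriever = db.as_retriever()
print(retriever.get_relevant_documents("invoice total")[0].page_content)
```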
Video: https://www.youtube.com/embed/Higmr8qMoNk?si=AEHuIw0BmYti8YdW

Vector Database Impact on RAG Efficiency: A Simple Overview (2023-11-19)

I explain the importance of the vector DB for a RAG implementation and show with a simple example how data retrieval from the vector DB can affect LLM performance. Before data is sent to the LLM, you should verify that quality data is fetched from the vector DB.
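One way to do that check, sketched with Chroma and similarity scores (the sample texts are illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma.from_texts(
    ["Invoice INV-7, total 45.60 EUR", "Unrelated meeting notes"],
    HuggingFaceEmbeddings(),
)

# With Chroma's default metric the score is a distance: lower means a
# closer match. If the top hits look irrelevant, fix retrieval before
# blaming the LLM.
for doc, score in db.similarity_search_with_score("invoice total", k=2):
    print(f"{score:.3f}  {doc.page_content}")
```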
Video: https://www.youtube.com/embed/WTiLD3C8CFg?si=BEbzq7N8hM6iXHNB

JSON Output from Mistral 7B LLM [LangChain, CTransformers] (2023-11-13)

I explain how to compose a prompt for the Mistral 7B model, running with LangChain and CTransformers, to retrieve the output as a JSON string without any additional text.
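A prompt-composition sketch, assuming the langchain-community and ctransformers packages (the GGUF repo and file name are one plausible choice, not necessarily the one used in the video):

```python
from langchain_community.llms import CTransformers

llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
)

# Mistral Instruct format; the "JSON object and nothing else" wording keeps
# the model from wrapping the answer in extra prose.
prompt = (
    "[INST] Extract invoice_number and total from the text below. "
    "Respond with a single JSON object and nothing else.\n"
    "Text: Invoice INV-42, total 99.90 EUR [/INST]"
)
print(llm.invoke(prompt))
```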
Video: https://www.youtube.com/embed/lUoCPXYS9AU?si=zAXYcpXQ2Ho2EOQZ

Structured JSON Output from LLM RAG on Local CPU [Weaviate, Llama.cpp, Haystack] (2023-11-06)

I explain how to get structured JSON output from LLM RAG running through the Haystack API on top of Llama.cpp. Vector embeddings are stored in a Weaviate database, the same as in my previous video. When extracting data, a structured JSON response is preferred because we are not interested in additional descriptions.
Video: https://www.youtube.com/embed/mvHFCp97USM?si=9TU4cLFRapT-NTzV

Invoice Data Processing with Llama2 13B LLM RAG on Local CPU [Weaviate, Llama.cpp, Haystack] (2023-10-22)

I explain how to set up local LLM RAG to process invoice data with Llama2 13B. Based on my experiments, Llama2 13B handles tabular data better than the Mistral 7B model. This example presents a production LLM RAG setup with a Weaviate database for vector embeddings, Haystack for the LLM API, and Llama.cpp to run Llama2 13B on a local CPU.
Video: https://www.youtube.com/embed/XuvdgCuydsM?si=iJP5VN7HHG5BOZm4

Invoice Data Processing with Mistral LLM on Local CPU (2023-10-16)

I explain a solution that extracts invoice document fields with the open-source Mistral LLM. It runs on a CPU and doesn't require a cloud machine. I'm using the Mistral 7B model, LangChain, CTransformers and a FAISS vector store to run everything on a local CPU machine. This approach is a great advantage for enterprise systems where running ML models in the cloud is not allowed for privacy reasons.
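The FAISS piece of that stack, as a CPU-only sketch (assuming langchain-community with faiss-cpu; the sample texts are illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

texts = [
    "Invoice INV-42, total 99.90 EUR",
    "Receipt R-7, total 12.30 EUR",
]
# Both the embeddings and the FAISS index run locally on the CPU.
db = FAISS.from_texts(texts, HuggingFaceEmbeddings())

docs = db.similarity_search("What is the total of invoice INV-42?", k=1)
print(docs[0].page_content)
```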
Video: https://www.youtube.com/embed/9RERupqcFL4?si=ZuKNVxVX8mUCgLNj

Skipper MLOps Debugging and Development on Your Local Machine (2023-10-09)

I explain how to stop some of the Skipper MLOps services running in Docker and debug/develop their code locally. This improves the development workflow: there is no need to deploy a code change to a Docker container, since it can be tested locally. A service that runs locally connects to the Skipper infrastructure through a RabbitMQ queue.
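The connection itself is plain RabbitMQ; a generic consumer sketch with pika (the queue name is hypothetical, not Skipper's actual one):

```python
import pika

# Point the locally running service at the same broker the Dockerized
# services use.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="skipper.training", durable=True)  # hypothetical queue

def on_message(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="skipper.training", on_message_callback=on_message)
channel.start_consuming()
```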
Video: https://www.youtube.com/embed/TqU9tyQudnw?si=EGhl3V1AH2jm2ZK3

Pros and Cons of Developing Your Own ChatGPT Plugin (2023-10-02)

I've been running a ChatGPT plugin in production for a month and share my thoughts about the pros and cons of developing it. Would I build a new ChatGPT plugin?
Video: https://www.youtube.com/embed/c_adOoUzQog?si=0XOLNmvqziHpsMDR

Llama 2 LLM for PDF Invoice Data Extraction (2023-09-25)

I show how to extract data from a text PDF invoice using the Llama 2 LLM running on a free Colab GPU instance. I specifically explain how to improve data retrieval using carefully crafted prompts.
Video: https://www.youtube.com/embed/WGNpdvnwR7o?si=KTZtpXIrMBd-o4YX

Data Filtering and Aggregation with Receipt Assistant Plugin for ChatGPT (2023-09-11)

I explain the Receipt Assistant plugin for ChatGPT from a user perspective. I show how to fetch previously processed and saved receipt data, including filtering and aggregation, and how to fix spelling mistakes in Lithuanian-language receipt items. At the end, numeric data is visualized with the WizeCharts plugin for ChatGPT.
Video: https://www.youtube.com/embed/NSSGFD3led4?si=4om3W3Af8UtYS_zy

Computer Vision with ChatGPT - Receipt Assistant Plugin (2023-09-04)

Our plugin, Receipt Assistant, was approved for the ChatGPT plugin store. I explain how it works and how to use it in combination with other plugins, for example to display charts. Receipt Assistant provides vision and storage options for ChatGPT. It is primarily tuned to work with receipts, but it can handle any structured info of medium complexity.
Video: https://www.youtube.com/embed/JyWQxpt99qo?si=2o8KqPxeGnbNrNoX

How to Host FastAPI from Your Computer with ngrok (2023-08-19)

With ngrok, you can host your FastAPI app from your own computer. This can be a handy and cheaper option for some projects. In this video, I share my experience running FastAPI apps from my very own cloud with ngrok :)
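A sketch using the pyngrok wrapper; this is an assumption, the post itself may drive the plain ngrok CLI instead (port and route are illustrative):

```python
import uvicorn
from fastapi import FastAPI
from pyngrok import ngrok

app = FastAPI()

@app.get("/")
def root():
    return {"status": "ok"}

if __name__ == "__main__":
    # Open a public tunnel to the local port before starting the server.
    tunnel = ngrok.connect(8000)
    print("Public URL:", tunnel.public_url)
    uvicorn.run(app, host="127.0.0.1", port=8000)
```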
Video: https://www.youtube.com/embed/HEGT00StXbw