Tuesday, January 27, 2026

Vision LLM Output Control for Better OCR with Prompt Hints

I explain my approach to enforcing better OCR output from vision LLMs with prompt hints. This lets you set rules for output data validation and formatting.
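The idea can be sketched as a prompt-construction step plus a validation step. This is a minimal illustration, not Sparrow's actual code; the field names, hint wording, and helper names are hypothetical:

```python
import json


def build_ocr_prompt(fields: dict) -> str:
    """Build a vision-LLM prompt with per-field hints (type/format rules).

    `fields` maps a field name to a hint string,
    e.g. {"total": "number, two decimals"}.
    """
    hints = "\n".join(f'- "{name}": {rule}' for name, rule in fields.items())
    return (
        "Extract the following fields from the document image.\n"
        "Return ONLY valid JSON with exactly these keys:\n"
        f"{hints}\n"
        "If a field is missing, use null. Do not add extra keys."
    )


def validate_output(raw: str, fields: dict) -> dict:
    """Check the model's raw answer against the requested keys."""
    data = json.loads(raw)  # raises ValueError on malformed JSON
    missing = set(fields) - set(data)
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Rejected answers can then be retried with the validation error fed back into the prompt.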

 

Thursday, January 22, 2026

DeepSeek OCR Markdown Processing in Sparrow for Large Tables

I describe new functionality in Sparrow, where DeepSeek OCR extracts text data in markdown format and, in a second step, an instruction LLM converts the data into structured JSON. This approach improves large-table processing and avoids vision LLM hallucinations.
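The second step of that pipeline could look like the sketch below. The `llm_call` parameter and prompt wording are assumptions for illustration, not the actual Sparrow API:

```python
import json


def markdown_table_to_json(markdown: str, llm_call) -> list:
    """Convert OCR markdown (from a prior DeepSeek-OCR pass) into
    structured JSON rows via an instruction LLM.

    `llm_call` is any callable that takes a prompt string and returns
    the model's text answer.
    """
    prompt = (
        "Convert the following markdown table into a JSON array of row "
        "objects. Use the header row for keys. Return JSON only.\n\n"
        f"{markdown}"
    )
    return json.loads(llm_call(prompt))
```

Because the table reaches the instruction LLM as plain markdown text, the structured-output step no longer depends on the vision model reading every cell correctly in one pass.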

 

Saturday, December 27, 2025

DeepSeek OCR Review

I'm testing structured data extraction with DeepSeek OCR. It works well, with data accuracy and performance good enough to disrupt traditional cloud-based document processing solutions.

 

Monday, December 15, 2025

New Ministral 3 14B vs Mistral Small 3.2 24B Review

I review data retrieval accuracy and inference speed for the new Ministral 3 14B model vs the older Mistral Small 3.2 24B. The older, larger 24B model wins this time.

 

Wednesday, December 3, 2025

Structured Data Retrieval with Sparrow using OCR and Vision LLM [Improved Accuracy]

I explain the improvements I'm adding to Sparrow to achieve better accuracy for structured data. I use a method where I run an OCR step first, then construct an advanced prompt with the OCR data injected. This prompt is sent along with the image to a vision LLM for structured data retrieval. All of this happens as part of a single pipeline.
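The prompt-construction part of that pipeline can be sketched as below. The function name and prompt text are illustrative assumptions; only the shape (OCR transcript injected alongside the extraction request) reflects the described method:

```python
def build_grounded_prompt(ocr_text: str, schema: str) -> str:
    """Combine a prior OCR pass with the extraction request so the
    vision LLM can cross-check its reading against the OCR transcript.
    """
    return (
        "You receive a document image plus an OCR transcript of the "
        "same page.\n"
        "Use the transcript to verify characters you are unsure about.\n\n"
        f"OCR transcript:\n{ocr_text}\n\n"
        "Extract data matching this JSON schema and return JSON only:\n"
        f"{schema}"
    )
```

The resulting string is sent to the vision LLM together with the page image, so both the pixels and the OCR text ground the same answer.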

 

Wednesday, November 26, 2025

Ollama and MLX-VLM Accuracy Review (Qwen3-VL and Mistral Small 3.2)

I ran detailed tests to compare accuracy for the same models (Qwen3-VL and Mistral Small 3.2) running on Ollama and MLX-VLM (recent 0.3.7 version). MLX-VLM runs faster, but with lower accuracy. The same holds across different models.

 

Tuesday, November 11, 2025

Comparing Qwen3-VL AI Models for OCR Task

I'm comparing the Qwen3-VL 8B BF16 and Qwen3-VL 30B Q8 models for OCR and structured data extraction tasks. Based on my findings, the quantized 30B model runs faster and with better accuracy than the 8B BF16 model, despite using more memory.