Andrej Baranovskij Blog

Monday, June 16, 2025

Boost Vision LLM Accuracy with OCR Text Integration

I show an interesting approach where I send both an image and OCR text to a Vision LLM. The prompt is constructed to instruct the Vision LLM to prioritize the OCR text. This allows the use of a Vision LLM for structured output construction while relying on external OCR text, giving you more control over the results.
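A minimal sketch of how such a prompt could be assembled, assuming the OCR text is already available as a plain string. The function name `build_prompt`, the instruction wording, and the sample invoice text are illustrative assumptions, not the exact prompt from the video. The image itself would be attached separately through whichever Vision LLM client is in use.

```python
def build_prompt(query: str, ocr_text: str) -> str:
    """Combine a query with OCR text and tell the model to prefer the OCR text."""
    return (
        "You are given a document image and the OCR text extracted from it.\n"
        "Use the image to understand layout and structure, but copy values "
        "from the OCR text whenever the two disagree.\n\n"
        f"OCR text:\n{ocr_text.strip()}\n\n"
        f"Task: {query.strip()}\n"
        "Return valid JSON only."
    )


# Hypothetical sample data, for illustration only.
prompt = build_prompt(
    "Extract invoice_number and total as JSON.",
    "Invoice No 1234\nTotal 56,000",
)
print(prompt)
```

Placing the OCR text before the task keeps the reference values close to the instruction that tells the model to trust them.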

Tuesday, June 10, 2025

Solving Vision LLM Number Formatting Issues Using PaddleOCR and Sparrow

Discover how to fix number formatting errors in vision LLMs like Mistral! In this video, I show how Mistral misreads "56,000" as "56000" and how combining PaddleOCR’s text extraction with Sparrow’s spatial data processing solves this hallucination issue.
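One simple way to picture the fix is a post-processing step that restores the formatting found by OCR. The sketch below is an assumption for illustration, not Sparrow's actual implementation: `restore_formatting` is a hypothetical helper that compares values after removing commas and whitespace, and it ignores the spatial information that the real pipeline uses.

```python
import re


def _normalize(value: str) -> str:
    """Drop commas and whitespace so "56,000" and "56000" compare equal."""
    return re.sub(r"[,\s]", "", value)


def restore_formatting(llm_value: str, ocr_tokens: list[str]) -> str:
    """Return the OCR token that matches the LLM value, else the LLM value itself."""
    target = _normalize(llm_value)
    if not target:
        return llm_value
    for token in ocr_tokens:
        if _normalize(token) == target:
            return token  # keep the formatting exactly as OCR read it
    return llm_value


# Hypothetical OCR tokens, for illustration only.
print(restore_formatting("56000", ["Total", "56,000"]))
```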

Tuesday, June 3, 2025

PaddleOCR 3.0: Supercharge Your AI

I upgraded to PaddleOCR 3.0 and explain how to integrate with the new PaddleOCR API. My goal is to combine OCR output with Vision LLM processing to improve structured output for large tables.

Monday, May 26, 2025

Box Annotations in Sparrow for Structured Data Extraction

Check out my video on Box Annotations in Sparrow for Structured Data Extraction! I’ll show you how the Qwen2.5 vision model pulls bounding box annotations from images based on what you need. I also show how to get simple descriptions and confidence scores for the boxes.

Monday, May 19, 2025

Structured Data Annotation with Qwen2.5 VL and MLX-VLM

Qwen2.5 VL can provide bounding box coordinates and confidence values for extracted structured data. This is useful for visual data review and reporting. I explain, with a practical example, which prompt to use so that Qwen2.5 VL returns this data.
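To illustrate how such output could support data review, here is a minimal sketch that flags low-confidence fields. The JSON shape (`field`, `value`, `bbox`, `confidence`), the sample values, and the helper `fields_to_review` are assumptions made for this example. The actual structure depends on the prompt and is not taken from the video.

```python
import json


def fields_to_review(response_text: str, threshold: float = 0.8) -> list[str]:
    """Return names of fields whose confidence falls below the threshold."""
    items = json.loads(response_text)
    return [item["field"] for item in items if item["confidence"] < threshold]


# Hypothetical model response, for illustration only.
sample = json.dumps([
    {"field": "invoice_number", "value": "1234",
     "bbox": [40, 32, 180, 58], "confidence": 0.97},
    {"field": "total", "value": "56,000",
     "bbox": [410, 700, 520, 728], "confidence": 0.62},
])
print(fields_to_review(sample))
```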

Tuesday, May 13, 2025

LLM Microservice with Instruction Calling

I describe the idea of interacting with an LLM through a microservice with instruction calling. This works well for enterprise application use cases such as data validation and workflow decisions.

Monday, May 5, 2025

Local LLM Instruction Processing with Sparrow

I explain how to execute instructions with a payload using a local LLM. This is useful when you want to process your data with an LLM and provide contextual instructions that specify the desired outcome.
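As a rough sketch of the idea, an instruction and its payload can be combined into a single prompt for a local LLM. The function `build_instruction_prompt`, the prompt layout, and the sample payload are assumptions for illustration and do not reflect Sparrow's actual request format.

```python
import json


def build_instruction_prompt(instruction: str, payload: dict) -> str:
    """Place the instruction first, then the payload serialized as JSON."""
    payload_json = json.dumps(payload, indent=2, sort_keys=True)
    return (
        f"Instruction:\n{instruction.strip()}\n\n"
        f"Payload:\n{payload_json}\n\n"
        "Answer:"
    )


# Hypothetical instruction and payload, for illustration only.
prompt = build_instruction_prompt(
    "Check whether the amount exceeds the approval limit of 100.",
    {"amount": 120, "currency": "EUR"},
)
print(prompt)
```

Serializing the payload as JSON keeps the data clearly separated from the instruction text.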