Monday, June 16, 2025
Boost Vision LLM Accuracy with OCR Text Integration
I show an interesting approach where I send both an image and OCR text to a Vision LLM. The prompt is constructed to instruct the Vision LLM to prioritize the OCR text. This allows the use of a Vision LLM for structured output construction while relying on external OCR text, giving you more control over the results.
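As a rough illustration of the idea, here is a minimal sketch of how such a prompt could be assembled. The function name `build_prompt` and the exact wording are hypothetical, not the prompt used in the video, and attaching the image is left out because it depends on the model runtime.

```python
from typing import List


def build_prompt(ocr_text: str, fields: List[str]) -> str:
    """Build a text prompt that asks a vision LLM to prefer OCR text over its own reading."""
    field_list = ", ".join(fields)
    return (
        "Extract the following fields as JSON: " + field_list + ".\n"
        # Hypothetical wording: the key point is that OCR text wins on conflicts.
        "When the image and the OCR text disagree, copy the value from the OCR text exactly.\n"
        "OCR text:\n" + ocr_text.strip()
    )


if __name__ == "__main__":
    print(build_prompt("Total 56,000 USD", ["total", "currency"]))
```

The image still goes to the model alongside this prompt, so the model can use layout for structure while the OCR text supplies the literal values.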
Tuesday, June 10, 2025
Solving Vision LLM Number Formatting Issues Using PaddleOCR and Sparrow
Discover how to fix number formatting errors in vision LLMs like Mistral! In this video, I show how Mistral misreads "56,000" as "56000" and how combining PaddleOCR’s text extraction with Sparrow’s spatial data processing solves this hallucination issue.
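To make the problem concrete, here is a minimal sketch of one way to restore the original formatting from OCR output. It is a simplified stand-in, not Sparrow's spatial data processing: the helper names are hypothetical and the OCR result is assumed to be a plain list of text tokens.

```python
import re
from typing import List


def digits_only(value: str) -> str:
    """Strip everything except digits so '56,000' and '56000' compare equal."""
    return re.sub(r"\D", "", value)


def restore_formatting(llm_value: str, ocr_tokens: List[str]) -> str:
    """Return the OCR token with the same digits as the LLM value, if one exists."""
    target = digits_only(llm_value)
    if not target:
        return llm_value  # nothing numeric to match against
    for token in ocr_tokens:
        if digits_only(token) == target:
            return token  # keep the formatting exactly as OCR read it
    return llm_value  # no match: leave the LLM value untouched


if __name__ == "__main__":
    print(restore_formatting("56000", ["Total", "56,000", "USD"]))
```

Matching on digits alone is deliberately crude: "5.60" and "560" would also match, which is one reason position information from the OCR result is useful in practice.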
Tuesday, June 3, 2025
PaddleOCR 3.0: Supercharge Your AI
I upgraded to PaddleOCR 3.0 and explain the new PaddleOCR API integration. My goal is to integrate OCR result output with Vision LLM processing to enhance large-scale, structured table data output.
Monday, May 26, 2025
Box Annotations in Sparrow for Structured Data Extraction
Check out my video on Box Annotations in Sparrow for Structured Data Extraction! I’ll show you how the Qwen2.5 vision model pulls bounding box annotations from images based on what you need. I also show how to create simple descriptions and confidence score boxes.
Labels: OCR, Python, Structured Data
Monday, May 19, 2025
Structured Data Annotation with Qwen2.5 VL and MLX-VLM
Qwen2.5 VL can provide bounding box coordinates and confidence values for extracted structured data. This is useful for visual data review and reporting. I explain, with a practical example, which prompt to use so that Qwen2.5 returns this data.
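For the review use case, a minimal sketch of how such output might be consumed is shown below. The JSON shape (a `value`, a `bbox` as `[x1, y1, x2, y2]`, and a `confidence` per field) is an assumption made for illustration; the actual structure depends on the prompt, and `fields_needing_review` is a hypothetical helper.

```python
import json
from typing import List


def fields_needing_review(response_text: str, threshold: float = 0.8) -> List[str]:
    """List field names whose confidence is below the threshold (assumed JSON shape)."""
    data = json.loads(response_text)
    return [
        name
        for name, field in data.items()
        # A missing confidence is treated as 0.0 so the field is always flagged.
        if field.get("confidence", 0.0) < threshold
    ]


if __name__ == "__main__":
    sample = json.dumps({
        "invoice_no": {"value": "A-17", "bbox": [10, 20, 110, 40], "confidence": 0.97},
        "total": {"value": "56,000", "bbox": [300, 500, 380, 520], "confidence": 0.62},
    })
    print(fields_needing_review(sample))
```

The bounding boxes are not used in the filter itself; they are what a review UI would draw over the page for the flagged fields.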
Tuesday, May 13, 2025
LLM Microservice with Instruction Calling
I describe the idea of interacting with an LLM through a microservice with instruction calling. This works great for enterprise application use cases, such as data validation and workflow decisions.
Labels: LLM, Microservices, Python
Monday, May 5, 2025
Local LLM Instruction Processing with Sparrow
I explain how to execute instructions with a payload using a local LLM. This is useful when you want to process your data with an LLM and provide contextual instructions that specify the desired outcome.
Monday, April 28, 2025
Vision LLM on Mac Mini M4 Pro: Real-World MLX Performance
I discuss the real-world MLX performance of Sparrow for structured data extraction with public access. The current Sparrow online instance runs on a Mac Mini M4 Pro with 64GB of memory. On average, it processes one page in 100 seconds. I explain why tokens-per-second measurements can be misleading when evaluating structured data extraction.
Tuesday, April 22, 2025
Running Vision Models on Apple Silicon with MLX-VLM
I show and explain how to run Qwen and Mistral vision models on Apple Silicon with MLX-VLM. I share technical tips on running both models and show how to pass a query prompt.
Tuesday, April 15, 2025
Dashboard with Gradio Python
This video showcases the Sparrow dashboard, where you can view statistics on document data extraction events processed by Sparrow. This elegant dashboard is built with Python using Gradio, a server-side web UI framework.
Monday, March 31, 2025
Extract Structured Data from Documents with Sparrow (Free Tier Available)
I built Sparrow for document data extraction 🚀
It's fully open-source and runs locally on your machine
You can extract structured data from any document using powerful Mistral 24B 8bit and Qwen 2.5 72B 4bit models
It's free to try with no registration (3 calls per 6 hours, max 3-page documents) and doesn't send your documents to third parties
Tuesday, March 25, 2025
Oracle DB 23ai Free Connection Pool in Python
I describe how to connect to Oracle DB from Python. I explain why a DB connection pool is important for better performance. The connection is made in oracledb thin mode, without installing Oracle Client.
Monday, March 17, 2025
Temporary Files Cleaner for Gradio Web App
Learn how to implement an automatic temporary file cleanup solution for Gradio web applications. This tutorial shows you how to prevent disk space issues by periodically removing old upload files and folders that Gradio leaves behind. Perfect for developers who deploy Gradio apps in production environments or run memory-intensive applications.
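The core of such a cleaner can be sketched with the standard library alone. This is a minimal sketch, not the code from the tutorial: `remove_old_entries` is a hypothetical name, the folder to clean is whatever directory your app writes uploads to, and scheduling the periodic call is left out.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional


def remove_old_entries(root: str, max_age_seconds: float,
                       now: Optional[float] = None) -> List[str]:
    """Delete files and folders directly under root that are older than max_age_seconds."""
    now = time.time() if now is None else now
    removed: List[str] = []
    root_path = Path(root)
    if not root_path.is_dir():
        return removed  # nothing to clean if the folder does not exist
    for entry in root_path.iterdir():
        # Age is based on last modification time of the top-level entry.
        if now - entry.stat().st_mtime <= max_age_seconds:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry, ignore_errors=True)
        else:
            entry.unlink(missing_ok=True)
        removed.append(entry.name)
    return sorted(removed)
```

Calling this from a background timer or an external scheduler with, say, a one-hour `max_age_seconds` keeps recent uploads available while older ones are removed.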
Wednesday, March 12, 2025
Building AI Agent for Local Structured JSON Output
I explain the key steps of building an AI agent that processes documents and extracts structured JSON data locally. I'm running it with Sparrow, using a Qwen VL model as the backend for vision processing and OCR. The steps are explained with a Sparrow code walkthrough.
Monday, March 3, 2025
Querying Non Existing Fields with Qwen2.5 Vision LLM
I describe how Sparrow helps to query non-existing fields with the Qwen2.5 Vision LLM. I run it locally with MLX and MLX-VLM.
Monday, February 10, 2025
Structured Data Extraction with Sparrow Agent: Vision LLM & Prefect in Action
Discover how to streamline your data extraction process with Sparrow Agent! In this tutorial, I showcase how Sparrow Agent leverages Vision LLM to intelligently handle complex data tasks, while Prefect ensures every step is logged and monitored for maximum transparency and efficiency. Join me as I break down the process and share tips for optimizing your automated workflows.
Tuesday, February 4, 2025
Building Web UI Apps with Python Gradio – A Java Developer’s Perspective
I explain how to build Web UI apps with the Python Gradio framework. I used to work with Java and built enterprise Web UI apps with JSF. Based on this experience, I can say that Gradio is an awesome framework for server-side generated UI: it is easy to define UI components and control UI flow with event triggers.
Tuesday, January 28, 2025
Improving Qwen-VL Structured Output with Image Cropping
I explain how I'm improving structured output results from Qwen-VL with image cropping in Sparrow.
Monday, January 20, 2025
Apple MLX Vision LLM Server with Ngrok, FastAPI and Sparrow
I show how I run the Apple MLX backend on my local Mac Mini M4 Pro with 64GB of memory and access it from the Web through Ngrok, with an automatically provisioned HTTPS certificate.
Tuesday, January 14, 2025
Vision LLM Structured Output with Sparrow
I show how Sparrow UI Shell works with both image and PDF docs to process and extract structured data with Vision LLM (Qwen2) in the MLX backend.