Monday, June 16, 2025
Boost Vision LLM Accuracy with OCR Text Integration
I show an approach where both an image and its OCR text are sent to a Vision LLM. The prompt instructs the Vision LLM to prioritize the OCR text over what it reads from the image. The Vision LLM still builds the structured output, but the text values come from an external OCR engine, which gives you more control over the results.
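As a rough illustration of the idea above, here is a minimal sketch of how such a request could be assembled. The function name `build_request`, the payload keys `prompt` and `image_b64`, and the prompt wording are all assumptions made for this example; they are not taken from the post or from any particular Vision LLM API.

```python
import base64


def build_request(image_bytes: bytes, ocr_lines: list[str], fields: list[str]) -> dict:
    """Bundle an image and its OCR text into one hypothetical request payload."""
    ocr_block = "\n".join(ocr_lines)
    # The prompt tells the model to treat the OCR text as the source of truth
    # for text values and to use the image only for layout and structure.
    prompt = (
        "Extract the following fields as JSON: " + ", ".join(fields) + ".\n"
        "Use the image only for layout and structure.\n"
        "Copy every text value exactly from the OCR text below. "
        "If the image and the OCR text disagree, the OCR text wins.\n\n"
        "OCR text:\n" + ocr_block
    )
    return {
        "prompt": prompt,
        # Images are commonly sent base64-encoded; the key name is illustrative.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
    }


# Placeholder bytes stand in for a real image file.
request = build_request(
    b"placeholder image bytes",
    ["Invoice No. 1042", "Total 56,000"],
    ["invoice_number", "total"],
)
```

The payload would then be adapted to whatever request format the chosen Vision LLM backend expects.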
Tuesday, June 10, 2025
Solving Vision LLM Number Formatting Issues Using PaddleOCR and Sparrow
Discover how to fix number formatting errors in Vision LLMs like Mistral! In this video, I show how Mistral drops the thousands separator and returns "56,000" as "56000", and how combining PaddleOCR’s text extraction with Sparrow’s spatial data processing solves this hallucination issue.
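To make the "56,000" versus "56000" problem concrete, here is a small post-processing sketch that restores the original formatting from OCR tokens. This is not how Sparrow does it; it is a simplified stand-in that matches values by their digits, and the helper names `digits_only` and `restore_number_format` are hypothetical.

```python
import re


def digits_only(text: str) -> str:
    """Strip everything except digits, so '56,000' and '56000' compare equal."""
    return re.sub(r"\D", "", text)


def restore_number_format(llm_value: str, ocr_tokens: list[str]) -> str:
    """Return the OCR token whose digits match the LLM value, if exactly one exists."""
    key = digits_only(llm_value)
    if not key:
        # Nothing numeric to match on; leave the value untouched.
        return llm_value
    matches = {tok for tok in ocr_tokens if digits_only(tok) == key}
    # Only replace when the match is unambiguous.
    return matches.pop() if len(matches) == 1 else llm_value


fixed = restore_number_format("56000", ["Total", "56,000", "USD"])
```

When two different OCR tokens share the same digits, the function keeps the LLM value rather than guessing between them.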
Tuesday, June 3, 2025
PaddleOCR 3.0: Supercharge Your AI
I upgraded to PaddleOCR 3.0 and walk through integrating its new API. My goal is to combine OCR output with Vision LLM processing to improve structured output for large-scale table data.