I show an interesting approach where I send both an image and OCR text to a Vision LLM. The prompt is constructed to instruct the Vision LLM to prioritize the OCR text. This allows the use of a Vision LLM for structured output construction while relying on external OCR text, giving you more control over the results.
No comments:
Post a Comment