Sunday, June 23, 2024

Sparrow Parse API for PDF Invoice Data Extraction

I explain how the Sparrow Parse API is integrated into Sparrow to extract data from PDF documents such as invoices and receipts.

 

Monday, June 17, 2024

Avoid LLM Hallucinations: Use Sparrow Parse for Tabular PDF Data, Instructor LLM for Forms

LLMs tend to hallucinate and produce incorrect results when extracting table data. For this reason, Sparrow takes a combined approach within the same document: the LLM queries form data through Instructor structured output, and Sparrow Parse processes the tabular data.
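
The combined approach can be pictured as two independent extractors whose results are merged per document. This is a minimal sketch, not Sparrow's actual code: `extract_document` and both stub extractors are hypothetical names, and the stubs return hard-coded values where a real pipeline would call an LLM and a table parser.

```python
from typing import Callable

FormExtractor = Callable[[str], dict]
TableExtractor = Callable[[str], list]


def extract_document(pdf_path: str,
                     form_extractor: FormExtractor,
                     table_extractor: TableExtractor) -> dict:
    """Merge LLM-derived form fields with parser-derived table rows."""
    return {
        "form": form_extractor(pdf_path),     # e.g. invoice number, date
        "items": table_extractor(pdf_path),   # e.g. line items from a table
    }


# Hypothetical stand-ins for the LLM call and the table parser.
def fake_form_extractor(pdf_path: str) -> dict:
    return {"invoice_number": "INV-001"}


def fake_table_extractor(pdf_path: str) -> list:
    return [{"description": "Widget", "qty": "2"}]


result = extract_document("invoice.pdf", fake_form_extractor, fake_table_extractor)
```

Keeping the two extractors separate means the table path never passes through the LLM, which is the point of the split.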

 

Monday, June 10, 2024

Effective Table Data Extraction from PDF without LLM

Sparrow Parse reads tabular data from PDFs by relying on libraries such as Unstructured or PyMuPDF4LLM. This avoids the hallucination errors that LLMs often produce when processing complex data structures.
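
To illustrate deterministic table extraction, here is a small sketch that turns a Markdown pipe table into rows. It assumes the PDF has already been converted to Markdown text (PyMuPDF4LLM produces Markdown output); the `parse_markdown_table` helper and the sample text are hypothetical and are not part of Sparrow Parse.

```python
def parse_markdown_table(markdown: str) -> list:
    """Return the first pipe-delimited table in `markdown` as a list of dicts."""
    rows = []
    for line in markdown.splitlines():
        line = line.strip()
        if len(line) >= 2 and line.startswith("|") and line.endswith("|"):
            # Drop the outer pipes, then split the remaining cells.
            rows.append([cell.strip() for cell in line[1:-1].split("|")])
        elif rows:
            break  # the first table has ended
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[1:]
    # Skip the separator row, e.g. |---|:---:|
    body = [r for r in body if not all(c and set(c) <= set("-:") for c in r)]
    return [dict(zip(header, r)) for r in body]


# Hypothetical Markdown, standing in for converted PDF text.
sample = """
Invoice items

| Description | Qty | Price |
|---|---|---|
| Widget | 2 | 9.99 |
| Gadget | 1 | 24.50 |

Total due: 44.48
"""

items = parse_markdown_table(sample)
```

Because the rows come straight from the document text, a value can be missing or misaligned, but it cannot be invented.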

 

Monday, June 3, 2024

Instructor and Ollama for Invoice Data Extraction in Sparrow [LLM, JSON]

Structured output from an invoice document, produced by a local LLM. This works well with Instructor and Ollama.
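
The idea behind structured output is that the model's JSON reply is checked against a declared schema before it is used. Instructor does this with Pydantic models; the sketch below swaps in standard-library dataclasses so it runs without a model or server. The `Invoice` fields, `parse_invoice`, and the sample reply are all hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Invoice:
    invoice_number: str
    invoice_date: str
    total: float


def parse_invoice(reply: str) -> Invoice:
    """Validate an LLM's JSON reply against the Invoice schema."""
    data = json.loads(reply)
    missing = [f.name for f in fields(Invoice) if f.name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return Invoice(
        invoice_number=str(data["invoice_number"]),
        invoice_date=str(data["invoice_date"]),
        total=float(data["total"]),  # raises ValueError if not numeric
    )


# Hypothetical reply text, standing in for a local model's response.
reply = '{"invoice_number": "INV-001", "invoice_date": "2024-06-03", "total": "120.50"}'
invoice = parse_invoice(reply)
```

A reply that omits a field or carries a non-numeric total fails loudly here, so it never flows downstream as if it were valid data.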