Monday, May 27, 2024
Hybrid RAG with Sparrow Parse
To process complex layout docs and improve data retrieval from invoices or bank statements, we are implementing Sparrow Parse. It works in combination with LLM for form data processing. Table data is converted either into HTML or Markdown formats and extracted directly by Sparrow Parse. I explain Hybrid RAG idea in this video.
Monday, May 20, 2024
Sparrow Parse - Data Processing for LLM
Data processing in LLM RAG is very important, it helps to improve data extraction results, especially for complex layout documents, with large tables. This is why I build open source Sparrow Parse library, it helps to balance between LLM and standard Python data extraction methods.
Monday, May 13, 2024
Invoice Data Preprocessing for LLM
Data preprocessing is important step for LLM pipeline. I show various approaches to preprocess invoice data, before feeding it to LLM. This is quite challenging step, especially to preprocess tables.
Monday, May 6, 2024
You Don't Need RAG to Extract Invoice Data
Documents like invoices or receipts can be processed by LLM directly, without RAG. I explain how you can do this locally with Ollama and Instructor. Thanks to Instructor, structured output from LLM can be validated with your own Pydantic class.