Monday, February 27, 2023

Document Data Extraction - Data Mapping for Donut Model Fine-Tuning Dataset (Document AI)

I describe the current status of my work on preparing a dataset for fine-tuning the Donut ML model. I plan to use this model to extract data from invoice documents. I share hints about data mapping and how to structure the data to achieve better fine-tuning results.
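
To make the data mapping idea concrete, here is a minimal sketch, not the exact mapping from the post. It assumes the `metadata.jsonl` layout described in the Donut repository, where each line holds a `file_name` and a `ground_truth` JSON string wrapping a `gt_parse` object. The function name `to_metadata_line` and the invoice fields `invoice_no` and `total` are hypothetical.

```python
import json


def to_metadata_line(file_name, fields):
    """Build one metadata.jsonl line for a single annotated document.

    Assumption: `fields` is a plain dict of label -> value taken from the
    annotation tool; the real mapping may be nested or named differently.
    """
    # Donut expects ground_truth to be a JSON *string*, so it is encoded twice.
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


# Hypothetical invoice with two mapped fields.
line = to_metadata_line("invoice_0.jpg", {"invoice_no": "123", "total": "45.00"})
print(line)
```

Keeping the labels short and consistent across documents is a design choice of this sketch; adjust it to match your own annotation schema.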

 

Monday, February 20, 2023

Streamlit Button Group UI (Flowbite) Component

Streamlit doesn't provide an option to display multiple buttons side by side. I explain how to achieve this using a custom Streamlit component and the Flowbite button group UI.

 

Monday, February 13, 2023

Preparing Dataset for Donut Fine-Tuning (part 3, Document AI)

In this episode, I explain the redesigned Sparrow UI for data annotation. The Sparrow UI is improved with the Streamlit grid component (aggrid). I show how to group related fields generated by OCR into a single entity and map it to a label. I briefly review the code and discuss how you can set up a grid component in Streamlit, a convenient and helpful UI element.

 

Monday, February 6, 2023

Preparing Dataset for Donut Fine-Tuning (part 2, Document AI)

I explain how to group OCR results into a single entity using the Sparrow annotation tool. This is useful for fields such as an address or an item description, where the field text consists of multiple words.
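
The grouping idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Sparrow's actual code: the `Word` class, the `group_words` function, and the `(x_min, y_min, x_max, y_max)` box layout are all hypothetical, and the words are assumed to arrive already in reading order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Word:
    text: str
    box: Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max)


def group_words(words: List[Word], label: str) -> Dict:
    """Merge several OCR words into one labeled entity."""
    if not words:
        raise ValueError("need at least one word to group")
    # Join the text in the order given (assumed to be reading order).
    text = " ".join(w.text for w in words)
    # The entity box is the smallest box that covers every word box.
    box = (
        min(w.box[0] for w in words),
        min(w.box[1] for w in words),
        max(w.box[2] for w in words),
        max(w.box[3] for w in words),
    )
    return {"label": label, "text": text, "box": box}


# Hypothetical OCR output for an address spread over two lines.
words = [
    Word("221B", (10, 50, 60, 70)),
    Word("Baker", (65, 50, 120, 70)),
    Word("Street", (10, 75, 70, 95)),
]
entity = group_words(words, "address")
print(entity)
```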