Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud
Monday, February 27, 2023
Document Data Extraction - Data Mapping for Donut Model Fine-Tuning Dataset (Document AI)
I explain the current status of my work on dataset preparation for fine-tuning the Donut ML model. I plan to use this model to run data ...
Monday, February 20, 2023
Streamlit Button Group UI (Flowbite) Component
Streamlit doesn't provide an option to display multiple buttons side by side. I explain how to achieve this functionality u...
Monday, February 13, 2023
Preparing Dataset for Donut Fine-Tuning (part 3, Document AI)
In this episode, I explain the redesigned Sparrow UI for data annotation. Sparrow UI is improved with the Streamlit Grid component (aggrid). I show ...
Monday, February 6, 2023
Preparing Dataset for Donut Fine-Tuning (part 2, Document AI)
I explain how to group OCR results into a single entity using the Sparrow annotation tool. This is useful for fields such as an address, item de...
Tuesday, January 31, 2023
Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)
I explain the dataset I will be using to fine-tune the Donut model. I show how PDFs are converted to image files for further processing and OCR ...
Monday, January 23, 2023
How To Fine-tune Donut Model
Donut is an awesome Document AI model for extracting data from docs. I share my experience fine-tuning the model with the CORD dataset, based o...
Monday, January 16, 2023
Donut 🍩 - ChatGPT for Document AI
Donut - OCR-free Document Understanding Transformer. This ML model can process documents (images, scans) and return JSON structured info ab...