Andrej Baranovskij Blog

Blog about Oracle, Full Stack, Machine Learning and Cloud

Monday, February 27, 2023

Document Data Extraction - Data Mapping for Donut Model Fine-Tuning Dataset (Document AI)

I explain the current status of my work related to dataset preparation for ML Donut model fine-tuning. I plan to use this model to run data ...

Monday, February 20, 2023

Streamlit Button Group UI (Flowbite) Component

Streamlit doesn't provide an option to display multiple buttons side-by-side horizontally. I explain how to achieve this functionality u...

Monday, February 13, 2023

Preparing Dataset for Donut Fine-Tuning (part 3, Document AI)

In this episode, I explain the redesigned Sparrow UI for data annotation. Sparrow UI is improved with the Streamlit Grid component (aggrid). I show ...

Monday, February 6, 2023

Preparing Dataset for Donut Fine-Tuning (part 2, Document AI)

I explain how to group OCR results into a single entity using the Sparrow annotation tool. This is useful for fields such as an address, item de...

Tuesday, January 31, 2023

Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)

I explain the dataset I will be using to fine-tune the Donut model. I show how PDFs are converted to image files for further processing and OCR ...

Monday, January 23, 2023

How To Fine-tune Donut Model

Donut is an awesome Document AI model for extracting data from documents. I share my experience fine-tuning the model with the CORD dataset, based o...

Monday, January 16, 2023

Donut 🍩 - ChatGPT for Document AI

Donut - OCR-free Document Understanding Transformer. This ML model can process documents (images, scans) and return JSON structured info ab...

About Me

Andrej Baranovskij
Vilnius, Lithuania
I'm an Oracle ACE Director, Oracle Groundbreaker Ambassador, and CEO and Technical Expert at Red Samurai Consulting, with a focus on Oracle Fusion Middleware and Oracle Cloud technologies.