Monday, March 27, 2023

Donut ML Model Fine-Tuning with Hugging Face API

I explain how Donut ML model can be fine-tuned on your own dataset by following different approaches. Either with PyTorch Lighting or Hugging Face Trainer API. I explain the pros and cons of both and what works best for me.

 

Tuesday, March 21, 2023

How I'm Using ChatGPT/GPT-4 as a Solo Python Developer

I'm working as a solo Python developer and using ChatGPT to speed up the development process. In this video, I explain how ChatGPT is helping me with various tasks, from code explanation to suggesting solutions.

 

Sunday, March 12, 2023

Hugging Face Dataset for Donut Model Fine-Tuning (Document AI)

Hugging Face Dataset is a very convenient way to store and share data for ML model fine-tuning. In this post, I share my experience creating a dataset for fine-tuning the Donut model. I made a set of scripts to generate the dataset, push it to the Hub and test it locally.

 

Monday, March 6, 2023

Improve OCR Results with Sparrow (running on Streamlit/Python and Ngrok)

OCR can often generate results in a different order. But to produce a dataset for data extraction ML model fine-tuning (for example - Donut), fields in all documents must be ordered correctly. Our solution (open-source), Sparrow, for data annotation/labeling includes functionality for OCRed field reordering. In this video, I explain and show how it works.