Tuesday, January 31, 2023

Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)

I explain the dataset I will be using to fine-tune Donut model. I show how PDFs are converted to image files for further processing and OCR data extraction. In the next step, JSON data is converted to the format understandable by Sparrow annotation processing/review tool.


No comments: