Tuesday, January 31, 2023
Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)
I explain the dataset I will use to fine-tune the Donut model. I show how PDFs are converted to image files for further processing and OCR data extraction. In the next step, the JSON data is converted to a format that the Sparrow annotation processing/review tool understands.
Labels: Machine Learning, Python
Monday, January 23, 2023
How To Fine-tune Donut Model
Donut is an awesome Document AI model for extracting data from documents. I share my experience fine-tuning the model on the CORD dataset, based on an example from Transformers Tutorials.
Labels: Donut, Hugging Face, Machine Learning
Monday, January 16, 2023
Donut 🍩 - ChatGPT for Document AI
Donut is an OCR-free Document Understanding Transformer. This ML model can process documents (images, scans) and return structured JSON describing their content. It works for different use cases: form understanding, visual question answering about the document, and document image classification.
Labels: Donut, Hugging Face, Machine Learning
Thursday, January 5, 2023
Best Platform for Python Apps Deployment - Hugging Face Spaces with Docker
I walk through the Hugging Face Spaces Docker SDK deployment option. I used it to deploy Sparrow, our Streamlit/Python app. So far I am very happy with the Spaces Docker SDK: simple setup, very stable with good runtime performance, and HTTPS and content compression out of the box.
Labels: Docker, Hugging Face, Python