Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud
Tuesday, January 31, 2023
Preparing Dataset for Donut Fine-Tuning (part 1, Document AI)
I explain the dataset I will be using to fine-tune the Donut model. I show how PDFs are converted to image files for further processing and OCR data extraction. In the next step, the JSON data is converted to a format that the Sparrow annotation processing/review tool understands.
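A minimal sketch of the PDF-to-image step, assuming the third-party `pdf2image` package (which needs Poppler installed). The helper names, the file naming scheme and the 200 DPI default are illustrative assumptions, not taken from the post.

```python
from pathlib import Path


def image_path_for(pdf_path, page_number, out_dir):
    """Build the output file name for one page (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page_{page_number}.jpg"


def pdf_to_images(pdf_path, out_dir, dpi=200):
    """Render every page of a PDF to a JPEG file and return the saved paths."""
    # Imported lazily so the naming helper above works without pdf2image.
    from pdf2image import convert_from_path  # third-party, requires Poppler

    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        target = image_path_for(pdf_path, number, out_dir)
        page.save(target, "JPEG")
        saved.append(target)
    return saved
```

Calling `pdf_to_images("invoice_0.pdf", "images")` would write one JPEG per page into `images/`, ready for an OCR step.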
Labels:
Machine Learning,
Python
Monday, January 23, 2023
How To Fine-tune Donut Model
Donut is an awesome Document AI model for extracting data from documents. I share my experience of fine-tuning the model on the CORD dataset, based on an example from Transformers Tutorials.
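Fine-tuning Donut means teaching the decoder to emit the target JSON as a flat token sequence. The sketch below is a simplified version of that flattening idea, modeled on the approach used in the Donut repository; the function name is my own, and the real implementation also registers the generated tags as special tokens in the tokenizer, which is omitted here.

```python
def json_to_token_sequence(obj):
    """Flatten nested JSON into a Donut-style tagged string (simplified sketch)."""
    if isinstance(obj, dict):
        # Each key becomes an opening and a closing tag around its value.
        return "".join(
            f"<s_{key}>{json_to_token_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List items are joined with a separator tag.
        return "<sep/>".join(json_to_token_sequence(item) for item in obj)
    # Leaf values are emitted as plain text.
    return str(obj)


# A CORD-like receipt annotation (values invented for illustration).
receipt = {
    "menu": [{"nm": "Latte", "cnt": "1"}, {"nm": "Bagel", "cnt": "2"}],
    "total": {"total_price": "12.50"},
}
target_sequence = json_to_token_sequence(receipt)
```

The resulting string is what the decoder is trained to produce for the receipt image; at inference time the inverse mapping turns the generated sequence back into JSON.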
Labels:
Donut,
Hugging Face,
Machine Learning
Monday, January 16, 2023
Donut 🍩 - ChatGPT for Document AI
Donut is an OCR-free Document Understanding Transformer. This ML model can process documents (images, scans) and return structured JSON describing their content. It works for different use cases: form understanding, visual question answering about the document, and document image classification.
Labels:
Donut,
Hugging Face,
Machine Learning
Thursday, January 5, 2023
Best Platform for Python Apps Deployment - Hugging Face Spaces with Docker
I walk through the Hugging Face Spaces Docker SDK deployment option, which I used to deploy Sparrow, our Streamlit/Python app. So far I am very happy with the Spaces Docker SDK: simple setup, stable operation, good runtime performance, and both HTTPS and content compression out of the box.
Labels:
Docker,
Hugging Face,
Python
Monday, December 19, 2022
File Upload/Download in Streamlit/Python
File upload/download is supported by Streamlit out of the box. I share a few hints for a more effective file upload implementation. You will learn how to wrap the file upload widget in a Streamlit form, use the Submit button to confirm the upload, and reinitialize the upload widget. Additionally, I show an example of how to download a JSON file from the server with the Streamlit download component.
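The upload and download pattern described above can be sketched roughly as follows. This is an illustration under my own assumptions, not the code from the video: the form key, labels, accepted file types and the sample payload are all hypothetical. `clear_on_submit=True` is what resets the uploader after the Submit button is pressed.

```python
import json


def to_download_bytes(data):
    """Serialize a Python object to UTF-8 JSON bytes for the download button."""
    return json.dumps(data, indent=2).encode("utf-8")


def render_upload_page():
    # Imported lazily so the helper above can be used without Streamlit.
    import streamlit as st

    # Widgets inside a form only send their values when Submit is pressed;
    # clear_on_submit=True resets the uploader afterwards.
    with st.form("upload-form", clear_on_submit=True):
        uploaded = st.file_uploader("Choose a document", type=["png", "jpg", "jpeg"])
        submitted = st.form_submit_button("Submit")

    if submitted and uploaded is not None:
        st.success(f"Received {uploaded.name} ({uploaded.size} bytes)")

    # Offer a JSON payload (placeholder content) as a file download.
    st.download_button(
        "Download JSON",
        data=to_download_bytes({"status": "ok"}),
        file_name="result.json",
        mime="application/json",
    )
```

To try it, call `render_upload_page()` at the bottom of a script and start it with `streamlit run`.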
Monday, December 12, 2022
Dependent UI Widgets in Streamlit/Python
This video explains how to refresh dependent UI widgets in Streamlit/Python when a value changes. I use the Streamlit Empty widget as a placeholder to update a selectbox with a new entry after a new file is uploaded. The selectbox displays the list of uploaded files.
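One way to get this effect, sketched under my own assumptions (the session-state key, labels and helper name are hypothetical, and the video may structure it differently): reserve a slot with `st.empty()` above the uploader, process the upload, then render the selectbox into the reserved slot once the file list is up to date.

```python
def add_unique(files, name):
    """Return the file list with `name` appended unless it is already present."""
    return files if name in files else files + [name]


def render_dependent_widgets():
    # Imported lazily so the helper above can be used without Streamlit.
    import streamlit as st

    if "files" not in st.session_state:
        st.session_state["files"] = []

    # Reserve the position of the selectbox above the uploader.
    placeholder = st.empty()

    uploaded = st.file_uploader("Upload a document")
    if uploaded is not None:
        st.session_state["files"] = add_unique(st.session_state["files"], uploaded.name)

    # Fill the reserved slot only now, so the new entry shows up in this run.
    placeholder.selectbox("Uploaded files", st.session_state["files"])
```

Because the placeholder is created before the uploader, the selectbox appears above it on the page even though it is rendered last in the script.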
Sunday, December 4, 2022
Invoice Annotation with Sparrow/Python
I explain our Streamlit component for annotating and labeling invoice/receipt documents. It can be used either to create new annotations or to review and edit existing ones. With this component you can add new annotations directly on top of the document image. Existing annotations can be resized or moved, and values and labels can be assigned to them.
This component is part of Sparrow - our open-source solution for data extraction from invoices/receipts with ML.
Labels:
Machine Learning,
Python