Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud
Tuesday, March 21, 2023
How I'm Using ChatGPT/GPT-4 as a Solo Python Developer
I'm working as a solo Python developer and using ChatGPT to speed up the development process. In this video, I explain how ChatGPT is helping me with various tasks, from code explanation to suggesting solutions.
Sunday, March 12, 2023
Hugging Face Dataset for Donut Model Fine-Tuning (Document AI)
A Hugging Face dataset is a convenient way to store and share data for ML model fine-tuning. In this post, I share my experience creating a dataset for fine-tuning the Donut model. I wrote a set of scripts to generate the dataset, push it to the Hub, and test it locally.
Labels:
Hugging Face,
Python,
Sparrow
Monday, March 6, 2023
Improve OCR Results with Sparrow (running on Streamlit/Python and Ngrok)
OCR often returns fields in an order that differs from their order in the document. But to produce a dataset for fine-tuning a data extraction ML model (for example, Donut), fields in all documents must be ordered correctly. Sparrow, our open-source solution for data annotation/labeling, includes functionality for reordering OCRed fields. In this video, I explain and show how it works.
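To illustrate the reordering idea, here is a minimal sketch that is not Sparrow's actual code. It assumes each OCR field carries a top-left position and a height (the `OcrField` class, the `reading_order` function, and the `row_tolerance` parameter are hypothetical names) and sorts the fields top to bottom, then left to right within a row.

```python
from dataclasses import dataclass


@dataclass
class OcrField:
    # Hypothetical structure: text plus top-left corner and height of its box.
    text: str
    x: float
    y: float
    height: float


def reading_order(fields, row_tolerance=0.5):
    """Return fields sorted top to bottom, then left to right within a row.

    A field joins the current row when its top edge is within
    row_tolerance * height of the first field in that row.
    """
    rows = []
    for field in sorted(fields, key=lambda f: f.y):
        if rows and abs(field.y - rows[-1][0].y) <= row_tolerance * rows[-1][0].height:
            rows[-1].append(field)
        else:
            rows.append([field])
    return [f for row in rows for f in sorted(row, key=lambda f: f.x)]


# OCR output arriving in a scrambled order.
fields = [
    OcrField("42.00", x=300, y=102, height=12),
    OcrField("Invoice", x=10, y=10, height=12),
    OcrField("Total:", x=10, y=100, height=12),
    OcrField("#1001", x=300, y=11, height=12),
]

ordered_texts = [f.text for f in reading_order(fields)]
print(ordered_texts)
```

A purely geometric sort like this only covers simple layouts. For multi-column documents, manual reordering in an annotation UI is still needed.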
Monday, February 27, 2023
Document Data Extraction - Data Mapping for Donut Model Fine-Tuning Dataset (Document AI)
I explain the current status of my work on dataset preparation for fine-tuning the Donut ML model. I plan to use this model to extract data from invoice documents. I share hints about data mapping and how to structure the data to achieve better fine-tuning results.
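As a rough illustration of what such a mapping can produce, the sketch below builds one line of a `metadata.jsonl` file in the layout commonly used for Donut fine-tuning, where `ground_truth` is a JSON string with a `gt_parse` key. The field names (`invoice_no`, `item_desc`, and so on), the header/items split, and the `to_metadata_line` helper are assumptions for illustration, not the mapping used in the video.

```python
import json


def to_metadata_line(file_name, header, items):
    """Build one metadata.jsonl line for a document image.

    The nested structure under "gt_parse" is a hypothetical example:
    document-level fields in "header", repeating line items in "items".
    """
    gt_parse = {"header": header, "items": items}
    # ground_truth is stored as a JSON-encoded string, not a nested object.
    return json.dumps(
        {"file_name": file_name, "ground_truth": json.dumps({"gt_parse": gt_parse})}
    )


line = to_metadata_line(
    "invoice_0.jpg",
    header={"invoice_no": "1001", "invoice_date": "2023-02-27"},
    items=[{"item_desc": "Consulting", "item_gross_worth": "42.00"}],
)

# Round-trip to check that the line parses back into the same structure.
record = json.loads(line)
parsed = json.loads(record["ground_truth"])["gt_parse"]
print(parsed["header"]["invoice_no"])
```

Keeping the same keys in the same order across all documents is one way to give the model a consistent target sequence.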
Labels:
Donut,
Machine Learning,
Python
Monday, February 20, 2023
Streamlit Button Group UI (Flowbite) Component
Streamlit doesn't provide an option to display multiple buttons side-by-side horizontally. I explain how to achieve this functionality using a custom Streamlit component and Flowbite button group UI.
Monday, February 13, 2023
Preparing Dataset for Donut Fine-Tuning (part 3, Document AI)
In this episode, I explain the redesigned Sparrow UI for data annotation. The Sparrow UI is improved with the Streamlit grid component (aggrid). I show how to group related fields generated by OCR into a single entity and map it to a label. I briefly review the code and discuss how you can set up a grid component in Streamlit, a convenient and helpful UI element.
Labels:
Donut,
Machine Learning,
Sparrow
Monday, February 6, 2023
Preparing Dataset for Donut Fine-Tuning (part 2, Document AI)
I explain how to group OCR results into a single entity using the Sparrow annotation tool. This is useful for fields such as an address or an item description, where the field text consists of multiple words.
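The grouping step can be sketched as follows. This is a simplified illustration under my own assumptions, not the Sparrow implementation: each OCR word is a dict with `text` and a `box` in `[x1, y1, x2, y2]` form, and the hypothetical `group_words` helper joins the selected words in the order given and takes the union of their boxes.

```python
def group_words(words, label):
    """Merge selected OCR words into one labeled entity.

    words: list of dicts with "text" and "box" ([x1, y1, x2, y2]),
           assumed to be passed in reading order.
    label: the field label to assign to the merged entity.
    """
    if not words:
        raise ValueError("at least one word is required")
    boxes = [w["box"] for w in words]
    return {
        "label": label,
        "text": " ".join(w["text"] for w in words),
        # Union of all word boxes: smallest top-left, largest bottom-right.
        "box": [
            min(b[0] for b in boxes),
            min(b[1] for b in boxes),
            max(b[2] for b in boxes),
            max(b[3] for b in boxes),
        ],
    }


words = [
    {"text": "221B", "box": [10, 50, 40, 62]},
    {"text": "Baker", "box": [45, 50, 85, 62]},
    {"text": "Street", "box": [90, 51, 130, 63]},
]

entity = group_words(words, "seller_address")
print(entity)
```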
Labels:
Donut,
Machine Learning,
Python