Andrej Baranovskij Blog

Blog about Oracle, Full Stack, Machine Learning and Cloud

Sunday, March 27, 2022

Hugging Face LayoutLMv2 Model True Inference

I explain why OCR quality matters for the performance of the Hugging Face LayoutLMv2 model in document data classification. If the OCR input is poor, the inference results will be poor too. This is why it is important to use a high-quality OCR system to extract text and coordinates from the document before applying an ML solution.
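
As a small illustration of how OCR coordinates reach the model: LayoutLM-family models expect each word's bounding box scaled to a 0-1000 range relative to the page size. The sketch below assumes the OCR returns pixel boxes as `(x0, y0, x1, y1)`. The function name `normalize_box` and the sample values are made up for this example and are not taken from the post.

```python
def normalize_box(box, page_width, page_height):
    """Scale a pixel box (x0, y0, x1, y1) to the 0-1000 range."""
    x0, y0, x1, y1 = box
    scaled = [
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    ]
    # Clamp, so an OCR box that slightly overshoots the page stays in range.
    return [min(1000, max(0, value)) for value in scaled]


# Hypothetical OCR box on a 500 x 1000 pixel scan.
print(normalize_box((50, 100, 250, 140), page_width=500, page_height=1000))
# prints [100, 100, 500, 140]
```

If the OCR coordinates are wrong, these scaled boxes are wrong too, which is one way poor OCR can degrade the predictions.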

Posted by Andrej Baranovskij at 9:33 PM 0 comments


Labels: Hugging Face, Machine Learning, Python

Sunday, March 20, 2022

Get Receipt Data with Hugging Face ML Model

This tutorial shows how to use a fine-tuned Hugging Face model to extract data from scanned receipts. We run inference by passing the receipt image, along with its words and coordinates, to the model. The model returns predictions: a class label assigned to each input. These labels classify the document elements, so the correct data can be extracted. I share a hint on how to match input words with classified labels. The input words and coordinates are expected to come from a separate OCR step.
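
One possible way to match words with labels, which is not necessarily the approach from the tutorial: the model predicts a label per sub-word token, and a fast tokenizer's `word_ids()` gives the word index of each token (`None` for special tokens). The sketch below keeps the prediction of the first token of each word. The function name, the label names and the sample data are hypothetical.

```python
def labels_per_word(words, word_ids, token_label_ids, id2label):
    """Return (word, label) pairs, using the first sub-token of each word."""
    result = []
    seen = set()
    for word_idx, label_id in zip(word_ids, token_label_ids):
        # Skip special tokens and the trailing sub-tokens of a word.
        if word_idx is None or word_idx in seen:
            continue
        seen.add(word_idx)
        result.append((words[word_idx], id2label[label_id]))
    return result


# Hypothetical OCR words and token-level predictions.
words = ["Latte", "4.50", "Total"]
word_ids = [None, 0, 0, 1, 1, 2, None]
token_label_ids = [0, 1, 1, 2, 0, 3, 0]
id2label = {0: "O", 1: "ITEM_NAME", 2: "ITEM_PRICE", 3: "TOTAL_KEY"}

print(labels_per_word(words, word_ids, token_label_ids, id2label))
# prints [('Latte', 'ITEM_NAME'), ('4.50', 'ITEM_PRICE'), ('Total', 'TOTAL_KEY')]
```

Taking the first sub-token is a common convention. A majority vote over all sub-tokens of a word would be an alternative.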

Posted by Andrej Baranovskij at 4:29 PM 0 comments


Labels: Hugging Face, Machine Learning, Python

Sunday, March 13, 2022

Fine-Tuning with Hugging Face Trainer

In this tutorial, I explain how I used the Hugging Face Trainer with PyTorch to fine-tune the LayoutLMv2 model for data extraction from documents (based on the CORD dataset of receipts). The advantage of the Hugging Face Trainer is that it simplifies the model fine-tuning pipeline and lets you easily upload the model to the Hugging Face model hub.
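
One piece of a Trainer setup that can be shown in isolation is the metrics callback. The Trainer accepts a `compute_metrics` function that receives the predictions and label ids and returns a dictionary of metric names and values. The sketch below is a plain-Python token accuracy, assuming predictions arrive as per-token logits and that positions labeled `-100` (the usual ignore index for padding and special tokens) should be skipped. It illustrates the callback shape only and is not the metric used in the tutorial.

```python
IGNORE_INDEX = -100  # label value conventionally skipped by the loss


def compute_metrics(eval_pred):
    """Token-level accuracy over positions that carry a real label."""
    logits, labels = eval_pred
    correct = 0
    total = 0
    for seq_logits, seq_labels in zip(logits, labels):
        for token_logits, label in zip(seq_logits, seq_labels):
            if label == IGNORE_INDEX:
                continue
            scores = list(token_logits)
            predicted = scores.index(max(scores))  # argmax over classes
            correct += int(predicted == label)
            total += 1
    return {"accuracy": correct / total if total else 0.0}


# Hypothetical batch: one sequence, three tokens, two classes.
logits = [[[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]]
labels = [[1, 1, IGNORE_INDEX]]
print(compute_metrics((logits, labels)))
# prints {'accuracy': 0.5}
```

With the transformers library installed, a function like this could be passed as `Trainer(..., compute_metrics=compute_metrics)`.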

Posted by Andrej Baranovskij at 10:37 PM 0 comments


Labels: Hugging Face, Machine Learning, Python

Sunday, March 6, 2022

Hugging Face Datasets - Example with Receipts Data

The Hugging Face Datasets library provides a useful API for working with data for ML model fine-tuning. It allows you to load and process any external dataset with your own Python functions. As a result, you get a unified data interface and can reuse the same API to fine-tune various Hugging Face models.
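
To illustrate what "your own Python functions" can look like: `Dataset.map` with `batched=True` calls a function with a dictionary of columns (each a list) and adds the returned columns to the dataset. The sketch below is such a function, run here on a plain dictionary so it works without the library. The column names `words` and `tags` and the label set are hypothetical.

```python
# Hypothetical label set for receipt fields.
LABEL2ID = {"O": 0, "ITEM_NAME": 1, "ITEM_PRICE": 2}


def encode_labels(batch):
    """Batched transform: dict of columns in, dict of new columns out."""
    return {
        "label_ids": [
            [LABEL2ID[tag] for tag in tags] for tags in batch["tags"]
        ]
    }


# A batch shaped the way a batched map call would pass it.
batch = {
    "words": [["Latte", "4.50"], ["Tea"]],
    "tags": [["ITEM_NAME", "ITEM_PRICE"], ["ITEM_NAME"]],
}
print(encode_labels(batch))
# prints {'label_ids': [[1, 2], [1]]}
```

With the datasets library installed, the same function could be applied as `dataset.map(encode_labels, batched=True)`.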

Posted by Andrej Baranovskij at 8:39 PM 0 comments


Labels: Hugging Face, Machine Learning, Python


About Me

Andrej Baranovskij: Vilnius, Lithuania; I'm Oracle ACE Director, Oracle Groundbreaker Ambassador, CEO and Technical Expert at Red Samurai Consulting with focus on Oracle Fusion Middleware and Oracle Cloud technologies.
