Andrej Baranovskij Blog

Blog about Oracle, Full Stack, Machine Learning and Cloud

Monday, March 27, 2023

Donut ML Model Fine-Tuning with Hugging Face API

I explain how the Donut ML model can be fine-tuned on your own dataset using one of two approaches: PyTorch Lightning or the Hugging Face Trainer API. I cover the pros and cons of each and what works best for me.
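Whichever training loop you pick, Donut learns to generate a tagged text sequence rather than raw JSON, so the labels have to be flattened first. Below is a minimal sketch of that flattening step, simplified from the idea used in the Donut reference code. The function name `json_to_token_sequence` and the sample invoice fields are made up for illustration, and a real setup would also register the generated tags as special tokens in the tokenizer.

```python
def json_to_token_sequence(obj):
    """Flatten nested JSON into a Donut-style tagged string (simplified sketch)."""
    if isinstance(obj, dict):
        # Each key becomes an opening and closing tag around its value.
        return "".join(
            f"<s_{key}>{json_to_token_sequence(value)}</s_{key}>"
            for key, value in obj.items()
        )
    if isinstance(obj, list):
        # List items are joined with a separator tag.
        return "<sep/>".join(json_to_token_sequence(item) for item in obj)
    # Leaf values are emitted as plain text.
    return str(obj)


# Hypothetical label for one document.
sample = {
    "invoice_no": "A-17",
    "items": [{"name": "pen", "qty": 2}, {"name": "ink", "qty": 1}],
}
target = json_to_token_sequence(sample)
print(target)
```

The resulting string is what the decoder would be asked to produce for the document image, in either the PyTorch Lightning or the Trainer API setup.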

Posted by Andrej Baranovskij at 9:50 PM 0 comments

Labels: Donut, Hugging Face, Python

Tuesday, March 21, 2023

How I'm Using ChatGPT/GPT-4 as a Solo Python Developer

I'm working as a solo Python developer and using ChatGPT to speed up the development process. In this video, I explain how ChatGPT is helping me with various tasks, from code explanation to suggesting solutions.

Posted by Andrej Baranovskij at 9:40 AM 0 comments

Labels: ChatGPT, GPT-4, Python

Sunday, March 12, 2023

Hugging Face Dataset for Donut Model Fine-Tuning (Document AI)

Hugging Face Dataset is a very convenient way to store and share data for ML model fine-tuning. In this post, I share my experience creating a dataset for fine-tuning the Donut model. I made a set of scripts to generate the dataset, push it to the Hub and test it locally.
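As a rough illustration of the generation step, the sketch below writes a `metadata.jsonl` file in the layout commonly used for Donut datasets: one JSON object per image with a `file_name` and a `ground_truth` string that wraps the labels in `gt_parse`. This is not the script set from the post; the helper names, file names and field values are assumptions made for the example.

```python
import json
import tempfile
from pathlib import Path


def build_metadata_line(file_name, fields):
    """Return one metadata.jsonl line for a single document image."""
    # The labels are stored as a JSON string under the "gt_parse" key.
    ground_truth = json.dumps({"gt_parse": fields})
    return json.dumps({"file_name": file_name, "ground_truth": ground_truth})


def write_metadata(split_dir, records):
    """Write metadata.jsonl next to the images of one split (e.g. train)."""
    split_dir = Path(split_dir)
    split_dir.mkdir(parents=True, exist_ok=True)
    metadata_path = split_dir / "metadata.jsonl"
    with metadata_path.open("w", encoding="utf-8") as handle:
        for file_name, fields in records:
            handle.write(build_metadata_line(file_name, fields) + "\n")
    return metadata_path


# Hypothetical annotated documents.
records = [
    ("invoice_0.jpg", {"invoice_no": "A-17", "total": "12.50"}),
    ("invoice_1.jpg", {"invoice_no": "B-42", "total": "99.00"}),
]

with tempfile.TemporaryDirectory() as tmp:
    path = write_metadata(Path(tmp) / "train", records)
    lines = path.read_text(encoding="utf-8").splitlines()

first = json.loads(lines[0])
print(first["file_name"], json.loads(first["ground_truth"])["gt_parse"])
```

With the `datasets` library installed, a folder like this (images plus `metadata.jsonl`) can typically be loaded with `load_dataset("imagefolder", data_dir=...)` and published with `push_to_hub`, which is also a convenient way to test the dataset locally before sharing it.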

Posted by Andrej Baranovskij at 10:24 PM 0 comments

Labels: Hugging Face, Python, Sparrow

Monday, March 6, 2023

Improve OCR Results with Sparrow (running on Streamlit/Python and Ngrok)

OCR often returns fields in an order that differs from the expected one. But to produce a dataset for fine-tuning a data extraction ML model (for example, Donut), the fields in all documents must be ordered correctly. Sparrow, our open-source solution for data annotation and labeling, includes functionality for reordering OCRed fields. In this video, I explain and show how it works.
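To make the ordering problem concrete, here is a minimal sketch of one common way to put OCR boxes into reading order: group boxes into rows by their vertical position, then sort each row left to right. It is not Sparrow's implementation; the `OcrField` class, the `reorder_fields` function and the `row_tolerance` parameter are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class OcrField:
    text: str
    x: float       # left edge of the bounding box
    y: float       # top edge of the bounding box
    height: float  # box height, used to decide what counts as the same row


def reorder_fields(fields, row_tolerance=0.5):
    """Return fields in top-to-bottom, left-to-right reading order."""
    rows = []
    for field in sorted(fields, key=lambda f: f.y):
        # A field joins the current row if its top edge is close to the
        # top edge of the first field in that row.
        if rows and abs(field.y - rows[-1][0].y) <= row_tolerance * rows[-1][0].height:
            rows[-1].append(field)
        else:
            rows.append([field])
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda f: f.x))
    return ordered


# Hypothetical OCR output, deliberately out of order.
ocr_output = [
    OcrField("Total", x=10, y=102, height=20),
    OcrField("12.50", x=200, y=100, height=20),
    OcrField("Invoice", x=10, y=11, height=20),
    OcrField("A-17", x=200, y=10, height=20),
]

ordered_texts = [field.text for field in reorder_fields(ocr_output)]
print(ordered_texts)
```

The tolerance matters because OCR boxes on the same visual line rarely share exactly the same top coordinate. A purely geometric sort like this will not handle every layout, which is one reason a manual reordering step in an annotation tool is useful.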

Posted by Andrej Baranovskij at 9:19 AM 0 comments

Labels: OCR, Python, Sparrow

About Red Samurai

Oracle PaaS Partner Community Award for Outstanding Visual Builder Cloud Service Contribution 2019

Oracle Fusion Middleware Partner Community Award for Outstanding ACM/BPM Contribution 2015

Oracle Fusion Middleware Innovation Award Winner 2010

SOA Partner Community Award for Outstanding Contribution Across the World 2010

2010 Enterprise 2.0 Blazer: Enterprise 2.0 Leader Award

About Me

Andrej Baranovskij: Vilnius, Lithuania; I'm an Oracle ACE Director, Oracle Groundbreaker Ambassador, and CEO and Technical Expert at Red Samurai Consulting, with a focus on Oracle Fusion Middleware and Oracle Cloud technologies.

Disclaimer

All views expressed on this blog are my own and do not necessarily reflect the views of my employer.