Tuesday, April 26, 2022

UI for ML - Django, React or Streamlit?

The UI is an important part of a successful ML app. In this video I discuss the UI options I looked into for building the UI of our ML product. When deciding which UI framework or library to use, pay attention to several things, such as ease of data transfer, UI flexibility, and the ability to build user-friendly functionality.

 

Monday, April 18, 2022

Mindee docTR - Probably the Best Open-Source OCR

Do you want to build an ML pipeline to automate data extraction from business documents (receipts, invoices, forms)? Then your first step should be to integrate OCR for text extraction. OCR quality must be good, because the whole pipeline depends on the quality of the initial text extraction. If the extracted text is accurate, the ML models downstream can classify it properly. I spent time researching available OCR solutions, and I think Mindee docTR is currently one of the best open-source options. Check the video, where I run and show multiple tests.
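
To give a feel for what that first OCR step can look like, here is a minimal sketch, not the exact code from the video. It assumes docTR is installed with one of its backends and uses its documented `ocr_predictor`, `DocumentFile.from_images` and `export()` calls. The helper names `export_to_lines` and `run_ocr`, the `min_confidence` parameter and the `receipt.jpg` path are my own illustrative choices.

```python
from typing import Any, Dict, List


def export_to_lines(export: Dict[str, Any], min_confidence: float = 0.0) -> List[str]:
    """Flatten a docTR-style export dict (pages > blocks > lines > words) into text lines.

    Words below min_confidence are dropped (an illustrative filter, not a docTR feature).
    """
    lines: List[str] = []
    for page in export.get("pages", []):
        for block in page.get("blocks", []):
            for line in block.get("lines", []):
                words = [
                    word["value"]
                    for word in line.get("words", [])
                    if word.get("confidence", 1.0) >= min_confidence
                ]
                if words:
                    lines.append(" ".join(words))
    return lines


def run_ocr(image_path: str) -> List[str]:
    """Run docTR on a single image and return the recognized text lines."""
    # Imported lazily so export_to_lines can be used without docTR installed.
    from doctr.io import DocumentFile
    from doctr.models import ocr_predictor

    model = ocr_predictor(pretrained=True)  # fetches default pretrained weights on first use
    doc = DocumentFile.from_images(image_path)
    result = model(doc)
    return export_to_lines(result.export())
```

Calling `run_ocr("receipt.jpg")` (a hypothetical file) would return the recognized text line by line, ready to pass on to the classification step. Raising `min_confidence` is one simple way to keep low-quality words out of the rest of the pipeline.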

 

Monday, April 11, 2022

Document Information Extraction Demo on Hugging Face Spaces

This video shows how a fine-tuned LayoutLMv2 document understanding and information extraction model runs in a Hugging Face Spaces demo environment. I show how data extraction works for different receipts and why you should not rely on the OCR that comes pre-configured with the LayoutLMv2 model.