Andrej Baranovskij Blog
Blog about Oracle, Full Stack, Machine Learning and Cloud

FastAPI File Upload and Temporary Directory for Stateless API (2024-03-17)

I explain how to handle file uploads with FastAPI and how to process the uploaded file in a Python temporary directory. Files placed into the temporary directory are removed automatically once the request completes, which is very convenient for a stateless API.
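A minimal sketch of the pattern, assuming a single-file endpoint (the /upload route name and the size calculation are illustrative stand-ins for real processing):

```python
import shutil
import tempfile
from pathlib import Path

from fastapi import FastAPI, UploadFile

app = FastAPI()

@app.post("/upload")  # hypothetical route name
async def upload(file: UploadFile):
    # TemporaryDirectory deletes itself and its contents when the block
    # exits, so nothing persists between requests.
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = Path(tmp_dir) / (file.filename or "upload.bin")
        with tmp_path.open("wb") as out:
            shutil.copyfileobj(file.file, out)
        size = tmp_path.stat().st_size  # stand-in for real processing
    return {"filename": file.filename, "size": size}
```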
Video: https://www.youtube.com/embed/SQjMDG7pPb8?si=zetEf-D0R_FY_OZn

Optimizing Receipt Processing with LlamaIndex and PaddleOCR (2024-03-10)

The LlamaIndex text completion function executes an LLM request that combines custom data with the question, without using a vector DB. This is very useful when processing OCR output, since it simplifies the RAG pipeline. In this video I explain how OCR can be combined with an LLM to process image documents in Sparrow.
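A rough sketch of the idea, assuming llama-index 0.10+ with the llama-index-llms-ollama package (the model name, sample OCR text, and prompt wording are illustrative):

```python
from llama_index.llms.ollama import Ollama

llm = Ollama(model="mistral", request_timeout=120.0)

ocr_text = "RECEIPT\nMilk 2.49\nBread 1.99\nTOTAL 4.48"  # stand-in for PaddleOCR output
question = "What is the receipt total?"

# Text completion: the OCR output and the question go straight into one
# prompt, so no vector DB is involved.
prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{ocr_text}\n\n"
    f"Question: {question}\nAnswer:"
)
print(llm.complete(prompt).text)
```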
Video: https://www.youtube.com/embed/tQuybOG24y8?si=KZVCHnfyMfE4-en1

LlamaIndex Multimodal with Ollama [Local LLM] (2024-03-03)

I describe how to run LlamaIndex multimodal with a local LLaVA LLM through Ollama. The advantage of this approach is that you can process image documents with the LLM directly, without an OCR step, which should lead to better results. This functionality is integrated into Sparrow as a separate LLM agent.
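A minimal sketch, assuming the llama-index-multi-modal-llms-ollama package and a llava model already pulled into Ollama (the file name and prompt are illustrative):

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.multi_modal_llms.ollama import OllamaMultiModal

mm_llm = OllamaMultiModal(model="llava")

# SimpleDirectoryReader loads image files as image documents.
image_docs = SimpleDirectoryReader(input_files=["invoice.png"]).load_data()

response = mm_llm.complete(
    prompt="Extract the invoice number and total as JSON.",
    image_documents=image_docs,
)
print(response.text)
```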
Video: https://www.youtube.com/embed/5M5qiDJvuv0?si=mYDOmy1A9tlEr8uZ

LLM Agents with Sparrow (2024-02-26)

I explain new functionality in Sparrow: LLM agent support. You can implement independently running agents and invoke them from the CLI or the API, which makes it easier to run various LLM-related processing within Sparrow.
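The post doesn't show Sparrow's actual agent API; purely as an illustration of the pattern, here is a generic registry that lets named agents be invoked from a CLI (all names are hypothetical):

```python
import argparse

AGENTS = {}

def agent(name):
    """Register a callable under an agent name."""
    def wrap(fn):
        AGENTS[name] = fn
        return fn
    return wrap

@agent("echo")
def echo_agent(payload: str) -> str:
    return payload

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("agent_name", choices=AGENTS)
    parser.add_argument("payload")
    args = parser.parse_args()
    # Dispatch to the selected agent; an API endpoint could reuse the
    # same registry.
    print(AGENTS[args.agent_name](args.payload))
```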
Video: https://www.youtube.com/embed/t3XpVUCNLwM?si=--pk5onCyLFTembi

Extracting Invoice Structured Output with Haystack and Ollama Local LLM (2024-02-20)

I implemented a Sparrow agent with Haystack's structured output functionality to extract invoice data. It runs locally through Ollama, using an LLM to retrieve key/value pairs.
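A stripped-down sketch of this kind of extraction, assuming Haystack 2.x with the ollama-haystack integration (the Invoice fields and sample text are illustrative):

```python
from haystack import Pipeline
from haystack.components.builders import PromptBuilder
from haystack_integrations.components.generators.ollama import OllamaGenerator
from pydantic import BaseModel

class Invoice(BaseModel):
    invoice_number: str
    total: float

template = """Extract the fields as a JSON object matching this schema:
{{ schema }}

Invoice text:
{{ document }}"""

pipe = Pipeline()
pipe.add_component("prompt", PromptBuilder(template=template))
pipe.add_component("llm", OllamaGenerator(model="mistral"))
pipe.connect("prompt", "llm")

result = pipe.run({
    "prompt": {
        "schema": Invoice.model_json_schema(),
        "document": "Invoice INV-42, total 99.90 EUR",
    }
})
print(result["llm"]["replies"][0])
```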
Video: https://www.youtube.com/embed/-q_NUOIzDXc?si=hYbCqi7uYM8oG9FO

Local LLM RAG Pipelines with Sparrow Plugins [Python Interface] (2024-02-04)

There are many LLM tools and frameworks, evolving and improving daily. I added plugin support to Sparrow to run different pipelines through the same Sparrow interface. Each pipeline can be implemented with a different stack (LlamaIndex, Haystack, etc.) and run independently. The main advantage is that you can test various RAG implementations from a single app with a unified API and choose the one that works best for the specific use case.
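Again, this is not Sparrow's actual plugin API, only a sketch of the general shape: a shared interface plus a name-to-implementation map (all identifiers are hypothetical):

```python
from abc import ABC, abstractmethod

class RagPipeline(ABC):
    """Interface every pipeline plugin implements."""
    @abstractmethod
    def run(self, query: str, file_path: str) -> str: ...

class LlamaIndexPipeline(RagPipeline):
    def run(self, query: str, file_path: str) -> str:
        return f"[llamaindex] {query} over {file_path}"  # real RAG call goes here

class HaystackPipeline(RagPipeline):
    def run(self, query: str, file_path: str) -> str:
        return f"[haystack] {query} over {file_path}"

PIPELINES = {"llamaindex": LlamaIndexPipeline, "haystack": HaystackPipeline}

def run_pipeline(name: str, query: str, file_path: str) -> str:
    # The caller selects the stack by name; the interface stays the same.
    return PIPELINES[name]().run(query, file_path)

print(run_pipeline("llamaindex", "invoice total?", "invoice.pdf"))
```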
Video: https://www.youtube.com/embed/0rNlck_ZBzs?si=tPo-I1fI6y5pNnog

LLM Structured Output with Local Haystack RAG and Ollama (2024-01-29)

Haystack 2.0 provides functionality to process LLM output and enforce a proper JSON structure based on a predefined Pydantic class. I show how to run this on your local machine with Ollama, using the OllamaGenerator class available in Haystack.
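The validate-and-retry idea, reduced to a sketch independent of the Haystack wiring (the City class and the hard-coded reply are placeholders for a real generator call):

```python
from pydantic import BaseModel, ValidationError

class City(BaseModel):
    name: str
    population: int

def fetch_reply(attempt: int) -> str:
    # Placeholder for an OllamaGenerator call; on retries the prompt would
    # normally include the previous validation error.
    return '{"name": "Vilnius", "population": 588412}'

parsed = None
for attempt in range(3):
    reply = fetch_reply(attempt)
    try:
        parsed = City.model_validate_json(reply)
        break
    except ValidationError:
        continue  # re-ask the model, up to the attempt cap

print(parsed)
```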
Video: https://www.youtube.com/embed/rsSRQm1aREc?si=pImFBPD0_BSvSgU7

JSON Output with Notus Local LLM [LlamaIndex, Ollama, Weaviate] (2024-01-23)

In this video, I show how to get JSON output from the Notus LLM running locally with Ollama. The JSON output is generated with LlamaIndex using the dynamic Pydantic class approach.
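The dynamic part relies on Pydantic's create_model, which builds a class at runtime from field definitions; a minimal sketch (the field names are illustrative):

```python
from pydantic import create_model

# Field definitions could come from user input or a config file:
# name -> (type, default); ... marks the field as required.
fields = {
    "invoice_number": (str, ...),
    "total": (float, ...),
    "currency": (str, "EUR"),
}
Invoice = create_model("Invoice", **fields)

# The generated JSON schema can be embedded in the LLM prompt.
print(Invoice.model_json_schema())
```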
Video: https://www.youtube.com/embed/6EXgark9GpA?si=tkYZeRvz9iHkVAe8

FastAPI and LlamaIndex RAG: Creating Efficient APIs (2024-01-15)

FastAPI works great with LlamaIndex RAG. In this video, I show how to build a POST endpoint that executes inference requests against LlamaIndex. The RAG implementation is part of the Sparrow data extraction solution. I show how FastAPI handles multiple concurrent requests that initiate the RAG pipeline. I'm using Ollama to execute LLM calls within the pipeline; Ollama processes requests sequentially, so API requests are served in queue order. Hopefully Ollama will support concurrent requests in the future.
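A sketch of such an endpoint, assuming the RAG call is a blocking function (run_rag and the route name are hypothetical placeholders for the LlamaIndex query engine):

```python
from fastapi import FastAPI
from fastapi.concurrency import run_in_threadpool
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    query: str

def run_rag(query: str) -> str:
    return f"answer for: {query}"  # placeholder for the real RAG pipeline

@app.post("/inference")  # hypothetical route name
async def inference(req: InferenceRequest):
    # A worker thread keeps the event loop free to accept more requests,
    # even though Ollama itself still serves the LLM calls one at a time.
    answer = await run_in_threadpool(run_rag, req.query)
    return {"answer": answer}
```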
Video: https://www.youtube.com/embed/vntNI33wrcI?si=alvrwneBcYBXITh2

Transforming Invoice Data into JSON: Local LLM with LlamaIndex & Pydantic (2024-01-08)

This is Sparrow, our open-source solution for document processing with local LLMs. I'm running the Starling LLM locally with Ollama. I explain how to get structured JSON output with LlamaIndex and a dynamic Pydantic class, which supports the use case of data extraction from invoice documents. Thanks to Ollama, the solution runs on a local machine; I'm using a MacBook Air M1 with 8GB RAM.
Video: https://www.youtube.com/embed/VKeYaIEk82s?si=3G79R1bEgGTjdGuY

From Text to Vectors: Leveraging Weaviate for Local RAG Implementation with LlamaIndex (2023-12-17)

Weaviate provides vector storage and plays an important part in a RAG implementation. I'm using local embeddings from the Sentence Transformers library to create vectors for text-based PDF invoices and store them in Weaviate. I explain how the integration with LlamaIndex manages the data ingestion and LLM inference pipeline.
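An ingestion sketch, assuming llama-index with the llama-index-vector-stores-weaviate and llama-index-embeddings-huggingface packages plus a local Weaviate instance (the embedding model, collection name, and data path are illustrative):

```python
import weaviate
from llama_index.core import Settings, SimpleDirectoryReader, StorageContext, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.weaviate import WeaviateVectorStore

# Local Sentence Transformers embeddings, no external API needed.
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

client = weaviate.connect_to_local()  # weaviate-client v4
vector_store = WeaviateVectorStore(weaviate_client=client, index_name="Invoices")
storage_context = StorageContext.from_defaults(vector_store=vector_store)

documents = SimpleDirectoryReader("data/invoices").load_data()
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```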
Video: https://www.youtube.com/embed/ROm0R2EdQqg?si=NHJ8dDox3St955BI

Enhancing RAG: LlamaIndex and Ollama for On-Premise Data Extraction (2023-12-11)

LlamaIndex is an excellent choice for a RAG implementation. It provides a clean API for working with different data sources and extracting data, and it also offers an API for Ollama integration, which means we can easily use LlamaIndex with on-premise LLMs through Ollama. I explain a sample app where LlamaIndex works with Ollama to extract data from PDF invoices.
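An end-to-end sketch under similar assumptions to the previous one (local embeddings, an Ollama model already pulled; the file name and question are illustrative):

```python
from llama_index.core import Settings, SimpleDirectoryReader, VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

Settings.llm = Ollama(model="mistral", request_timeout=120.0)
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

docs = SimpleDirectoryReader(input_files=["invoice.pdf"]).load_data()
index = VectorStoreIndex.from_documents(docs)  # in-memory store for the demo

engine = index.as_query_engine()
print(engine.query("What is the invoice total?"))
```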
Video: https://www.youtube.com/embed/tiYQiWWd7rE?si=Suy8Jy5sN1EuoYx8

Secure and Private: On-Premise Invoice Processing with LangChain and Ollama RAG (2023-12-05)

The Ollama desktop tool helps run LLMs locally on your machine. This tutorial explains how I implemented a pipeline with LangChain and Ollama for on-premise invoice processing. Running an LLM on-premise provides many advantages in terms of security and privacy. Ollama works similarly to Docker; you can think of it as Docker for LLMs. You can pull and run multiple LLMs, which allows you to switch between models without changing the RAG pipeline.
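A small illustration of that flexibility, assuming the langchain-community package and models already pulled into Ollama:

```python
from langchain_community.llms import Ollama

# The pipeline references the model only by name, so swapping models is a
# one-line change once they are pulled with Ollama.
llm = Ollama(model="mistral")
# llm = Ollama(model="llama2")  # same pipeline, different model

print(llm.invoke("Return the total from: Invoice INV-7, total 45.60 EUR"))
```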
Video: https://www.youtube.com/embed/mONpftuo02M?si=InFucuqFsUl6nX0E

Easy-to-Follow RAG Pipeline Tutorial: Invoice Processing with ChromaDB & LangChain (2023-11-27)

I explain the implementation of a pipeline that processes invoice data from PDF documents. The data is loaded into Chroma's vector store, and through the LangChain API it is ready to be consumed by the LLM as part of the RAG infrastructure.
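A loading sketch, assuming langchain-community with the chromadb and pypdf packages installed (the file name, chunk sizes, and query are illustrative):

```python
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

docs = PyPDFLoader("invoice.pdf").load()
chunks = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=50
).split_documents(docs)

# Embed the chunks locally and persist the vector store on disk.
db = Chroma.from_documents(chunks, HuggingFaceEmbeddings(), persist_directory="db")

retriever = db.as_retriever()
print(retriever.get_relevant_documents("invoice total")[0].page_content)
```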
Video: https://www.youtube.com/embed/Higmr8qMoNk?si=AEHuIw0BmYti8YdW

Vector Database Impact on RAG Efficiency: A Simple Overview (2023-11-19)

I explain the importance of the vector DB for a RAG implementation and show with a simple example how data retrieval from the vector DB can affect LLM performance. Before data is sent to the LLM, you should verify that quality data is fetched from the vector DB.
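One way to do that check, sketched with Chroma and similarity scores (the sample texts are illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

db = Chroma.from_texts(
    ["Invoice INV-7, total 45.60 EUR", "Unrelated meeting notes"],
    HuggingFaceEmbeddings(),
)

# With Chroma's default metric the score is a distance: lower means a
# closer match. If the top hits look irrelevant, fix retrieval before
# blaming the LLM.
for doc, score in db.similarity_search_with_score("invoice total", k=2):
    print(f"{score:.3f}  {doc.page_content}")
```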
Video: https://www.youtube.com/embed/WTiLD3C8CFg?si=BEbzq7N8hM6iXHNB

JSON Output from Mistral 7B LLM [LangChain, CTransformers] (2023-11-13)

I explain how to compose a prompt for the Mistral 7B model, running with LangChain and CTransformers, to retrieve the output as a JSON string without any additional text.
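A prompt-composition sketch, assuming the langchain-community and ctransformers packages (the GGUF repo and file name are one plausible choice, not necessarily the one used in the video):

```python
from langchain_community.llms import CTransformers

llm = CTransformers(
    model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",
    model_type="mistral",
)

# Mistral Instruct format; the "JSON object and nothing else" wording keeps
# the model from wrapping the answer in extra prose.
prompt = (
    "[INST] Extract invoice_number and total from the text below. "
    "Respond with a single JSON object and nothing else.\n"
    "Text: Invoice INV-42, total 99.90 EUR [/INST]"
)
print(llm.invoke(prompt))
```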
Video: https://www.youtube.com/embed/lUoCPXYS9AU?si=zAXYcpXQ2Ho2EOQZ

Structured JSON Output from LLM RAG on Local CPU [Weaviate, Llama.cpp, Haystack] (2023-11-06)

I explain how to get structured JSON output from LLM RAG running through the Haystack API on top of Llama.cpp. Vector embeddings are stored in a Weaviate database, the same as in my previous video. When extracting data, a structured JSON response is preferred because we are not interested in additional descriptions.
Video: https://www.youtube.com/embed/mvHFCp97USM?si=9TU4cLFRapT-NTzV

Invoice Data Processing with Llama2 13B LLM RAG on Local CPU [Weaviate, Llama.cpp, Haystack] (2023-10-22)

I explain how to set up local LLM RAG to process invoice data with Llama2 13B. Based on my experiments, Llama2 13B handles tabular data better than the Mistral 7B model. This example presents a production LLM RAG setup with a Weaviate database for vector embeddings, Haystack for the LLM API, and Llama.cpp to run Llama2 13B on a local CPU.
Video: https://www.youtube.com/embed/XuvdgCuydsM?si=iJP5VN7HHG5BOZm4

Invoice Data Processing with Mistral LLM on Local CPU (2023-10-16)

I explain a solution that extracts invoice document fields with the open-source Mistral LLM. It runs on a CPU and doesn't require a cloud machine. I'm using the Mistral 7B model, LangChain, CTransformers and a FAISS vector store to run everything on a local CPU machine. This approach is a great advantage for enterprise systems where running ML models in the cloud is not allowed for privacy reasons.
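The FAISS piece of that stack, as a CPU-only sketch (assuming langchain-community with faiss-cpu; the sample texts are illustrative):

```python
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

texts = [
    "Invoice INV-42, total 99.90 EUR",
    "Receipt R-7, total 12.30 EUR",
]
# Both the embeddings and the FAISS index run locally on the CPU.
db = FAISS.from_texts(texts, HuggingFaceEmbeddings())

docs = db.similarity_search("What is the total of invoice INV-42?", k=1)
print(docs[0].page_content)
```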
Video: https://www.youtube.com/embed/9RERupqcFL4?si=ZuKNVxVX8mUCgLNj

Skipper MLOps Debugging and Development on Your Local Machine (2023-10-09)

I explain how to stop some of the Skipper MLOps services running in Docker and debug/develop their code locally. This improves the development workflow: there is no need to deploy a code change to a Docker container, since it can be tested locally. A service that runs locally connects to the Skipper infrastructure through a RabbitMQ queue.
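The connection itself is plain RabbitMQ; a generic consumer sketch with pika (the queue name is hypothetical, not Skipper's actual one):

```python
import pika

# Point the locally running service at the same broker the Dockerized
# services use.
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="skipper.training", durable=True)  # hypothetical queue

def on_message(ch, method, properties, body):
    print("received:", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="skipper.training", on_message_callback=on_message)
channel.start_consuming()
```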
Video: https://www.youtube.com/embed/TqU9tyQudnw?si=EGhl3V1AH2jm2ZK3

Pros and Cons of Developing Your Own ChatGPT Plugin (2023-10-02)

I've been running a ChatGPT plugin in production for a month and share my thoughts about the pros and cons of developing it. Would I build a new ChatGPT plugin?
Video: https://www.youtube.com/embed/c_adOoUzQog?si=0XOLNmvqziHpsMDR

Llama 2 LLM for PDF Invoice Data Extraction (2023-09-25)

I show how to extract data from a text PDF invoice using the Llama 2 LLM running on a free Colab GPU instance. I specifically explain how to improve data retrieval using carefully crafted prompts.
Video: https://www.youtube.com/embed/WGNpdvnwR7o?si=KTZtpXIrMBd-o4YX

Data Filtering and Aggregation with Receipt Assistant Plugin for ChatGPT (2023-09-11)

I explain the Receipt Assistant plugin for ChatGPT from a user perspective. I show how to fetch previously processed and saved receipt data, including filtering and aggregation, and how to fix spelling mistakes in Lithuanian-language receipt items. At the end, numeric data is visualized with the WizeCharts plugin for ChatGPT.
Video: https://www.youtube.com/embed/NSSGFD3led4?si=4om3W3Af8UtYS_zy

Computer Vision with ChatGPT - Receipt Assistant Plugin (2023-09-04)

Our plugin, Receipt Assistant, was approved for the ChatGPT plugin store. I explain how it works and how to use it in combination with other plugins, for example to display charts. Receipt Assistant provides vision and storage options for ChatGPT. It is primarily tuned to work with receipts, but it can handle any structured info of medium complexity.
Video: https://www.youtube.com/embed/JyWQxpt99qo?si=2o8KqPxeGnbNrNoX

How to Host FastAPI from Your Computer with ngrok (2023-08-19)

With ngrok, you can host your FastAPI app from your own computer. This can be a handy and cheaper option for some projects. In this video, I share my experience running FastAPI apps from my very own cloud with ngrok :)
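A sketch using the pyngrok wrapper; this is an assumption, the post itself may drive the plain ngrok CLI instead (port and route are illustrative):

```python
import uvicorn
from fastapi import FastAPI
from pyngrok import ngrok

app = FastAPI()

@app.get("/")
def root():
    return {"status": "ok"}

if __name__ == "__main__":
    # Open a public tunnel to the local port before starting the server.
    tunnel = ngrok.connect(8000)
    print("Public URL:", tunnel.public_url)
    uvicorn.run(app, host="127.0.0.1", port=8000)
```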
Video: https://www.youtube.com/embed/HEGT00StXbw