Monday, May 15, 2023

Optimizing FastAPI for Concurrent Users when Running Hugging Face ML Models

To serve multiple concurrent users accessing a FastAPI endpoint that runs the Hugging Face API, you must start the FastAPI app with several workers. Each worker is a separate process, so a user's request is not blocked while another request is still running in a different worker. I show and describe it in this video.
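As a rough illustration of what "several workers" means in practice, here is a minimal sketch of a launcher that starts the app under uvicorn with more than one worker process. It assumes the FastAPI instance is named `app` and lives in `main.py` (so the import path is `main:app`). The helper names `build_uvicorn_command` and `launch`, the port, and the worker count are my own placeholders and not taken from the video. Keep in mind that each worker loads its own copy of the model, so memory use grows with the worker count.

```python
from __future__ import annotations

import subprocess
import sys


def build_uvicorn_command(
    app_path: str = "main:app",  # assumed module:attribute path of the FastAPI app
    workers: int = 4,            # illustrative value; tune to CPU cores and model memory
    host: str = "0.0.0.0",
    port: int = 8000,
) -> list[str]:
    """Build the command line that starts uvicorn with several worker processes."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return [
        sys.executable, "-m", "uvicorn", app_path,
        "--host", host,
        "--port", str(port),
        "--workers", str(workers),
    ]


def launch(workers: int = 4) -> None:
    """Start the server; blocks until uvicorn exits (requires uvicorn installed)."""
    subprocess.run(build_uvicorn_command(workers=workers), check=True)
```

Calling `launch(workers=4)` is equivalent to running `uvicorn main:app --workers 4` from the shell, provided `main.py` exists next to the launcher.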

 
