Monday, December 23, 2024

Stateless MLX Inference with FastAPI in Sparrow

I show how to run MLX inference in stateless mode, where the loaded model is released as soon as inference completes. This is useful when inference requests are infrequent, because it reclaims the resources that MLX would otherwise keep reserved.

 
