Tuesday, March 24, 2026

How to Cache vLLM Model in FastAPI for Faster Inference

I show you how to keep your vLLM model cached in FastAPI for much faster inference, without reloading it on every request.

 

Monday, March 16, 2026

Qwen 3.5 Test for JSON Structured Data Extraction

Quick test of the new Qwen 3.5 models on JSON structured data extraction from images, comparing results for 9B FP16, 27B Q8, and A3B 35B Q8. The 35B Q8 model wins on both speed and accuracy. The test was run on MLX-VLM using a Mac Mini M4 Pro with 64GB RAM.

 

Thursday, March 12, 2026

Fast Large Table Extraction: Sparrow + dots.ocr to JSON

Sparrow provides a table processing mode. It is optimized for handling large tables and comes with a separate template script (new templates can be easily added) that processes dots.ocr markdown output into structured JSON with field mapping.
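To illustrate the general idea of mapping an OCR'd markdown table onto JSON fields, here is a hypothetical sketch. The field map and sample table are made up for illustration; this is not Sparrow's actual template code.

```python
import json

# Hypothetical column-to-field mapping, in the spirit of a Sparrow template.
FIELD_MAP = {"Item": "item_name", "Qty": "quantity", "Price": "unit_price"}


def markdown_table_to_json(md: str, field_map: dict) -> list[dict]:
    """Parse a markdown table (as dots.ocr might emit) into mapped records."""
    lines = [ln.strip() for ln in md.strip().splitlines() if ln.strip()]
    header = [c.strip() for c in lines[0].strip("|").split("|")]
    records = []
    for line in lines[2:]:  # skip the |---| separator row
        cells = [c.strip() for c in line.strip("|").split("|")]
        records.append({field_map.get(h, h): v for h, v in zip(header, cells)})
    return records


table = """
| Item   | Qty | Price |
|--------|-----|-------|
| Widget | 2   | 9.99  |
"""
records = markdown_table_to_json(table, FIELD_MAP)
print(json.dumps(records))
```

Keeping the mapping in a plain dictionary is what makes new templates easy to add: a new document type only needs a new `FIELD_MAP`, not new parsing code.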

 

Wednesday, March 4, 2026

Local OCR Comparison: dots.ocr More Accurate, DeepSeek-OCR 2 Faster (Sparrow + MLX)

I ran local tests with Sparrow to compare DeepSeek-OCR 2 and dots.ocr (by RedNote), both running on MLX-VLM in FP16 precision. dots.ocr consistently beats DeepSeek-OCR 2 in accuracy, but DeepSeek-OCR 2 delivers much better inference performance.