Pipeline: Mistral OCR converts the document to structured HTML, then Mistral Small extracts and transforms the data into JSON according to a defined schema with field-level hints.
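To make "schema with field-level hints" concrete, here is a minimal Python sketch of how OCR-produced HTML and per-field hints might be combined into a single extraction prompt. The field names, hint texts and the `build_extraction_prompt` helper are hypothetical illustrations, not Sparrow's actual schema format or API, and the sketch does not call any Mistral endpoint.

```python
# Hypothetical schema: each field has a name, an expected type and a
# free-text hint for the LLM. This is NOT Sparrow's actual schema format.
SCHEMA = [
    {"name": "instrument_name", "type": "str",
     "hint": "Issuer brand only, e.g. 'iShares' instead of the full fund name"},
    {"name": "market_value", "type": "str",
     "hint": "European format: period for thousands, comma for decimals"},
    {"name": "profit_loss_pct", "type": "str",
     "hint": "Percentage with an explicit + or - sign"},
    {"name": "risk", "type": "str",
     "hint": "Derived from profit_loss_pct: Low, Medium or High"},
]


def build_extraction_prompt(html: str, schema: list[dict]) -> str:
    """Combine OCR-produced HTML with per-field hints into one prompt string."""
    lines = [
        "Extract every row of the table below into a JSON array.",
        "Each object must contain exactly these fields:",
    ]
    # One bullet per field, carrying its type and hint
    for field in schema:
        lines.append(f"- {field['name']} ({field['type']}): {field['hint']}")
    lines += ["Return JSON only.", "", html]
    return "\n".join(lines)


# Usage with a tiny stand-in for the OCR output
prompt = build_extraction_prompt("<table><tr><td>Fund</td></tr></table>", SCHEMA)
print(prompt)
```

Keeping the hints next to each field name, rather than in one long instruction paragraph, is one way to make each rule easy to edit independently.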
This video shows how to extract a bonds portfolio table using hint-driven rules:
- Instrument name normalization (extracting the issuer brand from the full fund name)
- European number formatting (period as thousands separator, comma as decimal separator)
- Percentage formatting with sign preservation
- Derived risk classification computed from the profit/loss percentage
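In the pipeline these rules are applied by the LLM through field hints, but a deterministic reference can help when checking its output. The following Python sketch is one possible way to express each rule in code. The brand list, the first-word fallback, the European-style percentage output and the risk thresholds (-10% or lower is High, below 0 is Medium, otherwise Low) are illustrative assumptions, not the exact rules used in the video.

```python
def normalize_instrument_name(full_name: str, known_brands: list[str]) -> str:
    """Return the issuer brand a full fund name starts with (hypothetical rule)."""
    lowered = full_name.lower()
    for brand in known_brands:
        if lowered.startswith(brand.lower()):
            return brand
    # Fallback assumption: treat the first word as the brand
    parts = full_name.split()
    return parts[0] if parts else full_name


def format_european(value: float, decimals: int = 2) -> str:
    """Format with '.' as thousands separator and ',' as decimal separator."""
    us_style = f"{value:,.{decimals}f}"  # e.g. "1,234,567.89"
    # Swap the two separators via a placeholder character
    return us_style.replace(",", "_").replace(".", ",").replace("_", ".")


def format_percentage(value: float, decimals: int = 2) -> str:
    """European-style percentage that always keeps an explicit sign."""
    sign = "-" if value < 0 else "+"
    return f"{sign}{format_european(abs(value), decimals)}%"


def classify_risk(profit_loss_pct: float) -> str:
    """Hypothetical thresholds; the video's actual rule may differ."""
    if profit_loss_pct <= -10:
        return "High"
    if profit_loss_pct < 0:
        return "Medium"
    return "Low"


print(normalize_instrument_name("iShares Core Euro Government Bond UCITS ETF",
                                ["iShares", "Vanguard"]))  # iShares
print(format_european(1234567.891))   # 1.234.567,89
print(format_percentage(-3.456))      # -3,46%
print(classify_risk(-3.456))          # Medium
```

Formatting with Python's built-in `,` grouping and then swapping separators avoids depending on a system locale being installed.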
Sparrow is open source and local-first by design: documents never leave your infrastructure unless you choose the cloud backend.
⭐ GitHub: github.com/katanaml/sparrow
🌐 Live demo: sparrow.katanaml.io