🚀 New in Docling: Structured Data Extraction from Documents! 🚀

We've just added a brand-new capability: extraction of structured data from complex documents using free-form schemas 🤩

What does that mean?
🔹 You can now skip the conversion step: instead of turning documents into text or JSON first, Docling directly extracts the structured fields you care about.
🔹 The requested fields are defined in a free-form schema, so you can align the extraction with the schemas of your own databases.
🔹 This makes it ideal for data pipelines that don't need full document conversion, but do need to populate structured databases from messy, unstructured documents.

✨ It's:
1️⃣ Super simple to use (check out the code snippet 👉 https://lnkd.in/eK6vaMEe )
2️⃣ 100% open-source and runs fully locally, no API calls needed 🙌
3️⃣ Powered by cutting-edge models from NuMind (YC S22)
4️⃣ Perfect for data pipelines where you need to populate structured databases from documents (think invoices, CVs, contracts, product datasheets, etc.)
5️⃣ Currently focused on PDFs and images (PNG); support for pure text is coming soon!

The example below shows how easy it is to define a schema and extract structured fields directly from an invoice.

👉 Try it out, break it, send us feedback, and if you like what we're building, don't forget to ⭐ the repo (https://lnkd.in/d4UT-6_2)!

After a long and fruitful summer, the Docling team has been cooking up many new features, and this is just the beginning! 🌟

#opensource #AI #documentAI #docling #IBM #IBMResearch
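To make the "align the extraction with your own database schema" idea concrete, here is a minimal sketch in plain Python. It does not call Docling at all: the template format, the field names (`bill_no`, `vendor`, `total`), the type hints, and the sample extracted values are all assumptions made up for illustration, standing in for whatever the extractor actually returns. It only shows how a free-form schema could drive both the table definition and the loading step.

```python
import sqlite3

# Hypothetical free-form template: field name -> type hint.
# Field names and hint vocabulary are assumptions, not Docling's format.
INVOICE_TEMPLATE = {"bill_no": "string", "vendor": "string", "total": "float"}

# Made-up stand-in for extractor output on one invoice (NOT real Docling output).
SAMPLE_EXTRACTED = {
    "bill_no": "INV-001",
    "vendor": "ACME",
    "total": "1250.50",
    "extra": "not in the template, so it is dropped",
}

CASTS = {"string": str, "float": float, "integer": int}
SQL_TYPES = {"string": "TEXT", "float": "REAL", "integer": "INTEGER"}


def align_to_template(template, extracted):
    """Keep only template fields, cast each value, and use None when missing."""
    row = {}
    for field, type_hint in template.items():
        value = extracted.get(field)
        row[field] = None if value is None else CASTS[type_hint](value)
    return row


def create_table(conn, table, template):
    """Create a table whose columns mirror the template fields."""
    columns = ", ".join(f"{name} {SQL_TYPES[hint]}" for name, hint in template.items())
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")


def insert_row(conn, table, row):
    """Insert one aligned row using parameter placeholders for the values."""
    names = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    conn.execute(
        f"INSERT INTO {table} ({names}) VALUES ({placeholders})",
        list(row.values()),
    )


def load_invoice(extracted, template=INVOICE_TEMPLATE):
    """Load one extracted record into an in-memory SQLite table and read it back."""
    conn = sqlite3.connect(":memory:")
    create_table(conn, "invoices", template)
    insert_row(conn, "invoices", align_to_template(template, extracted))
    return conn.execute("SELECT bill_no, vendor, total FROM invoices").fetchall()


if __name__ == "__main__":
    print(load_invoice(SAMPLE_EXTRACTED))
```

Because the same template defines both the columns and the casting rules, fields the schema does not ask for are ignored and missing fields become NULL rather than breaking the insert. Table and column names are interpolated directly here, so this sketch assumes the template comes from trusted code, not from user input.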
Built an agent with Docling that reads U.S. Customs forms and turns them into structured, usable information. Great work, team!
Peter W. J. Staar, I just added Docling to idp-software.com. Feel free to make edits via pull request and I'll merge them: https://idp-software.com/vendors/docling/
Peter W. J. Staar: Absolutely amazing! 🚀 This is such a powerful addition. I've been handling something similar with a lot of glue code, but this feature will really help streamline the entire data ingestion pipeline. Excited to try it out soon and see how it performs in real-world workflows. Kudos to the team! 👏 Quick question: does it also support nested schemas (e.g., line items in invoices), or is it mainly for flat field extraction?
Awesome! We are big fans of Docling.
Thanks for the awesome product. It is getting better. Privacy is gold.
Loved this — super clear! 🙌 Curious: which Document Loader worked best for mixed PDFs + webpages in your experiments?
How flexible is it when document layouts vary widely? Curious because invoice/contract formats can be really inconsistent.
Accuracy is very good, thanks for sharing.
Congrats on the release. I have to ask: LangExtract from Google does something similar, so what is the difference?