Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts information even from complex documents, for example large, complex tables, which matter for many finance and healthcare applications. A new SDK also means using it takes only 3 simple lines of code. Please see the video for technical details. I hope this unlocks a lot of value from the "dark data" currently stuck in PDF files, and that you'll build something cool with this!
While it’s cool, and I love the usability, you don’t need AI to extract text from tables. Just reproduce how people read them: cell by cell. Think of a table as a tile mosaic, where each cell is a small image within a larger image. Detect those sub-images, then run Tesseract over each one, and you’ve got just the cell contents, without the power consumption. I did this and it works well.
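The cell-by-cell idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the commenter's actual code: it assumes a binarized image given as a list of pixel rows (1 = ink), a fully ruled grid with no merged cells, and treats any row or column that is at least 80% ink as a ruling line (the `min_fill` threshold and all function names here are hypothetical). The OCR step is passed in as a callable; in practice you might convert each crop to a PIL image and hand it to `pytesseract.image_to_string`.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Image = List[List[int]]          # binarized image: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), end-exclusive


def _ruling_spans(counts: Sequence[int], length: int, min_fill: float) -> List[Tuple[int, int]]:
    """Group consecutive rows (or columns) whose ink count reaches min_fill * length
    into (start, end) spans, so a ruling line several pixels thick counts once."""
    spans: List[Tuple[int, int]] = []
    for i, count in enumerate(counts):
        if count >= min_fill * length:
            if spans and spans[-1][1] == i:
                spans[-1] = (spans[-1][0], i + 1)  # extend the current thick line
            else:
                spans.append((i, i + 1))          # start a new ruling line
    return spans


def detect_cells(img: Image, min_fill: float = 0.8) -> List[Box]:
    """Return the cell interiors lying between consecutive horizontal and vertical rulings."""
    height, width = len(img), len(img[0])
    row_ink = [sum(row) for row in img]
    col_ink = [sum(img[r][c] for r in range(height)) for c in range(width)]
    rows = _ruling_spans(row_ink, width, min_fill)
    cols = _ruling_spans(col_ink, height, min_fill)
    cells: List[Box] = []
    # A cell's interior starts right after one ruling and ends right before the next.
    for (_, top), (bottom, _) in zip(rows, rows[1:]):
        for (_, left), (right, _) in zip(cols, cols[1:]):
            if bottom > top and right > left:
                cells.append((top, left, bottom, right))
    return cells


def crop(img: Image, box: Box) -> Image:
    """Cut one cell out of the full image (the 'small image within a larger image')."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def read_table(img: Image, ocr: Callable[[Image], str], min_fill: float = 0.8) -> List[List[str]]:
    """OCR each cell separately and group the text into rows, top to bottom, left to right."""
    rows: Dict[int, List[str]] = {}
    for box in detect_cells(img, min_fill):
        rows.setdefault(box[0], []).append(ocr(crop(img, box)).strip())
    return [rows[top] for top in sorted(rows)]
```

Keeping OCR behind a plain callable lets the grid logic be checked without Tesseract installed. Real scans typically also need deskewing and more robust line detection (for example morphological filtering) before a simple projection threshold like this one is reliable.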
Big thanks to Andrew Ng and the entire LandingAI team! Excited to see Agentic Document Extraction, now powered by DPT-2, help our customers unlock the value of ‘dark data’ across their diverse documents enterprise-wide.
Does this extract well from handwritten documents?
My next month of work just got compressed into 3 lines of code. And you showcased my specific use case too: a document parser for handwritten construction documents. Well done to the team!
Thanks, Andrew Ng, for some great new features! Here at TCG Process, we are excited to make LandingAI ADE and DPT available inside our end-to-end process automation platform. Exciting stuff!
Awesome. Can you operate under EU GDPR constraints?
A significant improvement in table extraction 👏 I used the service last week and there were discrepancies in the Markdown generation, specifically in table extraction and algorithms. The latest version works for most cases but still fails in some (example attached!). It also seems to struggle with text extraction from multi-column documents: the chunks are not semantically coherent, text is treated as part of a figure, and so on.
...and snaking columns? Does it cover those as well? They are common in PDFs and Docs too... I appreciate the 3-line code solution for complex tables, including handwritten ones... RPA had a hard time reaching good accuracy on this, so even this far is awesome 👌... thanks for sharing 👍 😊
Andrew, you continue to set the standard for real-world AI innovation. Extracting value from “dark data” in complex documents is one of the last untapped frontiers of enterprise AI. The new DPT approach moves us closer to a world where information flows freely between humans and machines, no longer locked in static files. The true breakthrough will come when agentic systems can autonomously extract, reason over, and connect insights across millions of documents, creating living, dynamic knowledge graphs. That’s where agentic AI will redefine how industries operate. Respect to you and the team for pushing the boundary forward. Me & Spok ✌️