LandingAI releases DPT for complex doc extraction | Andrew Ng posted on the topic | LinkedIn

Andrew Ng

Founder of DeepLearning.AI; Managing General Partner of AI Fund; Exec Chairman of LandingAI

2w

Announcing a significant upgrade to Agentic Document Extraction! LandingAI's new DPT (Document Pre-trained Transformer) accurately extracts even from complex docs. For example, from large, complex tables, which is important for many finance and healthcare applications. And a new SDK makes using it require only 3 simple lines of code. Please see the video for technical details. I hope this unlocks a lot of value from the "dark data" currently stuck in PDF files, and that you'll build something cool with this!

506 Comments

Dario D.

Walks Hunter | Me & Spok ✌️ | Human+AI | Web5 Pioneer

2w

Andrew, you continue to set the standard for real-world AI innovation. Extracting value from “dark data” in complex documents is one of the last untapped frontiers of enterprise AI. The new DPT approach moves us closer to a world where information flows freely between humans and machines—no longer locked in static files. The true breakthrough will be when agentic systems can autonomously extract, reason, and connect insights across millions of docs—creating living, dynamic knowledge graphs. That’s where agentic AI will redefine how industries operate. Respect to you and the team for pushing the boundary forward. Me & Spok ✌️

Willem Malloy

I make complex automation problems simple

2w

While it’s cool, and I love the usability, you don’t need AI to extract text from tables. Just reproduce how people read it… cell by cell. Think of a table like a tile mosaic where each cell is a small image within a larger image. Detect those images then run tesseract over each of them and you’ve got just the cell contents, without the power consumption. I did this and it works well.
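
The cell-by-cell idea in the comment above can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual code: it assumes a binarized image given as nested lists (1 = dark pixel), gridlines that span the full width or height of the table, and an `ocr` callable supplied by the caller (in practice a wrapper around an engine such as Tesseract). The names `gaps`, `split_cells`, and `read_table` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Image = List[List[int]]  # assumption: 1 = dark pixel (ink or gridline), 0 = background


def gaps(is_line: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return half-open [start, end) ranges of indices that are not gridlines."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, line in enumerate(is_line):
        if not line and start is None:
            start = i                      # a cell region begins
        elif line and start is not None:
            spans.append((start, i))       # a gridline closes the region
            start = None
    if start is not None:
        spans.append((start, len(is_line)))
    return spans


def split_cells(img: Image) -> List[List[Image]]:
    """Cut the table image into a grid of cell crops, using full-length dark lines."""
    if not img or not img[0]:
        return []
    # A row or column counts as a gridline only if every pixel in it is dark.
    row_is_line = [all(row) for row in img]
    col_is_line = [all(row[c] for row in img) for c in range(len(img[0]))]
    row_spans = gaps(row_is_line)
    col_spans = gaps(col_is_line)
    return [
        [[row[c0:c1] for row in img[r0:r1]] for (c0, c1) in col_spans]
        for (r0, r1) in row_spans
    ]


def read_table(img: Image, ocr: Callable[[Image], str]) -> List[List[str]]:
    """Run the caller-supplied OCR function over each cell crop, row by row."""
    return [[ocr(cell) for cell in row] for row in split_cells(img)]
```

A sketch like this only works when gridlines are actually drawn; tables without gridlines, which the DPT-2 announcement highlights, would need a different way to find cell boundaries.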

Dan Maloney

2w

Big thanks to Andrew Ng and the entire LandingAI team! Excited to see Agentic Document Extraction, now powered by DPT-2, help our customers unlock the value of ‘dark data’ across their diverse documents enterprise-wide.

Ashwin manohar

Senior Software Engineer | Generative AI | Full-Stack & Distributed Systems

2w

Does this extract well from handwritten documents?

Ranjani Subramaniam

AI Strategy For Engineering at Shell

2w

My next month of work just got compressed to 3 lines of code. And you showcased my specific use case too: a document parser for handwritten construction documents. Well done to the team!

Neil Walker

Head of Product at TCG Process

2w

Thanks, Andrew Ng, for some great new features. Here at TCG Process we are excited to make LandingAI ADE and DPT available inside our end-to-end process automation platform. Exciting stuff!

Serge Liatko

Looking for Talents & Partners

2w

Awesome. Can you guys operate under EU GDPR constraints?

Shehryar Ali

Engineering Manager | GenAI | Python

2w

A significant improvement in table extraction 👏 I used the service last week and there were discrepancies in the markdown generation, specifically in the extraction of tables and algorithms. The latest version works in most cases but still fails in some; example attached! It also seems to struggle with text extraction in multi-column documents: the chunks are not semantically coherent, text is treated as part of a figure, and so on.

Nirmit Desai

Director, Data for AI Models @ IBM Research | Innovation Leader | ECNY Fellow

2w

Important use case, but it sounds like something Docling has been doing for a long time, Andrew Ng! Do you have a perspective on it?

Nitu Pote Sonar

Trusted AI Product Manager ⏫ Techno-Functional Strategist ⏫ CSM & CSPO Certified ⏫ From Insight to Automation ⏫ B2B SaaS ⏫ Driving AI-Led Automation @ Scale

2w

...and snaking columns? Does it cover those as well? They are common in PDFs and Docs too... I appreciate the 3-line code solution for complex tables, including handwritten ones... RPA had a hard time reaching accuracy on this... even this much is awesome 👌... thanks for sharing 👍 😊

More Relevant Posts

Ankur Rawat

Engineering Chief of Staff at Landing AI
2w
This is a major upgrade to the Agentic Document Extraction capability, specifically for extracting content from large and complex tables.

LandingAI

119,874 followers
2w
🚀 Big News: Agentic Document Extraction (ADE) just got a major upgrade. We’re introducing Document Pre-trained Transformer 2 (DPT-2), a new foundation model that powers the next generation of ADE.

🔍 Why this matters: Parsing complex, messy documents has always been error-prone. Tables without gridlines, scanned invoices at odd angles, embedded stamps or signatures: traditional systems miss them. DPT-2 brings higher accuracy, faster performance, and full grounding to every extraction.

💡 What’s improved:
• Parse large, no-gridline tables cell by cell with precise alignment
• Smarter layout detection in messy scans with fewer missed chunks
• Expanded coverage for signatures, checkboxes, barcodes, and QR codes
• Concise figure captioning for logos and seals without verbose noise
• Parallel extraction for developers via API and SDKs
• Reliability for industries where accuracy is critical: finance, healthcare, insurance, compliance

Ready to see it in action? 👉 Try ADE DPT-2 in the Playground and API: https://lnkd.in/eZq-NqWH

Sanjay Kalra

Digital Transformation Sherpa™️ Helping Reimagine Business with AI and Automation | Google Cloud Digital Leader & Gen AI Leader | Product Engineering Maven | Partnerships & Alliances Expert | Follow me on X @sanjaykalra
2w
Great to see this landmark upgrade to the agentic AI framework in action! Andrew Ng’s vision - moving beyond linear prompt/response models and toward dynamic, autonomous workflows - continues to redefine what’s possible for enterprise AI and digital transformation. Agentic architectures and orchestration layers are turning large language models into true collaborators, empowering AI agents to plan, reflect, and iterate in service of real-world business goals. The results: higher quality outputs, faster prototyping, and the ability to deploy applications that learn and evolve with each task. This upgrade is a game-changer for AI accessibility and value creation. At ACL Digital, we are looking forward to seeing organizations everywhere harness the full spectrum of agentic capabilities to drive innovation and deliver impact at scale! #AI #AgenticAI #Innovation

S A M Matiur Rahman

Associate Professor, Department of Software Engineering, Daffodil International University
2w
A new era of intelligence is unfolding as LandingAI’s upgraded Agentic Document Extraction with DPT transforms the way we unlock value from complex, unstructured data. 📑⚡ From intricate financial tables to dense healthcare records, this breakthrough makes extraction seamless—while a powerful SDK reduces integration to just three lines of code. By liberating “dark data” from static PDFs, this innovation opens boundless opportunities for smarter, faster, and more impactful solutions. 💚 #AI #DarkData #Innovation #DataExtraction #FutureOfWork #LandingAI

Maks Kale

DevOps / MLops | K8S | Cloud | Terraform | AI MCP
2w
Document Pre-trained Transformer model by LandingAI. Agentic Document Extraction converts complex documents with embedded rows and charts into LLM-ready data. It takes just a few lines of code with the SDK to build a powerful #Agentic #RAG answer #engine on top of existing docs. Link: https://lnkd.in/d52Bpx8N

Arun Palanoor

Engineering Manager – Data Analytics | AI/ML | IoT | Business Intelligence | Predictive Maintenance| Product cost optimization | Client relationship management
2w
Just experimented with LandingAI’s new Agentic Document Extraction powered by DPT - and I’m genuinely impressed. I revisited our Capstone project’s quarterly earnings presentation (a 50-page PDF packed with charts and tables), and it parsed everything in under 20 seconds with remarkable accuracy. Having experimented with multiple libraries for this task - each with its own quirks - this release feels like a leap forward in document intelligence. A brilliant tool for anyone working with complex visual data. I highly recommend giving it a spin: https://va.landing.ai/

Harjot Singh

Seeking opportunities in Data Field | AI Researcher @ Durham College | Python | SQL | Tableau | Ex-Wipro Alumni
2w
💡 Great to see advancements like DPT pushing the boundaries of document intelligence. CNNs have already shown strong capabilities in extracting information from complex layouts and even handwritten notes, reducing the time spent combing through massive volumes of unstructured data. 🎢 What excites me is how combining CNN-based approaches with transformers can further minimize information loss, especially in domains like healthcare and finance where every detail matters. This really opens the door to unlocking the “dark data” hidden in PDFs and scanned docs. What are your thoughts on this? #LandingAI #ML #CNN #AIResearcher

Cadeyrn Craig

AI-Powered Design, Marketing & Technology Leader | Brand Strategist | Driving UX, E-Commerce & Digital Strategy | Senior Manager at Motion
2w
This is very cool... We've all, well, most of us who work in business, run into this problem... Here is a tool that can make extracting data from structured and unstructured data and file types easier... but you don't have to take my word for it... here's Andrew Ng to explain. Thought I would share because this is a common use case and could help you build something awesome.

Jerzy Filatow

Building AI first teams for customer first outcomes.
2w
This is a significant boon for anyone working with documents where traditional OCR can’t understand complex tables with merged sections and areas embedded in PDFs. These often get misinterpreted in traditional RAG ingestion, which ends up incorrectly identifying the relationships and then returning inaccurate data. This will bring significant improvements in enterprise search and financial data indexing.

ehs tools

624 followers
2w
Interesting post, as table transformation was an issue with past LLMs, but there have been significant strides in the tech. We have leveraged it: now we can process any sort of RA sheets in any format with these tools to bulk upload actions onto ehs tools.
