Unstructured

Unstructured · 2025-10-07T18:28:41.543Z

🚨 Big news from Unstructured: we've just open sourced our SCORE framework research, setting a new standard for evaluating generative document parsers!

We're thrilled to announce we just published a new academic paper on arXiv: SCORE: A Semantic Evaluation Framework for Generative Document Parsing. Tomorrow, we’re pulling back the curtain in a live webinar to reveal the pioneering methodology we use internally to cross-evaluate diverse document transformation outputs from generative models.

Why you should care:
- Current benchmarks mislead: they over-reward neat text extraction while ignoring hallucinations, broken structures, and rich, diverse alternative representations.
- Generative parsers are exploding in popularity, but we’ve had no fair way to measure them… until now.
- SCORE redefines document parsing evaluation, surfacing the real trade-offs between models.

This is a must-watch if you work in document AI, RAG, or intelligent automation.

📅 When: Tomorrow, Wednesday, October 8th
🎤 Hosts: Daniel Schofield & Antonio Jose Jimeno Yepes

View comments below for the registration link! See you there.

#StructuredData #DataQuality #DocumentAI #DataAccuracy #RAG #AI #GenAI #ETL #UnstructuredData #LLM #MCP #TableTransformation #DocumentAI #VLM #EnterpriseAI #RAGinProduction #Transformation #Quality #LLMready #SourceConnectors #Parsing #Unstructured #TheGenAIDataCompany

Software Development

San Francisco, CA 24,362 followers

Stop dilly-dallying. Get your data.


Discover all 91 employees

About us

At Unstructured, we're on a mission to give organizations access to all their data. We know the world runs on documents—from research reports and memos, to quarterly filings and plans of action. And yet, 80% of this information is trapped in inaccessible formats leading to inefficient decision-making and repetitive work. Until now. Unstructured captures this unstructured data wherever it lives and transforms it into AI-friendly JSON files for companies who are eager to fold AI into their business.

Website: http://www.unstructured.io/
Industry: Software Development
Company size: 11-50 employees
Headquarters: San Francisco, CA
Type: Privately Held
Founded: 2022
Specialties: NLP, natural language processing, data, unstructured, LLM, Large Language Model, AI, RAG, Machine Learning, Open Source, API, Preprocessing Pipeline, Machine Learning Pipeline, Data Pipeline, artificial intelligence, and database

Locations

Primary

San Francisco, CA, US


Updates

Unstructured

24,362 followers
2d
💡 Your documents are more than text! Oftentimes they’re networks of interconnected knowledge waiting to be unlocked.

Traditional RAG pipelines treat documents as isolated chunks. That works for simple queries, but it misses the relationships that actually matter: how products relate, which datasets drive results, or which procedures lead to outcomes.

In our latest notebook we combine Unstructured + Neo4j to build GraphRAG:
- Extract entities like models, datasets, metrics, tasks, and more
- Map explicit relationships (trained_on, evaluated_on, achieves) in a knowledge graph
- Traverse this graph to answer complex questions, not just find keywords

We demoed this on the GPT-2 research paper, but the approach applies to:
- Technical documentation → understand APIs, parameters, dependencies
- Customer support → connect tickets, products, and account managers
- Medical research → link patients, treatments, and outcomes

Stop treating documents as isolated text blobs. Start uncovering the knowledge graphs hidden inside them.

🔗 Explore the full notebook and build your very own GraphRAG system: https://lnkd.in/eduJCzG3

#DocumentAI #IntelligentAutomation #DataAccuracy #StructuredData #DataQuality #RAG #AI #GenAI #ETL #UnstructuredData #LLM #MCP #TableTransformation #DocumentAI #VLM #EnterpriseAI #RAGinProduction #Transformation #Quality #LLMready #SourceConnectors #Parsing #Unstructured #TheGenAIDataCompany

Google Colab colab.research.google.com
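
To make the extract, map, traverse idea in the GraphRAG post concrete, here is a minimal sketch in plain Python. It is not the notebook's code and uses neither Unstructured nor Neo4j: the triples are hand-written stand-ins for what an extraction step might produce, the relationship names `trained_on` and `evaluated_on` come from the post, and the helper names `build_graph` and `evaluation_sets_for_training_data` are hypothetical.

```python
from collections import defaultdict

# Hand-written (subject, relation, object) triples standing in for extracted
# entities and relationships. Illustrative only, not output of a real pipeline.
TRIPLES = [
    ("GPT-2", "trained_on", "WebText"),
    ("GPT-2", "evaluated_on", "LAMBADA"),
    ("GPT-2", "evaluated_on", "WikiText-2"),
]


def build_graph(triples):
    """Index triples in both directions so edges can be followed either way."""
    forward = defaultdict(lambda: defaultdict(set))  # subject -> relation -> objects
    inverse = defaultdict(lambda: defaultdict(set))  # object -> relation -> subjects
    for subj, rel, obj in triples:
        forward[subj][rel].add(obj)
        inverse[obj][rel].add(subj)
    return forward, inverse


def evaluation_sets_for_training_data(dataset, forward, inverse):
    """Two-hop traversal: dataset <-trained_on- model -evaluated_on-> benchmark."""
    results = {}
    for model in inverse[dataset]["trained_on"]:
        results[model] = sorted(forward[model]["evaluated_on"])
    return results


forward, inverse = build_graph(TRIPLES)
answer = evaluation_sets_for_training_data("WebText", forward, inverse)
print(answer)  # {'GPT-2': ['LAMBADA', 'WikiText-2']}
```

The question answered here ("which benchmarks were used to evaluate models trained on WebText?") needs two relationship hops, which is the kind of query a keyword or single-chunk lookup cannot express directly. A graph database would replace the in-memory dictionaries in a real system.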

Unstructured

24,362 followers
3d
We’re excited to announce a new OEM partnership between IBM and Unstructured. Together, we’re tackling one of the biggest barriers to enterprise AI: turning the 80% of data that’s unstructured into clean, AI-ready fuel.

By combining IBM watsonx’s hybrid, open lakehouse with Unstructured’s document processing, enterprises can unify access, preparation, and governance for both structured and unstructured data, unlocking faster, more reliable AI.

Learn how watsonx.data + Unstructured enable production-ready pipelines and RAG systems built on trusted, AI-ready data. And sign up for our webinar on 10/29 on building RAG pipelines with IBM watsonx and Unstructured.

🔗 https://lnkd.in/eF8sPYDM

#IBM #watsonx #Unstructured #IBMwatsonx #GenAI #AI #ETL #ETLplus #RAG #LLM #DataTransformation #UnstructuredData #EnterpriseAI #AIready #RAGinProduction #TheGenAIDataCompany
11 Comments

Unstructured

24,362 followers
4d
Join us in just a few hours! 👇
Unstructured

24,362 followers
5d Edited

Precision context engineering starts with robust, enterprise-grade RAG. When retrieval is accurate, structured, and governed, every downstream AI system becomes more reliable, explainable, and compliant.

That’s the promise of RAG using the Unstructured ETL+ platform:
- Process 65+ file types into structured, searchable context
- Enrich with metadata and structure-aware chunking
- Enforce SOC 2, HIPAA, and GDPR compliance at the source
- Scale horizontally across workloads, securely and predictably

Ready to level up your use case's context engineering? Join us for tomorrow's webinar: Context Engineering with Precision over Mixed Content. You'll learn how Unstructured enables precision RAG, turning messy, mixed enterprise content into production-grade context engineering.

Because context quality is only as strong as its retrieval layer.
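
As a rough illustration of what "structure-aware chunking" with metadata can mean, the sketch below starts a new chunk at every section title and tags each chunk with the section it came from. It is an assumption-laden toy, not the platform's chunker: the `Element` class, the `chunk_by_section` function, the `max_chars` limit, and the sample text are all hypothetical, and the category labels "Title" and "NarrativeText" are used only as plausible element types.

```python
from dataclasses import dataclass


@dataclass
class Element:
    category: str  # assumed labels: "Title" or "NarrativeText"
    text: str


def chunk_by_section(elements, max_chars=500):
    """Group elements into chunks that never cross a section title."""
    chunks = []
    section, buffer = None, []

    def flush():
        # Emit the buffered text as one chunk, tagged with its section title.
        if buffer:
            chunks.append({"text": "\n".join(buffer), "metadata": {"section": section}})
            buffer.clear()

    for el in elements:
        if el.category == "Title":
            flush()            # close the previous section's chunk
            section = el.text  # later chunks inherit this title as metadata
        elif sum(len(t) for t in buffer) + len(el.text) > max_chars:
            flush()            # section too long: split, but keep the same title
        buffer.append(el.text)
    flush()
    return chunks


elements = [
    Element("Title", "Overview"),
    Element("NarrativeText", "Unstructured parses documents."),
    Element("Title", "Pricing"),
    Element("NarrativeText", "Contact sales."),
]
chunks = chunk_by_section(elements)
for chunk in chunks:
    print(chunk["metadata"]["section"], "->", repr(chunk["text"]))
```

Compared with fixed-size splitting, chunks built this way keep a heading together with the text it introduces, and the `section` metadata gives a retriever something to filter or rank on.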
Unstructured

24,362 followers
5d
We're thrilled to be featured in this comprehensive guide from the Neo4j Developer Blog! Alex Gilmore's deep dive into Knowledge Graph Generation shows how Unstructured powers the lexical component of knowledge graphs for GenAI applications. Our document parsing capabilities help transform unstructured data into structured knowledge that enables GraphRAG — making it possible to answer complex queries that traditional vector search simply can't handle. The article details the complete journey from raw documents to production-ready knowledge graphs, with Unstructured handling the critical first step of document processing. Whether you're building medical Q&A systems, research assistants, or enterprise knowledge bases, this guide provides the strategies and best practices you need. 📖 Read the full article to learn the strategies and methods for knowledge graph development, which provides reliable and accurate context for GenAI: https://lnkd.in/e6S95YwY

Knowledge Graph Generation medium.com

Unstructured

24,362 followers
1w Edited
“We already use Amazon Textract, which works well - why would we use Unstructured?”

We’ve heard this line from a wide range of our customers during discovery calls: financial institutions, insurers, retail giants, and more. More often than not, by the end of the call, they realize their document processing pipeline is no longer the off-the-shelf feature offered by their Cloud Service Provider (CSP); it’s an infrastructure project in disguise.

The moment your roadmap includes PowerPoints, scanned images, RAG pipelines, or multi-cloud data sources, your neatly packaged Amazon Textract-based MVP starts requiring custom internal development to keep up with your requirements. That’s when a dedicated platform like Unstructured becomes the more flexible, cost-sensitive option.

To help you navigate the tradeoffs between your CSP's default document processing solution and a dedicated platform like Unstructured, our Principal Solutions Architect Daniel Schofield has summarized the factors and considerations behind that decision in our most recent blog post: The Tradeoffs Between Using A Cloud Service Provider’s Document Processing Solution vs a Dedicated Document AI Platform

🔗 https://lnkd.in/eHpw-B3m

#DocumentAI #IntelligentAutomation #DataAccuracy #StructuredData #DataQuality #RAG #AI #GenAI #ETL #UnstructuredData #LLM #MCP #TableTransformation #DocumentAI #VLM #EnterpriseAI #RAGinProduction #Transformation #Quality #LLMready #SourceConnectors #Parsing #Unstructured #TheGenAIDataCompany

1 Comment

Unstructured

24,362 followers
1w
And we're live! Come join us: https://lnkd.in/ei8CWinK
Unstructured

24,362 followers
1w
From humans to agents: the “user” of enterprise data is changing fast. Listen as Brian S. Raymond dives into agent auth, secure access to emails/CRMs, and the backbone Unstructured provides.

Mark Haney

CEO of HaneyBiz | Host of The Mark Haney Show | Angel Investor | Keynote Speaker
2w

🤖 AI Agents Are Here Brian S. Raymond, CEO of Unstructured, explains why agents are the next big shift in AI: ✅ They don’t sleep. ✅ They know exactly what they want. ✅ They’re starting to replace humans as the “users” of enterprise data. But here’s the catch: How do you make sure an AI agent is authorized to access sensitive data like your emails or Salesforce records? This is the new frontier—and the industry is racing to solve it. 🎙️ Full conversation now live on The Mark Haney Show. #AIAgents #ArtificialIntelligence #GenAI #FutureOfAI #EnterpriseAI #AIRevolution #DataSecurity #MarkHaneyShow

1 Comment

Unstructured reposted this
Unstructured

24,362 followers
1w
👋 We’re at IBM TechXchange 2025! Catch David Donahue, Head of Strategy at Unstructured, on stage with Edward Calvesbert, VP of Product Management at IBM watsonx. They’ll share how Unstructured and IBM are partnering to make enterprise data AI-ready — turning complex documents, PDFs, and images into structured, usable data inside watsonx.data to power the next generation of GenAI applications. 📆 Today 10/7 @ 10:30 AM 📍 Lake Nona, Lobby Level, Hilton #IBMTechXchange #watsonx #Unstructured #IBM #GenAI #AI #ETL #ETL+ #RAG #LLM #DataTransformation #TheGenAIDataCompany
2 Comments


Funding

Unstructured 3 total rounds

Last Round

Series B Apr 14, 2024

US$ 40.0M

Investors

Menlo Ventures + 9 Other investors


Employees at Unstructured

Tom Whiteaker

Co-Founder and Partner, IBM Ventures Investments

James Reid

Head of BizOps at Unstructured

Karsten McMinn

Stefanie Segar

Similar pages

Hume AI

Primer.ai

Cognition

11x

Hebbia

Mechanical Orchard

Pika

Adonis

Draftwise

Suno
