OpenAI just launched an in-house data agent. This validates much of what I've been thinking about agentic data infrastructure! OpenAI has to operate across 600+ petabytes and 70,000 datasets. They cannot afford to let agents endlessly query siloed data without control or optimization. The new internal tool includes some of the key elements I've discussed as required for production-scale agentic products: multiple replication layers (table metadata, human annotations, code-level enrichment) powering search that understands business logic, not just schemas, and closed-loop learning that retains institutional knowledge. This isn't just OpenAI. Every company building serious agentic systems is hitting the same wall. Your agent can book flights or analyze sales, but only if the underlying data infrastructure supports cross-silo search with real understanding.