63% of all websites already see traffic from AI agents, and that share is only climbing. Those agents thrive on one thing: real-time data. In my comments for Built In's recent article, I explain how search APIs are becoming the backbone of this ecosystem, providing essential data for companies that no longer want to collect and index data themselves. Instead of wrestling with the challenges of data collection, companies can call an API to retrieve up-to-date information. This shift allows businesses to prioritize innovation over infrastructure-building, something we are seeing our own customers, both enterprise and startup, embrace as a strategic move. Many of them are using Bright Data's SERP API, a powerful tool that can deliver search engine results in multiple formats in as little as one second. Here's a link to the full article: https://brdta.com/4pPzLwe #SearchAPI #OpenWeb #DataInfrastructure
How Search APIs are Revolutionizing Data Collection for Businesses
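To make the "call an API instead of building collection infrastructure" pattern concrete, here is a minimal sketch of what it looks like from a developer's side. The endpoint, parameters, and response fields below are hypothetical placeholders, not Bright Data's documented SERP API contract:

```python
# A minimal sketch of fetching live search results via a search API.
# Endpoint, parameters, and response fields are hypothetical placeholders.
import requests

API_URL = "https://api.example.com/serp"  # placeholder endpoint
API_TOKEN = "YOUR_TOKEN"                   # placeholder credential

def fetch_serp(query: str, fmt: str = "json") -> dict:
    """Retrieve up-to-date search results without running any scrapers."""
    resp = requests.get(
        API_URL,
        params={"q": query, "format": fmt},
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# Assumed response shape: a list of organic results with rank/title/url.
for item in fetch_serp("best running shoes 2025").get("organic", [])[:5]:
    print(item.get("rank"), item.get("title"), item.get("url"))
```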
More Relevant Posts
-
If you want your content to be the source that Microsoft Copilot or Google AI cites, you need to understand the decision process. Answer engines prioritize relevance to the query, E-E-A-T signals, corroboration across sources, structured data, and technical health. In our latest post, we show how to structure pages, add the right JSON-LD in HubSpot, and measure wins in Search Console and HubSpot. Practical steps, measurement guidelines, and a priority roadmap included. Read the post and book a strategy call: https://hubs.ly/Q03LBV5m0
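For readers wondering what "the right JSON-LD" looks like, here is a minimal sketch that builds a schema.org Article object; the headline, names, and URLs are illustrative placeholders, and in HubSpot the output would typically be embedded in a <script type="application/ld+json"> tag in the page template:

```python
# A minimal sketch of generating schema.org Article JSON-LD for a page.
# All field values are illustrative placeholders.
import json

article_schema = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "How Answer Engines Pick Their Sources",
    "author": {"@type": "Person", "name": "Jane Doe"},  # named author supports E-E-A-T
    "datePublished": "2025-10-01",
    "dateModified": "2025-10-15",
    "publisher": {"@type": "Organization", "name": "Example Co"},
    "mainEntityOfPage": "https://example.com/blog/answer-engines",
}

print(json.dumps(article_schema, indent=2))
```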
-
🔍 #Perplexity launches Search #API - direct access to the real-time web index powering their answers, without the generative layer #AI #webapi #search
⚡ Delivers factual, up-to-date results with structured snippets ranked for relevance - skip heavy preprocessing
🎯 Continuously refreshed index spanning hundreds of billions of pages with high accuracy at low latency
🔧 Developer-friendly pricing at $5 per 1K requests with full control over output processing
🎨 Perfect for #AI agents needing grounded web context and research tools demanding trust and freshness
🚀 Enables custom products where developers want complete control over how search data is used
📊 Structured response format eliminates the need for complex data parsing and preprocessing steps
🌐 https://lnkd.in/dUbeekb2
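A minimal sketch of what calling a search API like this could look like; the endpoint path, request body, and response fields here are assumptions for illustration, not verified against Perplexity's official docs:

```python
# A sketch of querying a real-time search API, assuming a simple
# query-in, ranked-results-out contract. Endpoint and fields are assumptions.
import requests

resp = requests.post(
    "https://api.perplexity.ai/search",   # assumed endpoint path
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={"query": "latest LLM evaluation benchmarks"},
    timeout=15,
)
resp.raise_for_status()

# Assumed response shape: ranked results with structured snippets,
# so no heavy parsing or preprocessing is needed downstream.
for result in resp.json().get("results", []):
    print(result.get("title"), "->", result.get("url"))
```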
-
👻 The Ghost in the Machine: Are AI Queries Showing Up in Your GSC?

We've spotted something strange in Google Search Console. While auditing long-tail queries across several eCommerce clients, we started seeing an unusual pattern:
• Long, highly specific search terms like "[product name] product info and reviews"
• Massive impressions, but 0 clicks
• A sharp spike beginning August 21st, the same day Google expanded AI Mode globally

🤖 We've confirmed it: these aren't just humans searching. AI systems are actively querying Google to gather information. Our data shows that AI Overviews aren't only pulling from indexed content; they're actually performing searches while constructing answers, and those traces are already showing up in GSC data.

Why It Matters
• Brands must own their product narratives; if AI is searching, you want to be the source it finds.
• Publishers could ride this wave by targeting AI-style query structures to increase visibility.
• It reinforces a core truth: search visibility is AI visibility.

We're helping clients identify these patterns and prepare content strategies accordingly.

📊 Curious if these ghosts are haunting your data?
• Open the GSC Performance report
• Add a filter by query and select "Matching regex"
• Paste this: ^\S+(?:\s+\S+){8,}

This will surface queries longer than 8 words. Notice any strange patterns? That might be your AI ghost.

#SEO #AIsearch #GSC #GoogleSearchConsole #GenerativeSearch #AIMode #CutInside #SearchVisibility #BrandedSearch #DigitalStrategy #AIOverviews #SearchData #ContentStrategy #eCommerceSEO #PublisherSEO #TechnicalSEO
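If you want to sanity-check the regex before pasting it into GSC, here is a quick local test; the sample queries are made up for illustration:

```python
# Quick check that the regex above matches queries of nine or more
# whitespace-separated words (i.e., "longer than 8 words").
import re

pattern = re.compile(r"^\S+(?:\s+\S+){8,}")

queries = [
    "running shoes",                                             # 2 words: no match
    "acme trailblazer 5 running shoe product info and reviews",  # 9 words: match
]

for q in queries:
    label = "possible AI ghost" if pattern.match(q) else "normal"
    print(f"{q!r}: {label}")
```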
-
Google quietly removed the &num=100 parameter, and it's already shaking up the data that many SEO and AI tools rely on.

▶ Here's what's happening:
• The &num=100 parameter let tools fetch 100 results per page from Google. That's gone.
• Since its removal, many sites report lower impressions and keyword visibility in Search Console.
• The old data might've been inflated by scrapers, i.e., the new numbers could actually reflect visibility more accurately.

▶ Why this matters for AI:
• This isn't just an SEO story; it affects AI systems trained on web or search data.
• Data shift: models trained on old visibility distributions might now misinterpret trends.
• Reduced signal: long-tail queries could vanish, shrinking the training corpus.
• Retrieval impact: search-based AIs (e.g., RAG systems) now get smaller result pools, with less breadth to work with.
• Benchmark drift: old evaluation metrics based on Google's results may no longer line up.
• Cleaner data ahead: the upside is fewer scrapers, less noise, and potentially more reliable data for retraining.

▶ How to cope and adapt:
• Recalibrate your models: retrain or fine-tune on the new visibility distributions to remove bias from inflated impression data.
• Diversify data sources: don't rely solely on Google SERP data; integrate Bing, DuckDuckGo, Reddit, or site analytics (see the sketch below).
• Dynamic retraining pipelines: set up periodic validation checks so your AI can adapt automatically to search ecosystem changes.
• Re-evaluate benchmarks: update internal KPIs and model evaluation sets to reflect the new baseline.
• Embrace precision over volume: focus retrieval and SEO AI systems on the quality of results rather than depth.

💡 Takeaway
• Google's removal of &num=100 reminds us that even a single parameter change can ripple across SEO dashboards and AI models alike.
• Those who adapt their data pipelines fastest will turn this disruption into an advantage.

#AI #SEO #GoogleSearch #DataShift #SearchEngineLand #DataAdaptation
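To illustrate the "diversify data sources" point, here is a minimal sketch that fans a query out to several providers and merges the results by URL; the fetch functions are hypothetical stand-ins, not real client libraries:

```python
# A sketch of diversified retrieval: query several sources, dedupe by URL.
# The provider functions are hypothetical stand-ins returning canned data.
from typing import Callable

def fetch_bing(query: str) -> list[dict]:        # hypothetical client
    return [{"url": "https://example.com/a", "source": "bing"}]

def fetch_duckduckgo(query: str) -> list[dict]:  # hypothetical client
    return [{"url": "https://example.com/a", "source": "ddg"},
            {"url": "https://example.com/b", "source": "ddg"}]

PROVIDERS: list[Callable[[str], list[dict]]] = [fetch_bing, fetch_duckduckgo]

def diversified_search(query: str) -> list[dict]:
    """Merge results across providers so no single index is a bottleneck."""
    seen: set[str] = set()
    merged: list[dict] = []
    for provider in PROVIDERS:
        for result in provider(query):
            if result["url"] not in seen:
                seen.add(result["url"])
                merged.append(result)
    return merged

print(diversified_search("long-tail keyword example"))
```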
-
Search Everywhere Optimization people - important info coming out of our industry today. A TollBit data study finds: 1) AI makes up only 0.102% of traffic to sites on average; 2) visits from traditional search fell an average of 9% YoY while AI hits increased in Q2; 3) Google drives 830x more visits than AI search systems; 4) AI systems send almost no human referrals to publishers despite heavy scraping. TollBit, a platform designed to help websites such as publishers and content owners monitor, manage, and monetize their content when it is accessed by AI bots, data scrapers, and other autonomous systems, just dropped this report. In a time when we're thirsty for data, TollBit is providing insights we need as talking points for our customers and stakeholders. Here's where to find the full report and the article on SEL: https://lnkd.in/e_WgnZAg https://lnkd.in/eQPATY_k

Frankly put: the open web's business model, especially for publishers running on ad revenue, is even more at risk as AI consumes and regurgitates their content without compensating creators.
-
Google Proposes TUMIX: Multi-Agent Test-Time Scaling With Tool-Use Mixture

What if, instead of re-sampling one agent, you could push Gemini-2.5 Pro to 34.1% on HLE by mixing 12–15 tool-using agents that share notes and stop early?

Google Cloud AI Research, with collaborators from MIT, Harvard, and Google DeepMind, introduced TUMIX (Tool-Use Mixture), a test-time framework that ensembles heterogeneous agent styles (text-only, code, search, guided variants) and lets them share intermediate answers over a few refinement rounds, then stop early via an LLM-based judge. The result: higher accuracy at lower cost on hard reasoning benchmarks such as HLE, GPQA-Diamond, and AIME (2024/2025). https://ift.tt/vuRE2rg

So, what exactly is new?
• Mixture over modality, not just more samples: TUMIX runs ~15 agent styles spanning Chain-of-Thought (CoT), code execution, web search, dual-tool agents, and guided variants. Each round, every agent sees (a) the original question and (b) other agents' previous answers, then proposes a refined answer. This message-passing raises average accuracy early while diversity gradually collapses, so stopping matters.
• Adaptive early termination: an LLM-as-judge halts refinement once answers exhibit strong consensus (with a minimum round threshold). This preserves accuracy at ~49% of the inference cost vs. fixed-round refinement; token cost drops to ~46% because late rounds are token-heavier.
• Auto-designed agents: beyond human-crafted agents, TUMIX prompts the base LLM to generate new agent types; mixing these with the manual set yields an additional ~+1.2% average lift without extra cost. The empirical "sweet spot" is ~12–15 agent styles.

How does it work? TUMIX runs a group of heterogeneous agents (text-only Chain-of-Thought, code-executing, web-searching, and guided variants) in parallel, then iterates a small number of refinement rounds where each agent conditions on the original question plus the other agents' prior rationales and answers (structured note-sharing). After each round, an LLM-based judge evaluates consensus/consistency to decide early termination; if confidence is insufficient, another round is triggered, otherwise the system finalizes via simple aggregation (e.g., majority vote or a selector). This mixture-of-tool-use design trades brute-force re-sampling for diverse reasoning paths, improving coverage of correct candidates while controlling token/tool budgets. Empirically, benefits saturate around 12–15 agent styles, and stopping early preserves diversity and lowers cost without sacrificing accuracy. A minimal sketch of this loop appears at the end of this post.

Let's discuss the results. Under comparable inference budgets to strong tool-augmented baselines (Self-MoA, Symbolic-MoE, DEI, SciMaster, GSA), TUMIX yields the best average accuracy; a scaled variant (TUMIX+) pushes further with more compute. HLE (Humanity's Last Exam): Pro: 21.6% → 34.1% (TUMIX+); Flash: 9.7% → 23.1%. (HLE is a 2,500-question,...
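Here is a minimal sketch of the refinement loop described above; the agents and judge are hypothetical callables standing in for tool-using LLM agents and an LLM-based consensus judge:

```python
# A sketch of TUMIX-style refinement: heterogeneous agents answer in
# parallel, share notes for a few rounds, stop early on consensus, then
# aggregate by majority vote. Agents and judge are hypothetical callables.
from collections import Counter
from typing import Callable

AgentFn = Callable[[str, list[str]], str]  # (question, peers' answers) -> answer
JudgeFn = Callable[[list[str]], bool]      # answers -> consensus reached?

def tumix_answer(question: str, agents: list[AgentFn], judge: JudgeFn,
                 max_rounds: int = 5, min_rounds: int = 2) -> str:
    # Round 1: each agent answers independently.
    answers = [agent(question, []) for agent in agents]
    for round_idx in range(1, max_rounds):
        # Early termination: stop once the judge sees strong consensus,
        # but only after a minimum number of rounds.
        if round_idx >= min_rounds and judge(answers):
            break
        # Note sharing: each agent refines given everyone's prior answers.
        answers = [agent(question, answers) for agent in agents]
    # Simple aggregation: majority vote over the final answers.
    return Counter(answers).most_common(1)[0][0]
```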
-
That's a wrap on Brighton SEO San Diego 2025! Shared some wild data from our analysis of 3B+ AI citations and 250M+ answer engine responses: Reddit citations have been jumping 800%+ (now 8% of all citations). ChatGPT quietly switched from Bing to a hybrid index using SerpApi. And traditional SEO signals like traffic explain less than 5% of AI citation behavior.

The most surprising finding: 37% of AI search prompts have "generative intent" - users asking AI to create something, not find something. That's an entirely new search behavior we need to optimize for.

Practical takeaways from the data:
- Semantic URLs get 11.4% more citations
- Positions 5-10 on Google are viable for AI visibility
- Listicles and comparative content dominate (25%+ of all citations)
- Meta descriptions that "spoil the content" perform better

Slides dropping in the comments for those who want to dig into the data. Incredible energy in San Diego as always. For those asking about tracking your own AI visibility or getting access to the citation data we showed: reach out and we'll get you sorted. Already counting down to SEOIRL in Toronto. See you there 👀

Get a demo of Profound here: https://tryprofound.com/
-
Old SEO workflows are getting torched. Here's what just changed.

Ahrefs dropped 15 real use cases for its new Model Context Protocol (MCP) server, letting you query live Ahrefs data inside ChatGPT, Claude, or Copilot. Think keyword, backlink, and competitor insights, all fetched in under two minutes without switching tools. Glen Allsopp (yes, the 450K-weekly-users guy) wrote the guide.

Why does this matter? Because it rewires how we do SEO ops. You can skip the endless CSVs and API headaches. Just ask your AI assistant to run 20-keyword analyses, 10-site backlink comparisons, or international-growth checks, and get structured reports on the fly.

BTW, I've been testing an internal tool that unifies data from Ahrefs, Semrush, and more. When you can "talk" to your SEO data, performance jumps fast. You move from dashboards to decisions in real time.

This is the real future of AIO and GAIO: human-led strategy, AI-powered execution... Excited!

#SEO #ai #mcp #SearchIntelligence #GrowthMarketing

Source: https://lnkd.in/enaVV9-S
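For developers curious what "querying live data over MCP" looks like under the hood, here is a minimal sketch using the official `mcp` Python client SDK; the server package name, environment variable, tool name, and arguments are hypothetical placeholders, not Ahrefs' actual MCP contract:

```python
# A sketch of connecting to an MCP server over stdio and calling a tool.
# The server command, env variable, and tool name are hypothetical.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server = StdioServerParameters(
    command="npx",
    args=["-y", "ahrefs-mcp-server"],    # hypothetical package name
    env={"AHREFS_API_KEY": "YOUR_KEY"},  # hypothetical credential variable
)

async def main() -> None:
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print([tool.name for tool in tools.tools])
            # Hypothetical tool name and arguments for a backlink lookup.
            result = await session.call_tool(
                "backlinks_overview", arguments={"target": "example.com"}
            )
            print(result.content)

asyncio.run(main())
```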
-
Google's num=100 Parameter Retirement - Small Change, Big Ripples

In September 2025, Google quietly retired the &num=100 search parameter, the long-used method to fetch 100 results per page. What seems like a minor tweak has rippled through SEO platforms, data aggregators, and AI search assistants that rely on large-scale web data collection.

Without bulk retrieval, platforms like Perplexity, You.com, Andi, and even ChatGPT (in web-browsing modes) now face slower lookups, higher crawl costs, and narrower visibility into long-tail content. Over time, this shift could subtly influence how AI tools learn from and summarize the web.

A small change in Google's search output limits is yet another reminder of how dependent the modern AI ecosystem is on upstream data pipelines.
-
Remember my post on grounding with maps here: https://lnkd.in/ddNzCtbM

Well, Google ADK 1.15.0 has made implementing this easier by adding the Google Maps grounding tool as a built-in tool in the ADK tools library: https://lnkd.in/dP8dkXvq

🚀 Key Features in ADK 1.15:

🗺️ Google Maps Grounding Tool (Built-in!)
- Hyper-local AI responses with real-time business data
- 250+ million places accessible through natural conversation
- Seamless integration with existing ADK agents

📊 OpenTelemetry Support
- `--otel_to_cloud` experimental support for comprehensive monitoring
- GenAI instrumentation built into the framework
- End-to-end tracing of AI workflows

🧠 Context Caching & Static Instructions
- Context caching for faster response times
- Static instructions that don't change (perfect for system prompts)
- Auto-creation and lifecycle management of context caches

What I Know So Far Before Building With It: based on the source code, the GoogleMapsGroundingTool:
- Only works with Gemini 2.x models (not Gemini 1.x)
- Requires Vertex AI (GOOGLE_GENAI_USE_VERTEXAI=TRUE)
- Operates internally within the model - no local code execution
- Is automatically invoked by Gemini 2 models for grounding queries
- Is a built-in tool that doesn't require manual configuration

A minimal wiring sketch is below. Which one are you most excited to try?

#AI #GoogleADK #MachineLearning #OpenTelemetry #GoogleMaps #VertexAI #SoftwareDevelopment
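Based on the constraints in the post, here is a minimal sketch of wiring the tool into an ADK agent; the tool's import path and name are assumptions, so check the ADK tools library for the actual identifier:

```python
# A sketch of an ADK agent using the Maps grounding tool, per the
# constraints above. The tool import path/name is an assumption.
import os

os.environ["GOOGLE_GENAI_USE_VERTEXAI"] = "TRUE"  # Vertex AI is required

from google.adk.agents import Agent
from google.adk.tools import google_maps_grounding  # hypothetical import

local_guide = Agent(
    name="local_guide",
    model="gemini-2.5-flash",  # must be a Gemini 2.x model
    instruction="Answer questions about nearby places using grounded map data.",
    # The tool runs inside the model: no local execution, no extra config.
    tools=[google_maps_grounding],
)
```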