Agentic solutions that are thin wrappers around a language model offer no value add. An agentic architecture has to be designed and engineered. Devising one requires both an in-depth understanding of the business goals and a deep understanding of AI tools and the landscape, which makes it equal parts engineering and art.

1/ A good agentic design should be model agnostic and use-case specific. Knowing when to leverage Small Language Models to orchestrate workflows is a breakthrough for managing cost and latency. The NVIDIA paper "Small Language Models are the Future of Agentic AI" makes this case well - https://lnkd.in/ghVhu9ur

2/ If you are thinking about getting knowledge into models, think Context Engineering, not just RAG. Actively and continuously structuring the inputs to an LLM’s context window is key to unlocking accuracy and minimizing hallucinations. This post by SRK summarises Context Engineering - https://lnkd.in/g4ywnGdn

3/ Understanding how LLMs function means making sense of the model’s semantic layer (call it vibing!?). Being able to map business, technical, and contextual data into a semantic structure the LLM can work with is a very practical, lived skill. Read this post by Zaher Alhaj - https://lnkd.in/gkQhAn6A

4/ Planning in the agentic layers requires inference-time reasoning at each step. Choosing the right inference-time reasoning techniques so that cost per task, latency, steps per task, and context adherence are all optimized is a skill. All the planning techniques in one place - https://lnkd.in/g4ZgAakD

5/ Agentic applications only go from POC to production through Evals. Evals give you a structured Test, Verify, Iterate loop for building agentic applications. Evals are turning out to be the most fundamental value add for production-ready systems. Hamel H. has been harping on this for more than a year - see his Evals FAQ here - https://lnkd.in/gPjMvNTt

Beyond the hype that AI solutions can do magic and are transforming at breakneck speed: for Agentic AI to become a ‘normal technology’ is a marathon, steady at its own pace. Maturity in technology is the virtuous cycle Innovation -> Adoption -> Diffusion. https://lnkd.in/giaRmtPQ

Lastly, an area that I personally believe will redefine agent development is Post-Training of Language Models [PoLM]. There is a lot to learn and experiment with here. An encyclopaedia of it - https://lnkd.in/gk_TZ3B4

Keep it going, AI PMs, AI Architects and AI Engineers!
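The Test, Verify, Iterate loop can be sketched as a minimal eval harness. Everything here is illustrative: the eval set, the `run_agent` stub, and the lambda checks are hypothetical stand-ins for a real agent and real graders.

```python
# Minimal eval-harness sketch for an agentic app: run each task through
# the agent, verify the output against a check, and report a pass rate.
# The tasks, run_agent stub, and checks are illustrative placeholders.

def run_agent(prompt: str) -> str:
    # Stand-in for the real agent call; replace with your own.
    return {"What is 2 + 2?": "4", "Capital of France?": "Paris"}.get(prompt, "")

EVAL_SET = [
    {"prompt": "What is 2 + 2?", "check": lambda out: "4" in out},
    {"prompt": "Capital of France?", "check": lambda out: "paris" in out.lower()},
]

def run_evals(eval_set) -> float:
    # Test: run the agent. Verify: apply each check. Iterate: rerun after changes.
    passed = sum(1 for case in eval_set if case["check"](run_agent(case["prompt"])))
    return passed / len(eval_set)

if __name__ == "__main__":
    print(f"pass rate: {run_evals(EVAL_SET):.0%}")
```

The point is not the toy checks but the shape: a fixed eval set and a single pass-rate number you can watch as you change prompts, models, or tools.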
Building Agentic Solutions with AI: A Marathon, Not a Sprint
More Relevant Posts
Here’s the latest AI news for today, 9/8/2025.

1. Cognition Funding and Team Moves: Cognition raised $400M at a $10.2B post-money valuation as part of its $10B round. swyx announced he’s joining Cognition while other projects remain independent.
2. Agent Development and Coding Tools: Vercel launched an open-source vibe coding platform with a tuned GPT‑5 loop that built a multiplayer Pong game in one shot. Claude Code demonstrates that a minimal master loop with async buffering can deliver robust debuggability.
3. Model and Inference Advances: Kimi K2 0905 doubled its context window to 256k tokens while boosting coding benchmarks. Alibaba’s Qwen3‑ASR now handles multilingual transcription with less than 8% error.
4. Faster Decoding and Optimizations: Meta’s Set Block Decoding speeds up generation by 3–5× without architectural changes. New KV-cache and quantization innovations are reducing compute costs and latency.
5. Multimodal Generation and “Vibe Coding”: Google’s Veo 3 is now GA with ~50% price cuts, 1080p output, and vertical video support on the Gemini API. Open-sourced “Nano Banana” projects are catalyzing playful, one‑click workflows in Google AI Studio.
6. Open‑Source LLM Developments: The UAE team teased a “K2 Think” reasoning model, though key specs are still missing. Tilde AI released TildeOpen‑30B, a multilingual open‑source model aimed at underrepresented European languages.
7. Local Hardware and Offline LLMs: A dual RTX 6000 workstation build shows that power limiting can keep performance losses minimal. Users discuss offline setups with models like GLM 4.5 and Qwen3 30B to balance efficiency and reliability.
8. ChatGPT Regression and Guardrails Debate: Users report that new ChatGPT versions are regressing in basic instruction-following and context retention. Critics say investor-driven guardrails are replacing open-ended conversation with more structured, code-like outputs.
9. Discord Community and Hardware Updates: Discord highlights include creative advances with Grok 2.5 for writing and Qwen3-Coder’s policy-light roleplay performance. Meanwhile, discussions on GPU hardware note the shock of NVIDIA’s RTX 5090 pricing and rapid optimization on AMD’s MI300x8.
10. Legal Settlements and AI’s Societal Impact: Anthropic’s $1.5B settlement with authors is stirring fears of stricter US AI regulation. Meanwhile, Geoffrey Hinton warns that AI’s economic benefits may concentrate wealth, widening inequality.

Stay tuned for more technical insights and practical tips as the AI landscape evolves.
Small Language Models (SLM) and data curation : there is a trend in the wild AI space that I finally resonate with. The guys at distil labs, the paper from NVIDIA (link in comments), the focus on data curation from DatologyAI, and more are showing promising directions. The last 2 years have been focused solely on compute and model architecture. Even though they are important components of AI project success, this trend made people working on data curation and/or using SLMs feel like second-class engineers. There are many variables explaining why this happened: frontier labs pushing their agenda, text data being easily accessible online, and the allure of scaling laws. But let's focus on the positive trend that's emerging now: laser-focused use cases using SLMs and data curation to solve real-world problems. At Aiba - Safe Digital Lives we have focused on these topics from day one by design. Cherry on the cake: you can deploy very useful models in production at a fraction of the cost. The future of AI isn't just about bigger models—it's about smarter, more targeted solutions that actually deliver value where it matters most.
🚀 Small Language Models (SLMs) vs Large Language Models (LLMs): Where is the Future?

As leaders, we all want to harness AI without breaking the bank — or the planet. NVIDIA’s new paper — Small Language Models are the Future of Agentic AI — makes a strong case that SLMs are not just viable, but essential for the next generation of business AI agents.

🔍 What is an SLM?
Small Language Models (SLMs) are compact, efficient models (typically <10B parameters) that run with low latency on everyday consumer devices. Think of them as agile specialists in your AI toolkit: faster, cheaper, and easier to deploy for specific business tasks.

💡 What NVIDIA Found
• SLMs aren't just viable – they're superior for most agentic subtasks
• 60-70% of current LLM queries in popular AI agents could be reliably handled by specialized SLMs. That's massive cost savings
• NVIDIA even provides a playbook for migrating workloads from LLMs to SLMs

Why SLMs Beat LLMs for Business:
💰 10–30x More Cost-Effective: lower inference costs, smaller infrastructure footprint, faster ROI
⚡ Performance Where It Counts: real-time response, edge deployment, specialized task accuracy
🎯 Perfect for Agentic Workflows: most AI agents don’t need open-ended dialogue — they need to call APIs & tools, generate code for specific frameworks, process structured data, and automate workflows
🔄 Operational Flexibility: fine-tune in hours, not weeks; deploy multiple specialized models; easier compliance & data control

✅ The Business Reality Check: The winning strategy? Hybrid systems — use SLMs by default, invoke LLMs when needed.

Link to the full paper is in the comments. #AIforBusiness #SLMs #LLMs #AgenticAI #NVIDIA #FutureOfWork
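The hybrid "SLM by default, LLM when needed" strategy boils down to a router. A minimal sketch, assuming hypothetical `call_slm`/`call_llm` stubs and a made-up confidence heuristic in place of real model calls:

```python
# SLM-first routing sketch: try a cheap small model first and escalate
# to a large model only when the small model is not confident.
# The model stubs and the confidence threshold are illustrative placeholders.

def call_slm(task: str) -> tuple[str, float]:
    # Stand-in for a small-model call returning (answer, confidence).
    if task.startswith("extract:"):
        return task.removeprefix("extract:").strip().upper(), 0.95
    return "", 0.2  # low confidence on anything unfamiliar

def call_llm(task: str) -> str:
    # Stand-in for an expensive large-model call.
    return f"LLM answer for: {task}"

def route(task: str, threshold: float = 0.8) -> str:
    answer, confidence = call_slm(task)
    if confidence >= threshold:
        return answer        # cheap path: the SLM handled it
    return call_llm(task)    # escalation path: ambiguity or novelty

print(route("extract: invoice id"))     # handled by the SLM
print(route("draft a novel strategy"))  # escalated to the LLM
```

In production the confidence signal would come from logprobs, a verifier model, or schema validation rather than a prefix match, but the routing shape is the same.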
NVIDIA's research highlights the balance between Large Language Models (LLMs) and Small Language Models (SLMs) as key to the next generation of business AI agents. The recommendations focus on lowering costs and improving efficiency in AI deployment:
- Use SLMs for cost-effective, real-time, or on-device applications.
- Build modular agentic systems with SLMs for routine tasks and LLMs for complex reasoning.
- Prioritize SLMs for rapid task specialization and faster adaptation.
This NVIDIA Research position paper makes a strong, system-level case that 𝐒𝐦𝐚𝐥𝐥 𝐋𝐚𝐧𝐠𝐮𝐚𝐠𝐞 𝐌𝐨𝐝𝐞𝐥𝐬 (𝐒𝐋𝐌𝐬) (<~7B params; single-user, low-latency, consumer-device deployable) should become the default substrate for agentic AI. 𝐊𝐞𝐲 𝐭𝐡𝐞𝐬𝐞𝐬: • 🧠 Modern SLMs (SmolLM2 125M to 1.7B, Hymba-1.5B, Phi-3 7B, Nemotron-H 2 to 9B) now match or exceed prior-gen 30–70B models on core agent primitives: tool invocation, structured output, code gen, grounded reasoning. • 🚀 10–30× lower inference FLOPs / energy / latency; cheap PEFT (LoRA, DoRA, QLoRA) + overnight domain adaptation; better routing granularity; simpler infra (less tensor/model parallel). • 🔁 Agents mostly perform narrow, repeatable, tool-mediated transformations—format determinism > open-ended dialogue. • 🌱 Reduced cost, environmental footprint, and faster iteration loops = higher experiment velocity. 🥇 “𝘚𝘓𝘔-𝘧𝘪𝘳𝘴𝘵, 𝘦𝘴𝘤𝘢𝘭𝘢𝘵𝘦 𝘵𝘰 𝘓𝘓𝘔 𝘰𝘯𝘭𝘺 𝘸𝘩𝘦𝘯 𝘢𝘮𝘣𝘪𝘨𝘶𝘪𝘵𝘺/𝘯𝘰𝘷𝘦𝘭 𝘢𝘣𝘴𝘵𝘳𝘢𝘤𝘵𝘪𝘰𝘯 𝘢𝘳𝘪𝘴𝘦𝘴.” Worth a read if you’re rethinking agent cost curves and routing topologies. 👉 https://lnkd.in/djV5qjTs
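The PEFT economics mentioned above fall out of LoRA's parameter math: a rank-r adapter on a d_in × d_out weight trains r·(d_in + d_out) parameters instead of d_in·d_out. The layer dimensions below are illustrative, not taken from any specific model:

```python
# LoRA parameter-count sketch: the low-rank update W + A @ B trains
# rank * (d_in + d_out) parameters instead of the full d_in * d_out.
# Dimensions are illustrative for a small-model projection layer.

def full_params(d_in: int, d_out: int) -> int:
    return d_in * d_out

def lora_params(d_in: int, d_out: int, rank: int) -> int:
    # A is (d_in x rank), B is (rank x d_out).
    return rank * (d_in + d_out)

d_in, d_out, rank = 2048, 2048, 8
full = full_params(d_in, d_out)        # 4,194,304
lora = lora_params(d_in, d_out, rank)  # 32,768
print(f"trainable fraction: {lora / full:.2%}")  # 0.78%
```

That sub-1% trainable fraction per layer is what makes the "overnight domain adaptation" in the paper plausible on modest hardware.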
I am not sure this was necessarily a secret; with the release of DeepSeek, it was clear that size is not necessary to achieve the goal of a useful language model. More importantly, we should take this as a wake-up call and consider the alternative to cloud-based, ever-bigger artificial general intelligence models. Focused models at the edge enable a significant reduction in power, allow for more security, and bring application-specific intelligence. The world cannot support the ever-growing demand for power from ever-larger models.
Just In: NVIDIA may have exposed the biggest secret in AI 😳

A new paper shows Small Language Models often outperform massive LLMs like GPT-4 or Claude in real-world use. The reason is simple: most agent tasks (summarizing docs, extracting info, writing templates, calling APIs) are predictable. For these, SLMs are cheaper and better.

▪️ Smaller ≠ weaker. Toolformer (6.7B) beats GPT-3 (175B). DeepSeek-R1-Distill (7B) outperforms Claude 3.5 and GPT-4o on reasoning.
▪️ 10–30x cheaper, faster, and deployable locally.
▪️ Easy to fine-tune with LoRA or QLoRA.
▪️ Perfect fit for structured outputs like JSON or Python.

The smarter architecture is clear: default to SLMs and call an LLM only when absolutely necessary. The future of AI agents looks like modular systems built on SLMs.

Do you think startups will move fast to SLMs, or stay locked into LLM-heavy stacks?

AI Resources you might find helpful:
• OpenAI’s New Playbook for Product-Market Fit in AI Startups - https://lnkd.in/dy-HgVtX
• 2,500+ Angel Investors Backing AI Startups - https://lnkd.in/djjN-TD2
• 50 AI Agent Startup Ideas for 2025 - https://lnkd.in/dy9E-MCn
• AI Co-Pilots for Startups & VCs - https://lnkd.in/dSX3ss2q
• The Agentic Revolution - https://lnkd.in/dadFm25a
• 118 AI Communication Agent Use Cases - https://lnkd.in/dr9cM5BT
Big isn’t always better: NVIDIA highlights how Small Language Models can outperform LLMs in agent tasks — efficient, affordable, and easier to deploy.
This is a fascinating insight: Small Language Models (SLMs) often beat massive LLMs in real-world tasks. Why? Because most AI agent work — summarizing, extracting, controlling APIs — is predictable. For this, smaller models are: ⚡ Faster 💰 Cheaper 📱 Deployable locally ➡️ And that’s exactly where Edge AI comes in. At Infineon Technologies , we see the same pattern every day: • Edge devices (cars, factories, sensors) need efficient, secure AI that runs on-site. • Our PSoC™ Edge and AURIX™ MCUs with ML accelerators are designed for these “small but smart” models. • Combined with secure elements and power-efficient semiconductors, they make AI trustworthy, frugal, and sovereign at the edge. This shift — from giant cloud models to distributed, efficient edge intelligence — is also at the heart of IPCEI AST. Europe can lead by building the ecosystem where small models + smart hardware create the biggest impact. SLMs are the software. Edge AI is the hardware. Together, they are the future. #IPCEI #EdgeAI #Infineon #DigitalSovereignty #AI #SLM #EdgeComputing #Europe2030 #IPCEI
Thrilled to share the architecture of an intelligent, autonomous agent I've developed! 🚀

This goes beyond a standard chatbot; it's a dynamic agent operating on a reasoning and acting (ReAct) framework. It leverages a suite of tools for grounded generation, ensuring responses are accurate, context-aware, and factual. Here's a high-level look at the agentic workflow:

📥 Input: The process is triggered by an initial user prompt.
🧠 The Agent as Orchestrator: The central AI Agent initiates a reasoning loop. It performs intent recognition and task decomposition, then constructs an execution plan to determine the optimal sequence of tool usage or direct generation.
🛠️ Tool-Augmented Generation: The agent intelligently orchestrates its capabilities:
- Foundation Model (OpenAI): Serves as the core cognitive engine for advanced language comprehension, logical inference, and natural language generation.
- Conversational Memory: A short-term memory buffer enables stateful conversation, preserving context for coherent, multi-turn dialogues.
- 🌐 SerpAPI for Knowledge Retrieval: To combat model hallucination and overcome static knowledge limitations, this tool grounds the agent by fetching real-time data from the web.
- 🔢 Calculator for Symbolic Reasoning: The agent offloads deterministic mathematical tasks to this tool, ensuring computational accuracy where LLMs can be unreliable.

This composable architecture creates a powerful assistant capable of executing complex tasks that require planning, external information retrieval, and precise calculation. It's a significant step towards building more robust, reliable, and truly functional AI systems.

#AI #ArtificialIntelligence #LLM #AgenticAI #GenerativeAI #Grounding #Reasoning #OpenAI #SerpAPI #SoftwareDevelopment #AIAgents
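The ReAct loop described here can be sketched in a few lines. Everything below is a stand-in: the rule-based `decide` step plays the role of the LLM planner, and the tool registry replaces the real SerpAPI and calculator integrations.

```python
# Minimal ReAct-style loop sketch: the agent alternates between deciding
# on an action and observing the tool's result until it can answer.
# The decide() heuristic stands in for an LLM planner; tools are stubs.

TOOLS = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),  # deterministic math
    "search": lambda q: f"(stub) top result for '{q}'",                # web retrieval stub
}

def decide(question: str, observations: list[str]) -> tuple[str, str]:
    # Stand-in planner: route math to the calculator, then finish with
    # the latest observation once one is available.
    if not observations:
        if any(ch.isdigit() for ch in question):
            return ("calculator", question.split("compute ")[-1])
        return ("search", question)
    return ("finish", observations[-1])

def react_agent(question: str, max_steps: int = 5) -> str:
    observations: list[str] = []          # the short-term memory buffer
    for _ in range(max_steps):
        action, arg = decide(question, observations)
        if action == "finish":
            return arg
        observations.append(TOOLS[action](arg))  # act, then observe
    return "gave up"

print(react_agent("compute 6*7"))  # 42
```

Swapping `decide` for a model call and the stubs for real tool clients gives the architecture in the post; the loop shape stays the same.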
🚨 Rethinking AI Agents: The Future is Hybrid, and It's Powered by SLMs!

While LLMs get the spotlight, new NVIDIA research argues the future of AI agents is the Small Language Model (SLM). The reason is simple: most agentic tasks are repetitive, narrow, and structured. They don’t always require the heavy compute and cost of LLMs.

Why SLMs are useful for agentic AI:
⚡️ Powerful Enough: Modern SLMs (e.g., NVIDIA Nemotron-H, Hugging Face SmolLM2) match larger models on critical agent tasks like reasoning and tool calling.
💰 More Economical & Faster: With lower latency and 10-30x lower running costs, they enable real-time responses with less cloud expense. Fine-tuning for specialized workflows is also dramatically faster.
⚙️ Operationally Superior: Ideal for modular design, SLMs are perfect for edge and on-device AI, and offer better privacy and user control. They ensure easily formatted outputs for reliable tool use.

📌 Suggested Approach: The path forward isn't about replacing LLMs, but building smarter.
- Build hybrid, heterogeneous architectures: modular agents with a mix of models.
- Fine-tune SLMs when possible for specific, high-volume skills.
- Gradually migrate sub-agents and repetitive tasks from LLMs to SLMs.

📈 The Future is Hybrid: This approach is more than a technical fix. It’s a move toward responsible and sustainable AI that unlocks massive cost savings, broader accessibility, and more scalable systems.

📖 Paper: "SLMs for Agentic AI" - https://lnkd.in/gfMq3EEK
Building NetoAI | Strategy, Product, Ops Leader | IIM-L | NIT-T
1mo
Sivas Subramaniyan Sharp post - appreciate the balance of engineering rigor and product realism. Model-agnostic design, context engineering (not just RAG), and disciplined evals are exactly how agentic apps move from POC to production. Exactly what PMs, architects, and engineers need to explore real AI solutions - highly recommended.