This Week at Nexa 🚀: new model support, new platforms, and new ways for builders to get involved.
1) NexaML supports the latest models on the Qualcomm NPU across platforms: for example, Liquid AI’s LFM2-1.2B now runs fully on the Qualcomm Hexagon NPU across IoT (Dragonwing IQ-9075), Automotive (SA8295), Mobile (Samsung S25), and Compute (Snapdragon X Elite) devices.
2) IBM Granite 4.0 Nano joins the lineup on NPU: powered by NexaSDK, Granite 4.0 Nano runs locally on Snapdragon X Elite at 60 tokens/sec at full precision. We built an AI agent demo showing it fetching live info and organizing files through pure on-device function calls.
3) Builder Bounty Program is live: developers can now earn up to $1,500, get the “Nexa Builder” Discord badge, and be featured in our SDK repo and launch docs, just by shipping an on-device AI project. Learn more: https://lnkd.in/gHRzCyMX
4) Nexa Wishlist launches: vote for the next models we bring on-device, from GGUF to MLX to NexaML (Qualcomm and Apple NPU). Vote today: sdk.nexa.ai/wishlist
Nexa AI
Software Development
Cupertino, California 5,611 followers
On Device AI Deployment and Research | NexaSDK: github.com/NexaAI/nexa-sdk | Hyperlink App: https://hyperlink.nexa.ai/
About us
Nexa AI is an on-device AI deployment and research company. We craft optimized foundation models and an on-device inference framework that runs any model on any device, across any backend, within minutes. Our mission is to make on-device AI friction-free and production-ready.
- Website: https://nexa.ai/
- Industry: Software Development
- Company size: 11-50 employees
- Headquarters: Cupertino, California
- Type: Privately Held
- Founded: 2023
Locations
Primary: Cupertino, California 95014, US
Updates
-
Our Hyperlink website just got a brand-new look. As local AI becomes the new interface for your computer, it deserves a design that feels just as elegant: calm, clear, and beautifully simple. It starts with the landing page, a quiet, enjoyable expression of what Hyperlink is. And that’s not all: the floating UI feature is now ready for public testing. It stays with you wherever you go. Go to Settings → Shortcut to turn it on. Start exploring the new Hyperlink experience and see how it fits your flow. Tell us what you think of the new Hyperlink look. 👇 Link below. Kudos to the product and design team at Nexa AI.
-
The best AI roadmap is shaped by real builders. For weeks, one request kept showing up in our inbox: “Can Nexa support Qwen3-8B on Qualcomm NPU?” So we did. Qwen3-8B now runs fully on-device on the Qualcomm Hexagon NPU through NexaML. And today, we’re making this loop official: introducing Nexa Wishlist, where developers can request and vote for the next models we bring on-device. Whether it’s GGUF or MLX for CPU/GPU, or the Nexa format for Qualcomm and Apple NPU, just:
1. Submit the Hugging Face repo ID
2. Select the backends you want supported
3. Watch as popular models get supported
The community leads. We build fast. Vote today at https://lnkd.in/griaEP5R and drop your requested model in the comments.
-
Fully local RAG on the Qualcomm Hexagon NPU, built with Python. Using the NexaSDK Python library (pip install nexaai), this demo indexes your local docs and answers questions entirely on-device. Demo below:
- First minute: the RAG demo in action on Snapdragon
- Next 40 seconds: quickstart in a Jupyter Notebook, setting up and running models on the Qualcomm Hexagon NPU with NexaML
If you know Python, you can build advanced on-device AI: private, fast, and hardware-accelerated. GitHub code in comments. Manoj Khilnani, Chun-Po Chang, Srinivasa Deevi, Madhura Chatterjee
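The post links the full demo on GitHub, but the retrieval loop it describes can be sketched in plain Python. This is not NexaSDK code: the bag-of-words `embed` below is a stand-in for the embedding model that NexaSDK would run on the NPU, kept dependency-free so the retrieval logic itself is visible.

```python
import math
import re
from collections import Counter

def embed(text):
    # Stand-in for an on-device embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=1):
    # Score every indexed doc against the query and return the top-k matches.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "NexaSDK runs models on the Qualcomm Hexagon NPU.",
    "The cafeteria menu changes every Tuesday.",
]
print(retrieve("Which NPU does NexaSDK target?", docs))
# → ['NexaSDK runs models on the Qualcomm Hexagon NPU.']
```

In the real demo, both the embedding step and the answer generation run on the Hexagon NPU; only the rank-and-select step stays this simple.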
-
💰 Nexa Builder Bounty Program is live. Learn, build, and get paid for shipping on-device AI. Build with NexaSDK, the unified on-device engine with NPU acceleration, full multimodal support (text / vision / audio), and cross-platform coverage.
- Earn up to $1,500
- Get the “Nexa Builder” Discord badge
- Be featured in our SDK repo & launch materials
Run models others can’t even touch locally (like Qwen3-VL). Fast to start, simple to build, and the best way to learn the modern on-device AI stack while earning for your work. Participation details here: https://lnkd.in/gHRzCyMX Got an idea? Drop it below or tag a dev who should join.
-
Nexa AI reposted this
Introducing Granite 4.0 Nano — compact, open-source models built for AI at the edge: https://ibm.co/6041Bzpsx Available in 350M and 1B for building AI on laptops and mobile devices. Now available on: ✅ Docker, Inc ✅ Hugging Face ✅ Nexa AI ✅ Ollama ✅ Qualcomm ✅ Unsloth AI
-
IBM Granite 4.0 Nano is out and runs Day-0 on the Qualcomm Hexagon NPU with NexaSDK, powered by our NexaML engine, currently the only framework that brings Granite models to full NPU execution, unlocking real on-device AI agents. We built a demo where Granite 4.0 Nano fetches live info and organizes files through function calls: fully local inference, blazingly fast at 60 tokens/sec on Snapdragon X Elite at full precision. AI agents need to run on-device for instant response, privacy, and always-on awareness, and NPUs are built exactly for that. Small models like Granite 4.0 Nano make this possible, and NexaML makes it practical, turning every phone, PC, car, and IoT device into an AI agent platform. Demo below. Star NexaSDK for more Day-0 support! Links in comments. Manoj Khilnani, Chun-Po Chang, Eda Kavlakoğlu, Saleem Hussain, Gabe Goodhart, Neel Kishan, Rodrigo A., Srinivasa Deevi, Devang Aggarwal, Madhura Chatterjee
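The demo's actual prompts and tools are in the linked video, not published as code, but the host-side dispatch step of on-device function calling can be sketched generically. The tool names below (`fetch_live_info`, `organize_files`) are hypothetical placeholders matching the demo's description, and the JSON call format is an assumption, not Granite's exact output schema:

```python
import json

# Hypothetical local tools; the demo's real implementations are not published.
def fetch_live_info(topic: str) -> str:
    return f"latest headlines about {topic}"

def organize_files(folder: str) -> str:
    return f"organized {folder} by file type"

TOOLS = {"fetch_live_info": fetch_live_info, "organize_files": organize_files}

def dispatch(model_output: str) -> str:
    # The model emits a JSON function call; the host app parses it and
    # executes the matching local tool. Nothing leaves the device.
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])

# A call string shaped like what a function-calling model might emit.
print(dispatch('{"name": "organize_files", "arguments": {"folder": "~/Downloads"}}'))
# → organized ~/Downloads by file type
```

The point of running this loop on the NPU is latency: at 60 tokens/sec, the model can emit a complete call like the one above in well under a second, keeping the agent responsive without a network round trip.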
-
NexaML supports the latest models across all Qualcomm platforms, from compute to mobile, automotive, and IoT, fully on NPU, with real-time speed and rapid turnaround. For example, Liquid AI’s LFM2-1.2B, a new hybrid model combining multiplicative-gate and convolution layers, now runs 100% on-device on Qualcomm Hexagon NPUs across all platforms:
- Dragonwing IQ-9075 (IoT): 45 tokens/sec
- SA8295 (Automotive): 37 tokens/sec
- Samsung S25 (Mobile): 89 tokens/sec
- Snapdragon X Elite (Compute): 52 tokens/sec
This marks the first time a state-of-the-art small language model runs across Qualcomm’s full ecosystem under one unified inference engine: NexaML. Check out the demo below or reach out to explore model integration for Qualcomm platforms. See blog in comments. Manoj Khilnani, Chun-Po Chang, Srinivasa Deevi, Devang Aggarwal, Madhura Chatterjee, Damanjit Singh
-
This week at Nexa 🚀: SOTA model support, Python library, community showcase
1) Liquid AI's LFM2-1.2B on the Qualcomm Hexagon NPU at 52 tok/s on Snapdragon X Elite, powered by NexaSDK.
2) Qwen's Qwen3-VL now runs locally on Qualcomm Oryon CPU, Adreno GPU, and Hexagon NPU with NexaSDK, powered by NexaML (first and only on Snapdragon).
3) Python bindings are live: pip install nexaai to run LLMs, VLMs, ASR, and embedding models on NPU, GPU, and CPU from Python.
4) NexaML Profiling Tool (preview): shows end-to-end latency and per-op breakdown to tune inference performance on NPU.
5) Community: Nexa AI was featured at the Qualcomm and AMD booths at the PyTorch conference this week.