Nexa AI

Software Development

Cupertino, California 5,611 followers

On-Device AI Deployment and Research | NexaSDK: github.com/NexaAI/nexa-sdk | Hyperlink App: https://hyperlink.nexa.ai/

About us

Nexa AI is an on-device AI deployment and research company. We craft optimized foundation models and an on-device inference framework that runs any model on any device, across any backend, within minutes. Our mission is to make on-device AI friction-free and production-ready.

Industry
Software Development
Company size
11-50 employees
Headquarters
Cupertino, California
Type
Privately Held
Founded
2023

Updates

  • This Week at Nexa 🚀: new model support, new platforms, and new ways for builders to get involved.

    1) NexaML supports the latest models on Qualcomm NPUs: for example, Liquid AI’s LFM2-1.2B now runs fully on the Qualcomm Hexagon NPU across IoT (Dragonwing IQ-9075), Automotive (SA8295), Mobile (Samsung S25), and Compute (Snapdragon X Elite) devices.

    2) IBM Granite 4.0 Nano joins the lineup on NPU: powered by NexaSDK, Granite 4.0 Nano runs locally on Snapdragon X Elite at 60 tokens/sec (full precision). We built an AI agent demo showing it fetching live info and organizing files through pure on-device function calls.

    3) Builder Bounty Program is live: developers can now earn up to $1,500, get the “Nexa Builder” Discord badge, and be featured in our SDK repo and launch docs, just by shipping an on-device AI project. Learn more: https://lnkd.in/gHRzCyMX

    4) Nexa Wishlist launches: vote for the next models we bring on-device, from GGUF to MLX to NexaML (Qualcomm and Apple NPU). Vote today: sdk.nexa.ai/wishlist

  • Our Hyperlink website just got a brand-new look. As local AI becomes the new interface for your computer, it deserves a design that feels just as elegant: calm, clear, and beautifully simple. It starts with the landing page, a quiet, enjoyable expression of what Hyperlink is.

    And that’s not all: the floating UI feature is now ready for public testing. It stays with you wherever you go. Go to Settings → Shortcut and turn it on.

    Start exploring the new Hyperlink experience and see how it fits your flow. Tell us what you think of the new Hyperlink look. 👇 Link below. Kudos to the product and design team at Nexa AI.

  • The best AI roadmap is shaped by real builders. For weeks, one request kept showing up in our inbox: “Can Nexa support Qwen3-8B on Qualcomm NPU?” So we did. Qwen3-8B now runs fully on-device on the Qualcomm Hexagon NPU through NexaML.

    And today, we’re making this loop official: introducing Nexa Wishlist, where developers can request and vote for the next models we bring on-device. Whether it’s GGUF or MLX for CPU/GPU, or the Nexa format for Qualcomm and Apple NPU, just:

    1. Submit the Hugging Face repo ID
    2. Select the backends you want supported
    3. Watch as popular models get supported

    The community leads. We build fast. Vote today on https://lnkd.in/griaEP5R and drop your requested model in the comments.

  • Fully local RAG on the Qualcomm Hexagon NPU, built with Python. Using the NexaSDK Python library (pip install nexaai), this demo indexes your local docs and answers questions entirely on-device.

    Demo below:
    - First minute: the RAG demo in action on Snapdragon
    - Next 40 seconds: quickstart in a Jupyter Notebook, easily setting up and running models on the Qualcomm Hexagon NPU with NexaML

    If you know Python, you can build advanced on-device AI: private, fast, and hardware-accelerated. GitHub code in comments. Manoj Khilnani, Chun-Po Chang, Srinivasa Deevi, Madhura Chatterjee
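For readers new to the pattern, the retrieve-then-answer loop a local RAG demo performs can be sketched in plain Python. This is a toy illustration only: the `embed`, `cosine`, and `retrieve` helpers below are stand-ins written for this sketch (a bag-of-words scorer replaces the real on-device embedding model), and none of it reflects the actual nexaai API.

```python
# Minimal local-RAG sketch: index documents, retrieve the best match for a
# query, then hand that context to a local LLM (omitted here). A bag-of-words
# scorer stands in for the NPU-accelerated embedding model so the control
# flow is runnable anywhere with just the standard library.
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "NexaML runs models on the Qualcomm Hexagon NPU.",
    "Hyperlink is a local AI app for your computer.",
]
# The retrieved context would then be prepended to the prompt for a local LLM.
context = retrieve("Which NPU does NexaML target?", docs)
print(context[0])
```

In the real demo, the embedding and generation steps both run through nexaai on the Hexagon NPU; only the orchestration shape is shown here.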

  • 💰 Nexa Builder Bounty Program is live. Learn, build, and get paid for shipping on-device AI. Build with NexaSDK, the unified on-device engine with NPU acceleration, full multimodal support (text / vision / audio), and cross-platform support.

    - Earn up to $1,500
    - Get the “Nexa Builder” Discord badge
    - Be featured in our SDK repo & launch materials

    Run models others can’t even touch locally (like Qwen3-VL). Fast to start, simple to build, and the best way to learn the modern on-device AI stack while earning for your work. Participation details here: https://lnkd.in/gHRzCyMX Got an idea? Drop it below or tag a dev who should join.

  • IBM Granite 4.0 Nano is out and runs Day-0 on the Qualcomm Hexagon NPU with NexaSDK, powered by our NexaML engine, currently the only framework that brings Granite models to full NPU execution, unlocking real on-device AI agents.

    We built a demo where Granite 4.0 Nano fetches live info and organizes files through function calls: fully local inference, blazingly fast at 60 tokens/sec on Snapdragon X Elite at full precision. AI agents need to run on-device for instant response, privacy, and always-on awareness, and NPUs are built exactly for that. Small models like Granite 4.0 Nano make this possible, and NexaML makes it practical, turning every phone, PC, car, and IoT device into an AI agent platform.

    Demo below. Star NexaSDK for more Day-0 support! Links in comments. Manoj Khilnani, Chun-Po Chang, Eda Kavlakoğlu, Saleem Hussain, Gabe Goodhart, Neel Kishan, Rodrigo A., Srinivasa Deevi, Devang Aggarwal, Madhura Chatterjee
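The function-calling loop behind an agent demo like this follows a common shape: the model emits a structured tool call, the host executes it locally, and the result is fed back. A hedged sketch of that dispatch loop, with hypothetical tool names (`organize_files`, `fetch_live_info`) and a hard-coded model output standing in for actual Granite 4.0 Nano inference (the real demo's tool schema and NexaSDK calls are not shown):

```python
# Sketch of on-device function calling: a small model emits a JSON tool call,
# and the host process runs the matching local function. The model output is
# simulated here; in the real demo it would come from Granite 4.0 Nano
# running on the Hexagon NPU via NexaSDK.
import json


def organize_files(folder: str, by: str) -> str:
    # Hypothetical tool: the real demo would actually move files on disk.
    return f"organized {folder} by {by}"


def fetch_live_info(topic: str) -> str:
    # Hypothetical tool: the real demo would query a live data source.
    return f"latest info about {topic}"


# Registry mapping tool names the model may emit to local implementations.
TOOLS = {"organize_files": organize_files, "fetch_live_info": fetch_live_info}


def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and execute it locally."""
    call = json.loads(model_output)
    return TOOLS[call["name"]](**call["arguments"])


# Simulated model response to a prompt like "tidy my Downloads folder by type":
model_output = json.dumps(
    {"name": "organize_files", "arguments": {"folder": "Downloads", "by": "type"}}
)
print(dispatch(model_output))  # organized Downloads by type
```

Because every step (inference, parsing, tool execution) stays on the device, the loop works offline and never ships file contents or queries to a server, which is the privacy argument the post makes.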

  • NexaML supports the latest models across all Qualcomm platforms, from compute to mobile, automotive, and IoT, fully on NPU, with real-time speed and rapid turnaround. For example, Liquid AI’s LFM2-1.2B, a new hybrid model combining multiplicative-gate and convolution layers, now runs 100% on-device on Qualcomm Hexagon NPUs across all platforms:

    - Dragonwing IQ-9075 (IoT): 45 tokens/sec
    - SA8295 (Automotive): 37 tokens/sec
    - Samsung S25 (Mobile): 89 tokens/sec
    - Snapdragon X Elite (Compute): 52 tokens/sec

    This marks the first time a state-of-the-art small language model runs across Qualcomm’s full ecosystem under one unified inference engine: NexaML. Check out the demo below or reach out to explore model integration for Qualcomm platforms. See blog in comments. Manoj Khilnani, Chun-Po Chang, Srinivasa Deevi, Devang Aggarwal, Madhura Chatterjee, Damanjit Singh

  • This week at Nexa 🚀: SOTA model support, Python library, community showcase.

    1) Liquid AI's LFM2-1.2B on the Qualcomm Hexagon NPU at 52 tok/s on Snapdragon X Elite, powered by NexaSDK.
    2) Qwen's Qwen3-VL now runs locally on Qualcomm Oryon CPU, Adreno GPU, and Hexagon NPU with NexaSDK, powered by NexaML (first and only on Snapdragon).
    3) Python bindings are live: pip install nexaai to run LLMs, VLMs, ASR, and embedding models on NPU, GPU, and CPU from Python.
    4) NexaML Profiling Tool (preview): shows end-to-end latency and per-op breakdown to tune inference performance on NPU.
    5) Community: Nexa AI was featured at the Qualcomm and AMD booths at the PyTorch conference this week.

Funding

Nexa AI: 1 total round

Last Round

Seed