Junyang Lin’s Post

Qwen3-Next, or call it a preview of our next generation (3.5?), is out! This time we are trying to be bold, but we have actually been experimenting with hybrid models and linear attention for about a year. We believe our solution is at least a stable and solid new model architecture for super long context! The GDN-plus-hybrid design came out of a lot of trial and error, and the attention output gate turned out to be almost a free lunch in terms of the benefits it brings. Moreover, we have continued our research on MoE and carefully pushed the sparsity further to make the model more efficient and effective. What made us suffer is that evaluating a new architecture requires running the whole training process: pre-training plus post-training (notably reinforcement learning). We have proven that it works, and we are releasing both the instruct and thinking models after RL. Still, since this is the first time we have released something totally new, we are unsure about what we got right or wrong, and we need the community's support. In particular, many thanks to Hugging Face, vLLM, and SGLang; they put in a lot of effort to help us deliver this new model to you all. Welcome to try it and send us feedback! We hope this is a good start to a new journey 🚗
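For readers curious what the "attention gate" mentioned above might look like, here is a minimal PyTorch sketch of output gating on a standard attention block. This illustrates the general technique only, not Qwen3-Next's actual implementation; the module and parameter names (GatedAttention, gate_proj, and so on) are hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttention(nn.Module):
    """Hypothetical sketch: causal self-attention with a sigmoid output gate."""

    def __init__(self, dim: int, num_heads: int):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv_proj = nn.Linear(dim, 3 * dim)
        self.gate_proj = nn.Linear(dim, dim)  # produces the per-channel gate
        self.out_proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        q, k, v = self.qkv_proj(x).chunk(3, dim=-1)
        # (batch, heads, seq, head_dim) layout for scaled_dot_product_attention
        q, k, v = (z.view(b, t, self.num_heads, self.head_dim).transpose(1, 2)
                   for z in (q, k, v))
        attn = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        attn = attn.transpose(1, 2).reshape(b, t, d)
        # the "attention gate": a data-dependent sigmoid gate on the attention output
        gate = torch.sigmoid(self.gate_proj(x))
        return self.out_proj(gate * attn)
```

In the hybrid layout described here, blocks like this (full attention with an output gate) would be interleaved with Gated DeltaNet linear-attention layers, keeping some high-recall full attention while most layers use the much cheaper linear form.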

Qwen

🚀 Introducing Qwen3-Next-80B-A3B — the FUTURE of efficient LLMs is here!
🔹 80B params, but only 3B activated per token → 10x cheaper training, 10x faster inference than Qwen3-32B (esp. at 32K+ context!)
🔹 Hybrid architecture: Gated DeltaNet + Gated Attention → best of speed & recall
🔹 Ultra-sparse MoE: 512 experts, 10 routed + 1 shared
🔹 Multi-Token Prediction → turbo-charged speculative decoding
🔹 Beats Qwen3-32B in perf, rivals Qwen3-235B in reasoning & long-context
🧠 Qwen3-Next-80B-A3B-Instruct approaches our 235B flagship.
🧠 Qwen3-Next-80B-A3B-Thinking outperforms Gemini-2.5-Flash-Thinking.
Try it now: https://chat.qwen.ai/
Blog: https://lnkd.in/g3KFxmhE
Hugging Face: https://lnkd.in/gM33n78k
ModelScope: https://lnkd.in/g8hiCqjW
Kaggle: https://lnkd.in/gKRnu6x8
Alibaba Cloud API: https://lnkd.in/gUjMEj4t
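To make the "ultra-sparse" numbers concrete, below is a minimal PyTorch sketch of top-k expert routing with an always-on shared expert, using the figures from the post (512 routed experts, 10 routed per token, 1 shared). Names like SparseMoE and the expert MLP shape are hypothetical illustrations, not Qwen3-Next's actual code, and the per-token loop is written for clarity rather than speed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Hypothetical sketch: top-k routing over many experts plus one shared expert."""

    def __init__(self, dim: int, num_experts: int = 512, top_k: int = 10):
        super().__init__()
        self.top_k = top_k
        def expert():  # a simple MLP; the real expert shape is an assumption
            return nn.Sequential(nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.router = nn.Linear(dim, num_experts, bias=False)
        self.experts = nn.ModuleList(expert() for _ in range(num_experts))
        self.shared_expert = expert()  # always active, regardless of routing

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_tokens, dim). The router scores all experts, but only top_k run.
        scores = self.router(x)                                # (num_tokens, num_experts)
        weights, idx = torch.topk(scores, self.top_k, dim=-1)  # (num_tokens, top_k)
        weights = F.softmax(weights, dim=-1)                   # normalize over chosen experts
        out = self.shared_expert(x)
        for t in range(x.size(0)):                             # naive per-token loop for clarity
            for w, e in zip(weights[t], idx[t]):
                out[t] = out[t] + w * self.experts[int(e)](x[t])
        return out

# Usage sketch with a small dim so the example stays lightweight:
moe = SparseMoE(dim=64, num_experts=512, top_k=10)
y = moe(torch.randn(4, 64))  # 4 tokens in, 4 tokens out
```

With only 10 of the 512 experts plus the single shared expert firing per token, the vast majority of weights stay idle on any given step, which is where the "80B total, 3B activated" framing and the claimed training and inference savings come from.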

John Davies

CTO Incept5 (AI) and Fintex

1mo

37 or 41 safetensors on one and 35 on the other; I can’t wait to try this out. Thanks Junyang and team!

Mehdi Baneshi

I build e2e platform

1mo

Keep going, thanks for everything
