💡 vLLM @ Open Source AI Week!

1️⃣ Wednesday, Oct 22 & Thursday, Oct 23: vLLM @ PyTorch Conference 2025
🚀 Explore vLLM at PyTorch Conference 2025! 📅 Sessions to catch:
1. Easy, Fast, Cheap LLM Serving for Everyone – Simon Mo, Room 2004/2006
2. Open Source Post-Training Stack: Kubernetes + Ray + PyTorch + vLLM – Robert Nishihara, Room 2004/2006
3. No GPU Left Behind: Scaling Online LLM Training With Co-located vLLM in TRL – Mert Toslali & Yu Chin Fabian Lim, Room 2000/2002
4. Enabling vLLM V1 on AMD GPUs With Triton – TBA, Room 2001/2003
5. Lightning Talk: vllm-triton-backend: State-of-the-art Performance on NVIDIA & AMD – Burkhard Ringlein, Room 2001/2003
Don’t miss insights on scaling, GPU efficiency, and cutting-edge LLM serving! https://lnkd.in/g-shmYxh

2️⃣ Wednesday, Oct 22: Trivia & Community: NVIDIA × DeepInfra × vLLM
Join us during PyTorch Open Source AI Week for a fun, interactive evening:
* AI, tech & pop culture trivia with prizes
* Network with AI infrastructure & open-source enthusiasts
* Food & drinks included
Learn, connect, and have fun outside the conference sessions! https://luma.com/cpgzpcwt
vLLM
Software Development
An open source, high-throughput and memory-efficient inference and serving engine for LLMs.
About us
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs.
- Website: https://github.com/vllm-project/vllm
- Industry: Software Development
- Company size: 51-200 employees
- Type: Nonprofit
Updates
-
vLLM reposted this
NVIDIA + Open Source AI Week 2025 – powered by partnerships, events, and community 👏
We’re excited to bring together technology, community, and collaboration at Open Source AI Week 2025, hosted by The Linux Foundation. From informal meetups to hackathons, panels to poster sessions, here’s the scoop on where you can join us to advance open-source AI 👇
🥤 AI Dev Night with Unsloth AI & Mistral AI — Join us for boba and talks on training & deployment with RTX AI PCs.
🧩 Trivia & Community with Deep Infra Inc. & vLLM — A fun, interactive quiz night to connect engineers, practitioners, and open-source devs.
🧑💻 GPU MODE IRL Hackathon — With over 215 networked NVIDIA B200 GPUs, courtesy of Nebius, we’re joining mentors from Thinking Machines, Unsloth AI, PyTorch, Periodic Labs, Mobius Labs, Google DeepMind, and more.
🙌 PyTorch Conference 2025 — NVIDIA-led sessions, posters, meetups, and panels aligned with the flagship event.
🤖 Open Agent Summit — NVIDIA Developer Advocate Mitesh Patel will join a panel on The Future of Agents & Human-Agent Collaboration.
🧠 Measuring Intelligence Summit — Vivienne Zhang (Senior PM, Generative AI Software) will speak on reasoning models, benchmarks, and superintelligence.
🤓 Technical Sessions & Posters — Covering topics like Lightweight, High-Performance FSDP on NVIDIA GPU, Scaling KV Caches for LLMs, and more.
⚡ Dynamo & Dine with Baseten — Hands-on LLM inference & scaling.
💬 Model Builders Meetup with NVIDIA Nemotron & Prime Intellect — Open frontier models + RL.
🔗 Stay in the loop — bookmark our event page for updates as we add more → https://nvda.ws/48ExDSb
-
vLLM reposted this
Come join us for trivia night in SF on Oct 22 with NVIDIA and vLLM. All things open-source and inference. RSVP: https://luma.com/cpgzpcwt
-
🇸🇬 vLLM Singapore Meetup — Highlights
Thanks to everyone who joined! Check out the slides by vLLM’s DarkLight1337 with tjtanaa / Embedded LLM:
* V1 is here: faster startup, stronger CI & perf checks.
* Scaling MoE: clear Expert Parallelism (EP) setup for single/multi-node, plus elastic EP to match traffic (see the sketch below).
* Disaggregated serving: split prefill vs. decode to tune TTFT (time-to-first-token) vs. throughput.
* MLLM speedups: reuse embeddings with a processor cache, optional GPU-side processors, and encoder DP-across-TP (replicate small encoders per TP rank; shard the decoder) to cut comms overhead.
Also:
* WEKA — vLLM + LMCache Lab + SSD for high-perf KV cache.
* @ASTARsg MERaLiON — deploying AudioLLM with vLLM + Ray for autoscaling & load balancing.
Slides Folder: https://lnkd.in/gwVdv6-k
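For readers who haven't tried EP yet, here is a minimal single-node sketch of the idea, not taken from the slides: it assumes a recent vLLM release that exposes the `enable_expert_parallel` engine argument, and the Mixtral checkpoint is just an example MoE model.

```python
from vllm import LLM, SamplingParams

# Illustrative single-node EP setup; model choice and flag values are examples.
llm = LLM(
    model="mistralai/Mixtral-8x7B-Instruct-v0.1",  # any supported MoE checkpoint
    tensor_parallel_size=4,        # shard attention/dense weights across 4 GPUs
    enable_expert_parallel=True,   # place experts on different ranks instead of replicating them
)

outputs = llm.generate(
    ["Explain expert parallelism in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```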
-
vLLM reposted this
Hi folks - if you're in the Austin area on Wednesday, September 17th, we (PyTorch ATX) are hosting a joint meetup with the vLLM community at the Capital Factory, and we'd love to have you join us. The sessions are listed below. You'll get a solid grounding in vLLM and also learn about two really cool new groundbreaking projects, the semantic router and llm-d. We have 200 people already signed up but still have a few spots open, so please help us share the event. It's going to be awesome! https://lnkd.in/gPwt-ZQn
- Getting started with inference using vLLM (see the quickstart sketch below) – Steve Watt, PyTorch ambassador
- An intermediate guide to inference using vLLM: PagedAttention, Quantization, Speculative Decoding, Continuous Batching and more – Luka Govedič, vLLM core committer
- vLLM Semantic Router: Intelligent Auto Reasoning Router for Efficient LLM Inference on Mixture-of-Models – Huamin Chen, vLLM Semantic Router project creator
- Combining Kubernetes and vLLM to deliver scalable, distributed inference with llm-d – Greg Pereira, llm-d maintainer
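For anyone coming to the "getting started" session cold, this is roughly what a first run of offline inference with vLLM looks like. A minimal sketch: the small Qwen checkpoint is just an example placeholder.

```python
# pip install vllm
from vllm import LLM, SamplingParams

# Any Hugging Face checkpoint vLLM supports will do; this one fits on a single GPU.
llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")

prompts = [
    "What problem does PagedAttention solve?",
    "Why does continuous batching raise throughput?",
]
params = SamplingParams(temperature=0.8, max_tokens=128)

# generate() schedules all prompts together via continuous batching.
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```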
-
🚀 Join us for the Boston vLLM Meetup on September 18! Our first Boston meetup back in March was fully packed, so register early!
Hosted by Red Hat and Venture Guides, this event brings together vLLM users, developers, maintainers, and engineers to explore the latest in vLLM and optimized inference. Expect deep technical talks, live demos, and plenty of time to connect with the community.
📍 Location: Venture Guides office by TD Garden/North Station
🕔 Time: 5:00 PM – 8:30 PM
Agenda highlights:
* Intro to vLLM & project update
* Model optimization with LLM Compressor and Speculators (a rough sketch of this workflow follows below)
* Demo: vLLM + LLM Compressor in action
* Distributed inference with llm-d
* Q&A, discussion, and networking (with pizza 🍕 & refreshments)
👉 Register here: https://luma.com/vjfelimw
Come meet the vLLM team, learn from experts, and connect with others building the future of inference.
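To give a flavor of what model optimization with LLM Compressor looks like, here is a minimal one-shot FP8 quantization sketch following the pattern documented in the llm-compressor project; import paths and supported schemes vary by version, and the TinyLlama checkpoint is just an example, so treat this as illustrative rather than the demo's exact code.

```python
from llmcompressor import oneshot  # older releases: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# FP8 dynamic quantization needs no calibration dataset.
recipe = QuantizationModifier(
    targets="Linear",       # quantize every Linear layer...
    scheme="FP8_DYNAMIC",
    ignore=["lm_head"],     # ...except the output head
)

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example checkpoint
    recipe=recipe,
    output_dir="TinyLlama-1.1B-FP8-Dynamic",
)
# The saved checkpoint can then be served as-is, e.g.:
#   vllm serve TinyLlama-1.1B-FP8-Dynamic
```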
-
LinkedIn not only uses vLLM at massive scale but also actively contributes to the community. Check out their wonderful blog: https://lnkd.in/gFV6zA5J
This blog post was completed back in May, and looking at it now, it still feels like a diary of the journey we’ve been on together in AI Infra Model Serving. As I shared in my earlier post, the LLM Serving team was founded by a group of incredibly talented and passionate engineers. I first met some of them during a vLLM meetup with AWS, and it’s been amazing to see how far we’ve come since then. In just 1.5 years, the team has grown at a remarkable pace. We started by learning how to use vLLM, then mastered it, and eventually customized it to meet LinkedIn’s unique needs. Along the way, our work has been adopted broadly across the LinkedIn ecosystem. Early examples include Hiring Agent and Job Search, and today many LinkedIn products and services are powered by vLLM.
At the end of that blog, we expressed gratitude to our partners and friends who have supported us, because none of these achievements would have been possible without you.
Red Hat: Michael Goin, Robert Shaw, Nick Hill
NVIDIA: Rachel O., Ed Nieda, Harry Kim
UCB SkyComputing: Simon Mo, Woosuk Kwon, Zhuohan Li, Lily (Xiaoxuan) Liu
LMCache: Yihua Cheng, Kuntai Du, Junchen Jiang
https://lnkd.in/dJAAAXFH
-
vLLM reposted this
I just ran batch inference on a 30B-parameter LLM across 4 GPUs with a single Python command! The secret? Modern AI infrastructure where everyone handles their specialty:
📦 UV (by Astral) handles dependencies via uv scripts
🖥️ Hugging Face Jobs handles GPU orchestration
🧠 Qwen AI team handles the model (Qwen3-30B-A3B-Instruct-2507)
⚡ vLLM handles efficient batched inference
I'm very excited about using uv scripts as a nice way of packaging fairly simple but useful ML tasks in a somewhat reproducible way. This, combined with Jobs, opens up some nice opportunities for building pipelines that require different types of compute. A rough sketch of the pattern follows below.
Technical deep dive and code examples: https://lnkd.in/e5BEBU95
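For readers unfamiliar with uv's inline script metadata (PEP 723), here is a minimal sketch of the pattern described above. The model name comes from the post; the file name, prompts, and sampling settings are illustrative placeholders, not the author's actual script.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = ["vllm"]
# ///
# Run with:  uv run batch_infer.py
# (uv reads the header above and installs vllm into an ephemeral environment)
from vllm import LLM, SamplingParams

prompts = [
    "Summarize the benefits of batched inference.",
    "Write a haiku about GPUs.",
]

# Shard the 30B model across 4 GPUs with tensor parallelism.
llm = LLM(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507",
    tensor_parallel_size=4,
)

for output in llm.generate(prompts, SamplingParams(max_tokens=256)):
    print(output.outputs[0].text)
```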