How to Fine-Tune LLMs with Amazon SageMaker and vLLM

Ram Vegiraju
Senior ML Solutions Architect @ AWS

Agents are getting most of the attention in the Generative AI space lately, and fairly so. But when it comes to improving LLM performance, there are other powerful levers we shouldn't overlook. One of them is fine-tuning: techniques like QLoRA make hyper-personalization practical, especially with domain-specific or user-centric data, because you train small low-rank adapters on top of a quantized base model instead of updating all of its weights (a rough sketch of that setup is below).

In this video, I walk through how you can use Amazon SageMaker's Multi-Adapter Inference feature, combined with vLLM's Async Engine in the new Large Model Inference (LMI) container, to serve many adapters efficiently at scale from a single endpoint.

👇 Check it out below and feel free to adapt it for your own use cases. I've also attached the code sample if you want to experiment directly.

#GenerativeAI #SageMaker #vLLM #QLoRA #MachineLearning #AWS #FineTuning
https://lnkd.in/eEFAgFBF
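For context, here is a minimal sketch of what the QLoRA training side can look like with Hugging Face transformers, peft, and bitsandbytes. The base model ID and all hyperparameters are illustrative placeholders, not values from the video:

```python
# Minimal QLoRA setup: load the base model in 4-bit and attach a small LoRA adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model_id = "meta-llama/Llama-3.1-8B"  # placeholder base model

# 4-bit NF4 quantization of the frozen base weights is the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Only the low-rank adapter matrices are trained; the 4-bit base stays frozen.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

After training, model.save_pretrained(...) writes only the adapter weights, typically tens of megabytes, which is exactly what makes it cheap to host many personalized adapters behind one shared base model.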

Customizing LLMs at Scale with SageMaker Multi-Adapter Inference

https://www.youtube.com/
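On the hosting side, the pattern from the video builds on SageMaker inference components: the base model runs in one inference component (the LMI container with vLLM), and each LoRA adapter is registered as a lightweight component on top of it. Below is a rough boto3 sketch; all names and the S3 URI are placeholders, and the request shape follows AWS's multi-adapter inference announcement, so verify against the current boto3 docs before using it:

```python
# Sketch: register a LoRA adapter as its own inference component on top of an
# existing base-model component, then route a request to it by name.
# The endpoint, component names, and S3 URI are illustrative placeholders.
import json
import boto3

sm = boto3.client("sagemaker")
smr = boto3.client("sagemaker-runtime")

endpoint_name = "lmi-multi-adapter-endpoint"  # existing SageMaker endpoint
base_ic_name = "base-llm-ic"                  # component hosting the base model
adapter_ic_name = "customer-a-adapter"

# The adapter component references the base component instead of shipping its
# own container; SageMaker loads the adapter weights into the running engine.
sm.create_inference_component(
    InferenceComponentName=adapter_ic_name,
    EndpointName=endpoint_name,
    Specification={
        "BaseInferenceComponentName": base_ic_name,
        "Container": {"ArtifactUrl": "s3://my-bucket/adapters/customer-a/"},
    },
)

# Invoke a specific adapter by targeting its inference component.
response = smr.invoke_endpoint(
    EndpointName=endpoint_name,
    InferenceComponentName=adapter_ic_name,
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "Summarize my last three orders.",
        "parameters": {"max_new_tokens": 128},
    }),
)
print(response["Body"].read().decode())
```

Because the LMI container runs vLLM with LoRA support enabled (configured through serving.properties settings along the lines of option.rolling_batch=vllm and option.enable_lora=true), the async engine can batch requests across adapters in flight rather than hot-swapping full models per request.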

