Optimizing LLMs for Resource-Constrained Environments: A Survey of Model Compression Techniques

Girija, Sanjay Surendranath; Kapoor, Shashank; Arora, Lakshit; Pradhan, Dipen; Raj, Aman; Shetgaonkar, Ankit

arXiv:2505.02309 (cs)

[Submitted on 5 May 2025 (v1), last revised 8 May 2025 (this version, v2)]

Abstract: Large Language Models (LLMs) have revolutionized many areas of artificial intelligence (AI), but their substantial resource requirements limit their deployment on mobile and edge devices. This survey paper provides a comprehensive overview of techniques for compressing LLMs to enable efficient inference in resource-constrained environments. We examine three primary approaches: Knowledge Distillation, Model Quantization, and Model Pruning. For each technique, we discuss the underlying principles, present different variants, and provide examples of successful applications. We also briefly discuss complementary techniques such as mixture-of-experts and early-exit strategies. Finally, we highlight promising future directions, aiming to provide a valuable resource for both researchers and practitioners seeking to optimize LLMs for edge deployment.

Comments:	Accepted to IEEE COMPSAC 2025
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2505.02309 [cs.LG]
	(or arXiv:2505.02309v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2505.02309

Submission history

From: Sanjay Surendranath Girija
[v1] Mon, 5 May 2025 01:27:47 UTC (462 KB)
[v2] Thu, 8 May 2025 05:55:48 UTC (462 KB)
