0% found this document useful (0 votes)
11 views17 pages

Project Report

The project implements Huffman Coding for file compression and decompression, utilizing a lossless data compression method to assign variable-length codes to characters based on their frequency. Key functionalities include compressing a text file, decompressing it back to its original form, and calculating the compression ratio. Future enhancements include error handling, GUI integration, and multi-threading for improved performance.

Uploaded by

rajatsaini4555
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
11 views17 pages

Project Report

The project implements Huffman Coding for file compression and decompression, utilizing a lossless data compression method to assign variable-length codes to characters based on their frequency. Key functionalities include compressing a text file, decompressing it back to its original form, and calculating the compression ratio. Future enhancements include error handling, GUI integration, and multi-threading for improved performance.

Uploaded by

rajatsaini4555
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 17

Project Report: Huffman File

Compression and
Decompression
1. Introduction

This project implements the Huffman Coding


algorithm for file compression and
decompression. Huffman coding is a lossless
data compression method where frequent
characters are assigned shorter codes, and
rare characters are assigned longer codes.
The program compresses a file by encoding
characters with variable-length codes and
decompresses it back to its original form.

2. Objective

The main objectives of this project are:

 Compress a text file using Huffman


coding.

 Decompress the file to restore the


original content.
 Display the size reduction and
compression ratio.

 Print Huffman codes for each character.

3. Tools and Technologies Used

 Programming Language: C

 Data Structures: Min-heap (for building


Huffman tree), Binary tree (for
representing the tree).

 Algorithm: Huffman coding for lossless


compression.

4. System Design and Functionality

The program performs two main tasks:

 Compression: It reads an input file,


builds a Huffman tree based on
character frequencies, and stores the
compressed data.

 Decompression: It rebuilds the


Huffman tree and restores the original
file contents.

Key Features:
 Huffman Tree Construction: Built
using a min-heap to merge nodes with
the lowest frequencies.

 Compression: The file is compressed by


replacing characters with their
corresponding Huffman codes.

 Decompression: The compressed file is


decoded by reconstructing the Huffman
tree.

 Compression Ratio Calculation:


Displays the percentage reduction in file
size.

P 5. Code Explanation

 MinHeapNode: Represents a node in


the Huffman tree with a character,
frequency, and child pointers.

 MinHeap: A heap structure to build the


tree efficiently.

 HuffmanCode: Stores the character and


its Huffman code.
 Functions like createMinHeap,
buildHuffmanTree, and encodeFile
handle tree construction, compression,
and decompression.

6. Results

 Compression Output: Huffman codes


for each character are printed.

 Compression Ratio: Displays original


vs. compressed file sizes and the
compression ratio.

Example output:

Original size: 8000 bits

Compressed size: 6000 bits

Compression Ratio: 75.00%

7. Conclusion

The Huffman File Compression and


Decompression program efficiently reduces
file sizes using the Huffman coding
algorithm. It demonstrates the benefits of
lossless compression in saving storage and
optimizing file transfer.
8. Future Enhancements

 Error Handling: Improve error handling


for invalid files.

 GUI Integration: Implement a graphical


user interface for easier use.

 Multi-threading: Use multi-threading


to speed up the compression process.
Source Code
#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#define MAX_TREE_HT 256

typedef struct MinHeapNode {

char data;

unsigned freq;

struct MinHeapNode *left, *right;

} MinHeapNode;

typedef struct MinHeap {

unsigned size;

unsigned capacity;

MinHeapNode** array;

} MinHeap;

typedef struct HuffmanCode {

char data;

char code[MAX_TREE_HT];

} HuffmanCode;

// Create a new heap node

MinHeapNode* newNode(char data, unsigned freq) {

MinHeapNode* temp = (MinHeapNode*)malloc(sizeof(MinHeapNode));

temp->left = temp->right = NULL;

temp->data = data;

temp->freq = freq;
return temp;

// Create a min heap

MinHeap* createMinHeap(unsigned capacity) {

MinHeap* minHeap = (MinHeap*)malloc(sizeof(MinHeap));

minHeap->size = 0;

minHeap->capacity = capacity;

minHeap->array = (MinHeapNode**)malloc(minHeap->capacity * sizeof(MinHeapNode*));

return minHeap;

// Swap two min heap nodes

void swapMinHeapNode(MinHeapNode** a, MinHeapNode** b) {

MinHeapNode* t = *a;

*a = *b;

*b = t;

// Heapify function

void minHeapify(MinHeap* minHeap, int idx) {

int smallest = idx;

int left = 2 * idx + 1;

int right = 2 * idx + 2;

if (left < minHeap->size && minHeap->array[left]->freq < minHeap->array[smallest]->freq)

smallest = left;

if (right < minHeap->size && minHeap->array[right]->freq < minHeap->array[smallest]->freq)

smallest = right;

if (smallest != idx) {

swapMinHeapNode(&minHeap->array[smallest], &minHeap->array[idx]);

minHeapify(minHeap, smallest);

}
}

// Extract the minimum node

MinHeapNode* extractMin(MinHeap* minHeap) {

MinHeapNode* temp = minHeap->array[0];

minHeap->array[0] = minHeap->array[minHeap->size - 1];

--minHeap->size;

minHeapify(minHeap, 0);

return temp;

// Insert a node into min heap

void insertMinHeap(MinHeap* minHeap, MinHeapNode* minHeapNode) {

++minHeap->size;

int i = minHeap->size - 1;

while (i && minHeapNode->freq < minHeap->array[(i - 1) / 2]->freq) {

minHeap->array[i] = minHeap->array[(i - 1) / 2];

i = (i - 1) / 2;

minHeap->array[i] = minHeapNode;

// Build Huffman Tree

MinHeapNode* buildHuffmanTree(char data[], int freq[], int size) {

MinHeapNode *left, *right, *top;

MinHeap* minHeap = createMinHeap(size);

for (int i = 0; i < size; ++i)

insertMinHeap(minHeap, newNode(data[i], freq[i]));

while (minHeap->size != 1) {

left = extractMin(minHeap);

right = extractMin(minHeap);
top = newNode('$', left->freq + right->freq);

top->left = left;

top->right = right;

insertMinHeap(minHeap, top);

return extractMin(minHeap);

// Store Huffman codes

void storeCodes(MinHeapNode* root, char* code, int top, HuffmanCode codes[], int* index) {

if (root->left) {

code[top] = '0';

storeCodes(root->left, code, top + 1, codes, index);

if (root->right) {

code[top] = '1';

storeCodes(root->right, code, top + 1, codes, index);

// If this is a leaf node

if (!root->left && !root->right) {

code[top] = '\0';

strcpy(codes[*index].code, code);

codes[*index].data = root->data;

(*index)++;

// Encode File

void encodeFile(const char* inputFilename, const char* outputFilename) {

printf("Compressing file %s...\n", inputFilename);

FILE *inputFile = fopen(inputFilename, "r");


if (!inputFile) {

printf("Error: File %s not found!\n", inputFilename);

return;

FILE *outputFile = fopen(outputFilename, "wb");

if (!inputFile || !outputFile) {

printf("Error opening file!\n");

return;

int freq[256] = {0};

char ch;

while ((ch = fgetc(inputFile)) != EOF) {

freq[(unsigned char)ch]++;

rewind(inputFile);

char data[256];

int freqs[256], size = 0;

for (int i = 0; i < 256; i++) {

if (freq[i] > 0) {

data[size] = (char)i;

freqs[size] = freq[i];

size++;

MinHeapNode* root = buildHuffmanTree(data, freqs, size);

HuffmanCode codes[256];

char code[MAX_TREE_HT];
int index = 0;

storeCodes(root, code, 0, codes, &index);

fwrite(freq, sizeof(freq), 1, outputFile);

unsigned char buffer = 0;

int bitCount = 0;

while ((ch = fgetc(inputFile)) != EOF) {

for (int i = 0; i < index; i++) {

if (codes[i].data == ch) {

for (int j = 0; codes[i].code[j] != '\0'; j++) {

buffer = (buffer << 1) | (codes[i].code[j] - '0');

bitCount++;

if (bitCount == 8) {

fwrite(&buffer, sizeof(unsigned char), 1, outputFile);

buffer = 0;

bitCount = 0;

break;

if (bitCount > 0) {

buffer <<= (8 - bitCount);

fwrite(&buffer, sizeof(unsigned char), 1, outputFile);

rewind(inputFile); // Go back to the beginning of input file again

int originalBits = 0;

int compressedBits = 0;

while ((ch = fgetc(inputFile)) != EOF) {


originalBits += 8; // Each character is 1 byte = 8 bits

for (int i = 0; i < index; i++) {

if (codes[i].data == ch) {

compressedBits += strlen(codes[i].code); // Count the Huffman code bits

break;

rewind(inputFile); // Go back to the beginning of input file again

while ((ch = fgetc(inputFile)) != EOF) {

originalBits += 8; // Each character is 1 byte = 8 bits

for (int i = 0; i < index; i++) {

if (codes[i].data == ch) {

compressedBits += strlen(codes[i].code); // Count the Huffman code bits

break;

printf("\nOriginal size: %d bits\n", originalBits);

printf("Compressed size (excluding frequency table): %d bits\n", compressedBits);

float ratio = ((float)compressedBits / originalBits) * 100;

printf("Compression Ratio: %.2f%%\n", ratio);

fclose(inputFile);

fclose(outputFile);

printf("File compressed successfully as %s!\n", outputFilename);

}
// Decode File

void decodeFile(const char* inputFilename, const char* outputFilename) {

printf("Decompressing file %s...\n", inputFilename);

FILE *inputFile = fopen(inputFilename, "rb");

if (!inputFile) {

printf("Error: File %s not found!\n", inputFilename);

return;

FILE *outputFile = fopen(outputFilename, "w");

if (!inputFile || !outputFile) {

printf("Error opening file!\n");

return;

int freq[256];

fread(freq, sizeof(freq), 1, inputFile);

char data[256];

int freqs[256], size = 0;

for (int i = 0; i < 256; i++) {

if (freq[i] > 0) {

data[size] = (char)i;

freqs[size] = freq[i];

size++;

MinHeapNode* root = buildHuffmanTree(data, freqs, size);

MinHeapNode* current = root;

unsigned char buffer;


while (fread(&buffer, sizeof(unsigned char), 1, inputFile)) {

for (int i = 7; i >= 0; i--) {

current = (buffer & (1 << i)) ? current->right : current->left;

if (!current->left && !current->right) {

fputc(current->data, outputFile);

current = root;

fclose(inputFile);

fclose(outputFile);

printf("File decompressed successfully as %s!\n", outputFilename);

int main() {

char inputFile[100], compressedFile[100], outputFile[100];

int choice;

while(1){

printf("********** Huffman File Compression System **********\n");

printf("1. Compress a file\n");

printf("2. Decompress a file\n");

printf("3. Exit\n");

printf("******************************************************\n");

printf("Enter your choice: ");

scanf("%d", &choice);

if (choice == 1) {

printf("Enter file to compress: ");

scanf("%s", inputFile);

printf("Enter compressed output file name: ");

scanf("%s", compressedFile);
encodeFile(inputFile, compressedFile);

else if (choice == 2) {

printf("Enter file to decompress: ");

scanf("%s", compressedFile);

printf("Enter output file name after decompression: ");

scanf("%s", outputFile);

decodeFile(compressedFile, outputFile);

else if (choice ==3){

printf("Exiting...\n");

return 0;

else {

printf("Invalid choice!\n");

return 0;

}
 RESULT OF COMPRESSION

You might also like