Compare two files using Hashing in Python
Last Updated: 28 Apr, 2025
In this article, we will create a program that determines whether two files provided to it are the same. Here, "the same" means that their contents are identical (metadata such as file names and timestamps is ignored). We will use cryptographic hashes for this purpose. A cryptographic hash function takes input data of any size and produces a fixed-size output (the hash, or digest) that is, for practical purposes, unique to that data. We will use this property to compute a digest of each file's contents and then compare the two digests to decide whether the files are the same.
Note: The probability of two different data sets producing the same hash is extremely low. Good cryptographic hash functions are also designed so that deliberately constructing a collision is computationally infeasible, so any collision would have to occur by chance.
We will use SHA256 (Secure Hash Algorithm 256) as the hash function in this program. SHA256 is highly resistant to collisions. We will use the sha256() function from Python's hashlib library, which provides an implementation of the algorithm.
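As a quick illustration of the property described above, the following minimal sketch hashes three in-memory byte strings. The sample inputs and variable names are hypothetical and chosen only for this example; they are not part of the program developed in this article.

```python
import hashlib

# Hypothetical sample inputs, chosen only for illustration
same_a = b"hello world\n"
same_b = b"hello world\n"
different = b"hello world!\n"

digest_a = hashlib.sha256(same_a).hexdigest()
digest_b = hashlib.sha256(same_b).hexdigest()
digest_c = hashlib.sha256(different).hexdigest()

# Identical bytes always give identical digests
print(digest_a == digest_b)

# A one-character change gives a different digest
print(digest_a == digest_c)

# A SHA256 digest is 256 bits, i.e. 64 hexadecimal characters
print(len(digest_a))
```

The digest length stays the same regardless of how large the input is, which is what makes comparing digests cheaper than comparing the files byte by byte once the hashes are known.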
The hashlib module is part of the Python standard library, so it is available in any standard Python installation and does not need to be installed separately with pip.
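Since hashlib ships with the Python standard library, a short check like the one below should be enough to confirm that SHA256 is available in your environment. This is only a sanity check, not a required step.

```python
import hashlib

# algorithms_guaranteed lists the hash names that hashlib
# supports on every platform
print("sha256" in hashlib.algorithms_guaranteed)
```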
Below is the implementation.
Text File 1:

Text File 2:

Python3
import sys
import hashlib


def hashfile(file):
    # An arbitrary (but fixed) buffer size; adjust as needed.
    # 65536 bytes = 64 kilobytes
    BUF_SIZE = 65536

    # Create a new SHA256 hash object
    sha256 = hashlib.sha256()

    # Open the file in binary mode so that its raw bytes are hashed
    with open(file, 'rb') as f:
        while True:
            # Read up to BUF_SIZE bytes from the file
            data = f.read(BUF_SIZE)

            # read() returns an empty bytes object at end of file
            if not data:
                break

            # Feed this chunk of data to the hash object
            sha256.update(data)

    # hexdigest() returns the digest of all the data passed to
    # update() so far, as a hexadecimal string
    return sha256.hexdigest()


# Compute the hash of each file named on the command line
f1_hash = hashfile(sys.argv[1])
f2_hash = hashfile(sys.argv[2])

# Compare the two hexadecimal strings to check
# whether the hashes match
if f1_hash == f2_hash:
    print("Both files are same")
    print(f"Hash: {f1_hash}")
else:
    print("Files are different!")
    print(f"Hash of File 1: {f1_hash}")
    print(f"Hash of File 2: {f2_hash}")
Output:
For Different Files as Input:

For Same Files as Input:

Explanation:
The file names are read from the command-line arguments, so both file paths must be provided when the script is run. The function hashfile() reads the file in fixed-size chunks so that files of arbitrary size can be hashed without running out of memory. Passing the whole file to sha256.update() in a single call would produce the same hash, but it would require loading the entire file into memory first. hashfile() returns the hash of the file in base 16 (hexadecimal format). We call the same function for both files and store their hashes in two separate variables, which we then compare. If the hashes are the same (meaning the files contain the same data), we print the message Both files are same followed by the hash. If they are different, we print a negative message and the hash of each file (so that the user can see the two different hashes).
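To see that reading in chunks affects only memory use and not the result, the sketch below hashes the same data in two ways: chunk by chunk from a temporary file, and in one call on the in-memory bytes. The helper name hash_in_chunks, the payload, and the temporary file are assumptions made for this example. The loop uses iter() with a sentinel as a compact alternative to the while True loop shown earlier.

```python
import hashlib
import os
import tempfile


def hash_in_chunks(path, buf_size=65536):
    """Hash a file by reading it in fixed-size chunks."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() keeps calling f.read(buf_size) until it
        # returns b"", which signals end of file
        for chunk in iter(lambda: f.read(buf_size), b""):
            sha256.update(chunk)
    return sha256.hexdigest()


# Hypothetical test data: 210,000 bytes, so the file
# spans more than one 64 KB chunk
payload = b"abc" * 70000

# Write the payload to a temporary file
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
    tmp_path = tmp.name

try:
    chunked = hash_in_chunks(tmp_path)
    one_shot = hashlib.sha256(payload).hexdigest()
finally:
    # Clean up the temporary file
    os.remove(tmp_path)

# Both approaches yield the same digest
print(chunked == one_shot)
```

The chunked version never holds more than buf_size bytes of the file in memory at a time, which is why it is the safer choice for large files.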