File Sharing and Data Duplication Removal in Cloud Using File Checksum
Volume 6 Issue 2, January-February 2022 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470
1. INTRODUCTION
Data is a collection of information, and the volume of data in the digital universe is growing constantly. One study suggested that by the end of 2020 each person would create about 1.7 megabytes of data every second. Roughly 2.5 quintillion bytes of data are produced per day. The reasons behind the growth of duplicated data include:
Multiple backups of the same data or file by a single person.
Misuse of social media.
The disruption of organizational systems on 9/11 and the loss of data caused by illegal activity showed that data loss is a major problem for organizations. These events pushed organizations to implement backup systems in order to preserve their important data. Organizations started keeping regular backups of data such as email, video, and audio, which increased their storage requirements. While backing up data regularly, they end up storing duplicate data multiple times, which wastes storage.
As data keeps growing, storing and managing it becomes more difficult. More data requires more storage, and more storage costs more because hardware or storage units must be added. Simply adding storage units is not a solution, because it is not clear how much storage would have to be added, and adding more units makes the system bulky and more costly.
The solution to this problem is a properly implemented data duplication removal system. The data duplication removal method stores data or a file in the system only if it has not been stored previously. If a match is found, it updates the existing entry instead. The system thus removes duplicate data quickly and saves valuable storage.
2. SURVEY MOTIVATION
Di Pietro and Sorniotti discussed the security concerns raised by de-duplication and, to address them, used the idea of Proof of Ownership (POW). A POW is intended to let a server verify whether a client actually possesses a file.
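The idea of a server checking possession can be illustrated with a simple challenge-response exchange. The sketch below is not the scheme of Di Pietro and Sorniotti. It assumes the server already holds a copy of the file, and the names `Server` and `possession_proof`, as well as the choice of HMAC-SHA-256 keyed with a random nonce, are illustrative assumptions. It is meant only to show why knowing a file's fixed checksum is not enough to claim the file.

```python
import hashlib
import hmac
import secrets


def possession_proof(file_bytes: bytes, nonce: bytes) -> str:
    """Hypothetical helper: keyed digest of the whole file under a one-time nonce."""
    return hmac.new(nonce, file_bytes, hashlib.sha256).hexdigest()


class Server:
    """Toy server that holds one file and challenges clients claiming to own it."""

    def __init__(self, stored_file: bytes) -> None:
        self._stored_file = stored_file
        self._nonce = b""  # empty means no challenge is outstanding

    def issue_challenge(self) -> bytes:
        # A fresh random nonce makes every expected response different,
        # so a previously observed answer cannot be replayed.
        self._nonce = secrets.token_bytes(16)
        return self._nonce

    def verify(self, response: str) -> bool:
        if not self._nonce:
            return False  # no challenge was issued, or it was already used
        expected = possession_proof(self._stored_file, self._nonce)
        self._nonce = b""  # each nonce is accepted at most once
        return hmac.compare_digest(expected, response)


if __name__ == "__main__":
    data = b"example file contents"
    server = Server(data)

    # A client that really has the file can answer the challenge.
    nonce = server.issue_challenge()
    print(server.verify(possession_proof(data, nonce)))

    # A client that only knows a fixed checksum of the file cannot.
    nonce = server.issue_challenge()
    stolen_checksum = hashlib.md5(data).hexdigest()
    print(server.verify(stolen_checksum))
```

Because the response depends on both the full file content and a nonce chosen by the server, a client has to read the actual file to answer, which is the property a POW is meant to provide.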
@ IJTSRD | Unique Paper ID – IJTSRD49416 | Volume – 6 | Issue – 2 | Jan-Feb 2022 Page 1331
International Journal of Trend in Scientific Research and Development @ www.ijtsrd.com eISSN: 2456-6470
According to Atish Kathpal, Matthew John, and Gaurav Makkar, data duplication removal is the method of eliminating duplicate data from storage devices in order to minimize the consumption of storage space. Although the concepts were sound, their system could not work as intended because of poor management of hardware devices and poor usability, which resulted in underperformance of the system.
2.1. GOAL
Much work has been done in the past to solve the storage problem caused by data duplication. Data duplication has remained a major problem, and the technology developed in the past was not able to solve it because of improper management of the technology.
2.2. LIMITATION
More processing time.
Chance of false result.
Not user friendly.
System maintenance is difficult.
2.3. KEYWORDS
Cloud computing, data storage, file checksum algorithms, computational infrastructure, duplication.
3. SURVEY OUTCOMES
Data duplication increases the amount of unwanted data in the storage unit by storing multiple copies of the same file. The data duplication removal technique uses file checksums to find duplicate or redundant data quickly. The technique calculates the checksum of a file when the file is uploaded and compares the newly calculated checksum with the checksums of the files already stored in the database. If the file is already present, the system updates the existing entry; otherwise it creates a new entry for the file. This system uses the MD-5 hash algorithm to detect duplicate files. MD-5 is the Message Digest 5 algorithm, which produces a 128-bit hash value.
Advantages:
Faster file searching.
Reduced storage space by eliminating data redundancy.
Easy download and upload of files.
4. CONCLUSION
This technique focuses on developing a web-based application that can find redundant data quickly and easily using the file checksum technique. The Message Digest (MD-5) algorithm is used to calculate the checksum of the already existing files and of each new file. The checksum serves both to detect duplicates and to verify the integrity of users' valuable files; MD-5 is a hash function and does not itself encrypt data. Hence, this system removes duplicate files easily and quickly while providing better security.
5. REFERENCES
[1] Di Pietro, Roberto, and Alessandro Sorniotti, "Proof of ownership for de-duplication systems: A secure, scalable, and efficient solution", Computer Communications, 15 May 2016.
[2] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Dupless: Server aided encryption for deduplicated storage", USENIX Security Symposium, 2013.
[3] Harnik, Danny, Alexandra Shulman-Peleg, and Benny Pinkas, "Side channels in cloud services, the case of deduplication in cloud storage", IEEE Security & Privacy 8, 2014.
[4] Atish Kathpal, Matthew John, and Gaurav Makkar, "Distributed Duplicate Detection in Post-Process Data De-duplication", Conference: HiPC, 2011.
[5] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J. Jiang, K. Li, "Liquid: A Scalable Deduplication File System for Virtual Machine Images", IEEE Transactions on Parallel and Distributed Systems, January 2013.
[6] Stephen J. Bigelow, "Data Deduplication Explained: http://searchgate.org", February, 2018.
[7] http://www.computerweekly.com/report/Data-duplication-technology-review
[8] https://nevonprojects.com
[9] Morris Dworkin, 2015; NIST Policy on Hash Functions; Cryptographic Technology group, https://csrc.nist.gov/projects/hash-functions/nist-policy-on-hash-functions August 5, 2015; “National Institute of Standard and Technology NIST Special Publication 800-145
[10] Nimala Bhadrappa, Mamatha G. S., 2017, Implementation of De-Duplication Algorithm, International Research Journal of Engineering and Technology (IRJET), Volume 04, Issue 09.
[11] O’Brien, J. A. & Marakas, G. M. (2011). Computer Software. Management Information Systems 10th ed. 145. McGraw-Hill/Irwin.
[12] Peter Mell; The NIST Definition of Cloud Computing; National Institute of Standards and Technology, NIST Special Publication 800-145.
[13] PHP 5 tutorials; W3Schools, https://www.w3schools.com/pHP/default.asp; Accessed June, 2018.
[14] Rivest R., 1992, The MD5 Message Digest Algorithm. RFC 1321, http://www.ietf.org/rfc/rfc1321.txt
[15] Sandeep Sharma, 2015; 15 Best PHP Libraries Every Developer Should Know; published on https://www.programmableweb.com/news/15-best-php-libraries-every-developer-should-know/analysis/2015/11/18; accessed June 12, 2018.
[16] Single Instance Storage in Microsoft Windows Storage Server 2003 R2; Archived 2007-01-04 at the Wayback Machine: https://archive.org/web; Technical White Paper: Published May 2006; accessed September, 2018.
[17] Stephen J. Bigelow, 2007, Data Deduplication Explained: http://searchgate.org; Accessed February, 2018.
[18] Wenying Zeng, Yuelong K. O, Wei S., (2009) Research on Cloud Storage Architecture and Key Technologies, ICIS 2009 Proceedings of the 2nd International Conference on Interaction Sciences: Information Technology, Culture and Human, Pages 1044-1048.
[19] What is PHP? PHP User contributory notes; http://php.net/manual/en/intro-whatis.php. Accessed June 6, 2018.
[20] X. Zhao, Y. Zhang, Y. Wu, K. Chen, J. Jiang, K. Li, "Liquid: A scalable deduplication file system for virtual machine images", Parallel and Distributed Systems, IEEE Transactions on, vol. 25, no. 5, pp. 1257-1266, May 2014.