An Adaptive Algorithm for Detection of Duplicate Records
Presented by: Rama kanta Behera, IT200127207
Under the guidance of: Miss Ipsita Mishra

INTRODUCTION
A “records set” is a list of prior distinct records.
A new record is to be checked against the records set to determine whether it is a duplicate.
A database is a collection of related data.
Various algorithms address this problem, such as:
- Machine learning algorithms
- Learnable string similarity measures
- Adaptive algorithms

OBJECTIVES
- Reduce the cost of duplicate record detection.
- Achieve scalability of the detection procedure.
- Cache prior information about distinct records, so that retaining the prior records themselves becomes unnecessary for further searches.
- Keep the algorithm adaptive.

PREVALENT METHODS
The Brute Force Method
This method has a complexity on the order of the number of records in the records set, and it requires all prior records to be stored.
Method by Rail et al.
The comparison of a new record against the records set is reduced from a full-text match to a comparison of two integers.
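
The two prevalent approaches can be contrasted with a minimal sketch. The slide does not describe how the integer-based method derives its integers, so the use of `zlib.crc32` here is purely an assumption for illustration, and the function names are hypothetical.

```python
import zlib


def is_duplicate_brute_force(new_record: str, records_set: list[str]) -> bool:
    """Linear scan: full-text comparison against every stored prior record."""
    for prior in records_set:
        if prior == new_record:
            return True
    return False


def record_code(record: str) -> int:
    """Assumed integer code for a record (CRC-32 of its UTF-8 bytes)."""
    return zlib.crc32(record.encode("utf-8"))


def is_duplicate_by_code(new_record: str, prior_codes: list[int]) -> bool:
    """Each comparison is between two integers instead of two full texts."""
    new_code = record_code(new_record)
    return any(code == new_code for code in prior_codes)


records = ["alice,30", "bob,41"]
codes = [record_code(r) for r in records]
print(is_duplicate_brute_force("bob,41", records))  # True
print(is_duplicate_by_code("bob,41", codes))        # True
```

Both variants still visit every prior entry. With a 32-bit code, two distinct records can share a code, so a match in the second variant is only as reliable as the chosen hash.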

OUTLINE OF THE PROPOSED SOLUTION
The central idea behind the present algorithm rests on a fundamental property of prime numbers: a prime divides a product of primes only if it is one of the factors.
Fig: Hashing. f(x) maps the record set into the integer number space (I).
Fig: Extended hashing into prime space. f(x) maps the record set to integers (I), and g(x) maps those integers to prime numbers (P).
Fig: The complete algorithm. Records r1, r2, …, rn are mapped by f(x) to integers I1, I2, …, In and then by g(x) to primes P1, P2, …, Pn; their product P1 * P2 * … * Pn = Pprior.
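
A small worked example may make the role of the product concrete. The primes 2, 3 and 5 below are hypothetical values chosen for three prior records; the slides do not give specific numbers.

```python
from math import prod

# Hypothetical primes assigned to three prior records r1, r2, r3.
prior_primes = [2, 3, 5]
p_prior = prod(prior_primes)  # 2 * 3 * 5 = 30


def already_seen(p_new: int, p_prior: int) -> bool:
    """A prime divides the product exactly when it is one of its factors."""
    return p_prior % p_new == 0


print(already_seen(3, p_prior))  # True: 30 / 3 = 10, so 3 is a factor
print(already_seen(7, p_prior))  # False: 7 does not divide 30
```

Only the single integer `p_prior` has to be kept to answer the question; the list of prior primes appears here just to build it.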

REALIZATION OF THE ALGORITHM
Two functions, f(x) and g(x), must be realized to implement the algorithm:
- Realizing f(x)
- Realizing g(x)
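
The slides name f(x) and g(x) without specifying them, so the following is only one possible realization under stated assumptions: f(x) reduces a SHA-256 digest to an index below a hypothetical `NUM_SLOTS`, and g(x) looks that index up in a precomputed table of the first `NUM_SLOTS` primes.

```python
import hashlib

NUM_SLOTS = 1000  # hypothetical size of the integer space; not from the slides


def f(record: str, num_slots: int = NUM_SLOTS) -> int:
    """Assumed f(x): hash the record and reduce it to an index in [0, num_slots)."""
    digest = hashlib.sha256(record.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_slots


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using trial division by smaller primes."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


PRIME_TABLE = first_primes(NUM_SLOTS)


def g(index: int) -> int:
    """Assumed g(x): map index i to the (i + 1)-th prime, so distinct indices give distinct primes."""
    return PRIME_TABLE[index]


print(first_primes(5))  # [2, 3, 5, 7, 11]
print(g(0), g(4))       # 2 11
```

With this choice g is one-to-one on the indices, but f is not: two distinct records that land on the same index would receive the same prime and the second would be reported as a duplicate. The uniqueness the algorithm assumes therefore depends on how f(x) is actually realized.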

STEPS OF THE ALGORITHM
Step 1: For each new record, a hash is computed, yielding a unique hash value (Hnew) for each distinct record.
Step 2: Hnew is mapped to its corresponding unique prime (Pnew).
Step 3: Pprior is divided by Pnew. If Pnew divides Pprior exactly, the record corresponding to Pnew is a duplicate and is already represented in Pprior. Otherwise, the record is distinct.
Step 4: If the record is distinct, Pprior is multiplied by Pnew and the result is stored back in Pprior. Updating Pprior in this way is what renders the algorithm adaptive.
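
The four steps can be sketched as a small class. This is an illustration under assumptions, not the presenter's implementation: the class name, the `num_slots` parameter, and the default SHA-256 based hash are all hypothetical, and Python's arbitrary-precision integers stand in for whatever representation of Pprior the original work uses.

```python
import hashlib
from typing import Callable, Optional


def _sha256_int(record: str) -> int:
    """Default (assumed) hash: the SHA-256 digest read as an integer."""
    return int.from_bytes(hashlib.sha256(record.encode("utf-8")).digest(), "big")


def _first_primes(count: int) -> list[int]:
    """First `count` primes by trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes


class PrimeProductDetector:
    def __init__(self, num_slots: int = 512,
                 hash_fn: Optional[Callable[[str], int]] = None):
        self.primes = _first_primes(num_slots)
        self.hash_fn = hash_fn or _sha256_int
        self.p_prior = 1  # empty product: no records seen yet

    def check_and_add(self, record: str) -> bool:
        """Return True if the record is judged a duplicate, else absorb it."""
        h_new = self.hash_fn(record) % len(self.primes)  # Step 1
        p_new = self.primes[h_new]                       # Step 2
        if self.p_prior % p_new == 0:                    # Step 3
            return True
        self.p_prior *= p_new                            # Step 4
        return False


detector = PrimeProductDetector()
print(detector.check_and_add("alice,30"))  # False: first time seen
print(detector.check_and_add("alice,30"))  # True: its prime now divides Pprior
```

Because the hash is reduced modulo `num_slots` in this sketch, two different records can map to the same prime and be reported as duplicates; the algorithm as stated assumes a unique hash value per distinct record.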
Fig: Flowchart

IMPLEMENTATIONS
Three important implementation details need to be discussed:
- Size of the records set
- Use of logarithms
- Subsets of the records set
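
The slide lists these details without elaborating on them. As one possible illustration of why the size of the records set and logarithms are related, the sketch below uses the identity log2(P1 * … * Pn) = log2(P1) + … + log2(Pn) to estimate how many bits Pprior needs. The function name and sample primes are hypothetical, and this is not necessarily how the original work uses logarithms.

```python
import math


def estimated_bits(primes: list[int]) -> float:
    """Estimate log2 of the product of `primes` by summing their logarithms."""
    return sum(math.log2(p) for p in primes)


primes = [2, 3, 5, 7]
p_prior = math.prod(primes)  # 210

print(round(estimated_bits(primes), 2))  # 7.71
print(p_prior.bit_length())              # 8
```

The estimate grows with every distinct record added, which suggests why the size of the records set matters for storing Pprior.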

CONCLUSION
A new approach to handling duplicate records is presented. It combines concepts from number theory and algorithmics to solve the frequently encountered problem of “duplicate record detection”.
THANK YOU !!!
