Fast detection of transformed data leaks [mithun_p_c]

Submitted by:
JOSNA KRISHNA
S7 CSE
Roll No.: 35

•INTRODUCTION
•SENSITIVE DATA IN COMPANIES
•DATA LEAKAGE: HOW?
•DANGER
•TOWARDS SECURITY
•EXISTING SYSTEM
•PROPOSED SYSTEM
•INTO THE ALGORITHM
•CONCLUSION

DATA LEAKAGE:
Data leakage is the unauthorized transmission of sensitive data or information from within an organization to an external destination.

•Intellectual property
•Financial information
•Patient information
•Personal credit card data
•Other information, depending upon the business and the industry

•In the course of business, data must sometimes be handed over to trusted third parties for certain operations.
•These trusted third parties may themselves become points of data leakage.
•Data leakage mainly happens due to human error.

•A hospital may give patient records to researchers who are devising new treatments.
•A company may have partnerships with other companies that require sharing customer data.
•An enterprise may outsource its data processing, so data must be given to various other companies.

•The number of leaked sensitive data records has grown tenfold in recent years.
•The risk posed by accidental data leakage exceeds the risk posed by vulnerable software.
•Sensitive data leakage is more likely when the data is not protected by end-to-end encryption (for example, PGP: Pretty Good Privacy).

•Prevent direct access to clear-text sensitive data.
•Deploy a screening tool to:
-Scan computer file systems.
-Scan server storage.
-Inspect outbound network traffic.
•Data leak detection differs from antivirus (AV) and network intrusion detection systems (NIDS), which look for malicious code or attack signatures rather than for the organization's own sensitive data.

->New security requirements and
->Algorithmic challenges.
Algorithmic challenges:
-Data transformation
-Scalability
•Automata-based string matching cannot be used directly, because the leaked data may have been transformed rather than copied verbatim.

It is based on set intersection, an operation performed on two sets of n-grams: one from the content and one from the sensitive data.
The same method is also used to detect similarity in:
•Documents on the web.
•Shared malicious traffic patterns.
•Malware.
•E-mail spam.
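As a rough illustration of the set-intersection approach, the sketch below builds character n-gram sets for the sensitive data and for the content and reports the fraction of sensitive n-grams that also appear in the content. The function names, the use of character (rather than token) n-grams, n = 3, and normalizing by the size of the sensitive set are all assumptions made for this example; no specific product is claimed to work this way.

```python
from typing import Set


def ngrams(text: str, n: int) -> Set[str]:
    """All character n-grams (length-n substrings) of text, as a set."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}


def set_intersection_score(sensitive: str, content: str, n: int = 3) -> float:
    """Fraction of the sensitive data's n-grams that also occur in the content."""
    s = ngrams(sensitive, n)
    if not s:
        return 0.0
    return len(s & ngrams(content, n)) / len(s)


sensitive = "4111-1111-1111-1111"  # hypothetical sensitive record
print(set_intersection_score(sensitive, "card: 4111-1111-1111-1111"))  # 1.0
print(set_intersection_score(sensitive, "hello world"))                # 0.0
```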

•Symantec DLP
•Identity Finder
•Global Velocity
•GoCloud DLP, etc.

•Set intersection is orderless: the ordering of the shared n-grams is not analyzed.
•It generates false alerts when n is set to a small value.
•It cannot detect partial data leakage.
•It is therefore not an adequate method.
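The first two drawbacks can be seen in a few lines. The sketch below scores content by the fraction of sensitive token n-grams it contains; the function names and example strings are illustrative assumptions. Scattered, reordered pieces still earn a full score, and with n = 1 even the sensitive words in reverse order count as a complete match.

```python
from typing import List, Set, Tuple


def ngram_set(tokens: List[str], n: int) -> Set[Tuple[str, ...]]:
    """Token n-grams as a set: the order in which they occur is discarded."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap(sensitive: List[str], content: List[str], n: int) -> float:
    """Fraction of the sensitive n-grams found anywhere in the content."""
    s = ngram_set(sensitive, n)
    return len(s & ngram_set(content, n)) / len(s) if s else 0.0


secret = "alpha beta gamma delta".split()

# Order-less: the pieces are scattered and reordered, yet the score is perfect.
scrambled = "beta gamma xx gamma delta yy alpha beta".split()
print(overlap(secret, scrambled, 2))  # 1.0

# Small n: the same words in reverse order match fully at n = 1 ...
backwards = "delta gamma beta alpha".split()
print(overlap(secret, backwards, 1))  # 1.0
# ... but not at n = 2.
print(overlap(secret, backwards, 2))  # 0.0
```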

The proposed system is based on a sequence alignment algorithm.
It is executed on:
•The sampled sensitive data sequence.
•The sampled content being inspected.
The alignment measures the amount of sensitive data in the content, which gives higher accuracy.
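The slides do not spell out the alignment itself. As an assumption for illustration, the sketch below uses a plain Smith-Waterman local alignment, a standard dynamic-programming method, on unsampled strings; the match, mismatch, and gap scores are arbitrary choices. Unlike set intersection, the score depends on the order of the items, and a single modified character lowers it only slightly.

```python
from typing import Sequence


def local_alignment_score(a: Sequence, b: Sequence,
                          match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman local alignment score between sequences a and b."""
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1]
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # 0 lets a new local alignment start anywhere
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best


secret = "ACCOUNT 1234"  # hypothetical sensitive string (12 characters)
print(local_alignment_score(secret, "xx ACCOUNT 1234 yy"))  # 24: all 12 match
print(local_alignment_score(secret, "xx ACC0UNT 1234 yy"))  # 21: one substitution
```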

The scalability issue is solved by sampling both the sensitive data and the content sequence before aligning them.
A pair of algorithms is used:
•Comparable Sampling Algorithm
•Sampling Oblivious Alignment Algorithm
The approach achieves high detection specificity and handles both pervasive and localized modifications of the leaked data.

o The Comparable Sampling Algorithm yields consistent samples of a sequence regardless of where the sampling starts and ends.
o The Sampling Oblivious Alignment Algorithm uses dynamic programming to infer the similarity between the original, unsampled sequences from their samples.

•In this method, both the sensitive data and the content sequence are sampled.
•The alignment is performed on the sampled sequences.
•A "comparable sampling" property is used, so that the same data yields consistent samples.
•Both algorithms run faster on a GPU than on a CPU.
•This promises high-speed security scanning.

Requirements:
Definition 1: A substring is a consecutive
segment of the original string.
Definition 2: A subsequence keeps the order of its items but does not require them to be consecutive in the original string.
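The two definitions can be checked directly. The helper names below are illustrative, not taken from the source.

```python
from typing import Sequence


def is_substring(x: Sequence, y: Sequence) -> bool:
    """Definition 1: x occurs in y as one consecutive segment."""
    n = len(x)
    return any(list(y[i:i + n]) == list(x) for i in range(len(y) - n + 1))


def is_subsequence(x: Sequence, y: Sequence) -> bool:
    """Definition 2: the items of x occur in y in order, not necessarily adjacent."""
    it = iter(y)  # `item in it` consumes y up to the first match
    return all(item in it for item in x)


print(is_substring("bcd", "abcde"), is_subsequence("bcd", "abcde"))  # True True
print(is_substring("ace", "abcde"), is_subsequence("ace", "abcde"))  # False True
```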

Definition 3: Given that string x is a substring of y, comparable sampling on x and y yields x′ and y′ such that x′ is similar to a substring of y′.
Definition 4: Given that x is a substring of y, subsequence-preserving sampling on x and y yields two subsequences x′ and y′ such that x′ is a substring of y′.

•It is deterministic and subsequence-preserving.
•It is unbiased.
•It yields consistent samples of a sequence regardless of where the sampling starts and ends.

Input: an array S of items; a size |w| for a sliding window w; a selection function f(w, N) that selects the N smallest items from a window w, i.e., f = min(w, N)
Output: a sampled array T
initialize T as an empty array of size |S|
w ← read(S, |w|)
let w.head and w.tail be the indices in S corresponding to the higher-indexed end and the lower-indexed end of w, respectively
collection mc ← min(w, N)
while w is within the boundary of S do
    mp ← mc
    move w toward the high index by 1
    mc ← min(w, N)
    if mc ≠ mp then
        item en ← collectionDiff(mc, mp)
        item eo ← collectionDiff(mp, mc)
        if en < eo then
            write value en to T at w.head's position
        else
            write value eo to T at w.tail's position
        end if
    end if
end while

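We set up our sampling procedure with a sliding window of size 6 (i.e., |w| = 6) and N = 3. The input sequence is 1, 5, 1, 9, 8, 5, 3, 2, 4, 8. The initial window is w = [1, 5, 1, 9, 8, 5] and the collection mc = {1, 1, 5}. After one slide, w = [5, 1, 9, 8, 5, 3] and mc = {1, 3, 5}, so en = 3 and eo = 1.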
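The sketch below runs the example input through a simplified sampler in the same spirit. It is an assumption, not the algorithm above: it skips the incremental collectionDiff bookkeeping and simply keeps an item if it is among the N smallest items of at least one full window covering it, breaking ties by position so the result is deterministic. Its output is not claimed to match the authors' sampler.

```python
from typing import List, Optional, Sequence


def window_min_sample(seq: Sequence[int], w: int, n: int) -> List[Optional[int]]:
    """Simplified comparable-sampling sketch (not the exact algorithm above).

    Keeps seq[i] if it is among the n smallest items of at least one full
    sliding window of size w that covers position i; every other position
    becomes None. Ties are broken by position, so the output is deterministic.
    """
    w = min(w, len(seq))  # an input shorter than w is treated as one window
    out: List[Optional[int]] = [None] * len(seq)
    for start in range(len(seq) - w + 1):
        window = range(start, start + w)
        # indices of the n smallest items in this window
        for i in sorted(window, key=lambda k: (seq[k], k))[:n]:
            out[i] = seq[i]
    return out


S = [1, 5, 1, 9, 8, 5, 3, 2, 4, 8]
print(window_min_sample(S, 6, 3))
# [1, 5, 1, None, None, None, 3, 2, 4, None]

# For this input, sampling the substring S[2:] gives the matching slice of S's sample.
print(window_min_sample(S[2:], 6, 3) == window_min_sample(S, 6, 3)[2:])  # True
```

Recomputing every window from scratch keeps the sketch short but is slower than updating mc incrementally as the window slides, as the pseudocode does.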

•The complexity of the selection function is O(n log|w|) or O(n), where n is the size of the input and |w| is the size of the window.
•The O(log|w|) factor comes from maintaining the N smallest items within the window.
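The slide does not say when the O(n) bound applies. One case where linear time is certainly achievable is N = 1, a plain sliding-window minimum, using a monotonic deque in which each index is pushed and popped at most once; for general N, keeping the window's items in an ordered structure such as a balanced search tree gives the O(log|w|) cost per slide. The sketch covers only the N = 1 case and is an illustrative assumption, not the authors' implementation.

```python
from collections import deque
from typing import Deque, List, Sequence


def sliding_window_min(seq: Sequence[int], w: int) -> List[int]:
    """Minimum of every full window of size w, in O(n) total time (N = 1 case)."""
    dq: Deque[int] = deque()  # indices whose values increase from front to back
    mins: List[int] = []
    for i, x in enumerate(seq):
        # drop items that can never be a window minimum again
        while dq and seq[dq[-1]] >= x:
            dq.pop()
        dq.append(i)
        # drop the front index once it has slid out of the window
        if dq[0] <= i - w:
            dq.popleft()
        if i >= w - 1:
            mins.append(seq[dq[0]])
    return mins


print(sliding_window_min([1, 5, 1, 9, 8, 5, 3, 2, 4, 8], 6))  # [1, 1, 1, 2, 2]
```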

Requirements:
•The algorithm runs on the compact sampled sequences La and Lb.
•Extra fields in the scoring-matrix cells of the dynamic program.
•An extra step in the recurrence relation for updating the null region (a run of unsampled items).
•A complex weight function that computes the similarity between two null regions.

•Order-aware comparison
•High tolerance to pattern variation
•Capability of detecting partial leaks
•Consistent

Input: a weight function fw; the visited cells in the H matrix that are adjacent to H(i, j), namely H(i−1, j−1), H(i, j−1), and H(i−1, j); and the i-th and j-th items La[i] and Lb[j] in the two sampled sequences La and Lb, respectively.
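Only the input of the sampling oblivious alignment is listed here. The sketch below is a heavily simplified stand-in, not the authors' algorithm: it compacts each sampled array into (item, null-region length) pairs and runs a Smith-Waterman style local alignment whose weight function fw (hypothetical) rewards equal items and penalizes null regions of different lengths. It omits the extra per-cell fields and the null-region update step listed under the requirements; the scores and example arrays are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

Item = Tuple[int, int]  # (sampled value, length of the null region before it)


def compact(sampled: Sequence[Optional[int]]) -> List[Item]:
    """Collapse a sampled array into (item, preceding null-region length) pairs.
    A trailing null region is dropped."""
    out: List[Item] = []
    gap = 0
    for v in sampled:
        if v is None:
            gap += 1
        else:
            out.append((v, gap))
            gap = 0
    return out


def fw(a: Item, b: Item) -> float:
    """Hypothetical weight: reward equal items, penalize unequal null regions."""
    if a[0] != b[0]:
        return -1.0
    return 2.0 - 0.5 * abs(a[1] - b[1])


def sampled_alignment_score(la: List[Item], lb: List[Item], gap: float = -1.0) -> float:
    """Local (Smith-Waterman style) alignment over compacted sampled sequences."""
    H = [[0.0] * (len(lb) + 1) for _ in range(len(la) + 1)]
    best = 0.0
    for i in range(1, len(la) + 1):
        for j in range(1, len(lb) + 1):
            # H(i, j) is built from H(i-1, j-1), H(i, j-1) and H(i-1, j)
            H[i][j] = max(0.0,
                          H[i - 1][j - 1] + fw(la[i - 1], lb[j - 1]),
                          H[i][j - 1] + gap,
                          H[i - 1][j] + gap)
            best = max(best, H[i][j])
    return best


sensitive = [1, None, None, None, 3, 2, 4, None]  # None = unsampled position
content = [1, 5, 1, None, None, None, 3, 2, 4, None]
La, Lb = compact(sensitive), compact(content)
print(La)  # [(1, 0), (3, 3), (2, 0), (4, 0)]
print(sampled_alignment_score(La, Lb))  # 8.0
```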

•Presented here is a content inspection technique for detecting sensitive data leakage.
•The detection approach is based on aligning two sampled sequences for similarity comparison.
•The alignment method is useful in common data leak scenarios.
