
Data Mining

This document discusses data mining and its application in crime pattern detection. It provides an overview of data mining, including what it is, how it works, common techniques like clustering and association rule learning. It then discusses how data mining can be used in criminal justice by analyzing crime databases and identifying patterns that may help solve crimes faster. As an example, it describes how k-means clustering can be applied to real crime data to group similar crimes together by location, helping detectives identify clusters of related criminal activity.

Uploaded by

Ranjitha Mada
Copyright
© Attribution Non-Commercial (BY-NC)

DATA MINING

Authors:

M. RANJITHA (08B81A0570), CSE B.Tech III Year
EMAIL: [email protected]
Phone no: 8374235238

B. SANGEETHA (08B81A0587), CSE B.Tech III Year
EMAIL: [email protected]
Phone no: 9912139698

CVR COLLEGE OF ENGINEERING


Vastunagar , Mangalpalli (V), Ibrahimpatan-501 510,
RR Dist., Andhra Pradesh.
ABSTRACT

Data mining can be used to model crime detection problems. Crimes are a social nuisance and cost our society dearly in several ways. Any research that can help in solving crimes faster will pay for itself. About 10% of the criminals commit about 50% of the crimes. Here we look at the use of a clustering algorithm in a data mining approach to help detect crime patterns and speed up the process of solving crime. We will look at k-means clustering with some enhancements to aid in the identification of crime patterns. We applied these techniques to real crime data from a sheriff's office and validated our results. We also use a semi-supervised learning technique for knowledge discovery from the crime records and to help increase the predictive accuracy. We also developed a weighting scheme for attributes to deal with the limitations of various out-of-the-box clustering tools and techniques. This easy-to-implement data mining framework works with the geospatial plot of crime and helps to improve the productivity of detectives and other law enforcement officers. It can also be applied to counter-terrorism for homeland security.

INTRODUCTION:

• What is data mining?

Generally, data mining (sometimes called data or knowledge discovery) is the process of analyzing data from different perspectives and summarizing it into useful information: information that can be used to increase revenue, cut costs, or both. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases.

• What can data mining do?

Data mining is primarily used today by companies with a strong consumer focus: retail, financial, communication, and marketing organizations. It enables these companies to determine relationships among "internal" factors such as price, product positioning, or staff skills, and "external" factors such as economic indicators, competition, and customer demographics. It also enables them to determine the impact on sales, customer satisfaction, and corporate profits. Finally, it enables them to "drill down" into summary information to view detailed transactional data. With data mining, a retailer could use point-of-sale records of customer purchases to send targeted promotions based on an individual's purchase history. By mining demographic data from comment or warranty cards, the retailer could develop products and promotions to appeal to specific customer segments.

• Data warehousing:

Dramatic advances in data capture, processing power, data transmission, and storage capabilities are enabling organizations to integrate their various databases into data warehouses. Data warehousing is defined as a process of centralized data management and retrieval. Data warehousing, like data mining, is a relatively new term although the concept itself has been around for years. Data warehousing represents an ideal vision of maintaining a central repository of all organizational data. Centralization of data is needed to maximize user access and analysis. Dramatic technological advances are making this vision a reality for many companies. And, equally dramatic advances in data analysis software are allowing users to access this data freely. The data analysis software is what supports data mining.

• How does data mining work?

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural networks. Generally, any of four types of relationships are sought:

• Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.

• Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.

• Associations: Data can be mined to identify associations. The beer-diaper example is an example of associative mining.

• Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

Applications:
  • Data Mining in Agriculture
  • Surveillance / Mass surveillance
  • National Security Agency
  • Quantitative structure-activity relationship
  • Customer analytics
  • Police-enforced ANPR in the UK
  • Stellar wind (code name)

Methods:
  • Association rule learning
  • Cluster analysis
  • Structured data analysis (statistics)
  • Java Data Mining
  • Data analysis
  • Predictive analytics
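
To make the idea of associative mining concrete, the following minimal Python sketch computes the two standard measures used in association rule learning, support and confidence, for a "diapers implies beer" rule. The baskets and the helper names support and confidence are hypothetical and chosen only for illustration; they do not come from any data set discussed in this paper.

```python
# Hypothetical market baskets (illustrative data only, not from this paper).
BASKETS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"diapers", "beer", "milk"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimate of P(consequent | antecedent): support(A and C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


sup = support({"diapers", "beer"}, BASKETS)
conf = confidence({"diapers"}, {"beer"}, BASKETS)
print(f"support={sup:.2f} confidence={conf:.2f}")
```

A rule is usually considered interesting only when both values exceed thresholds chosen by the analyst; those thresholds are a modeling decision and are not specified here.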
Now let us look at an application of data mining in practice. It gives a clear idea of how data mining is actually used and of the step-by-step procedure. We chose to discuss crime watch using data mining because it clearly shows how intelligently the problem has been tackled. The example is described below.

APPLICATION:

CRIME PATTERN DETECTION

1. Introduction

Historically, solving crimes has been the prerogative of criminal justice and law enforcement specialists. With the increasing use of computerized systems to track crimes, computer data analysts have started helping law enforcement officers and detectives to speed up the process of solving crimes. Here we will take an interdisciplinary approach between computer science and criminal justice to develop a data mining paradigm that can help solve crimes faster. More specifically, we will use clustering-based models to help in the identification of crime patterns [1].
We will discuss some terminology that is used in criminal justice and police departments and compare and contrast it with that of data mining systems. Suspect refers to the person who is believed to have committed the crime. The suspect may be identified or unidentified. The suspect is not a convict until proved guilty. The victim is the person who is the target of the crime. Most of the time the victim is identifiable and in most cases is the person reporting the crime. Additionally, the crime may have some witnesses.
There are other words commonly used, such as homicides, that refer to manslaughter or killing someone. Within homicides there may be categories like infanticide, eldercide, killing intimates and killing law enforcement officers. For the purposes of our modeling, we will not need to get into the depths of criminal justice but will confine ourselves to the main kinds of crimes.
Cluster (of crime) has a special meaning and refers to a geographical group of crime, i.e. a lot of crimes in a given geographical region. Such clusters can be visually represented using a geo-spatial plot of the crime overlaid on the map of the police jurisdiction. The densely populated group of crime is used to visually locate the 'hot-spots' of crime. However, when we talk of clustering from a data-mining standpoint, we refer to a group of similar kinds of crime, which is useful in identifying a crime pattern or a crime spree.
Some well-known examples of crime patterns are the DC sniper, a serial rapist or a serial killer. These crimes may involve a single suspect or may be committed by a group of suspects. The figure below shows the plot of geo-spatial clusters of crime.

Fig 1 Geo-spatial plot of crimes, each red dot represents a crime incident.

2. Crime Reporting Systems:

The data for crime often presents an interesting dilemma. While some data is kept confidential, some becomes public information. Data about prisoners can often be viewed on the county or sheriff's sites. However, data about crimes related to narcotics or juvenile cases is usually more restricted. Similarly, information about sex offenders is made public to warn others in the area, but the identity of the victim is often withheld. Thus, as a data miner, the analyst has to deal with all these public versus private data issues so that the data mining modeling process does not infringe on these legal boundaries.
Most sheriffs' offices and police departments use electronic systems for crime reporting that have replaced the traditional paper-based crime reports. These crime reports have the following kinds of information categories: type of crime, date/time, location etc. Then there is information about the suspect (identified or unidentified), the victim and the witness. Additionally, there is the narrative or description of the crime and the Modus Operandi (MO), which is usually in text form. Police officers or detectives use free text to record most of the observations that cannot be captured by checkbox-style pre-determined questions. While the first two categories of information are usually stored in computer databases as numeric, character or date fields of a table, the last one is often stored as free text.
The challenge in data mining crime data often comes from the free text field. While free text fields can give the newspaper columnist a great story line, converting them into data mining attributes is not always an easy job. We will look at how to arrive at the significant attributes for the data mining models.

3. Data Mining and Crime Patterns:

We will look at how to convert crime information into a data-mining problem [2], such that it can help detectives solve crimes faster. We have seen that in crime terminology a cluster is a group of crimes in a geographical region or a hot spot of crime, whereas in data mining terminology a cluster is a group of similar data points, that is, a possible crime pattern. Thus appropriate clusters or a subset of the cluster will have a one-to-one correspondence to crime patterns.
Thus clustering algorithms in data mining are equivalent to the task of identifying groups of records that are similar among themselves but different from the rest of the data. In our case some of these clusters will be useful for identifying a crime spree committed by one suspect or the same group of suspects. Given this information, the next challenge is to find the variables providing the best clustering. These clusters will then be presented to the detectives to drill down using their domain expertise. The automated detection of crime patterns allows the detectives to focus on crime sprees first, and solving one of these crimes results in solving the whole "spree". In some cases, if the groups of incidents are suspected to be one spree, the complete evidence can be built from the different bits of information from each of the crime incidents. For instance, if one crime site reveals that the suspect has black hair, the next incident or witness reveals that the suspect is middle-aged, and a third reveals that there is a tattoo on the left arm, together they give a much more complete picture than any one of them alone. Without a suspected crime pattern, the detective is less likely to build the complete picture from bits of information from different crime incidents. Today most of this is done manually with the help of multiple spreadsheet reports that the detectives usually get from the computer data analysts and their own crime logs.
We chose to use a clustering technique over any supervised technique such as classification, since crimes vary widely in nature and crime databases often contain several unsolved crimes. Therefore, a classification technique that relies on the existing and known solved crimes will not give good predictive quality for future crimes. Also, the nature of crimes changes over time; for example, Internet-based cyber crimes or crimes using cell phones were uncommon not too long ago. Thus, in order to be able to detect newer and unknown patterns in the future, clustering techniques work better.

4. Clustering Techniques Used:

We will look at some of our contributions to this area of study. We will show a simple clustering example here.
Let us take an oversimplified case of crime records. A crime data analyst or detective will use a report based on this data sorted in different orders; usually the first sort will be on the most important characteristic based on the detective's experience.

Crime Type   Suspect Race   Suspect Sex   Suspect Age   Victim Age   Weapon
Robbery      B              M             Middle        Elderly      Knife
Robbery      W              M             Young         Middle       Bat
Robbery      B              M             ?             Elderly      Knife
Robbery      B              F             Middle        Young        Pistol

Table 1 Simple Crime Example

We look at Table 1 with a simple example of a crime list.

The type of crime is robbery and it will be the most important attribute. Rows 1 and 3 show a simple crime pattern where the suspect description matches and the victim profile is also similar. The aim here is that we can use data mining to detect much more complex patterns, since in real life there are many attributes or factors for crime and often there is only partial information available about the crime. In a general case it will not be easy for a computer data analyst or detective to identify these patterns by simple querying.
Thus a clustering technique using data mining comes in handy to deal with enormous amounts of data and with noisy or missing data about the crime incidents.
We used the k-means clustering technique here, as it is one of the most widely used data mining clustering techniques.
Next, the most important part was to prepare the data for this analysis. The real crime data was obtained from a sheriff's office, under non-disclosure agreements, from the crime reporting system. The operational data was converted into denormalized data using extraction and transformation. Then, some checks were run to look at the quality of the data, such as missing data, outliers and multiple abbreviations for the same word; for example, blank and unknown both meant that the age of the person was missing. If these are not coded as one value, clustering will create multiple groups for the same logical value. The next task was to identify the significant attributes for the clustering. This process involved talking to domain experts such as the crime detectives and the crime data analysts, and iteratively running the attribute importance algorithm to arrive at the set of attributes for clustering the given crime types. We refer to this as the semi-supervised or expert-based paradigm of problem solving.
Based on the nature of the crime, different attributes become important. For example, the age
group of the victim is important for homicide, whereas for burglary it may not be as important, since the burglar may not care about the age of the owner of the house.
To take care of the different attributes for different crime types, we introduced the concept of weighting the attributes. This allows placing different weights on different attributes dynamically based on the crime types being clustered. This also allows us to weight the categorical attributes, and not just the numerical attributes, which can easily be scaled to weight them. Using integral weights, the categorical attributes can be replicated as redundant columns to increase the effective weight of that variable or feature. We have not seen the use of weights for clustering elsewhere in the literature review, as upon normalization all attributes assume equal importance in the clustering algorithm. However, we have introduced this weighting technique here in light of our semi-supervised or expert-based methodology. Based on our weighted clustering attributes, we cluster the dataset for crime patterns and then present the results to the detective or the domain expert along with the statistics of the important attributes.
The detective looks at the clusters, smallest clusters first, and then gives expert recommendations. This iterative process helps to determine the significant attributes and the weights for different crime types. Based on this information from the domain expert, namely the detective, future crime patterns can be detected. First, the future or unsolved crimes can be clustered based on the significant attributes and the result is given to detectives for inspection. Since this clustering exercise groups hundreds of crimes into a few small groups of related crimes, it makes it much easier for the detective to locate the crime patterns.
The other approach is to use a small set of new crime data and score it against the existing clusters using tracers, that is, known crime incidents injected into the new data set, and then compare the new clusters relative to the tracers. This process of using tracers is analogous to the use of radioactive tracers to locate something that is otherwise hard to find.

Figure 2 Plot of crime clusters with legend for significant attributes for that crime pattern

5. Results of Crime Pattern Analysis:

The proposed system is used along with the geo-spatial plot. The crime analyst may choose a time range and one or more types of crime from a certain geography and display
the result graphically. From this set, the user may select either the entire set or a region of interest. The resulting set of data becomes the input source for the data mining processing. These records are clustered based on the predetermined attributes and the weights. The resulting clusters contain the possible crime patterns. These clusters are plotted on the geo-spatial plot.
We show the results in Figure 2. The different clusters or crime patterns are color-coded. For each group, the legend provides the total number of crime incidents included in the group along with the significant attributes that characterize the group. This information is useful for the detective to look at when inspecting the predicted crime clusters.
We validated our results for the detected crime patterns by looking at the court dispositions on these crime incidents, as to whether the charges on the suspects were accepted or rejected. To recap, the starting point is the crime incident data (some of these crimes already had the court dispositions/rulings available in the system), which is measured in terms of the significant attributes or features or crime variables such as the demographics of the crime, the suspect, the victim etc. No information related to the court ruling was used in the clustering process. Subsequently, we cluster the crimes based on our weighting technique to come up with crime groups (clusters in data mining terminology), which contain the possible crime patterns of crime sprees. The geo-spatial plot of these crime patterns, along with the significant attributes to quantify these groups, is presented to the detectives, who now have a much easier task identifying the crime sprees than from a list of hundreds of crime incidents in unrelated orders or some predetermined sort order. In our case, we looked at the crime patterns, shown in the same colors in Figure 2, and looked at the court dispositions to verify that some of the data mining clusters or patterns were indeed crime sprees by the same culprit(s).

6. Conclusions and Future Direction:

We looked at the use of data mining for identifying crime patterns using clustering techniques. Our contribution here was to formulate crime pattern detection as a machine learning task and thereby to use data mining to support police detectives in solving crimes. We identified the significant attributes using an expert-based semi-supervised learning method and developed the scheme for weighting the significant attributes. Our modeling technique was able to identify the crime patterns from a large number of crimes, making the job of crime detectives easier.
Some of the limitations of our study include that crime pattern analysis can only help the detective, not replace them. Also, data mining is sensitive to the quality of input data, which may be inaccurate, have missing information, be prone to data entry errors etc. Also, mapping real data to data mining attributes is not always an easy task and often requires a skilled data miner and a crime data analyst with good domain knowledge. They need to work closely with a detective in the initial phases.
As a future extension of this study we will create models for predicting crime hot-spots [3] that will help in the deployment of police at the most likely places of crime for any given window of time, to allow the most effective utilization of police resources. We also plan to look into developing social link networks to link criminals, suspects and gangs and study their interrelationships. Additionally, the ability to search suspect descriptions in regional and FBI databases [4], traffic violation databases from different states etc., to aid crime pattern detection or more specifically counter-terrorism measures, will also add value to this crime detection paradigm.
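
As an illustration of the attribute weighting described in Section 4, the following minimal Python sketch clusters the four records of Table 1. It is not the implementation used in the study: the weights, the one-hot encoding, the treatment of "?" as a missing value that matches nothing, the farthest-first seeding and all function names are assumptions made for this example. Integer weights are applied by replicating the columns of a categorical attribute, as the text describes.

```python
# Records of Table 1: suspect race, suspect sex, suspect age, victim age, weapon.
ATTRS = ["race", "sex", "suspect_age", "victim_age", "weapon"]
CRIMES = [
    ("B", "M", "Middle", "Elderly", "Knife"),
    ("W", "M", "Young", "Middle", "Bat"),
    ("B", "M", "?", "Elderly", "Knife"),
    ("B", "F", "Middle", "Young", "Pistol"),
]
# Hypothetical integer weights (assumption): victim age and weapon count double.
WEIGHTS = {"victim_age": 2, "weapon": 2}


def is_missing(value):
    """Treat the different spellings of 'no data' as one logical value."""
    return value.strip().lower() in {"", "?", "blank", "unknown"}


def encode(rows, weights):
    """One-hot encode each attribute, replicating its columns `weight` times."""
    columns = []
    for i, name in enumerate(ATTRS):
        values = sorted({row[i] for row in rows if not is_missing(row[i])})
        for value in values:
            columns.extend([(i, value)] * weights.get(name, 1))
    return [[1.0 if row[i] == value else 0.0 for i, value in columns] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (assumption): start at the first record, then
    repeatedly add the record farthest from the seeds chosen so far."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(sq_dist(p, s) for s in seeds)))
    return [list(s) for s in seeds]


def kmeans(points, k, iterations=10):
    """Plain k-means: alternate assignment and centroid update steps."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


labels = kmeans(encode(CRIMES, WEIGHTS), k=3)
print(labels)
```

With these assumed weights, records 1 and 3 fall into the same cluster even though the suspect age is missing in record 3, which matches the pattern pointed out for Table 1. Changing the weights changes which attributes dominate the distance, which is where the detective's input would enter.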

7. References:
[1] Hsinchun Chen, Wingyan Chung, Yi Qin, Michael Chau, Jennifer Jie Xu, Gang Wang, Rong Zheng, Homa Atabakhsh, "Crime Data Mining: An Overview and Case Studies", AI Lab, University of Arizona, Proceedings of the National Conference on Digital Government Research, 2003, available at: http://ai.bpa.arizona.edu/

[2] Hsinchun Chen, Wingyan Chung, Yi Qin, Michael Chau, Jennifer Jie Xu, Gang Wang, Rong Zheng, Homa Atabakhsh, "Crime Data Mining: A General Framework and Some Examples", IEEE Computer Society, April 2007
