Proceedings of the International Conference on Artificial Intelligence and Smart Systems (ICAIS-2021)

IEEE Xplore Part Number: CFP21OAB-ART; ISBN: 978-1-7281-9537-7

CRIME TYPE AND OCCURRENCE PREDICTION USING MACHINE LEARNING ALGORITHM

Kanimozhi N (1), Keerthana N V (2), Pavithra G S (3), Ranjitha G (4), Yuvarani S (5)

(1) Assistant Professor, Kongu Engineering College
(2) Assistant Professor, Velalar College of Engineering and Technology
(3, 4, 5) UG Scholar, Kongu Engineering College

2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS) | 978-1-7281-9537-7/20/$31.00 ©2021 IEEE | DOI: 10.1109/ICAIS50930.2021.9395953

Abstract - Crime has become an evident source of trouble for people and society. A rising crime rate leads to an imbalance in the constituency of a country. To analyse and respond to such criminal activities ahead of time, it is necessary to understand crime patterns. This study presents one such crime pattern analysis, using crime data obtained from the Kaggle open-source repository to predict the most recently occurring crimes. The main aim of this project is to estimate which type of crime contributes the most, along with the time period and location where it happened. Machine learning algorithms such as Naïve Bayes are applied in this work to classify the various crime patterns, and the accuracy achieved was comparatively high when compared with previous works.

Keywords: Crime, Analyse, Crime patterns, Kaggle, Estimate, Naïve Bayes, Accuracy

I. Introduction

Crime has become a major threat that is considered to be growing in intensity. An action is said to be a crime when it violates a rule, goes against government laws and is highly offensive. Crime pattern analysis requires study of the different aspects of criminology and of how patterns are indicated. The Government has to spend a lot of time and work to apply technology to govern some of these criminal activities. Hence, the use of machine learning techniques and crime records is required to predict crime types and patterns. This approach uses existing crime data and predicts the crime type and its occurrence based on location and time. Researchers have undertaken many studies that help in analysing crime patterns along with their relations in a specific location. Analysing hotspots has become an easier way of classifying crime patterns, which helps officials resolve cases faster. This approach uses a dataset obtained from the Kaggle open-source repository, based on various factors along with the time and place at which crimes occur over a certain period. We applied a classification algorithm that helps in locating the type of crime and the hotspots of criminal actions that take place at a certain time and day. The proposed work applies machine learning algorithms to find matching criminal patterns, along with their category, from the given temporal and spatial data.

II. Literature Survey

Crimes are of different types and occur at different geographical locations. Many research scholars have suggested mechanisms to analyse the relationship between crime and social variables such as unemployment, income, level of education and so on.

Suhong Kim and Param Joshi [1] proposed two different machine learning models for prediction, the K nearest neighbour algorithm (KNN)


and the decision tree approach. The accuracy obtained ranges between 39 and 44 percent when predicting crime patterns and finding the crime type. Benjamin Fredrick David. H [2] applied a data mining technique that involves evaluating and inspecting large pre-existing datasets in order to deliver more information. The newly extracted patterns are cross-checked against the predefined datasets available.

Shraddha S. Kavathekar [3] used association rule mining in predicting crimes. Machine learning algorithms including Deep Neural Networks (DNN) and Artificial Neural Networks (ANN) have been applied. A deep neural network works more accurately using the feature-level dataset. Using a DNN, fully connected convolution layers have been used in building the prediction model, mainly for multi-labelled data classification. It was implemented using TensorFlow, an API mainly designed for deep learning, with dropout layers. These findings suggest that when there are many missing values, pre-processing is needed, because crimes do not occur uniformly but concentrate in particular areas. An Artificial Neural Network (ANN) is based on prognosis by trend analysis in solving problems. It comprises an enormous number of processing elements that work together in building a model. Chandy and Abraham [4] proposed a random forest classifier for extracting features for data processing using cloud computing. The extracted features are request number, user identification, expiry time, time of arrival and memory requirement. After feature extraction, workload prediction is done using the trained data obtained from the learning stage, which allows the details of the features extracted from the user's request to be learned.

Rohit Patil, Muzamil Kacchi, Pranali Gavali and Komal Pimparia [5] suggest an Apriori algorithm for frequent patterns, and the result obtained from K-means is used. Due to the increase in crime rate over recent years, the system has to handle an enormous amount of data, which takes a long time to analyse manually. Hence, advanced machine learning approaches like K-means clustering have been used. A literature survey on spatial and temporal hotspot prediction of crime [6] categorizes and evaluates crime hotspot detection techniques by location and time by performing a Systematic Literature Review (SLR). Fuzhan Nasiri, Zakikhani, Kimiya and Tarek Zayed [7] suggested a failure prediction model that helps in detecting corrosion in gas transmission pipelines. Most prediction models depend entirely on experimental test data or on limited historical data records. As a result, they ignore corrosion arising from varying geographical circumstances. Nikhil Dubey and Setu K. Chaturvedi [8] presented a pertinent analysis of data mining approaches for the detection of impending future crime. A computational mechanism to classify crime using machine learning techniques [9] proposed a flexible computational tool to analyse the crime rate in a country, which helps in classifying cybercrimes. Hyeon-Woo Kang and Hang-Bong Kang [10] suggested a fusion method based on a Deep Neural Network for predicting criminal activities from feature-level data with sufficient parameters.

III. Existing System

In previous work, the dataset obtained from the open source is first pre-processed to remove duplicated values and features. A decision tree has been used for finding crime patterns, and extracting features from a large amount of data is included. It provides a primary structure for the further classification process. Features of the classified crime patterns are extracted using a Deep Neural Network. Based on the prediction, the performance is calculated for both training and test values. Crime prediction helps in forecasting the future occurrence of any type


of criminal activities and helps officials resolve them at the earliest.

IV. Drawbacks

1. The pre-existing works achieve low accuracy, since the classifier uses categorical values, which produces a biased outcome for nominal attributes with greater values.
2. The classification techniques are not suited to regions with inappropriate data or to real-valued attributes.
3. The value of the classifier must be tuned, and hence an optimal value has to be assigned.

V. Proposed System

The data obtained is first pre-processed using the filter and wrapper machine learning techniques in order to remove irrelevant and repeated data values. This also reduces the dimensionality, so the data is cleaned. The data then undergoes a splitting process, in which it is divided into a training set and a test set. The model is trained on the training set and checked on the test set. This is followed by mapping. The crime type, year, month, time, date and place are mapped to integers to make classification easier. The independent effect between the attributes is analysed initially using Naïve Bayes. Bernoulli Naïve Bayes is used for classifying the independent features extracted. The crime features are labelled, which allows the occurrence of crime at a particular time and location to be analysed. Finally, the crime that occurs the most is obtained, along with its spatial and temporal information. The performance of the prediction model is found by calculating the accuracy rate. The prediction model is written in Python and run on Colab, an online compiler for data analysis and machine learning models.

VI. Advantages

1. The proposed algorithm is well suited to crime pattern detection, since most of the featured attributes depend on time and location.
2. It also overcomes the problem of analysing the independent effect of the attributes.
3. Initialization of an optimal value is not required, since the approach accounts for real-valued and nominal attributes and also handles regions with insufficient information.
4. The accuracy is relatively high when compared to other machine learning prediction models.

VII. Module Description

1. Data pre-processing
2. Mapping
3. Naïve Bayes classification
4. Crime prediction
5. Evaluation

A. Data Pre-Processing

Data obtained from the open source must first be pre-processed in order to eliminate unnecessary irregularities. The dataset chosen is for Denver city, with an enormous amount of crime data covering six years. The filter and wrapper machine learning techniques are applied to find missing values in the specified attributes. Data cleaning plays a vital role in training a prediction model and in the performance of the subsequent process.

Instances are filtered and irrelevant content is removed from the dataset. The filtering methods measure the significance of the features: the correlation with the dependent values is considered in the feature selection. The wrapper method measures how useful a feature subset is by actually training a prediction model on it. After pre-processing, the data is split into test and training sets.
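
As a rough illustration of the pre-processing steps described above, the following sketch removes duplicate records, applies a simple filter method (keeping features whose correlation with the target passes a threshold) and splits the cleaned data into training and test sets. It uses plain Python rather than whatever tooling the authors used. The column names, the toy records, the 0.1 threshold and the 75/25 split are all assumptions made for the example, and the wrapper step is not shown.

```python
import math
import random


def drop_duplicates(rows):
    """Remove exact duplicate records, keeping the first occurrence."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_features(rows, features, target, threshold=0.1):
    """Filter method: keep features whose |correlation| with the target
    is at least `threshold` (the threshold is an assumed value)."""
    ys = [r[target] for r in rows]
    return [f for f in features
            if abs(pearson([r[f] for r in rows], ys)) >= threshold]


def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle reproducibly and cut the rows into training and test parts."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical integer-coded records (not taken from the Denver dataset).
rows = [
    {"hour": 1, "month": 6, "is_traffic": 0, "crime": 0},
    {"hour": 2, "month": 3, "is_traffic": 0, "crime": 0},
    {"hour": 3, "month": 6, "is_traffic": 0, "crime": 0},
    {"hour": 4, "month": 3, "is_traffic": 0, "crime": 0},
    {"hour": 13, "month": 6, "is_traffic": 0, "crime": 1},
    {"hour": 14, "month": 3, "is_traffic": 0, "crime": 1},
    {"hour": 14, "month": 3, "is_traffic": 0, "crime": 1},  # repeated record
    {"hour": 20, "month": 6, "is_traffic": 0, "crime": 1},
    {"hour": 21, "month": 3, "is_traffic": 0, "crime": 1},
]

clean = drop_duplicates(rows)
selected = filter_features(clean, ["hour", "month", "is_traffic"], "crime")
train, test = train_test_split(clean)
print(len(clean), selected, len(train), len(test))
```

In this toy data the constant column and the column uncorrelated with the target are dropped, which is the dimensionality reduction the filter step is meant to provide. Correlation on integer codes is only meaningful for ordered attributes, so a real pipeline would likely need a different relevance score for nominal features.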


INCIDENT_ID   OFFENSE_ID         OFFENSE_CODE  OFFENSE_CODE_EXTENSION  OFFENSE_CATEGORY_ID
2018869789    2018869789239900   2399          0                       theft-other
202111218     202111218570700    5707          0                       criminal-trespassing
20176005213   20176005213239900  2399          1                       theft-bicycle
20196012240   20196012240230800  2308          0                       theft-from-bldg
2018861883    2018861883501600   5016          0                       violation-of-restraining-order

Table 1. Dataset Collection

FIRST_OCCURRENCE_DATE    LAST_OCCURRENCE_DATE     REPORTED_DATE
12/27/2018 3:58:00 PM    NIL                      12/27/2018 4:51:00 PM
01-06-2021 9.20.00 PM    NIL                      01-07-2021 12.23.00 AM
06-08-2017 1.15.00 PM    06-08-2017 5.15.00 PM    06-12-2017 8.44.00 AM
12-07-2019 1.07.00 PM    12-07-2019 6.30.00 PM    12-09-2019 1.35.00 PM
12/22/2018 8:15:00 PM    12/22/2018 8:31:00 PM    12/22/2018 10:00:00 PM

Table 2. Crime Dataset with occurrence date and time

NEIGHBORHOOD_ID               IS_CRIME  IS_TRAFFIC
montbello                     1         0
gateway-green-valley-ranch    1         0
wellshire                     1         0
belcaro                       1         0
cherry-creek                  1         0

Table 3. Neighbourhood dataset

B. Mapping

The crime features, such as the crime type and the date and time at which the crime occurred, are first segregated. They are then mapped to integers for easy labelling. The labelled details are further analysed and used in graph plotting. Python is chosen as the programming language for implementing the proposed work since it is well suited to machine learning. The matplotlib package is imported to plot graphs showing the occurrence of criminal activities. The crime that occurred the most can be plotted in the graph, which contributes to the further prediction process.
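
The mapping step can be sketched roughly as follows: each distinct nominal value receives an integer code, the occurrence timestamp is broken into month, hour and year, and the most frequent month is counted. The records below combine the first rows of Tables 1 to 3 on the assumption that the rows line up. The field names, the two timestamp formats (with the month assumed to come first) and the coding scheme starting at 1 are illustrative choices, not details taken from the paper.

```python
from collections import Counter
from datetime import datetime

# Two timestamp layouts appear in the sample rows; month-first is assumed.
FORMATS = ["%m/%d/%Y %I:%M:%S %p", "%m-%d-%Y %I.%M.%S %p"]


def parse_timestamp(text):
    """Try each known layout in turn and return the first that parses."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised timestamp: {text!r}")


def build_mapping(values):
    """Give each distinct value an integer code in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping) + 1
    return mapping


# Hypothetical records assembled from the sample table rows.
records = [
    ("montbello", "theft-other", "12/27/2018 3:58:00 PM"),
    ("gateway-green-valley-ranch", "criminal-trespassing", "01-06-2021 9.20.00 PM"),
    ("wellshire", "theft-bicycle", "06-08-2017 1.15.00 PM"),
    ("belcaro", "theft-from-bldg", "12-07-2019 1.07.00 PM"),
    ("cherry-creek", "violation-of-restraining-order", "12/22/2018 8:15:00 PM"),
]

area_codes = build_mapping(area for area, _, _ in records)
crime_codes = build_mapping(crime for _, crime, _ in records)

encoded = []
for area, crime, stamp in records:
    when = parse_timestamp(stamp)
    encoded.append({
        "area": area_codes[area],
        "crime": crime_codes[crime],
        "month": when.month,
        "hour": when.hour,   # 24-hour clock
        "year": when.year,
    })

# Frequency of each month; the most common entry is what a bar chart would highlight.
month_counts = Counter(row["month"] for row in encoded)
print(month_counts.most_common(1))
```

The same counts could be passed to a matplotlib bar chart to reproduce plots of the kind the paper describes; plotting is left out here so that the sketch depends only on the standard library.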


NEIGHBORHOOD_ID IS_CRIME
montbello 1
gateway-green-valley-ranch 2
wellshire 3
belcaro 2
cherry-creek 2

Table 4. Mapping crime type

NEIGHBORHOOD_ID IS_CRIME CRIME_OCCURENCE_MONTH


montbello 1 6
gateway-green-valley-ranch 2 10
wellshire 3 3
belcaro 2 1
cherry-creek 2 6

Table 5. Finding crime occurrence type and month in a dataset

CRIME_OCCURENCE_DAY CRIME_OCCURENCE_TIME CRIME_OCCURENCE_YEAR


3 6 3
3 3 4
5 5 3
2 5 5
4 5 4

Table 6. Finding crime occurrence day, time, year count in dataset

C. Naïve Bayes Classification

Naïve Bayes is applied because crime prediction is usually concerned with temporal and spatial data. The independent effect among the attribute values is analysed first, since the selected crime attributes are treated as independent of one another. They are used to create a model by training on crime data related to robbery, burglary, murder, sexual abuse, armed robbery, chain snatching, gang rape and highway robbery. Some of the extended techniques of Naïve Bayes have been applied:

1. Gaussian Naïve Bayes is used for real-valued attributes. It assumes a normal distribution, whose parameters are obtained by calculating the mean and standard deviation from the training data.
2. Multinomial Naïve Bayes is applied for multi-class classification over the categorical features in the training data.
3. Bernoulli Naïve Bayes is used to handle the independent effects of the selected attributes for crime prediction.

D. Crime Prediction

The expected crime type is predicted by extending the supported crime features. The features are then applied to nominal values. This can be explained clearly by taking a single tuple as an instance.

Considering a tuple:
1. {Gateway town, 20th October 2020, 2:30 PM, Friday} => {Larceny, a crime involving the theft of an individual's property}


Considering the probable occurrence based on each extracted feature:
1. {Gateway town} => {Theft has occurred}
2. {October} => {Theft has occurred}
3. {2020} => {Theft has occurred}
4. {2:30 PM} => {Theft has occurred}
5. {Friday} => {Theft has occurred}

The independent occurrences are formed and the conditional probability is calculated. By doing so, we can predict the crime type.

Usage of symbols:
1. m represents Month
2. t represents Time
3. a represents Area
4. d represents Day
5. y represents Year
6. c represents Type

The formula, using the chain rule, to find the conditional probability:

P(c|m, y, a, t, d) = [P(m|c, y, a, t, d) * P(y|c, a, t, d) * P(a|c, t, d) * P(t|d, c) * P(d|c) * P(c)] / [P(m|y, a, t, d) * P(y|a, t, d) * P(a|t, d) * P(t|d) * P(d)]

Fig 1. Plotting the highest crime type

Fig 2. Plotting the highest occurrence month

Fig 3. Plotting the highest occurrence time range
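
As an illustration of the conditional-probability calculation in section D, the sketch below implements a small categorical Naïve Bayes classifier in plain Python. Under the naive independence assumption the score for a crime type c reduces to P(c) multiplied by the per-feature likelihoods P(a|c), P(m|c), P(t|c) and P(d|c). The class name, the toy training tuples, the time-slot encoding and the add-one (Laplace) smoothing are assumptions introduced for the example; the paper itself names Gaussian, Multinomial and Bernoulli variants and does not give its implementation.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over nominal features with additive smoothing (a sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing strength; 1.0 is add-one smoothing

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_counts = Counter(y)
        self.n_rows = len(y)
        n_features = len(X[0])
        # value_counts[j][c][v]: training rows of class c whose feature j equals v
        self.value_counts = [defaultdict(Counter) for _ in range(n_features)]
        # number of distinct values seen for each feature
        self.n_values = [len({row[j] for row in X}) for j in range(n_features)]
        for row, c in zip(X, y):
            for j, value in enumerate(row):
                self.value_counts[j][c][value] += 1
        return self

    def log_scores(self, row):
        """Unnormalised log posterior: log P(c) + sum over j of log P(x_j | c)."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n_rows)
            for j, value in enumerate(row):
                num = self.value_counts[j][c][value] + self.alpha
                den = self.class_counts[c] + self.alpha * self.n_values[j]
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Hypothetical training tuples: (area, month, time slot, day) -> crime type.
X = [
    ("gateway", "Oct", "afternoon", "Fri"),
    ("gateway", "Oct", "afternoon", "Sat"),
    ("gateway", "Jun", "afternoon", "Fri"),
    ("montbello", "Jun", "night", "Mon"),
    ("montbello", "Mar", "night", "Tue"),
    ("belcaro", "Jun", "night", "Mon"),
]
y = ["larceny", "larceny", "larceny", "burglary", "burglary", "burglary"]

model = CategoricalNaiveBayes().fit(X, y)
prediction = model.predict(("gateway", "Oct", "afternoon", "Fri"))
print(prediction)
```

Smoothing keeps every likelihood strictly positive, so a feature value that never appeared with a given crime type lowers that type's score instead of forcing it to zero.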


Fig 4. Plotting the highest occurrence day

F. Evaluation

The performance of the prediction model is then evaluated to check that it achieves a high degree of accuracy compared with the pre-existing models. Training is done with cross-validation, which trains the model on different subsets of the training data and evaluates the accuracy over all the splits. In Python, to calculate the accuracy we need to pass arguments such as the model, the target set and cv, which specifies the number of splits. Finally, the mean and the standard deviation of the average precision are calculated. An accuracy of 93.07% has been achieved, which is a great increase compared to existing prediction models.

EVALUATION METRICS   CROSS VALIDATION
Accuracy             93.07%
Precision            92.53%
Recall               85.76%
F1 score             92.12%

Table 7. Performance measure for Naïve Bayes classifier

VIII. Conclusion

In this paper, the difficulty of dealing with nominal distributions and real-valued attributes is overcome by using two classifiers, Multinomial NB and Gaussian NB. The approach does not require much training time and is well suited for real-time predictions. It also overcomes the problem of working with a continuous target set of variables, which the existing work could not fit. Thus the crime that occurs the most can be predicted and spotted using Naïve Bayesian classification. The performance of the algorithm is also calculated using standard metrics: average precision, recall, F1 score and accuracy are the main concerns in evaluating the algorithm. The accuracy could be increased further by implementing other machine learning algorithms.

IX. Future Work

Though the proposed work overcomes the problems of the existing work, it has some limitations. When a class label is absent from the training data, the estimated probability will be zero. As a future extension of the proposed work, applying more machine learning classification models is expected to increase the accuracy of crime prediction and enhance the overall performance. A further improvement would be to take income information for neighbourhoods into consideration, in order to see whether there is any relationship between the income levels of a particular neighbourhood and its crime rate.

X. References

[1] Suhong Kim, Param Joshi, Parminder Singh Kalsi, Pooya Taheri, "Crime Analysis Through Machine Learning", IEEE Transactions, November 2018.
[2] Benjamin Fredrick David. H and A. Suruliandi, "Survey on Crime Analysis and Prediction using Data mining techniques", ICTACT Journal on Soft Computing, April 2012.
[3] Shruti S. Gosavi and Shraddha S. Kavathekar, "A Survey on Crime Occurrence Detection and prediction Techniques", International Journal of Management, Technology And Engineering, Volume 8, Issue XII, December 2018.
[4] Chandy, Abraham, "Smart resource usage prediction using cloud computing for massive data processing systems", Journal of Information Technology 1, no. 02 (2019): 108-118.
[5] Rohit Patil, Muzamil Kacchi, Pranali Gavali and Komal Pimparia, "Crime Pattern Detection, Analysis & Prediction using Machine Learning", International Research Journal of Engineering and Technology (IRJET), e-ISSN: 2395-0056, Volume: 07, Issue: 06, June 2020.
[6] Umair Muneer Butt, Sukumar Letchmunan, Fadratul Hafinaz Hassan, Mubashir Ali, Anees Baqir and Hafiz Husnain Raza Sherazi, "Spatio-Temporal Crime Hotspot Detection and Prediction: A Systematic Literature Review", IEEE Transactions, September 2020.
[7] Nasiri, Zakikhani, Kimiya and Tarek Zayed, "A failure prediction model for corrosion in gas transmission pipelines", Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability, (2020).
[8] Nikhil Dubey and Setu K. Chaturvedi, "A Survey Paper on Crime Prediction Technique Using Data Mining", Corpus ID: 7997627, 2014.
[9] Rupa Ch, Thippa Reddy Gadekallu, Mustufa Haider Abdi and Abdulrahman Al-Ahmari, "Computational System to Classify Cyber Crime Offenses using Machine Learning", Sustainability Journals, Volume 12, Issue 10, May 2020.
[10] Hyeon-Woo Kang and Hang-Bong Kang, "Prediction of crime occurrence from multi-modal data using deep learning", Peer-reviewed journal, April 2017.
