Chapter 02 Overview (Python)

Uploaded by

ps4yass3r


MIS 541
Introduction to Business Analytics
Rasha Alahmad
Chapter (2): Overview of the Data Mining Process
Core Ideas in Data Mining

o Classification
o Prediction
o Association Rules & Recommenders
o Data and Dimension Reduction
o Data Exploration and Visualization
Core Ideas in Data Mining
Classification

A common task in data mining is to examine data where the classification is unknown, with the goal of predicting what that classification is or will be.

o Email Filtering: Classifying an incoming email as "spam" or "not spam" based on its content and sender information.

o Medical Diagnosis: Classifying a patient’s condition as "healthy," "at risk," or "ill" based on their symptoms and test results.
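To make the idea concrete, here is a minimal sketch of a classifier, assuming a toy nearest-neighbor rule. The two features (number of links, number of exclamation marks), the labeled records, and the function name `classify` are all made up for illustration; they are not part of the course material.

```python
from math import dist

# Hypothetical labeled records: (links, exclamation marks) -> known class.
labeled = [
    ((0, 0), "not spam"),
    ((1, 0), "not spam"),
    ((5, 4), "spam"),
    ((7, 6), "spam"),
]

def classify(features):
    """Return the class of the closest labeled record (Euclidean distance)."""
    nearest = min(labeled, key=lambda record: dist(record[0], features))
    return nearest[1]

new_email = (6, 3)              # an email whose class is unknown
predicted = classify(new_email)
print(predicted)
```

The new email is assigned the class of the most similar record whose class is already known, which is the essence of classification: known outcomes are used to label unknown ones.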
Core Ideas in Data Mining
Prediction

Prediction estimates the value of a numerical variable (e.g., purchase amount) rather than a category (e.g., purchaser or non-purchaser).

o Stock Market: Predicting the future price of a stock based on historical data and market trends.

o Real Estate Pricing: Predicting the selling price of a house based on features such as location, size, and condition.
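As a rough illustration of predicting a numerical value, the sketch below fits a straight line by ordinary least squares to four made-up (size, price) pairs and uses it to estimate the price of a new house. The numbers and the helper name `predict_price` are assumptions chosen only for this example.

```python
# Hypothetical training data: living area (sq ft) and price (in $1000s).
sizes = [1000, 1500, 2000, 2500]
prices = [200, 290, 410, 500]

n = len(sizes)
mean_x = sum(sizes) / n
mean_y = sum(prices) / n

# Ordinary least squares for a single predictor.
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
sxx = sum((x - mean_x) ** 2 for x in sizes)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict_price(size):
    """Predict a numerical outcome (price) from one feature (size)."""
    return intercept + slope * size

estimate = predict_price(1800)
print(round(estimate, 1))
```

Unlike the classification case, the output here is a number on a continuous scale rather than one of a fixed set of labels.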
Core Ideas in Data Mining
Association Rules
Identify patterns between items in large datasets.
o For example, grocery stores can use these patterns to determine which
products are frequently bought together, such as bread and cheese, to
optimize product placement.

Recommendation Systems
Analyze individual user preferences and behaviors to make
personalized suggestions.
o Amazon and Netflix use these systems to recommend products or
shows based on past interactions.

Association rules reveal patterns for the entire population.

Recommendation systems provide personalized "what goes with
what" suggestions for each user.
Core Ideas in Data Mining
Data and Dimension Reduction
The performance of data mining algorithms is often improved
when the number of variables is limited, and when large numbers
of records can be grouped into homogeneous groups.

o Rather than dealing with thousands of product types, an analyst may prefer to group them into a smaller number of groups and build separate models for each group.

Reducing the number of variables is typically called dimension reduction.

Consolidating a large number of records (cases) into a smaller set is termed data reduction.

Methods for reducing the number of cases are often called clustering.
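One way to picture clustering is a one-dimensional k-means run, sketched below under simplifying assumptions: the six values are made up, the two starting centers are fixed by hand, and the function name `kmeans_1d` is hypothetical. It is an illustration of grouping records into homogeneous groups, not the specific method used in the course.

```python
def kmeans_1d(values, centers, max_iter=100):
    """Group values around the nearest center, then recompute the centers."""
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for v in values:
            # Assign each value to the closest current center.
            idx = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[idx].append(v)
        # New center = mean of its cluster (keep the old one if the cluster is empty).
        new_centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:   # assignments have stabilized
            break
        centers = new_centers
    return centers, clusters

# Hypothetical records, e.g. average purchase amounts of six customers.
amounts = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centers, clusters = kmeans_1d(amounts, centers=[1.0, 12.0])
print(centers)
print(clusters)
```

Six records are reduced to two groups, each of which can then be summarized or modeled separately.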
Core Ideas in Data Mining
Data Exploration aims to understand the overall structure of the
data and identify unusual values.
o It is used for data cleaning and manipulation.

Methods for exploring data include looking at each variable separately as well as looking at relationships among variables.

Data Visualization
Exploration by creating charts and dashboards.
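As a hedged example of spotting unusual values in a single variable, the sketch below applies the common 1.5 × IQR rule of thumb to made-up numbers. The rule and the data are assumptions for illustration; the slides do not prescribe a particular method.

```python
import statistics

# Hypothetical values of one variable, with one suspicious entry.
values = [12, 14, 15, 15, 16, 18, 19, 95]

# Quartiles using the statistics module's default method.
q1, median, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Flag anything more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
unusual = [v for v in values if v < lower_fence or v > upper_fence]
print(unusual)
```

Flagged values are candidates for a closer look during data cleaning; whether they are errors or genuine extremes still needs a judgment call.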
Steps in Data Mining
1. Develop an understanding of the purpose of your project
2. Obtain the dataset to be used in the analysis
3. Explore, clean, and preprocess the data
4. Reduce the data dimension, if necessary
5. Determine the data mining task (e.g., classification, prediction)
6. Choose the data mining techniques to be used (e.g., regression, neural nets)
7. Use algorithms to perform the task
8. Interpret the results of the algorithms
9. Deploy the model
Dataset: WestRoxbury.csv

14 variables (columns)

5803 records (rows)
Preliminary Exploration in Python
loading data, viewing it, summary statistics

Open Anaconda Navigator and launch a Jupyter notebook. It opens in a new browser window.

Import the pandas library

import pandas as pd

Load data
housing_df = pd.read_csv('WestRoxbury.csv')
housing_df.shape # find the number of rows and columns
housing_df.head() # show the first five rows
print(housing_df) # print the data frame (pandas abbreviates long output)
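The same calls can be tried without the data file. The sketch below reads a tiny in-memory CSV instead of WestRoxbury.csv; the three rows and their values are made up, and only three of the column names are used.

```python
import io
import pandas as pd

# Hypothetical stand-in for WestRoxbury.csv (values are invented).
# Note the trailing space inside the first two header names.
csv_text = (
    "TOTAL VALUE ,LOT SQFT ,TAX\n"
    "300.0,9000,3800\n"
    "410.5,6500,5200\n"
    "330.1,7500,4150\n"
)

sample_df = pd.read_csv(io.StringIO(csv_text))
n_rows, n_cols = sample_df.shape   # (rows, columns)
print(n_rows, n_cols)
print(sample_df.head(2))           # first two rows only
print(list(sample_df.columns))     # header names exactly as read
```

`read_csv` keeps header whitespace as it appears in the file, which is why column names can end with a space.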
Column Names (some names end with a space)
'TOTAL VALUE ', 'TAX', 'LOT SQFT ', 'YR BUILT',
'GROSS AREA ', 'LIVING AREA', 'FLOORS ', 'ROOMS',
'BEDROOMS ', 'FULL BATH', 'HALF BATH', 'KITCHEN',
'FIREPLACE', 'REMODEL'
Data Exploration in Python, cont.
Rename columns: replace spaces with '_'
# explicit: rename a single column
housing_df = housing_df.rename(columns={'TOTAL VALUE ': 'TOTAL_VALUE'})
# all columns: strip surrounding whitespace, then replace spaces with '_'
housing_df.columns = [s.strip().replace(' ', '_') for s in housing_df.columns]
Show first four rows of the data
housing_df.loc[0:3] # inclusive
housing_df.iloc[0:4] # exclusive
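A quick check of both ideas on a small made-up data frame (the values are hypothetical; only the column-name pattern follows the dataset): cleaning the column names, and the difference between label-based `loc` and position-based `iloc` slicing.

```python
import pandas as pd

# Hypothetical five-row frame with the same kind of messy column names.
df = pd.DataFrame({
    'TOTAL VALUE ': [300.0, 410.5, 330.1, 498.6, 331.5],
    'LOT SQFT ': [9000, 6500, 7500, 13700, 5000],
})

# Strip surrounding whitespace, then replace inner spaces with underscores.
df.columns = [s.strip().replace(' ', '_') for s in df.columns]
print(list(df.columns))

by_label = df.loc[0:3]      # labels 0 through 3, end label included
by_position = df.iloc[0:4]  # positions 0 through 3, end position excluded
print(len(by_label), len(by_position))
print(by_label.equals(by_position))
```

With the default integer index the two slices return the same four rows; they differ once the index labels are no longer 0, 1, 2, and so on.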
Data Exploration in Python, cont.
Show the first 10 values in column TOTAL_VALUE

housing_df['TOTAL_VALUE'].iloc[0:10]

Show the fifth row of the first 10 columns

housing_df.iloc[4][0:10] # select the row, then slice it
housing_df.iloc[4, 0:10] # select row and columns in one step
Data Exploration in Python, cont.
Use pd.concat to combine columns into a new data frame.
pd.concat([housing_df.iloc[4:6, 0:2],
           housing_df.iloc[4:6, 4:6]],
          axis=1)
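To see what this produces, the sketch below runs the same pattern on a made-up 6×6 frame whose column names (A to F) and values are chosen only so the result is easy to read.

```python
import pandas as pd

# Hypothetical 6x6 frame: the value in row r, column c is r*10 + c.
grid = pd.DataFrame(
    [[r * 10 + c for c in range(6)] for r in range(6)],
    columns=list('ABCDEF'),
)

# Rows 4-5 of columns 0-1, placed side by side with rows 4-5 of columns 4-5.
combined = pd.concat(
    [grid.iloc[4:6, 0:2], grid.iloc[4:6, 4:6]],
    axis=1,  # axis=1 joins column-wise, aligning on the row index
)
print(combined)
```

Because both pieces share the same row labels (4 and 5), `axis=1` lines them up into a single two-row, four-column frame.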

Show the first 10 rows of the first column

housing_df['TOTAL_VALUE'][0:10]
Data Exploration in Python, cont.
# Descriptive statistics

# show length of first column
print('Number of rows')
print(len(housing_df['TOTAL_VALUE']))
# show mean of column
print('Mean of TOTAL_VALUE')
print(housing_df['TOTAL_VALUE'].mean())
# show summary statistics for each column (numeric columns by default)
housing_df.describe()
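A minimal sketch of these calls on a made-up four-row frame (all values are hypothetical), which also shows that `describe()` summarizes only numeric columns unless told otherwise.

```python
import pandas as pd

# Hypothetical sample with two numeric columns and one text column.
df = pd.DataFrame({
    'TOTAL_VALUE': [300.0, 400.0, 350.0, 450.0],
    'ROOMS': [6, 7, 6, 9],
    'REMODEL': ['Recent', 'Old', 'Recent', 'Old'],
})

n_rows = len(df['TOTAL_VALUE'])        # number of rows
mean_value = df['TOTAL_VALUE'].mean()  # arithmetic mean of the column
summary = df.describe()                # count, mean, std, min, quartiles, max

print('Number of rows', n_rows)
print('Mean of TOTAL_VALUE', mean_value)
print(summary)
```

The text column REMODEL is left out of the summary; passing `include='all'` to `describe()` would add it.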
