Chapter 02 Overview (Python)

Uploaded by ps4yass3r
Copyright © All Rights Reserved

MIS 541
Introduction to Business Analytics
Rasha Alahmad

Chapter 2: Overview of the Data Mining Process
Core Ideas in Data Mining

o Classification
o Prediction
o Association Rules & Recommenders
o Data and Dimension Reduction
o Data Exploration and Visualization
Core Ideas in Data Mining
Classification

A common task in data mining is to examine data where the classification is unknown, with the goal of predicting what that classification is or will be.

o Email Filtering: Classifying an incoming email as "spam" or "not spam" based on its content and sender information.

o Medical Diagnosis: Classifying a patient’s condition as "healthy," "at risk," or "ill" based on their symptoms and test results.
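The following is a minimal sketch of classification. It assumes a 1-nearest-neighbor rule, which is only one of many possible classifiers; the slides do not prescribe one. The email features and labels are hypothetical toy values. Records whose class is known are used to assign a class to a new record whose class is unknown.

```python
import math

# Hypothetical training records: (feature vector, known class).
# Features are made-up counts: (suspicious words, links) in an email.
training = [
    ((8, 5), "spam"),
    ((7, 6), "spam"),
    ((1, 0), "not spam"),
    ((0, 1), "not spam"),
]

def classify(record, labeled):
    """Return the class of the closest labeled record (1-nearest neighbor)."""
    _, nearest_label = min(labeled, key=lambda item: math.dist(record, item[0]))
    return nearest_label

# A new email whose class is unknown.
print(classify((6, 4), training))  # spam
```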
Core Ideas in Data Mining
Prediction

Prediction is similar to classification, except that it estimates the value of a numerical variable (e.g., purchase amount) rather than a category (e.g., purchaser or non-purchaser).

o Stock Market: Predicting the future price of a stock based on historical data and market trends.

o Real Estate Pricing: Predicting the selling price of a house based on features such as location, size, and condition.
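The following is a minimal sketch of numerical prediction. It assumes a one-variable least-squares line, which is a simple form of linear regression. The house sizes and prices are hypothetical and were chosen to lie exactly on a line, so the fitted model is easy to check by hand.

```python
# Hypothetical (size in square feet, price in $1000s) observations.
sizes = [1000, 1500, 2000, 2500]
prices = [200, 275, 350, 425]

def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = fit_line(sizes, prices)

# Predict a numerical value (price) for a house of 1800 sq ft.
predicted = intercept + slope * 1800
print(round(predicted, 1))  # 320.0
```

With real data, the points would scatter around the line, and more predictors (such as location and condition) would usually be added.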
Core Ideas in Data Mining
Association Rules
Identify items that tend to occur together in large datasets.
o For example, grocery stores can use these patterns to determine which
products are frequently bought together, such as bread and cheese, to
optimize product placement.

Recommendation Systems
Analyze individual user preferences and behaviors to make
personalized suggestions.
o Amazon and Netflix use these systems to recommend products or
shows based on past interactions.

Association rules reveal patterns for the entire population.

Recommendation systems provide personalized "what goes with what" suggestions for each user.
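The slides do not give formulas for association rules. One standard way to quantify a rule such as "bread -> cheese" uses two measures. Support is the share of baskets containing both items. Confidence is the share of bread baskets that also contain cheese. The sketch below assumes those definitions and uses hypothetical baskets.

```python
# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "cheese", "milk"},
    {"bread", "cheese"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "cheese", "eggs"},
]

def count(itemset, transactions):
    """Number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions containing every item in itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

print(support({"bread", "cheese"}, baskets))       # 0.6
print(confidence({"bread"}, {"cheese"}, baskets))  # 0.75
```

These measures are computed over all baskets, so they describe the whole population. A recommender would instead tailor suggestions to one user's history.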
Core Ideas in Data Mining
Data and Dimension Reduction
The performance of data mining algorithms is often improved
when the number of variables is limited, and when large numbers
of records can be grouped into homogeneous groups.

o Rather than dealing with thousands of product types, an analyst may prefer to group them into a smaller number of groups and build separate models for each group.

This process of consolidating a large number of records into a smaller set is termed data reduction.

Methods for reducing the number of cases are often called clustering.
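The following is a minimal k-means sketch in plain Python, illustrating clustering. k-means is one common clustering method, used here because the slides do not name a specific one. The product records and the choice of starting centroids are hypothetical.

```python
import math

def k_means(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else c
            for members, c in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical products described by (price, weekly units sold).
products = [(1.0, 100), (1.2, 110), (0.9, 95), (20.0, 5), (22.0, 4), (19.0, 6)]

# Start from one cheap and one expensive product as initial centroids.
centroids, clusters = k_means(products, centroids=[products[0], products[3]])
print([len(c) for c in clusters])  # [3, 3]
```

Distances here are computed on raw values, so the units-sold column dominates. In practice, variables are often rescaled before clustering.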
Core Ideas in Data Mining
Data Exploration aims to understand the overall structure of the
data and identify unusual values.
o It is used for data cleaning and manipulation.

Methods for exploring data include looking at each variable separately as well as looking at relationships among variables.

Data Visualization
Exploring the data visually by creating charts and dashboards.
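The two kinds of exploration described above can be sketched in pandas. Summary statistics cover one variable at a time, and a correlation matrix covers relationships among variables. The data frame is hypothetical. Flagging values more than 1.5 times the IQR beyond the quartiles is a common convention assumed here; the slides do not specify it.

```python
import pandas as pd

# Hypothetical toy data; column names echo the housing example, values are made up.
df = pd.DataFrame({
    "LIVING_AREA": [1200, 1500, 1800, 2100, 9000],
    "TOTAL_VALUE": [300, 340, 390, 430, 455],
})

# One variable at a time: summary statistics help spot unusual values.
print(df["LIVING_AREA"].describe())

# Relationships among variables: pairwise correlation.
print(df.corr())

# Assumed rule of thumb: flag values beyond 1.5 * IQR from the quartiles.
q1, q3 = df["LIVING_AREA"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["LIVING_AREA"] < q1 - 1.5 * iqr) | (df["LIVING_AREA"] > q3 + 1.5 * iqr)]
print(outliers)
```

For charts, pandas plotting methods such as df.plot.scatter(x='LIVING_AREA', y='TOTAL_VALUE') can be used, provided matplotlib is installed.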
Steps in Data Mining
1. Develop an understanding of the purpose of your project
2. Obtain the dataset to be used in the analysis
3. Explore, clean, and preprocess the data
4. Reduce the data dimension, if necessary
5. Determine the data mining task (e.g., classification, prediction)
6. Choose the data mining techniques to be used (regression, neural nets, etc.)
7. Use algorithms to perform the task
8. Interpret the results of the algorithms
9. Deploy the model
Dataset: WestRoxbury.csv
14 variables (columns)
5803 records (rows)
Preliminary Exploration in Python
loading data, viewing it, summary statistics

Open Anaconda Navigator and launch a Jupyter notebook. It opens in a new browser window.

Import the pandas library:
import pandas as pd

Load data
housing_df = pd.read_csv('WestRoxbury.csv')
housing_df.shape   # number of rows and columns
housing_df.head()  # first five rows
print(housing_df)  # print the data frame (pandas truncates long output)
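WestRoxbury.csv may not be at hand, so the sketch below gives pd.read_csv an in-memory string through io.StringIO instead of a file path. The three column names are taken from the dataset, including their trailing spaces. The six rows of values are hypothetical.

```python
import io

import pandas as pd

# Hypothetical stand-in for WestRoxbury.csv: real column names, made-up values.
csv_text = """TOTAL VALUE ,LOT SQFT ,TAX
300,5000,3800
350,6000,4400
400,7000,5000
450,8000,5700
500,9000,6300
550,9500,6900
"""
housing_df = pd.read_csv(io.StringIO(csv_text))

print(housing_df.shape)          # (6, 3): rows, columns
print(housing_df.head())         # first five rows
print(list(housing_df.columns))  # header spaces are kept as-is
```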
Column Names (as stored in the file; some names end with a space)
'TOTAL VALUE ', 'TAX', 'LOT SQFT ', 'YR BUILT', 'GROSS AREA ', 'LIVING AREA', 'FLOORS ',
'ROOMS', 'BEDROOMS ', 'FULL BATH', 'HALF BATH', 'KITCHEN', 'FIREPLACE', 'REMODEL'
Data Exploration in Python, cont.
Rename columns: replace spaces with '_'
# explicit: rename a single column
housing_df = housing_df.rename(columns={'TOTAL VALUE ': 'TOTAL_VALUE'})

# all columns: strip surrounding spaces, then replace inner spaces with '_'
housing_df.columns = [s.strip().replace(' ', '_') for s in housing_df.columns]
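The sketch below applies both renaming approaches to a hypothetical empty frame whose headers copy the raw names, trailing spaces included. It also shows a pitfall: rename matches keys exactly, and by default a key that is not found is ignored without an error.

```python
import pandas as pd

# Hypothetical empty frame with headers copied from the raw file.
housing_df = pd.DataFrame(columns=["TOTAL VALUE ", "TAX", "LOT SQFT ", "YR BUILT"])

# Explicit: only the exact key 'TOTAL VALUE ' (with its trailing space) is renamed.
explicit = housing_df.rename(columns={"TOTAL VALUE ": "TOTAL_VALUE"})
print(list(explicit.columns))

# All columns at once: strip, then replace inner spaces with underscores.
housing_df.columns = [s.strip().replace(" ", "_") for s in housing_df.columns]
print(list(housing_df.columns))
```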
Show first four rows of the data
housing_df.loc[0:3]   # by label: the end label 3 is included
housing_df.iloc[0:4]  # by position: the end position 4 is excluded
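loc and iloc select the same four rows here only because the default index labels 0, 1, 2, ... coincide with positions. The hypothetical frame below first shows that case. It then shows the two diverging after a filter leaves gaps in the labels.

```python
import pandas as pd

# Hypothetical 6-row frame; the default index labels are 0..5.
df = pd.DataFrame({"TOTAL_VALUE": [10, 20, 30, 40, 50, 60]})

by_label = df.loc[0:3]      # labels 0..3, end label included -> 4 rows
by_position = df.iloc[0:4]  # positions 0..3, end position excluded -> 4 rows
print(by_label.equals(by_position))  # True

# After filtering, the remaining labels are 2, 3, 4, 5.
subset = df[df["TOTAL_VALUE"] > 20]
print(len(subset.loc[0:3]))   # 2: only labels 2 and 3 fall within 0..3
print(len(subset.iloc[0:4]))  # 4: the first four positions
```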
Data Exploration in Python, cont.
Show the first 10 values in column TOTAL_VALUE

housing_df['TOTAL_VALUE'].iloc[0:10]

Show the fifth row of the first 10 columns

housing_df.iloc[4][0:10]   # select row 4, then slice the resulting Series
housing_df.iloc[4, 0:10]   # same selection in a single indexing step
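The following small check, on a hypothetical 6 x 12 frame of integers, confirms that the two forms above select the same values.

```python
import pandas as pd

# Hypothetical frame: 6 rows, 12 columns named c0..c11; cell = row * 100 + column.
df = pd.DataFrame([[r * 100 + c for c in range(12)] for r in range(6)],
                  columns=[f"c{c}" for c in range(12)])

row_then_slice = df.iloc[4][0:10]  # fifth row as a Series, then its first 10 entries
single_call = df.iloc[4, 0:10]     # row and column positions in one call
print(row_then_slice.equals(single_call))  # True
```

The single-call form is usually preferred because it states both positions explicitly and avoids creating an intermediate Series.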
Data Exploration in Python, cont.
Use pd.concat to combine columns into a new
data frame.
pd.concat(
    [housing_df.iloc[4:6, 0:2],
     housing_df.iloc[4:6, 4:6]],
    axis=1)
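The following sketch uses a hypothetical integer frame. With axis=1, pd.concat places the pieces side by side and aligns them on the row index. The pieces on this slide share rows 4 and 5, so they line up; pieces with different row labels produce missing values (NaN).

```python
import pandas as pd

# Hypothetical frame: 8 rows, 6 columns named c0..c5.
df = pd.DataFrame([[r * 10 + c for c in range(6)] for r in range(8)],
                  columns=[f"c{c}" for c in range(6)])

# Same pattern as the slide: rows 4-5 of columns 0-1 next to rows 4-5 of columns 4-5.
combined = pd.concat([df.iloc[4:6, 0:2], df.iloc[4:6, 4:6]], axis=1)
print(combined)

# Pieces with different row labels (0-1 vs 2-3) are aligned, leaving NaN gaps.
misaligned = pd.concat([df.iloc[0:2, 0:1], df.iloc[2:4, 1:2]], axis=1)
print(misaligned)
```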

Show the first 10 rows of the first column

housing_df['TOTAL_VALUE'][0:10]
Data Exploration in Python, cont.
# Descriptive statistics

# show length of first column
print('Number of rows')
print(len(housing_df['TOTAL_VALUE']))

# show mean of column
print('Mean of TOTAL_VALUE')
print(housing_df['TOTAL_VALUE'].mean())
Show summary statistics for each numeric column
housing_df.describe()
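The following is a runnable version of the statistics above on a hypothetical four-row frame. It also shows that describe() summarizes only numeric columns by default. Passing include='all' adds non-numeric columns such as REMODEL.

```python
import pandas as pd

# Hypothetical mini version of the housing data (values are made up).
housing_df = pd.DataFrame({
    "TOTAL_VALUE": [300.0, 350.0, 400.0, 450.0],
    "ROOMS": [5, 6, 7, 8],
    "REMODEL": ["Recent", "Old", "Recent", "Old"],
})

print("Number of rows")
print(len(housing_df["TOTAL_VALUE"]))    # 4
print("Mean of TOTAL_VALUE")
print(housing_df["TOTAL_VALUE"].mean())  # 375.0

summary = housing_df.describe()          # numeric columns only by default
print(summary)
print(housing_df.describe(include="all"))  # also summarizes REMODEL
```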
