CS 548 KNOWLEDGE DISCOVERY
AND DATA MINING
Fall 2016 - Project 1
By:
Yousef Fadila
ML Tlachac
Francisco Guerrero
Filling in the missing value
Discretize: ? = “unknown”
Manually filling in the data:
? = (Germany GDPPC + Switzerland GDPPC) / 2 = 31.35
Regression imputation:
GDPPC = 2.1069 * LIFE-EXP
      + 0.1911 * AC-S-ED
      - 40.4882 * (SWL = [175-200),[125-150),[200-225),[225-250),[250-275))
      - 16.6881 * (SWL = [200-225),[225-250),[250-275))
      - 100.3841
GDPPC (USA) = 2.1069 * 77.4 + 0.1911 * 94.6 - 40.4882 * 1 - 16.6881 * 1 - 100.3841 = 23.59
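The USA calculation above can be reproduced with a short sketch. The coefficients are copied from the slide; the function name and the two boolean parameters are hypothetical, and the sketch assumes each (SWL = ...) term acts as a 0/1 indicator that is 1 when the country's SWL interval is in the listed set.

```python
def impute_gdppc(life_exp, ac_s_ed, swl_in_first_set, swl_in_second_set):
    """Evaluate the slide's regression model for GDPPC.

    swl_in_first_set:  True if SWL is one of [175-200), [125-150),
                       [200-225), [225-250), [250-275)  (assumed indicator)
    swl_in_second_set: True if SWL is one of [200-225), [225-250),
                       [250-275)                         (assumed indicator)
    """
    return (2.1069 * life_exp
            + 0.1911 * ac_s_ed
            - 40.4882 * int(swl_in_first_set)
            - 16.6881 * int(swl_in_second_set)
            - 100.3841)


# Values used on the slide for the USA: LIFE-EXP = 77.4, AC-S-ED = 94.6,
# and both SWL indicators equal to 1.
usa_gdppc = impute_gdppc(77.4, 94.6, True, True)
print(round(usa_gdppc, 2))
```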
Transforming COUNTRY attribute
COUNTRY        HDI score
Ethiopia       LOW
India          MEDIUM
Mexico         HIGH
Thailand       HIGH
Russia         HIGH
USA            VERY-HIGH
Switzerland    VERY-HIGH
Germany        VERY-HIGH
Japan          VERY-HIGH
Canada         VERY-HIGH
Brazil         HIGH
France         VERY-HIGH
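As a minimal sketch, the transformation in the table can be written as a lookup that replaces each COUNTRY value with its HDI category. The dictionary values are taken from the table; the names HDI_LEVEL and country_to_hdi are hypothetical.

```python
# HDI category for each country, copied from the table on this slide.
HDI_LEVEL = {
    "Ethiopia": "LOW",
    "India": "MEDIUM",
    "Mexico": "HIGH",
    "Thailand": "HIGH",
    "Russia": "HIGH",
    "Brazil": "HIGH",
    "USA": "VERY-HIGH",
    "Switzerland": "VERY-HIGH",
    "Germany": "VERY-HIGH",
    "Japan": "VERY-HIGH",
    "Canada": "VERY-HIGH",
    "France": "VERY-HIGH",
}


def country_to_hdi(country):
    """Return the HDI category that replaces a COUNTRY value."""
    return HDI_LEVEL[country]


countries = ["Ethiopia", "India", "USA"]
print([country_to_hdi(c) for c in countries])
```

Collapsing twelve distinct country names into four ordered categories gives a nominal attribute with far fewer values to work with.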
Discretizing AC-S-ED
Equal width
Equal frequency
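The two discretization strategies named above can be sketched as follows. The sample values are hypothetical and are not the project's AC-S-ED data; the function names are also illustrative. The sketch assumes the values are not all identical (otherwise the equal-width bin width would be zero), and the equal-frequency version may place tied values in different bins.

```python
def equal_width_bins(values, k):
    """Assign each value to one of k bins that all span the same range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The maximum value would land in bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values, k):
    """Assign each value to one of k bins holding (roughly) equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


# Hypothetical sample, skewed toward the low end.
sample = [10.0, 20.0, 30.0, 40.0, 90.0, 100.0]
width_bins = equal_width_bins(sample, 3)
frequency_bins = equal_frequency_bins(sample, 3)
print(width_bins)
print(frequency_bins)
```

On skewed data the equal-width bins can end up with very different counts, while the equal-frequency bins keep the counts balanced at the cost of uneven interval lengths.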
CfsSubsetEval algorithm
Merit
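The CfsSubsetEval formula used to calculate merit is
\[
\text{merit} = \frac{\sum_j \mathrm{corr}(a_j, t)}{\sqrt{\sum_j \sigma(a_j)^2 + 2\,\mathrm{corr}(a_{j_1}, a_{j_2}) \prod_j \sigma(a_j)}}
\]
where t is the target attribute (play), and a_j are the selected attributes (outlook and humidity).
\[
\text{merit} = \frac{\mathrm{corr}(\text{outlook}, \text{play}) + \mathrm{corr}(\text{humidity}, \text{play})}{\sqrt{1^2 + 1^2 + 2\,\mathrm{corr}(\text{humidity}, \text{outlook})(1)(1)}}
= \frac{0.1960 + 0.1565}{\sqrt{1 + 1 + 2(0.01610)}}
= \frac{0.3525}{\sqrt{2.032202}}
= 0.2473
\]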
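The arithmetic above can be checked with a small sketch of the two-attribute case. The correlations are the rounded values shown on the slide, and both standard deviations are taken as 1, as in the slide's substitution; the function name is hypothetical.

```python
import math


def cfs_merit_two_attributes(corr_a1_t, corr_a2_t, corr_a1_a2,
                             sigma1=1.0, sigma2=1.0):
    """Merit of a two-attribute subset, following the slide's formula."""
    numerator = corr_a1_t + corr_a2_t
    denominator = math.sqrt(sigma1 ** 2 + sigma2 ** 2
                            + 2 * corr_a1_a2 * sigma1 * sigma2)
    return numerator / denominator


# corr(outlook, play), corr(humidity, play), corr(humidity, outlook)
merit = cfs_merit_two_attributes(0.1960, 0.1565, 0.01610)
print(round(merit, 4))
```

Because the numerator rewards correlation with the target and the denominator grows with correlation between the attributes, a subset scores well when its attributes are predictive but not redundant.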
Observing the Data
Correlation Matrix
Remove: numbUrban & medFamIncome
Multidimensional arrays and OLAP operations
Operations:
1. Roll-up time from day to year
2. Slice year == 2014
3. Roll-up patients from individual patients to all
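The three operations listed above can be sketched on a tiny fact table. The records, the measure (a count per patient per day), and all variable names are hypothetical; the slide does not specify the underlying data.

```python
from collections import defaultdict

# Hypothetical fact records: (patient, date as "YYYY-MM-DD", measure).
visits = [
    ("p1", "2013-05-02", 1),
    ("p1", "2014-01-10", 2),
    ("p2", "2014-01-10", 1),
    ("p2", "2014-07-21", 3),
]

# 1. Roll-up time from day to year: aggregate the measure per (patient, year).
by_patient_year = defaultdict(int)
for patient, day, measure in visits:
    year = int(day[:4])
    by_patient_year[(patient, year)] += measure

# 2. Slice year == 2014: fix the time dimension to a single value.
slice_2014 = {patient: total
              for (patient, year), total in by_patient_year.items()
              if year == 2014}

# 3. Roll-up patients from individual patients to all: sum over patients.
total_2014 = sum(slice_2014.values())

print(dict(by_patient_year))
print(slice_2014)
print(total_2014)
```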
OLAP operations on car sales data
1. Rolling-up
2. Drilling-down
3. Slicing
4. Dicing
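A minimal sketch of the four operations on car sales data follows. The records, the dimensions (year, quarter, model), and the measure (units sold) are hypothetical stand-ins, since the slide's actual data is not shown.

```python
from collections import defaultdict

# Hypothetical fact records: (year, quarter, model, units_sold).
sales = [
    (2015, "Q1", "sedan", 10),
    (2015, "Q2", "sedan", 12),
    (2015, "Q1", "suv", 7),
    (2016, "Q1", "sedan", 9),
    (2016, "Q1", "suv", 11),
]


def aggregate(records, key):
    """Sum units_sold over all records that share the same key."""
    totals = defaultdict(int)
    for record in records:
        totals[key(record)] += record[3]
    return dict(totals)


# Rolling-up: move from (year, quarter, model) to the coarser (year, model).
rolled_up = aggregate(sales, lambda r: (r[0], r[2]))

# Drilling-down: go back to the finer (year, quarter, model) level.
drilled_down = aggregate(sales, lambda r: (r[0], r[1], r[2]))

# Slicing: fix one dimension to a single value (year == 2015).
sliced = [r for r in sales if r[0] == 2015]

# Dicing: restrict several dimensions at once to get a sub-cube.
diced = [r for r in sales if r[1] == "Q1" and r[2] == "suv"]

print(rolled_up)
print(diced)
```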
Thank You
Questions?
