P3 - Data, Preprocessing, Information, & Analysis
Session 3
Just One Word: Data and Big Data
“I just want to say one word to you. Just one word… Are you listening? … Plastics.
There’s a great future in plastics.”
Mr. McGuire in the 1967 movie The Graduate
• No longer plastics
• But data
• Data is the key, the ticket, and the Holy Grail all rolled into one
• A collection of facts
• Numbers
• Text
• Images
• Sound
• By nature
• Qualitative
• Quantitative
• Measurement scale
• Nominal
• Ordinal
• Ratio
• Interval
• Source
• Primary
• Secondary
• Not only is the volume of data increasing, but so are its
variety and velocity
• Volume
• Velocity
• Variety
• Structured
• Semi-structured
• Unstructured
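To make the three kinds of variety concrete, here is a minimal Python sketch. The sample records (names, fields, and the review sentence) are invented for illustration and are not from the lecture.

```python
import csv
import io
import json

# Structured: fixed columns, every row follows the same schema.
structured_src = "id,name,age\n1,Ana,21\n2,Budi,23\n"
rows = list(csv.DictReader(io.StringIO(structured_src)))

# Semi-structured: self-describing keys, but fields may differ per record.
semi_src = '[{"id": 1, "name": "Ana", "tags": ["new"]}, {"id": 2, "name": "Budi"}]'
records = json.loads(semi_src)

# Unstructured: free text with no predefined fields; any structure
# (here, a simple word list) has to be derived by processing the text.
unstructured_src = "Ana said the delivery was late, but the product is great."
tokens = unstructured_src.lower().replace(",", "").replace(".", "").split()

print(rows[0]["name"], records[1].get("tags"), len(tokens))
```

Note how the second JSON record simply lacks the "tags" field, something a fixed-column table would not allow without an explicit empty value.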
Operational data typically contains a relatively short time span.
Analytical data is historical. A business needs to perform
period-over-period analysis or examine trends using historical data.
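Period-over-period analysis can be sketched as a percentage change between consecutive periods. The monthly figures and the function name period_over_period below are hypothetical, chosen only to illustrate the idea; the sketch assumes every previous-period value is non-zero.

```python
# Hypothetical monthly sales totals drawn from historical (analytical) data.
monthly_sales = {"2024-01": 1200.0, "2024-02": 1500.0, "2024-03": 1350.0}

def period_over_period(series):
    """Return the percentage change of each period relative to the previous one."""
    periods = sorted(series)  # "YYYY-MM" keys sort chronologically as strings
    changes = {}
    for prev, curr in zip(periods, periods[1:]):
        # Assumes series[prev] is non-zero; a zero baseline has no defined % change.
        changes[curr] = (series[curr] - series[prev]) / series[prev] * 100.0
    return changes

changes = period_over_period(monthly_sales)
for period, pct in changes.items():
    print(f"{period}: {pct:+.1f}%")
```

The first period has no entry in the result because it has no earlier period to compare against, which is one reason this kind of analysis needs a longer history than operational data usually keeps.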
• Data Cleaning
• Missing values: ignore the tuple, fill in the missing value
manually, fill with a global constant, fill with the attribute
mean, fill with the mean of the tuple's category, or fill with the
most probable value (regression, Bayesian inference, decision tree)
• Noisy data: apply smoothing techniques such as binning
(by bin means, medians, or boundaries), regression, or
clustering
• Data Integration
• Data Transformation
• Smoothing, Aggregation, Generalization,
Normalization, Attribute Construction
• Data Reduction
• Data Cube Aggregation, Attribute Subset Selection,
Dimensionality Reduction, Numerosity Reduction,
Discretization and Hierarchy Generation
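Three of the steps listed above can be sketched in a few lines of Python: filling missing values with the mean, smoothing by bin means, and normalization. The sample values and function names are hypothetical. The slide names only "Normalization", so min-max scaling is used here as one common choice, and the binning is equal-frequency over sorted values; other variants are equally valid.

```python
from statistics import mean

# Hypothetical attribute values; None marks a missing value.
ages = [21, None, 30, 27, None, 45, 39, 33, 29]

# Data cleaning, missing values: fill with the mean of the observed values.
observed = [v for v in ages if v is not None]
fill = mean(observed)
filled = [fill if v is None else v for v in ages]

def smooth_by_bin_means(values, bin_size):
    """Sort the values, split them into bins of bin_size, and replace
    each value with the mean of its bin (returned in sorted order)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        smoothed.extend([mean(bin_values)] * len(bin_values))
    return smoothed

def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly to [new_min, new_max].
    Assumes the values are not all identical (max > min)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

smoothed = smooth_by_bin_means(filled, bin_size=3)   # data cleaning: noisy data
normalized = min_max_normalize(filled)               # data transformation

print(filled)
print(smoothed)
print(normalized)
```

Mean filling keeps every tuple but pulls the distribution toward the center, whereas ignoring the tuple loses data; which trade-off is acceptable depends on how many values are missing.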