
Hydroinformatics
• The synergetic use of modelling tools and ICT within a single methodological approach, dealing with the physical, social and economic aspects of sustainable water resources engineering.

[Figure: Hydroinformatics draws on hydrology and hydraulic engineering, environmental engineering, computer modeling, satellite imagery, radar, instrumentation, ANN, and management and ICT.]

Branches of Hydroinformatics
• Big Data Management (gathering, processing, transferring, archiving)
• Computational Hydraulics (classical numerical methods: FDM, FEM, BEM, …)
• Remote Sensing (RS) and Geographic Information Systems (GIS)
• Information communication (via the internet)
• Soft Computing / Computational Intelligence (see next slide)

Soft Computing
• Unlike hard computing schemes, which strive for exactness and full truth, soft computing techniques exploit the given tolerance for imprecision, partial truth, uncertainty and approximation in a particular problem.
• Inductive reasoning plays a larger role in soft computing than in hard computing. In effect, the role model for soft computing is the human mind.

Components of Soft Computing
• Machine Learning & Artificial Intelligence methods, e.g. Artificial Neural Networks (ANNs), Support Vector Machines (SVM), …
• Evolutionary & metaheuristic algorithms (nature-inspired methods), e.g. Genetic Algorithm (GA), Ant Colony Optimization
• Fuzzy Logic (FL)
• Hybrid methods, e.g. ANFIS, Genetic Programming

Data
[Diagram: data are either Discrete or Continuous; discrete attributes include Binary and Nominal, and binary attributes can be Symmetrical or Asymmetrical.]
Data mining
Knowledge tree:
1. Data Cleaning
2. Data Integration
3. Data Selection
4. Data Transformation
5. Data Mining
6. Pattern Evaluation
7. Knowledge Presentation

What is Data mining?
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information.

Data processing

What is Data Processing?
The collection and manipulation of items of data to produce meaningful information.

[Diagram: data processing divides into Pre-Processing and Post-Processing.]

Data processing continued

Data Preprocessing
• Why preprocess the data?
• Data cleaning
• Data integration and transformation
• Data reduction

Data processing continued

Why Data Preprocessing?
• Data in the real world is dirty
– Incomplete: lacking attribute values, lacking certain attributes of interest, or containing only aggregate data
– Noisy: containing errors or outliers
– Inconsistent: containing discrepancies in codes or names
• No quality data, no quality mining results!
– Quality decisions must be based on quality data
– A data warehouse needs consistent integration of quality data

Data processing continued

Data Cleaning
Data cleaning tasks
– Fill in missing values
– Identify outliers and smooth out noisy data
– Correct inconsistent data

Data processing continued

Missing Data
• Data is not always available
– E.g., many tuples have no recorded value for several attributes, such as customer income in sales data
• Missing data may be due to
– Equipment malfunction
– Inconsistency with other recorded data (and hence deletion)
– Data not entered due to misunderstanding
– Certain data not being considered important at the time of entry
– History or changes of the data not being registered
• Missing data may need to be inferred.
Data processing continued

How to Handle Noisy Data?
• Binning method:
– First sort the data and partition them into (equi-depth) bins
– Then smooth by bin means, bin medians, bin boundaries, etc.
• Clustering
– Detect and remove outliers
• Combined computer and human inspection
– Detect suspicious values and have a human check them
• Regression
– Smooth by fitting the data to regression functions
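As a sketch of the binning method, the following MATLAB fragment smooths a sorted vector by equi-depth bin means (the data vector and bin count are illustrative, not from the slides):

% Smooth a data vector by equi-depth (equal-frequency) bin means.
x = [4 8 9 15 21 21 24 25 26 28 29 34];   % example data
nbins = 3;                                 % number of equi-depth bins
xs = sort(x);                              % the method sorts first
n = numel(xs);
binsize = floor(n / nbins);                % points per bin (assumes n divisible by nbins)
smoothed = zeros(size(xs));
for k = 1:nbins
    idx = (k-1)*binsize + 1 : k*binsize;   % indices of the k-th bin
    smoothed(idx) = mean(xs(idx));         % replace each value by its bin mean
end
disp(smoothed)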
Data processing continued

Data Integration
• Data integration:
– Combines data from multiple sources into a coherent store
• Schema integration:
– Integrate metadata from different sources
• Detecting and resolving data value conflicts:
– For the same real world entity, attribute values from different
sources are different
– Possible reasons: different representations, different scales, e.g.,
metric vs. British units

Data processing continued

Data Transformation
• Smoothing: remove noise from the data
• Aggregation: summarization, data cube construction
• Generalization: concept hierarchy climbing
• Normalization: scale values to fall within a small, specified range
– Min-max normalization: $X_{normal} = \dfrac{X - X_{min}}{X_{max} - X_{min}}\,(a - b) + b$
– Z-score normalization: $z = \dfrac{X - \bar{X}}{\sigma}$
– Normalization by decimal scaling
• When the data contain zeros or negative values, the series can be shifted before transforming: $X_T = T(x_t + 1)$
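A minimal MATLAB sketch of the two scalings (illustrative data; note this version maps X_min to a and X_max to b):

% Min-max normalization to [a, b] and z-score standardization.
x = [2.1 4.7 3.3 9.8 5.5];                              % example data
a = 0; b = 1;                                            % target range
xmm = (x - min(x)) ./ (max(x) - min(x)) * (b - a) + a;   % min-max scaling
xz  = (x - mean(x)) ./ std(x);                           % z-score: zero mean, unit std
disp(xmm); disp(xz)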
Data processing continued

Data Reduction Strategies
• A warehouse may store terabytes of data: complex data analysis/mining may take a very long time to run on the complete data set
• Data reduction
– Obtains a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results
• Data reduction strategies
– Data cube aggregation
– Dimensionality reduction
– Numerosity reduction
– Discretization and concept hierarchy generation
Probability and Statistics

Recommended references:
• Benjamin, J.R. and Cornell, C.A. (1970). Probability, Statistics, and Decisions for Civil Engineers.
• Kottegoda, N.T. Applied Statistics for Civil and Environmental Engineers.
Probability and Statistics

Why should we learn statistics and probability?
From satellites continuously orbiting the globe to common social network sites, data are being collected everywhere and all the time. Knowledge of statistics provides you with the necessary tools to extract information intelligently from this sea of data.

What is a MODEL, and what are its types?
[Diagram: a model can be Physical or Formal (Mathematical). Formal models are either Deterministic or Stochastic (probability and statistics). Deterministic models range from process-based (white box) through conceptual to black box, e.g. ANN.]
Probability and Statistics continued

What is probability?
• The quality or state of being probable; the extent to which something is likely to happen or be the case, measured by the ratio of favourable cases to the whole number of cases possible:
$$P(A) = \frac{\text{Number of successful outcomes}}{\text{Number of possible outcomes}}$$
Probability and Statistics continued

Statistical Parameters
• Arithmetic Mean: the average of a set of numerical values, calculated by adding them together and dividing by the number of terms in the set.
• Weighted Arithmetic Mean: similar to the ordinary arithmetic mean, except that instead of each data point contributing equally to the final average, some data points contribute more than others.
• Median: the value separating the higher half of a data sample from the lower half.
• Mode: the number that appears most often in a set of numbers.
• Variance: the expectation of the squared deviation of a random variable from its mean.
• Standard Deviation: a measure used to quantify the amount of variation or dispersion of a set of data values.
• Coefficient of Variation: a standardized measure of dispersion of a probability distribution or frequency distribution.
• Skewness: a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean.
• Covariance: a measure of the joint variability of two random variables.
• Correlation Coefficient: a number that quantifies the degree of statistical relationship (correlation and dependence) between two or more values.
Probability and Statistics continued

• Arithmetic Mean: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

• Weighted Arithmetic Mean: $\bar{x}_w = \dfrac{\sum_i w_i x_i}{\sum_i w_i}$

• Covariance: $\operatorname{cov}(x, y) = \sigma_{x,y} = \dfrac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y})$
Probability and Statistics continued

• Variance: $\sigma^2 = \dfrac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$

• Standard Deviation: $\sigma = \sqrt{\sigma^2}$

(Hey! Remember the general moment formula: $M_r^a = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$)

• Coefficient of Variation: $CV = \dfrac{\sigma}{\bar{x}}$

• Skewness (third central moment): $\dfrac{1}{n}\sum_i (x_i - \bar{x})^3$

• Correlation Coefficient: $\rho = \dfrac{\operatorname{cov}(x, y)}{\sigma_x \sigma_y}$
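These parameters map directly onto built-in MATLAB functions (a minimal sketch; skewness requires the Statistics and Machine Learning Toolbox, and the sample values reuse the monthly discharges of a later example):

% Descriptive statistics of a sample in MATLAB.
x  = [1.5 4 2.5 1.5 2.5 4 1.5 2 2.5 1.5 2.5 1.5];  % monthly discharges (m^3/s)
m  = mean(x);         % arithmetic mean
v  = var(x, 1);       % population variance (1/n normalization)
s  = std(x, 1);       % population standard deviation
cv = s / m;           % coefficient of variation
sk = skewness(x);     % skewness
fprintf('mean=%.3f var=%.3f std=%.3f CV=%.3f skew=%.3f\n', m, v, s, cv, sk)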
Probability and Statistics continued

Probability density function
• The PDF is a function whose value at any given sample (point) in the sample space (the set of possible values taken by the random variable) can be interpreted as providing a relative likelihood that the value of the random variable would equal that sample.

• Some properties of a probability density function:
1. $f(x) \geq 0$ for any $x$
2. The total area under $f(x)$ is 1: $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Probability and Statistics continued

Cumulative Distribution Function
• A function whose value is the probability that a corresponding continuous random variable has a value less than or equal to the argument of the function:
$$F(x) = P(X \leq x)$$
• In other words, the cumulative distribution function $F(x)$ is given by the shaded area under $f(x)$ to the left of $x$, and
$$P(a < X \leq b) = F(b) - F(a) = \int_a^b f(u)\,du$$
Probability and Statistics continued

Moments in statistics
In our field:
• The first moment (r = 1) is commonly used to determine the centroid of an area
• The second central moment (r = 2) is the variance
• The third central moment (r = 3) gives the skewness
• The fourth central moment (r = 4) gives the kurtosis

$$M_r^a = \frac{\int_{-\infty}^{\infty} (x - a)^r f(x)\,dx}{\int_{-\infty}^{\infty} f(x)\,dx}$$

When $f(x)$ is a normalized density ($\int f(x)\,dx = 1$), this reduces to
$$M_r^a = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$$
Probability and Statistics continued

Example: monthly discharges

Month:     1    2    3    4    5    6    7    8    9   10   11   12
Q (m³/s): 1.5  4.0  2.5  1.5  2.5  4.0  1.5  2.0  2.5  1.5  2.5  1.5

$$\bar{x} = \frac{\sum x}{n} = \frac{27.5}{12} \approx 2.29$$

Equivalently, using relative frequencies $f(x)$:
$$\bar{x} = \sum x f(x) = \tfrac{5}{12}(1.5) + \tfrac{1}{12}(2) + \tfrac{4}{12}(2.5) + \tfrac{2}{12}(4) \approx 2.29$$
Probability and Statistics continued

Correlation coefficient
• Let X and Y be jointly distributed random variables. The correlation between X and Y is
$$\rho = \operatorname{Corr}(X, Y) = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$
• It measures the relative strength of the linear relationship between the two variables.
• It is unit-less and ranges between -1 and 1:
– The closer to -1, the stronger the negative linear relationship
– The closer to 1, the stronger the positive linear relationship
– The closer to 0, the weaker any linear relationship
Probability and Statistics continued

Scatter plots of data with various correlation coefficients
[Figure: scatter plots of computed discharge $\hat{Q}$ against measured discharge $Q$, illustrating correlation coefficients from strongly negative through zero to strongly positive, plus one non-linear pattern.]

When the relationship is not linear (as in the last plot), the correlation coefficient is not informative; instead we use the determination coefficient
$$DC = 1 - \frac{\sum_i (Q_i - \hat{Q}_i)^2}{\sum_i (Q_i - \bar{Q})^2}$$
where $\hat{Q}$ is the computed and $Q$ the measured discharge.
Probability and Statistics continued

• Normalization: in database theory, normalization is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity. A different approach for probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment. In data preprocessing, normalization rescales values into a specified range $[a, b]$:
$$X_{normal} = \frac{X - X_{min}}{X_{max} - X_{min}}\,(a - b) + b$$

• Standardization: a standardized variable (sometimes called a z-score or a standard score) is a variable that has been rescaled to have a mean of zero and a standard deviation of one:
$$z = \frac{X - \bar{X}}{\sigma}$$
Densities associated with multiple variables

1) $\sum_{i=1}^{n}\sum_{j=1}^{m} P(x_i, y_j) = 1$
2) $F(x_k, y_l) = \sum_{i=1}^{k}\sum_{j=1}^{l} P(x_i, y_j)$
3) $P(x_i) = \sum_{j=1}^{m} P(x_i, y_j)$
4) $P(y_j) = \sum_{i=1}^{n} P(x_i, y_j)$
5) $F(x_k) = \sum_{i=1}^{k}\sum_{j=1}^{m} P(x_i, y_j)$
6) $F(y_l) = \sum_{i=1}^{n}\sum_{j=1}^{l} P(x_i, y_j)$
7) $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1$
8) $F(x, y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(u, v)\,du\,dv$
9) $f(x) = \int_{-\infty}^{\infty} f(x, y)\,dy$
10) $F(x) = \int_{-\infty}^{x} f(u)\,du$
11) $\mu_{(r,s)} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^r y^s f(x, y)\,dy\,dx$
12) $\sigma_{(x,y)} = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \mu_x)(y - \mu_y) f(x, y)\,dy\,dx = \operatorname{cov}(x, y)$
13) $\rho = \dfrac{\sigma_{xy}}{\sigma_x \sigma_y}$
14) $DC = 1 - \dfrac{\sum (Q_i - \hat{Q}_i)^2}{\sum (Q_i - \bar{Q}_i)^2}$
Probability and Statistics continued

Example (joint distribution of discrete X and Y):
$$P(x = 1.5,\ y = 10) = 0.1$$
$$P(x = 0.5) = 0.05 + 0.05 = 0.1$$
$$P(y = 15) = 0.05 + 0.15 + 0.05 = 0.25$$
$$\bar{x} = \sum x f(x) = (0.5)(0.1) + (1)(0.35) + (1.5)(0.4) + (2)(0.15) = 1.3$$
$$\bar{y} = \sum y f(y) = (5)(0.2) + (10)(0.35) + (15)(0.25) + (20)(0.2) = 12.25$$
$$S_x^2 = \sum (x - \bar{x})^2 f(x) = (0.5 - 1.3)^2(0.1) + (1 - 1.3)^2(0.35) + (1.5 - 1.3)^2(0.4) + (2 - 1.3)^2(0.15) = 0.185$$
$$S_y^2 = \sum (y - \bar{y})^2 f(y) = (5 - 12.25)^2(0.2) + (10 - 12.25)^2(0.35) + (15 - 12.25)^2(0.25) + (20 - 12.25)^2(0.2) = 26.19$$
$$\sigma_{xy} = \sum (x - \bar{x})(y - \bar{y}) f(x, y) = (0.5 - 1.3)(5 - 12.25)(0.05) + (0.5 - 1.3)(10 - 12.25)(0.05) + (1 - 1.3)(5 - 12.25)(0.1) + (1 - 1.3)(10 - 12.25)(0.2) + \ldots = 1.45$$
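This bookkeeping is easy to script. In the MATLAB sketch below, the joint matrix P is a reconstruction chosen to be consistent with the marginals, the quoted entries and the covariance above (the original table image was lost), so treat it as illustrative:

% Marginals, means, variances and covariance of a discrete joint distribution.
xv = [0.5 1 1.5 2];                 % values of X
yv = [5 10 15 20];                  % values of Y
P  = [0.05 0.05 0    0   ;          % reconstructed joint probabilities P(x_i, y_j)
      0.10 0.20 0.05 0   ;          % rows: x-values, columns: y-values
      0.05 0.10 0.15 0.10;
      0    0    0.05 0.10];         % entries sum to 1
fx = sum(P, 2)';                    % marginal f(x) -> [0.1 0.35 0.4 0.15]
fy = sum(P, 1);                     % marginal f(y) -> [0.2 0.35 0.25 0.2]
mx = sum(xv .* fx);                 % mean of X  -> 1.3
my = sum(yv .* fy);                 % mean of Y  -> 12.25
vx = sum((xv - mx).^2 .* fx);       % variance of X -> 0.185
vy = sum((yv - my).^2 .* fy);       % variance of Y -> 26.19
sxy = sum(sum((xv' - mx) .* (yv - my) .* P));  % covariance -> 1.45
rho = sxy / sqrt(vx * vy)           % correlation coefficient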
Probability and Statistics continued

Linear regression
• In regression, one variable is considered the independent (predictor) variable X and the other the dependent (outcome) variable Y.
• Estimating the intercept and slope by least squares, for $\hat{y} = \beta x + \alpha$:
$$\beta = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2}, \qquad \alpha = \frac{1}{n}\left(\sum y_i - \beta \sum x_i\right)$$
• For two predictors, $y = \alpha + \beta_1 x_1 + \beta_2 x_2$, the normal equations are:
$$\sum y = n\alpha + \beta_1 \sum x_1 + \beta_2 \sum x_2$$
$$\sum y x_1 = \alpha \sum x_1 + \beta_1 \sum x_1^2 + \beta_2 \sum x_1 x_2$$
$$\sum y x_2 = \alpha \sum x_2 + \beta_1 \sum x_1 x_2 + \beta_2 \sum x_2^2$$
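A minimal MATLAB sketch of least-squares estimation (made-up data); the backslash operator solves the normal equations directly, and regress(Y, X) from the Statistics Toolbox returns the same coefficients:

% Fit y = alpha + beta1*x1 + beta2*x2 by least squares.
x1 = [1 2 3 4 5]';  x2 = [2 1 4 3 6]';   % example predictors
y  = [3.1 3.9 7.2 7.8 11.5]';            % example response
X  = [ones(size(x1)) x1 x2];             % design matrix with an intercept column
coef = X \ y;                            % [alpha; beta1; beta2]
yhat = X * coef;                         % fitted values
DC = 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);   % determination coefficient
fprintf('alpha=%.3f beta1=%.3f beta2=%.3f DC=%.3f\n', coef(1), coef(2), coef(3), DC)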
Probability and Statistics continued

Example
• 1. According to studies on different basins, the mean outflow discharge of each sub-basin (Q) is related to its area (A) and its number of rainy days per year (N):
$$10Q = \alpha A^{\beta_1} N^{\beta_2}$$
Taking base-10 logarithms (so $\log 10 = 1$) yields a linear equation:
$$1 + \log Q = \log\alpha + \beta_1 \log A + \beta_2 \log N$$
If $x_1 = 1 + \log Q$, $x_2 = \log A$, $x_3 = \log N$ and $M = \log\alpha$, this becomes $x_1 = M + \beta_1 x_2 + \beta_2 x_3$. Determine $M$, $\beta_1$ and $\beta_2$ (and the corresponding correlation coefficient) from the sub-basin data given in Table 1, using linear regression.
• 2. The maximum instantaneous discharge of a river from 1926 to 1951 is given in Table 2. First, determine the average, variance, skewness, PDF and CDF of this time series. Second, fit exponential, normal and Pearson probability distributions, find which one fits best, and then calculate the flood discharge for the next 5, 10 and 50 years.
Some common MATLAB commands:
• Operators and constants: + - * / \ ^, pi (% begins a comment)
• help, doc (documentation for a function)
• format long / format short
• factorial(x), sqrt(x), exp(x)
• sin, cos, tan, cot, asin, acos, atan, acot
• X = [1 2 3; 4 5 6], X = 1:10
• X = zeros(n,m), X = ones(n,m), X = eye(n), X = linspace(a,b,n)
• X = rand(n,m), X = randn(n,m), X = normrnd(M,S,m,n)
• X = R' (transpose), Y = reshape(X,n,m), sort(X)
• size(X), length(X), numel(X), who
• D = det(X), X = A.*B (element-wise product), sum(X), max(X), min(X)
• mean(X), geomean(X), median(X), mode(X)
• var(X), std(X), skewness(X), kurtosis(X), corrcoef(X,Y), regress(Y,X)
Artificial Neural Networks (ANN)

Contents
• Introduction
• History of Artificial Neural Networks (ANN)
• Overview of ANN
• Applications of ANN
Introduction
• The Artificial Neural Network (ANN), or Neural Network (NN), provides an exciting alternative method for solving a variety of problems in different fields of science and engineering.
• This presentation covers:
– The whole idea behind ANN
– The origin of ANN
– The mathematical concepts of ANN
– An outline of some applications of ANN in water resources engineering
Origin of ANN
• The human brain has many incredible characteristics, such as massive parallelism, distributed representation and computation, learning ability, generalization ability and adaptivity, which seem simple but are really complicated.
• It has been a long dream of computer scientists to build a computer that can solve complex perceptual problems this fast.
• ANN models were the result of such efforts to apply the same method the human brain uses.
What are Neural Networks?
In machine learning and cognitive science:
• Models inspired by biological neural networks
• A way of estimating unknown functions that depend on a large number of inputs
What is a model?

• Mathematical model: e.g., the Bernoulli equation, the continuity equation
• Physical model: e.g., a surcharge modeled in the laboratory
Models
• Mathematical
– Distributed (White Box): based on physics
– Conceptual
– Lumped (Black Box): e.g., linear regression
• Physical
As a mathematical model, an ANN is:
• A non-linear regression
• A black-box model
Machine Learning
• An AI technique with the ability to learn implicitly
• Changes when exposed to a new dataset
• Searches through data to find a pattern
Basis for learning in the brain
• Neural networks exhibit plasticity:
– Long-term changes in the strength of their connections in response to the stimulation pattern
– The capability of forming new connections with other neurons
Learning
• What does it mean?
• What is its source?
• How does this process happen?
Two aspects are involved: the learning paradigm and the learning algorithm.
Learning Paradigm
• Supervised Learning
– The correct answer is provided to the network for every input pattern.
– Weights are adjusted with regard to the correct answer.
– Example: the Feed-Forward Neural Network (FFNN)
• Unsupervised Learning
– Does not need the correct output.
– The system itself recognizes correlations and organizes patterns into categories accordingly.
– Example: clustering
Learning Algorithm
• Error correction rules
• Boltzmann learning
• Hebbian learning
• Competitive learning
Structure of a neuron
[Figure: a biological neuron with labels for the data input (the dendrites), the data processor (the cell body), the part transferring the input signal to the output (the axon), and the joint points between neurons (the synapses).]
History of Artificial Neural Networks (ANN)
1943: McCulloch and Pitts, a simple artificial model of the neuron (Threshold Logic)
[Figure: inputs x1, x2, …, xn with weights W1, W2, …, Wn feed a summing neuron with threshold b, producing output Y.]
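In equation form, the weighted inputs are summed and compared with the threshold (a sketch of the threshold-logic rule; the boundary convention varies):
$$Y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} W_i x_i \geq b \\ 0 & \text{otherwise} \end{cases}$$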
History of Artificial Neural Networks (ANN)
1943: McCulloch and Pitts, simple artificial model of the neuron (Threshold Logic)
1958: Rosenblatt, the Perceptron algorithm
1975: Werbos, Back-Propagation
Connection Patterns
• Feed-Forward
• Recurrent

Overview of ANN
[Figures: example feed-forward and recurrent network topologies.]
Feed-Forward: the Multi-Layered Perceptron (MLP)
[Figures: MLP architecture.]
Step-by-step construction of an MLP (FFNN)
[Figures: the network is built up layer by layer.]
Activation functions

Function             Formula
Hard Limiter         $f(x) = 0$ for $x < 0$; $f(x) = 1$ for $x \geq 0$
Sigmoidal            $f(x) = \dfrac{1}{1 + e^{-x}}$
Hyperbolic Tangent   $f(x) = \dfrac{e^{2x} - 1}{e^{2x} + 1}$
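All three are one-liners in MATLAB (an illustrative sketch; the hyperbolic-tangent form equals MATLAB's built-in tanh):

% Common ANN activation functions as anonymous functions.
hardlimiter = @(x) double(x >= 0);                    % 0 for x < 0, 1 for x >= 0
sigmoid     = @(x) 1 ./ (1 + exp(-x));                % logistic sigmoid
tanhact     = @(x) (exp(2*x) - 1) ./ (exp(2*x) + 1);  % equals tanh(x)
x = -5:0.1:5;
plot(x, hardlimiter(x), x, sigmoid(x), x, tanhact(x))
legend('hard limiter', 'sigmoidal', 'hyperbolic tangent')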
What must be done to build a model?
• Step 1: Which parameters are related to our target? E.g., in modelling runoff we might consider precipitation, temperature and evaporation.
• Step 2: Data gathering. We need to consider both quality and quantity.
• Step 3: Data preprocessing, with processes like normalizing the data: $\dfrac{x_i - x_{min}}{x_{max} - x_{min}}$
• Step 4: Construction, training and verifying the model. The model must have an acceptable DC in both calibration and verification; a sensitivity analysis can be done at this stage.
More notes worth mentioning on "Data gathering"
• Quantity: the more data, the better, but it must be economical! Most of the time we don't gather the data ourselves, but if we do, there must be a trade-off between quantity and price.
• Quality: the data must be heterogeneous; the sample must cover the whole data domain.
Overfitting (overlearning)
What does overlearning mean?
• The model fits the training data very precisely and loses its general nature.
• The model has a high DC at the training stage but a pretty low DC in verification.

In what conditions does it happen?
• Low diversity of the data
• An unsuitable number of hidden-layer neurons or training epochs
We need to determine: the weights and the biases.

[Figure: training combines Gradient Descent with Back-Propagation.]
Steps to modeling an ANN
• Defining the input and output data: they must be defined as matrices.
• Defining the percentages of training, validation and test data: usually 70% for training, 15% for validation and 15% for testing.
• Defining the initial number of hidden-layer neurons: it should be a little more than the number of input-layer neurons (an "egg-shaped" network), and it is chosen by trial and error.
• Choosing the training algorithm: mostly we use Levenberg-Marquardt, which contains GD (gradient descent), AL, and BP (back-propagation).

A minimal code sketch of these steps follows.
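A minimal MATLAB sketch of these four steps, assuming the Neural Network (Deep Learning) Toolbox and illustrative input/output matrices in which columns are samples:

% Build, train and verify a small MLP.
X = rand(3, 100);                    % inputs: 3 parameters x 100 samples (toy data)
Y = sum(X).^2;                       % outputs: 1 x 100 (toy target)
net = feedforwardnet(5);             % one hidden layer; neuron count set by trial and error
net.divideParam.trainRatio = 0.70;   % 70% training
net.divideParam.valRatio   = 0.15;   % 15% validation
net.divideParam.testRatio  = 0.15;   % 15% test
net.trainFcn = 'trainlm';            % Levenberg-Marquardt training algorithm
[net, tr] = train(net, X, Y);        % train the network
Yhat = net(X);                       % simulate
DC = 1 - sum((Y - Yhat).^2) / sum((Y - mean(Y)).^2)   % determination coefficient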
Advantages
• Can implicitly detect complex non-linear relationships between independent and dependent variables
• Can detect all possible interactions between predictor variables
• Can be developed using different training algorithms

Disadvantages
• Neural networks are black boxes and have a limited ability to explicitly identify possible causal relationships
• Require large data sets
• Prone to overfitting
Using ANN in water resources engineering
[Figures: application examples.]