5. Relationship Between Variables
Scatter plot
Two-way cross tabulation
Scatter plot
• A scatter plot is primarily used to visualize the
relationship between two quantitative (numerical)
variables. It is particularly helpful when you want to:
1. Explore correlations or trends: A scatter plot shows how
one variable changes as the other changes. For example, it
can reveal a positive, a negative, or no correlation between
the two variables.
2. Identify patterns or clusters: It helps in identifying
clusters, trends, or outliers in the data.
Key Features of Data Suitable for Scatter Plots:
• Quantitative Variables: Both variables on the X-axis
and Y-axis should be numerical (e.g., height vs. weight,
age vs. income).
• Paired Data: Each point in the scatter plot represents
a pair of values (one from each variable).
• Large Datasets: Scatter plots work well with datasets
containing many observations, as they can show
density and patterns clearly (very dense data may need
smaller or semi-transparent points to avoid overplotting).
• Relationship or Correlation: A scatter plot shows the
nature of the relationship between two variables:
• Positive correlation: As one variable increases, the other
also increases (points slope upward).
• Negative correlation: As one variable increases, the other
decreases (points slope downward).
• No correlation: No clear pattern; points are scattered
randomly.
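The three cases above can also be checked numerically. The sketch below is an illustration, not part of the original slides: it computes the Pearson correlation coefficient, which measures only linear association, for three small made-up datasets. The function names (pearson_r, describe, plot_pairs), the 0.3 cutoff, and all data values are assumptions chosen for the example. Plotting is optional and assumes matplotlib is installed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


def describe(r, threshold=0.3):
    """Label the direction of a correlation.

    The threshold is an illustrative cutoff, not a standard value.
    """
    if r >= threshold:
        return "positive"
    if r <= -threshold:
        return "negative"
    return "no clear linear correlation"


def plot_pairs(xs, ys, xlabel, ylabel):
    """Draw the scatter plot for one pair of variables (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.set_title(f"r = {pearson_r(xs, ys):.2f}")
    return fig


# Made-up paired data: each index is one observation (one point in the plot).
height_cm = [150, 160, 165, 170, 180, 190]
weight_kg = [52, 58, 63, 68, 77, 88]        # rises with height
x = [1, 2, 3, 4, 5, 6]
y_falling = [12, 10, 9, 7, 4, 3]            # falls as x rises
y_scattered = [5, 1, 4, 4, 1, 5]            # no upward or downward slope

r_pos = pearson_r(height_cm, weight_kg)
r_neg = pearson_r(x, y_falling)
r_none = pearson_r(x, y_scattered)

for name, r in [("height vs. weight", r_pos),
                ("x vs. y_falling", r_neg),
                ("x vs. y_scattered", r_none)]:
    print(f"{name}: r = {r:.2f} ({describe(r)})")
```

Calling plot_pairs(height_cm, weight_kg, "height (cm)", "weight (kg)") would draw the matching scatter plot. A coefficient near zero only rules out a linear trend, so the plot is still worth inspecting for curved patterns or clusters.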
Example: scatter plot of the MPG dataset
Pair Plot
• A pairwise plot (also called a pair plot) is a type of
visualization that shows the pairwise relationships between
all numerical variables in a dataset. It creates a grid with
one scatter plot for each pair of variables, along with
histograms or kernel density estimate (KDE) plots on the
diagonal to show the distribution of each individual variable.
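The grid structure described above can be sketched as follows. This is a minimal illustration, not the code used for the slides: pair_grid_layout and draw_pair_plot are hypothetical helper names, the column names are borrowed from the earlier examples, and the values are made up. The drawing function assumes matplotlib is installed and uses histograms on the diagonal.

```python
def pair_grid_layout(columns):
    """Map each (row, col) grid cell to the panel a pair plot draws there."""
    layout = {}
    for i, row_var in enumerate(columns):
        for j, col_var in enumerate(columns):
            if i == j:
                # Diagonal: distribution of a single variable.
                layout[(i, j)] = ("distribution", row_var)
            else:
                # Off-diagonal: x-axis follows the column, y-axis the row.
                layout[(i, j)] = ("scatter", col_var, row_var)
    return layout


def draw_pair_plot(data, columns):
    """Draw the grid with matplotlib; data maps column name -> list of values."""
    import matplotlib.pyplot as plt  # imported lazily so the layout runs without it

    n = len(columns)
    fig, axes = plt.subplots(n, n, figsize=(2.5 * n, 2.5 * n), squeeze=False)
    for (i, j), panel in pair_grid_layout(columns).items():
        ax = axes[i][j]
        if panel[0] == "distribution":
            ax.hist(data[panel[1]])
        else:
            ax.scatter(data[panel[1]], data[panel[2]], s=10)
        if i == n - 1:
            ax.set_xlabel(columns[j])   # label only the bottom row
        if j == 0:
            ax.set_ylabel(columns[i])   # label only the left column
    return fig


# Made-up values for illustration only.
columns = ["height", "weight", "age"]
data = {
    "height": [150, 160, 165, 170, 180, 190],
    "weight": [52, 58, 63, 68, 77, 88],
    "age": [34, 21, 45, 29, 52, 38],
}

layout = pair_grid_layout(columns)
for cell in sorted(layout):
    print(cell, layout[cell])
```

With n numerical variables the grid has n × n panels: n distributions on the diagonal and n × (n − 1) scatter plots elsewhere, each pair appearing twice with the axes swapped. In practice a plotting library usually builds this grid in one call, for example seaborn's pairplot function, which also accepts a hue argument for coloring points by category.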
Figure: pair plot (scatter plot matrix) with multiple categories
Two-way cross tabulations for
categorical data