
Subject Code: AD3301

Subject Name: Data Exploration and Visualization


Anna University Examination: April / May 2024
PART A
(2 MARKS)
1. Mention the key responsibilities of a data analyst.
Ans:
Data analysts are responsible for collecting, cleaning, interpreting, and presenting data. They use
statistical tools to identify patterns, prepare reports, and assist in decision-making.

2. Name some of the best tools used for data analysis and data visualization.
Ans:
For data analysis: Python, R, SQL.
For data visualization: Tableau, Power BI, Matplotlib, and Seaborn.

3. List the software and hardware components required for data visualization.
Ans:
• Software: Tableau, Excel, Python
• Hardware: High RAM, multi-core processor, GPU for rendering, high-resolution monitor

4. Draw and label a rough contour plot of the joint probability density function when ρ = -0.4.
Ans:
The contour plot should show elliptical curves tilted downward, indicating a moderate negative
correlation. (To be drawn manually on paper.)
5. Difference between normalized scaling and standardized scaling.
Ans:
• Normalized Scaling: Rescales data to [0,1] using (x - min)/(max - min)
• Standardized Scaling: Converts data to have mean = 0 and standard deviation = 1 using (x -
mean)/std

6. Illustrate important steps to be followed in preparing a base map.


Ans:
Steps include:
1. Collect spatial data
2. Georeference the data
3. Choose coordinate system
4. Add map layers
5. Label features
6. Verify for accuracy

7. The diagram represents the sales of Superclene toothpaste over the last few years. Give a reason
why it is misleading.
Ans:
The Y-axis does not start from zero, which visually exaggerates small differences in sales, making the
chart misleading.

8. How do you find the correlation of a scatter plot?


Ans:
Observe the trend of data points.
• Upward trend = positive correlation
• Downward trend = negative correlation
Use Pearson’s correlation coefficient for numerical value.

9. Define least square method in time series.


Ans:
It is a method to fit a trend line by minimizing the sum of squared differences between observed and
estimated values in time series data.

10. List the techniques used in smoothing time series.


Ans:
1. Simple Moving Average
2. Weighted Moving Average
3. Exponential Smoothing
4. LOESS/LOWESS
5. Gaussian Smoothing
PART-B
(13 MARKS)

11 (a) (i) Discuss about Descriptive Statistics in Exploratory Analysis. (7 Marks)


Answer:

Descriptive statistics summarize and organize the characteristics of a dataset, playing a vital role
in Exploratory Data Analysis (EDA) by helping understand the structure and patterns in data.
Measures of Central Tendency
• Mean:
o The arithmetic average of values in a dataset.
o Sensitive to outliers.
• Median:
o The middle value when the data is sorted.
o More robust than the mean when outliers are present.
• Mode:
o The value that appears most frequently in the dataset.
Measures of Dispersion
• Range:
o Difference between the maximum and minimum values.
o Indicates spread but is sensitive to extreme values.
• Variance:
o The average of the squared differences from the mean.
o Represents how spread out the data points are.
• Standard Deviation:
o The square root of the variance.
o Most commonly used to measure the amount of variation in data.
Measures of Shape
• Skewness:
o Describes the asymmetry of the data distribution.
o Positive skew = tail on right, Negative skew = tail on left.
• Kurtosis:
o Describes the “tailedness” or peak of the distribution.
o High kurtosis = heavy tails; Low kurtosis = light tails.
Frequency Distribution
• Uses tables, histograms, and bar charts to show how often values occur.
Five-number Summary
• Consists of: Minimum, Q1, Median, Q3, Maximum
• Used in box plots to visualize data spread and detect outliers.
Data Visualization in Descriptive Statistics
• Histograms: Show frequency distribution
• Box Plots: Show quartiles and outliers
• Bar Charts: Visualize categorical data
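Illustrative Python sketch (the marks below are hypothetical sample values, not from the question):

import pandas as pd

# Hypothetical marks of 10 students
marks = pd.Series([45, 52, 58, 60, 61, 63, 67, 70, 72, 95])

print(marks.describe())                      # count, mean, std, min, Q1, median, Q3, max
print("Mode:", marks.mode().tolist())
print("Skewness:", round(marks.skew(), 2))   # positive here: the value 95 pulls the tail to the right
print("Kurtosis:", round(marks.kurt(), 2))
print("IQR:", marks.quantile(0.75) - marks.quantile(0.25))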
11 (a) (ii) Explain in detail about Data Transformation Techniques. (6 Marks)

Answer

Data transformation converts data into a proper format or structure to improve the performance
of machine learning models and enhance interpretation. Techniques used are

Normalization (Min-Max Scaling)


• Scales data into a range of 0 to 1.
• Formula: (x - min) / (max - min)
• Useful when features have different scales.
Standardization (Z-score Scaling)
• Transforms data to have mean = 0 and standard deviation = 1.
• Formula: (x - μ) / σ
• Suitable for normally distributed data.
Logarithmic Transformation
• Reduces right skewness.
• Makes data more symmetric and helps linearize exponential trends.
Square Root and Cube Root Transformations
• Help reduce skewness in moderate-skewed data.
• Preserve zero and positive values, useful for count data.
Encoding Categorical Variables
• Label Encoding: Converts categories to integers (e.g., Male = 0, Female = 1)
• One-Hot Encoding: Creates binary columns for each category (used in ML models)
Binning (Discretization)
• Converts continuous variables into discrete categories or intervals.
• Example: Age groups (0–18, 19–35, etc.)
Handling Skewness
• Apply Box-Cox or Yeo-Johnson transformations to reduce skewness in non-normal distributions.
Feature Scaling Tools in Python
• Scikit-learn:
o MinMaxScaler, StandardScaler, PowerTransformer used for transformations.
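A minimal scikit-learn sketch of these transformations (the feature values below are made up for illustration):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, PowerTransformer

# Hypothetical single feature with one large, skew-inducing value
X = np.array([[10.0], [20.0], [30.0], [40.0], [1000.0]])

print(MinMaxScaler().fit_transform(X).ravel())     # normalization: values rescaled to [0, 1]
print(StandardScaler().fit_transform(X).ravel())   # standardization: mean 0, std 1
print(PowerTransformer().fit_transform(X).ravel()) # Yeo-Johnson: reduces skewness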
11 (b) (i) Explain in detail about Comparative Statistics in Exploratory Analysis. (6 Marks)

Answer:
Comparative Statistics involves comparing two or more groups or variables to identify
differences, relationships, or patterns. It is a crucial component of Exploratory Data Analysis (EDA)
and helps in understanding how different variables behave across categories.
Purpose of Comparative Statistics
• Understand variability across groups (e.g., comparing regions, genders, time periods).
• Support decision-making by highlighting significant differences.
• Detect outliers or unusual behavior across subgroups.
• Serve as a precursor to inferential statistics (like hypothesis testing).
Common Comparative Statistical Measures
• Group-wise Mean / Median / Mode:
Helps understand central tendency within each subgroup.
• Standard Deviation & Variance (per group):
Measures the spread of data across different categories.
• Range & Interquartile Range (IQR):
Useful to compare dispersion in different datasets.
• Proportions & Percentages:
Used when comparing categorical variables (e.g., % of male vs female customers).
Visualization Tools for Comparison
• Box Plots by Group:
Show distribution, outliers, and spread for each category.
• Bar Charts / Clustered Bar Graphs:
Useful for visual comparison of frequencies or averages.
• Violin Plots:
Combine box plot and kernel density to visualize distribution by category.
• Side-by-side Histograms:
Useful for comparing frequency distributions.
Tabular Comparison Techniques
• Cross-tabulations (Contingency Tables):
Summarize categorical data for two variables.
• Pivot Tables (in Excel / Python):
Allow quick aggregation and comparison of metrics by row and column groups.
Basic Statistical Tests for Comparison
• T-tests:
Used to compare means between two groups.
• ANOVA (Analysis of Variance):
Used when comparing more than two group means.
Example Use Case
Scenario: A retail company compares average monthly sales across three branches (North, South,
Central).
• Uses box plots and summary tables.
• Finds South branch has higher average sales but also more variability.
• Insights: Better performance in South but less consistency → Need targeted strategy.
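A small Python sketch of such a comparison (the branch sales figures below are invented for illustration; SciPy's f_oneway performs the ANOVA):

import pandas as pd
from scipy import stats

# Hypothetical monthly sales (in lakhs) for three branches
df = pd.DataFrame({
    "branch": ["North"] * 4 + ["South"] * 4 + ["Central"] * 4,
    "sales": [40, 42, 41, 43, 55, 48, 62, 51, 44, 45, 43, 46],
})

# Group-wise central tendency and spread
print(df.groupby("branch")["sales"].agg(["mean", "std"]))

# One-way ANOVA: do the branch means differ significantly?
groups = [g["sales"].values for _, g in df.groupby("branch")]
f_stat, p_value = stats.f_oneway(*groups)
print(f"F = {f_stat:.2f}, p = {p_value:.3f}")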
11 (b) (ii) Discuss in detail about the practical use of Pivot Table in data science with suitable
example. (7 Marks).

Answer
A Pivot Table is an interactive tool used to summarize large datasets quickly. It allows users to
aggregate, group, and rearrange data dynamically to gain insights — widely used in Excel, Power BI,
and Python (Pandas).
Core Functions of Pivot Tables
• Summarize data using Sum, Count, Average, Max, Min
• Perform grouping (e.g., by date, category, region)
• Create multi-level views using rows and columns
• Support filters to drill down into subsets of data
• Enable quick insights without formulas or coding

Steps in Creating a Pivot Table (Generalized)


1. Select data range or DataFrame
2. Choose row labels (e.g., Product category)
3. Choose column labels (e.g., Region or Year)
4. Select values to summarize (e.g., Total Sales)
5. Apply filters if needed (e.g., specific product line)

Tools That Support Pivot Tables


• Excel: Built-in pivot table feature
• Power BI / Tableau: Drag-and-drop visual pivot tables
• Python (Pandas): df.pivot_table(index, columns, values, aggfunc)

Practical Use Cases in Data Science


• Sales Analysis:
Compare monthly sales by product and region.
• Customer Segmentation:
Count number of customers by age group and location.
• Performance Monitoring:
Track employee productivity across departments.
• Healthcare Analytics:
Summarize patient count by disease type and ward.

Example: Supermarket Sales Dataset


Product Region Sales
Soap East 100
Soap West 200
Shampoo East 150
Shampoo West 250
Pivot Table Summary (Sum of Sales):
Product East West
Soap 100 200
Shampoo 150 250
Insight: Shampoo performs better across both regions.
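The same summary can be reproduced in Python with Pandas (a small sketch using the example data above):

import pandas as pd

df = pd.DataFrame({
    "Product": ["Soap", "Soap", "Shampoo", "Shampoo"],
    "Region": ["East", "West", "East", "West"],
    "Sales": [100, 200, 150, 250],
})

# Rows = Product, Columns = Region, Values = sum of Sales
summary = df.pivot_table(index="Product", columns="Region", values="Sales", aggfunc="sum")
print(summary)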

Advantages of Using Pivot Tables


• Quick aggregation of large datasets
• No programming required (especially in Excel)
• Interactive and customizable for different views
• Saves time during data cleaning and EDA.
12 (a)(i) Define line plot. With an example, explain how to create a line plot to visualize the
trend. (6 Marks)

Answer

Definition:
A line plot (or line graph) is a type of chart used to display information as a series of data points
connected by straight lines.
It is useful to visualize trends over time or sequential data.

Key Characteristics:
• X-axis: Represents the independent variable (e.g., time).
• Y-axis: Represents the dependent variable (e.g., temperature, sales).
• Data points: Mark actual measurements.
• Lines: Connect data points to show the trend.

Use Cases:
• Tracking sales over months
• Monitoring temperature variation by day
• Observing stock price movement over time
• Measuring sensor output across time intervals

Steps to Create a Line Plot:


1. Prepare the data
o Organize values into a sequence (X, Y pairs).
o Example:
Day Sales
1 200
2 240
3 300
2. Choose software/tool
o Excel, Python (Matplotlib), Google Sheets, R
3. Plot X and Y axes
o X-axis: Time (Day)
o Y-axis: Value (Sales)
4. Mark data points
o Each (x, y) pair is marked with a dot or point.
5. Connect the points
o Use lines to join the points in sequence.
6. Add labels & title
o Label axes with units, add a chart title, and legend if needed.
Example (Python using Matplotlib):
import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
sales = [200, 220, 250, 300, 320]

plt.plot(days, sales, marker='o')


plt.title("Daily Sales Trend")
plt.xlabel("Day")
plt.ylabel("Sales")
plt.grid(True)
plt.show()
Output
12 (a)(ii) The following table gives the lifetime of 400 neon lamps. Draw the histogram for the
below data. (7 Marks)

Given Frequency Table:


Lifetime (in hours) Number of Lamps
300–400 14
400–500 56
500–600 60
600–700 86
700–800 74
800–900 62
900–1000 48

Step-by-Step Procedure to Draw Histogram:


1. Identify class intervals
o Equal width = 100 (from 300–1000)
o These become the bins for the histogram.
2. Mark X-axis and Y-axis
o X-axis: Lifetime intervals
o Y-axis: Frequency (number of lamps)
3. Choose a suitable scale
o Y-axis: Use scale like 10 units = 1 cm, up to maximum value 86
4. Draw bars
o Each class interval becomes a bar.
o Height = frequency value (no gap between bars).
5. Label graph
o Add title: “Histogram of Neon Lamp Lifetimes”
o Label axes: Lifetime (hrs) and Number of Lamps

Table for Histogram Plotting:


Interval Frequency Midpoint (for optional line chart)
300–400 14 350
400–500 56 450
500–600 60 550
600–700 86 650
700–800 74 750
800–900 62 850
900–1000 48 950
Sketching Guidelines:
• Use graph sheet if allowed
• No gaps between bars
• Equal bar widths (class width = 100)
• Bar heights based on frequencies:
o Tallest bar: 86 (600–700 hrs class)
o Shortest bar: 14 (300–400 hrs class)

Insights from Histogram:


• The most frequent lamp lifetime is in 600–700 hrs range.
• Distribution is slightly skewed to the left, with higher frequencies around the center.
• Useful for analyzing product lifespan consistency.

Histogram program (Python)
import matplotlib.pyplot as plt

# Class boundaries and the frequency of each class
bins = [300, 400, 500, 600, 700, 800, 900, 1000]
frequencies = [14, 56, 60, 86, 74, 62, 48]

# One representative value per class, weighted by its frequency,
# so each bar spans the full class interval with no gaps
plt.hist(bins[:-1], bins=bins, weights=frequencies, edgecolor='black')

plt.title("Histogram of Neon Lamp Lifetimes")
plt.xlabel("Lifetime (in hours)")
plt.ylabel("Number of Lamps")
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
Output
12 (b)(i) Explain in detail about 3D Data Visualization, its components and its working flow
with suitable example. (6 Marks)

Answer:

What is 3D Data Visualization?


3D data visualization represents data in three dimensions (X, Y, Z), providing deeper insights into
multi-variable relationships, patterns, and structures that are hard to visualize in 2D.

Components of 3D Visualization
1. Axes (X, Y, Z)
o Define the dimensions of the plot.
o Each axis represents a variable or measurement.
2. 3D Plotting Engine
o Software or libraries that support rendering in 3D (e.g., Matplotlib 3D, Plotly, VTK).
3. Camera / Perspective
o Allows rotating, zooming, and panning the view.
4. Color, Shape, and Size Encodings
o Used to represent additional variables (e.g., intensity, category).
5. Interactivity Tools
o Tools like sliders, tooltips, or selection for real-time data interaction (in dashboards).
Tools Supporting 3D Visualization
• Matplotlib (Python) – Axes3D for basic 3D scatter, surface, and wireframe plots.
• Plotly (Python/JS) – Interactive, web-based 3D graphs.
• Tableau / Power BI – Supports limited 3D in dashboards.
• Unity3D / WebGL – Advanced 3D modeling and immersive visualization.

Working Flow of 3D Visualization


Step 1: Import and clean data
Step 2: Select three variables to plot (X, Y, Z)
Step 3: Choose 3D plot type (scatter, surface, contour, etc.)
Step 4: Apply encoding (color/size) for 4th or 5th variable if needed
Step 5: Render and rotate the plot to observe patterns
Step 6: Add titles, labels, and interaction controls (if supported)
Example (Using Python Matplotlib)
from mpl_toolkits.mplot3d import Axes3D  # enables the 3D projection in older Matplotlib versions
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

x = [1, 2, 3, 4]
y = [10, 15, 20, 25]
z = [5, 6, 2, 3]

ax.scatter(x, y, z)
ax.set_xlabel('X Axis')
ax.set_ylabel('Y Axis')
ax.set_zlabel('Z Axis')
plt.title("3D Scatter Plot")
plt.show()

Output

Applications of 3D Visualization
• Climate or weather modeling (Temp vs. Lat vs. Altitude)
• Geospatial analysis (Longitude, Latitude, Elevation)
• Scientific simulations (Molecular structures, Fluid dynamics)
• Financial markets (Time vs. Price vs. Volume)
12 (b)(ii) Discuss in detail about text and annotation. (7 Marks)

Answer:

Purpose of Text and Annotation in Visualization:


Text and annotation help clarify, highlight, and explain key parts of a plot or chart, improving
readability and storytelling.
Types of Text in Data Visualization
1. Title
o Explains what the chart is about.
2. Axis Labels
o Identify units and variables on X, Y, and Z axes.
3. Legends
o Describe symbols, colors, or line types used in the chart.
4. Tick Labels
o Values shown along axes.
Annotations
Annotations are custom text notes or arrows placed near specific data points to emphasize:
• Outliers
• Peaks / Valleys
• Specific events
• Trends or anomalies
Annotation Components
1. Text: Descriptive message (e.g., “Max Value Here”)
2. Arrow/Box: Optional pointer to exact location
3. Coordinates: Position (x, y) where annotation is placed
4. Style: Font, color, size, angle
Adding Annotation in Python (Matplotlib)
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [100, 150, 130, 170]

plt.plot(x, y, marker='o')
plt.title("Sales Trend")
plt.xlabel("Quarter")
plt.ylabel("Sales")

# Annotate the peak point: text placed inside the axes, arrow pointing to (4, 170)
plt.annotate('Peak Sales', xy=(4, 170), xytext=(3, 160),
             arrowprops=dict(facecolor='red', shrink=0.05))

plt.grid(True)
plt.show()
Output

Best Practices for Text & Annotation


• Keep labels short and meaningful
• Avoid overlapping text
• Use contrasting color for visibility
• Place annotations close to the point of interest
• Use consistent font and size

Applications
• Marking outliers in scatter plots
• Highlighting max/min points in line graphs
• Explaining sudden spikes/dips in trend
• Adding comments or references in dashboards
13. (a)(i) Does universe frequency distribution have variable? Justify in detail. (7 Marks)

Answer

Definition of Universe Frequency Distribution:


• A universe refers to the entire population or complete set of data points under observation.
• A frequency distribution shows how often each value or range of values occurs.
Presence of Variable:
Yes, universe frequency distributions always involve one or more variables.
Justification with Explanation:
• A variable is a measurable attribute (e.g., marks, age, income).
• The frequency distribution groups the values of this variable and counts how often they occur.
• Hence, without a variable, frequency distribution has no basis to organize data.
Example:
Suppose the variable is Age of students in a class:
Age Group Frequency
18–20 15
21–23 20
24–26 10
Here:
• Age is the variable.
• The distribution is constructed by counting how many students fall into each age range.
Conclusion:
The frequency distribution cannot exist without a variable because it’s the basis for classification and
counting. So, a universe frequency distribution always includes variables.
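A small Pandas sketch of building such a frequency distribution (the ages below are hypothetical):

import pandas as pd

# Hypothetical ages of 15 students (the variable is Age)
ages = pd.Series([18, 19, 20, 20, 21, 21, 22, 23, 23, 24, 25, 26, 19, 22, 24])

# Bin the variable into age groups, then count how many students fall in each group
groups = pd.cut(ages, bins=[17, 20, 23, 26], labels=["18-20", "21-23", "24-26"])
print(groups.value_counts().sort_index())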
13. (a)(ii) Explain in detail about scaling and standardizing. (6 Marks)

Answer

Why Transformation is Needed:


• Features with different units/scales can negatively impact model performance.
• Scaling and standardization bring uniformity to the data.

1. Scaling (Normalization):
• Rescales data between a fixed range, typically [0, 1].
• Useful for distance-based algorithms like KNN, SVM.

2. Standardization (Z-score Normalization):


• Converts data to have mean = 0 and standard deviation = 1.
• Useful when data is normally distributed or when comparing features with different units.

Summary Table:
Transformation Output Range Use When
Scaling [0, 1] Features with different scales
Standardizing ~N(0, 1) For normally distributed data
13. (b) Time Series Modeling: Why Time Series Model is Better (13 Marks)

Answer
Problem Context:
Consider training two models on the same dataset using two different techniques.
You trained:
• Model 1: Decision Tree
• Model 2: Time Series Regression Model
Conclusion: Model 2 gave better performance.

Understand Decision Tree Limitations


• Treats each row independently, without order.
• Can't handle autocorrelation or sequential patterns.
• Prone to overfitting in noisy time data.
Advantages of Time Series Regression
• Respects time order of data.
• Uses:
o Trend component
o Seasonal component
o Lag features and moving averages
• Better suited for forecasting.
Example
Assume dataset = daily website visits
• Decision Tree might split based on thresholds (e.g., day = weekend)
• Time Series model detects that weekends always have 30% more visits

Reasons Why Time Series Model Performed Better:


Factor Time Series Model Decision Tree
Time Awareness Yes No
Trend Capture Yes No
Seasonality Yes No
Suitable for Forecasting Yes No

Conclusion:
Model 2 (Time Series Regression) respects temporal structure and provides more accurate,
generalized forecasts, making it more suitable for time-based data than a decision tree.
14. (a)(i) Contingency Table with Example (7 Marks)

Answer
Definition:
• A contingency table (also known as a cross-tabulation or cross table) is a matrix-style
table used to display the frequency distribution of two or more categorical variables.
• It helps analyse the relationship or association between those variables.
• A contingency table is a powerful tool for organizing categorical data and determining
whether relationships exist between variables.
• It forms the foundation for statistical tests of independence.
Structure:
A basic 2×2 contingency table looks like this:
Category A Category B Total
Group 1 a b a+b
Group 2 c d c+d
Total a+c b+d N
• Rows represent one variable.
• Columns represent the other variable.
• Each cell shows the frequency (count) of occurrences.
Use Case:
Suppose we want to study whether gender affects purchase behaviour.
Gender Purchased Not Purchased Total
Male 30 20 50
Female 50 10 60
Total 80 30 110
• Here, the two variables are:
o Gender (Male, Female)
o Purchase Decision (Purchased, Not Purchased)
Interpretation:
• Out of 50 males, 30 purchased the product.
• Out of 60 females, 50 purchased the product.
• This suggests females are more likely to purchase in this dataset.
Applications of Contingency Tables:
• Used in Chi-square tests to test independence between variables.
• Helps in understanding categorical relationships.
• Common in market research, social sciences, healthcare analytics, etc.
Advantages:
• Easy to construct and interpret.
• Provides quick summary of data interaction.
• Facilitates statistical hypothesis testing (like Chi-square test).
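A minimal Python sketch that builds this contingency table and runs a Chi-square test of independence (counts taken from the example above):

import pandas as pd
from scipy.stats import chi2_contingency

table = pd.DataFrame({"Purchased": [30, 50], "Not Purchased": [20, 10]},
                     index=["Male", "Female"])
print(table)

# Chi-square test of independence between Gender and Purchase Decision
chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, p-value = {p:.4f}, dof = {dof}")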
14. (a)(ii) Percentage Table with Example (6 Marks)
Definition:
• A percentage table is a statistical table that shows the relative frequencies of categories
expressed as percentages rather than raw counts.
• It helps in comparing different groups or categories, especially when totals differ or when
visual clarity is needed.
• A percentage table is a vital tool to simplify raw numerical data into relative comparisons,
making the insights easier to interpret, communicate, and visualize.

Purpose:
• To convert raw frequencies into relative measures.
• To highlight proportions instead of absolute counts.
• To support decision-making, especially in marketing, business analysis, and surveys.

Conversion Formula:
Percentage = (Category Value ÷ Total) × 100

Example:
Let’s say a shop sold 100 items consisting of 3 different products.
Product Units Sold Percentage (%)
Soap 20 (20 / 100) × 100 = 20%
Paste 30 (30 / 100) × 100 = 30%
Shampoo 50 (50 / 100) × 100 = 50%
Total 100 100%

Interpretation:
• 50% of all sales were Shampoo.
• Soap and Paste contributed 20% and 30% respectively.
• This helps the shop identify which product is in higher demand.
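A short Pandas sketch of the conversion (using the example counts above):

import pandas as pd

sales = pd.Series({"Soap": 20, "Paste": 30, "Shampoo": 50})
percentages = sales / sales.sum() * 100   # each category as a % of total units sold
print(percentages)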

Benefits of Percentage Tables:


• Makes it easier to compare across categories.
• Supports graphical representation like pie charts and stacked bar charts.
• Useful in summarizing survey data or market share.

Application Areas:
• Business analytics: Market share comparison
• Education: Exam result summaries by subject
• Healthcare: Disease cases by percentage of population
• Finance: Portfolio asset allocation
14. (b) Scatter Plot, Correlation & Analysis (13 Marks)
Given Data:
No. of games 3 5 2 6 7 1 2 7 1 7
Scores 80 90 75 80 90 50 65 85 40 100
Step-by-Step Instructions:
Step 1: Plot Scatter Plot
• X-axis: No. of games
• Y-axis: Scores
• Plot all 10 points
Step 2: Identify Correlation Pattern
• As no. of games increases, scores also increase → Positive correlation.
Justification:
• Scores improve with more games played
• e.g., 1 game → 40, 2 games → 65, 7 games → 100
• Clear upward trend → Strong Positive Correlation
Scatter Plot correlation and types.
1. Definition of Scatter Plot
A scatter plot (also called a scatter diagram or scatter graph) is a type of plot used to visualize
the relationship between two numerical variables.
• Each point represents one observation.
• X-axis – Independent variable (e.g., time, number of games).
• Y-axis – Dependent variable (e.g., score, height).
2. Purpose of a Scatter Plot
• To identify patterns, trends, or relationships between variables.
• To detect correlation (positive, negative, or none).
• To identify outliers or clusters in data.
3. Sample Dataset Example
Let’s say we are studying the relationship between the number of games played and the scores
obtained:
No. of Games 3 5 2 6 7 1 2 7 1 7
Scores 80 90 75 80 90 50 65 85 40 100
• Plot X = Games, Y = Scores
4. Interpretation of the Scatter Plot
• As the number of games increases, the scores also tend to increase.
• This indicates a positive relationship between the two variables.
5. Types of Correlation in Scatter Plots

Correlation Type Description Visual Pattern


Positive Both variables increase together Upward slope
Negative One increases, the other decreases Downward slope
No correlation No pattern between variables Random scatter
Examples:
1. Positive Correlation:
o Hours Studied ↑ → Marks ↑
o Scatter plot rises left to right
2. Negative Correlation:
o Temperature ↑ → Hot beverage sales ↓
o Scatter plot falls left to right
3. No Correlation:
o Height vs Favorite Color
o Points scattered randomly.
7. Scatter Plot Code (Python with Matplotlib)

import matplotlib.pyplot as plt


import numpy as np
games = np.array([3, 5, 2, 6, 7, 1, 2, 7, 1, 7])
scores = np.array([80, 90, 75, 80, 90, 50, 65, 85, 40, 100])
# Best-fit line
slope, intercept = np.polyfit(games, scores, 1)
trend = slope * games + intercept
plt.scatter(games, scores, color='blue', label='Scores')
plt.plot(games, trend, '--', color='grey', label='Best-fit line')
plt.title("Scatter Plot: Games Played vs Scores")
plt.xlabel("Number of Games Played")
plt.ylabel("Scores")
plt.grid(True)
plt.legend()
plt.show()
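A short numerical check of the correlation (a sketch using NumPy's corrcoef on the same data):

import numpy as np

games = np.array([3, 5, 2, 6, 7, 1, 2, 7, 1, 7])
scores = np.array([80, 90, 75, 80, 90, 50, 65, 85, 40, 100])

# Pearson correlation coefficient between games played and scores
r = np.corrcoef(games, scores)[0, 1]
print(f"Pearson r = {r:.2f}")   # roughly 0.85, confirming a strong positive correlation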

8. Use Cases of Scatter Plots


• Education: Study time vs marks
• Health: Calorie intake vs weight
• Business: Ad spending vs sales
• Sports: Practice sessions vs performance
15 (a)(i) Explain the main components of time series data. Which of these would be most
prevalent in data relating to unemployment? (6 Marks)

Answer
What is Time Series Data?
Time series data is a collection of data points measured sequentially over time at equal
intervals such as daily, monthly, quarterly, or yearly.
Examples:
• Daily stock prices
• Monthly unemployment rates
• Annual temperature data
Main Components of Time Series:
Time series data typically consists of four key components:

1. Trend Component (T):


• Refers to the long-term upward or downward movement in the data.
• Represents the general direction of the data over time.
• Can be linear or non-linear.
Example:
If unemployment steadily rises from 5% to 8% over 5 years, this indicates a positive trend.

2. Seasonal Component (S):


• Short-term recurring patterns that occur at fixed time intervals (e.g., daily, weekly,
monthly).
• Usually driven by calendar-based factors like holidays, seasons, or financial quarters.
Example:
Ice cream sales rise in summer and drop in winter – this is seasonality.

3. Cyclical Component (C):


• Non-fixed, long-term fluctuations that happen due to economic or business cycles.
• Unlike seasonality, cycles do not follow a fixed calendar interval.
• Duration can range from several months to years.
Example:
Unemployment increases during a recession and falls during a boom.
4. Irregular/Residual Component (I):
• Random, unpredictable variations caused by unexpected events such as strikes,
pandemics, or natural disasters.
• Also called noise in the data.

Most Prevalent in Unemployment Data:


In time series data related to unemployment, the most prevalent components are:
Trend:
• Unemployment may increase or decrease over the years due to economic growth,
automation, policy changes, etc.
Cyclic:
• Unemployment is heavily influenced by economic cycles (recessions, booms, financial
crises).
• These cycles are not fixed and can last for years, distinguishing them from seasonality.
Final Answer:
The cyclical component is most prevalent in unemployment data, followed by the trend.
15 (a)(ii) Suppose... views increase in Jan–Mar and decrease in Nov–Dec. Does this
represent seasonality? Justify. (7 Marks)

Given Statement:
• You are a data scientist at Times of India.
• You observed:
o Views increase from January to March
o Views decrease in November and December
o This pattern repeats every year

What is Seasonality?
Seasonality is the component of time series data that shows repetitive patterns or fluctuations
at regular intervals (e.g., daily, monthly, quarterly).

Key characteristics of seasonality:


• Occurs at fixed time intervals
• Is predictable and cyclic
• Related to calendar events or behavioral trends

Analysis of the Situation:


Month Observed Behavior
Jan–Mar Views Increase
Apr–Oct Moderate Views
Nov–Dec Views Decrease
Does it Represent Seasonality?
Yes, the behaviour indicates seasonality.
Justification:
• The changes in viewership occur regularly every year, making them periodic.
• The fluctuation is linked to specific months (calendar effect).
• Likely caused by:
o New Year resolutions
o Exam preparations
o End-of-year holidays
• This shows a stable, predictable pattern tied to the time of year.
Seasonality vs Other Components:
Component Present? Why?
Seasonality Yes Repeats yearly at fixed intervals
Trend No No long-term increase or decrease indicated
Cyclic No Not tied to economic/business cycles
Irregular No Pattern is not random or one-time
15 b) Suppose the following data represent total revenues (in millions of constant 1995 dollars) by a car rental agency over the 11-year period 1990 to 2000:
4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.5
Compute the 5-year moving averages for this annual time series.

Answer
To solve this problem, we need to compute the 5-year moving averages for the given 11-year
time series data from 1990 to 2000.

Given Data (Revenues in Millions of Constant 1995 Dollars)


Year Revenue
1990 4.0
1991 5.0
1992 7.0
1993 6.0
1994 8.0
1995 9.0
1996 5.0
1997 2.0
1998 3.5
1999 5.5
2000 6.5

5-Year Moving Average Formula:


• A 5-year moving average is calculated by taking the average of 5 consecutive years.
• The centered average is usually placed at the middle year of the 5-year span (i.e., the
3rd year in the 5-year window).

Calculation of 5-Year Moving Averages:


Years Covered Sum Average (Sum ÷ 5) Centered Year
1990–1994 4.0+5.0+7.0+6.0+8.0 = 30.0 30.0 ÷ 5 = 6.0 1992
1991–1995 5.0+7.0+6.0+8.0+9.0 = 35.0 35.0 ÷ 5 = 7.0 1993
1992–1996 7.0+6.0+8.0+9.0+5.0 = 35.0 35.0 ÷ 5 = 7.0 1994
1993–1997 6.0+8.0+9.0+5.0+2.0 = 30.0 30.0 ÷ 5 = 6.0 1995
1994–1998 8.0+9.0+5.0+2.0+3.5 = 27.5 27.5 ÷ 5 = 5.5 1996
1995–1999 9.0+5.0+2.0+3.5+5.5 = 25.0 25.0 ÷ 5 = 5.0 1997
1996–2000 5.0+2.0+3.5+5.5+6.5 = 22.5 22.5 ÷ 5 = 4.5 1998
Final Answer Table:
Year 5-Year Moving Average
1992 6.0
1993 7.0
1994 7.0
1995 6.0
1996 5.5
1997 5.0
1998 4.5

Python code:
# Year and revenue data
years = list(range(1990, 2001))
revenues = [4.0, 5.0, 7.0, 6.0, 8.0, 9.0, 5.0, 2.0, 3.5, 5.5, 6.5]

# Compute 5-year moving averages
moving_averages = []
centered_years = []

for i in range(len(revenues) - 4):
    five_year_avg = sum(revenues[i:i+5]) / 5
    moving_averages.append(five_year_avg)
    centered_years.append(years[i + 2])  # centre of the 5-year window

# Display results
print("Year\t5-Year Moving Average")
for year, avg in zip(centered_years, moving_averages):
    print(f"{year}\t{avg:.1f}")

Output
Year 5-Year Moving Average
1992 6.0
1993 7.0
1994 7.0
1995 6.0
1996 5.5
1997 5.0
1998 4.5
16(a) Time Series Forecasting Methods (13 marks)
Given Actual Data:
Period 1 2 3 4 5 6 7 8 9 10
Actual 974 766 727 849 693 655 854 742 717 852

1. Naïve Forecast
Forecast for period t = actual value at (t-1)
Start from Period 2:
Period Forecast
2 974
3 766
4 727
5 849
6 693
7 655
8 854
9 742
10 717

2. 3-Period Moving Average

Start from Period 4:


Period Calculation Forecast
4 (974 + 766 + 727)/3 = 822.3 822
5 (766 + 727 + 849)/3 = 780.7 781
6 (727 + 849 + 693)/3 = 756.3 756
7 (849 + 693 + 655)/3 = 732.3 732
8 (693 + 655 + 854)/3 = 734.0 734
9 (655 + 854 + 742)/3 = 750.3 750
10 (854 + 742 + 717)/3 = 771.0 771

3. 4-Period Moving Average


Start from Period 5
Period Calculation Forecast
5 (974+766+727+849)/4 = 829.0 829
6 (766+727+849+693)/4 = 758.8 759
7 (727+849+693+655)/4 = 731.0 731
8 (849+693+655+854)/4 = 762.8 763
9 (693+655+854+742)/4 = 736.0 736
10 (655+854+742+717)/4 = 742.0 742
4. Weighted Moving Average (3-2-1)

Weights: 3 (most recent), 2, 1


Start from Period 4:
Period Calculation Forecast
4 (3×727 + 2×766 + 1×974)/6 = 781.2 781
5 (3×849 + 2×727 + 1×766)/6 = 794.5 795
6 (3×693 + 2×849 + 1×727)/6 = 750.7 751
7 (3×655 + 2×693 + 1×849)/6 = 700.0 700
8 (3×854 + 2×655 + 1×693)/6 = 760.8 761
9 (3×742 + 2×854 + 1×655)/6 = 764.8 765
10 (3×717 + 2×742 + 1×854)/6 = 748.2 748

5. Weighted Moving Average (1-4-5)


Weights: 1 (oldest), 4, 5 (most recent)
Start from Period 4:

Period Calculation Forecast


4 (1×974 + 4×766 + 5×727)/10 = 767.3 767
5 (1×766 + 4×727 + 5×849)/10 = 791.9 792
6 (1×727 + 4×849 + 5×693)/10 = 758.8 759
7 (1×849 + 4×693 + 5×655)/10 = 689.6 690
8 (1×693 + 4×655 + 5×854)/10 = 758.3 758
9 (1×655 + 4×854 + 5×742)/10 = 778.1 778
10 (1×854 + 4×742 + 5×717)/10 = 740.7 741

6. Exponential Smoothing α = 0.1

Assume initial forecast F1 = A1 = 974


Period Forecast (α = 0.1) Rounded
2 974 974
3 974 + 0.1(766 - 974) = 953.2 953
4 953.2 + 0.1(727 - 953.2) = 930.58 931
5 930.58 + 0.1(849 - 930.58) = 922.42 922
6 922.42 + 0.1(693 - 922.42) = 899.48 899
7 899.48 + 0.1(655 - 899.48) = 875.03 875
8 875.03 + 0.1(854 - 875.03) = 872.93 873
9 872.93 + 0.1(742 - 872.93) = 859.84 860
10 859.84 + 0.1(717 - 859.84) = 845.56 846
7. Exponential Smoothing α = 0.8
Same formula as above:
Start from F1 = 974
Period Forecast (α = 0.8) Rounded
2 974 974
3 974 + 0.8(766 - 974) = 807.6 808
4 807.6 + 0.8(727 - 807.6) = 743.12 743
5 743.12 + 0.8(849 - 743.12) = 827.82 828
6 827.82 + 0.8(693 - 827.82) = 719.96 720
7 719.96 + 0.8(655 - 719.96) = 667.99 668
8 667.99 + 0.8(854 - 667.99) = 816.80 817
9 816.80 + 0.8(742 - 816.80) = 756.96 757
10 756.96 + 0.8(717 - 756.96) = 724.99 725
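A compact Pandas sketch that reproduces these forecasts (the column names are chosen here only for illustration):

import pandas as pd

actual = pd.Series([974, 766, 727, 849, 693, 655, 854, 742, 717, 852],
                   index=range(1, 11), name="actual")

forecasts = pd.DataFrame({"actual": actual})
forecasts["naive"] = actual.shift(1)                    # previous period's actual value
forecasts["ma3"] = actual.rolling(3).mean().shift(1)    # 3-period moving average
forecasts["ma4"] = actual.rolling(4).mean().shift(1)    # 4-period moving average
forecasts["wma_321"] = actual.rolling(3).apply(
    lambda w: (1*w[0] + 2*w[1] + 3*w[2]) / 6, raw=True).shift(1)  # weights 1-2-3, newest gets 3
# ewm(adjust=False) applies S_t = a*A_t + (1-a)*S_(t-1); shifting by one gives the forecast
forecasts["exp_0.1"] = actual.ewm(alpha=0.1, adjust=False).mean().shift(1)
forecasts["exp_0.8"] = actual.ewm(alpha=0.8, adjust=False).mean().shift(1)

print(forecasts.round(1))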

16(b) Average Seasonal Movement

Step 1: Given Quarterly Production Data


Year Q1 Q2 Q3 Q4
2002 3.5 3.8 3.7 3.5
2003 3.6 4.2 3.4 4.1
2004 3.4 3.9 3.7 4.2
2005 4.2 4.5 3.8 4.4
2006 3.9 4.4 4.2 4.6

Step 2: Compute Annual Averages


We calculate the average production for each year.
Year Total Annual Average
2002 3.5+3.8+3.7+3.5 = 14.5 14.5 / 4 = 3.625
2003 3.6+4.2+3.4+4.1 = 15.3 15.3 / 4 = 3.825
2004 3.4+3.9+3.7+4.2 = 15.2 15.2 / 4 = 3.800
2005 4.2+4.5+3.8+4.4 = 16.9 16.9 / 4 = 4.225
2006 3.9+4.4+4.2+4.6 = 17.1 17.1 / 4 = 4.275

Step 3: Compute Seasonal Indices


We compute the seasonal movement by comparing each quarter to its year’s average:
Seasonal Index = Quarter Value ÷ Annual Average
Year Q1 SI Q2 SI Q3 SI Q4 SI
2002 3.5/3.625 = 0.9655 3.8/3.625 = 1.0483 3.7/3.625 = 1.0207 3.5/3.625 = 0.9655
2003 3.6/3.825 = 0.9412 4.2/3.825 = 1.0974 3.4/3.825 = 0.8894 4.1/3.825 = 1.0714
2004 3.4/3.8 = 0.8947 3.9/3.8 = 1.0263 3.7/3.8 = 0.9737 4.2/3.8 = 1.1053
2005 4.2/4.225 = 0.9941 4.5/4.225 = 1.0655 3.8/4.225 = 0.8994 4.4/4.225 = 1.0414
2006 3.9/4.275 = 0.9123 4.4/4.275 = 1.0292 4.2/4.275 = 0.9824 4.6/4.275 = 1.0760
Step 4: Compute Average Seasonal Index for Each Quarter
Quarter Avg Seasonal Index
Q1 (0.9655 + 0.9412 + 0.8947 + 0.9941 + 0.9123) / 5 = 0.9416
Q2 (1.0483 + 1.0974 + 1.0263 + 1.0655 + 1.0292) / 5 = 1.0534
Q3 (1.0207 + 0.8894 + 0.9737 + 0.8994 + 0.9824) / 5 = 0.9531
Q4 (0.9655 + 1.0714 + 1.1053 + 1.0414 + 1.0760) / 5 = 1.0519

Step 5: Convert Seasonal Indices to Movements


Multiply each index by 100 to get seasonal movements (%):
Quarter Average Seasonal Movement
Q1 0.9416 × 100 = 94.16
Q2 1.0534 × 100 = 105.34
Q3 0.9531 × 100 = 95.31
Q4 1.0519 × 100 = 105.19

Final Answer: Average Seasonal Movements


Quarter Seasonal Movement (%)
Q1 94.16
Q2 105.34
Q3 95.31
Q4 105.19

Justification:
• Q2 and Q4 have above-average production → Seasonal peak periods.
• Q1 and Q3 have below-average production → Off-peak or slower quarters.
• These patterns help in seasonal adjustment and forecasting future trends in quarterly production.
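The same seasonal indices can be computed with a short Pandas sketch (data taken from the table above):

import pandas as pd

df = pd.DataFrame({
    "Q1": [3.5, 3.6, 3.4, 4.2, 3.9],
    "Q2": [3.8, 4.2, 3.9, 4.5, 4.4],
    "Q3": [3.7, 3.4, 3.7, 3.8, 4.2],
    "Q4": [3.5, 4.1, 4.2, 4.4, 4.6],
}, index=[2002, 2003, 2004, 2005, 2006])

annual_avg = df.mean(axis=1)                  # average production per year
seasonal_index = df.div(annual_avg, axis=0)   # each quarter divided by its year's average
print((seasonal_index.mean() * 100).round(2)) # average seasonal movement (%) per quarter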
