Warehouse Assignment

The document discusses developing a data warehouse from an existing data source and selecting relevant subject areas and key stakeholders. It describes constructing dimension and fact tables in SQL Server and populating them from operational databases. Visualizations and reports are created in Tableau and SSRS. Predictive modeling is performed on the data warehouse using Python including data preparation, exploration using visualizations, and developing Naive Bayes and Random Forest models to predict total price.

Uploaded by

Hareem Nagra

Question 01.

Data Warehouse is developed to obtain business intelligence from data. Develop a proof-of-concept data
warehouse/mart (using dimensional model) capturing data from an existing data source(s). Document your reasons for
selecting the subject area(s), identify key stakeholders, formalize the business vision for developing the data
warehouse from the given B9IS107 CA 2 data source. Also explain the insights that a company may attain from the
given data. It should be reflected in the SSRS reports and Tableau visuals.

Solution:
Within an organization, a data warehouse acts as a central store for information compiled from different Online
Transaction Processing (OLTP) databases. Its main goal is to make Business Intelligence (BI) operations easier by
providing a thorough viewpoint for well-informed decision-making. Sophisticated techniques are used in the
construction of this repository to store data extracted from one or more databases in an efficient manner. When
needed, the stored data can be easily retrieved.
In this study, the subject area chosen is the Online Fashion Store—a sample database accessible on GitHub for
research purposes. This database represents a fictional company engaged in the manufacturing and global sale of
fashion products.

Reasons for selecting the subject area:


There are several reasons why the sales subject area is a good choice for a data warehouse proof of concept (POC):
• Sales data is critical for businesses. Sales data can be used to track performance, identify trends, and make
better decisions about product development, marketing, and operations.
• Sales data is relatively easy to collect and load into a data warehouse. Sales data is typically stored in a CRM
system or other operational system. This makes it easy to extract and load the data into a data warehouse.
• Sales data is highly actionable. Insights gained from analyzing sales data can be used to improve sales
performance, reduce costs, and increase profits.

Key Stakeholders:
• Sales managers
• Marketing managers
• Product managers
• Executives

Business Vision:
To develop a data warehouse that can be used to analyze sales data and identify trends and opportunities. The data
warehouse will be used to improve sales performance, marketing campaigns, and product development.

Insights from the Data:


The data warehouse can be used to gain insights into the following areas:
• Sales by product, customer, region, and time period
• Customer churn
• Product performance
• Marketing campaign effectiveness
• Inventory levels
• Stockouts

SSRS Reports and Tableau Visuals:


Here are some examples of SSRS reports and Tableau visuals that could be developed:
• Sales by product: This report will show the total sales for each product during the past year. This information
could be used to identify the best-selling products and the products that are not selling well.
• Sales by customer: This report will show the total sales for each customer during the past year. This
information could be used to identify the most valuable customers and the customers who are at risk of
churning.
• Sales by region: This report will show the total sales for each region during the past year. This information
could be used to identify the regions that are performing well and the regions that need improvement.
• Sales by time period: This report will show the total sales for each week, month, and quarter during the past
year. This information could be used to identify seasonal trends and other patterns.
• Customer churn: This report will show the number of customers who churned during the past year. This
information could be used to identify the reasons why customers are churning and to develop strategies to
reduce churn.
• Product performance: This report will show the sales, return rates, and other metrics for each product. This
information could be used to identify the products that are performing well and the products that need
improvement.
• Marketing campaign effectiveness: This report will show the results of each marketing campaign, such as the
number of leads generated and the number of sales closed. This information could be used to identify the
most effective marketing campaigns and to improve the performance of future campaigns.
• Inventory levels: This report will show the inventory levels for each product. This information could be
used to identify potential stockouts and to ensure that the store has enough inventory to meet customer
demand.
• Stockouts: This report will show the number of stockouts that occurred during the past year. This information
could be used to identify the products that are most likely to go out of stock and to develop strategies to
reduce stockouts.

Question 2
Develop a dimensional model/star schema for developing Data warehouse.

Solution:
A dimensional model is a data modeling technique that is used to design data warehouses and data marts. It is based
on the concept of dimensions and facts. Dimensions are qualitative attributes that describe a business entity, such as
the customer's name, address, and age. Facts are quantitative measures that describe business events, such as the sales
amount and the quantity sold. Dates such as the order date are stored in the fact table as keys to a date dimension.
A star schema is a type of dimensional model that is commonly used for data warehouses. It consists of a fact table
surrounded by dimension tables. The fact table contains the quantitative data that you want to analyze. The dimension
tables contain the qualitative data that is associated with the fact table.
Here is an example of a dimensional model/star schema for a data warehouse:
Fact table: Orders
• Order ID
• Customer ID
• Product ID
• Quantity
• Unit price
• Total price
• Order date
• Ship date
• Delivery date
Dimension table: Customers
• Customer ID
• Name
• Email
• Phone number
• Address
• City
• State
• Zip code
• Country
Dimension table: Products
• Product ID
• Name
• Description
• Category
• Brand
• Size
• Color
• Price
Dimension table: Dates
• Date ID
• Date
• Day of week
• Week of year
• Month
• Quarter
• Year
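As a rough illustration, the tables listed above could be declared as follows. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database instead of the server used in this assignment, and the table names (dim_customer, dim_product, dim_date, fact_orders), column names, and types are assumptions adapted from the lists, not the code shown in the figures.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database server.
# Table and column names are adapted from the lists above and are hypothetical.
DDL = """
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, phone_number TEXT, address TEXT,
    city TEXT, state TEXT, zip_code TEXT, country TEXT
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT, description TEXT, category TEXT, brand TEXT,
    size TEXT, color TEXT, price REAL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    full_date TEXT, day_of_week TEXT, week_of_year INTEGER,
    month INTEGER, quarter INTEGER, year INTEGER
);
-- The fact table holds the measures plus one foreign key per dimension.
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES dim_customer(customer_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    order_date_id INTEGER NOT NULL REFERENCES dim_date(date_id),
    ship_date_id INTEGER REFERENCES dim_date(date_id),
    delivery_date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER NOT NULL,
    unit_price REAL NOT NULL,
    total_price REAL NOT NULL
);
"""


def build_schema(conn):
    """Create the three dimension tables and the fact table."""
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(DDL)


conn = sqlite3.connect(":memory:")
build_schema(conn)
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

Storing the three dates as keys into one date dimension lets the same Dates table serve order, ship, and delivery dates.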
This star schema can be used to develop a variety of reports and dashboards to analyze the data warehouse. For
example, the following reports could be developed:
• Total sales by product category
• Sales by customer region
• Sales by day of week
• Sales by month
• Sales by year
• Sales trends over time
The following dashboards could be developed:
• A dashboard that tracks key performance indicators (KPIs) such as total daily sales, top-selling products, and
most valuable customers.
• A dashboard that tracks sales performance by product category, customer region, and day of week.
• A dashboard that tracks sales trends over time.
By developing reports and dashboards, you can gain valuable insights into your data warehouse and make better
decisions for your business.
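
One of the reports listed above, total sales by product category, comes down to a join between the fact table and one dimension followed by a grouped sum. The sketch below shows that pattern with Python's sqlite3 module. The cut-down tables, their names, and the four sample rows are invented for illustration and are not taken from the assignment data.

```python
import sqlite3

# Assumption: a cut-down schema with made-up rows, just enough for one report.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_product(product_id),
    quantity INTEGER, unit_price REAL, total_price REAL
);
INSERT INTO dim_product VALUES (1, 'Shoes'), (2, 'Shirts'), (3, 'Shoes');
INSERT INTO fact_orders VALUES
    (10, 1, 2, 50.0, 100.0),
    (11, 2, 1, 30.0, 30.0),
    (12, 3, 1, 80.0, 80.0),
    (13, 2, 3, 30.0, 90.0);
""")


def sales_by_category(conn):
    """Join facts to the product dimension and sum the measure per category."""
    return conn.execute("""
        SELECT p.category, SUM(f.total_price) AS total_sales
        FROM fact_orders AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY total_sales DESC
    """).fetchall()


report = sales_by_category(conn)
print(report)  # [('Shoes', 180.0), ('Shirts', 120.0)]
```

The other reports follow the same shape: swap the dimension table and the grouping column (for example, the date dimension and month).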

Question 3
Implement the data warehouse in SQL Server by creating dimension and fact tables. Write SQL code for ETL or use
an ETL tool to populate the data warehouse from operational database(s)/sources.

Solution:
To implement the data warehouse, I used the local SQL server bundled with XAMPP (MariaDB/MySQL) in place of
Microsoft SQL Server.
Figure 1 SQL Code 1
Creating a new SQL database for the warehouse using the query above.
Creating the dimension tables:

Figure 2 Dimension Tables

Creating the fact table:

Figure 3 Fact Table

Importing data into tables from CSV files:
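
The import step can also be scripted. The following is a minimal sketch, assuming a small product CSV and a dim_product table. Both are hypothetical, since the actual files and import code are only shown in the figures. It uses Python's csv and sqlite3 modules, with an in-memory string standing in for the CSV file.

```python
import csv
import io
import sqlite3

# Hypothetical CSV contents; the real source files are not reproduced here.
CSV_TEXT = """product_id,name,category,price
1,Runner,Shoes,50.0
2,Oxford,Shirts,30.0
"""


def load_products(conn, csv_file):
    """Read product rows from an open CSV file and insert them in one transaction."""
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["product_id"]), r["name"], r["category"], float(r["price"]))
        for r in reader
    ]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_product ("
    "product_id INTEGER PRIMARY KEY, name TEXT, category TEXT, price REAL)"
)
loaded = load_products(conn, io.StringIO(CSV_TEXT))
print(loaded)
```

With a real file, `io.StringIO(CSV_TEXT)` would be replaced by `open(path, newline="")`. Dimension tables should be loaded before the fact table so that the foreign keys resolve.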


Question 5:
Using your data warehouse, develop a dashboard in Tableau with four different multidimensional visualisations
presenting data analysis/analytics of your data using several features such as colours, calculated fields, filters, trend
lines, etc.
PART-II

You are required to carry out a series of analyses on a dataset and develop a predictive model using the Python
programming language used in this module. The dataset can be obtained from data warehouse implemented as first
part of your assessment or from some other source. Dataset should have at least 1,000 records (rows).

You are required to:


• Prepare and analyse the data using a number of techniques in Python
• Explore the data by implementing several data visualisations
• Develop at least two models using suitable data mining algorithms
• Analyse the results and provide a comparative evaluation using different data mining and visualisation
methods.

Figure 4 Import CSVs


First, we import our libraries and load our dataset. We have utilised the ‘sales.csv’ dataset.

We analyze our data and use various data preprocessing techniques to prepare it for model implementation.
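
The exact preprocessing steps are not listed here, so the following is only a sketch of typical ones: dropping duplicate rows, filling missing values, deriving the total price, and one-hot encoding a categorical column. It assumes pandas is available. The column names (quantity, unit_price, category) and the five rows are invented stand-ins for the ‘sales.csv’ data.

```python
import pandas as pd

# Hypothetical rows standing in for 'sales.csv'; column names are assumptions.
raw = pd.DataFrame({
    "quantity": [2, 1, None, 3, 2],
    "unit_price": [50.0, 30.0, 80.0, 30.0, 50.0],
    "category": ["Shoes", "Shirts", "Shoes", None, "Shoes"],
})


def prepare(df):
    """Return a cleaned, model-ready copy of the raw sales frame."""
    df = df.drop_duplicates().copy()                # remove exact duplicate rows
    df["quantity"] = df["quantity"].fillna(df["quantity"].median())
    df["category"] = df["category"].fillna("Unknown")
    df["total_price"] = df["quantity"] * df["unit_price"]
    return pd.get_dummies(df, columns=["category"])  # one-hot encode categories


clean = prepare(raw)
print(clean)
```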

We explore the data by implementing several data visualization techniques.


Now we come to our data mining algorithms. Before that, we have to determine our feature vector and target variable,
and we also have to split the data.

The two models that we implemented are:


1. Gaussian Naïve Bayes
2. Random Forest
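
The steps above (choosing features and a target, splitting the data, and fitting both models) can be sketched as follows. This assumes scikit-learn and NumPy, which are not named in this report. Gaussian Naïve Bayes predicts classes, so the sketch also assumes the total price has been binned into a low and a high band. The data is synthetic, so the printed scores say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for the sales data (assumption: two numeric features).
rng = np.random.default_rng(0)
quantity = rng.integers(1, 6, size=1000)
unit_price = rng.uniform(10, 100, size=1000)
X = np.column_stack([quantity, unit_price])        # feature vector
total_price = quantity * unit_price
# Assumption: the target is a price band (0 = low, 1 = high), split at the median.
y = (total_price > np.median(total_price)).astype(int)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy = {scores[name]:.3f}")
```

If the total price is kept as a continuous value instead, a regressor such as scikit-learn's RandomForestRegressor would be the matching choice, and accuracy would no longer be the appropriate metric.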

Finally, we come to the analysis of our results. Our two models performed extremely well, reaching an accuracy of
99.5%. We conclude that both models were effective at predicting the total price.
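
Accuracy is the share of predictions that match the true labels, and a confusion count shows where each model goes wrong. The sketch below computes both in plain Python for two models. The labels and predictions are made up for illustration and are not the results reported above.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that equal the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(y_true, y_pred))


# Made-up labels and predictions, only to show the comparison.
y_true = ["low", "low", "high", "high", "high", "low"]
nb_pred = ["low", "high", "high", "high", "low", "low"]
rf_pred = ["low", "low", "high", "high", "high", "high"]

for name, pred in [("Gaussian Naive Bayes", nb_pred), ("Random Forest", rf_pred)]:
    print(name, round(accuracy(y_true, pred), 3), dict(confusion_counts(y_true, pred)))
```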
