Warehouse Assignment

The document discusses developing a data warehouse from an existing data source and selecting relevant subject areas and key stakeholders. It describes constructing dimension and fact tables in SQL Server and populating them from operational databases. Visualizations and reports are created in Tableau and SSRS. Predictive modeling is performed on the data warehouse using Python including data preparation, exploration using visualizations, and developing Naive Bayes and Random Forest models to predict total price.

Uploaded by

Hareem Nagra

Question 01.

Data Warehouse is developed to obtain business intelligence from data. Develop a proof-of-concept data
warehouse/mart (using dimensional model) capturing data from an existing data source(s). Document your reasons for
selecting the subject area(s), identify key stakeholders, formalize the business vision for developing the data
warehouse from the given B9IS107 CA 2 data source. Also explain the insights that a company may attain from the
given data. It should be reflected in the SSRS reports and Tableau visuals.

Solution:
Within an organization, a data warehouse acts as a central store for information compiled from different Online
Transaction Processing (OLTP) databases. Its main goal is to support Business Intelligence (BI) by giving decision
makers a consolidated view of the business. Data is extracted from one or more source databases and loaded into
the warehouse in a structure designed for efficient storage, so that it can be retrieved easily when needed.
In this study, the subject area chosen is the Online Fashion Store—a sample database accessible on GitHub for
research purposes. This database represents a fictional company engaged in the manufacturing and global sale of
fashion products.

Reasons for selecting the subject area:

There are several reasons why the sales subject area is a good choice for a data warehouse proof of concept (POC):
• Sales data is critical for businesses. Sales data can be used to track performance, identify trends, and make
better decisions about product development, marketing, and operations.
• Sales data is relatively easy to collect and load into a data warehouse. Sales data is typically stored in a CRM
system or other operational system. This makes it easy to extract and load the data into a data warehouse.
• Sales data is highly actionable. Insights gained from analyzing sales data can be used to improve sales
performance, reduce costs, and increase profits.

Key Stakeholders:
• Sales managers
• Marketing managers
• Product managers
• Executives

Business Vision:
To develop a data warehouse that can be used to analyze sales data and identify trends and opportunities. The data
warehouse will be used to improve sales performance, marketing campaigns, and product development.

Insights from the Data:

The data warehouse can be used to gain insights into the following areas:

• Sales by product, customer, region, and time period
• Customer churn
• Product performance
• Marketing campaign effectiveness
• Inventory levels
• Stockouts

SSRS Reports and Tableau Visuals:

Here are some examples of SSRS reports and Tableau visuals that could be developed:
• Sales by product: This report will show the total sales for each product during the past year. This information
could be used to identify the best-selling products and the products that are not selling well.
• Sales by customer: This report will show the total sales for each customer during the past year. This
information could be used to identify the most valuable customers and the customers who are at risk of
churning.
• Sales by region: This report will show the total sales for each region during the past year. This information
could be used to identify the regions that are performing well and the regions that need improvement.
• Sales by time period: This report will show the total sales for each week, month, and quarter during the past
year. This information could be used to identify seasonal trends and other patterns.
• Customer churn: This report will show the number of customers who churned during the past year. This
information could be used to identify the reasons why customers are churning and to develop strategies to
reduce churn.
• Product performance: This report will show the sales, return rates, and other metrics for each product. This
information could be used to identify the products that are performing well and the products that need
improvement.
• Marketing campaign effectiveness: This report will show the results of each marketing campaign, such as the
number of leads generated and the number of sales closed. This information could be used to identify the
most effective marketing campaigns and to improve the performance of future campaigns.
• Inventory levels: This report will show the inventory levels for each product. This information could be
used to identify potential stockouts and to ensure that the store has enough inventory to meet customer
demand.
• Stockouts: This report will show the number of stockouts that occurred during the past year. This information
could be used to identify the products that are most likely to go out of stock and to develop strategies to
reduce stockouts.

Question 2
Develop a dimensional model/star schema for developing Data warehouse.

Solution:
A dimensional model is a data modeling technique that is used to design data warehouses and data marts. It is based
on the concept of dimensions and facts. Dimensions are qualitative attributes that describe a business entity, such as
the customer's name, address, and age. Facts are quantitative measures that describe business events, such as the sales
amount and the quantity ordered. Descriptive context such as the order date is modeled as a dimension.
A star schema is a type of dimensional model that is commonly used for data warehouses. It consists of a fact table
surrounded by dimension tables. The fact table contains the quantitative data that you want to analyze. The dimension
tables contain the qualitative data that is associated with the fact table.
Here is an example of a dimensional model/star schema for a data warehouse:
Fact table: Orders
• Order ID
• Customer ID
• Product ID
• Quantity
• Unit price
• Total price
• Order date
• Ship date
• Delivery date
Dimension table: Customers
• Customer ID
• Name
• Email
• Phone number
• Address
• City
• State
• Zip code
• Country
Dimension table: Products
• Product ID
• Name
• Description
• Category
• Brand
• Size
• Color
• Price
Dimension table: Dates
• Date ID
• Date
• Day of week
• Week of year
• Month
• Quarter
• Year
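As a minimal sketch, the tables listed above could be declared as follows. It uses Python's built-in sqlite3 module with an in-memory database so that it runs anywhere; the table names (fact_orders, dim_customers, dim_products, dim_dates), the column types, and the choice to store the three order dates as keys into the Dates dimension are assumptions made for illustration, not the SQL used in the assignment.

```python
import sqlite3

# Illustrative star schema. Names and types are assumptions derived from the
# attribute lists in the text, not the original SQL from the assignment.
SCHEMA = """
CREATE TABLE dim_customers (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, phone_number TEXT, address TEXT,
    city TEXT, state TEXT, zip_code TEXT, country TEXT
);
CREATE TABLE dim_products (
    product_id INTEGER PRIMARY KEY,
    name TEXT, description TEXT, category TEXT, brand TEXT,
    size TEXT, color TEXT, price REAL
);
CREATE TABLE dim_dates (
    date_id INTEGER PRIMARY KEY,
    full_date TEXT, day_of_week TEXT, week_of_year INTEGER,
    month INTEGER, quarter INTEGER, year INTEGER
);
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES dim_customers(customer_id),
    product_id INTEGER REFERENCES dim_products(product_id),
    quantity INTEGER,
    unit_price REAL,
    total_price REAL,
    -- Assumption: each date is stored as a key into dim_dates.
    order_date_id INTEGER REFERENCES dim_dates(date_id),
    ship_date_id INTEGER REFERENCES dim_dates(date_id),
    delivery_date_id INTEGER REFERENCES dim_dates(date_id)
);
"""


def create_star_schema(conn):
    """Create the three dimension tables and the fact table on conn."""
    conn.executescript(SCHEMA)
    return conn


conn = create_star_schema(sqlite3.connect(":memory:"))
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The dimension tables are created first so that the foreign-key references in the fact table point at tables that already exist.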
This star schema can be used to develop a variety of reports and dashboards to analyze the data warehouse. For
example, the following reports could be developed:

• Total sales by product category
• Sales by customer region
• Sales by day of week
• Sales by month
• Sales by year
• Sales trends over time
The following dashboards could be developed:

• A dashboard that tracks key performance indicators (KPIs) such as total daily sales, top-selling products, and
most valuable customers.
• A dashboard that tracks sales performance by product category, customer region, and day of week.
• A dashboard that tracks sales trends over time.
By developing reports and dashboards, you can gain valuable insights into your data warehouse and make better
decisions for your business.
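To illustrate how such a report maps onto a star schema, the sketch below computes total sales by product category by joining the fact table to the Products dimension. It uses Python's sqlite3 module; the table names, the reduced set of columns, and the sample rows are invented for the example and are not taken from the assignment's data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_products (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    product_id INTEGER REFERENCES dim_products(product_id),
    total_price REAL
);
""")

# Made-up rows for illustration only.
conn.executemany(
    "INSERT INTO dim_products VALUES (?, ?)",
    [(1, "Shoes"), (2, "Shirts"), (3, "Shoes")],
)
conn.executemany(
    "INSERT INTO fact_orders VALUES (?, ?, ?)",
    [(1, 1, 50.0), (2, 2, 20.0), (3, 3, 30.0), (4, 2, 25.0)],
)


def total_sales_by_category(conn):
    """Join facts to the product dimension and sum total_price per category."""
    query = """
        SELECT p.category, SUM(f.total_price) AS total_sales
        FROM fact_orders AS f
        JOIN dim_products AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY total_sales DESC
    """
    return conn.execute(query).fetchall()


report = total_sales_by_category(conn)
print(report)
```

The other reports in the list follow the same pattern: join the fact table to the relevant dimension and group by the attribute of interest, for example the month or day of week in the Dates dimension.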

Question 3
Implement the data warehouse in SQL Server by creating dimension and fact tables. Write SQL code for ETL or use
an ETL tool to populate the data warehouse from operational database(s)/sources.

Solution:
To implement the data warehouse, I used the local SQL server bundled with XAMPP, which provides MySQL/MariaDB
rather than Microsoft SQL Server.
Figure 1 SQL Code 1
Creating new SQL database for warehouse using the query above.
Creating the dimension tables:

Figure 2 Dimension Tables

Creating the fact table:

Figure 3 Fact Table

Importing data into tables from CSV files:
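A CSV load of this kind can be sketched as follows. This is only an illustration in Python using the standard csv and sqlite3 modules; the file contents, the column names, and the helper load_customers are hypothetical, and the assignment itself imported the data through its SQL tooling.

```python
import csv
import io
import sqlite3

# Stand-in for a CSV export from the operational database. The columns and
# rows are invented for illustration.
CUSTOMERS_CSV = """customer_id,name,country
1,Alice,Ireland
2,Bob,France
"""


def load_customers(conn, csv_file):
    """Insert rows from an open CSV file into dim_customers; return the row count."""
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["customer_id"]), r["name"].strip(), r["country"].strip())
        for r in reader
    ]
    conn.executemany(
        "INSERT INTO dim_customers (customer_id, name, country) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dim_customers ("
    "customer_id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
)
loaded = load_customers(conn, io.StringIO(CUSTOMERS_CSV))
row_count = conn.execute("SELECT COUNT(*) FROM dim_customers").fetchone()[0]
print(loaded, row_count)
```

With a real file, the StringIO object would be replaced by open(path, newline=""). Dimension tables are typically loaded before the fact table so that every foreign key in a fact row has a matching dimension row.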

Question 5:
Using your data warehouse, develop a dashboard in Tableau with four different multidimensional visualisations
presenting data analysis/analytics of your data using several features such as colours, calculated fields, filters, trend
lines, etc.
PART-II

You are required to carry out a series of analyses on a dataset and develop a predictive model using the Python
programming language used in this module. The dataset can be obtained from data warehouse implemented as first
part of your assessment or from some other source. Dataset should have at least 1,000 records (rows).

You are required to:

• Prepare and analyse the data using a number of techniques in Python
• Explore the data by implementing several data visualisations
• Develop at least two models using suitable data mining algorithms
• Analyse the results and provide a comparative evaluation using different data mining and visualisation
methods.

Figure 4 Import CSVs

First, we import the libraries and load the dataset, ‘sales.csv’.

We analyze our data and use various data preprocessing techniques to prepare it for model implementation.
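The text does not say which preprocessing steps were applied, so the following is only a sketch of typical ones using pandas: dropping duplicate rows, filling missing values, deriving the total price, and one-hot encoding a categorical column. The DataFrame stands in for ‘sales.csv’, and its column names and values are assumptions.

```python
import pandas as pd

# Invented sample standing in for 'sales.csv'; the column names are assumptions.
raw = pd.DataFrame({
    "quantity": [2, 1, None, 3, 3],
    "unit_price": [10.0, 25.0, 5.0, 8.0, 8.0],
    "category": ["Shoes", None, "Shoes", "Hats", "Hats"],
})


def prepare(df):
    """Return a cleaned copy of df that is ready for modelling."""
    df = df.drop_duplicates().copy()
    # Fill missing numeric values with the column median.
    df["quantity"] = df["quantity"].fillna(df["quantity"].median())
    # Give missing categories an explicit label.
    df["category"] = df["category"].fillna("Unknown")
    # Derive the total price from quantity and unit price.
    df["total_price"] = df["quantity"] * df["unit_price"]
    # One-hot encode the categorical column.
    return pd.get_dummies(df, columns=["category"])


clean = prepare(raw)
print(clean.shape)
print(clean.isna().sum().sum())
```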

We explore the data by implementing several data visualization techniques.
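The original plots are not described in the text. As a rough sketch of this kind of exploration, the code below draws a histogram of total price and a scatter plot of unit price against total price with matplotlib. The data is randomly generated, and the column meanings are assumptions.

```python
import io

import matplotlib

matplotlib.use("Agg")  # render off-screen so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
quantity = rng.integers(1, 6, size=200)
unit_price = rng.uniform(5, 100, size=200)
total_price = quantity * unit_price

fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# Distribution of the target variable.
counts, bin_edges, _ = ax_hist.hist(total_price, bins=20)
ax_hist.set_xlabel("Total price")
ax_hist.set_ylabel("Number of orders")

# Relationship between a feature and the target, coloured by quantity.
points = ax_scatter.scatter(unit_price, total_price, c=quantity, cmap="viridis", s=10)
ax_scatter.set_xlabel("Unit price")
ax_scatter.set_ylabel("Total price")
fig.colorbar(points, ax=ax_scatter, label="Quantity")

fig.tight_layout()
buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # write to memory; pass a file path to save to disk
```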

Now we come to our data mining algorithms. Before fitting them, we determine the feature vector and the target
variable, and split the data into training and test sets.
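A minimal sketch of this step with scikit-learn is shown below. The features (quantity and unit price), the 80/20 split, and the decision to turn total price into a two-class label by splitting at the median are all assumptions. The text does not state how the target was encoded, but a classifier needs discrete labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic data for illustration only.
rng = np.random.default_rng(42)
n = 1000
quantity = rng.integers(1, 6, size=n)
unit_price = rng.uniform(5, 100, size=n)
total_price = quantity * unit_price

# Feature vector: one row per order, one column per feature.
X = np.column_stack([quantity, unit_price])

# Target variable. Assumption: total price is binned into two classes
# (above or below the median) so that classifiers can be applied.
y = (total_price > np.median(total_price)).astype(int)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```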

The two models that we implemented are:

1. Gaussian Naïve Bayes
2. Random Forest
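The sketch below fits both model types with scikit-learn and compares their test accuracy. It runs on synthetic data with an assumed two-class target (whether total price exceeds 100), so its scores say nothing about the assignment's results. The hyperparameters are library defaults, not the author's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 1000
quantity = rng.integers(1, 6, size=n)
unit_price = rng.uniform(5, 100, size=n)
X = np.column_stack([quantity, unit_price])

# Assumption: the target is a class label derived from total price.
y = (quantity * unit_price > 100).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Because the label here is computed directly from the two features, high accuracy is expected on this synthetic data. On real data it is worth checking whether any feature reveals the target in the same way.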

Finally, we come to the analysis of our results. Our two models performed extremely well, giving us an accuracy of
99.5%. We conclude that these models were effective in predicting the total price.
