Warehouse Assignment
Question 1
A data warehouse is developed to obtain business intelligence from data. Develop a proof-of-concept data
warehouse/mart (using a dimensional model) capturing data from an existing data source(s). Document your reasons for
selecting the subject area(s), identify key stakeholders, and formalize the business vision for developing the data
warehouse from the given B9IS107 CA 2 data source. Also explain the insights that a company may attain from the
given data. These insights should be reflected in the SSRS reports and Tableau visuals.
Solution:
Within an organization, a data warehouse acts as a central store for information compiled from different Online
Transaction Processing (OLTP) databases. Its main goal is to support Business Intelligence (BI) operations by
providing a comprehensive view of the business for well-informed decision-making. Data extracted from one or more
source databases is stored efficiently in this repository and can be easily retrieved when needed.
In this study, the subject area chosen is the Online Fashion Store—a sample database accessible on GitHub for
research purposes. This database represents a fictional company engaged in the manufacturing and global sale of
fashion products.
Key Stakeholders:
Sales managers
Marketing managers
Product managers
Executives
Business Vision:
To develop a data warehouse that can be used to analyze sales data and identify trends and opportunities. The data
warehouse will be used to improve sales performance, marketing campaigns, and product development.
Question 2
Develop a dimensional model/star schema for developing Data warehouse.
Solution:
A dimensional model is a data modeling technique used to design data warehouses and data marts. It is based
on the concepts of dimensions and facts. Dimensions are qualitative attributes that describe a business entity, such as
a customer's name, address, and age. Facts are quantitative measures that describe business events, such as the sales
amount and the quantity ordered.
A star schema is a type of dimensional model that is commonly used for data warehouses. It consists of a fact table
surrounded by dimension tables. The fact table contains the quantitative data that you want to analyze. The dimension
tables contain the qualitative data that is associated with the fact table.
Here is an example of a dimensional model/star schema for a data warehouse:
Fact table: Orders
Order ID
Customer ID
Product ID
Quantity
Unit price
Total price
Order date
Ship date
Delivery date
Dimension tables: Customers
Customer ID
Name
Email
Phone number
Address
City
State
Zip code
Country
Products
Product ID
Name
Description
Category
Brand
Size
Color
Price
Dates
Date ID
Date
Day of week
Week of year
Month
Quarter
Year
This star schema can be used to develop a variety of reports and dashboards to analyze the data warehouse. For
example, the following reports could be developed:
A dashboard that tracks key performance indicators (KPIs) such as total daily sales, top-selling products, and
most valuable customers.
A dashboard that tracks sales performance by product category, customer region, and day of week.
A dashboard that tracks sales trends over time.
By developing reports and dashboards, you can gain valuable insights into your data warehouse and make better
decisions for your business.
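The first KPI above, total daily sales, can be sketched as a query against this star schema. The snippet below is illustrative only: SQLite stands in for the warehouse database, and the table and column names (dim_date, fact_orders, total_price, and so on) are assumptions based on the example schema, not the actual implementation.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Minimal versions of the fact table and date dimension from the example schema.
cur.execute("CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, full_date TEXT)")
cur.execute("""
    CREATE TABLE fact_orders (
        order_id INTEGER PRIMARY KEY,
        date_id  INTEGER REFERENCES dim_date(date_id),
        total_price REAL
    )
""")

# A few sample rows so the KPI query has something to aggregate.
cur.executemany("INSERT INTO dim_date VALUES (?, ?)",
                [(1, "2024-01-01"), (2, "2024-01-02")])
cur.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)",
                [(101, 1, 50.0), (102, 1, 30.0), (103, 2, 20.0)])

# KPI: total sales per day, joining the fact table to the date dimension.
cur.execute("""
    SELECT d.full_date, SUM(f.total_price) AS total_sales
    FROM fact_orders f
    JOIN dim_date d ON f.date_id = d.date_id
    GROUP BY d.full_date
    ORDER BY d.full_date
""")
daily_sales = cur.fetchall()
print(daily_sales)  # [('2024-01-01', 80.0), ('2024-01-02', 20.0)]
```

The same join-and-aggregate pattern extends to the other dashboards by grouping on product category, customer region, or day of week instead of the date.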
Question 3
Implement the data warehouse in SQL Server by creating dimension and fact tables. Write SQL code for ETL or use
an ETL tool to populate the data warehouse from operational database(s)/sources.
Solution:
To implement the data warehouse, I used a local SQL server running under XAMPP.
Figure 1: SQL Code 1 (the query creating the new SQL database for the warehouse).
Creating the dimension tables:
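The actual DDL is shown only in the figures; as a sketch, the dimension and fact tables and a simple ETL insert could look like the following. SQLite stands in here for the SQL Server warehouse, and the table and column names are assumptions matching the example star schema, not the submitted implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the SQL Server warehouse database
cur = conn.cursor()

# Dimension tables (column names follow the example star schema above).
cur.executescript("""
CREATE TABLE dim_customer (
    customer_id INTEGER PRIMARY KEY,
    name TEXT, email TEXT, phone TEXT,
    address TEXT, city TEXT, state TEXT, zip_code TEXT, country TEXT
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    name TEXT, description TEXT, category TEXT,
    brand TEXT, size TEXT, color TEXT, price REAL
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    full_date TEXT, day_of_week TEXT, week_of_year INTEGER,
    month INTEGER, quarter INTEGER, year INTEGER
);
-- Fact table referencing the three dimensions.
CREATE TABLE fact_orders (
    order_id INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES dim_customer(customer_id),
    product_id    INTEGER REFERENCES dim_product(product_id),
    order_date_id INTEGER REFERENCES dim_date(date_id),
    quantity INTEGER, unit_price REAL, total_price REAL
);
""")

# Simple ETL step: load one row from a (hypothetical) operational source
# and derive total_price = quantity * unit_price during the transform.
source_row = {"order_id": 1, "customer_id": 10, "product_id": 20,
              "order_date_id": 1, "quantity": 3, "unit_price": 9.99}
cur.execute(
    "INSERT INTO fact_orders VALUES (?, ?, ?, ?, ?, ?, ?)",
    (source_row["order_id"], source_row["customer_id"], source_row["product_id"],
     source_row["order_date_id"], source_row["quantity"], source_row["unit_price"],
     source_row["quantity"] * source_row["unit_price"]),
)
conn.commit()
total = cur.execute("SELECT total_price FROM fact_orders").fetchone()[0]
print(total)  # approximately 29.97
```

In a real ETL job the source rows would be read from the operational database rather than a literal dictionary, but the shape of the load step is the same.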
Question 4
You are required to carry out a series of analyses on a dataset and to develop a predictive model using the Python
programming language used in this module. The dataset can be obtained from the data warehouse implemented in the
first part of your assessment or from some other source. The dataset should have at least 1,000 records (rows).
Solution:
We analyze the data and apply various data preprocessing techniques to prepare it for model implementation.
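The specific preprocessing steps are not listed here; the snippet below is a minimal sketch of typical steps (missing-value handling, categorical encoding, feature scaling) on toy order data with assumed column names, not the author's exact pipeline.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy order data standing in for the warehouse extract (assumed columns).
df = pd.DataFrame({
    "quantity":   [1, 2, None, 4],
    "unit_price": [10.0, 20.0, 30.0, None],
    "category":   ["shoes", "hats", "shoes", "bags"],
})

# 1. Handle missing values: fill numeric gaps with the column median.
df["quantity"] = df["quantity"].fillna(df["quantity"].median())
df["unit_price"] = df["unit_price"].fillna(df["unit_price"].median())

# 2. Encode categorical features as one-hot indicator columns.
df = pd.get_dummies(df, columns=["category"])

# 3. Scale numeric features to zero mean and unit variance.
df[["quantity", "unit_price"]] = StandardScaler().fit_transform(
    df[["quantity", "unit_price"]]
)

print(df.columns.tolist())
```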
Finally, we come to the analysis of our results. Both models performed extremely well, achieving 99.5%
accuracy. We conclude that these models were effective in predicting the total price.
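The report does not reproduce the model code itself. As a hedged sketch of what a total-price regressor might look like, the example below trains scikit-learn's LinearRegression on synthetic order data and reports an R² score on held-out data; the features, the synthetic data, and the use of R² (rather than the 99.5% accuracy figure quoted above) are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)

# Synthetic order data standing in for the warehouse extract (assumed features).
n = 1000
quantity = rng.integers(1, 10, size=n)
unit_price = rng.uniform(5, 100, size=n)
# Target: total price, with a little noise so the fit is not exactly perfect.
total_price = quantity * unit_price + rng.normal(0, 1, size=n)

# Use quantity, unit_price, and their product as features; the product term
# is what lets a linear model capture total = quantity * unit_price.
X = np.column_stack([quantity, unit_price, quantity * unit_price])
X_train, X_test, y_train, y_test = train_test_split(
    X, total_price, test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out data: {r2:.3f}")
```

For a regression target like total price, R² or mean absolute error is the usual evaluation metric; "accuracy" would apply if the target were binned into classes first.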