SQL Server 2012 for Business Intelligence
UTS Short Course
Mehmet Ozdemir – SA @ SSW
w: blog.ozdemir.id.au | e: [email protected] | t: @mozdemir_au
SQL Server, BI, Infrastructure
Specializes in
• Application architecture and design
• SQL Performance Tuning and Optimization
• Hyper-V, SCVMM
Technology aficionado
• Virtualization
• Reporting/BI
• Cubes
Course Website
http://sharepoint.ssw.com.au/Training/UTSSQL/Pages/default.aspx
Course Timetable
Course Materials
Course Overview
Session  Date                  Time           Topic
1        Tuesday 17-Sep-2013   18:00 - 21:00  SSIS and Creating a Data Warehouse
2        Tuesday 24-Sep-2013   18:00 - 21:00  OLAP – Creating Cubes and Cube Issues
3        Tuesday 01-Oct-2013   18:00 - 21:00  Reporting Services
4        Tuesday 08-Oct-2013   18:00 - 21:00  Alternative Cube Browsers
5        Tuesday 15-Oct-2013   18:00 - 21:00  Data Mining
Last week
Business Intelligence
Data Warehouse
Measure (Facts)
Dimension
ETL
SSIS
Homework
What is a "TYPE"?
Why?
06_DWCreateScript.sql:
What does the stored procedure "procDimDateInsert" do?
What is the difference between TRUNCATE and DELETE FROM?
SSIS: What is the Sequence Container for?
Cubes
Session 2: Tonight’s Agenda
1. What is a Cube?
2. Steps in Creating a Cube
3. Demo: Creating a Cube
4. Cube Issues
5. Hands on Lab
Cubes
Cube
Data structure for fast analysis of data
Precalculated
On top of a data warehouse
Manipulating and analysing data
from multiple perspectives
Why a cube?
Performance
Relational databases not suited for instantaneous analysis
Cube precalculates (aggregates) data
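To make "precalculates (aggregates)" concrete, here is a minimal Python sketch of the idea. The fact rows and the build_cube helper are made up for illustration, and the sketch stores every possible aggregate, whereas Analysis Services decides which aggregations are worth storing.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (product, region, quarter, sales amount).
FACTS = [
    ("Groceries", "North", "Q1", 100),
    ("Groceries", "South", "Q1", 50),
    ("Clothing", "North", "Q2", 70),
    ("Clothing", "North", "Q4", 30),
]

def build_cube(facts):
    """Precompute the sum of the measure for every combination of
    dimension members, including the 'All' member of each dimension."""
    cube = defaultdict(int)
    for prod, region, quarter, amount in facts:
        # Each fact row contributes to 2 x 2 x 2 = 8 cells.
        for key in product((prod, "All"), (region, "All"), (quarter, "All")):
            cube[key] += amount
    return dict(cube)

CUBE = build_cube(FACTS)

# A query is now a dictionary lookup instead of a scan over the facts.
print(CUBE[("All", "All", "All")])          # grand total
print(CUBE[("Clothing", "North", "All")])   # Clothing in North, all quarters
```

The cost moves from query time to processing time, which is why a cube has to be processed before it can answer questions about new data.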
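Cube Concept
[Figure: a cube whose three edges are Product (Groceries, Electronics, Clothing, Garden, Automotive), Geog (North, South, East, West) and Time (Q1, Q2, Q3, Q4)]
Edges are Dimensions
[Figure: the Geog edge (North, South, East, West), followed by the cell labelled Groceries, North, Q4]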
"Import" cube example
Sum over Packages
Max over Last
Aggregate measures over time dimension
Attribute hierarchy
Aggregate measures by multi dimensions

               Packages                              Last
Time           All Sources  Eastern Hem.  Western Hem.  All Sources  Eastern Hem.  Western Hem.
All            25110        6547          18563         Dec-29-99    Dec-22-99     Dec-29-99
1st half       11173        2977          8196          Jun-28-99    Jun-20-99     Jun-28-99
1st quarter    5108         1452          3656          Mar-30-99    Mar-19-99     Mar-30-99
2nd quarter    6065         1525          4540          Jun-28-99    Jun-20-99     Jun-28-99
...
Why a cube?
Performance
Relational databases not suited for instantaneous analysis
Cube precalculates (aggregates) data
KPI and trending
Adventure Works example
A cube is an
“aggregation of measures
against dimensions”
What is a Cube - Example
What is a Cube in SQL 2012
1. Data Source
Where the data comes from
• Adventure works connection string
2. Data Source View
The tables and how they link together
• Orders, Details, Products and relationships
• Name matching to detect relationships
3. Dimensions
How we break up the aggregate data
• Products, Time
4. Measures (Facts)
The aggregate data
• Line Total, Quantity
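The four parts above can be sketched with an in-memory SQLite database standing in for the data source. The table names, column names and rows are illustrative assumptions, not the Adventure Works schema: the two Dim tables play the role of dimensions, the fact table holds the measures, and the foreign keys are the relationships a data source view would record.

```python
import sqlite3

# 1. Data source: an in-memory database instead of a real connection string.
conn = sqlite3.connect(":memory:")

# 2. Data source view: tables and how they link together.
conn.executescript("""
CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, CalendarYear INTEGER);
CREATE TABLE FactSales (
    ProductKey INTEGER REFERENCES DimProduct(ProductKey),
    DateKey INTEGER REFERENCES DimDate(DateKey),
    Quantity INTEGER,
    LineTotal REAL
);
INSERT INTO DimProduct VALUES (1, 'Bike'), (2, 'Helmet');
INSERT INTO DimDate VALUES (20070101, 2007), (20080101, 2008);
INSERT INTO FactSales VALUES
    (1, 20070101, 2, 2000.0),
    (2, 20070101, 5, 150.0),
    (1, 20080101, 1, 1000.0);
""")

# 3 and 4. Dimensions break up the data; measures are aggregated.
rows = conn.execute("""
SELECT p.Name, d.CalendarYear, SUM(f.Quantity), SUM(f.LineTotal)
FROM FactSales f
JOIN DimProduct p ON p.ProductKey = f.ProductKey
JOIN DimDate d ON d.DateKey = f.DateKey
GROUP BY p.Name, d.CalendarYear
ORDER BY p.Name, d.CalendarYear
""").fetchall()
print(rows)
```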
New type of Cube
SQL 2012 brings a new cube model called Tabular
Advantages
Faster
Easier to create
Disadvantages
New
Limited to the RAM on server
Steps in Creating a Cube
1. Define Data Source
2. Create Data Source View
3. Define Dimensions
4. Define Measures (Facts)
5. Process the Cube
Data Source - Database
Data Source - Impersonation (Authentication)
Data Source View – Select Tables and Views
Build a Cube – Finish the wizard
Build a Cube – Process Cube
Build a Cube
Review Auto Generated Cube
Check Hierarchies e.g. Product Category shows ID rather than name
Cube Issues
Keeping things Related
Dimensions should tie in to Fact tables
Use Primary Keys
Keeping things Relevant
Multiple fact tables
Even more dimensions
Keeping things Fresh
Needs to be processed
Automated SSIS Packages
Keeping Missing Data
Fails to process when keys are missing
Change missing keys to Unknown
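One way to picture "change missing keys to Unknown" is a clean-up step that runs before the cube is processed. This is a sketch only: the -1 surrogate key and the resolve_keys helper are assumptions for illustration, not something the course scripts define.

```python
UNKNOWN_KEY = -1  # hypothetical surrogate key reserved for the Unknown member

def resolve_keys(fact_rows, dimension_keys):
    """Return fact rows with any key that is missing from the dimension
    replaced by UNKNOWN_KEY, so every row still joins to a member."""
    resolved = []
    for product_key, quantity in fact_rows:
        if product_key not in dimension_keys:
            product_key = UNKNOWN_KEY
        resolved.append((product_key, quantity))
    return resolved

# The dimension carries an explicit Unknown member alongside the real ones.
dim_product = {UNKNOWN_KEY: "Unknown", 1: "Bike", 2: "Helmet"}

# Key 99 has no dimension row and None stands for a NULL key.
facts = [(1, 3), (99, 4), (None, 1)]
clean = resolve_keys(facts, dim_product)
print(clean)
```

No fact row is dropped, so totals still reconcile with the source system, and the rows with bad keys show up under Unknown where they can be investigated.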
Common Patterns
Inserting null records to prevent invalid key lookups (slows down
cube processing significantly)
Cube is based off views (can be changed easily)
Can combine data using partitions
MOLAP vs ROLAP
Multidimensional Online Analytical Processing
Relational Online Analytical Processing
MOLAP                        ROLAP
Fast query performance       Works well with large volumes of data
Consumes less disk space     Real time
Auto aggregation of data     Securable
Processing can be slow       Slower than MOLAP
                             Need to create your own aggregations
                             Not suited for budgeting and forecasting
Best Practices - Dimensions
Consolidate multiple hierarchies into a single dimension
Avoid ROLAP storage model
Use role playing dimensions (e.g. OrderDate, BillDate, ShipDate) –
avoids multiple physical copies (use a view to make it look nicer)
(http://www.youtube.com/watch?v=SCH5gMCHMZs)
Avoid Many-to-Many dimensions (slow)
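The role-playing idea (one physical date table, several roles, views to make the names nicer) can be sketched with SQLite. The table, view and column names here are illustrative assumptions; in Analysis Services the same effect comes from adding one date dimension to the cube under several names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimDate (DateKey INTEGER PRIMARY KEY, FullDate TEXT);
CREATE TABLE FactOrder (OrderID INTEGER, OrderDateKey INTEGER, ShipDateKey INTEGER);

-- One physical date table, exposed under two role names via views.
CREATE VIEW OrderDate AS
    SELECT DateKey AS OrderDateKey, FullDate AS OrderDate FROM DimDate;
CREATE VIEW ShipDate AS
    SELECT DateKey AS ShipDateKey, FullDate AS ShipDate FROM DimDate;

INSERT INTO DimDate VALUES (1, '2013-09-17'), (2, '2013-09-24');
INSERT INTO FactOrder VALUES (100, 1, 2);
""")

row = conn.execute("""
SELECT f.OrderID, o.OrderDate, s.ShipDate
FROM FactOrder f
JOIN OrderDate o ON o.OrderDateKey = f.OrderDateKey
JOIN ShipDate s ON s.ShipDateKey = f.ShipDateKey
""").fetchone()
print(row)
```

The date rows are stored once, so there is a single table to maintain however many roles the fact table needs.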
Best Practices – Attributes/Hierarchies
Define all possible attribute relationships
Remove redundant attribute relationships
Use integer (or numeric) key columns
Best Practices - Measures
Use smallest numeric data type possible
Use semi-additive aggregate functions instead of MDX calculations
to achieve same behaviour
Avoid string source column for distinct count measures
Best Practices - OLAP
No more than 20M rows per partition
Manage storage settings by usage patterns
Frequently queried => MOLAP with lots of aggregations
Periodically queried => MOLAP with fewer or no aggregations
Historical => ROLAP with no aggregations
Use multiple disk controllers for IO performance
(Use SQLIO to determine disk perf)
Best Practices - General
Create Perspectives to help with querying data
Create Measure Groups
Create Calculated Measures and KPIs for frequently analysed data –
if you can do it in the DSV that’s preferable
MDX
MultiDimensional EXpressions
Syntax
SELECT
{ <Measure>,…} ON COLUMNS,
{ <Dimension> } ON ROWS
FROM <Cube>
WHERE (<Filters>)
Our data
SELECT
FROM [Adventure Works]
WHERE [Measures].[Internet Order Count]
Internet Orders
Slicing our data – 2007 Q1
SELECT
FROM [Adventure Works]
WHERE (
[Measures].[Internet Order Count],
[Date].[Fiscal].[Fiscal Quarter].&[2007]&[1]
)
Internet Orders in 2007 Q1
2007 Q1
Slicing our data – 2007 Q1, Australia
SELECT
FROM [Adventure Works]
WHERE (
[Measures].[Internet Order Count],
[Date].[Fiscal].[Fiscal Quarter].&[2007]&[1],
[Customer].[Customer Geography].[Country].&[Australia]
)
Internet Orders in 2007 Q1, Australia
Australia
2007 Q1
Slicing our data – 2007 Q1, Australia, Male
SELECT
FROM [Adventure Works]
WHERE (
[Measures].[Internet Order Count],
[Date].[Fiscal].[Fiscal Quarter].&[2007]&[1],
[Customer].[Customer Geography].[Country].&[Australia],
[Customer].[Gender].&[M]
)
Internet Orders in 2007 Q1, Australia, Male
Australia
2007 Q1
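The slicing queries above narrow the result by adding one member per dimension to the WHERE tuple. A rough Python analogy, with made-up order rows and a hypothetical order_count helper, treats the slicer as a set of coordinates that every counted row must match.

```python
# Hypothetical fact rows: one dict per internet order.
ORDERS = [
    {"order": 1, "quarter": "FY2007 Q1", "country": "Australia", "gender": "M"},
    {"order": 2, "quarter": "FY2007 Q1", "country": "Australia", "gender": "F"},
    {"order": 3, "quarter": "FY2007 Q1", "country": "Canada", "gender": "M"},
    {"order": 4, "quarter": "FY2007 Q2", "country": "Australia", "gender": "M"},
]

def order_count(orders, **slicer):
    """Count orders that match every coordinate in the slicer. A dimension
    left out of the slicer is not restricted (its 'All' member)."""
    return sum(
        all(row[dim] == member for dim, member in slicer.items())
        for row in orders
    )

print(order_count(ORDERS))
print(order_count(ORDERS, quarter="FY2007 Q1"))
print(order_count(ORDERS, quarter="FY2007 Q1", country="Australia"))
print(order_count(ORDERS, quarter="FY2007 Q1", country="Australia", gender="M"))
```

Each extra coordinate can only keep the count the same or shrink it, which is the behaviour the successive slides show.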
Syntax
SELECT
{[Measures].[Internet Order Count]} ON COLUMNS,
{[Customer].[Country].Members} ON ROWS
FROM [Adventure Works]
WHERE [Date].[Fiscal Year].&[2008]
Syntax
SELECT
{[Date].[Fiscal Quarter of Year].[Fiscal Quarter of Year].Members} ON COLUMNS,
{[Customer].[Country].Members} ON ROWS
FROM [Adventure Works]
WHERE
[Measures].[Internet Order Count]
Syntax – Get me the last 4 quarters
SELECT
{LastPeriods(4,[Date].[Fiscal].[Fiscal Quarter].&[2008]&[1])}
ON COLUMNS,
{[Customer].[Country].Members} ON ROWS
FROM [Adventure Works]
WHERE
[Measures].[Internet Order Count]
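As a rough model of what LastPeriods returns for a positive count, the sketch below works on a plain list of quarter names. The member names and the last_periods helper are illustrative, and negative counts (which MDX also accepts) are not handled.

```python
# Hypothetical members of the quarter level, in time order.
QUARTERS = [
    "FY2007 Q1", "FY2007 Q2", "FY2007 Q3", "FY2007 Q4",
    "FY2008 Q1", "FY2008 Q2",
]

def last_periods(n, member, level=QUARTERS):
    """Return the n members of the level that end at `member` (inclusive),
    in level order. Fewer are returned if the level starts sooner."""
    end = level.index(member) + 1
    return level[max(0, end - n):end]

print(last_periods(4, "FY2008 Q1"))
```

The given member is the last element of the result, so asking for 4 periods ending at FY2008 Q1 reaches back to FY2007 Q2.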
Syntax – How did we do last year?
SELECT
{ParallelPeriod([Date].[Fiscal].[Fiscal Year], 1,
                [Date].[Fiscal].[Fiscal Year].&[2008]),
 [Date].[Fiscal].[Fiscal Year].&[2008]} ON COLUMNS,
{[Customer].[Country].Members} ON ROWS
FROM [Adventure Works]
WHERE [Measures].[Internet Order Count]
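In the query above the member is itself a fiscal year, so ParallelPeriod simply steps back one year. The function is more interesting one level down, where it keeps the member's position inside its parent. The sketch below models that for quarters with a hypothetical two-year hierarchy; the names and the parallel_period_quarter helper are assumptions for illustration.

```python
# Hypothetical fiscal hierarchy: year -> quarters, in time order.
HIERARCHY = {
    "FY2007": ["FY2007 Q1", "FY2007 Q2", "FY2007 Q3", "FY2007 Q4"],
    "FY2008": ["FY2008 Q1", "FY2008 Q2", "FY2008 Q3", "FY2008 Q4"],
}
YEARS = list(HIERARCHY)

def parallel_period_quarter(lag, quarter):
    """Return the quarter in the same position within the year `lag`
    years earlier, or None when that year is outside the hierarchy."""
    for i, year in enumerate(YEARS):
        if quarter in HIERARCHY[year]:
            position = HIERARCHY[year].index(quarter)
            target = i - lag
            if 0 <= target < len(YEARS):
                return HIERARCHY[YEARS[target]][position]
            return None
    raise ValueError(f"unknown quarter: {quarter}")

print(parallel_period_quarter(1, "FY2008 Q3"))
```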
Syntax - YTD
WITH
MEMBER [Measures].[YTD Sales]
AS 'SUM(PeriodsToDate([Date].[Fiscal].[Fiscal Year]), [Measures].[Internet Order Count])'
SELECT {[Measures].[Internet Order Count],[Measures].[YTD Sales] } ON COLUMNS,
{[Date].[Fiscal].[Fiscal Quarter].Members} ON ROWS
FROM [Adventure Works]
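The YTD member is a running total that restarts at the beginning of each fiscal year. A small Python sketch of that calculation, using made-up quarterly counts and a hypothetical year_to_date helper:

```python
from itertools import accumulate, groupby

# Hypothetical (fiscal year, quarter, order count) rows in time order.
ROWS = [
    (2007, "Q1", 10), (2007, "Q2", 20), (2007, "Q3", 5), (2007, "Q4", 15),
    (2008, "Q1", 8), (2008, "Q2", 12),
]

def year_to_date(rows):
    """Return (year, quarter, count, ytd) tuples, restarting the running
    total at the first period of each year."""
    result = []
    for _, group in groupby(rows, key=lambda r: r[0]):
        group = list(group)
        totals = accumulate(count for _, _, count in group)
        result.extend((y, q, c, t) for (y, q, c), t in zip(group, totals))
    return result

for line in year_to_date(ROWS):
    print(line)
```

The rows must already be in time order, because groupby only groups consecutive rows with the same year.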
Syntax – Cross Joins
SELECT
{CROSSJOIN([Date].[Fiscal].[Fiscal Year].MEMBERS, [Product].[Category].MEMBERS)} ON COLUMNS,
{[Customer].[Country].MEMBERS} ON ROWS
FROM [Adventure Works]
WHERE [Measures].[Internet Order Count]
Syntax – Cross Joins
SELECT
{NONEMPTYCROSSJOIN([Date].[Fiscal].[Fiscal Year].MEMBERS, [Product].[Category].MEMBERS)} ON COLUMNS,
{[Customer].[Country].MEMBERS} ON ROWS
FROM [Adventure Works]
WHERE [Measures].[Internet Order Count]
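The difference between the two cross join queries can be sketched in Python: a cross join pairs every member of one set with every member of the other, and the non-empty variant keeps only the pairs that actually have data. The member names and cell values below are made up for illustration.

```python
from itertools import product

YEARS = ["FY2007", "FY2008"]
CATEGORIES = ["Bikes", "Clothing"]

# Hypothetical cell values; a missing key means the cell is empty.
CELLS = {
    ("FY2007", "Bikes"): 120,
    ("FY2008", "Bikes"): 90,
    ("FY2008", "Clothing"): 40,
}

def crossjoin(a, b):
    """Every (a, b) combination, in order."""
    return list(product(a, b))

def non_empty_crossjoin(a, b, cells):
    """Only the combinations that have a value in `cells`."""
    return [pair for pair in crossjoin(a, b) if pair in cells]

print(crossjoin(YEARS, CATEGORIES))
print(non_empty_crossjoin(YEARS, CATEGORIES, CELLS))
```

With many dimensions most combinations are empty, so dropping them keeps the result readable and much smaller.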
The Equivalent in SQL?
SELECT c.Name Country, DATEPART(yyyy, sh.OrderDate) [Year], pc.Name Category,
SUM(sd.OrderQty) TotalOrders
FROM Sales.SalesOrderDetail sd
INNER JOIN Sales.SalesOrderHeader sh ON sh.SalesOrderID = sd.SalesOrderID
INNER JOIN Production.Product p ON sd.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory psc ON psc.ProductSubcategoryID = p.ProductSubcategoryID
INNER JOIN Production.ProductCategory pc ON pc.ProductCategoryID = psc.ProductCategoryID
INNER JOIN Person.Address a ON a.AddressID = sh.ShipToAddressID
INNER JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID
INNER JOIN Person.CountryRegion c ON c.CountryRegionCode = sp.CountryRegionCode
GROUP BY c.Name, DATEPART(yyyy, sh.OrderDate), pc.Name
ORDER BY c.Name, pc.Name, DATEPART(yyyy, sh.OrderDate)
The Equivalent in SQL?
What about fiscal year?
The Equivalent in SQL?
SELECT c.Name Country,
CASE WHEN MONTH(sh.OrderDate) < 7 THEN YEAR(sh.OrderDate) - 1 ELSE
YEAR(sh.OrderDate) END [Year],
pc.Name Category,
SUM(sd.OrderQty) TotalOrders
FROM Sales.SalesOrderDetail sd
INNER JOIN Sales.SalesOrderHeader sh ON sh.SalesOrderID = sd.SalesOrderID
INNER JOIN Production.Product p ON sd.ProductID = p.ProductID
INNER JOIN Production.ProductSubcategory psc ON psc.ProductSubcategoryID = p.ProductSubcategoryID
INNER JOIN Production.ProductCategory pc ON pc.ProductCategoryID = psc.ProductCategoryID
INNER JOIN Person.Address a ON a.AddressID = sh.ShipToAddressID
INNER JOIN Person.StateProvince sp ON sp.StateProvinceID = a.StateProvinceID
INNER JOIN Person.CountryRegion c ON c.CountryRegionCode = sp.CountryRegionCode
GROUP BY c.Name,
CASE WHEN MONTH(sh.OrderDate) < 7 THEN YEAR(sh.OrderDate) - 1 ELSE
YEAR(sh.OrderDate) END,
pc.Name
ORDER BY
c.Name,
pc.Name,
CASE WHEN MONTH(sh.OrderDate) < 7 THEN YEAR(sh.OrderDate) - 1 ELSE
YEAR(sh.OrderDate) END
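The CASE expression is repeated three times in the query above (SELECT, GROUP BY, ORDER BY). Its rule is small enough to state as one function. This Python sketch mirrors that rule exactly, labelling a fiscal year by the calendar year in which it starts; the fiscal_year name and the start_month parameter are illustrative.

```python
from datetime import date

def fiscal_year(d, start_month=7):
    """Label a date with the calendar year in which its fiscal year
    started: months before the start month belong to the fiscal year
    that began in the previous calendar year."""
    return d.year - 1 if d.month < start_month else d.year

print(fiscal_year(date(2007, 6, 30)))
print(fiscal_year(date(2007, 7, 1)))
```

In the cube this logic lives once in the date dimension, which is why the MDX queries can refer to a fiscal year directly.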
XMLA (XML for Analysis)
Another way to query your Cube
SOAP-based XML protocol, designed specifically for universal data
access to any standard multidimensional data source that can be
accessed over an HTTP connection.
Analysis Services uses XMLA as its only protocol when
communicating with client applications.
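To give a feel for what travels over the wire, the sketch below builds an XMLA Execute request that wraps an MDX statement in a SOAP envelope, using only the Python standard library. The catalog name is an assumption, the build_execute_request helper is hypothetical, and the sketch only builds the XML: it does not send it anywhere.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"

def build_execute_request(mdx, catalog):
    """Build a SOAP envelope holding an XMLA Execute command."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")
    # The statement to run goes inside Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx
    # Properties say which database to use and how to shape the result.
    props = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    plist = ET.SubElement(props, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(plist, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(plist, f"{{{XMLA_NS}}}Format").text = "Multidimensional"
    return ET.tostring(envelope, encoding="unicode")

MDX = "SELECT FROM [Adventure Works] WHERE [Measures].[Internet Order Count]"
REQUEST = build_execute_request(MDX, "Adventure Works DW")  # catalog name assumed
print(REQUEST)
```

Client libraries normally generate this envelope for you, so it is rarely written by hand.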
MDX and XMLA are hard
Microsoft SQL Server 2012 Analysis Services: The BISM Tabular
Model
Professional Microsoft SQL Server 2012 Analysis Services with
MDX and DAX
Summary
1. What is a Cube?
2. Steps in Creating a Cube
3. Demo: Creating a Cube
4. Cube Issues
5. Hands on Lab
Tips
http://www.ssw.com.au/ssw/standards/Rules/RulesToBetterBusinessIntelligence.aspx
http://channel9.msdn.com/tags/Data+Warehousing/
http://channel9.msdn.com/tags/Business+Intelligence/
http://sqlblog.com/blogs/default.aspx
SSIS resources
SSIS Junkie
http://sqlblog.com/blogs/jamie_thomson
Microsoft SQL Server Integration Services ON-DEMAND WEBCASTS
http://www.microsoft.com/events/series/bi.aspx?tab=webcasts&id=42664
Great blog about SSIS
http://www.sqlis.com/sqlis/
Resources
Newsletter
http://www.sqlservercentral.com/
BIDS helper
http://www.codeplex.com/bidshelper
3 things…
[email protected]
http://blog.ozdemir.id.au
twitter.com/mozdemir_au
Thank You!
Gateway Court Suite 10
81 - 91 Military Road
Neutral Bay, Sydney NSW 2089
AUSTRALIA
ABN: 21 069 371 900
Phone: + 61 2 9953 3000
Fax: + 61 2 9953 3105
[email protected]
www.ssw.com.au