SQL Training 101

Presented by: Marcus Birju, Megha Vipin & Kevin Dean


NA Inbound Supply Chain
12/13/2017 - Seattle, WA

Amazon Confidential
Overview

• What is SQL?
• Using SQL
• Redshift
• Structuring a Statement
• Tools to Use
• Getting Help

What is SQL?
• Structured Query Language
– Pronounced S-Q-L or Sequel. Both are OK!
• A standardized programming language used
for managing relational databases and
performing various operations on the data in
them.
• Objects you will interact with:
– Schemas
– Tables
– Views
• Select Statements

Types of Objects
• Schema
– A database contains one or more named schemas.
– Each schema in a database contains tables and other kinds of
named objects.
– Identical database object names can be used in different
schemas in the same database without conflict.
• Table
– The data is stored in this object.
– Table names are unique within each schema.
• View
– View names are unique within each schema.
– The view is not physically materialized.
– The query that defines the view is run every time the view is
referenced in a query.
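
To make the three object types concrete, here is a minimal sketch that creates one of each. The names training_demo, items and active_items are hypothetical, and the statements assume you have permission to create objects in your database.

```sql
-- Hypothetical names throughout: training_demo, items, active_items.
CREATE SCHEMA training_demo;

-- The table physically stores the rows.
CREATE TABLE training_demo.items (
    item_id    INTEGER,
    item_name  VARCHAR(50),
    is_deleted CHAR(1)
);

INSERT INTO training_demo.items VALUES
    (1, 'Camera', 'N'),
    (2, 'Tripod', 'Y');

-- The view stores only the query text; no rows are copied.
CREATE VIEW training_demo.active_items AS
SELECT item_id, item_name
FROM training_demo.items
WHERE is_deleted = 'N';

-- Referencing the view re-runs its defining query against the table.
SELECT item_id, item_name
FROM training_demo.active_items;
```

Because the view is re-evaluated on every reference, a row added to the table later shows up in the view without any refresh step.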

Best Practices
• Consistency in your code
– Uniformity
– Spacing
• Naming Conventions
• Aliases
– Columns
– Joins
• Commenting
– Using 2 dashes
– Using /* and */
– Change logs
• Think of values as:
– TRUE
– FALSE
– NULL

/*
Title: IXD-XYZ_Analysis-NA
Description: Returns all ASINs in GL Product Group 193 for the XYZ analysis.
Created by: @birjum
Change Log:
>2017-11-01 10:32 AM PST: Created Script.
>2017-11-15 02:45 PM PST: Added missing region_id filter @asippy.
*/
SELECT
    DMA.ASIN
    , NVL(DFAM.FULFILLMENT_NETWORK_SKU, DMA.ASIN) AS FNSKU

FROM booker.d_mp_asins DMA --ASIN Attributes

LEFT JOIN booker.d_fnsku_asin_map DFAM --Mapping for ASIN to FNSKU
    ON DMA.ASIN = DFAM.ITEM_AUTHORITY_ID
    AND DMA.REGION_ID = DFAM.REGION_ID

WHERE DMA.REGION_ID = 1 --NA REGION
    AND DMA.MARKETPLACE_ID IN (1)
    AND DMA.GL_PRODUCT_GROUP IN (193)
    AND DMA.IS_DELETED = 'N'

ORDER BY DMA.ASIN;
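
The "TRUE, FALSE, NULL" point matters most in WHERE clauses: a comparison against NULL evaluates to NULL, and only rows where the condition is TRUE are returned. A small sketch, using a hypothetical temporary table demo_asins:

```sql
-- demo_asins is a hypothetical table used only for illustration.
CREATE TEMP TABLE demo_asins (
    asin       VARCHAR(10),
    is_deleted CHAR(1)
);

INSERT INTO demo_asins VALUES
    ('A1', 'N'),
    ('A2', 'Y'),
    ('A3', NULL);

-- Returns only A1. For A3 the comparison evaluates to NULL,
-- which is not TRUE, so the row is filtered out.
SELECT asin
FROM demo_asins
WHERE is_deleted <> 'Y';

-- Returns A1 and A3: the NULL case is handled explicitly.
SELECT asin
FROM demo_asins
WHERE is_deleted <> 'Y'
   OR is_deleted IS NULL
ORDER BY asin;
```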

Components of a select statement
• SELECT – Pick the columns you want
• FROM – Identify the data source (table, view)
• WHERE – Filters the data source
• GROUP BY – Summarize the data
• HAVING – Filter on summarized data
• ORDER BY – Order the data

SELECT
    ASIN,
    MARKETPLACE_ID,
    GL_PRODUCT_GROUP,
    MERCHANT_BRAND_NAME AS BRAND_NAME,
    REPLENISHMENT_CODE,
    PRODUCT_TIER_ID AS SEASON_CODE

FROM BOOKER.D_MP_ASINS

WHERE REGION_ID = 1 --NA REGION
    AND MARKETPLACE_ID IN(1)
    AND GL_PRODUCT_GROUP IN(193)
    AND IS_DELETED = 'N'

ORDER BY ASIN;

SELECT
    MERCHANT_BRAND_NAME AS BRAND_NAME,
    ASIN,
    COUNT(*) AS CT

FROM BOOKER.D_MP_ASINS

WHERE REGION_ID = 1 --NA REGION
    AND MARKETPLACE_ID IN(1)
    AND MERCHANT_BRAND_NAME = 'GoPro'
    AND IS_DELETED = 'N'

GROUP BY
    MERCHANT_BRAND_NAME,
    ASIN

HAVING COUNT(*) > 1

ORDER BY ASIN;

Joins
• What is it?
– A join clause is used to combine rows from two or more tables, based on a
related column(s) between them.
• Types
– Inner: Only records that have matching values in both tables
– Left: All records from the left table and matched records from the right table
– Right: All records from the right table and matched records from the left table
– Full Outer: All records from both tables, matched where possible
– NOTE: In all joins except inner, columns from the side with no match display as NULL.

Join Examples

Inner Join

SELECT
    O.OrderID
    , C.CustomerName
    , O.OrderDate

FROM Orders O
INNER JOIN Customers C
    ON O.CustomerID=C.CustomerID;

OrderID   CustomerName   OrderDate
10308     Lucius Fox     9/18/1996
10365     Bruce Wayne    11/27/1996
10383     Jim Gordon     12/16/1996

Left

SELECT
    C.CustomerName
    , O.OrderID

FROM Customers C
LEFT JOIN Orders O
    ON C.CustomerID = O.CustomerID

ORDER BY C.CustomerName;

CustomerName       OrderID
Oswald Cobblepot   NULL
Lucius Fox         10308
Bruce Wayne        10365

Full Outer

SELECT
    C.CustomerName
    , O.OrderID

FROM Customers C
FULL OUTER JOIN Orders O
    ON C.CustomerID=O.CustomerID

ORDER BY C.CustomerName;

CustomerName   OrderID
Harvey Dent    NULL
Lucius Fox     10308
Bruce Wayne    10365
NULL           10382
NULL           10351

Right

SELECT
    O.OrderID
    , E.LastName
    , E.FirstName

FROM Orders O
RIGHT JOIN Employees E
    ON O.EmployeeID = E.EmployeeID

ORDER BY O.OrderID;

OrderID   LastName   FirstName
NULL      Falcone    Carmine
10248     Barnes     Nathaniel
10249     Nygma      Edward
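
Because unmatched rows come back as NULL in an outer join, a LEFT JOIN can also be used to find records that have no match at all. The sketch below reuses names from the join examples, but the tables demo_customers and demo_orders and the customer IDs are made up for illustration.

```sql
-- Hypothetical tables and customer IDs, for illustration only.
CREATE TEMP TABLE demo_customers (
    customer_id   INTEGER,
    customer_name VARCHAR(50)
);

CREATE TEMP TABLE demo_orders (
    order_id    INTEGER,
    customer_id INTEGER
);

INSERT INTO demo_customers VALUES
    (1, 'Lucius Fox'),
    (2, 'Bruce Wayne'),
    (3, 'Oswald Cobblepot');

INSERT INTO demo_orders VALUES
    (10308, 1),
    (10365, 2);

-- Keep only the customers whose order columns came back NULL,
-- i.e. customers with no orders. Returns Oswald Cobblepot.
SELECT c.customer_name
FROM demo_customers c
LEFT JOIN demo_orders o
    ON c.customer_id = o.customer_id
WHERE o.order_id IS NULL
ORDER BY c.customer_name;
```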

Unions

• Are used to merge the results of two or more separate query expressions.
• Union
– Combines the results of the query expressions and keeps only distinct rows
• Union All
– Combines the results of the query expressions and retains duplicate rows

Union Example
Suppliers
supplier_id   supplier_name
1000          Microsoft
2000          Oracle
3000          Apple
4000          Samsung

Orders
order_id   order_date   supplier_id
1          2015-08-01   2000
2          2015-08-01   6000
3          2015-08-02   7000
4          2015-08-03   8000

Union (No Duplicates)

SELECT supplier_id
FROM suppliers
UNION
SELECT supplier_id
FROM orders
ORDER BY supplier_id;

supplier_id
1000
2000
3000
4000
6000
7000
8000

Union All

SELECT supplier_id
FROM suppliers
UNION ALL
SELECT supplier_id
FROM orders
ORDER BY supplier_id;

supplier_id
1000
2000
2000
3000
4000
6000
7000
8000

What is Redshift?

• Redshift is a relational SQL database designed:
– To run “heavy” queries against large datasets quickly and efficiently.
– Not for transactional workloads, e.g. real-time data.
• REMEMBER:
– SQL is still SQL
– Some syntax may be different from Oracle and other
SQL dialects.

Query Best Practices
• Avoid SELECT * queries
– Select ONLY the columns that are
necessary.
– More columns means more
processing time.
• Always include a date range and other
filters in the WHERE clause
– Don’t pull more data than what’s
needed.
– We commonly filter on region_id,
marketplace_id, legal_identity_id and
other such fields.
– More rows also means more
processing time.
• Optimize for performance!!
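
A minimal sketch of these practices, assuming a hypothetical table demo_receipts with a received_day date column: name only the columns you need, and bound the query with a date range plus the usual region and marketplace filters.

```sql
-- demo_receipts and all of its columns are hypothetical.
CREATE TEMP TABLE demo_receipts (
    region_id      INTEGER,
    marketplace_id INTEGER,
    asin           VARCHAR(10),
    received_day   DATE,
    quantity       INTEGER,
    notes          VARCHAR(200)
);

INSERT INTO demo_receipts VALUES
    (1, 1, 'A1', '2017-10-30', 5, 'outside the date range'),
    (1, 1, 'A1', '2017-11-02', 8, 'inside the date range'),
    (2, 3, 'A2', '2017-11-05', 4, 'different region');

-- Avoid: SELECT * FROM demo_receipts;
-- Prefer: only the needed columns, with region, marketplace
-- and date-range filters in the WHERE clause.
SELECT
    asin,
    received_day,
    quantity
FROM demo_receipts
WHERE region_id = 1 --NA REGION
    AND marketplace_id IN (1)
    AND received_day >= '2017-11-01'
    AND received_day <  '2017-12-01'
ORDER BY received_day, asin;
```

Using a half-open range (>= start, < next month) avoids having to know the last day of the month.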

Sort Keys

• Many tables in Redshift have Sort Keys assigned by the creators.
• Sort Keys determine the order in which the data is stored, so Redshift
can skip blocks that cannot match a filter, much as an index would.
• If they are present in the table, always filter on Sort Keys in your
WHERE clauses to have a faster run time.
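
The sketch below shows what a Sort Key looks like in practice. It uses Redshift-specific syntax (SORTKEY and the pg_table_def catalog view) and will not run on other databases. The table demo_shipments and its columns are hypothetical; booker.d_mp_asins is the table used elsewhere in this deck.

```sql
-- Redshift-specific. demo_shipments and its columns are hypothetical.
CREATE TEMP TABLE demo_shipments (
    ship_day  DATE,
    region_id INTEGER,
    asin      VARCHAR(10),
    quantity  INTEGER
)
SORTKEY (ship_day);

-- Filtering on the sort key column lets Redshift skip blocks
-- whose ship_day range falls outside the filter.
SELECT
    asin,
    SUM(quantity) AS units
FROM demo_shipments
WHERE ship_day BETWEEN '2017-11-01' AND '2017-11-30'
GROUP BY asin;

-- Check which columns of an existing table are sort keys.
-- pg_table_def only lists tables in schemas on your search_path.
-- sortkey is 0 for columns that are not part of the sort key.
SELECT "column", type, sortkey
FROM pg_table_def
WHERE schemaname = 'booker'
    AND tablename = 'd_mp_asins';
```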

Primary Keys
• Many tables in Redshift have Primary Keys assigned by the creators.
• A Primary Key can be a single column or a combination of columns in any
given table.
• Primary Keys are meant to uniquely identify table records.
• Redshift does not enforce Primary Key uniqueness, although ETLM does.
• Primary Keys can never be null in any relational SQL database, including in
Redshift.
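
Since Redshift does not enforce uniqueness, it is worth checking that a table's key really is unique before relying on it. A minimal sketch, assuming a hypothetical table demo_asin_attrs whose intended key is (asin, marketplace_id):

```sql
-- demo_asin_attrs and its intended key are hypothetical.
CREATE TEMP TABLE demo_asin_attrs (
    asin           VARCHAR(10),
    marketplace_id INTEGER,
    brand          VARCHAR(50)
);

INSERT INTO demo_asin_attrs VALUES
    ('A1', 1, 'GoPro'),
    ('A1', 1, 'GoPro'),   -- duplicate of the intended key
    ('A2', 1, 'GoPro');

-- Any row returned here is a key value that appears more than once.
-- With the data above this returns one row: A1, 1, 2.
SELECT
    asin,
    marketplace_id,
    COUNT(*) AS row_ct
FROM demo_asin_attrs
GROUP BY
    asin,
    marketplace_id
HAVING COUNT(*) > 1;
```

An empty result means the key is unique in the data you checked.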

Syntax Differences
• Redshift is based on PostgreSQL, so
the syntax is similar.
• When using Redshift in ETLM, you
must add a dependencies hint.
• You must always name the schema
when querying
– e.g. booker.d_distributor_orders
rather than d_distributor_orders.
• Redshift syntax has some small
differences from Oracle syntax.
– e.g. SUBSTR versus SUBSTRING
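
As a small illustration of the points above, the queries below use function spellings that work in Redshift and PostgreSQL. Oracle spells the substring function SUBSTR. The literal values are made up for the example.

```sql
-- SUBSTRING(string, start, length): positions start at 1.
SELECT SUBSTRING('B00ABC1234', 1, 3) AS asin_prefix;  -- 'B00'

-- COALESCE returns its first non-NULL argument and is standard SQL.
-- The NVL function used elsewhere in this deck behaves the same way
-- in Redshift.
SELECT COALESCE(NULL, 'fallback') AS first_non_null;  -- 'fallback'

-- Always qualify the table with its schema, as in the bullet above:
--   SELECT ... FROM booker.d_distributor_orders
-- rather than
--   SELECT ... FROM d_distributor_orders
```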

Communicating Requirements for Developer

• Why do you need the data?
Give the business reason and the processes leveraged.
• What information is needed?
A specific request is easier to translate, e.g. "I need the top 80% of loads
that did not meet their Priority SLA for the last 2 months for every FC. Our
process to calculate defaulters is such and such."
• How much data is needed?
Do you need it trending day over day, week over week, etc.?
• How do you want the final output/report to look?
Seeing the final output broken down by columns helps us understand the data.
• Use Simple Issue Manager (SIM)…

SIM Example

Requirements for Developing SQL
• Figure out the tables
– Internal resources:
• Bicon
• BI-metadata
• Query the tables
– Identify
• Primary keys
• Granularity
• Join keys
• Break it out into smaller pieces
– Use a Flowchart or Mapping
• Check the data quality as you go
• Comment your code
• Don’t be scared of error messages!
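
One simple data quality check when building a query in pieces is to compare row counts before and after each join. If the count grows, the join key is not at the granularity you expected. A minimal sketch with hypothetical tables demo_items and demo_map:

```sql
-- demo_items and demo_map are hypothetical tables.
CREATE TEMP TABLE demo_items (
    asin VARCHAR(10)
);

CREATE TEMP TABLE demo_map (
    asin  VARCHAR(10),
    fnsku VARCHAR(10)
);

INSERT INTO demo_items VALUES
    ('A1'),
    ('A2');

-- A1 maps to two FNSKUs, so joining on asin duplicates that row.
INSERT INTO demo_map VALUES
    ('A1', 'X1'),
    ('A1', 'X2');

-- rows_before_join is 2 and rows_after_join is 3: the join fanned out.
SELECT
    (SELECT COUNT(*)
     FROM demo_items) AS rows_before_join,
    (SELECT COUNT(*)
     FROM demo_items i
     LEFT JOIN demo_map m
         ON i.asin = m.asin) AS rows_after_join;
```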

Example SQL Statement
SELECT
    DMA.ASIN
    , NVL(DFAM.FULFILLMENT_NETWORK_SKU, DMA.ASIN) AS FNSKU

FROM booker.d_mp_asins DMA --ASIN Attributes

LEFT JOIN booker.d_fnsku_asin_map DFAM --Mapping for ASIN to FNSKU
    ON DMA.ASIN = DFAM.ITEM_AUTHORITY_ID
    AND DMA.REGION_ID = DFAM.REGION_ID

WHERE DMA.REGION_ID = 1 --NA REGION
    AND DMA.MARKETPLACE_ID IN (1)
    AND DMA.GL_PRODUCT_GROUP IN (193)
    AND DMA.IS_DELETED = 'N'

ORDER BY DMA.ASIN;

Tools to Use
• Hubble:
– Use for data discovery.
– In Database Drop down, select “IBPLANNING(RedShift)”
• ETL Manager (Transform or Extract):
– Use for full datasets and metrics.
– Job Settings:
• Datanet Group: BI-DATABASE-NA-IB
• Logical Database: ibplanning
• DB User: ibplanning_rs_etl
• Information about DW tables:
– bi-metadata.amazon.com
– bicon.amazon.com

Expand each Schema:
• Primary Keys
• Sort Keys
• Columns and Data types

Right-click on any table or field to automatically script it to the editor.

Interactive Demo

Hubble
ETL Manager

Getting Help
• Office Hours: Every Wednesday @ 2:30PM-3:30PM PST in Ruby 10.505
• Email: [email protected]
• TT routes to SIM:
– Category: Supply Chain
– Type: Execution
– Item: Inbound Technical Support
• AWS Documentation:
https://aws.amazon.com/documentation/redshift/

Questions?

SQL Training 201 is coming soon…
