Deep Dive On AWS Redshift

Uploaded by

Ramkumar

Deep Dive on Amazon Redshift

Pratim Das
Specialist Solutions Architect, Data & Analytics, EMEA
28th June, 2017

Deep inside Redshift Architecture

Performance tuning

Integration with AWS data services

Redshift Spectrum

Redshift Ecosystem

Redshift at Manchester Airport Group

Summary + Q&A

Redshift Architecture
Managed, Massively Parallel, Petabyte Scale Data Warehouse

Fast | Cost Efficient | Simple | Elastic | Secure | Compatible

• Streaming Backup/Restore to S3
• Load data from S3, DynamoDB and EMR
• Extensive Security Features
• Scale from 160 GB -> 2 PB Online
Amazon Redshift Cluster Architecture
Massively parallel, shared nothing
SQL Clients/BI Tools connect to the leader node over JDBC/ODBC

Leader node (128GB RAM, 16 cores, 16TB disk)
• SQL endpoint
• Stores metadata
• Coordinates parallel SQL processing

Compute nodes (each 128GB RAM, 16 cores, 16TB disk; 10 GigE (HPC) interconnect)
• Local, columnar storage
• Executes queries in parallel
• Load, backup, restore
• 2, 16 or 32 slices

Ingestion, Backup, Restore: S3 / EMR / DynamoDB / SSH
Design for Queryability

• Equally on each slice

• Minimum amount of work
• Use just enough cluster resources
Do an Equal Amount of Work
on Each Slice
Choose Best Table Distribution Style
• Key: same key to same location
• Even: round robin distribution
• All: all data on every node

[Diagram: each style shown across Node 1 and Node 2, Slices 1 to 4]
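The three distribution styles can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `crc32` hash stands in for Redshift's internal hash function (which is not described in these slides), the `distribute` helper and the sample rows are hypothetical, and the ALL style is modelled by placing one full copy on the first slice of each node.

```python
from zlib import crc32


def distribute(rows, style, n_nodes, slices_per_node, key=None):
    """Return one list of rows per slice for the given distribution style."""
    n_slices = n_nodes * slices_per_node
    slices = [[] for _ in range(n_slices)]
    if style == "KEY":
        # Same key value always hashes to the same slice (toy hash, not Redshift's).
        for row in rows:
            target = crc32(str(row[key]).encode("utf-8")) % n_slices
            slices[target].append(row)
    elif style == "EVEN":
        # Round robin: row i goes to slice i modulo the slice count.
        for i, row in enumerate(rows):
            slices[i % n_slices].append(row)
    elif style == "ALL":
        # One full copy per node, modelled here on the node's first slice.
        for node in range(n_nodes):
            slices[node * slices_per_node].extend(rows)
    else:
        raise ValueError(f"unknown distribution style: {style}")
    return slices


# Hypothetical sample data: 12 orders spread over 3 customers.
rows = [{"order_id": i, "customer_id": i % 3} for i in range(12)]
key_slices = distribute(rows, "KEY", 2, 2, key="customer_id")
even_slices = distribute(rows, "EVEN", 2, 2)
all_slices = distribute(rows, "ALL", 2, 2)

for name, result in [("KEY", key_slices), ("EVEN", even_slices), ("ALL", all_slices)]:
    print(name, [len(s) for s in result])
```

With only three distinct key values, KEY distribution can leave some slices empty, which is the kind of skew the "equal amount of work on each slice" goal is meant to avoid.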

Do the Minimum Amount of Work on Each Slice
Reduced I/O = Enhanced Performance
Columnar storage + Zone maps + Direct-attached storage

analyze compression listing;
Table | Column | Encoding

[Figure: zone maps. Each block of sorted values is summarised by its minimum and maximum.]
• Block 10 | 13 | 14 | 26 | … | 100 | 245 | 324: min 10, max 324
• Block 375 | 393 | 417 … 512 | 549 | 623: min 375, max 623
• Block 637 | 712 | 809 … | 834 | 921 | 959: min 637, max 959
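How zone maps reduce I/O can be sketched in a few lines. The example below is an illustration, not Redshift's implementation: it uses only the values visible in the figure (the elided values are left out), and the helper names `build_zone_map` and `blocks_to_scan` are hypothetical.

```python
def build_zone_map(blocks):
    """Record the (min, max) of every block, as a zone map does."""
    return [(min(block), max(block)) for block in blocks]


def blocks_to_scan(zone_map, low, high):
    """Return indexes of blocks whose [min, max] range overlaps [low, high]."""
    return [
        index
        for index, (block_min, block_max) in enumerate(zone_map)
        if block_max >= low and block_min <= high
    ]


# Values visible in the slide's figure; the "…" gaps are not filled in.
blocks = [
    [10, 13, 14, 26, 100, 245, 324],
    [375, 393, 417, 512, 549, 623],
    [637, 712, 809, 834, 921, 959],
]
zone_map = build_zone_map(blocks)

# A predicate such as "value BETWEEN 400 AND 600" only needs the second block.
needed = blocks_to_scan(zone_map, 400, 600)
print(zone_map, needed)
```

The pruning works well here because the blocks are sorted and their ranges do not overlap; on unsorted data most blocks would overlap most predicates.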

Use Cluster Resources Efficiently to Complete as Quickly as Possible
Workload Management
Clients (BI tools, Analytics tools, SQL clients) submit queries to Amazon Redshift Workload Management, where each query is either Waiting or Running.
• Queries: 80% memory, 4 Slots, 80/4 = 20% per slot
• ETL: 20% memory, 2 Slots, 20/2 = 10% per slot
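The per-slot arithmetic on this slide can be written out as a short calculation. This is a sketch of the division only; the queue names and the `memory_per_slot` helper are hypothetical and do not correspond to any Redshift API.

```python
def memory_per_slot(queues):
    """Divide each queue's memory share evenly across its slots."""
    total = sum(queue["memory_pct"] for queue in queues.values())
    if total > 100:
        raise ValueError(f"queues claim {total}% of memory")
    return {
        name: queue["memory_pct"] / queue["slots"]
        for name, queue in queues.items()
    }


# The two queues from the slide.
queues = {
    "queries": {"memory_pct": 80, "slots": 4},
    "etl": {"memory_pct": 20, "slots": 2},
}
per_slot = memory_per_slot(queues)
print(per_slot)
```

Raising the slot count in a queue lets more queries run at once but gives each of them a smaller share of memory.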

Redshift Performance Tuning

Redshift Playbook
Part 1: Preamble, Prerequisites, and Prioritization
Part 2: Distribution Styles and Distribution Keys
Part 3: Compound and Interleaved Sort Keys
Part 4: Compression Encodings
Part 5: Table Data Durability
amzn.to/2quChdM

Optimizing Amazon Redshift by Using the AWS Schema Conversion Tool
amzn.to/2sTYow1

Ingestion, ETL & BI

Getting data to Redshift using AWS DMS
• Simple to use
• Minimal Downtime
• Supports most widely used Databases
• Low Cost
• Fast & Easy to Set-up
• Reliable

Loading data from S3
• Splitting Your Data into Multiple Files
• Uploading Files to Amazon S3
• Using the COPY Command to Load from Amazon S3
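The first and last of these steps can be sketched as follows: split the rows into several compressed parts so that slices can load in parallel, then build a COPY statement that loads every object under a common key prefix. The table name, bucket, prefix and IAM role ARN are hypothetical placeholders, the upload step is left out, and choosing a part count that is a multiple of the cluster's slice count is general AWS guidance rather than something stated on the slide.

```python
import gzip


def split_into_parts(lines, n_parts):
    """Deal lines round robin into n_parts lists of roughly equal size."""
    parts = [[] for _ in range(n_parts)]
    for i, line in enumerate(lines):
        parts[i % n_parts].append(line)
    return parts


def gzip_part(lines):
    """Serialise one part as newline-terminated, gzip-compressed bytes."""
    text = "".join(line + "\n" for line in lines)
    return gzip.compress(text.encode("utf-8"))


def copy_statement(table, s3_prefix, iam_role):
    """Build a COPY command that loads all objects sharing the prefix."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' GZIP DELIMITER '|';"
    )


# Hypothetical pipe-delimited rows and a 4-slice cluster.
lines = [f"{i}|item{i}" for i in range(10)]
parts = split_into_parts(lines, 4)
payloads = [gzip_part(part) for part in parts]
statement = copy_statement(
    "sales",
    "s3://example-bucket/sales/part_",            # hypothetical bucket and prefix
    "arn:aws:iam::123456789012:role/ExampleRole",  # hypothetical role
)
print([len(part) for part in parts])
print(statement)
```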
ETL on Redshift
QuickSight for BI on Redshift

Amazon Redshift

Amazon Redshift Spectrum

Run SQL queries directly against data in S3 using thousands of nodes

• Fast @ exabyte scale
• Elastic & highly available
• On-demand, pay-per-query
• High concurrency: Multiple clusters access same data
• No ETL: Query data in-place using open file formats
• Full Amazon Redshift SQL support
Life of a query

[Diagram: client (JDBC/ODBC) -> Amazon Redshift -> Amazon Redshift Spectrum nodes 1, 2, 3, 4 ... N -> Amazon S3 (exabyte-scale object storage) and Data Catalog (Apache Hive Metastore)]

1. Query: SELECT COUNT(*) FROM S3.EXT_TABLE GROUP BY…
2. Query is optimized and compiled at the leader node. Determine what gets run locally and what goes to Amazon Redshift Spectrum
8. Final aggregations and joins with local Amazon Redshift tables done in-cluster
9. Result is sent back to client
Demo:
Running an analytic query
over an exabyte in S3
Let's build an analytic query - #1
An author is releasing the 8th book in her popular series. How many should we order for Seattle? What were prior first few day sales?

Let's get the prior books she's written.

1 Table
2 Filters

SELECT
    P.ASIN,
    P.TITLE
FROM
    products P
WHERE
    P.TITLE LIKE '%Potter%' AND
    P.AUTHOR = 'J. K. Rowling'
Let's build an analytic query - #2
An author is releasing the 8th book in her popular series. How many should we order for Seattle? What were prior first few day sales?

Let's compute the sales of the prior books she's written in this series and return the top 20 values.

2 Tables (1 S3, 1 local)
2 Filters
1 Join
2 Group By columns
1 Order By
1 Limit
1 Aggregation

SELECT
    P.ASIN,
    P.TITLE,
    SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
    s3.d_customer_order_item_details D,
    products P
WHERE
    D.ASIN = P.ASIN AND
    P.TITLE LIKE '%Potter%' AND
    P.AUTHOR = 'J. K. Rowling'
GROUP BY P.ASIN, P.TITLE
ORDER BY SALES_sum DESC
LIMIT 20;
Let's build an analytic query - #3
An author is releasing the 8th book in her popular series. How many should we order for Seattle? What were prior first few day sales?

Let's compute the sales of the prior books she's written in this series and return the top 20 values, just for the first three days of sales of first editions.

3 Tables (1 S3, 2 local)
5 Filters
2 Joins
3 Group By columns
1 Order By
1 Limit
1 Aggregation
1 Function
2 Casts

SELECT
    P.ASIN,
    P.TITLE,
    P.RELEASE_DATE,
    SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
    s3.d_customer_order_item_details D,
    asin_attributes A,
    products P
WHERE
    D.ASIN = P.ASIN AND
    P.ASIN = A.ASIN AND
    A.EDITION LIKE '%FIRST%' AND
    P.TITLE LIKE '%Potter%' AND
    P.AUTHOR = 'J. K. Rowling' AND
    D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
    D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;
Let's build an analytic query - #4
An author is releasing the 8th book in her popular series. How many should we order for Seattle? What were prior first few day sales?

Let's compute the sales of the prior books she's written in this series and return the top 20 values, just for the first three days of sales of first editions in the city of Seattle, WA, USA.

4 Tables (1 S3, 3 local)
8 Filters
3 Joins
4 Group By columns
1 Order By
1 Limit
1 Aggregation
1 Function
2 Casts

SELECT
    P.ASIN,
    P.TITLE,
    R.POSTAL_CODE,
    P.RELEASE_DATE,
    SUM(D.QUANTITY * D.OUR_PRICE) AS SALES_sum
FROM
    s3.d_customer_order_item_details D,
    asin_attributes A,
    products P,
    regions R
WHERE
    D.ASIN = P.ASIN AND
    P.ASIN = A.ASIN AND
    D.REGION_ID = R.REGION_ID AND
    A.EDITION LIKE '%FIRST%' AND
    P.TITLE LIKE '%Potter%' AND
    P.AUTHOR = 'J. K. Rowling' AND
    R.COUNTRY_CODE = 'US' AND
    R.CITY = 'Seattle' AND
    R.STATE = 'WA' AND
    D.ORDER_DAY :: DATE >= P.RELEASE_DATE AND
    D.ORDER_DAY :: DATE < dateadd(day, 3, P.RELEASE_DATE)
GROUP BY P.ASIN, P.TITLE, R.POSTAL_CODE, P.RELEASE_DATE
ORDER BY SALES_sum DESC
LIMIT 20;
Now let's run that query over an exabyte of data in S3

• Roughly 140 TB of customer item order detail records for each day over past 20 years.
• 190 million files across 15,000 partitions in S3.
• One partition per day for USA and rest of world.
• Need a billion-fold reduction in data processed.
• Running this query using a 1000 node Hive cluster would take over 5 years.*

• Compression: 5X
• Columnar file format: 10X
• Scanning with 2500 nodes: 2500X
• Static partition elimination: 2X
• Dynamic partition elimination: 350X
• Redshift's query optimizer: 40X
Total reduction: 3.5B X

* Estimated using 20 node Hive cluster & 1.4TB, assume linear
* Query used a 20 node DC1.8XLarge Amazon Redshift cluster
* Not actual sales data - generated for this demo based on data format used by Amazon Retail.
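The figures on this slide can be cross-checked with simple arithmetic. The sketch below multiplies the listed reduction factors and estimates the total volume and partition count; it assumes 365-day years and decimal units (1 EB = 1,000,000 TB), neither of which is stated on the slide.

```python
from math import prod

# Reduction factors as listed on the slide.
factors = {
    "compression": 5,
    "columnar file format": 10,
    "scanning with 2500 nodes": 2500,
    "static partition elimination": 2,
    "dynamic partition elimination": 350,
    "query optimizer": 40,
}
total_reduction = prod(factors.values())

# 140 TB per day over 20 years, ignoring leap days (assumption).
days = 365 * 20
total_tb = 140 * days
total_eb = total_tb / 1_000_000

# One partition per day for each of two regions (USA, rest of world).
partitions = days * 2

print(f"total reduction: {total_reduction:,}x")
print(f"data volume: {total_tb:,} TB (about {total_eb:.2f} EB)")
print(f"partitions: {partitions:,}")
```

The product matches the slide's 3.5B figure, and the partition estimate of 14,600 is consistent with the rounded 15,000 quoted above.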
Is Amazon Redshift Spectrum useful if I don’t have an exabyte?

Your data will get bigger

On average, data warehousing volumes grow 10x every 5 years
The average Amazon Redshift customer doubles data each year

Amazon Redshift Spectrum makes data analysis simpler

Access your data without ETL pipelines
Teams using Amazon EMR, Athena & Redshift can collaborate using the same data lake

Amazon Redshift Spectrum improves availability and concurrency

Run multiple Amazon Redshift clusters against common data
Isolate jobs with tight SLAs from ad hoc analysis

Redshift Partner Ecosystem

4 types of partners

• Load and transform your data with Data Integration Partners
• Analyze data and share insights across your organization with Business Intelligence Partners
• Architect and implement your analytics platform with System Integration and Consulting Partners
• Query, explore and model your data using tools and utilities from Query and Data Modeling Partners

aws.amazon.com/redshift/partners/
“Some” Amazon Redshift Customers
Manchester Airport Group
An AWS Redshift customer story

Stuart Hutson
Head of Data and BI, MAG
+
Munsoor Negyal
Director of Data Science, Crimson Macaw

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
MAG – take-off with cloud and data
Stuart Hutson – Head of Data and BI
THE AVIATION PROFESSIONALS
MAG is a leading UK based airport company, which owns and operates Manchester, London Stansted, East Midlands and Bournemouth airports.
MAG is privately managed on behalf of its shareholders, the local authorities of Greater Manchester and Industry Funds Management (IFM). IFM is a highly experienced, long-term investor in airports and already has significant interests in ten airports across Australia and Europe.

Voting: 50% | Voting: 50%
Economic: 64.5% | Economic: 35.5%

• 48.5 MILLION passengers served per year.
• Over 80 AIRLINES serving 272 DESTINATIONS direct.
• £134.3 MILLION RETAIL INCOME per annum delivered via 200+ shops, bars and restaurants.
• £125.7 MILLION CAR PARKS INCOME delivered via 96,000 parking spaces.
• £623 MILLION property assets across all airports, 5.67m sq ft of commercial property.
• £738.4 MILLION REVENUE, +10.0% increase from last year.
• £283.6 MILLION EBITDA, growth of 17.2% in 2015.
• £5.6 BILLION contribution to the UK economy from MAG airports.

Source: FY15 Report & Accounts

OUR AIRPORTS…
MAG airports serve over 48.5 million people per annum from complementary catchment areas covering over 75% of the UK population.

• c. 23m passengers per annum. UK's 3rd largest airport. 70+ airlines & 200+ destinations. 2 runways with potential 62% capacity. 21.5m people within a 2 hour drive.
• c. 23m passengers per annum. UK's 4th largest airport. 150+ destinations. 1 runway with 50% spare capacity. 25m people within 2 hour drive. Acquired February 2015.
• c. 4.5m passengers per annum. UK's largest freight airport after Heathrow – 310,000 tonnes p.a. Located next to key road interchanges – four hours from virtually all UK commerce.
• c. 0.7m passengers per annum. Significant investment in new terminal increasing passenger capacity to 3m p.a. Wealthy catchment area. Large land holding – on-site business park.

[Map: catchment area within 2 hours' drive of MAN and STN]
OUR CONNECTIVITY…
80+ airlines and over 270 direct destinations providing global connectivity.

AIR SERVICE DEVELOPMENT
MAG has a diverse carrier mix from global destinations with an excellent track record of incentivising passenger growth.
MAG has exceeded expectations with industry leading rates of passenger growth. Importantly for passengers, by forging strong commercial partnerships with airlines, our airports have been able to increase choice and convenience and make a stronger contribution to economy and growth.

CARGO SERVICE DEVELOPMENT
MAG's Cargo produces an annual income of £20.2 million and holds 26% of the UK freight market share.
East Midlands is the UK's largest dedicated freight hub handling 310,000 tonnes of freight per annum. Stansted handles 233,000 tonnes of freight per annum and is a key gateway to London and the South of England.
OUR DEVELOPMENTS…
Manchester Transformation Programme and London Stansted Transformation Programme are developments that all aim to drive improved customer service.

MANCHESTER TRANSFORMATION PROGRAMME

With investment of £1 billion, Manchester will become one of the most modern and customer focused airports in Europe demonstrating the importance of Manchester as a global gateway.

LONDON STANSTED TRANSFORMATION PROGRAMME

The £80 million terminal transformation project at London Stansted will transform the passenger experience and boost commercial yields.
MAG’S CURRENT BUSINESS INTELLIGENCE MATURITY
[Chart: VALUE plotted against MATURITY, rising through four stages]
1. WHAT HAPPENED? DESCRIPTIVE ANALYTICS
2. WHY DID IT HAPPEN? DIAGNOSTIC ANALYTICS
3. WHAT WILL HAPPEN? PREDICTIVE ANALYTICS
4. HOW CAN WE MAKE IT HAPPEN? PRESCRIPTIVE ANALYTICS
MAG’S LEGACY ARCHITECTURE - CHALLENGES…

[Diagram: data arrives by flat file (sent via SFTP), database connection and web services; users get access via browser to online dashboards and ad-hoc queries, and PDFs via email]

Business
• Multiple versions of the truth
• Unclear of operational problems
• People are overloaded with data and data led questions
• Analysts not able to do analyst job due to lack of data and tools
• Data processing issues - late reports, missing data
• Data accessible in silos - no real cross-functional analysis

Technical
• Database @ 95% capacity on physical kit that cannot be scaled.
• Dashboards are slow to run.
• Constant optimisation and maintenance of database.
• Limited concurrent connections for queries.
• Lack of self-serve – centralised BI model.
• No direct connection to database – business wants to expand into using R and Python etc.
• All data in batch with no possibility of streaming
VALUE OF BI STRATEGY IN MAG…

Monetise Data | Democratise Data | Data DNA

• Monetise data and technology across our omni-channels: MAG's BI Strategy must be bold, it should be aiming for how we monetise our data and technology across our omni-channel business by improving the customer experience.
• Democratise data across the Enterprise: Our data needs to be pervasive across the organisation. The decisions of the organisation should be made on clear information presented to the business at the right time to enable MAG to make the right decisions.
• Create a data DNA: Build a culture around data and analytical thinking across the organisation by embedding analytics and data across MAG's business processes and decision making.
PHASE 1 - IMPLEMENT SELF-SERVE RETAIL BI SOLUTION
- 50+ PARTNERS GENERATING OVER £130M REVENUE…

Build Data and BI Foundation solution

• To create an extensible and flexible data solution for MAG comprising of:

• Extended Data Warehouse.

• Scalable and elastic compute.
• Deal with seasonality spikes of passenger travel.

• Real-time streaming.
• Enable MAG to become a real-time business across their customer journey.

• Cloud environment:
• Secured.
• Resilient.
• Repeatable build.

• Enable MAG to quickly experiment at low cost and minimal risk.

• MAG wants to trial new technologies, especially open-source.

• Create an architecture that can evolve over time to meet MAG’s new challenges.
• Benefits delivered early and continuously.
• No need for MAG to invest in a large, front-loaded EDW programme.
EXAMPLE OF MAG’S DESIGN PRINCIPLES TO SOLVE THE PROBLEMS…

• Evolutionary architecture

• Infrastructure as Code

• Protecting our data

• Assume for failure

• Data quality is a priority

• Embrace open source for experimentation

• SaaS -> PaaS -> IaaS

• Serverless computing

• Etc.
MAG – OUR 6 MONTH JOURNEY…

From → To
• Single instance database. → Scale-able Data Warehouse.
• Daily sales rung in at store level. → Over 90% of all sales automatically ingested at product level.
• Car parking - flat files ingested in batch. → Ingest and interrogate streaming data directly: car park data is being added via Kinesis.
• Access to database limited to reporting tool. → Authorised users can use visualisation and data science tools (e.g. R and Python) of their choice for self-serve analytics.
• No database writeback for end-users. → Sandboxes in Redshift for end user experimentation.
MAG – NEXT 6-12 MONTHS…

• Moving to near-time streaming into Redshift for:

• Terminal Operations
• Security Services
• Car Park Management

• Streaming semi-structured data into Redshift

• Trialling IoT data streaming
• Passenger analysis

• Trial AWS Glue and AWS Redshift Spectrum

• Automated profile and catalogue of data across the enterprise
• Continuous integration of data into our data warehouse
Who Are Crimson Macaw
Driving customer success by unlocking the value of data.
Competency focused consultancy

Architecture DevOps Data Engineering Enterprise Data

and Data Science
Data Strategy Solutions

1. Plan 2. Build 3. Action

www.crimsonmacaw.com
Our partners ...

3 AWS Big Data Speciality

3 AWS Certified Solutions Architect Associate

2 AWS Certified Developer Associate

2 AWS Certified SysOps Administrator Associate

Building a solution
... without too many twists and turns.
Key architectural components used
• Visualisation in Tableau
• Data Transformation in ODI
• Streaming in Kinesis & Kinesis Firehose
• Storage in S3
• Data Warehouse in Amazon Redshift
Cloud Architecture + Data Architecture = Solution
How do you match the pace of infrastructure build in the cloud with understanding the data & BI requirements?

Deliver value quickly vs conformed dimensions?
• A horizontal analytical ‘slice’ across the estate.
• Understand conformed dimensions.
• Vertical slice of a business domain.
• Reduced refactoring due to the prior horizontal analysis.

Understand how the business will consume and use the data?
• Produce artefacts that are:
  • Shared by stakeholders and the delivery team.
  • Understandable by all parties.
  • Highly visual, allow complex information to be absorbed - sun modelling.
Sun modelling vs Enterprise Bus Matrix
[Figure: three views of the same model, labelled Sun Model, Enterprise Bus Matrix and Star Schema. The Sun Model places the measures (Sales £, Net Despatches £, Sales Units) at the centre, with dimension hierarchies for Time, Employee, Customer and Product radiating outwards.]
Building the infrastructure (as code)
• Why use infrastructure as code?
• Repeatability.
• Consistency.
• Versioned.
• Code reviews.
• Speed of delivery.

• Technology Used:
• CloudFormation in YAML format with custom YAML Tags.
• Lambda Functions for Custom Resource Types.
• Bespoke deployment utility.
• Puppet Standalone in Cloud Init for EC2.

• Why this approach?

• Enforced Tagging Policy with propagated tags.
• Custom YAML Tags act as a precompiler for CloudFormation.
• Not all resources types were available, e.g. DMS.
• Redshift IAM Roles and Tags – both now available out of the box!
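An enforced tagging policy of the kind described above can be sketched as a preprocessing step over a CloudFormation template held as a dictionary. This is not MAG's deployment utility, whose internals are not shown in the slides: the `propagate_tags` helper, the tag values and the set of taggable resource types are all hypothetical, and the sketch assumes resources that accept `Tags` as a list of `Key`/`Value` pairs.

```python
import copy


def propagate_tags(template, tags, taggable_types):
    """Return a copy of the template with mandatory tags on taggable resources."""
    result = copy.deepcopy(template)
    for resource in result.get("Resources", {}).values():
        if resource.get("Type") not in taggable_types:
            continue  # skip resource types without a Tags property
        properties = resource.setdefault("Properties", {})
        existing = {tag["Key"]: tag["Value"] for tag in properties.get("Tags", [])}
        merged = {**existing, **tags}  # mandatory tags override local values
        properties["Tags"] = [
            {"Key": key, "Value": value} for key, value in sorted(merged.items())
        ]
    return result


# Hypothetical policy and template fragment.
required_tags = {"Environment": "dev", "Project": "bi-foundation"}
taggable = {"AWS::S3::Bucket", "AWS::Redshift::Cluster"}
template = {
    "Resources": {
        "Bucket": {"Type": "AWS::S3::Bucket"},
        "Cluster": {
            "Type": "AWS::Redshift::Cluster",
            "Properties": {"Tags": [{"Key": "Owner", "Value": "bi-team"}]},
        },
        "Policy": {"Type": "AWS::IAM::Policy", "Properties": {}},
    }
}
tagged = propagate_tags(template, required_tags, taggable)
print(tagged["Resources"]["Cluster"]["Properties"]["Tags"])
```

Working on a deep copy keeps the source template unchanged, so the same input can be rendered for several environments.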
Security overview
• Three independent AWS accounts
  • Dev – for development of data processes.
  • Prod – target deployment.
  • Sec – sink for data generated by Config Service and CloudTrail to S3 buckets.
• Encryption
  • KMS Encryption keys used throughout
  • Enforced SSL connections to Redshift
  • S3 – enforced write encryption (by policy).
• Audit and compliance documentation
  • AWS Artifact.

[Diagram: AWS Config and AWS CloudTrail in the Dev and Prod accounts deliver to the Sec account]
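"Enforced write encryption (by policy)" is commonly implemented with a bucket policy that denies uploads lacking a server-side encryption header. The sketch below builds such a policy document; it is an assumption about how the rule could be expressed, not MAG's actual policy, and the bucket name is a hypothetical placeholder.

```python
import json


def deny_unencrypted_puts(bucket_name):
    """Build a bucket policy that rejects PutObject calls without SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedObjectUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    # Deny when the request does not ask for KMS encryption.
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


policy = deny_unencrypted_puts("example-landing-bucket")  # hypothetical bucket
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```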
Redshift topology
• Storage Optimised (red)
• Optimised for storing larger volumes of data (source).
• Ingestion point for newly arriving data.
• Transformation layer (large number of work tables).
• VPC - private subnet.

• Compute Optimised layer (blue)

• Transformed data.
• Near real-time operational data.
• Present dimensional layer.
• VPC – public subnet (whitelisted access).
What about Streaming?
• Setup Kinesis Streams to allow 3rd parties to send data.
• Enabled cross account access with an assumed role.
• Used Lambda to route mixed data to multiple Firehose Streams.
• Firehose Streams sink data to S3 and/or Redshift Compute (blue).
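The Lambda routing step can be sketched as a pure function over a Kinesis event. The slides do not show the real routing rules, so the `type` field, the stream names and the `route_records` helper are hypothetical; the event shape (base64 payload under `Records[].kinesis.data`) follows the standard Kinesis-to-Lambda format, and the actual Firehose write is left out.

```python
import base64
import json
from collections import defaultdict

# Hypothetical mapping from message type to Firehose delivery stream name.
ROUTES = {"car_park": "carpark-firehose", "retail": "retail-firehose"}
DEFAULT_STREAM = "unrouted-firehose"


def route_records(event):
    """Group decoded Kinesis records by their target delivery stream."""
    batches = defaultdict(list)
    for record in event["Records"]:
        payload = base64.b64decode(record["kinesis"]["data"])
        message = json.loads(payload)
        stream = ROUTES.get(message.get("type"), DEFAULT_STREAM)
        # Each entry has the {"Data": bytes} shape used by Firehose batch puts.
        batches[stream].append({"Data": payload})
    return dict(batches)


def make_record(message):
    """Build one synthetic Kinesis record for local testing."""
    data = base64.b64encode(json.dumps(message).encode("utf-8")).decode("ascii")
    return {"kinesis": {"data": data}}


event = {
    "Records": [
        make_record({"type": "car_park", "bay": 12}),
        make_record({"type": "retail", "sku": "A1"}),
        make_record({"type": "unknown"}),
    ]
}
routed = route_records(event)
print({stream: len(batch) for stream, batch in routed.items()})
```

In a deployed function each batch would then be sent to its delivery stream; keeping the routing separate from the sending makes it easy to test without AWS access.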
Observations
ODI and Redshift
• Problem: ODI initiated Redshift tasks not completing.
  • Solution: Increase Array Fetch Size in ODI.
• Problem: No native knowledge modules in ODI for Redshift.
  • Solution: Customised existing generic SQL knowledge modules for Redshift.
  • Evaluating a 3rd party Knowledge Module solution.
Tableau and Redshift
• How does Tableau Online connect to Redshift?
  • JDBC via SSL.
  • Whitelisted to Redshift.
  • Tableau available in multiple regions (US, Ireland).
• Enable Redshift constraints:
  • Foreign Key and Primary Key and Unique constraints – ensure they are created in Redshift (even though they are not enforced).
• Enable Tableau “Assume Referential Integrity” in Tableau workbooks (if you have it!).
• Queries in Tableau:
  • Executed via Redshift cursor – minimise IO.
  • Current activity: stv_active_cursors.
  • For recent activity (two to five days): stl_query and stl_utilitytext.
Tableau – getting back to SQL
select
    a.query, a.querytxt as cursor_sql,
    b.sequence, b.text as raw_sql, b.starttime
from
    stl_query a inner join stl_utilitytext b
    on a.pid = b.pid and a.xid = b.xid
where
    a.database = '<DBName>'
    and a.starttime >= dateadd(day, -1, current_date)
order by
    a.xid, b.starttime, b.sequence asc;
Redshift
• Performance so far has been very good.
• A lot to do with the design of Redshift.
• Optimisations so far have been limited to:
• Fields:
• lengths
• datatypes
• compression datatypes.
• Distribution keys.
• Sort keys.
• Skew analysis.
• Vacuum and ANALYZE.
• But we intend to do some more work on below:
• Work queue management.
• User load analysis.
• Attribute pushdown.

1. Analyze Database Audit Logs for Security and Compliance Using Amazon Redshift Spectrum: amzn.to/2szR3nf
2. Build a Healthcare Data Warehouse Using Amazon EMR, Amazon Redshift, AWS Lambda, and OMOP: amzn.to/2rr7LWq
3. Run Mixed Workloads with Amazon Redshift Workload Management: amzn.to/2srIL1g
4. Converging Data Silos to Amazon Redshift Using AWS DMS: amzn.to/2kIr1bq
5. Powering Amazon Redshift Analytics with Apache Spark and Amazon Machine Learning: amzn.to/2rgR8Z7
6. Using pgpool and Amazon ElastiCache for Query Caching with Amazon Redshift: amzn.to/2lr66MH
7. Extending Seven Bridges Genomics with Amazon Redshift and R: amzn.to/2tlylga
8. Zero Admin Lambda based Redshift Loader: bit.ly/2swvvI6
London Amazon Redshift

Wednesday, July 5, 2017 - 6:00 PM to 8:00 PM

60 Holborn Viaduct, London
http://goo.gl/maps/yMZPT
{1:“Redshift Deep Dive and new features since last Meetup” | 2: “OLX presenting Advanced Analytics and
Machine Learning with Redshift” | 3:“Other customer/partner case studies” | 4:“Next steps for the community”}
Thank You
Data is magic!
