Amazon Redshift Best Practices

The document outlines best practices for using Amazon Redshift, focusing on data models, table design, data types, and workload management. Key recommendations include using STAR schemas, appropriate data types, and optimizing table design through automatic adjustments of sort keys and distribution styles. It emphasizes the importance of materialized views for improving query performance and provides guidelines for effective data storage and management.

Uploaded by

meetlahuja

Amazon Redshift

Best Practices

Saman Irfan
Senior GTM Specialist Solutions Architect Analytics
AWS

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
Discussion Topics
• Data Models
• Table Design Best Practices
• Data Lake Modelling Best Practices
• Workload Management Best Practices
• Data Loads Best Practices
• Multitenancy Architecture Best Practices

"Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights"
Bradley Todd, Technology Architect, Liberty Mutual
Redshift: Use Popular Data Models

Redshift can be used with a number of data models, including (from most common to less common):
• STAR schema
• Snowflake schema
• Highly denormalized

A commonly used data model with Amazon Redshift is the STAR schema, which separates data into large fact and dimension (dim) tables:
• Facts refer to specific events (e.g. order submitted) and fact tables hold summary detail for those events, e.g. the high-level attributes of a submitted order such as order_id, order_dt, product_id, & total_cost. Fact tables use foreign keys to link to dim tables
• The dimensions that make up a fact often have attributes themselves that are more efficiently stored in separate dim tables, e.g. a fact might contain a product_id, but the actual product details would be contained in a separate products dim table (e.g. product_price, height_cm, width_cm, & product_id are columns that might be found in a products dim table)

Best Practice: Avoid highly normalized models. Models such as 3NF resemble the STAR schema, but have much more table normalization and are typically more appropriate for OLTP systems

Amazon Redshift data storage &
Data Types

Data storage in Redshift
• Data loaded into Redshift is stored in Redshift Managed Storage (RMS); storage is columnar
• Structured and semi-structured data can be loaded
• Amazon Redshift is ANSI SQL and ACID compliant
• Does not require indexes or database hints. Instead, it leverages sort keys, distribution keys, and compression to achieve fast performance through parallelism and efficient data storage
• Data is organized as: Namespace > database > schema > objects

Namespace (one per endpoint)
  database1, database2, ... databaseN
    schema1, schema2, ... schemaN (per database)
      database objects and code objects (per schema)

Redshift Datatypes
Scalar datatypes and vector datatype:
• Numeric types
  • Integer types: SMALLINT, INT, BIGINT
  • DECIMAL/NUMERIC
  • Floating point types: REAL, DOUBLE PRECISION
• Character types: CHAR, VARCHAR, NCHAR, TEXT, BPCHAR
• Datetime types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ
• BOOLEAN
• HLLSKETCH
• GEOMETRY
• VARBYTE
• SUPER

Choosing Data Types: Best Practices
• Make columns only as wide as they need to be. Redshift performance is about efficient I/O. Do not arbitrarily assign maximum length/precision; this can slow down query execution.
• Use appropriate data types, e.g. don't store dates as VARCHAR
• Multibyte characters: use the VARCHAR data type for UTF-8 multibyte character support (up to a maximum of four bytes)
• Use the GEOMETRY data type and spatial functions to store, process and analyze spatial data
• To improve the performance of count distinct, use the HyperLogLog sketch (HLLSKETCH) data type
• Use the SUPER data type to store semi-structured data and for evolving schema & schema-less data
Additional Documentation
• Querying Spatial Data in Redshift
Semi-structured data – SUPER datatype
Data type: SUPER
• Easy, efficient, and powerful JSON processing
• Fast row-oriented data ingestion
• Fast column-oriented analytics with materialized views over SUPER/JSON
• Access to schema-less nested data with easy-to-use SQL extensions powered by the PartiQL query language
• Supports up to 16 MB of data for an individual SUPER field or object

Example table customers:
id (INTEGER) | name (SUPER) | phones (SUPER)
1 | {"given":"Jane", "family":"Doe"} | [{"type":"work", "num":"9255550100"}, {"type":"cell", "num": 6505550101}]
2 | {"given":"Richard", "family":"Roe", "middle":"John"} | [{"type":"work", "num": 5105550102}]

SELECT c.name.given AS firstname, c.name.middle AS middlename, ph.num
FROM customers c, c.phones ph
WHERE ph.type = 'work';

firstname | middlename | num
----------+------------+-----------
"Jane" | null | 9255550100
"Richard" | "John" | 5105550102

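The unnesting in the query on this slide (FROM customers c, c.phones ph) can be pictured as a nested loop: one output row per element of each customer's phones array. The sketch below is plain Python, not Redshift, and only mimics those semantics on the slide's two sample rows; the function name work_phones is hypothetical.

```python
# Illustrative only: plain Python standing in for the PartiQL-style query.
customers = [
    {
        "id": 1,
        "name": {"given": "Jane", "family": "Doe"},
        "phones": [
            {"type": "work", "num": "9255550100"},
            {"type": "cell", "num": 6505550101},
        ],
    },
    {
        "id": 2,
        "name": {"given": "Richard", "family": "Roe", "middle": "John"},
        "phones": [{"type": "work", "num": 5105550102}],
    },
]


def work_phones(rows):
    """Return (firstname, middlename, num) for every phone of type 'work'."""
    result = []
    for c in rows:                        # FROM customers c
        for ph in c["phones"]:            # , c.phones ph (one row per array element)
            if ph.get("type") == "work":  # WHERE ph.type = 'work'
                name = c["name"]
                # A missing attribute yields None, mirroring the null for Jane's middle name.
                result.append((name.get("given"), name.get("middle"), ph.get("num")))
    return result


rows = work_phones(customers)
for row in rows:
    print(row)
```

Jane has no "middle" attribute, so the lookup returns None instead of failing, which is the schema-less behaviour the slide highlights.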
SUPER datatype: Best Practices
• For low latency inserts or for small batch inserts, insert into SUPER. Inserts into SUPER datatype are quicker.
• If you join frequently using attributes stored in SUPER, create separate scalar datatype columns for those attributes to improve performance
• If you filter frequently using attributes stored in SUPER, create separate scalar datatype columns for those attributes to improve performance
• Use SUPER when your queries require strong consistency, predictable query performance, complex query support, and ease of use with evolving schemas & schema-less data
• Use Redshift Spectrum instead of loading into SUPER, if data requires integration with other AWS services (e.g. EMR)

/*customer-orders-lineitem*/
CREATE TABLE customer_orders_lineitem
(c_custkey bigint
,c_name varchar
,c_address varchar
,c_nationkey smallint
,c_phone varchar
,c_acctbal decimal(12,2)
,c_mktsegment varchar
,c_comment varchar
,c_orders super );

[Figure: scalar columns usefulAttr1, usefulAttr2, ... usefulAttrN stored alongside the complete JSON record]

Table Design Best Practices

Redshift table design
THREE MAIN CONCEPTS
• Compression (column encoding)
• Distribution style
• Sort keys

Automatic table optimization
TABLE OPTIMIZATION DONE AUTOMATICALLY FOR YOU – NO MANUAL INTERVENTION NEEDED
• Continuously scans workload patterns
• Automatically adjusts sort key, distribution style and encoding over time to account for changes in workload
• Can be enabled or disabled per table/column
• Optimizations are applied to tables/columns when the load on compute is low
• Promotes ease of use, so that you focus on business objectives rather than database management

Covers: column encoding, distribution style, sort keys

Best Practice: Use auto options for compression, distribution and sort keys

Compression/Encoding
Goals
• Allow more data to be stored
• Improve query performance by decreasing I/O
Impact
• Allows 3x to 4x more data to be stored

CREATE TABLE deep_dive (
aid INT ENCODE AUTO
,loc CHAR(3) ENCODE AUTO
,dt DATE ENCODE AZ64
);

• Column data (aid, loc, dt) is persisted to 1 MB immutable blocks
• Blocks are individually encoded with 1 of 13 encodings
• A full block can contain millions of values after compression

Compression Best Practices
• Use default compression
• By default compression encoding is set to AUTO for all columns in
a table, which means Redshift automatically determines the best
compression encoding for that column
• Rely on that default compression

• If you decide to fine-tune by choosing column encodings yourself:

• Use AZ64 where possible
• Use ZSTD / LZO for high cardinality (VAR)CHAR columns
• Use BYTEDICT for low cardinality (VAR)CHAR columns

Distribution Style
Distribution style is a table property that dictates how that table's data is distributed

Goal
• Distribute data evenly for parallel processing
• Minimize data movement during query processing
Impact
• Minimizes data redistribution by achieving collocation

KEY: Value is hashed; same value goes to same location (slice)
EVEN: Round robin distribution
ALL: Full table data is made available in the first slice of every compute node
AUTO: Redshift automatically manages distribution style

[Figure: KEY, EVEN and ALL distribution, each shown across slices 1-4 on Node 1 and Node 2]
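To make the KEY and EVEN definitions above concrete, here is a minimal Python sketch rather than anything Redshift-specific. It assumes four slices, as in the slide's diagram, and uses zlib.crc32 as a stand-in hash, since the deck does not show Redshift's actual hash function. The orders and customers tables and the helper names are hypothetical.

```python
import zlib

NUM_SLICES = 4  # assumption: slices 1-4 across two nodes, as drawn on the slide


def slice_for_key(value, num_slices=NUM_SLICES):
    """KEY style: hash the distribution key so equal values land on the same slice.
    crc32 is only a stand-in for the real (unspecified) hash function."""
    return zlib.crc32(str(value).encode("utf-8")) % num_slices


def distribute_key(rows, key, num_slices=NUM_SLICES):
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for_key(row[key], num_slices)].append(row)
    return slices


def distribute_even(rows, num_slices=NUM_SLICES):
    """EVEN style: round robin, ignoring the row contents."""
    slices = [[] for _ in range(num_slices)]
    for i, row in enumerate(rows):
        slices[i % num_slices].append(row)
    return slices


# Hypothetical fact and dimension tables sharing the join column customer_id.
orders = [{"order_id": i, "customer_id": i % 10} for i in range(100)]
customers = [{"customer_id": c} for c in range(10)]

orders_by_key = distribute_key(orders, "customer_id")
customers_by_key = distribute_key(customers, "customer_id")

# Co-location: every order sits on the slice that also holds its customer row,
# so a join on customer_id needs no data movement between slices.
colocated = all(
    any(c["customer_id"] == o["customer_id"] for c in customers_by_key[s])
    for s, slice_rows in enumerate(orders_by_key)
    for o in slice_rows
)
even_sizes = [len(s) for s in distribute_even(orders)]

print("KEY co-located:", colocated)
print("EVEN slice sizes:", even_sizes)
```

EVEN gives perfectly balanced slices but no co-location guarantee. KEY guarantees co-location, while its balance depends on how the key values hash, which is one reason a high-cardinality join column is the usual choice of distribution key.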

Distribution Best Practices
• Use KEY style distribution for tables that are frequently joined
  • Use a high cardinality join column as the distribution key.
  • Avoid date columns as the distribution key.
  • When joining a fact table with multiple dimension tables, use the same distribution key for the fact table and the large dimension table for a co-located join.
• Use ALL style distribution for small tables, <= 5 million rows
• Use EVEN style distribution if a table is largely denormalized and does not participate in joins, or if you don't have a clear choice for another distribution style

Data sorting
Redshift uses sort keys to physically order data on disk

Goal
• Make queries run faster by increasing the effectiveness of zone maps and reducing I/O
Impact
• Enables range-restricted scans to prune blocks by leveraging zone maps

SELECT count(*) FROM deep_dive WHERE dt = '06-09-2020';

Zone maps (MIN to MAX of dt per block):
Block | Unsorted table | Sorted by "dt"
1 | 01-JUNE-2020 to 20-JUNE-2020 | 01-JUNE-2020 to 06-JUNE-2020
2 | 08-JUNE-2020 to 30-JUNE-2020 | 07-JUNE-2020 to 12-JUNE-2020
3 | 12-JUNE-2020 to 20-JUNE-2020 | 13-JUNE-2020 to 21-JUNE-2020
4 | 02-JUNE-2020 to 25-JUNE-2020 | 21-JUNE-2020 to 30-JUNE-2020
I/O | 3x | 1x
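The 3x versus 1x I/O figures follow directly from the MIN/MAX ranges on this slide. The sketch below is plain Python with those block ranges copied in; the helper name blocks_to_read is hypothetical. It counts which blocks a zone-map check would still have to read for the predicate dt = '06-09-2020' (9 June 2020).

```python
from datetime import date

# (MIN, MAX) of the dt column per block, copied from the slide.
unsorted_blocks = [
    (date(2020, 6, 1), date(2020, 6, 20)),
    (date(2020, 6, 8), date(2020, 6, 30)),
    (date(2020, 6, 12), date(2020, 6, 20)),
    (date(2020, 6, 2), date(2020, 6, 25)),
]
sorted_blocks = [
    (date(2020, 6, 1), date(2020, 6, 6)),
    (date(2020, 6, 7), date(2020, 6, 12)),
    (date(2020, 6, 13), date(2020, 6, 21)),
    (date(2020, 6, 21), date(2020, 6, 30)),
]


def blocks_to_read(zone_map, value):
    """Indexes of blocks whose [MIN, MAX] range could contain `value`.
    Every other block can be skipped without being read."""
    return [i for i, (lo, hi) in enumerate(zone_map) if lo <= value <= hi]


target = date(2020, 6, 9)  # WHERE dt = '06-09-2020'
unsorted_hits = blocks_to_read(unsorted_blocks, target)
sorted_hits = blocks_to_read(sorted_blocks, target)

print(len(unsorted_hits), "blocks read when unsorted")
print(len(sorted_hits), "block read when sorted by dt")
```

With overlapping ranges, three of the four unsorted blocks may contain 9 June and must be read; once the table is sorted by dt the ranges barely overlap and a single block qualifies.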

Sort Key Best Practices
• Don't use sort keys for small tables, <= 5 million rows
• If you decide to fine-tune by choosing a sort key yourself for a large table:
  • Pick the column(s) most commonly used in filters as the SORT KEY. For example, if you frequently query the most recent data, date is the most appropriate sort key
  • If you frequently join a table, specify the join column as both the sort key and the distribution key on both tables. This results in a merge join, which is faster than the hash join that would otherwise be used.
  • Don't pick more than 4 columns for the SORT KEY. Beyond 4, the additional columns add no benefit
  • When there is more than one column in the SORT KEY, their order matters
    • Effective sort key order is lower to higher cardinality
    • Low cardinality columns come first; high cardinality columns come last
  • Always use the leading sort key column in the filter condition
• Don't apply compression encoding on sort key columns
• Don't apply functions to SORT KEY columns in query filters. For example, if the business_date column is the SORT KEY, don't filter on to_char(business_date,'YYYY') = '2023'

Materialized Views

Materialized Views
• Improve performance of complex, SLA sensitive, predictable and repeated queries using materialized views
• A materialized view persists the result set of the associated SQL
• Materialized views can be refreshed automatically or manually
• Redshift automatically determines the best way to update data in the materialized view (incremental or full refresh)
• Automatic query rewrite leverages relevant materialized views and can improve query performance by order(s) of magnitude
• Automated materialized views: Redshift continuously monitors workload to identify queries that will benefit from having an MV and automatically creates and manages MVs for them
• Incremental materialized views on external data lake tables: materialized views in Redshift offer cost-effective incremental updates for external data lake tables, avoiding full recomputation.

Redshift Materialized Views
Materialized views can be created using the CREATE statement, and can be included (default) or excluded from Redshift backups. Materialized views can also have table attributes such as dist style and sort keys, and be refreshed at any time

CREATE MATERIALIZED VIEW mv_name
[ BACKUP { YES | NO } ]
[ table_attributes ]
AS query

REFRESH MATERIALIZED VIEW mv_name;

Materialized Views – Best Practices
• Create materialized views that can be incrementally refreshed in order
to avoid full refresh.
• Schedule manual refresh for nested materialized views or those not
eligible for auto refresh.
• Follow query best practices when writing Materialized View queries.
• Follow table design best practices on distribution style and sort key
when creating the Materialized View.

Data Lake Modelling
Best Practices

Data modeling for Data Lakes Queries
• With Data Lakes, tables are collections of files
• Files can be organized as partitions
  For example: given a table mytable in the S3 location s3://mybucket/prefix/mytable, with data organized under prefixes 2023, 2022, 2021, 2020 etc., the years become partitions for mytable
• Partitions are based on S3 prefix
• Tables may have thousands of partitions

Data Lake Partitions
s3://mybucket/prefix/mytable/
  s3://mybucket/prefix/mytable/yyyy=2023
  s3://mybucket/prefix/mytable/yyyy=2022
  s3://mybucket/prefix/mytable/yyyy=2021
  s3://mybucket/prefix/mytable/yyyy=2020

File type, file size and the way files are organized significantly impact the performance of data lake queries.
Redshift supports reading data in:
• Open file formats: Parquet, ORC, JSON, CSV, etc.
• Open table formats: Apache Hudi, Apache Iceberg, Delta, etc.
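Because partitions are just S3 prefixes, a filter on the partition column can discard whole prefixes before any file is opened. The sketch below illustrates that idea in plain Python using the slide's yyyy=... prefixes. The helpers partition_value and prune are hypothetical, and real pruning is performed by the query engine and catalog rather than by user code.

```python
prefixes = [
    "s3://mybucket/prefix/mytable/yyyy=2023",
    "s3://mybucket/prefix/mytable/yyyy=2022",
    "s3://mybucket/prefix/mytable/yyyy=2021",
    "s3://mybucket/prefix/mytable/yyyy=2020",
]


def partition_value(prefix, column):
    """Extract the value of `column` from a `column=value` path segment, if present."""
    for segment in prefix.rstrip("/").split("/"):
        name, sep, value = segment.partition("=")
        if sep and name == column:
            return value
    return None


def prune(all_prefixes, column, predicate):
    """Keep only the prefixes whose partition value satisfies the predicate."""
    kept = []
    for prefix in all_prefixes:
        value = partition_value(prefix, column)
        if value is not None and predicate(value):
            kept.append(prefix)
    return kept


# Equivalent in spirit to a filter such as: WHERE yyyy >= 2022
to_scan = prune(prefixes, "yyyy", lambda v: int(v) >= 2022)
for prefix in to_scan:
    print(prefix)
```

Only two of the four prefixes remain to be scanned, which is why frequently filtered columns make good partition keys.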
Data Lake Queries Best practices
• Consider columnar format (e.g. Parquet, ORC) for performance and cost.
• With columnar formats, Amazon Redshift reads only the needed columns thereby reducing query cost.

• Set the table statistics (numRows) manually for Amazon S3 external tables.

• Avoid very large files (> 512 MB) and large numbers of small, KB-sized files.
• For file formats that support parallel reads: between 128 MB and 1 GB.
• For file formats that do not support parallel reads: between 64 MB and 128 MB.

• Partition data on S3 and use frequently filtered columns as partition key.

• Avoid excessively granular partitions.
• Columns that are used as common filters are good candidates.
• Multilevel partitioning is encouraged if you frequently use more than one predicate, e.g. you can partition based on both SHIPDATE and STORE.
• Create Glue partition Indexes to improve performance of partition pruning.

Data Lake Queries Best practices
• Optimize query cost using query monitoring rules (QMR) such as spectrum_scan_size_mb or spectrum_scan_row_count, and also set query performance boundaries on data lake queries.

• Use GROUP BY clause - Replace complex DISTINCT operations with GROUP BY in your queries.

• Choose the right datatype when creating external tables.

• Choose varchar(<<appropriate_length>>) instead of varchar(max).
• Choose the datatype date instead of varchar for dates.

• Monitor and control your Amazon Redshift Spectrum usage and costs using usage limits.

Amazon Redshift Data Sharing
Best Practices

Data Sharing in Amazon Redshift
• Secure, live data access across clusters, accounts, and regions

Sharing Levels
• Database, schemas, tables, views and SQL UDFs
• Fine-grained access control

Data Consistency
• Transactional consistency across producer and consumer clusters
• Immediate availability of committed changes

Multi-data warehouse writes (New)
• Write to shared databases from multiple warehouses
• Instant data availability across warehouses upon commit
• Flexible scaling of write workload (ETL, data processing)
• Secure collaboration on live data for use cases like customer 360

[Figures: hub-and-spoke and data mesh sharing architectures]

Data Sharing Queries Best practices
Performance Best Practices
• Size consumer cluster compute capacity appropriately for read query performance.
• For frequently updated data, create and share materialized views from the producer cluster.
• For slowly changing data, share tables and build materialized views on the consumer cluster.
• Be aware of potential performance differences in cross-region data sharing due to network latency.
• Utilize Concurrency Scaling on both producer and consumer clusters for read/write operations.
• Use VACUUM RECLUSTER instead of full VACUUM for maintenance, especially on large shared objects.

Security Best Practices

• Use the IncludeNew option cautiously; default to FALSE for fine-grained control over shared objects.
• Implement fine-grained access control using Late Binding Views or Materialized Views on the consumer
cluster.
• Ensure Redshift clusters are encrypted when sharing data with Redshift Serverless or across accounts.
• Utilize different KMS keys for producer and consumer clusters if needed, as data sharing supports this
configuration.

Workload management
Best Practices

Workload management
Allows for the separation of different query workloads

Goal
• Prioritize important queries
• Throttle / abort less important queries

Queue
• Service class that users interact with
• Logical separation of user workloads
• If you run ETL, dashboards and ad hoc queries, create 3 queues, one for each

SQA
• Automatically detects short-running queries and runs them within the short query queue, if queuing occurs

Concurrency scaling
• When enabled, Amazon Redshift automatically adds transient clusters, in seconds, to serve sudden spikes in concurrent requests with consistently fast performance

Query Monitoring Rules
• Protect against wasteful use of the compute resources
• Rules applied to a WLM queue allow queries to be: LOGGED, ABORTED, HOPPED

Types of workload management
Feature | Manual WLM | Auto WLM
Memory allocation | Manual and static | Automatic and dynamic
Concurrency | Manual and static | Automatic and dynamic
Prioritization | Cannot be done | Can be done at queue level
De-prioritization | Cannot be done | Can be done at query level using QMR
SQA | Can be enabled manually | Automatically enabled
Concurrency Scaling and QMR | Configurable for each queue | Configurable for each queue

Workload management: Best Practices
• Use Auto WLM if your workload is highly unpredictable, or you are using default WLM.
• Use QMR on query_execution_time, query_temp_blocks_to_disk and spectrum_scan_size_mb or spectrum_scan_row_count.

Only for Manual WLM
• Use manual WLM if you want to manually fine-tune and completely understand your workload patterns or require throttling certain types of queries depending on the time of day.
• Keep the number of WLM queues to a minimum, typically just three queues, to avoid having unused queues.
• Limit ingestion/ELT concurrency to two to three.
• To maximize query throughput, ensure the total number of concurrent queries is 15 or less.
• Save the superuser queue for administration tasks and canceling queries.
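As a rough mental model of how query monitoring rules behave, the sketch below evaluates a few rules against one query's metrics in plain Python. The metric names are the ones listed above. The thresholds, the rule names, and the rule that the most severe action wins (abort, then hop, then log) are assumptions of this sketch, not values taken from the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    metric: str
    threshold: float
    action: str  # "log", "hop" or "abort"


# Assumed ordering for this sketch: abort outranks hop, which outranks log.
SEVERITY = {"log": 0, "hop": 1, "abort": 2}

# Hypothetical thresholds, chosen only for illustration.
rules = [
    Rule("long_running", "query_execution_time", 600, "log"),
    Rule("heavy_spill", "query_temp_blocks_to_disk", 100_000, "hop"),
    Rule("huge_scan", "spectrum_scan_size_mb", 1_000_000, "abort"),
]


def evaluate(rule_set, metrics):
    """Return the most severe action among triggered rules, or None if none trigger."""
    triggered = [r for r in rule_set if metrics.get(r.metric, 0) > r.threshold]
    if not triggered:
        return None
    return max(triggered, key=lambda r: SEVERITY[r.action]).action


short_query = {"query_execution_time": 30}
spilling_query = {"query_execution_time": 900, "query_temp_blocks_to_disk": 200_000}
runaway_scan = {"query_execution_time": 900, "spectrum_scan_size_mb": 2_000_000}

print(evaluate(rules, short_query))
print(evaluate(rules, spilling_query))
print(evaluate(rules, runaway_scan))
```

The spilling query triggers both the log and the hop rule and is hopped; the runaway scan is aborted, which is the "protect against wasteful use of compute" goal stated for QMR.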

Data Ingestion: AWS Services
Batch and Near Real Time options, by source:
• S3 file: COPY command; Redshift Spectrum; COPY job (S3 triggered)
• Streaming sources: Streaming Ingestion (Amazon Kinesis, Amazon MSK)
• Transactional databases: Zero ETL Integration from Aurora; AWS Database Migration Service (batch and near real time)
• Amazon Data Exchange (ADX): Third Party Data available in ADX (batch and near real time)
• SaaS Applications: Amazon AppFlow (batch and near real time)

COPY Command
§ Used to ingest data from
  § Amazon S3 (most common source)
  § Amazon EMR
  § Amazon DynamoDB
  § Remote host (SSH)
§ Can ingest files in various formats and compression schemes
  § File format: CSV, JSON, Avro, Parquet, ORC etc
  § Compression Options: BZIP2, GZIP, LZOP, ZSTD
  § Encryption
§ One compute slice can process one file
§ COPY continues to scale linearly as you add more compute

[Figure: RA3.4XL compute with slices 0-3, shown loading 1 input file versus 4 input files]
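The "one compute slice can process one file" point explains the 1-file versus 4-file figure. A small Python sketch, assuming equally sized files that are not split across slices and the four slices drawn on the slide; copy_plan is a hypothetical helper, not a Redshift API:

```python
import math


def copy_plan(num_files, num_slices):
    """Estimate load waves and slice utilization when each slice handles one file at a time.

    Assumes equally sized files that are not split across slices.
    """
    if num_files <= 0 or num_slices <= 0:
        raise ValueError("num_files and num_slices must be positive")
    waves = math.ceil(num_files / num_slices)
    # Fraction of the available slice-time that is actually spent loading a file.
    utilization = num_files / (waves * num_slices)
    return waves, utilization


for files in (1, 4, 5, 8):
    waves, utilization = copy_plan(files, num_slices=4)
    print(f"{files} file(s): {waves} wave(s), {utilization:.0%} slice utilization")
```

Under these assumptions a single file keeps one slice busy while three sit idle, and five files need a second wave that is mostly idle; file counts that are a multiple of the slice count use the compute evenly.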
COPY JOB
• Extension of COPY command for automated S3 data loading
• Key Features:
  • Detects and loads new S3 files without manual intervention
  • Uses original COPY parameters and prevents file duplication
  • Tracks loaded files to ensure one-time loading
  • Only ingests files created after job creation
  • Jobs don't run when cluster is paused

COPY JOB Syntax
COPY public.target_table
FROM 's3://amzn-s3-demo-bucket/staging-folder'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyLoadRoleName'
JOB CREATE my_copy_job_name
AUTO ON;

COPY Command Best Practices
§ Use COPY command to load data whenever possible.
§ Use a single COPY command per table.
§ When using COPY, avoid loading from many small files or large non-splittable files.
§ If COPY is not possible, do bulk inserts using INSERT statement. Avoid single row inserts.
• Use COPY JOB for automated/incremental loading of data from Amazon S3
• When using the COPY command, optimal file sizes are:
  • For non-splittable files: between 1 MB and 1 GB each after compression
  • For splittable files:
    • Between 128 MB and 1 GB for columnar files, specifically Parquet and ORC.
    • Between 64 MB and 10 GB for row-oriented (CSV) data that does not use any of these keywords: REMOVEQUOTES, ESCAPE and FIXEDWIDTH.

[Figure: RA3.4XL compute with slices 0-3, shown loading 1 input file versus 4 input files]
Data Loading Best Practices
§ Load your data in sort key order to avoid needing to vacuum.

§ For large amounts of data, load in small sequential blocks according to sort order. This:
  § eliminates the need to vacuum.
  § uses much less intermediate sort space during each load, and makes it easier to restart if the COPY fails and is rolled back.

§ For data with fixed retention period, organize your data as a sequence of time-series
tables.

§ Use MERGE statement to perform upserts.

§ Enforce Primary, Unique or Foreign Key constraints in ETL.

§ Wrap workflow/statements in an explicit transaction.

§ Consider using TRUNCATE instead of DELETE.

(AUTO) VACUUM
• The VACUUM process runs either manually or automatically in the background
• Goals
• VACUUM will remove rows that are marked as deleted
• VACUUM will globally sort tables
• For tables with a sort key, ingestion operations will locally sort new data and write it into the
unsorted region

• Best practices
• VACUUM should be run only as necessary
• For the majority of workloads, AUTO VACUUM DELETE will reclaim space and
AUTO TABLE SORT will sort the needed portions of the table
• In cases where you know your workload – VACUUM can be run manually
• Run vacuum operations on a regular schedule
• Perform vacuum re-cluster on large tables

§ In the vast majority of cases, AUTO ANALYZE automatically handles statistics gathering
• Best practices
• For the majority of workloads, AUTO ANALYZE will collect statistics
• ANALYZE can be run periodically after ingestion on just the columns used in WHERE predicates
• Analyze after VACUUM
• Utility to manually run VACUUM and ANALYZE on all the tables in the cluster: https://bit.ly/34ZR3PP

Amazon Redshift Advisor
To improve the performance and decrease the operating costs, Amazon Redshift Advisor offers specific
recommendations by analyzing performance and usage metrics. Advisor ranks recommendations by order
of impact.
• Amazon Redshift Advisor is available in the Amazon Redshift console
• Runs daily, scanning operational metadata
• Observes with the lens of best practices
• Provides tailored, high-impact recommendations to optimize your Amazon Redshift cluster for performance and cost savings

Multi Tenant Strategy

Multi Cluster Model (Cluster1/Workgroup1 and Cluster2/Workgroup2, connected by data sharing)
• Workload type: completely isolated workloads that still share data among tenants; distributed data repositories
• Scalability: highly scalable model
• Cross-tenant R/W: supported for reads; no writes
• Examples: separating ETL and BI workloads; Dev, QA, PROD DBs

Multi Database Model
• Workload type: workloads having different security policies, isolation levels, collation
• Scalability: scaling is limited to cluster
• Cross-tenant R/W: supported for reads; no writes
• Examples: independent business units with no/limited cross querying; Banking DB and Insurance DB

Multi Schema Model
• Workload type: workloads requiring common security policies, the same isolation & collation, and frequent queries across tenants
• Scalability: scaling is limited to cluster
• Cross-tenant R/W: both reads and writes
• Examples: multiple departments or functional units that often cross query; Sales, Finance, Marketing

Multi ID Model
• Workload type: workloads requiring the same storage constructs, with access control through RLS; same tables and views across tenants
• Scalability: scaling is limited to cluster
• Cross-tenant R/W: both reads and writes
• Examples: each tenant is a persona accessing data from the same storage structure

Most frequently used and suitable for DW workloads

Enterprise Data Warehousing On Aws
26 pages
Amazon Refshift Book PDF
No ratings yet
Amazon Refshift Book PDF
549 pages
AWS Databases: RDS, DynamoDB, Redshift, Aurora
No ratings yet
AWS Databases: RDS, DynamoDB, Redshift, Aurora
10 pages
AWS Databases for Businesses
No ratings yet
AWS Databases for Businesses
15 pages
CloudFoundations - 08b - Databases - Dynamo DB, Redshift, Aurora
No ratings yet
CloudFoundations - 08b - Databases - Dynamo DB, Redshift, Aurora
33 pages
Amazon's Shift to Redshift
No ratings yet
Amazon's Shift to Redshift
17 pages
1605192076066-614 DAS-C01 Study Guide
No ratings yet
1605192076066-614 DAS-C01 Study Guide
18 pages
DATA HANDLING in Python
No ratings yet
DATA HANDLING in Python
18 pages
Course On VBA
No ratings yet
Course On VBA
44 pages
Review of Python Basics End Game
No ratings yet
Review of Python Basics End Game
48 pages
Modul 1 Advance Java & J2EE
No ratings yet
Modul 1 Advance Java & J2EE
30 pages
Java-Arrays Compress
No ratings yet
Java-Arrays Compress
5 pages
Data Structures: ArrayList Overview
No ratings yet
Data Structures: ArrayList Overview
35 pages
Here Are The Answers To The C Programming Basics
No ratings yet
Here Are The Answers To The C Programming Basics
2 pages
Introduction To C Programming Notes Module 1
No ratings yet
Introduction To C Programming Notes Module 1
24 pages
Data Types Oracle
No ratings yet
Data Types Oracle
50 pages
Python Unit 1 To 5
No ratings yet
Python Unit 1 To 5
70 pages
Lecture 1 - Introduction
No ratings yet
Lecture 1 - Introduction
12 pages
Fronius Solar API V1 Guide
No ratings yet
Fronius Solar API V1 Guide
85 pages
Java Programming Course Overview
No ratings yet
Java Programming Course Overview
116 pages
Understanding Arrays in C++ Programming
No ratings yet
Understanding Arrays in C++ Programming
66 pages
.NET Framework Glossary
No ratings yet
.NET Framework Glossary
39 pages
Local vs Global Variables in JavaScript
No ratings yet
Local vs Global Variables in JavaScript
6 pages
IDS Notes Unit 1
No ratings yet
IDS Notes Unit 1
22 pages
Python Keywords
No ratings yet
Python Keywords
6 pages
Data Visualization & Preprocessing Guide
No ratings yet
Data Visualization & Preprocessing Guide
18 pages
Asp.n Et
No ratings yet
Asp.n Et
94 pages
MS Access 5 - Data Types
No ratings yet
MS Access 5 - Data Types
3 pages
Programming Languages: Type Systems: Onur Tolga S Ehito Glu
No ratings yet
Programming Languages: Type Systems: Onur Tolga S Ehito Glu
18 pages
Structured Programming Exam Answers
No ratings yet
Structured Programming Exam Answers
12 pages
Computer Science Basics for Class XI
No ratings yet
Computer Science Basics for Class XI
3 pages
Siemens Simatic S 7 300 - 400 - Working With STEP 7
100% (15)
Siemens Simatic S 7 300 - 400 - Working With STEP 7
110 pages
VHDL Entity and Architecture Declarations
No ratings yet
VHDL Entity and Architecture Declarations
12 pages
Mastering MySQL A Comprehensive Guide
No ratings yet
Mastering MySQL A Comprehensive Guide
10 pages
S7-1200 Bit Logic and Timer Instructions
No ratings yet
S7-1200 Bit Logic and Timer Instructions
16 pages
Java Object-Oriented Programming Overview
No ratings yet
Java Object-Oriented Programming Overview
8 pages
Declaring Constants in Java Explained
No ratings yet
Declaring Constants in Java Explained
31 pages


• Structured and semi-structured data can be loaded

[Diagram: a namespace (one per endpoint) containing database1, database2, … databaseN, each with its own schemas (schema1 … schemaN)]

[Table: data types by category. Numeric: integer (SMALLINT, BIGINT), DECIMAL, floating point (REAL). Character: VARCHAR, TEXT. Datetime: TIME, TIMESTAMP.]

• Use appropriate data types. E.g., don’t store dates as VARCHAR

• Multibyte Characters - Use VARCHAR data type for UTF-8

• Use GEOMETRY data type and spatial functions to store,

• To improve performance of count distinct use HyperLogLog

• Use SUPER datatype to store semi-structured data and for

Easy, efficient, and powerful JSON processing

Access to schema-less nested data with

Supports up to 16 MB of data for an individual

query performance, complex query support, and ease of use with

• Use Redshift Spectrum instead of loading into SUPER, if data requires

Compression Distribution Sort

Continuously scans workload patterns

Promotes ease of use, so that you

CREATE TABLE deep_dive (

• In case you decide to fine-tune by choosing column encoding yourselves:

KEY: Value is hashed; same value goes to same location (slice)
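The KEY rule above can be illustrated with a small sketch. It is only a model of the idea: the hash function (MD5 here), the helper names `slice_for` and `distribute`, and the sample rows are assumptions for illustration, not Redshift's internal implementation.

```python
import hashlib


def slice_for(value, num_slices):
    """Map a distribution-key value to a slice index by hashing it.

    MD5 is a stand-in; the hash Redshift uses internally is not shown here.
    """
    digest = hashlib.md5(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_slices


def distribute(rows, key, num_slices):
    """Place each row on the slice chosen by hashing its key column."""
    slices = [[] for _ in range(num_slices)]
    for row in rows:
        slices[slice_for(row[key], num_slices)].append(row)
    return slices


# Hypothetical sample rows: two orders share customer_id 1.
rows = [
    {"customer_id": 1, "order_id": "a"},
    {"customer_id": 2, "order_id": "b"},
    {"customer_id": 1, "order_id": "c"},
]
placed = distribute(rows, "customer_id", num_slices=4)
```

Because the slice depends only on the key value, rows that share a key always land together, which is what lets joins on that key avoid moving data between slices.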


SELECT count(*) FROM deep_dive WHERE dt = '06-09-2020';

MIN: 08-JUNE-2020 MIN: 07-JUNE-2020

MIN: 02-JUNE-2020 MIN: 21-JUNE-2020
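A rough sketch of how per-block MIN/MAX metadata (zone maps) lets a query such as the one above skip blocks. The MIN dates are the ones shown; the MAX dates, the block ids, the function name `blocks_to_scan`, and reading '06-09-2020' as 9 June 2020 are all assumptions made for illustration.

```python
from datetime import date

# Per-block zone maps. MIN values come from the slide;
# MAX values are hypothetical, chosen only to make the example concrete.
zone_maps = [
    {"id": 0, "min": date(2020, 6, 8), "max": date(2020, 6, 30)},
    {"id": 1, "min": date(2020, 6, 7), "max": date(2020, 6, 29)},
    {"id": 2, "min": date(2020, 6, 2), "max": date(2020, 6, 6)},
    {"id": 3, "min": date(2020, 6, 21), "max": date(2020, 6, 28)},
]


def blocks_to_scan(zone_maps, value):
    """Return ids of blocks whose [min, max] range could contain value."""
    return [b["id"] for b in zone_maps if b["min"] <= value <= b["max"]]


# WHERE dt = '06-09-2020', assumed to mean 9 June 2020.
candidates = blocks_to_scan(zone_maps, date(2020, 6, 9))
```

With these assumed ranges, only blocks 0 and 1 need to be read. Sorting the table on `dt` would narrow each block's range and typically prune more blocks.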

Data Lake Partitions

• Partition data on S3 and use frequently filtered columns as partition key.

• Choose the right datatype when creating external tables.

Multi-data warehouse writes (New)

Security Best Practices

Queue SQA Concurrency scaling Query Monitoring Rules

Memory allocation Manual and static Automatic and dynamic

• Use QMR on query_execution_time, query_temp_blocks_to_disk and spectrum_scan_size_mb or

• Limit ingestion/ELT concurrency to two to three.

Streaming • Streaming Ingestion (Amazon

Transactional • Zero ETL Integration from Aurora

SaaS Applications • Amazon AppFlow • Amazon AppFlow

§ Can ingest files in various formats, compression

§ Compression Options: BZIP2, GZIP, LZOP, ZSTD

§ One compute slice can process one file
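Since one slice processes one file at a time, a common approach is to split input into a number of files that is a multiple of the slice count, so no slice sits idle during COPY. The helper below is a minimal sketch of that arithmetic; the function name `suggested_file_count` and the 128 MB default target file size are assumptions, not values from the slides.

```python
import math


def suggested_file_count(total_bytes, num_slices, target_file_bytes=128 * 1024**2):
    """Suggest a file count that is a multiple of num_slices.

    target_file_bytes is a hypothetical per-file size target; tune it
    to your data and cluster.
    """
    if num_slices <= 0:
        raise ValueError("num_slices must be positive")
    # Files needed to stay near the target size (at least one).
    by_size = max(1, math.ceil(total_bytes / target_file_bytes))
    # Round up to the next multiple of the slice count.
    return math.ceil(by_size / num_slices) * num_slices


# Example: 10 GiB of input on a cluster with 16 slices.
files = suggested_file_count(10 * 1024**3, num_slices=16)
```

Rounding up rather than down keeps individual files at or below the target size while still giving every slice the same number of files.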

• Key Features: COPY JOB Syntax

§ If COPY is not possible, do bulk inserts using INSERT statement. Avoid

• Use COPY JOB for automated/incremental loading of data from

§ Use MERGE statement to perform upserts.

§ Enforce Primary, Unique or Foreign Key constraints in ETL.

§ Wrap workflow/statements in an explicit transaction.

§ Consider using TRUNCATE instead of DELETE.

• For the majority of workloads, AUTO ANALYZE will collect statistics

• Runs daily scanning operational metadata

• Observes with the lens of best practices

• Provides tailored, high-impact recommendations to

Most frequently used and suitable for DW workloads
