Amazon Redshift Best Practices

The document outlines best practices for using Amazon Redshift, focusing on data models, table design, data types, and workload management. Key recommendations include using STAR schemas, appropriate data types, and optimizing table design through automatic adjustments of sort keys and distribution styles. It emphasizes the importance of materialized views for improving query performance and provides guidelines for effective data storage and management.


Amazon Redshift

Best Practices

Saman Irfan
Senior GTM Specialist Solutions Architect Analytics
AWS

Discussion Topics
• Data Models
• Table Design Best Practices
• Data Lake Modelling Best Practices
• Workload Management Best Practices
• Data Loads Best Practices
• Multitenancy Architecture Best Practices

Customer quote: "Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights."
Bradley Todd, Technology Architect, Liberty Mutual
Data Models

Redshift: Use Popular Data Models

Redshift can be used with a number of data models, including the STAR schema (most common), the snowflake schema, and highly denormalized models (less common).

A commonly used data model with Amazon Redshift is the STAR schema, which separates data into large fact and dimension (dim) tables:
• Facts refer to specific events (e.g., order submitted), and fact tables hold summary detail for those events, e.g., the high-level attributes of an order submitted such as order_id, order_dt, product_id, and total_cost. Fact tables use foreign keys to link to dim tables.
• The dimensions that make up a fact often have attributes themselves that are more efficiently stored in separate dim tables. E.g., a fact might contain a product_id, but the actual product details would be contained in a separate products dim table (e.g., product_price, height_cm, width_cm, and product_id are columns that might be found in a products dim table).

Best Practice: Avoid highly normalized models. Models such as 3NF resemble the STAR schema but have much more table normalization and are typically more appropriate for OLTP systems.
Amazon Redshift Data Storage & Data Types
Data storage in Redshift
• Data loaded into Redshift is stored in Redshift Managed Storage (RMS); storage is columnar
• Structured and semi-structured data can be loaded
• Amazon Redshift is ANSI SQL and ACID compliant
• Does not require indexes or DB hints; instead it leverages sort keys, distribution keys, and compression to achieve fast performance through parallelism and efficient data storage
• Data is organized as: namespace > database > schema > objects

A namespace (one per endpoint) contains one or more databases; each database contains one or more schemas; each schema contains the code objects and data objects.
Redshift Datatypes

Scalar datatypes
• Numeric types: integer types (SMALLINT, INT, BIGINT), DECIMAL/NUMERIC, floating point types (REAL, DOUBLE PRECISION)
• Character types: CHAR, VARCHAR, NCHAR, TEXT, BPCHAR
• Datetime types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ
• Other types: BOOLEAN, HLLSKETCH, GEOMETRY, VARBYTE

Vector datatype
• SUPER
Choosing Data Types: Best Practices
• Make columns only as wide as they need to be. Redshift performance is about efficient I/O; do not arbitrarily assign the maximum length/precision, as this can slow down query execution.
• Use appropriate data types, e.g., don't store dates as VARCHAR.
• Multibyte characters: use the VARCHAR data type for UTF-8 multibyte character support (up to a maximum of four bytes per character).
• Use the GEOMETRY data type and spatial functions to store, process, and analyze spatial data.
• To improve the performance of count distinct, use the HyperLogLog sketch (HLLSKETCH) datatype.
• Use the SUPER datatype to store semi-structured data and for evolving schemas and schema-less data.
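
A minimal sketch of these guidelines (the table and column names are hypothetical, not from this deck): dates stored as DATE rather than VARCHAR, right-sized character columns, SUPER for evolving attributes, and the HLL aggregate for fast approximate distinct counts.

-- Hypothetical table illustrating right-sized, appropriate types
CREATE TABLE page_views (
    view_id    BIGINT,
    user_id    BIGINT,
    view_dt    DATE,            -- a real DATE, not a VARCHAR
    country    CHAR(2),         -- fixed-width country code, not an oversized VARCHAR
    user_agent VARCHAR(256),    -- only as wide as it needs to be
    payload    SUPER            -- semi-structured, schema-less attributes
);

-- Approximate distinct counts via HyperLogLog instead of COUNT(DISTINCT ...)
SELECT view_dt, HLL(user_id) AS approx_unique_users
FROM page_views
GROUP BY view_dt;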
Additional Documentation
• Querying Spatial Data in Redshift
Semi-structured data – SUPER datatype

• Easy, efficient, and powerful JSON processing
• Fast row-oriented data ingestion
• Fast column-oriented analytics with materialized views over SUPER/JSON
• Access to schema-less nested data with easy-to-use SQL extensions powered by the PartiQL query language
• Supports up to 16 MB of data for an individual SUPER field or object

Example customers table (id INTEGER; name and phones are SUPER):

id  name                                        phones
1   {"given":"Jane", "family":"Doe"}            [{"type":"work", "num":"9255550100"},
                                                 {"type":"cell", "num": 6505550101}]
2   {"given":"Richard", "family":"Roe",         [{"type":"work", "num": 5105550102}]
     "middle":"John"}

SELECT name.given AS firstname, name.middle AS middlename, ph.num
FROM customers c, c.phones ph
WHERE ph.type = 'work';

firstname | middle | num
----------+--------+------------
"Jane"    | null   | 9255550100
"Richard" | "John" | 5105550102
SUPER datatype: Best Practices
• For low-latency inserts or small batch inserts, insert into SUPER; inserts into the SUPER datatype are quicker.
• If you join frequently using attributes stored in SUPER, create separate scalar datatype columns for those attributes to improve performance.
• If you filter frequently using attributes stored in SUPER, create separate scalar datatype columns for those attributes to improve performance.
• Use SUPER when your queries require strong consistency, predictable query performance, complex query support, and ease of use with evolving schemas and schema-less data.
• Use Redshift Spectrum instead of loading into SUPER if the data requires integration with other AWS services (e.g., EMR).

/* customer-orders-lineitem: the complete JSON order record lands in a single SUPER column */
CREATE TABLE customer_orders_lineitem (
    c_custkey    BIGINT,
    c_name       VARCHAR,
    c_address    VARCHAR,
    c_nationkey  SMALLINT,
    c_phone      VARCHAR,
    c_acctbal    DECIMAL(12,2),
    c_mktsegment VARCHAR,
    c_comment    VARCHAR,
    c_orders     SUPER
);
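
For example, a complete JSON document can be loaded into the SUPER column with JSON_PARSE (a sketch; the values are hypothetical):

INSERT INTO customer_orders_lineitem (c_custkey, c_name, c_orders)
VALUES (1, 'Jane Doe',
        JSON_PARSE('[{"o_orderkey": 101, "o_totalprice": 84.50}]'));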

Table Design Best Practices

Redshift table design
THREE MAIN CONCEPTS
• Compression (column encoding)
• Distribution style
• Sort keys
Automatic table optimization
TABLE OPTIMIZATION IS DONE AUTOMATICALLY FOR YOU – NO MANUAL INTERVENTION NEEDED

• Continuously scans workload patterns
• Automatically adjusts sort keys, distribution style, and column encoding over time to account for changes in the workload
• Can be enabled or disabled per table/column
• Optimizations are applied to tables/columns when the load on compute is low
• Promotes ease of use, so that you focus on business objectives rather than database management

Best Practice: Use the AUTO options for compression, distribution, and sort keys.
Compression/Encoding

Goal: allow more data to be stored and improve query performance by decreasing I/O.
Impact: allows 3x to 4x more data to be stored.

CREATE TABLE deep_dive (
    aid INT     ENCODE AUTO,
    loc CHAR(3) ENCODE AUTO,
    dt  DATE    ENCODE AZ64
);

Column data is persisted to 1 MB immutable blocks; blocks are individually encoded with 1 of 13 encodings, and a full block can contain millions of values after compression.
Compression Best Practices
• Use default compression.
  • By default, compression encoding is set to AUTO for all columns in a table, which means Redshift automatically determines the best compression encoding for each column.
  • Rely on that default compression.
• In case you decide to fine-tune by choosing column encodings yourself (see ANALYZE COMPRESSION below):
  • Use AZ64 where possible.
  • Use ZSTD / LZO for high-cardinality (VAR)CHAR columns.
  • Use BYTEDICT for low-cardinality (VAR)CHAR columns.
Distribution Style
Distribution style is a table property that dictates how that table's data is distributed.

Goal: distribute data evenly for parallel processing and minimize data movement during query processing.
Impact: minimizes data redistribution by achieving collocation.

• KEY: the value is hashed; the same value goes to the same location (slice)
• EVEN: round-robin distribution
• ALL: the full table data is placed on the first slice of every compute node
• AUTO: Redshift automatically manages the distribution style

(Diagram: the same key values land on specific slices under KEY, rows are spread round-robin across slices 1-4 under EVEN, and a full copy is placed on every node under ALL.)
Distribution Best Practices
• Use KEY distribution for tables that are frequently joined.
  • Use a high-cardinality join column as the distribution key.
  • Avoid date columns as the distribution key.
  • When joining a fact table with multiple dimension tables, use the same distribution key for the fact table and the large dimension table to get a co-located join.
• Use ALL distribution for small tables (<= 5 million rows).
• Use EVEN distribution if a table is largely denormalized and does not participate in joins, or if you don't have a clear choice for another distribution style.
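
A sketch of these choices with hypothetical fact and dimension tables: the shared, high-cardinality join column customer_id is the distribution key on both sides of the frequent join, and a small lookup table is replicated with DISTSTYLE ALL.

-- Co-located join: same DISTKEY on the fact table and the large dimension
CREATE TABLE orders_fact (
    order_id    BIGINT,
    customer_id BIGINT,
    order_dt    DATE,
    total_cost  DECIMAL(12,2)
) DISTSTYLE KEY DISTKEY (customer_id);

CREATE TABLE customer_dim (
    customer_id BIGINT,
    c_name      VARCHAR(64)
) DISTSTYLE KEY DISTKEY (customer_id);

-- Small (<= 5 million rows) lookup table: replicate a copy to every node
CREATE TABLE country_dim (
    country_code CHAR(2),
    country_name VARCHAR(64)
) DISTSTYLE ALL;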

Data sorting
Redshift uses sort keys to physically order data on disk.

Goal: make queries run faster by increasing the effectiveness of zone maps and reducing I/O.
Impact: enables range-restricted scans that prune blocks by leveraging zone maps.

SELECT count(*) FROM deep_dive WHERE dt = '06-09-2020';

Zone maps (block MIN/MAX) for the dt column:

Unsorted table                     Sorted by "dt"
MIN: 01-JUNE-2020                  MIN: 01-JUNE-2020
MAX: 20-JUNE-2020                  MAX: 06-JUNE-2020

MIN: 08-JUNE-2020                  MIN: 07-JUNE-2020
MAX: 30-JUNE-2020                  MAX: 12-JUNE-2020

MIN: 12-JUNE-2020                  MIN: 13-JUNE-2020
MAX: 20-JUNE-2020                  MAX: 21-JUNE-2020

MIN: 02-JUNE-2020                  MIN: 21-JUNE-2020
MAX: 25-JUNE-2020                  MAX: 30-JUNE-2020

=> 3x I/O                          => 1x I/O
Sort Key Best Practices
• Don't use sort keys for small tables (<= 5 million rows).
• In case you decide to fine-tune by choosing a sort key yourself for a large table (sketched below):
  • Pick the column(s) most commonly used in filters as the SORT KEY. For example, if you query the most recent data frequently, date is the most appropriate sort key.
  • If you frequently join a table, specify the join column as both the sort key and the distribution key on both tables. This results in a merge join, which is faster than the otherwise-used hash join.
  • Don't pick more than 4 columns for the SORT KEY. Beyond 4, there is no added benefit from the additional columns.
  • When there is more than one column in the SORT KEY, their order matters:
    • Effective sort key order is lower to higher cardinality: low-cardinality columns come first, high-cardinality columns come last.
    • Always use the leading sort key column in the filter condition.
  • Don't apply compression encoding on sort key columns.
  • Don't apply functions to SORT KEY columns in query filters. For example, if the business_date column is the SORT KEY, don't apply a filter like to_char(business_date, 'YYYY') = '2023'.
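
A sketch under these rules (hypothetical table): a compound sort key ordered from lower to higher cardinality, left uncompressed, and a filter that keeps the leading sort key column bare so zone maps can prune blocks.

CREATE TABLE sales_fact (
    region        CHAR(4) ENCODE RAW,   -- low cardinality: leading sort key column, uncompressed
    business_date DATE ENCODE RAW,      -- higher cardinality: second sort key column
    store_id      INT,
    amount        DECIMAL(12,2)
) COMPOUND SORTKEY (region, business_date);

-- Good: bare sort key columns let zone maps prune blocks
SELECT SUM(amount)
FROM sales_fact
WHERE region = 'EMEA'
  AND business_date BETWEEN '2023-01-01' AND '2023-12-31';

-- Bad: the function call defeats zone-map pruning on the sort key
-- WHERE to_char(business_date, 'YYYY') = '2023'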

Materialized Views

Materialized Views
• Improve the performance of complex, SLA-sensitive, predictable, and repeated queries using materialized views
• A materialized view persists the result set of the associated SQL
• Materialized views can be refreshed automatically or manually
• Redshift automatically determines the best way to update data in the materialized view (incremental or full refresh)
• Automatic query rewrite leverages relevant materialized views and can improve query performance by order(s) of magnitude
• Automated materialized views: Redshift continuously monitors the workload to identify queries that will benefit from having an MV, and automatically creates and manages MVs for them
• Incremental materialized views on external data lake tables: materialized views in Redshift offer cost-effective incremental updates for external data lake tables, avoiding full re-computation

Materialized views are created with the CREATE statement and can be included in (the default) or excluded from Redshift backups. They can also have table attributes such as distribution style and sort keys, and can be refreshed at any time:

CREATE MATERIALIZED VIEW mv_name
[ BACKUP { YES | NO } ]
[ table_attributes ]
AS query;

REFRESH MATERIALIZED VIEW mv_name;
Materialized Views – Best Practices
• Create materialized views that can be incrementally refreshed in order
to avoid full refresh.
• Schedule manual refresh for nested materialized views or those not
eligible for auto refresh.
• Follow query best practices when writing Materialized View queries.
• Follow table design best practices on distribution style and sort key
when creating the Materialized View.
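
A minimal sketch (reusing the hypothetical orders_fact table from earlier): an auto-refreshing aggregate materialized view that eligible queries can be transparently rewritten to use.

CREATE MATERIALIZED VIEW daily_sales_mv
AUTO REFRESH YES
AS
SELECT order_dt, SUM(total_cost) AS daily_total
FROM orders_fact
GROUP BY order_dt;

-- Manual refresh, e.g., for nested MVs or MVs not eligible for auto refresh
REFRESH MATERIALIZED VIEW daily_sales_mv;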

Data Lake Modelling
Best Practices

Data Modeling for Data Lake Queries
• With data lakes, tables are collections of files
• Files can be organized as partitions
• Partitions are based on S3 prefixes
• Tables may have thousands of partitions

For example, given a table mytable in the S3 location s3://mybucket/prefix/mytable, with data organized under the prefixes 2023, 2022, 2021, 2020, etc., the years become partitions for mytable:

s3://mybucket/prefix/mytable/
s3://mybucket/prefix/mytable/yyyy=2023
s3://mybucket/prefix/mytable/yyyy=2022
s3://mybucket/prefix/mytable/yyyy=2021
s3://mybucket/prefix/mytable/yyyy=2020

File type, file size, and the way files are organized significantly impact the performance of data lake queries. Redshift supports reading data in:
• Open file formats: Parquet, ORC, JSON, CSV, etc.
• Open table formats: Apache Hudi, Apache Iceberg, Delta, etc.
Data Lake Queries Best Practices
• Consider columnar formats (e.g., Parquet, ORC) for performance and cost.
  • With columnar formats, Amazon Redshift reads only the needed columns, thereby reducing query cost.
• Set the table statistics (numRows) manually for Amazon S3 external tables (see the sketch after this list).
• Avoid very large files (> 512 MB) and large numbers of small, KB-sized files.
  • File sizes between 128 MB and 1 GB support parallel reads.
  • File sizes between 64 MB and 128 MB do not support parallel reads.
• Partition data on S3 and use frequently filtered columns as partition keys.
  • Avoid excessively granular partitions.
  • Columns that are used as common filters are good candidates.
  • Multilevel partitioning is encouraged if you frequently use more than one predicate, e.g., you can partition on both SHIPDATE and STORE.
  • Create AWS Glue partition indexes to improve the performance of partition pruning.
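
A sketch of a partitioned external table with manually set statistics (the external schema spectrum, the bucket, and the row count are hypothetical):

CREATE EXTERNAL TABLE spectrum.mytable (
    order_id   BIGINT,
    shipdate   DATE,
    store      VARCHAR(32),
    total_cost DECIMAL(12,2)
)
PARTITIONED BY (yyyy INT)
STORED AS PARQUET
LOCATION 's3://mybucket/prefix/mytable/';

ALTER TABLE spectrum.mytable
ADD PARTITION (yyyy=2023) LOCATION 's3://mybucket/prefix/mytable/yyyy=2023/';

-- Set the table statistics manually so the planner can size joins correctly
ALTER TABLE spectrum.mytable SET TABLE PROPERTIES ('numRows'='170000');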

Data Lake Queries Best Practices
• Optimize query cost using query monitoring rules (QMR) such as spectrum_scan_size_mb or spectrum_scan_row_count, and set query performance boundaries on data lake queries.
• Use the GROUP BY clause: replace complex DISTINCT operations with GROUP BY in your queries (see the example below).
• Choose the right datatypes when creating external tables:
  • Choose varchar(<<appropriate_length>>) instead of varchar(max).
  • Choose the DATE datatype instead of VARCHAR for dates.
• Monitor and control your Amazon Redshift Spectrum usage and costs using usage limits.
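
A sketch of the DISTINCT-to-GROUP BY rewrite against the hypothetical external table above:

-- Instead of:
SELECT DISTINCT store FROM spectrum.mytable;

-- Prefer:
SELECT store FROM spectrum.mytable GROUP BY store;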

Amazon Redshift Data Sharing
Best Practices

Data Sharing in Amazon Redshift
• Secure, live data access across clusters, accounts, and regions
• Supports hub-and-spoke and data mesh patterns

Sharing levels
• Databases, schemas, tables, views, and SQL UDFs
• Fine-grained access control

Data consistency
• Transactional consistency across producer and consumer clusters
• Immediate availability of committed changes

Multi-data warehouse writes (new)
• Write to shared databases from multiple warehouses
• Instant data availability across warehouses upon commit
• Flexible scaling of write workloads (ETL, data processing)
• Secure collaboration on live data for use cases like customer 360
Data Sharing Queries Best practices
Performance Best Practices
• Size consumer cluster compute capacity appropriately for read query performance.
• For frequently updated data, create and share materialized views from the producer cluster.
• For slowly changing data, share tables and build materialized views on the consumer cluster.
• Be aware of potential performance differences in cross-region data sharing due to network latency.
• Utilize Concurrency Scaling on both producer and consumer clusters for read/write operations.
• Use VACUUM RECLUSTER instead of full VACUUM for maintenance, especially on large shared objects.

Security Best Practices
• Use the IncludeNew option cautiously; default to FALSE for fine-grained control over shared objects.
• Implement fine-grained access control using Late Binding Views or Materialized Views on the consumer
cluster.
• Ensure Redshift clusters are encrypted when sharing data with Redshift Serverless or across accounts.
• Utilize different KMS keys for producer and consumer clusters if needed, as data sharing supports this
configuration.
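
A sketch of the basic producer/consumer flow (object names are hypothetical; the namespace GUIDs are placeholders):

-- On the producer cluster: grant to the consumer's namespace GUID
CREATE DATASHARE salesshare;
ALTER DATASHARE salesshare ADD SCHEMA public;
ALTER DATASHARE salesshare ADD TABLE public.orders_fact;
GRANT USAGE ON DATASHARE salesshare
TO NAMESPACE '<consumer-namespace-guid>';

-- On the consumer cluster: reference the producer's namespace GUID
CREATE DATABASE sales_shared FROM DATASHARE salesshare
OF NAMESPACE '<producer-namespace-guid>';

SELECT COUNT(*) FROM sales_shared.public.orders_fact;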

Workload management
Best Practices

Workload management
Allows for the separation of different query workloads.

Goal
• Prioritize important queries
• Throttle/abort less important queries

• Queue: the service class that users interact with; provides logical separation of user workloads. If you run ETL, dashboards, and ad hoc queries, create three queues, one for each.
• SQA (short query acceleration): automatically detects short-running queries and runs them within the short query queue, if queuing occurs.
• Concurrency scaling: when enabled, Amazon Redshift automatically adds transient clusters, in seconds, to serve a sudden spike in concurrent requests with consistently fast performance.
• Query monitoring rules (QMR): protect against wasteful use of compute resources; rules applied to a WLM queue allow queries to be LOGGED, ABORTED, or HOPPED.
Types of workload management

Feature                        Manual WLM                     Auto WLM
Memory allocation              Manual and static              Automatic and dynamic
Concurrency                    Manual and static              Automatic and dynamic
Prioritization                 Cannot be done                 Can be done at queue level
De-prioritization              Cannot be done                 Can be done at query level using QMR
SQA                            Can be enabled manually        Automatically enabled
Concurrency Scaling and QMR    Configurable for each queue    Configurable for each queue
Workload management: Best Practices
• Use Auto WLM if your workload is highly unpredictable, or you are using the default WLM.
• Use QMR on query_execution_time, query_temp_blocks_to_disk, and spectrum_scan_size_mb or spectrum_scan_row_count.

Only for Manual WLM
• Use manual WLM if you want to manually fine-tune and completely understand your workload patterns, or require throttling certain types of queries depending on the time of day.
• Keep the number of WLM queues to a minimum, typically just three queues, to avoid having unused queues.
• Limit ingestion/ELT concurrency to two to three.
• To maximize query throughput, ensure the total number of concurrent queries is 15 or less.
• Save the superuser queue for administration tasks and canceling queries.
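
With manual WLM, a statement can be routed to a specific queue by setting a query group first (the group name 'etl' and the tables are hypothetical; the group must match a query group configured on one of your queues):

SET query_group TO 'etl';
INSERT INTO sales_fact SELECT * FROM staging_sales;  -- runs in the matching queue
RESET query_group;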

Data Load

Data Ingestion: AWS Services

S3 files
• Batch: COPY command; Redshift Spectrum
• Near real time: COPY job (S3-triggered)

Streaming sources
• Near real time: streaming ingestion (Amazon Kinesis, Amazon MSK)

Transactional databases
• Batch: AWS Database Migration Service
• Near real time: zero-ETL integration from Aurora; AWS Database Migration Service

Amazon Data Exchange (ADX)
• Batch and near real time: third-party data available in ADX

SaaS applications
• Batch and near real time: Amazon AppFlow
COPY Command
§ Used to ingest data from:
  § Amazon S3 (most common source)
  § Amazon EMR
  § Amazon DynamoDB
  § Remote hosts (SSH)
§ Can ingest files in various formats and compression schemes:
  § File formats: CSV, JSON, Avro, Parquet, ORC, etc.
  § Compression options: BZIP2, GZIP, LZOP, ZSTD
  § Encryption
§ One compute slice can process one file
§ COPY continues to scale linearly as you add more compute

(Diagram: on an RA3.4XL node with slices 0-3, a single input file keeps only one slice busy, while 4 input files are loaded by all four slices in parallel.)
COPY JOB
• Extension of the COPY command for automated S3 data loading

• Key features:
  • Detects and loads new S3 files without manual intervention
  • Uses the original COPY parameters and prevents file duplication
  • Tracks loaded files to ensure one-time loading
  • Only ingests files created after job creation
  • Jobs don't run when the cluster is paused

COPY JOB syntax:

COPY public.target_table
FROM 's3://amzn-s3-demo-bucket/staging-folder'
IAM_ROLE 'arn:aws:iam::123456789012:role/MyLoadRoleName'
JOB CREATE my_copy_job_name
AUTO ON;
COPY Command Best Practices
§ Use the COPY command to load data whenever possible.
§ Use a single COPY command per table.
§ When using COPY, avoid loading from many small files or from large non-splittable files.
§ If COPY is not possible, do bulk inserts using the INSERT statement; avoid single-row inserts.
§ Use COPY JOB for automated/incremental loading of data from Amazon S3.
§ When using the COPY command, optimal file sizes are:
  § For non-splittable files: 1 MB to 1 GB per file after compression.
  § For splittable files:
    § 128 MB to 1 GB for columnar files, specifically Parquet and ORC.
    § 64 MB to 10 GB for row-oriented (CSV) data that does not use any of these keywords: REMOVEQUOTES, ESCAPE, and FIXEDWIDTH.
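
A sketch of a single COPY per table from a prefix of splittable Parquet files (bucket and role are hypothetical):

COPY sales_fact
FROM 's3://mybucket/staging/sales/'   -- prefix holding multiple 128 MB to 1 GB files
IAM_ROLE 'arn:aws:iam::123456789012:role/MyLoadRoleName'
FORMAT AS PARQUET;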
Data Loading Best Practices
§ Load your data in sort key order to avoid needing to vacuum.
§ For large amounts of data, load in small sequential blocks according to sort order:
  § this eliminates the need to vacuum;
  § you use much less intermediate sort space during each load, and it is easier to restart if the COPY fails and is rolled back.
§ For data with a fixed retention period, organize your data as a sequence of time-series tables.
§ Use the MERGE statement to perform upserts, as shown below.
§ Enforce primary, unique, or foreign key constraints in ETL.
§ Wrap workflow statements in an explicit transaction.
§ Consider using TRUNCATE instead of DELETE.
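
A sketch of a MERGE upsert wrapped in an explicit transaction (table names are hypothetical):

BEGIN;
MERGE INTO orders_fact
USING staging_orders s
ON orders_fact.order_id = s.order_id
WHEN MATCHED THEN
    UPDATE SET total_cost = s.total_cost
WHEN NOT MATCHED THEN
    INSERT (order_id, customer_id, order_dt, total_cost)
    VALUES (s.order_id, s.customer_id, s.order_dt, s.total_cost);
COMMIT;

-- TRUNCATE commits implicitly in Redshift, so clear staging outside the transaction
TRUNCATE staging_orders;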

Post load Best Practices

(AUTO) VACUUM
• The VACUUM process runs either manually or automatically in the background
• Goals
• VACUUM will remove rows that are marked as deleted
• VACUUM will globally sort tables
• For tables with a sort key, ingestion operations will locally sort new data and write it into the
unsorted region

• Best practices
• VACUUM should be run only as necessary
• For the majority of workloads, AUTO VACUUM DELETE will reclaim space and
AUTO TABLE SORT will sort the needed portions of the table
• In cases where you know your workload – VACUUM can be run manually
• Run vacuum operations on a regular schedule
• Perform vacuum re-cluster on large tables
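
A sketch of the manual variants (table name hypothetical):

VACUUM DELETE ONLY sales_fact;   -- reclaim space from deleted rows only
VACUUM SORT ONLY sales_fact;     -- sort without reclaiming space
VACUUM RECLUSTER sales_fact;     -- sort only the unsorted portions; suited to large tables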

(AUTO) ANALYZE
§ The ANALYZE process collects table statistics for optimal query planning

§ In the vast majority of cases, AUTO ANALYZE automatically handles statistics gathering

• Best practices

• For the majority of workloads, AUTO ANALYZE will collect statistics


• ANALYZE can be run periodically after ingestion on just the columns that WHERE
predicates are filtered on
• Analyze after VACUUM
• Utility to manually run VACUUM and ANALYZE on all the tables in the cluster:
https://bit.ly/34ZR3PP
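
A sketch of targeted statistics collection (table and column names hypothetical):

-- Restrict ANALYZE to columns that actually appear in WHERE predicates
ANALYZE sales_fact PREDICATE COLUMNS;

-- Or name the filtered columns explicitly
ANALYZE sales_fact (business_date, region);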

Amazon Redshift Advisor

Amazon Redshift Advisor
To improve performance and decrease operating costs, Amazon Redshift Advisor offers specific recommendations by analyzing performance and usage metrics. Advisor ranks recommendations by order of impact.
• Available in the Amazon Redshift console
• Runs daily, scanning operational metadata
• Observes your cluster through the lens of best practices
• Provides tailored, high-impact recommendations to optimize your Amazon Redshift cluster for performance and cost savings
Best Practices for
Multitenancy Architectures

Multi-Tenant Strategy

Multi Cluster Model (a cluster/workgroup per tenant, connected via data sharing)
• Workload type: completely isolated workloads that still share data among tenants; distributed data repositories
• Scalability: highly scalable model
• Cross-tenant R/W: supported for reads; no writes
• Examples: separating ETL and BI workloads; Dev, QA, PROD DBs

Multi Database Model
• Workload type: workloads having different security policies, isolation levels, or collation
• Scalability: limited to the cluster
• Cross-tenant R/W: supported for reads; no writes
• Examples: independent business units with no/limited cross-querying, e.g., a banking DB and an insurance DB

Multi Schema Model (most frequently used and suitable for DW workloads)
• Workload type: workloads requiring common security policies, the same isolation and collation, and frequent queries across tenants
• Scalability: limited to the cluster
• Cross-tenant R/W: both reads and writes
• Examples: multiple departments or functional units that often cross-query, e.g., Sales, Finance, Marketing

Multi ID Model
• Workload type: workloads requiring the same storage constructs and the same tables and views across tenants; access control with RLS
• Scalability: limited to the cluster
• Cross-tenant R/W: both reads and writes
• Examples: each tenant is a persona accessing data from the same storage structure
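
For the multi ID model, row-level security can scope each tenant to its own rows. A sketch, assuming a tenant_id column whose values match the connected user names (table and policy names are hypothetical):

CREATE RLS POLICY tenant_rows
WITH (tenant_id VARCHAR(32))
USING (tenant_id = current_user);

ATTACH RLS POLICY tenant_rows ON sales_fact TO PUBLIC;
ALTER TABLE sales_fact ROW LEVEL SECURITY ON;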


Thank you!

© 2025, Amazon Web Services, Inc. or its affiliates. All rights reserved. Amazon Confidential and Trademark.
