0% found this document useful (0 votes)
58 views

Microsoft - Query Processing Enhancements On Partitioned Tables and Indexes (Partition Elimination)

This document summarizes new query processing enhancements in SQL Server 2008 for partitioned tables and indexes, including partition elimination. It describes how partitioned tables are now internally represented with an added partition ID column, allowing partition-aware indexing and seeks. It also provides details on how to interpret partitioning information displayed in query execution plans.

Uploaded by

sivy75
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
58 views

Microsoft - Query Processing Enhancements On Partitioned Tables and Indexes (Partition Elimination)

This document summarizes new query processing enhancements in SQL Server 2008 for partitioned tables and indexes, including partition elimination. It describes how partitioned tables are now internally represented with an added partition ID column, allowing partition-aware indexing and seeks. It also provides details on how to interpret partitioning information displayed in query execution plans.

Uploaded by

sivy75
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 12

https://docs.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms345599(v=sql.

105)

Query Processing Enhancements on Partitioned


Tables and Indexes (Partition Elimination)
SQL Server 2008 improves query processing performance on partitioned tables for many parallel plans,
changes the way parallel and serial plans are represented, and enhances the partitioning information
provided in both compile-time and run-time execution plans. This topic describes these improvements,
provides guidance on how to interpret the query execution plans of partitioned tables and indexes, and
provides best practices for improving query performance on partitioned objects.

Note

Partitioned tables and indexes are supported only in the SQL Server Enterprise, Developer, and
Evaluation editions.

New Partition-Aware Seek Operation


In SQL Server 2008, the internal representation of a partitioned table is changed so that the table
appears to the query processor to be a multicolumn index with PartitionID as the leading column.
PartitionID is a hidden computed column used internally to represent the ID of the partition containing
a specific row. For example, assume the table T, defined as T(a, b, c), is partitioned on column a, and
has a clustered index on column b. In SQL Server 2008, this partitioned table is treated internally as a
nonpartitioned table with the schema T(PartitionID, a, b, c) and a clustered index on the composite key
(PartitionID, b). This allows the query optimizer to perform seek operations based on PartitionID on any
partitioned table or index.

Partition elimination is now done in this seek operation.

In addition, the query optimizer is extended so that a seek or scan operation with one condition can be
done on PartitionID (as the logical leading column) and possibly other index key columns, and then a
second-level seek, with a different condition, can be done on one or more additional columns, for each
distinct value that meets the qualification for the first-level seek operation. That is, this operation, called
a skip scan, allows the query optimizer to perform a seek or scan operation based on one condition to
determine the partitions to be accessed and a second-level index seek operation within that operator
to return rows from these partitions that meet a different condition. For example, consider the following
query.

SELECT * FROM T WHERE a < 10 and b = 2;


For this example, assume that table T, defined as T(a, b, c), is partitioned on column a, and has a
clustered index on column b. The partition boundaries for table T are defined by the following partition
function:

CREATE PARTITION FUNCTION myRangePF1 (int) AS RANGE LEFT FOR VALUES (3, 7, 10);

To solve the query, the query processor performs a first-level seek operation to find every partition that
contains rows that meet the condition T.a < 10. This identifies the partitions to be accessed. Within
each partition identified, the processor then performs a second-level seek into the clustered index on
column b to find the rows that meet the condition T.b = 2 and T.a < 10.

The following illustration is a logical representation of the skip scan operation. It shows table T with
data in columns a and b. The partitions are numbered 1 through 4 with the partition boundaries shown
by dashed vertical lines. A first-level seek operation to the partitions (not shown in the illustration) has
determined that partitions 1, 2, and 3 meet the seek condition implied by the partitioning defined for
the table and the predicate on column a. That is, T.a < 10. The path traversed by the second-level seek
portion of the skip scan operation is illustrated by the curved line. Essentially, the skip scan operation
seeks into each of these partitions for rows that meet the condition b = 2. The total cost of the skip
scan operation is the same as that of three separate index seeks.

Displaying Partitioning Information in Query Execution Plans


The execution plans of queries on partitioned tables and indexes can be examined by using the
Transact-SQL SET statements SET SHOWPLAN_XML or SET STATISTICS XML, or by using the graphical
execution plan output in SQL Server Management Studio. For example, you can display the compile-
time execution plan by clicking Display Estimated Execution Plan on the Query Editor toolbar and
the run-time plan by clicking Include Actual Execution Plan.
Using these tools, you can ascertain the following information:

 The operations such as scans, seeks, inserts, updates, merges, and deletes that access partitioned
tables or indexes.
 The partitions accessed by the query. For example, the total count of partitions accessed and the
ranges of contiguous partitions that are accessed are available in run-time execution plans.
 When the skip scan operation is used in a seek or scan operation to retrieve data from one or
more partitions.

For more information about displaying execution plans, see Execution Plan How-to Topics.

Partition Information Enhancements

SQL Server 2008 provides enhanced partitioning information for both compile-time and run-time
execution plans. Execution plans now provide the following information:

 An optional Partitioned attribute that indicates that an operator, such as a seek, scan, insert,
update, merge, or delete, is performed on a partitioned table.
 A new SeekPredicateNew element with a SeekKeys subelement that includes PartitionID as the
leading index key column and filter conditions that specify range seeks on PartitionID. The
presence of two SeekKeys subelements indicates that a skip scan operation on PartitionID is used.
 Summary information that provides a total count of the partitions accessed. This information is
available only in run-time plans.

To demonstrate how this information is displayed in both the graphical execution plan output and the
XML Showplan output, consider the following query on the partitioned table fact_sales. This query
updates data in two partitions.

UPDATE fact_sales
SET quantity = quantity * 2
WHERE date_id BETWEEN 20080802 AND 20080902;

The following illustration shows the properties of the Clustered Index Seek operator in the compile-
time execution plan for this query.
To view the definition of the fact_sales table and the partition definition, see "Example" in this topic.
Partitioned Attribute

When an operator such as an Index Seek is executed on a partitioned table or index,


the Partitioned attribute appears in the compile-time and run-time plan and is set to True (1). The
attribute does not display when it is set to False (0).

The Partitioned attribute can appear in the following physical and logical operators:

 Table Scan
 Index Scan
 Index Seek
 Insert
 Update
 Delete
 Merge

As shown in the previous illustration, this attribute is displayed in the properties of the operator in
which it is defined. In the XML Showplan output, this attribute appears as Partitioned="1" in
the RelOp node of the operator in which it is defined.

New Seek Predicate

In XML Showplan output, the SeekPredicateNew element appears in the operator in which it is
defined. It can contain up to two occurrences of the SeekKeys sub-element. The first SeekKeys item
specifies the first-level seek operation at the partition ID level of the logical index. That is, this seek
determines the partitions that must be accessed to satisfy the conditions of the query. The
second SeekKeys item specifies the second-level seek portion of the skip scan operation that occurs
within each partition identified in the first-level seek.

Partition Summary Information

In run-time execution plans, partition summary information provides a count of the partitions accessed
and the identity of the actual partitions accessed. You can use this information to verify that the correct
partitions are accessed in the query and that all other partitions are eliminated from consideration.

The following information is provided: Actual Partition Count, and Partitions Accessed.

Actual Partition Count is the total number of partitions accessed by the query.

Partitions Accessed, in XML Showplan output, is the partition summary information that appears in
the new RuntimePartitionSummary element in RelOp node of the operator in which it is defined.
The following example shows the contents of the RuntimePartitionSummary element, indicating that
two total partitions are accessed (partitions 2 and 3).

<RunTimePartitionSummary>

<PartitionsAccessed PartitionCount="2" >

<PartitionRange Start="2" End="3" />

</PartitionsAccessed>

</RunTimePartitionSummary>

Displaying Partition Information by Using Other Showplan Methods

The Showplan methods SHOWPLAN_ALL, SHOWPLAN_TEXT and STATISTICS PROFILE do not report the
partition information described in this topic, with the following exception. As part of
the SEEK predicate, the partitions to be accessed are identified by a range predicate on the computed
column representing the partition ID. The following example shows the SEEK predicate for a Clustered
Index Seek operator. Partitions 2 and 3 are accessed, and the seek operator filters on the rows that
meet the condition date_id BETWEEN 20080802 AND 20080902.

|--Clustered Index Seek(OBJECT:([db_sales_test].[dbo].[fact_sales].[ci]),

SEEK:([PtnId1000] >= (2) AND [PtnId1000] <= (3)

AND [db_sales_test].[dbo].[fact_sales].[date_id] >= (20080802)

AND [db_sales_test].[dbo].[fact_sales].[date_id] <= (20080902))

ORDERED FORWARD)

Interpreting Execution Plans for Partitioned Heaps


In SQL Server 2008, a partitioned heap is treated as a logical index on the partition ID. Partition
elimination on a partitioned heap is represented in an execution plan as a Table Scan operator with a
SEEK predicate on partition ID. The following example shows the Showplan information provided:

|-- Table Scan (OBJECT: ([db].[dbo].[T]), SEEK: ([PtnId1001]=[Expr1011]) ORDERED FORWARD)


Interpreting Execution Plans for Collocated Joins
Join collocation can occur when two tables are partitioned using the same or equivalent partitioning
function and the partitioning columns from both sides of the join are specified in the join condition of
the query. The query optimizer can generate a plan where the partitions of each table that have equal
partition IDs are joined separately. Collocated joins can be faster than non-collocated joins because
they can require less memory and processing time. The optimizer chooses a non-collocated plan or a
collocated plan based on cost estimates.

In a collocated plan, the Nested Loops join reads one or more joined table or index partitions from the
inner side. The numbers within the Constant Scan operators represent the partition numbers.

When parallel plans for collocated joins are generated for partitioned tables or indexes,
a Parallelism operator appears between the Constant Scan and the Nested Loops join operators. In
this case, multiple threads on the outer side of the join each read and work on a different partition.

The following illustration demonstrates a parallel query plan for a collocated join.

Parallel Query Execution Strategy for Partitioned Objects


The query processor uses a parallel execution strategy for queries that select from partitioned objects.
As part of the execution strategy, the query processor determines the table partitions required for the
query and the proportion of threads to allocate to each partition. In most cases, the query processor
allocates an equal or almost equal number of threads to each partition, and then executes the query in
parallel across the partitions. The following paragraphs explain thread allocation in greater detail.
If the number of threads is less than the number of partitions, the query processor assigns each thread
to a different partition, initially leaving one or more partitions without an assigned thread. When a
thread finishes executing on a partition, the query processor assigns it to the next partition until each
partition has been assigned a single thread. This is the only case in which the query processor
reallocates threads to other partitions.

If the number of threads is equal to the number of partitions, the query processor assigns one thread
to each partition. When a thread finishes, it is not reallocated to another partition.

If the number of threads is greater than the number of partitions, the query processor allocates an
equal number of threads to each partition. If the number of threads is not an exact multiple of the
number of partitions, the query processor allocates one additional thread to some partitions in order
to use all of the available threads. Note that if there is only one partition, all threads will be assigned to
that partition. In the diagram below, there are four partitions and 14 threads. Each partition has 3
threads assigned, and two partitions have an additional thread, for a total of 14 thread assignments.
When a thread finishes, it is not reassigned to another partition.

Although the above examples suggest a straightforward way to allocate threads, the actual strategy is
more complex and accounts for other variables that occur during query execution. For example, if the
table is partitioned and has a clustered index on column A and a query has the predicate clause WHERE
A IN (13, 17, 25), the query processor will allocate one or more threads to each of these three seek
values (A=13, A=17, and A=25) instead of each table partition. It is only necessary to execute the query
in the partitions that contain these values, and if all of these seek predicates happen to be in the same
table partition, all of the threads will be assigned to the same table partition.
To take another example, suppose that the table has four partitions on column A with boundary points
(10, 20, 30), an index on column B, and the query has a predicate clause WHERE B IN (50, 100, 150).
Because the table partitions are based on the values of A, the values of B can occur in any of the table
partitions. Thus, the query processor will seek for each of the three values of B (50, 100, 150) in each of
the four table partitions. The query processor will assign threads proportionately so that it can execute
each of these 12 query scans in parallel.

Table partitions based on column A Seeks for column B in each table partition
Table Partition 1: A < 10 B=50, B=100, B=150
Table Partition 2: A >= 10 AND A < 20 B=50, B=100, B=150
Table Partition 3: A >= 20 AND A < 30 B=50, B=100, B=150
Table Partition 4: A >= 30 B=50, B=100, B=150

Best Practices
To improve the performance of queries that access a large amount of data from large partitioned tables
and indexes, we recommend the following best practices:

 Stripe each partition across many disks.


 When possible, use a server with enough main memory to fit frequently accessed partitions or all
partitions in memory to reduce I/O cost.
 If the data you query will not fit in memory, compress the tables and indexes. This will reduce I/O
cost.
 Use a server with fast processors and as many processor cores as you can afford, to take advantage
of parallel query processing capability.
 Ensure the server has sufficient I/O controller bandwidth.
 Create a clustered index on every large partitioned table to take advantage of B-tree scanning
optimizations.
 Follow the best practice recommendations in the white paper, "Loading Bulk Data into a
Partitioned Table," when bulk loading data into partitioned tables.

Example
The following example creates a test database containing a single table with seven partitions. Use the
tools described previously when executing the queries in this example to view partitioning information
for both compile-time and run-time plans.

Note

This example inserts more than 1 million rows into the table. Running this example may take several
minutes depending on your hardware. Before executing this example, verify that you have more than
1.5 GB of disk space available.
USE master;
GO

IF DB_ID (N'db_sales_test') IS NOT NULL


DROP DATABASE db_sales_test;
GO

CREATE DATABASE db_sales_test;


GO

USE db_sales_test;
GO

CREATE PARTITION FUNCTION [pf_range_fact](int)


AS RANGE RIGHT FOR VALUES (20080801, 20080901, 20081001, 20081101, 20081201, 20090101);
GO

CREATE PARTITION SCHEME [ps_fact_sales]


AS PARTITION [pf_range_fact]
ALL TO ([PRIMARY]);
GO

CREATE TABLE fact_sales(


date_id int,
product_id int,
store_id int,
quantity int,
unit_price numeric(7,2),
other_data char(1000)
)
ON ps_fact_sales(date_id);
GO

CREATE CLUSTERED INDEX ci ON fact_sales(date_id);


GO
PRINT 'Loading...';
SET NOCOUNT ON;

DECLARE @i int;
SET @i = 1;

WHILE (@i<1000000)
BEGIN
INSERT INTO fact_sales
VALUES(20080800 + (@i%30) + 1, @i%10000, @i%200, RAND() * 25, (@i%3) + 1, '');
SET @i += 1;
END;
GO

DECLARE @i int;
SET @i = 1;

WHILE (@i<10000)
BEGIN
INSERT INTO fact_sales
VALUES(20080900 + (@i%30) + 1, @i%10000, @i%200, RAND() * 25, (@i%3) + 1, '');
SET @i += 1;
END;
PRINT 'Done.';
GO

-- Two-partition query.
SET STATISTICS XML ON;
GO

SELECT date_id, SUM(quantity*unit_price) AS total_price


FROM fact_sales
WHERE date_id BETWEEN 20080802 AND 20080902
GROUP BY date_id ;
GO

SET STATISTICS XML OFF;


GO
-- Single-partition query.
SET STATISTICS XML ON;
GO

SELECT date_id, SUM(quantity*unit_price) AS total_price


FROM fact_sales
WHERE date_id BETWEEN 20080801 AND 20080831
GROUP BY date_id;
GO

SET STATISTICS XML OFF;


GO

You might also like