Indexes and Fragmentation in SQL Server 2000 Part 1

By Joe Doherty, 2006/10/23

I believe it's my academic background that has seen me develop the need to see the facts for myself before I feel confident about a particular subject. It has always been both a strength and a weakness of mine. On one hand it provides the motivation for me to tackle a subject I want to understand, but on the other I have been guilty of reinventing the wheel just so I can see it work for myself. You might say this is natural for guys like us and essential for personal development; however, I'm sure my past superiors have seen it as unnecessary for me to totally rebuild a system from scratch when all that was required was a simple configuration change. But then where does all the fun come from?!

With this in mind I have finally decided to tackle the subject of fragmentation levels within a database. I've always been told that proper index maintenance is essential for optimal performance, and over the years I have employed different methods of achieving this. However, I have never really seen proof that my maintenance jobs actually have any benefit. Sure, I've seen the odd demonstration and even made improvements to systems I've managed myself, but these have been isolated incidents where the customer has made a complaint or a developer didn't take into account the amount of data growth within a particular table. These problems have usually been resolved by reindexing a particular index more frequently or applying a more appropriate index.
What I'm talking about here is a system that has been in production for a number of years, where data volumes have continually grown, application patches have been applied, OS and SQL service packs and patches have been applied, etc. The system is considerably different now to what it was when it was first released, and chances are you've taken over the responsibility of these systems somewhere in between.

Ideally a performance audit would need to be carried out periodically to ensure your systems are running optimally, and I imagine this would be quite some task to undertake on some of the systems that are out there.

What I am interested in here is looking at what fragmentation is, how it happens and how we go about resolving it. I am also interested in running a few experiments to see how a SELECT query performs on tables that have indexes with differing fragmentation levels. This article is more of a voyage of discovery than a 'how to implement your index maintenance' document and will hopefully give us all a basic foundation to go away with and think about. You might read the entire article and think I could have got this point over in much less text, but I want to take you through my entire thought process to aid a better understanding.

After some thought I decided that it was best to revisit the topic of how SQL stores its data and look at the behavior as data is added. By running a few experiments we can look further into how the system functions under the surface. Then, by adding an index or two, we can look at the differences in the behavior that SQL exhibits.

As I said earlier I need to see things for myself, so I am going to start at the very beginning and create a blank database with one table.

-- This script creates our test database and creates a table within it
-- You will have to modify the 2 filename parameters to values of where you want the DB created
CREATE DATABASE [myIndexes] ON (NAME = N'myIndexes_Data',
FILENAME = N'C:\myIndexes_Data.MDF' , SIZE = 1000, FILEGROWTH = 10%)
LOG ON (NAME = N'myIndexes_Log', FILENAME = N'C:\myIndexes_Log.LDF' , SIZE = 30, FILEGROWTH = 10%)
COLLATE Latin1_General_CI_AS
GO
USE [myIndexes]
GO
CREATE TABLE [dbo].[myTable] (
[myPK] [uniqueidentifier] NOT NULL ,
[myID] [bigint] IDENTITY (1, 1) NOT NULL ,
[Char1] [varchar] (20) COLLATE Latin1_General_CI_AS NOT NULL ,
[Char2] [char] (200) COLLATE Latin1_General_CI_AS NOT NULL ,
[Char3] [varchar] (2000) COLLATE Latin1_General_CI_AS NOT NULL ,
[Num1] [int] NOT NULL ,
[Num2] [money] NOT NULL ,
[Date1] [datetime] NOT NULL
) ON [PRIMARY]
GO
ALTER DATABASE myIndexes SET RECOVERY SIMPLE
GO

Have a look at our database (myIndexes) and you'll see that it is initially set to a 1GB data file, 30MB log file and simple recovery. You might not have enough disk space so feel free to reduce this, but it is useful for what we'll do later on.
Looking in the table (myTable) you'll see I've created a
number of fields with differing data types. Hopefully we'll
be able to see if these have any effect on performance
later on in the article. You'll also notice that there aren't any
indexes configured on the table - also known as a heap.
Initially we are just going to keep things simple and take a
look at the basics of the file structure and how SQL
maintains it.
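
As a quick sanity check before we dig in (this step isn't part of the original walkthrough), you can confirm the file sizes and paths we have just created using the standard sp_helpfile procedure:

USE [myIndexes]
GO
-- Lists the data and log files for the current database with their sizes and growth settings
EXEC sp_helpfile
GO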
OK, so how and where is the database and table (and
system objects) actually stored on the disk? And is there
any fragmentation at this point?
Well the database and all its objects are obviously stored
in the myIndexes.mdf in the location you specified when
creating the database. But where is [myTable] stored
within this .mdf file? The objects within a database are
stored within data pages which are a little over 8k each.
When you create a table you define a number of fields that
will store your data. The total number of characters that the
combined fields make up cannot exceed this 8k limit. If
each character takes up 1 byte then the total number of
characters you assign to the table (i.e. each row) cannot
be more than 8,000 (8,060 to be exact).
For example if you look at the table we have created in our
database you will see it is made up of the following fields:

myPK uniqueidentifier 16
myID bigint 8
Char1 varchar 20
Char2 char 200
Char3 varchar 2000
Num1 int 4
Num2 money 8
Date1 datetime 8
----
total = 2264

You can see that we have used a total of 2,264 bytes. Also, I have used the term bytes instead of characters - that's the official way field length is measured. Generally a text field will use 1 byte per character, but this does not easily convert for a date (8 bytes) or a money field (8 bytes), so it is best that we refer to the length from now on as bytes. Other data types take up different amounts of storage (e.g. Unicode characters use 2 bytes and would therefore limit you to 4,000 characters per row). I want to keep things relatively simple here, so you can look them up in BOL if you want to know more.
Note: Certain types can span multiple pages but we won't
be covering this here.
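
If you'd rather verify these defined lengths for yourself than take my table above at face value, a query like the following against the SQL 2000 system tables should return them - a small sketch of my own, not part of the original article:

-- Lists each column of myTable with its data type and defined length in bytes
SELECT c.name AS column_name, t.name AS type_name, c.length AS length_in_bytes
FROM syscolumns c
INNER JOIN systypes t ON c.xusertype = t.xusertype
WHERE c.id = OBJECT_ID('myTable')
ORDER BY c.colid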


OK, so now you see that in general we're limited to a maximum row size of one data page (8k). More often than not, however, your total row size will not be anywhere near this and so SQL will be able to store more than one row within a single data page.
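
To put some rough numbers on that (my own back-of-the-envelope illustration, not the author's): if every row in [myTable] used its full defined 2,264 bytes, a single 8,060-byte page could hold only three rows (3 x 2,264 = 6,792 bytes, with no room for a fourth); in practice the varchar fields are rarely full, so far more rows will fit on each page.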
As you keep adding rows to your table SQL will start to add more data pages and populate them accordingly. Now there's one more thing you should know about your data pages, and that is SQL groups them into bundles of 8 - and we refer to these as extents. An extent is a group of 8 data pages, holding up to 64k of data (i.e. 8k x 8 data pages = 64k). Why SQL does this is beyond the scope of this article, but let's just say it is for optimal disk performance.
Right let's work out how to find which data page is storing
our table data. Type the following command... Remember
that all the commands that we use will be against the
[myIndexes] database unless otherwise stated.

SELECT * FROM SYSINDEXES WHERE NAME = 'myTable'

This command searches for the system information about our table object. You will see that one row is returned containing a number of fields. The fields we're interested in are:

first = a pointer to the first data page

dpages = the number of data pages used for our table data

Currently these values are set to zero because no data has been inserted yet - so let's add our first record by running the following command...

INSERT INTO [myTable] VALUES (NEWID(), 'test1', 'test1', 'test1', 123, 321, GETDATE() )

and confirm it is there...

SELECT * FROM [myTable]

Now we have a record in our database, let's re-run the command on the system table to find out where our data is stored...

SELECT * FROM SYSINDEXES WHERE NAME = 'myTable'

and we now see that these values have been updated. The values you get may differ from mine so bear this in mind when we use them as parameters for the upcoming exercise. My values are...

first = 0x220000000100

dpages = 1

So this is saying that the data for [myTable] starts on data page 0x220000000100 (a hex value) and that a total of one data page has been used so far.
Now we'll use one of SQL's undocumented commands to view the contents of the data page, but before we do we need to decipher the hex number by doing the following (more information can be found here - http://www.sqlservercentral.com/forums/shwmessage.aspx?forumid=5&messageid=291762).
Split the 0x220000000100 number into two halves, which will give us the page id part and the file id part. This gives us page id: 0x220000 and file id: 0x000100. Because we're only using one database file (myIndexes.mdf) we can be sure that the file id is 1. The page id needs to be reversed, which will give us 0x000022 (always leave the 0x bit at the front), and now convert this to decimal. I use Google to do this - just type 'convert 0x000022 to decimal' - and I get 34. This is the page where our data is stored.
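
If you'd rather not reverse hex digits by hand or lean on Google, the arithmetic can also be done in T-SQL. This is a sketch of my own (not from the article), assuming the commonly described layout of the sysindexes.first pointer: the first four bytes hold the page id and the last two hold the file id, both stored byte-reversed.

DECLARE @first BINARY(6)
SET @first = 0x220000000100   -- replace with the 'first' value from your sysindexes row

SELECT
    -- file id: last two bytes, reversed
    CONVERT(INT, SUBSTRING(@first, 6, 1) + SUBSTRING(@first, 5, 1)) AS FileId,
    -- page id: first four bytes, reversed
    CONVERT(INT, SUBSTRING(@first, 4, 1) + SUBSTRING(@first, 3, 1) +
                 SUBSTRING(@first, 2, 1) + SUBSTRING(@first, 1, 1)) AS PageId

For the value above this returns a file id of 1 and a page id of 34, matching the manual conversion.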


SQL's undocumented command DBCC PAGE allows us to view the contents of a page, but we must switch on the trace flag first...

DBCC TRACEON (3604)               -- trace flag 3604 sends DBCC output to the client instead of the error log

DBCC PAGE (myIndexes, 1, 34, 3)   -- arguments: database name, file id, page id, print option (3 shows per-row detail)

Remember to replace the value 34 with the one you got from your own conversion. Try changing this value to see what else is lurking in other data pages, but at this point you'll only see system related data. I'm not going to discuss this command because it is already well documented on the Internet - see http://support.microsoft.com/kb/q83065/ - you won't find it in BOL, and Microsoft say they do not support it since it hasn't gone through the usual rigorous testing that supported commands have. Or perhaps it has, but they wanted to keep it back as part of their own toolkit for support purposes - I know I've been instructed to use it by Microsoft Product Support Services (PSS) in the past while investigating database corruption.
Anyway back to the contents of our data page - you should
now see your record within the text and in the bottom
section. That's how you view the contents of your data
pages. And no, scanning these pages will not reveal
system passwords!
So far we have created a blank database, a table and
inserted one row of data and have seen that SQL stores
this in a data page. Now we need to get a decent amount
of data in there so we can start to look at how the system
grows and also see if there's anything interesting going on.
I have written a script that populates the table with random
data - unfortunately this script does take some time to run
but can be stopped and started as you see fit. Let's get this
running now and then continue talking about the next step.
Before you do run the script please run the following
command and store the results - we'll be talking about this
next and we need to capture the values before you start
adding the records.

DBCC SHOWCONTIG ('myTable')

Store these results and proceed to run this script to start generating our records...

-- This script will insert records into the database
-- (The remainder of the original script was truncated in this copy of the article;
--  the body below is a simplified completion that fills each field with random data.)
USE [myIndexes]

DECLARE @Loop INT
DECLARE @LoopTemp1 INT
DECLARE @LoopTemp2 INT
DECLARE @Char1 VARCHAR(20)
DECLARE @Char2 VARCHAR(200)
DECLARE @Char3 VARCHAR(2000)
DECLARE @Num1 INT
DECLARE @Num2 MONEY
DECLARE @Date1 DATETIME
DECLARE @Char INT

SET @Loop = 0

WHILE @Loop < 1000
BEGIN
    -- Generate Char1: a random-length string of random upper-case letters
    SET @LoopTemp1 = 0
    SET @LoopTemp2 = 0
    SET @Char1 = ''
    SET @LoopTemp1 = (RAND()*19)+1 -- Randomise the length of the field
    WHILE @LoopTemp2 < @LoopTemp1
    BEGIN
        SET @Char = (RAND()*25)+65          -- random ASCII code between A and Z
        SET @Char1 = @Char1 + CHAR(@Char)
        SET @LoopTemp2 = @LoopTemp2 + 1
    END

    -- Char2 and Char3 are just longer random strings built from Char1
    SET @Char2 = REPLICATE(@Char1, 10)
    SET @Char3 = REPLICATE(@Char1, 100)

    -- Random numeric and date values
    SET @Num1 = RAND() * 1000000
    SET @Num2 = RAND() * 10000
    SET @Date1 = DATEADD(day, RAND() * 1000, GETDATE())

    INSERT INTO [myTable] VALUES (NEWID(), @Char1, @Char2, @Char3, @Num1, @Num2, @Date1)

    SET @Loop = @Loop + 1
END
You will see at the start of the script that it loops 1000
times (WHILE @Loop < 1000) - this also refers to the
number of records that will be added so ideally we want to
change this to nearer 500000. Perhaps you can run it
overnight. I've challenged a colleague at work to see which
of us can come up with a script that randomly inserts
values into the database the fastest - I have some good ideas but not had the time to try them out. If anyone is
interested in joining in the fun (or has any suggestions) let
me know and I'll send you the contest rules!
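
For what it's worth, one obvious speed-up (a sketch of my own, not the author's contest entry) is to insert rows in set-based batches instead of one at a time; the random-value expressions below are purely illustrative:

-- Inserts 1,000 rows in a single set-based statement; re-run the batch
-- as many times as you like to build up volume.
USE [myIndexes]
GO
INSERT INTO [myTable]
SELECT TOP 1000
    NEWID(),                                                -- myPK
    LEFT(CONVERT(VARCHAR(36), NEWID()), 20),                -- Char1
    CONVERT(CHAR(200), NEWID()),                            -- Char2
    REPLICATE(CONVERT(VARCHAR(36), NEWID()), 50),           -- Char3
    ABS(CHECKSUM(NEWID())) % 1000000,                       -- Num1
    ABS(CHECKSUM(NEWID())) % 10000,                         -- Num2
    DATEADD(day, ABS(CHECKSUM(NEWID())) % 1000, GETDATE())  -- Date1
FROM master.dbo.syscolumns a CROSS JOIN master.dbo.sysobjects b  -- just a convenient source of plenty of rows
GO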
OK, while the script is running let's run a check to make sure we can see things happening as we would expect them to. Run the following to check that the number of data pages has been increasing - remember it was at one before...

SELECT * FROM SYSINDEXES WHERE NAME = 'myTable'

If you repeatedly run this, the rowcnt and dpages values should keep going up. However, depending on how busy your system is, you may need to issue the following command to force a system update. Remember this command because it is very useful when you need up-to-date information.

DBCC UPDATEUSAGE(0)

If you've ever resized your database or inserted a number of records and SQL does not report values that reflect this, then running this command will update SQL's internal system values for you. We might see more of this in action later. BOL definition: Reports and corrects inaccuracies in the sysindexes table, which may result in incorrect space usage.
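
As an aside (not covered in the article), DBCC UPDATEUSAGE can also be scoped to a single table, and sp_spaceused gives a friendlier summary of the same counters. A small sketch, assuming the standard SQL 2000 syntax:

-- Correct the counters for just our table rather than the whole database
DBCC UPDATEUSAGE ('myIndexes', 'myTable')
GO
-- Report the (now corrected) row count and space used for the table
EXEC sp_spaceused 'myTable'
GO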
If you remember, one of the questions we asked earlier on was whether we had any fragmentation at the point we had inserted a record.
highly unlikely since we are dealing with a brand new
database. And before you ran the script that inserts lots of
records I asked that you quickly run the DBCC
SHOWCONTIG command and store the results. Well now
we'll look at what we actually did here...
DBCC SHOWCONTIG, as BOL puts it, displays
fragmentation information for the data and indexes of the
specified table. We actually ran this command against our
table and would have received results like the following:

- Pages Scanned................................: 1
- Extents Scanned..............................: 1
- Extent Switches..............................: 0
- Avg. Pages per Extent........................: 1.0
- Scan Density [Best Count:Actual Count].......: 100.00% [1:1]
- Extent Scan Fragmentation ...................: 0.00%
- Avg. Bytes Free per Page.....................: 7827.0
- Avg. Page Density (full).....................: 3.30%

Let's look at each of the results in turn to gain an understanding of what they are telling us.
Pages Scanned
This tells us that 1 page was scanned in total to
produce our results. And we are happy with the fact that
we only have 1 page.
Extents Scanned
As we know there are 8 pages per extent, but since we only have 1 page that would imply we only have 1 extent.
Extent Switches
If there were 100 extents SQL might have to flick between each of them multiple times to read all the records in the data pages. When performing this command, think of it as SQL having to access every record, and therefore having to visit every page and extent that contains current records. If the records are not in order then SQL will need to keep flicking between extents to access the required data. You see, when a record is added it isn't necessarily added to the end of the table - it might be inserted somewhere in the middle. Or, if a record is updated and it becomes too big to fit into its current space allocation, SQL will move it and set a pointer to the new location. Therefore when it comes to reading it again the read will be redirected to the new location, which may be on another page and extent. It is this flicking that causes additional disk reads, which in turn affects performance.
Ideally the number of extents scanned should be one more than the number of extent switches that take place.
place. This would mean that SQL has started at extent
one and read through each of them in turn without being
redirected elsewhere.
Avg. Pages per Extent
As we know we only have one data page, so SQL has only created one at this point. As data starts to grow SQL will think ahead and create more.
Scan Density
Best Count is the ideal number of extent changes if everything is contiguously linked. Actual Count is the actual number of extent changes. The scan density value is 100 if everything is contiguous; if it is less than 100, some fragmentation exists. Scan density is a percentage. As you can see from our results we have a perfect 100%! (A worked example follows this list.)
Extent Scan Fragmentation
Because we're not using an index yet this value is
irrelevant.
Avg. Bytes Free per Page
Average number of free bytes on the pages scanned.
The higher the number, the less full the pages are.
Lower numbers are better.
Avg. Page Density (full)
Average page density (as a percentage). This value
takes into account row size, so it is a more accurate
indication of how full your pages are. The higher the
percentage, the better.
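
To put some hypothetical numbers on Scan Density (these are purely illustrative, not output from our table): suppose a later run of DBCC SHOWCONTIG reported 800 pages scanned, 100 extents scanned and 120 extent switches. The Best Count would be 100 (800 pages / 8 pages per extent), the Actual Count would be 121 (extent switches + 1), and the Scan Density would be 100:121, or roughly 83%, telling us some fragmentation exists.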
If you run the DBCC SHOWCONTIG command again on your table you will start to see different results on the extents. Because we are working with a heap (a table without a clustered index) the data is in no particular order. It is generally in the order that it was inserted into the table (providing that we are mainly inserting new rows); however, SQL will also insert data into any available space it comes across. If you do a SELECT * FROM [myTable] you will probably find that the [myID] column doesn't increment serially in the early part of the table, and this is because as SQL grows the pages and extents it will copy approximately half of the contents of one page into the next so that there's always room for quick insertion of additional records later on (known as page splits). In the early stages of our data generation script SQL will find free space in some of the initially created data pages and insert some records into these pages rather than at the end of the table. I've never actually had confirmation of this but this is how I understand it - it's as if SQL initially expands quite quickly, leaving a lot of free space in the first extent(s), and then returns shortly after to fill them in.
I hope you have picked up on the point that because we are using a heap the data is in no particular order. And if you perform any kind of SELECT statement on a heap the entire table must be scanned. And, as we have read, if we were to select all the records from the table we wouldn't necessarily get them back in the order they were inserted. Saying that, and I hope you are with me at this point, can a heap actually become fragmented? And if so, is there a command to defrag it? Well, I suppose the answer to whether a heap can become fragmented is yes, but would that actually cause us any problems? It's not as if we would be scanning the table without any indexes applied and, as we've not specified any order for the table, we're not losing out. Fragmentation, of a kind, can occur on a heap when forwarding records are created that point to an updated record that was moved because the update made it bigger than its currently allocated space (i.e. there was no room for it in the current slot). There is no command that will defrag a heap (unless you apply a clustered index and remove it again, as this will force an order which will remain after the index is dropped, but will not be maintained).
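
For completeness, the create-and-drop workaround mentioned above would look something like this - a minimal sketch of my own, with an illustrative index name, rather than a recommendation from the article:

-- Building the clustered index rewrites the heap in myID order; dropping it leaves
-- the rows where they are (ordered for now, but the order is not maintained).
CREATE CLUSTERED INDEX IX_myTable_myID ON [dbo].[myTable] ([myID])
GO
DROP INDEX [myTable].[IX_myTable_myID]
GO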

Conclusion
So we've not really learnt anything new about how to improve our table fragmentation, but we have been through the motions of how data builds up within the database file. And we've seen that a table without an index, in particular a clustered index, deserves the name 'heap' because basically that is all it is, and it is of little use to us. Running any form of query on a heap will force a full scan of the table and could take a considerable amount of time. And the results from the DBCC SHOWCONTIG command, which is a tool for measuring fragmentation, are also of little use on a heap.
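
If you want to see that full scan for yourself, one quick check (my own suggestion, not part of the article) is to switch on I/O statistics before querying the heap; even a very selective WHERE clause still has to read every page:

SET STATISTICS IO ON
-- The filter doesn't help: with no index, SQL must read the whole heap
SELECT * FROM [myTable] WHERE [Num1] = 123
SET STATISTICS IO OFF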


I am going to finish at this point because it is a big subject to tackle and I want to get some feedback before releasing the next part of this article. In my next installment we will look further into data pages and start analyzing the different types and how these manage the allocation of extents and free space. Then we'll start using indexes. We're a long way off, but we'll get there slowly and in an academic fashion.

Do have a play and a think about what we have covered so far. Any suggestions would also be welcome. And please remember I am by no means a SQL expert - this work is my findings that I am willing to share with you. If you feel I have misled you, have incorrectly stated something or simply need more info then please submit feedback.
Until next time...
Joe
