02.
INDEXING
ADVANCED DATABASE MANAGEMENT SYSTEMS
ICT 331-2
INTRODUCTION
Indexes are used to speed up record retrieval in response to
certain search conditions.
Index structures provide secondary access paths.
Any field can be used to create an index.
Multiple indexes can be constructed
Most indexes based on ordered files
Tree data structures organize the index
TYPES OF SINGLE-LEVEL ORDERED INDEXES
Ordered index similar to index in a textbook.
Indexing field (attribute)
Index stores each value of the index field with list of
pointers to all disk blocks that contain records with that
field value
Values in index are ordered
Primary index
Specified on the ordering key field of ordered file of
records
TYPES OF SINGLE-LEVEL ORDERED INDEXES (CONT’D.)
Clustering index
Used if numerous records can have the same value for
the ordering field
Secondary index
Can be specified on any nonordering field
Data file can have several secondary indexes
PRIMARY INDEXES
Ordered file with two fields
Primary key, K(i)
Pointer to a disk block, P(i)
One index entry in the index file for each block in the data
file
Indexes may be dense or sparse
Dense index has an index entry for every search key
value in the data file
Sparse index has entries for only some search values
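A minimal Python sketch of a lookup through a sparse primary index, assuming a toy in-memory data file. The names data_blocks, index_keys, index_ptrs and lookup are illustrative and do not come from the slides. Each index entry pairs the key K(i) of the first record in a block with the block pointer P(i).

```python
from bisect import bisect_right

# Hypothetical data file: blocks of (primary_key, name), ordered by primary key.
data_blocks = [
    [(3, "Ana"), (7, "Ben"), (9, "Cy")],
    [(12, "Dee"), (15, "Eli"), (18, "Fay")],
    [(21, "Gus"), (26, "Hal"), (30, "Ivy")],
]

# Sparse primary index: one entry <K(i), P(i)> per data block.
index_keys = [block[0][0] for block in data_blocks]   # first key of each block
index_ptrs = list(range(len(data_blocks)))            # block numbers


def lookup(key):
    """Return the record with the given primary key, or None."""
    # Binary search for the last index entry whose key is <= the search key.
    i = bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # smaller than every key in the file
    # Follow the block pointer, then search inside that one block.
    for record in data_blocks[index_ptrs[i]]:
        if record[0] == key:
            return record
    return None


print(lookup(15))
print(lookup(10))
```

The index has three entries for nine records, which is what makes it sparse; a dense index would hold nine entries.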
PRIMARY INDEXES (CONT’D.)
Primary index on the ordering key field of the file
PRIMARY INDEXES (CONT’D.)
Major problem: insertion and deletion of records
Move records around and change index values
Solutions
Use unordered overflow file
Use linked list of overflow records
SOLUTIONS: USE UNORDERED OVERFLOW FILE
Create an overflow file to store new records that cannot fit
into the main ordered file without disrupting its order.
When an insertion occurs, the new record is placed in the
overflow file instead of adjusting the main file.
SOLUTION: LINKED LIST OF OVERFLOW
Use a linked list structure to link overflow records to their
original block in the main file.
Each block in the main file has a pointer to its
corresponding overflow records.
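The linked-list idea can be sketched in Python as follows. This is an illustration under assumptions, not a prescribed implementation: a Python list stands in for the linked list of overflow records, records are reduced to their keys, and CAPACITY, Block, insert and search are hypothetical names.

```python
from bisect import bisect_right, insort

CAPACITY = 3  # assumed number of records per main-file block


class Block:
    """A main-file block plus its chain of overflow records."""

    def __init__(self, keys):
        self.keys = list(keys)   # ordered records (keys only, for brevity)
        self.overflow = []       # stands in for the linked list of overflow records


blocks = [Block([3, 7, 9]), Block([12, 15])]
anchors = [b.keys[0] for b in blocks]   # primary index keys; never changed below


def find_block(key):
    """Locate the block whose key range covers the given key."""
    i = bisect_right(anchors, key) - 1
    return blocks[max(i, 0)]


def insert(key):
    block = find_block(key)
    if len(block.keys) < CAPACITY and key >= block.keys[0]:
        insort(block.keys, key)      # room in the block: keep it ordered
    else:
        block.overflow.append(key)   # otherwise chain it to the block


def search(key):
    block = find_block(key)
    return key in block.keys or key in block.overflow


insert(8)    # block 0 is full, so 8 goes to its overflow chain
insert(13)   # block 1 has room, so 13 is placed in order
```

Neither insertion moves existing records or changes an index entry, which is the point of the technique.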
CLUSTERING INDEXES
Clustering field
File records are physically ordered on a nonkey field
without a distinct value for each record
Index is an ordered file with two fields
Field of the same type as the clustering field
Disk block pointer
CLUSTERING INDEXES
The clustering index has an entry for each distinct value of
the clustering field.
There can be only one clustered index per table.
Blocks of fixed size are reserved for each value of the
clustering field to avoid physical reordering during
insertion and deletion.
A clustering index on the
Dept_number ordering
nonkey field of an
EMPLOYEE file
To locate a record:
• Search for the clustering field
value (K(i)) in the Index File.
• Use the Block Pointer (P(i)) to
access the block in the Data
File.
• Search for the record within
the block.
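The three lookup steps can be sketched in Python. The sketch assumes the variant in which each index entry points to the first block containing the value (not the variant that reserves blocks per value), and the data, names and find_all function are hypothetical.

```python
from bisect import bisect_left

# Hypothetical EMPLOYEE file, physically ordered on the nonkey field Dept_number.
data_blocks = [
    [(1, "Ana"), (1, "Ben"), (2, "Cy")],
    [(2, "Dee"), (3, "Eli"), (3, "Fay")],
    [(3, "Gus"), (5, "Hal"), (5, "Ivy")],
]

# Clustering index: one entry per distinct Dept_number, pointing to the
# first block in which that value appears.
index = []
for block_no, block in enumerate(data_blocks):
    for dept, _name in block:
        if not index or index[-1][0] != dept:
            index.append((dept, block_no))
index_keys = [k for k, _ in index]


def find_all(dept):
    """Return every record whose Dept_number equals dept."""
    # Step 1: search the index for the clustering field value K(i).
    i = bisect_left(index_keys, dept)
    if i == len(index_keys) or index_keys[i] != dept:
        return []
    # Steps 2 and 3: follow P(i), then scan forward while values still match.
    matches = []
    for block in data_blocks[index[i][1]:]:
        for record in block:
            if record[0] == dept:
                matches.append(record)
            elif record[0] > dept:
                return matches
    return matches


print(find_all(3))
```

Because several records share a value, the scan may continue into the following block, as it does for Dept_number 3 here.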
SECONDARY INDEXES
Provides a secondary means of accessing a file for which
some primary access already exists.
Ordered file with two fields
Indexing field, K(i)
Block pointer or record pointer, P(i)
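Usually needs more storage space and longer search time
than a primary index, because it has more entries
Greatly improves search time for an arbitrary record, which would otherwise require a linear search of the data file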
SECONDARY INDEXES (CONT’D.)
Dense secondary index (with block pointers) on a
nonordering key field of a file.
To retrieve a record:
• Find the Index Field Value in
the Index File.
• Use the Block Pointer to
access the corresponding
block in the Data File.
• Search for the desired record
within the block using the
secondary key field.
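The retrieval steps can be sketched in Python for a dense secondary index with block pointers. The file layout, the use of an ssn field as the nonordering key, and all names are assumptions made for illustration.

```python
from bisect import bisect_left

# Hypothetical data file ordered on emp_id; ssn is a nonordering key field.
data_blocks = [
    [(1, "555"), (2, "222")],
    [(3, "999"), (4, "111")],
    [(5, "777"), (6, "333")],
]

# Dense secondary index: one <K(i), P(i)> entry per record, sorted on ssn,
# where P(i) is the number of the block holding that record.
index = sorted(
    (ssn, block_no)
    for block_no, block in enumerate(data_blocks)
    for _emp_id, ssn in block
)
index_keys = [k for k, _ in index]


def lookup(ssn):
    """Return the record with the given ssn, or None."""
    # Step 1: binary search the index for the indexing field value.
    i = bisect_left(index_keys, ssn)
    if i == len(index_keys) or index_keys[i] != ssn:
        return None
    # Step 2: follow the block pointer into the data file.
    block = data_blocks[index[i][1]]
    # Step 3: search the block on the secondary key field.
    for record in block:
        if record[1] == ssn:
            return record
    return None


print(lookup("999"))
```

The index needs six entries for six records because the data file is not ordered on ssn, so no block anchors can be used.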
TYPES OF SINGLE-LEVEL ORDERED INDEXES (CONT’D.)
Table 1 Types of indexes based on the properties of the indexing field
Table 2 Properties of index types
MULTILEVEL INDEXES
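Designed to greatly reduce the remaining search space as the
search is conducted
Each level reduces the search space by the index blocking factor ($bfr$),
also called the fan-out ($fo$).
$fo$ represents the number of entries in a single index block and is
larger than 2.
Searching a multilevel index requires approximately $\log_{fo} b$ block
accesses, where $b$ is the number of blocks in the first-level index.
Faster than binary search, which needs approximately $\log_2 b$ block accesses, when $fo > 2$.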
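A small worked calculation in Python makes the comparison concrete. The figures (1000 first-level index blocks, fan-out 50) are assumed for illustration and do not come from the slides.

```python
import math


def multilevel_accesses(first_level_blocks, fan_out):
    """Index levels needed, at one block access per level."""
    levels = 1
    blocks = first_level_blocks
    # Each extra level holds one entry per block of the level below it.
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels


def binary_search_accesses(first_level_blocks):
    """Block accesses for a binary search over a single-level index."""
    return math.ceil(math.log2(first_level_blocks))


b, fo = 1000, 50   # assumed: 1000 first-level blocks, 50 entries per block
print(multilevel_accesses(b, fo))    # levels: 1000 blocks, then 20, then 1
print(binary_search_accesses(b))
```

With these numbers the multilevel index needs 3 index block accesses against 10 for binary search; both need one further access to read the data block.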
MULTILEVEL INDEXES
Because a single-level index is an ordered file, we can
create a primary index to the index itself; in this case, the
original index file is called the first-level index and the
index to the index is called the second-level index.
We can repeat the process, creating a third, fourth, ..., top
level until all entries of the top level fit in one disk block.
A multilevel index can be created for any type of first-
level index (primary, secondary, clustering) as long as the
first-level index consists of more than one disk block.
MULTILEVEL INDEXES
Index file
Considered first (or base level) of a multilevel index
Second level
Primary index to the first level
Third level
Primary index to the second level
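The level-by-level construction can be sketched in Python. This is a toy model under assumptions: the first level is a primary index with one key per data block, FAN_OUT is set to 3 to keep the example small, and build_levels and search are hypothetical names.

```python
from bisect import bisect_right

FAN_OUT = 3  # assumed number of index entries per block


def to_blocks(keys):
    """Split an ordered list of keys into blocks of FAN_OUT entries."""
    return [keys[i:i + FAN_OUT] for i in range(0, len(keys), FAN_OUT)]


def build_levels(first_level_keys):
    """levels[0] is the first level; the last level fits in one block."""
    levels = [to_blocks(first_level_keys)]
    while len(levels[-1]) > 1:
        # The next level is a primary index on the level below: one entry
        # (the first key) per block of that level.
        anchors = [block[0] for block in levels[-1]]
        levels.append(to_blocks(anchors))
    return levels


def search(levels, key):
    """Return the data block number whose first key is the largest <= key."""
    block_no = 0
    for level in reversed(levels):           # top level first
        block = level[block_no]
        slot = bisect_right(block, key) - 1
        if slot < 0:
            return None                      # key precedes every entry
        # Entry number `slot` of this block points one level down.
        block_no = block_no * FAN_OUT + slot
    return block_no


first_level = [2, 8, 15, 24, 35, 39, 44, 51, 55, 63]
levels = build_levels(first_level)
print(len(levels), search(levels, 46))
```

Ten first-level entries produce three levels here, and the search reads exactly one index block per level.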
A two-level primary index
resembling ISAM (indexed
sequential access method)
organization
Using the two-level index
structure in the diagram,
how would you locate the
record with a primary key
of 46?
If a new record with a primary key of 90 needs to be inserted, how would the two-level index structure be updated?
DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES
Tree data structure terminology
Tree is formed of nodes
Each node (except root) has one parent and zero or
more child nodes
Leaf node has no child nodes
Unbalanced if leaf nodes occur at different levels
Nonleaf node called internal node
Subtree of node consists of node and all descendant
nodes
DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES
Because of the insertion and deletion problem, most multi-
level indexes use B-tree or B+-tree data structures, which
leave space in each tree node (disk block) to allow for new
index entries
These data structures are variations of search trees that
allow efficient insertion and deletion of search values.
In B-Tree and B+-Tree data structures, each node
corresponds to a disk block
Each node is kept between half-full and completely full
TREE DATA STRUCTURE
A tree data structure that shows an unbalanced tree
SEARCH TREES AND B-TREES
Search tree used to
guide search for a
record
Given value of one
of record’s fields
A node in a search tree with pointers to subtrees below it
SEARCH TREES AND B-TREES (CONT’D.)
Algorithms
necessary for
inserting and
deleting search
values into and
from the tree
A search tree of order p = 3
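Searching such a tree can be sketched in Python. The Node class and the small order-3 tree below are made up for illustration and are not a transcription of the figure; each node holds keys K1 < K2 < ... and one more child pointer than it has keys.

```python
class Node:
    """Search tree node with up to p - 1 keys and up to p subtree pointers."""

    def __init__(self, keys, children=None):
        self.keys = keys
        # A missing subtree is represented by None.
        self.children = children or [None] * (len(keys) + 1)


def search(node, key):
    """Return True if key occurs anywhere in the tree rooted at node."""
    while node is not None:
        i = 0
        # Skip the keys that are smaller than the search value.
        while i < len(node.keys) and key > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == key:
            return True
        # Otherwise descend into the subtree between keys i-1 and i.
        node = node.children[i]
    return False


# Hypothetical tree of order p = 3 (at most 2 keys and 3 pointers per node).
root = Node([5], [
    Node([3], [Node([1]), None]),
    Node([6, 9], [None, Node([7, 8]), Node([12])]),
])

print(search(root, 7), search(root, 4))
```

Nothing in this sketch keeps the tree balanced, which is the gap that B-trees close.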
B-TREES
Provide multi-level access structure
Tree is always balanced
Space wasted by deletion never becomes excessive
Each node is at least half-full
Each node in a B-tree of order p can have at most p-1
search values
B-TREE
B+-TREES
Data pointers stored only at the leaf nodes
Leaf nodes have an entry for every value of the search
field, and a data pointer to the record if search field is a
key field
For a nonkey search field, the pointer points to a block
containing pointers to the data file records
Internal nodes
Some search field values from the leaf nodes repeated
to guide search
B+-TREES (CONT’D.)
The nodes of a B+-tree (a) Internal node of a B+-tree with q−1 search values (b)
Leaf node of a B+-tree with q−1 search values and q−1 data pointers
SEARCHING FOR A RECORD WITH SEARCH KEY FIELD VALUE K, USING A B+-TREE
Algorithm: Searching for a
record with search key field
value K, using a B+-tree
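A minimal Python sketch of such a search follows. It is a stand-in, not the algorithm from the slide, and it assumes the common convention that the subtree at pointer i of an internal node holds values X with K(i-1) < X <= K(i). The Internal and Leaf classes and the sample tree are hypothetical.

```python
from bisect import bisect_left


class Internal:
    def __init__(self, keys, children):
        self.keys = keys            # q - 1 search values
        self.children = children    # q tree pointers


class Leaf:
    def __init__(self, keys, pointers, next_leaf=None):
        self.keys = keys            # search values
        self.pointers = pointers    # one data pointer per search value
        self.next_leaf = next_leaf  # link to the next leaf in key order


def search(node, key):
    """Return the data pointer stored for key, or None if it is absent."""
    # Internal nodes only guide the search: pick the first subtree whose
    # upper bound K(i) is >= key.
    while isinstance(node, Internal):
        node = node.children[bisect_left(node.keys, key)]
    # Data pointers exist only in the leaf.
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.pointers[i]
    return None


leaf3 = Leaf([8, 12], ["rec-8", "rec-12"])
leaf2 = Leaf([5, 7], ["rec-5", "rec-7"], leaf3)
leaf1 = Leaf([1, 3], ["rec-1", "rec-3"], leaf2)
root = Internal([3, 7], [leaf1, leaf2, leaf3])

print(search(root, 7), search(root, 6))
```

Note that 3 and 7 appear both in the root and in the leaves: internal nodes repeat leaf values purely to direct the search.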
INDEXES ON MULTIPLE KEYS
Multiple attributes involved in many retrieval and update
requests
Composite keys
Access structure using key value that combines
attributes
Partitioned hashing
Suitable for equality comparisons
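Partitioned hashing can be sketched in Python for a composite key of two attributes. The bit widths, the modulo hash functions and the sample records are assumptions chosen for illustration; the bucket address is the Dno hash concatenated with the Age hash.

```python
from collections import defaultdict

DNO_BITS, AGE_BITS = 3, 5   # assumed split of an 8-bit bucket address


def h_dno(dno):
    return dno % (1 << DNO_BITS)


def h_age(age):
    return age % (1 << AGE_BITS)


def bucket_address(dno, age):
    """Concatenate the two partial hash addresses."""
    return (h_dno(dno) << AGE_BITS) | h_age(age)


buckets = defaultdict(list)


def insert(record):
    dno, age, _name = record
    buckets[bucket_address(dno, age)].append(record)


def find_exact(dno, age):
    """Equality on both attributes: exactly one bucket is read."""
    candidates = buckets.get(bucket_address(dno, age), [])
    return [r for r in candidates if r[0] == dno and r[1] == age]


def find_by_age(age):
    """Equality on Age only: read every bucket that ends in h_age(age)."""
    matches = []
    for dno_part in range(1 << DNO_BITS):
        address = (dno_part << AGE_BITS) | h_age(age)
        matches.extend(r for r in buckets.get(address, []) if r[1] == age)
    return matches


for rec in [(4, 59, "Ana"), (5, 59, "Ben"), (4, 27, "Cy")]:
    insert(rec)

print(find_exact(4, 59), find_by_age(59))
```

A search on Age alone touches 8 of the 256 possible buckets. Range conditions gain nothing, since hashing does not preserve order.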
INDEXES ON MULTIPLE KEYS (CONT’D.)
Grid files
Array with one
dimension for
each search
attribute
Example of a grid array on Dno and Age attributes
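A grid array on Dno and Age can be sketched in Python. The boundary values of the two linear scales are assumed for illustration and are not those of the figure; each grid cell stands in for a pointer to a bucket of records.

```python
from bisect import bisect_right

# Assumed linear scales: each list holds the lower bounds of cells 1, 2, 3.
DNO_BOUNDS = [3, 5, 8]       # cells: <3, 3-4, 5-7, >=8
AGE_BOUNDS = [25, 35, 50]    # cells: <25, 25-34, 35-49, >=50


def cell(dno, age):
    """Map a (Dno, Age) pair to its grid coordinates."""
    return bisect_right(DNO_BOUNDS, dno), bisect_right(AGE_BOUNDS, age)


# One bucket of records per grid cell.
grid = [[[] for _ in range(len(AGE_BOUNDS) + 1)]
        for _ in range(len(DNO_BOUNDS) + 1)]


def insert(record):
    dno, age, _name = record
    i, j = cell(dno, age)
    grid[i][j].append(record)


def find(dno, age):
    """Read the single cell that can contain matching records."""
    i, j = cell(dno, age)
    return [r for r in grid[i][j] if r[0] == dno and r[1] == age]


for rec in [(4, 59, "Ana"), (4, 52, "Ben"), (1, 30, "Cy")]:
    insert(rec)

print(cell(4, 59), find(4, 59))
```

Unlike hashing, the scales preserve order, so a range condition on either attribute maps to a contiguous run of cells.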
OTHER TYPES OF INDEXES
Hash indexes
Secondary structure for file access
Uses hashing on a search key other than the one used
for the primary data file organization
Index entries of form (K, Pr) or (K, P)
Pr: pointer to the record containing the key
P: pointer to the block containing the record for that
key
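A hash index with (K, Pr) entries can be sketched in Python. The data file layout, the modulo hash function and all names are assumptions for illustration; the record pointer Pr is modelled as a (block number, slot) pair.

```python
# Hypothetical data file organized on name; the hash index is on emp_id,
# a search key other than the one used for the primary organization.
data_blocks = [
    [("Ana", 51024), ("Ben", 23402)],
    [("Cy", 62104), ("Dee", 34723)],
]

NUM_BUCKETS = 4  # assumed


def h(emp_id):
    return emp_id % NUM_BUCKETS


# Each bucket holds index entries (K, Pr).
buckets = [[] for _ in range(NUM_BUCKETS)]
for block_no, block in enumerate(data_blocks):
    for slot, (_name, emp_id) in enumerate(block):
        buckets[h(emp_id)].append((emp_id, (block_no, slot)))


def lookup(emp_id):
    """Return the record with the given emp_id, or None."""
    for key, (block_no, slot) in buckets[h(emp_id)]:
        if key == emp_id:
            return data_blocks[block_no][slot]   # follow Pr to the record
    return None


print(lookup(62104))
```

Storing a block pointer P in place of Pr would shrink each entry, at the cost of a search within the block after following the pointer.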
BITMAP INDEXES
Used with a large number of rows
Creates an index for one or more columns
Each value or value range in the column is indexed
Built on one particular value of a particular field
Array of bits
Existence bitmap
Bitmaps for B+-tree leaf nodes
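The bit-array idea can be sketched in Python, using an integer as the array of bits (bit i stands for row i). The sample rows, column meanings and function names are assumptions made for illustration.

```python
# Hypothetical rows: (sex, zipcode); the row number is the list position.
rows = [("M", 94040), ("F", 30022), ("F", 94040), ("M", 19046), ("F", 30022)]


def build_bitmaps(rows, column):
    """Build one bitmap per distinct value of the given column."""
    bitmaps = {}
    for row_no, row in enumerate(rows):
        value = row[column]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_no)
    return bitmaps


def matching_rows(bitmap):
    """List the row numbers whose bit is set."""
    return [i for i in range(len(rows)) if bitmap >> i & 1]


sex = build_bitmaps(rows, 0)
zipcode = build_bitmaps(rows, 1)

# Sex = 'F' AND Zipcode = 30022 becomes a bitwise AND of two bitmaps.
both = sex["F"] & zipcode[30022]

# Existence bitmap: a 0 bit marks a deleted row; here row 4 is deleted.
existence = ((1 << len(rows)) - 1) & ~(1 << 4)

print(matching_rows(both), matching_rows(both & existence))
```

Conditions on several columns are combined with cheap bitwise operations before any data row is read.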
FUNCTION-BASED INDEXING
Value resulting from applying some function on a field (or
fields) becomes the index key
Introduced in Oracle relational DBMS
Example
Function UPPER(Lname) returns uppercase
representation
Query
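A runnable illustration using Python's built-in sqlite3 module follows. It relies on SQLite's support for indexes on expressions and not on Oracle, so the syntax is SQLite's; the table, the index name upper_ix and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("John", "Smith"), ("Ann", "SMITH"), ("Raj", "Patel")],
)

# Function-based index: the index key is UPPER(lname), not lname itself.
conn.execute("CREATE INDEX upper_ix ON employee (UPPER(lname))")

# A query whose condition uses the same expression can use the index.
rows = conn.execute(
    "SELECT fname, lname FROM employee "
    "WHERE UPPER(lname) = 'SMITH' ORDER BY fname"
).fetchall()
print(rows)

# The query plan reports whether the index was chosen; its exact wording
# depends on the SQLite version.
for step in conn.execute(
    "EXPLAIN QUERY PLAN SELECT fname FROM employee WHERE UPPER(lname) = 'SMITH'"
):
    print(step)
```

The query matches both "Smith" and "SMITH" because the comparison is made on the uppercase form that the index stores.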
SOME GENERAL ISSUES CONCERNING INDEXING
Physical index
Pointer specifies physical record address
Disadvantage: pointer must be changed if record is moved
Logical index
Used when physical record addresses expected to change
frequently
Entries of the form (K, Kp), where Kp is the value of the field used for the primary file organization
ADDITIONAL ISSUES RELATED TO STORAGE OF RELATIONS
AND INDEXES
Enforcing a key constraint on an attribute
Reject insertion if new record has same key attribute as
existing record
Duplicates occur if index is created on a nonkey field
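Enforcing a key constraint through an index can be sketched in Python. UniqueIndex is a hypothetical toy class, with a dictionary standing in for the index structure and a (block, slot) pair for the record pointer.

```python
class UniqueIndex:
    """Toy index that enforces a key constraint on the indexed attribute."""

    def __init__(self):
        self.entries = {}   # key value -> record pointer

    def insert(self, key, pointer):
        # Probe the index first; an existing entry means a duplicate key.
        if key in self.entries:
            raise ValueError(f"duplicate key {key!r}: insertion rejected")
        self.entries[key] = pointer


ssn_index = UniqueIndex()
ssn_index.insert("123-45-6789", (0, 0))

try:
    ssn_index.insert("123-45-6789", (3, 1))   # same key attribute value
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```

The check costs one index probe per insertion, far cheaper than scanning the data file for an existing record with the same key.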
Fully inverted file
Has secondary index on every field
Indexing hints in queries
Suggestions used to expedite query execution
ADDITIONAL ISSUES RELATED TO STORAGE OF RELATIONS
AND INDEXES (CONT’D.)
Column-based storage of relations
Alternative to traditional way of storing relations by row
Offers advantages for read-only queries
Offers additional freedom in index creation
THANK YOU!