Indexing

The document discusses various indexing techniques used in advanced database management systems to enhance record retrieval efficiency. It covers single-level ordered indexes, primary, clustering, and secondary indexes, as well as multilevel indexes utilizing B-trees and B+-trees for dynamic indexing. Additionally, it addresses issues related to storage, key constraints, and alternative storage methods like column-based storage.

Uploaded by

chamikalak2001

02.

INDEXING
ADVANCED DATABASE MANAGEMENT SYSTEMS
ICT 331-2
INTRODUCTION

 Indexes used to speed up record retrieval in response to
certain search conditions.
 Index structures provide secondary access paths.
 Any field can be used to create an index.
 Multiple indexes can be constructed
 Most indexes based on ordered files
 Tree data structures organize the index
TYPES OF SINGLE-LEVEL ORDERED INDEXES
 Ordered index similar to index in a textbook.
 Indexing field (attribute)
 Index stores each value of the index field with list of
pointers to all disk blocks that contain records with that
field value
 Values in index are ordered
 Primary index
 Specified on the ordering key field of ordered file of
records
TYPES OF SINGLE-LEVEL ORDERED INDEXES (CONT’D.)

 Clustering index
 Used if numerous records can have the same value for
the ordering field
 Secondary index
 Can be specified on any nonordering field
 Data file can have several secondary indexes
PRIMARY INDEXES
 Ordered file with two fields
 Primary key, K(i)
 Pointer to a disk block, P(i)
 One index entry in the index file for each block in the data
file
 Indexes may be dense or sparse
 Dense index has an index entry for every search key
value in the data file
 Sparse index has entries for only some search values
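
A sparse primary index with one entry per block can be sketched as follows. This is a minimal illustration, not a definitive implementation: the in-memory `data_blocks` list, the sample keys, and the `lookup` helper are assumptions made for the example.

```python
from bisect import bisect_right

# Hypothetical data file: each block holds records sorted by primary key.
data_blocks = [
    [(2, "a"), (5, "b"), (8, "c")],
    [(12, "d"), (15, "e"), (21, "f")],
    [(24, "g"), (29, "h"), (35, "i")],
]

# Sparse primary index: one entry <K(i), P(i)> per block, where K(i) is the
# key of the block's first (anchor) record and P(i) is the block number.
index_keys = [block[0][0] for block in data_blocks]

def lookup(key):
    """Return the record with the given primary key, or None."""
    i = bisect_right(index_keys, key) - 1   # last anchor key <= search key
    if i < 0:
        return None                         # key is smaller than every anchor
    for k, payload in data_blocks[i]:       # scan inside that single block
        if k == key:
            return (k, payload)
    return None
```

The index holds three entries for nine records, which is why a sparse index is much smaller than the data file it describes.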
PRIMARY INDEXES (CONT’D.)

Primary index on the ordering key field of the file


PRIMARY INDEXES (CONT’D.)

 Major problem: insertion and deletion of records
 Move records around and change index values
 Solutions
 Use unordered overflow file
 Use linked list of overflow records
SOLUTIONS: USE UNORDERED OVERFLOW FILE

 Create an overflow file to store new records that cannot fit
into the main ordered file without disrupting its order.
 When an insertion occurs, the new record is placed in the
overflow file instead of adjusting the main file.
SOLUTION: LINKED LIST OF OVERFLOW

 Use a linked list structure to link overflow records to their
original block in the main file.
 Each block in the main file has a pointer to its
corresponding overflow records.
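
The overflow-chain idea can be sketched as below. The `Block` class, the capacity of 3, and the helper names are hypothetical, and a Python list stands in for the linked list of overflow records.

```python
BLOCK_CAPACITY = 3  # assumed number of records per main-file block

class Block:
    def __init__(self, keys):
        self.keys = sorted(keys)   # records in the ordered main file
        self.overflow = []         # stands in for the linked overflow chain

def block_for(blocks, key):
    """Last block whose first key is <= key (the first block otherwise)."""
    chosen = blocks[0]
    for block in blocks:
        if block.keys[0] <= key:
            chosen = block
    return chosen

def insert(blocks, key):
    block = block_for(blocks, key)
    if len(block.keys) < BLOCK_CAPACITY and key > block.keys[0]:
        block.keys.append(key)
        block.keys.sort()           # fits: stays inside the ordered block
    else:
        block.overflow.append(key)  # chained: main file is not reshuffled

def search(blocks, key):
    block = block_for(blocks, key)
    return key in block.keys or key in block.overflow

blocks = [Block([10, 20, 30]), Block([40, 50])]
insert(blocks, 25)   # block 0 is full, so 25 goes to its overflow chain
insert(blocks, 45)   # block 1 has room, so 45 is stored in place
```

Because the first key of each block never changes in this sketch, the index entries pointing at the blocks stay valid after every insertion.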
CLUSTERING INDEXES

 Clustering field
 File records are physically ordered on a nonkey field
without a distinct value for each record
 Structure of the clustering index: ordered file with two fields
 A field of the same type as the clustering field
 Disk block pointer
CLUSTERING INDEXES

 The clustering index has an entry for each distinct value of
the clustering field.
 There can be only one clustered index per table.
 Blocks of fixed size are reserved for each value of the
clustering field to avoid physical reordering during
insertion and deletion.
A clustering index on the
Dept_number ordering
nonkey field of an
EMPLOYEE file

To locate a record:
• Search for the clustering field
value (K(i)) in the Index File.
• Use the Block Pointer (P(i)) to
access the block in the Data
File.
• Search for the record within
the block.
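
The three lookup steps above can be sketched as follows. The sample EMPLOYEE records and the `employees_in` helper are assumptions for illustration; blocks are plain lists, and the sketch does not model the reserved-block variant.

```python
from bisect import bisect_left

# Hypothetical EMPLOYEE file ordered on Dept_number (a nonkey field).
# Each block is a list of (Dept_number, name) records.
data_file = [
    [(1, "Ann"), (1, "Bob"), (2, "Cy")],
    [(2, "Di"), (2, "Ed"), (3, "Flo")],
    [(3, "Gus"), (5, "Hal"), (5, "Ivy")],
]

# Clustering index: one entry per distinct value, pointing to the FIRST
# block that contains a record with that value.
cluster_index = []
for block_no, block in enumerate(data_file):
    for dept, _ in block:
        if not cluster_index or cluster_index[-1][0] != dept:
            cluster_index.append((dept, block_no))

def employees_in(dept):
    keys = [k for k, _ in cluster_index]
    i = bisect_left(keys, dept)              # step 1: search the index for K(i)
    if i == len(keys) or keys[i] != dept:
        return []
    names = []
    for block in data_file[cluster_index[i][1]:]:   # step 2: follow P(i)
        for d, name in block:                       # step 3: scan the block
            if d == dept:
                names.append(name)
            elif d > dept:
                return names     # file is ordered, so no later match exists
    return names
```

Since several records share a value, the scan may continue into the following blocks until a larger value appears.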
SECONDARY INDEXES
 Provides a secondary means of accessing a file for which
some primary access already exists.
 Ordered file with two fields
 Indexing field, K(i)
 Block pointer or record pointer, P(i)
 Usually need more storage space and longer search time
than primary index
 Improved search time for arbitrary record
SECONDARY INDEXES (CONT’D.)
Dense secondary index (with block pointers) on a nonordering key field of a
file.

To retrieve a record:
• Find the Index Field Value in
the Index File.
• Use the Block Pointer to
access the corresponding
block in the Data File.
• Search for the desired record
within the block using the
secondary key field.
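
A dense secondary index with block pointers can be sketched as below. The record layout (record name, secondary key) and the `find` helper are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data file that is NOT ordered on the secondary key
# (the second field of each record).
data_file = [
    [("r1", 9), ("r2", 5), ("r3", 13)],
    [("r4", 3), ("r5", 8), ("r6", 6)],
]

# Dense secondary index: one (K(i), P(i)) entry per record, sorted on K(i);
# P(i) is the number of the block holding the record.
secondary_index = sorted(
    (key, block_no)
    for block_no, block in enumerate(data_file)
    for _, key in block
)

def find(key):
    keys = [k for k, _ in secondary_index]
    i = bisect_left(keys, key)            # binary search on the ordered index
    if i == len(keys) or keys[i] != key:
        return None
    block_no = secondary_index[i][1]      # follow the block pointer
    for record in data_file[block_no]:    # search within the block
        if record[1] == key:
            return record
    return None
```

The index needs one entry per record because the data file itself is not sorted on this field, which is the source of the extra storage cost.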
TYPES OF SINGLE-LEVEL ORDERED INDEXES (CONT’D.)

Table 1 Types of indexes based on the properties of the indexing field

Table 2 Properties of index types


MULTILEVEL INDEXES

 Designed to greatly reduce remaining search space as
search is conducted
 Each level reduces the search space by the index blocking factor (bfr),
also called the fan-out (fo).
 fo represents the number of entries in a single index block and is
larger than 2.
 Searching a multilevel index requires approximately $\log_{fo} b$ block
accesses, where b is the number of blocks in the first-level index.
 Faster than binary search (about $\log_2 b$ block accesses) when fo > 2.
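
As a rough worked comparison, the sketch below counts block accesses for a binary search and index levels for a multilevel index. The figures (442 first-level blocks, fan-out 68) are illustrative assumptions, not values from the slides.

```python
import math

def binary_search_accesses(blocks):
    """Approximate block accesses for binary search over an ordered index."""
    return math.ceil(math.log2(blocks))

def multilevel_levels(blocks, fan_out):
    """Number of index levels needed until the top level fits in one block."""
    if fan_out < 2:
        raise ValueError("fan-out must be at least 2")
    levels = 1                              # the first (base) level
    while blocks > 1:
        blocks = -(-blocks // fan_out)      # ceiling division: next level size
        levels += 1
    return levels

first_level_blocks = 442   # assumed size of the first-level index
fan_out = 68               # assumed entries per index block
```

With these assumed numbers, one block is read per level, so the multilevel index needs 3 index block accesses where binary search on the first level needs about 9.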
MULTILEVEL INDEXES

 Because a single-level index is an ordered file, we can
create a primary index to the index itself; in this case, the
original index file is called the first-level index and the
index to the index is called the second-level index.
 We can repeat the process, creating a third, fourth, ..., top
level until all entries of the top level fit in one disk block
 A multi-level index can be created for any type of first-
level index (primary, secondary, clustering) as long as the
first-level index consists of more than one disk block
MULTILEVEL INDEXES

 Index file
 Considered first (or base level) of a multilevel index
 Second level
 Primary index to the first level
 Third level
 Primary index to the second level
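
Building the levels bottom-up, each one a primary index on the level below, can be sketched as follows. The `build_levels` helper and the sample keys are assumptions; blocks are modelled as Python lists of at most `fan_out` keys.

```python
def build_levels(first_level_keys, fan_out):
    """Return the index levels, base level first; a level is a list of blocks."""
    level = [first_level_keys[i:i + fan_out]
             for i in range(0, len(first_level_keys), fan_out)]
    levels = [level]
    while len(level) > 1:                        # stop when one block remains
        anchors = [block[0] for block in level]  # first key of each block
        level = [anchors[i:i + fan_out]
                 for i in range(0, len(anchors), fan_out)]
        levels.append(level)
    return levels

# 27 hypothetical first-level keys with a fan-out of 3
levels = build_levels(list(range(0, 54, 2)), fan_out=3)
```

Here the 9 first-level blocks need a second level of 3 blocks and a third level of a single block, which becomes the top level.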
A two-level primary index
resembling ISAM (indexed
sequential access method)
organization

Using the two-level index structure in the diagram, how would you locate the
record with a primary key of 46?
If a new record with a primary key of 90 needs to be inserted, how would the two-level index structure be updated?

DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES
 Tree data structure terminology
 Tree is formed of nodes
 Each node (except root) has one parent and zero or
more child nodes
 Leaf node has no child nodes
 Unbalanced if leaf nodes occur at different levels
 Nonleaf node called internal node
 Subtree of node consists of node and all descendant
nodes
DYNAMIC MULTILEVEL INDEXES USING B-TREES AND B+-TREES
 Because of the insertion and deletion problem, most multi-
level indexes use B-tree or B+-tree data structures, which
leave space in each tree node (disk block) to allow for new
index entries
 These data structures are variations of search trees that
allow efficient insertion and deletion of new search values.
 In B-Tree and B+-Tree data structures, each node
corresponds to a disk block
 Each node is kept between half-full and completely full
TREE DATA STRUCTURE

A tree data structure that shows an unbalanced tree


SEARCH TREES AND B-TREES

 Search tree used to guide search for a record
 Given value of one of record’s fields

A node in a search tree with pointers to subtrees below it


SEARCH TREES AND B-TREES (CONT’D.)

 Algorithms necessary for inserting and deleting search
values into and from the tree
A search tree of order p = 3
B-TREES

 Provide multi-level access structure
 Tree is always balanced
 Space wasted by deletion never becomes excessive
 Each node is at least half-full
 Each node in a B-tree of order p can have at most p-1
search values
B-TREE
B+-TREES
 Data pointers stored only at the leaf nodes
 Leaf nodes have an entry for every value of the search
field, and a data pointer to the record if search field is a
key field
 For a nonkey search field, the pointer points to a block
containing pointers to the data file records
 Internal nodes
 Some search field values from the leaf nodes repeated
to guide search
B+-TREES (CONT’D.)

The nodes of a B+-tree (a) Internal node of a B+-tree with q−1 search values (b)
Leaf node of a B+-tree with q−1 search values and q−1 data pointers
SEARCHING FOR A RECORD WITH SEARCH KEY FIELD VALUE K, USING A B+-TREE

Algorithm: Searching for a record with search key field value K, using a B+-tree
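
A search of this kind can be sketched as below. This is not the slide's algorithm verbatim: the `Internal` and `Leaf` classes are hypothetical, and the sketch assumes the convention that the subtree to the left of a search value K(i) holds values less than or equal to K(i).

```python
from bisect import bisect_left

class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # q-1 search values that guide the search
        self.children = children  # q tree pointers

class Leaf:
    def __init__(self, keys, pointers):
        self.keys = keys          # search values
        self.pointers = pointers  # one data pointer per search value

def search(node, k):
    """Return the data pointer for search value k, or None if k is absent."""
    while isinstance(node, Internal):
        # Child i covers values X with keys[i-1] < X <= keys[i].
        node = node.children[bisect_left(node.keys, k)]
    i = bisect_left(node.keys, k)
    if i < len(node.keys) and node.keys[i] == k:
        return node.pointers[i]
    return None

# A small hypothetical tree with one internal node and three leaves.
root = Internal([3, 7], [
    Leaf([1, 3], ["p1", "p3"]),
    Leaf([5, 7], ["p5", "p7"]),
    Leaf([8, 9], ["p8", "p9"]),
])
```

Every search ends at a leaf, even when the value also appears in an internal node, because data pointers are stored only at the leaf level.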
INDEXES ON MULTIPLE KEYS

 Multiple attributes involved in many retrieval and update
requests
 Composite keys
 Access structure using key value that combines
attributes
 Partitioned hashing
 Suitable for equality comparisons
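
Partitioned hashing can be sketched as follows for a two-attribute key. The attribute names (`dno`, `age`), the bit widths, and the simple modulo hash are assumptions chosen for the example.

```python
def partitioned_hash(dno, age, dno_bits=3, age_bits=5):
    """Concatenate the per-attribute hash bits into one bucket address."""
    h_dno = dno % (1 << dno_bits)       # 3 bits contributed by dno
    h_age = age % (1 << age_bits)       # 5 bits contributed by age
    return (h_dno << age_bits) | h_age

def buckets_for_age(age, dno_bits=3, age_bits=5):
    """All bucket addresses that may hold records with the given age."""
    h_age = age % (1 << age_bits)
    return [(d << age_bits) | h_age for d in range(1 << dno_bits)]
```

An equality condition on both attributes maps to exactly one bucket, while a condition on `age` alone narrows the search to 8 of the 256 buckets. Range conditions gain nothing, since hashing does not preserve order.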
INDEXES ON MULTIPLE KEYS (CONT’D.)

 Grid files
 Array with one dimension for each search attribute

Example of a grid array on Dno and Age attributes
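
A grid array on Dno and Age can be sketched as below. The scale boundaries and helper names are hypothetical, and a dictionary keyed by cell coordinates stands in for the array of bucket pointers.

```python
from bisect import bisect_right

# Assumed linear scales: boundaries that split each attribute into ranges.
dno_scale = [3, 5, 7]          # cells: <3, 3-4, 5-6, >=7
age_scale = [25, 35, 45, 55]   # cells: <25, 25-34, 35-44, 45-54, >=55

def cell(dno, age):
    """Map a (Dno, Age) pair to its grid cell coordinates."""
    return bisect_right(dno_scale, dno), bisect_right(age_scale, age)

grid = {}  # (row, column) -> bucket of records

def insert(record):
    grid.setdefault(cell(record["Dno"], record["Age"]), []).append(record)

def find(dno, age):
    return [r for r in grid.get(cell(dno, age), [])
            if r["Dno"] == dno and r["Age"] == age]

insert({"Dno": 4, "Age": 59, "Name": "x"})
```

A request for Dno = 4 and Age = 59 goes straight to cell (1, 4) and examines only the records in that bucket.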


OTHER TYPES OF INDEXES
 Hash indexes
 Secondary structure for file access
 Uses hashing on a search key other than the one used
for the primary data file organization
 Index entries of form (K, Pr) or (K, P)
 Pr: pointer to the record containing the key
 P: pointer to the block containing the record for that
key
BITMAP INDEXES

 Used with a large number of rows
 Creates an index for one or more columns
 Each value or value range in the column is indexed
 Built on one particular value of a particular field
 Array of bits
 Existence bitmap
 Bitmaps for B+ -tree leaf nodes
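
Bitmaps and the existence bitmap can be sketched as follows. The column values are made up for the example, and Python integers serve as the bit arrays.

```python
# Hypothetical column values for rows 0..5
sex = ["M", "F", "F", "M", "F", "M"]
zipcode = [30022, 30022, 19046, 19046, 30022, 30022]

def build_bitmaps(column):
    """One integer bitmap per distinct value; bit i is set if row i has it."""
    bitmaps = {}
    for row, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps

def rows(bitmap):
    """Row numbers whose bit is set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

sex_bm = build_bitmaps(sex)
zip_bm = build_bitmaps(zipcode)

# A condition on two columns becomes a bitwise AND of two bitmaps.
matching = rows(sex_bm["F"] & zip_bm[30022])

# Existence bitmap: bit 4 cleared marks row 4 as deleted.
existence = 0b101111
live_matching = rows(sex_bm["F"] & zip_bm[30022] & existence)
```

Combining conditions with AND and OR on bitmaps avoids touching the data rows until the final set of row numbers is known.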
FUNCTION-BASED INDEXING
 Value resulting from applying some function on a field (or
fields) becomes the index key
 Introduced in Oracle relational DBMS
 Example
 Function UPPER(Lname) returns uppercase
representation

 Query
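
A function-based index and a query that can use it can be sketched as below. The sketch uses Python's `sqlite3` module rather than Oracle, relying on SQLite's support for indexes on expressions; the table, index name, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (fname TEXT, lname TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("John", "Smith"), ("Ana", "SMITH"), ("Li", "Wong")],
)

# The index key is the value of UPPER(lname), not lname itself.
conn.execute("CREATE INDEX upper_ix ON employee (UPPER(lname))")

# The search condition uses the same expression as the index.
rows = conn.execute(
    "SELECT fname, lname FROM employee "
    "WHERE UPPER(lname) = 'SMITH' ORDER BY fname"
).fetchall()
```

The query finds both spellings of the last name because the comparison is made on the uppercase representation.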
SOME GENERAL ISSUES CONCERNING INDEXING

 Physical index
 Pointer specifies physical record address
 Disadvantage: pointer must be changed if record is moved
 Logical index
 Used when physical record addresses expected to change
frequently
 Entries of the form (K, Kp)
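
A logical index with entries of the form (K, Kp) can be sketched as follows, assuming Kp is the value of the field used for the primary file organization. The dictionaries and sample values are hypothetical.

```python
# Primary organization: primary key value -> physical address (block, slot)
primary = {101: (0, 2), 102: (3, 1), 103: (1, 0)}

# Logical secondary index: entries (K, Kp), where Kp is a primary key value
by_name = {"Smith": 102, "Wong": 101}

def locate(name):
    """Resolve a secondary key to a physical address in two steps."""
    kp = by_name.get(name)
    return None if kp is None else primary[kp]

# Moving a record changes only the primary organization;
# the logical index entry ("Smith", 102) stays untouched.
primary[102] = (7, 0)
```

The price of this stability is the extra lookup through the primary organization on every search.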
ADDITIONAL ISSUES RELATED TO STORAGE OF RELATIONS
AND INDEXES
 Enforcing a key constraint on an attribute
 Reject insertion if new record has same key attribute as
existing record
 Duplicates occur if index is created on a nonkey field
 Fully inverted file
 Has secondary index on every field
 Indexing hints in queries
 Suggestions used to expedite query execution
ADDITIONAL ISSUES RELATED TO STORAGE OF RELATIONS
AND INDEXES (CONT’D.)

 Column-based storage of relations
 Alternative to traditional way of storing relations by row
 Offers advantages for read-only queries
 Offers additional freedom in index creation
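
The difference between the two layouts can be sketched as below with a made-up three-row relation. Lists and dictionaries stand in for disk pages.

```python
# Row-based layout: each record is stored as a unit.
rows = [
    {"id": 1, "dept": 5, "salary": 30000},
    {"id": 2, "dept": 4, "salary": 40000},
    {"id": 3, "dept": 5, "salary": 25000},
]

# Column-based layout: one list per attribute; position i in every list
# belongs to the same record.
columns = {name: [r[name] for r in rows] for name in rows[0]}

total_row_store = sum(r["salary"] for r in rows)   # reads whole records
total_col_store = sum(columns["salary"])           # reads one column only
```

A read-only aggregate over one attribute touches only that column in the column layout, which is where the advantage for such queries comes from.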
THANK YOU!
