Lesson 9 Mod2l2

This document discusses different file organizations and indexing techniques in a database management system. It provides an overview of alternative file organizations like heap files, sorted files, and hashed files. Indexes are described as collections of data entries that allow efficient retrieval of records with a given key value. The document compares the costs of different file organizations and indexes for common operations. It also classifies indexes as primary or secondary, clustered or unclustered, and dense or sparse, noting the performance implications of these classifications.

Uploaded by

Russel Ponferrada

File Organizations and Indexing

Module 2, Lecture 2
How index-learning turns no student pale
Yet holds the eel of science by the tail.
-- Alexander Pope (1688-1744)
Database Management Systems, R. Ramakrishnan

Alternative File Organizations

Many alternatives exist, each ideal for some situations and not so good in others:
Heap files: Suitable when typical access is a file scan retrieving all records.
Sorted files: Best if records must be retrieved in some order, or only a "range" of records is needed.
Hashed files: Good for equality selections.
A hashed file is a collection of buckets. Bucket = primary page plus zero or more overflow pages.
Hashing function h: h(r) = bucket in which record r belongs. h looks at only some of the fields of r, called the search fields.
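
As a rough illustration of the hashed-file idea, the sketch below maps each record to a bucket by looking only at its search field. The bucket count, the choice of "age" as search field, and the sample records are assumptions made for this example; each bucket is modeled as one Python list standing in for a primary page plus any overflow pages.

```python
from collections import defaultdict

NUM_BUCKETS = 4        # assumed number of buckets for this sketch
SEARCH_FIELD = "age"   # assumed search field; h ignores all other fields


def h(value):
    """Hash a search-field value to a bucket number."""
    return hash(value) % NUM_BUCKETS


def build_hashed_file(records):
    """Place every record r in bucket h(r), using only the search field."""
    buckets = defaultdict(list)
    for r in records:
        buckets[h(r[SEARCH_FIELD])].append(r)
    return buckets


def equality_select(buckets, value):
    """Equality selection: inspect only the one bucket the value hashes to."""
    return [r for r in buckets.get(h(value), []) if r[SEARCH_FIELD] == value]


# Hypothetical sample records
records = [
    {"name": "Ashby", "age": 25},
    {"name": "Smith", "age": 44},
    {"name": "Tracy", "age": 44},
]
hashed_file = build_hashed_file(records)
matches = equality_select(hashed_file, 44)
```

An equality selection touches a single bucket, which is why hashed files suit this kind of query; a range selection gets no such help because nearby key values land in unrelated buckets.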

Cost Model for Our Analysis

We ignore CPU costs, for simplicity:
B: The number of data pages
R: Number of records per page
D: (Average) time to read or write a disk page
Measuring the number of page I/Os ignores the gains of pre-fetching blocks of pages; thus, even I/O cost is only approximated.
Average-case analysis, based on several simplistic assumptions.
Good enough to show the overall trends!

Assumptions in Our Analysis

Single record insert and delete.

Heap Files:
Equality selection on key; exactly one match.
Insert always at end of file.

Sorted Files:
Files compacted after deletions.
Selections on sort field(s).

Hashed Files:
No overflow buckets, 80% page occupancy.
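
The 80% occupancy assumption is where the factor 1.25 in the hashed-file scan cost comes from. A short derivation, using B and D from the cost model (the symbol B_hash is introduced here only for illustration):

```latex
B_{\text{hash}} = \frac{B}{0.8} = 1.25\,B
\qquad\Longrightarrow\qquad
\text{cost of scanning a hashed file} = B_{\text{hash}} \cdot D = 1.25\,B\,D
```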



Cost of Operations

                 Heap File     Sorted File                             Hashed File
Scan all recs    BD            BD                                      1.25 BD
Equality Search  0.5 BD        D log2 B                                D
Range Search     BD            D (log2 B + # of pages with matches)    1.25 BD
Insert           2D            Search + BD                             2D
Delete           Search + D    Search + BD                             2D

Several assumptions underlie these (rough) estimates!
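
To get a feel for the magnitudes, the cost formulas can be evaluated for concrete values of B and D. The function below is a minimal sketch under the slide's assumptions, not a general cost estimator; the function name, the `matching_pages` parameter, and the sample values are assumptions made for this example. Each formula follows from the stated assumptions (for example, a heap-file insert reads and writes the last page, giving 2D, and a hashed-file equality search with no overflow reads one page, giving D).

```python
import math


def operation_costs(B, D, matching_pages=1):
    """Rough I/O cost of each operation, per file organization.

    B: number of data pages, D: time to read or write one page,
    matching_pages: pages holding matches in a sorted-file range search.
    """
    heap_search = 0.5 * B * D          # on average, half the file is scanned
    sorted_search = D * math.log2(B)   # binary search over the pages
    return {
        "heap": {
            "scan": B * D,
            "equality": heap_search,
            "range": B * D,
            "insert": 2 * D,                 # read and write the last page
            "delete": heap_search + D,       # search, then write the page back
        },
        "sorted": {
            "scan": B * D,
            "equality": sorted_search,
            "range": D * (math.log2(B) + matching_pages),
            "insert": sorted_search + B * D,  # search, then shift later records
            "delete": sorted_search + B * D,  # search, then compact the file
        },
        "hashed": {
            "scan": 1.25 * B * D,            # 80% occupancy => 1.25 B pages
            "equality": D,
            "range": 1.25 * B * D,           # hashing does not help ranges
            "insert": 2 * D,
            "delete": 2 * D,
        },
    }


# Hypothetical file: 1024 pages, 1 time unit per page I/O
costs = operation_costs(B=1024, D=1.0, matching_pages=3)
```

With these sample values, an equality search costs 512 units on the heap file, 10 on the sorted file, and 1 on the hashed file, which shows the overall trend the estimates are meant to convey.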


Indexes

An index on a file speeds up selections on the search key fields for the index.
Any subset of the fields of a relation can be the search key for an index on the relation.
Search key is not the same as key (minimal set of fields that uniquely identify a record in a relation).

An index contains a collection of data entries, and supports efficient retrieval of all data entries k* with a given key value k.


Alternatives for Data Entry k* in Index

Three alternatives:
(1) Data record with key value k
(2) <k, rid of data record with search key value k>
(3) <k, list of rids of data records with search key k>

Choice of alternative for data entries is orthogonal to the indexing technique used to locate data entries with a given key value k.
Examples of indexing techniques: B+ trees, hash-based structures.
Typically, the index contains auxiliary information that directs searches to the desired data entries.
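
The three forms of data entry can be made concrete with a tiny in-memory example. This is only a sketch: the data file contents, the (page, slot) representation of rids, and the function names are assumptions for illustration, and the search key is taken to be age.

```python
from collections import defaultdict

# Hypothetical data file: rid -> record (name, age); rids are (page, slot) pairs.
data_file = {
    (0, 0): ("Ashby", 25),
    (0, 1): ("Basu", 33),
    (1, 0): ("Smith", 44),
    (1, 1): ("Tracy", 44),
}


def alternative_1(data_file):
    """Data entries are the data records themselves, kept in key order."""
    return sorted(data_file.values(), key=lambda rec: rec[1])


def alternative_2(data_file):
    """One <k, rid> pair per data record."""
    return sorted((rec[1], rid) for rid, rec in data_file.items())


def alternative_3(data_file):
    """One <k, rid-list> pair per distinct key value."""
    rid_lists = defaultdict(list)
    for rid, rec in sorted(data_file.items()):
        rid_lists[rec[1]].append(rid)
    return sorted(rid_lists.items())


entries_2 = alternative_2(data_file)
entries_3 = alternative_3(data_file)
```

The duplicate key 44 yields two entries under the second form but a single entry with a two-element rid list under the third.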


Alternatives for Data Entries (Contd.)

Alternative 1:
If this is used, the index structure is a file organization for data records (like heap files or sorted files).
At most one index on a given collection of data records can use Alternative 1. (Otherwise, data records are duplicated, leading to redundant storage and potential inconsistency.)
If data records are very large, the number of pages containing data entries is high. This typically implies that the auxiliary information in the index is also large.


Alternatives for Data Entries (Contd.)

Alternatives 2 and 3:
Data entries are typically much smaller than data records. So, these are better than Alternative 1 with large data records, especially if search keys are small. (The portion of the index structure used to direct the search is much smaller than with Alternative 1.)
If more than one index is required on a given file, at most one index can use Alternative 1; the rest must use Alternatives 2 or 3.
Alternative 3 is more compact than Alternative 2, but leads to variable-sized data entries even if search keys are of fixed length.


Index Classification

Primary vs. secondary: If the search key contains the primary key, then it is called a primary index.
Unique index: Search key contains a candidate key.

Clustered vs. unclustered: If the order of data records is the same as, or "close to", the order of data entries, then it is called a clustered index.
Alternative 1 implies clustered, but not vice versa.
A file can be clustered on at most one search key.
Cost of retrieving data records through an index varies greatly based on whether the index is clustered or not!


Clustered vs. Unclustered Index

Suppose that Alternative (2) is used for data entries, and that the data records are stored in a heap file.
To build a clustered index, first sort the heap file (with some free space on each page for future inserts).
Overflow pages may be needed for inserts. (Thus, the order of data records is "close to", but not identical to, the sort order.)

[Figure: CLUSTERED vs. UNCLUSTERED index. In each panel, index entries direct the search for data entries (index file), and the data entries point to data records (data file).]
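
A small simulation can show why the cost of a range retrieval differs so much between the two cases. The sketch below is built on assumed numbers (page capacity, file size, and a round-robin placement standing in for an arbitrary heap order); it counts the distinct data pages that hold records in a key range, assuming each page is fetched at most once.

```python
RECORDS_PER_PAGE = 4   # assumed page capacity
NUM_RECORDS = 32       # assumed file size; keys are 0..31
NUM_PAGES = NUM_RECORDS // RECORDS_PER_PAGE


def clustered_layout():
    """File sorted by key: the record with key k sits on page k // RECORDS_PER_PAGE."""
    return {k: k // RECORDS_PER_PAGE for k in range(NUM_RECORDS)}


def unclustered_layout():
    """Record order unrelated to key order: keys dealt round-robin across pages."""
    return {k: k % NUM_PAGES for k in range(NUM_RECORDS)}


def pages_touched(layout, low, high):
    """Distinct data pages holding records with low <= key <= high."""
    return len({layout[k] for k in range(low, high + 1)})


clustered_cost = pages_touched(clustered_layout(), 8, 15)
unclustered_cost = pages_touched(unclustered_layout(), 8, 15)
```

For the eight keys 8..15, the clustered layout needs 2 data pages while the unclustered layout touches all 8, one page per matching record.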

Index Classification (Contd.)

Dense vs. sparse: If there is at least one data entry per search key value (in some data record), then the index is dense.
Alternative 1 always leads to a dense index.
Every sparse index is clustered!
Sparse indexes are smaller; however, some useful optimizations are based on dense indexes.


[Figure: A sparse index on Name and a dense index on Age over the same data file.]
Sparse index on Name: Ashby; Cass; Smith
Data file: Ashby, 25, 3000; Basu, 33, 4003; Bristow, 30, 2007; Cass, 50, 5004; Daniels, 22, 6003; Jones, 40, 6003; Smith, 44, 3000; Tracy, 44, 5004
Dense index on Age: 22; 25; 30; 33; 40; 44; 44; 50
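
The difference between the two indexes in the figure can be sketched in a few lines. The grouping of records into pages of three is an assumption inferred from the sparse entries (Ashby, Cass, Smith); the function and variable names are hypothetical. The sparse index holds one entry per page and relies on the file being sorted by name, whereas the dense index holds one entry per record.

```python
from bisect import bisect_right

# Data file sorted by name; page boundaries are assumed for this sketch.
pages = [
    [("Ashby", 25, 3000), ("Basu", 33, 4003), ("Bristow", 30, 2007)],
    [("Cass", 50, 5004), ("Daniels", 22, 6003), ("Jones", 40, 6003)],
    [("Smith", 44, 3000), ("Tracy", 44, 5004)],
]

# Sparse index on name: one entry per page (the first name on that page).
sparse_name_index = [page[0][0] for page in pages]

# Dense index on age: one <age, rid> entry per record, rid = (page, slot).
dense_age_index = sorted(
    (rec[1], (p, s)) for p, page in enumerate(pages) for s, rec in enumerate(page)
)


def lookup_by_name(name):
    """Find the last page whose first name is <= name, then search that page."""
    pos = bisect_right(sparse_name_index, name) - 1
    if pos < 0:
        return None
    for record in pages[pos]:
        if record[0] == name:
            return record
    return None


found = lookup_by_name("Daniels")
```

The lookup only works because the data file is ordered on the indexed field, which is the reason every sparse index must be clustered.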

Index Classification (Contd.)

Composite search keys: Search on a combination of fields.
Equality query: Every field value is equal to a constant value. E.g., with respect to a <sal,age> index: age=20 and sal=75
Range query: Some field value is not a constant. E.g.: age=20; or age=20 and sal > 10

Data entries in the index are sorted by search key to support range queries.
Lexicographic order, or spatial order.


[Figure: Examples of composite key indexes using lexicographic order, over data records (name, age, sal) sorted by name: bob, cal, joe, sue.]
<age, sal> data entries: 11,80; 12,10; 12,20; 13,75
<sal, age> data entries: 10,12; 20,12; 75,13; 80,11
<age> data entries: 11; 12; 12; 13
<sal> data entries: 10; 20; 75; 80
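
Lexicographic ordering of composite keys can be tried out directly with tuples, which Python compares field by field. The sketch below reuses the <age, sal> values from the figure; the function names are hypothetical, and a sorted list with binary search stands in for a real index structure. It shows that the range queries "age=12" and "age=12 and sal > 10" each correspond to one contiguous run of entries in the <age, sal> order.

```python
from bisect import bisect_left, bisect_right

# (age, sal) pairs from the figure, in no particular order
pairs = [(12, 10), (11, 80), (12, 20), (13, 75)]

age_sal = sorted(pairs)                             # lexicographic on <age, sal>
sal_age = sorted((sal, age) for age, sal in pairs)  # lexicographic on <sal, age>


def age_equals(index, age):
    """All <age, sal> entries with the given age: one contiguous slice."""
    lo = bisect_left(index, (age, float("-inf")))
    hi = bisect_right(index, (age, float("inf")))
    return index[lo:hi]


def age_equals_sal_greater(index, age, sal):
    """Entries with the given age and a strictly larger sal: also contiguous."""
    lo = bisect_right(index, (age, sal))
    hi = bisect_right(index, (age, float("inf")))
    return index[lo:hi]


same_age = age_equals(age_sal, 12)
higher_sal = age_equals_sal_greater(age_sal, 12, 10)
```

In the <sal, age> order the two entries with age 12 are still adjacent here only by coincidence of the sample values; in general a condition on age alone does not map to a contiguous run in that index.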

Summary

Many alternative file organizations exist, each appropriate in some situation.
If selection queries are frequent, sorting the file or building an index is important.
Hash-based indexes are only good for equality search.
Sorted files and tree-based indexes are best for range search; they are also good for equality search. (Files are rarely kept sorted in practice; a B+ tree index is better.)

An index is a collection of data entries plus a way to quickly find entries with given key values.


Summary (Contd.)

Data entries can be actual data records, <key, rid> pairs, or <key, rid-list> pairs.
The choice is orthogonal to the indexing technique used to locate data entries with a given key value.

Can have several indexes on a given file of data records, each with a different search key.
Indexes can be classified as clustered vs. unclustered, primary vs. secondary, and dense vs. sparse. Differences have important consequences for utility/performance.

