UNIT 4 Updated - 121124

Unit IV covers various aspects of Database Management Systems, focusing on data storage, indexing methods, and file organization. Key topics include the use of external storage, the role of buffer and disk space managers, and different indexing techniques such as primary, secondary, and hash-based indexing. Additionally, it discusses tree-based indexing structures like B-Trees and their efficiency in data retrieval.

Uploaded by

k.nikhil1701

UNIT IV

Database Management System

UNIT 4

• Indexes on Sequential Files
• Secondary Indexes
• B-Trees
• Hash Tables
DATA ON EXTERNAL STORAGE
• A DBMS stores a vast quantity of data, and the data must
persist across program executions.
• Therefore, data is stored on external storage devices such as
disks and tapes, and fetched into main memory as needed
for processing.
• The unit of information read from or written to disk is a page.
• The size of a page is a DBMS parameter; typical values
are 4KB and 8KB.
• The cost of page I/O (input from disk to main memory and
output from memory to disk) dominates the cost of typical
database operations, and database systems are carefully
optimized to minimize this cost.
• Disks are the most important external storage
devices. They allow us to retrieve any page at a (more
or less) fixed cost per page. However, if we read
several pages in the order in which they are stored
physically, the cost can be much less than the cost of
reading the same pages in a random order.
• Tapes are sequential access devices and force us to
read one page after the other. They are mostly used
to archive data that is not needed on a regular basis.
• Each record in a file has a unique identifier called a
record id (or rid). An rid has the property that we can
identify the disk address of the page containing the
record by using the rid.
• Data is read into memory for processing, and written to
disk for persistent storage, by a layer of software called
the buffer manager.
• When the files and access methods layer needs to
process a page, it asks the buffer manager to fetch the
page, specifying the page's rid.
• The buffer manager fetches the page from disk if it is not
already in memory.
• Space on disk is managed by the disk space manager,
according to the DBMS software architecture.
• When the files and access methods layer needs
additional space to hold new records in a file, it asks
the disk space manager to allocate an additional
disk page for the file. It also informs the disk space
manager when it no longer needs one of its disk pages.
• The disk space manager keeps track of the pages in use by
the file layer. If a page is freed by the file layer, the space
manager tracks this and reuses the space if the file layer
requests a new page later on.
• The layers involved in accessing data on external storage are,
from top to bottom:
Files and Access Methods
Buffer Manager
Disk Space Manager
Disk
FILE ORGANIZATION
Database (DB): a collection of files
File: a sequence of records
Record: a sequence of fields

In a database, the files are stored in sequential blocks using contiguous allocation.

THE PROCESS OF GETTING THE RECORD INTO THE CPU FOR PROCESSING (BUFFER MANAGER)

[Figure: hard disk (secondary memory) -> RAM / main memory (MM) -> CPU. The hard disk transfers a complete block to the MM.]

• File organization is
• how data is organized on the hard disk (HD),
• how we search for a record in HD,
• how records are deleted from and inserted into HD
in an efficient manner.
WHAT IS INDEXING AND WHY IS INDEXING USED
THE PROCESS OF GETTING THE RECORD INTO THE CPU FOR PROCESSING (BUFFER MANAGER)
Query: Select * from Student where RNO = 501
• Student details, i.e. the Student records, are stored permanently on the hard disk (HD).
• The query is processed by the CPU, so the data has to move from HD to RAM/MM.
• The HD is slow, while CPU speed is measured in MIPS.

[Figure: the HD is divided into logical blocks/pages B0, B1, B2, B3, B4, B5, B6, ..., Bn]
I/O COST IS REDUCED BY INDEXING (example: the index of a book)
Query: Select * from Student where RNO = 501
If each block holds 100 records and there are 10000 records in the student db, then the no. of blocks required is
10000/100 = 100.
Data can be stored in HD in two ways: sorted (ordered) or unsorted (unordered). Let us assume that our db is
stored in an unordered way in HD.
Our aim is to reduce the I/O cost, i.e. to bring fewer blocks into MM.
If the CPU wants to search for RNo. 501 in HD, each block is brought into MM in turn; if the record is not
found, the block is sent back and the next block is brought in for the search.
The number of blocks brought into MM is called the I/O cost.

[Figure: HD blocks Block 0, Block 1, Block 2, Block 3, ..., Block n are transferred one at a time to RAM for the CPU]
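The block-by-block search described above can be sketched as follows. This is only an illustration: the reversed storage order is an arbitrary stand-in for "unordered", and the function name blocks_read_linear is hypothetical.

```python
RECORDS_PER_BLOCK = 100

# 10000 roll numbers stored in an arbitrary (here: reversed) order.
# The reversed order is an assumption made only to get a fixed, unordered layout.
records = list(range(10000, 0, -1))

# Split the file into blocks of 100 records each: 10000 / 100 = 100 blocks.
blocks = [records[i:i + RECORDS_PER_BLOCK]
          for i in range(0, len(records), RECORDS_PER_BLOCK)]

def blocks_read_linear(blocks, target):
    """Bring blocks into memory one by one; return how many were read (the I/O cost)."""
    for count, block in enumerate(blocks, start=1):
        if target in block:
            return count
    return len(blocks)  # key absent: every block had to be read

print(len(blocks))                        # number of blocks in the file
print(blocks_read_linear(blocks, 501))    # I/O cost of finding RNO 501
print(blocks_read_linear(blocks, 20000))  # I/O cost for a key that is not stored
```

With no index, the worst case reads all 100 blocks, which is the cost that indexing tries to avoid.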
IMPLEMENTATION OF INDEXING
If our db is stored in ordered form, then the number of entries in the index file (IF) is equal either to the
number of blocks in HD or to the number of records in HD.

[Figure: an index file with columns SK and BP pointing to data blocks B0 (501, 502), B1 (503, 504), B2 (505, 506)]

SK - Search Key (may be R.No)

BP - Block Pointer
                 Key               Non key
Ordered file     Primary Index     Clustered Index
Unordered file   Secondary Index   Secondary Index

Main types of Indexing:

1. Primary Indexing
2. Clustered Indexing
3. Secondary Indexing
PRIMARY INDEX

• If the index is created on the basis of the primary key of the
table, then it is known as primary indexing. Primary keys
are unique to each record and have a 1:1 relationship with the
records.

• As primary keys are stored in sorted order, the
searching operation is quite efficient.

• The primary index can be classified into two types: Dense index
and Sparse index.
Dense index
• The dense index contains an index record for every search
key value in the data file. This makes searching faster.
• The number of records in the index table is the same as
the number of records in the main table.
• It needs more space to store the index records themselves. Each index
record holds the search key and a pointer to the actual
record on the disk.
Sparse index
• Index records appear for only a few items of the data file.
Each item points to a block.
• Instead of pointing to each record in the main table,
the index points to records in the main table at intervals.
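A small sketch can make the difference concrete. It assumes a sorted data file of three blocks holding the roll numbers 501 to 506 used in the index-file figure; the names dense_lookup and sparse_lookup are hypothetical.

```python
import bisect

# Sorted data file: three blocks of two records each (assumed layout).
blocks = [[501, 502], [503, 504], [505, 506]]

# Dense index: one entry per record -> (block number, offset inside the block).
dense_index = {key: (b, i)
               for b, block in enumerate(blocks)
               for i, key in enumerate(block)}

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [block[0] for block in blocks]

def dense_lookup(key):
    """Go straight to the record through its own index entry."""
    return dense_index.get(key)

def sparse_lookup(key):
    """Find the last index entry <= key, then scan that single block."""
    b = bisect.bisect_right(sparse_keys, key) - 1
    if b < 0:
        return None
    block = blocks[b]
    return (b, block.index(key)) if key in block else None

print(len(dense_index), len(sparse_keys))  # index sizes: records vs blocks
print(dense_lookup(504), sparse_lookup(504))
```

The dense index has as many entries as records, while the sparse index has as many entries as blocks and pays for the saving with a short scan inside one block.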
SECONDARY INDEX
In sparse indexing, as the size of the table grows, the size of the mapping also
grows. These mappings are usually kept in primary memory so that
address fetches are fast; the actual data is then read from secondary memory
using the address obtained from the mapping. If the mapping
grows too large, fetching the address itself becomes slower, and the
sparse index is no longer efficient. To overcome this problem, secondary
indexing is introduced.

In secondary indexing, another level of indexing is introduced to reduce the
size of the mapping. In this method, large ranges of the column values are
selected initially so that the first-level mapping stays small.
Each range is then divided into smaller ranges. The first-level mapping
is stored in primary memory, so that address fetches are fast. The
second-level mapping and the actual data are stored in secondary
memory (hard disk).
A secondary index is usually dense (i.e., it has
an entry for every record) because it must point
directly to each record that contains a specific
value in the indexed column. Each entry in the
secondary index contains:
• The value of the indexed attribute (e.g., department name, city).
• A pointer to the location (or address) of the record in the sequential file.
If we want to find the record of roll 211 in the diagram, the search first looks in the
first-level index for the largest entry that is smaller than or equal to 211. It
gets 200 at this level.
Then, in the second-level index, it again finds the largest entry <= 211 and gets 210.
Using the address for 210, it goes to the data block and scans each
record until it finds 211.
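The roll 211 lookup can be sketched as below. The index contents are assumptions chosen to reproduce the values 200 and 210 from the text: rolls 100 to 399, data blocks of 10 records, and first-level entries every 100 rolls. The helper names are hypothetical.

```python
import bisect

# Assumed layout: each data block holds 10 consecutive rolls, keyed by its first roll.
data_blocks = {start: list(range(start, start + 10))
               for start in range(100, 400, 10)}

# Second level: for each first-level range, the first roll of every block in it.
second_level = {h: list(range(h, h + 100, 10)) for h in range(100, 400, 100)}

# First level: one entry per large range (100, 200, 300), small enough for memory.
first_level = sorted(second_level)

def floor_entry(entries, key):
    """Largest entry that is <= key, or None if every entry is larger."""
    i = bisect.bisect_right(entries, key) - 1
    return entries[i] if i >= 0 else None

def find(roll):
    top = floor_entry(first_level, roll)               # e.g. 200 for roll 211
    if top is None:
        return None
    block_start = floor_entry(second_level[top], roll)  # e.g. 210 for roll 211
    for record in data_blocks[block_start]:             # scan the data block
        if record == roll:
            return top, block_start, record
    return None

print(find(211))
```

Only the small first level has to stay in memory; the second level and the data block are each reached with one further lookup.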
• Dense Indexing: In a dense index, there is an index
record for every search key value in the database. This
makes searching faster but requires more space to store
the index records themselves. Index records contain the search key
value and a pointer to the actual record on the disk.

• Sparse Indexing: In a sparse index, index records are not
created for every search key. An index record here
contains a search key and an actual pointer to the data
on the disk.
SOME OF THE SEARCHING TECHNIQUES

• Linear Search O(n)

Value:  9  6  7  10  3  12  5  8  15  1
Index:  0  1  2  3   4  5   6  7  8   9

• Binary Search O(log n)

Value:  1  3  5  6  7  8  9  10  12  15
Index:  0  1  2  3  4  5  6  7   8   9

Moving from linear search to binary search takes us from order n to order log n, but what we
actually want is order 1.
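The two techniques can be sketched on the arrays shown above. The function names are illustrative only.

```python
def linear_search(items, target):
    """Check every position in turn: O(n) comparisons."""
    for i, value in enumerate(items):
        if value == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """Halve the search range at each step: O(log n). Requires sorted input."""
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1

unsorted_keys = [9, 6, 7, 10, 3, 12, 5, 8, 15, 1]
sorted_keys = [1, 3, 5, 6, 7, 8, 9, 10, 12, 15]

print(linear_search(unsorted_keys, 12))  # position of 12 in the unsorted array
print(binary_search(sorted_keys, 12))    # position of 12 in the sorted array
```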
Idea behind creating hash indexing

• Keys: 9, 6, 7, 12, 15, 22…

Slot:  0  1  2  3  4  5  6  7  8  9  10  11  12
Key:   -  -  -  -  -  -  6  7  -  9  -   -   12

If we need to search for the key element 6, we go directly to index 6.
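A minimal sketch of this "key is its own index" idea, assuming a table with slots 0 to 12 as drawn above. The names table, does_not_fit and lookup are hypothetical.

```python
SLOTS = 13                      # slots 0..12, as in the table above
keys = [9, 6, 7, 12, 15, 22]    # keys listed in the notes

table = [None] * SLOTS
does_not_fit = []               # keys larger than the last slot number

for k in keys:
    if k < SLOTS:
        table[k] = k            # the key itself is used as the slot number
    else:
        does_not_fit.append(k)

def lookup(key):
    """One step, no scanning: look only at the slot numbered `key`."""
    return 0 <= key < SLOTS and table[key] == key

print(lookup(6), lookup(8))
print(does_not_fit)
```

Keys such as 15 and 22 have no slot of their own in a table this small, which is one reason a hash function is used to map keys into the available slots.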

HASH BASED INDEXING
• We can organize records using a technique called hashing to
quickly find records that have a given search key value.
• In this technique, the records in a file are grouped in buckets.
• A bucket consists of a primary page and possibly additional
pages linked in a chain.
• The bucket to which a record belongs is determined by
applying a special function, called a hash function, to the
search key.
• Given a bucket number, a hash-based index structure allows
us to retrieve the primary page for the bucket in one or
two disk I/Os.
• Index entries are partitioned into buckets in accordance
with a hash function h(v), where v ranges over search
key values.
• Each bucket is identified by an address 'a'.
• The bucket at address 'a' contains all index entries with search
key v such that h(v) = a.
• Each bucket is stored in a page (with a possible overflow chain).
• If the index entries contain rows, the set of buckets forms an
integrated storage structure; otherwise the set of buckets forms an
(unclustered) secondary index.
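HASH FUNCTIONS
1. Division
2. Mid square
3. Digit folding
4. Multiplication: h(K) = floor(TS * ((K * A) mod 1)), where TS is the table size
A = 0.6180
K = 23
TS = 10
h(23) = floor(10 * ((23 * 0.6180) mod 1)) = floor(10 * 0.214) = 2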
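The four methods can be sketched as below. The notes do not spell out the mid-square and folding variants, so the versions here (take the middle digit of the square; sum two-digit chunks) are assumptions, and all function names are hypothetical.

```python
import math

TABLE_SIZE = 10   # TS in the notes
A = 0.6180        # constant used by the multiplication method in the notes

def division_hash(k, size=TABLE_SIZE):
    """Division method: remainder of the key divided by the table size."""
    return k % size

def mid_square_hash(k, digits=1):
    """Mid-square method (assumed variant): middle digit(s) of k squared."""
    squared = str(k * k)
    start = (len(squared) - digits) // 2
    return int(squared[start:start + digits])

def folding_hash(k, chunk=2, size=TABLE_SIZE):
    """Digit folding (assumed variant): split digits into chunks, add, take mod."""
    s = str(k)
    parts = [int(s[i:i + chunk]) for i in range(0, len(s), chunk)]
    return sum(parts) % size

def multiplication_hash(k, size=TABLE_SIZE, a=A):
    """Multiplication method: floor(size * fractional part of k * a)."""
    return math.floor(size * ((k * a) % 1))

print(division_hash(23))        # 23 mod 10
print(mid_square_hash(23))      # middle digit of 529
print(folding_hash(123456))     # (12 + 34 + 56) mod 10
print(multiplication_hash(23))  # floor(10 * 0.214)
```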
WHAT IS COLLISION?
A hash collision occurs when the hashes of two or more keys in the
data set map to the same place in the hash table.
How to deal with a hash collision?
There are two techniques you can use to handle a hash
collision:
Rehashing: This method invokes a secondary hash function,
which is applied repeatedly until an empty slot is found,
where the record is then placed.
Chaining: This method builds a linked list of the items
whose keys hash to the same value. It requires
an extra link field for each table position.
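Chaining can be sketched as follows. This is a simplified in-memory illustration: a Python list stands in for the linked list, the hash function is the division method, and the class name ChainedHashTable is hypothetical.

```python
class ChainedHashTable:
    """Hash table that resolves collisions by chaining (a list per slot)."""

    def __init__(self, size=10):
        self.size = size
        self.buckets = [[] for _ in range(size)]  # one chain per table position

    def _hash(self, key):
        return key % self.size                    # division method (assumed)

    def insert(self, key, value):
        chain = self.buckets[self._hash(key)]
        for i, (existing_key, _) in enumerate(chain):
            if existing_key == key:               # key already stored: overwrite
                chain[i] = (key, value)
                return
        chain.append((key, value))                # collision or empty slot: extend chain

    def search(self, key):
        for existing_key, value in self.buckets[self._hash(key)]:
            if existing_key == key:
                return value
        return None

table = ChainedHashTable()
table.insert(12, "record for 12")
table.insert(22, "record for 22")   # 22 collides with 12: both hash to slot 2
print(table.search(22))
print(len(table.buckets[2]))
```

Both keys stay reachable after the collision; the cost is a short scan of the chain in slot 2.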
Searching a hash index for a given search key value v:
1) Evaluate h(v)
2) Fetch the bucket at h(v)
3) Search the bucket

The cost is the number of pages in the bucket (cheaper than a B+ tree if there are no overflow
chains).
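INDEXING IS OF TWO TYPES:
1. SINGLE LEVEL INDEXING
2. MULTILEVEL INDEXING

• 1. Single level indexing

[Figure: a single index file with columns SK and BP pointing to data blocks B0 (501, 502), B1 (503, 504), B2 (505, 506)]

SK - Search Key (may be R.No)

BP - Block Pointer

2. MULTILEVEL INDEXING

[Figure: root node -> internal node -> leaf node. Each index entry holds a data value and a pointer; the index file (columns SK and BP) points to data blocks B0 (501, 502), B1 (503, 504), B2 (505, 506)]

SK - Search Key (may be R.No)
BP - Block Pointer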
TREE-BASED INDEXING
1. An alternative to hash-based indexing is to organize records using a tree-like
structure.
2. The data entries are arranged in sorted order by search key value, and a
hierarchical search data structure is maintained that directs searches to the
correct page of data entries.
3. The lowest level of the tree, called the leaf level, contains the data entries.
4. This structure allows us to efficiently locate all the data entries with search
key values in a desired range.
5. All searches begin at the topmost node, called the root, and the contents of
pages in non-leaf levels direct searches to the correct leaf page.
6. Non-leaf pages contain node pointers separated by search key values.
7. The node pointer to the left of a key value k points to a
subtree that contains only data entries less than k.
8. The node pointer to the right of a key value k points to a
subtree that contains only data entries greater than or equal to k.
9. The number of I/Os incurred during a search is equal to the
length of a path from the root to a leaf, plus the number of leaf
pages with qualifying data entries.
10. The height of a balanced tree is the length of a path from
root to leaf.
11. The average number of children for a non-leaf node is
called the fan-out of the tree.
12. If every non-leaf node has n children, a tree of height h
has n^h leaf pages.
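The n^h relationship in point 12 can be checked with a short calculation. The fan-out of 100 is an assumed example value, and the function names are hypothetical.

```python
def leaf_pages(fan_out, height):
    """Leaf pages of a tree in which every non-leaf node has `fan_out` children."""
    return fan_out ** height

def height_needed(fan_out, pages):
    """Smallest height h such that fan_out ** h >= pages (integer arithmetic only)."""
    height, capacity = 0, 1
    while capacity < pages:
        capacity *= fan_out
        height += 1
    return height

print(leaf_pages(100, 3))             # leaf pages reachable with height 3
print(height_needed(100, 1_000_000))  # height needed for one million leaf pages
```

Because the height grows only logarithmically with the number of leaf pages, a search touches very few non-leaf pages even for a large file.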
EXAMPLE FOR B-TREE

• Create a B-tree of order (degree) m = 4
• Max no. of children = m = 4
• Max no. of keys = m - 1 = 4 - 1 = 3
• Min no. of children = m/2 (rounded up) = 2
• Min no. of keys = m/2 (rounded up) - 1 = 1
• Keys = 10, 20, 40, 50, 60, 70, 80, 33, 35, 5, 15

• S1: Insert 10, 20, 40
[10 20 40]

• S2: Insert 50. The node overflows (4 keys), so it splits and 40 moves up.
[40]
[10 20] [50]

• S3: Insert 60
[40]
[10 20] [50 60]

• S4: Insert 70
[40]
[10 20] [50 60 70]

• S5: Insert 80. The right leaf overflows, so it splits and 70 moves up.
[40 70]
[10 20] [50 60] [80]

• S6: Insert 33
[40 70]
[10 20 33] [50 60] [80]

• S7: Insert 35. The left leaf overflows, so it splits and 33 moves up.
[33 40 70]
[10 20] [35] [50 60] [80]

• S8: Insert 5
[33 40 70]
[5 10 20] [35] [50 60] [80]

• S9: Insert 15. The left leaf overflows and 15 moves up, giving a root of [15 33 40 70].
The root now overflows as well, so it splits and 40 becomes the new root.
[40]
[15 33] [70]
[5 10] [20] [35] [50 60] [80]

Figure labels on the final tree: key value, block pointer, record pointer.
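The insertion steps above can be reproduced with a small sketch. It assumes the rule used in the worked example: insert into the leaf first, and when a node holds more than m - 1 keys, split it and push up the key at position len(keys) // 2. The names Node, insert and levels are hypothetical, and record pointers are left out.

```python
import bisect

ORDER = 4  # m: max children per node, so at most ORDER - 1 keys

class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []  # empty list means the node is a leaf

def _insert(node, key):
    """Insert under `node`; return (median, right_node) if `node` split, else None."""
    i = bisect.bisect_left(node.keys, key)
    if not node.children:                      # leaf: place the key directly
        node.keys.insert(i, key)
    else:                                      # internal: descend into child i
        split = _insert(node.children[i], key)
        if split:
            median, right = split
            node.keys.insert(i, median)
            node.children.insert(i + 1, right)
    if len(node.keys) > ORDER - 1:             # overflow: split around the median
        mid = len(node.keys) // 2
        median = node.keys[mid]
        right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys = node.keys[:mid]
        node.children = node.children[:mid + 1]
        return median, right
    return None

def insert(root, key):
    """Insert a key and return the (possibly new) root."""
    split = _insert(root, key)
    if split:                                  # root split: the tree grows by one level
        median, right = split
        return Node([median], [root, right])
    return root

def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, level = [], [root]
    while level:
        out.append([n.keys for n in level])
        level = [c for n in level for c in n.children]
    return out

root = Node()
for k in [10, 20, 40, 50, 60, 70, 80, 33, 35, 5, 15]:
    root = insert(root, k)

for level in levels(root):
    print(level)
```

Printing the levels after each insertion, rather than only at the end, shows the intermediate trees of steps S1 to S9.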
CONSTRUCT B-TREE OF ORDER 4
• Keys = 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
