
DB_CH5

Chapter 5 discusses the physical database design process, transforming logical data models into technical specifications for data storage and retrieval. It covers inputs needed for design, including business requirements, data characteristics, and operational needs, as well as various file organizations and indexing types. The chapter emphasizes the importance of considering hardware and software characteristics, and outlines decisions related to attribute data types, file storage, and query optimization.

Uploaded by

ABEY BEKELE

Chapter-5

Physical Database Design

Prepared by: Marta G. (MSc.)


Contents
▪ Introduction
▪ Physical Database Design Process in Relational Databases
  ◦ Inputs to Physical Database Design Process
    ▪ Tables produced by logical database design
    ▪ Business environment requirements
    ▪ Data characteristics
    ▪ Application characteristics
    ▪ Operational requirements
    ▪ Hardware and software characteristics
  ◦ Attribute data types
  ◦ Data Storage & Operations on Files
    ▪ Files of unordered records (Heap Files)
    ▪ Files of ordered records (Sorted Files)
    ▪ Hashing techniques
  ◦ Indexing Types
    ▪ Types of single-level ordered indexes
    ▪ Multilevel indexes
    ▪ Dynamic multilevel indexes using B-Trees & B+ Trees
    ▪ Indexes on multiple keys
  ◦ Query Optimization
Revision

Conceptual model vs Logical model vs Physical model


▪ An ER model is typically drawn at up to three levels of abstraction:
• Conceptual ERD / Conceptual Data Model
• Logical ERD / Logical Data Model
• Physical ERD / Physical Data Model
Revision cont…
Conceptual model
▪ Conceptual ERD models the business objects that should exist in a system and the
relationships between them.
▪ A conceptual model is developed to present an overall picture of the system by recognizing
the business objects involved. It defines what entities exist, NOT which tables.
Revision cont…

Logical ERD
▪ A logical ERD is a more detailed version of a conceptual ERD.
▪ A logical ER model enriches the conceptual model by explicitly defining the columns of each entity and by introducing operational and transactional entities.
Revision cont…
Physical ERD
▪ It represents the actual design blueprint of a relational database. It elaborates on the logical data model by assigning each column a data type, length, nullability, etc.
▪ Since a physical ERD represents how data should be structured and related in a specific DBMS, it is important to consider the conventions and restrictions of the actual database system in which the database will be created.
▪ Make sure the column types are supported by the DBMS and that reserved words are not used to name entities and columns.
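To make the idea of a physical model concrete, here is a minimal sketch using Python's built-in sqlite3 module (chosen only because it ships with Python). The tables and columns (customer, customer_order, full_name, etc.) are hypothetical. SQLite records declared types such as VARCHAR(100) but does not enforce the length, which is exactly the kind of DBMS-specific behaviour a designer has to check. It does reject the unquoted reserved word ORDER as a table name.

```python
import sqlite3

# Hypothetical physical model: every column gets a concrete type,
# a length where relevant, and an explicit NULL / NOT NULL decision.
DDL = """
CREATE TABLE customer (
    customer_id  INTEGER      NOT NULL PRIMARY KEY,
    full_name    VARCHAR(100) NOT NULL,
    email        VARCHAR(255),
    created_on   DATE         NOT NULL
);
CREATE TABLE customer_order (
    order_id     INTEGER       NOT NULL PRIMARY KEY,
    customer_id  INTEGER       NOT NULL REFERENCES customer (customer_id),
    total_amount DECIMAL(10,2) NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# PRAGMA table_info rows are (cid, name, type, notnull, default, pk)
for _cid, name, col_type, notnull, _default, _pk in conn.execute("PRAGMA table_info(customer)"):
    print(f"{name:<12} {col_type:<13} {'NOT NULL' if notnull else 'NULL'}")

# ORDER is an SQL reserved word, so it cannot be used unquoted as a table name
reserved_word_rejected = False
try:
    conn.execute("CREATE TABLE order (order_id INTEGER)")
except sqlite3.OperationalError:
    reserved_word_rejected = True
print("reserved word rejected:", reserved_word_rejected)
```

Naming the table customer_order rather than quoting "order" avoids having to quote the name in every query.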
Revision cont…

Summary
Introduction
▪ Physical database design is the process of transforming logical data models into
physical data models.
▪ Its purpose is to translate the logical description of data into technical specifications for storing and retrieving data.
▪ Information needed for physical file and database design includes:
▪ Normalized relations plus their size estimates
▪ Expectations and requirements for response time, data security, backup & recovery,
retention and integrity
▪ Descriptions of where and when data are used, entered, retrieved, deleted, updated, and
how often
▪ Descriptions of the technologies used to implement the database
▪ Definitions of attributes in the tables
Physical Database Design Process: Inputs
✓ Tables produced by logical database design
◦ Normalized tables
✓ Business environment requirements
◦ Response time: the delay between pressing the Enter key to execute a query and the result appearing on the screen.
◦ Throughput: how many queries from simultaneous users the application, and the database that supports it, must answer in a given period of time.
✓ Data characteristics
◦ Data volume assessment: how much data will the database hold, i.e., roughly how many records is each table expected to have?
◦ Data volatility: how often stored data is inserted, updated or deleted.
Cont…
✓ Application characteristics
◦ Which applications are the most important to the company?
◦ What is the nature of the applications that will use the data?
◦ Which data will be accessed by each application?
◦ Application data requirements
◦ Application priorities
Cont…
▪ Operational requirements
◦ Security: protecting data from theft or malicious destruction and making sure that sensitive data is accessible only to authorized users.
◦ Backup & recovery:
▪ Backup: a copy of the entire database, kept so that the data is safe if the original is lost.
▪ Recovery: the ability to restore a table or database that has been corrupted or lost due to a hardware or software failure.
▪ Hardware and software characteristics
◦ DBMS characteristics: attribute data type options and SQL query features, which must be known and taken into account during physical database design.
◦ Hardware characteristics: processor speeds and disk data transfer rates.
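The backup-and-recovery requirement can be illustrated with a minimal sketch, again assuming Python's built-in sqlite3 module, whose Connection.backup() method copies a whole database into another connection. The account table is hypothetical, and in-memory databases stand in for what would be separate files or sites in practice.

```python
import sqlite3

# Hypothetical live database holding account balances
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, balance REAL)")
live.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 1250.5)])
live.commit()

# Backup: copy the entire database to a separate target.
# In practice the target would be a file on another disk or site, e.g. sqlite3.connect("backup.db").
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate data loss in the live database
live.execute("DELETE FROM account")
live.commit()

# Recovery: copy the saved database back over the damaged one
backup.backup(live)
print(live.execute("SELECT COUNT(*) FROM account").fetchone()[0])  # 2
```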
Things to be analyzed by Database Designers cont…

▪ Database queries & transactions
▪ The expected frequency of invocation of queries & transactions
▪ The time constraints of queries & transactions
▪ The expected frequencies of update operations
▪ The uniqueness constraints on attributes
Decisions
▪ After analyzing the inputs above, the designer makes the following physical database design decisions:
▪ Attribute data types
▪ Physical record and data/file storages
▪ File organizations
▪ Indexes
▪ Query optimization
Choosing Attribute Data Types
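▪ CHAR(n) - fixed-length character strings
▪ VARCHAR(n) - variable-length character strings
▪ Numeric types - positive/negative numbers
◦ Exact
▪ SMALLINT → 16 bit
▪ INT → 32 bit
▪ BIGINT (LONG in some programming languages) → 64 bit
▪ DECIMAL(p, s) / NUMERIC(p, s) → exact fixed-point values; storage size depends on the declared precision p
◦ Approximate (floating point)
▪ REAL → 32 bit (single precision) with ≈ 6-9 significant digits
▪ FLOAT → 32 or 64 bit depending on the DBMS (FLOAT(p) lets the precision be chosen)
▪ DOUBLE (DOUBLE PRECISION) → 64 bit with ≈ 15-17 significant digits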
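The practical difference between these types can be seen with a small Python sketch. It uses the standard struct module to round a value through a 32-bit IEEE 754 float (roughly what a REAL column stores), Python's own 64-bit float as the DOUBLE case, and decimal.Decimal as a stand-in for an exact DECIMAL/NUMERIC column. The sample value is hypothetical.

```python
import struct
from decimal import Decimal


def as_float32(x: float) -> float:
    """Round-trip x through a 32-bit IEEE 754 value, as a single-precision column would store it."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


price = 1234567.891                  # hypothetical value with 10 significant digits

single = as_float32(price)           # 32 bit keeps only about 7 significant digits
double = price                       # Python floats are 64-bit doubles (about 15-17 digits)
print(f"{single:.3f}")               # 1234567.875
print(f"{double:.3f}")               # 1234567.891

# Exact types (DECIMAL/NUMERIC) avoid binary rounding, which matters for money
print(0.1 + 0.2 == 0.3)                                   # False
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True

# SMALLINT is 16 bit: values outside -32768..32767 do not fit
try:
    struct.pack("<h", 40000)
except struct.error:
    print("40000 does not fit in a 16-bit SMALLINT")
```

This is why monetary amounts are usually declared as DECIMAL rather than REAL or FLOAT.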
Physical Records and Data/File Storage
▪ Physical record: a group of fields stored in adjacent memory locations and retrieved together as a unit whenever necessary.
▪ Page: the amount of data read or written in one input/output (I/O) operation.
▪ Blocking factor: the number of physical records per page.
▪ The collection of data that makes up a computerized database must be stored physically on some computer storage medium.
▪ The database management system (DBMS) allows users to create, retrieve, update, delete and process this data whenever needed.
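The blocking factor also drives a rough data-volume estimate. Assuming fixed-length records that do not span pages (an assumption; real DBMSs also reserve space for page headers and slot directories), bfr = ⌊page size / record size⌋ and the file needs ⌈number of records / bfr⌉ pages. The sizes below are hypothetical.

```python
import math


def blocking_factor(page_size: int, record_size: int) -> int:
    """Whole fixed-length records that fit on one page (records do not span pages)."""
    return page_size // record_size


def pages_needed(num_records: int, bfr: int) -> int:
    """Pages required for the file; the last page may be only partly full."""
    return math.ceil(num_records / bfr)


# Hypothetical figures for a data-volume estimate
PAGE_SIZE = 4096      # bytes read or written in one I/O
RECORD_SIZE = 100     # bytes per physical record
NUM_RECORDS = 30_000  # expected rows in the table

bfr = blocking_factor(PAGE_SIZE, RECORD_SIZE)
pages = pages_needed(NUM_RECORDS, bfr)
wasted = PAGE_SIZE - bfr * RECORD_SIZE   # unused bytes at the end of each page

print(bfr, pages, wasted)  # 40 750 96
```

A full scan of this table would therefore cost about 750 page reads, which is the kind of figure used when deciding whether an index is worthwhile.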
Data/File Storage: Categories
Three main storage categories:
▪ Primary storage

▪ Secondary storage

▪ Tertiary storage
Con’t…
Primary storage
▪ It includes storage media whose data can be operated on directly by the computer's central processing unit (CPU).
▪ Primary storage includes:
▪ The computer main memory (RAM) and
▪ Smaller but faster cache memories.
▪ Primary storage usually provides fast read and write access to data, but its storage capacity is limited.
▪ Primary storage devices are more expensive per byte.
▪ The contents of main memory are lost when a power failure, system crash or similar problem occurs (main memory is volatile).
Cont…
Secondary storage
▪ Includes large storage devices such as hard disk drives (HDDs) and solid-state drives (SSDs).
▪ Compared with primary storage, these devices usually have
▪ A larger capacity,
▪ Lower cost, and
▪ Slower access to data.
▪ Data in secondary storage cannot be processed directly by the CPU; it must first be copied into primary storage.
▪ They are called online storage devices because they can be accessed in a short period of time whenever needed.
Tertiary storage
▪ Optical disks (CD-ROMs, DVDs and similar storage media) and magnetic tapes, which are removable media, are used in today’s systems as offline storage for archiving databases.
Memory Hierarchies and Storage Devices
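Memory hierarchies (figure)
Cont…
▪ Storage capacity, access speed and cost compared across the levels: moving from primary to secondary to tertiary storage, capacity increases while access speed and cost per byte decrease.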
Data Storage in Databases
▪ Data stored in databases is either:
◦ Persistent data
◦ Temporary data
▪ Databases typically store large amounts of data that must persist over long periods of time.
◦ This is why such data is often referred to as persistent data.
▪ Parts of this data are accessed and processed repeatedly.
▪ Other parts may be accessed and processed very rarely.
▪ Transient/temporary data persists for only a limited time during program execution.
Cont…

▪ Typical database applications need only a small portion of the database at a time for processing.
▪ When a specific operation on data is needed, the data must first be located on disk → copied to main memory to be ready for processing → processed → written back to disk if its content was changed.
▪ The data stored on disk is organized as files of records.
▪ Each record is a collection of data values that can be interpreted as facts about entities, their attributes and their relationships.
▪ Records should be stored on disk in a manner that makes it possible to locate them efficiently when they are needed.
Operations on files
Operations on files are usually grouped into:
▪ Retrieval operations → do not change any data in the file, but only locate certain records so that their field values can be examined and processed.
▪ Update operations → change the file by insertion or deletion of records or by modification of field values.
▪ In either case, we may have to select one or more records for retrieval, deletion or modification based on a selection condition (or filtering condition), which specifies the criteria that the desired record or set of records must satisfy.
Cont…
▪ Each located record in the given file/table is checked to
determine whether it satisfies the full selection condition.
▪ Actual operations for locating and accessing file records may
vary from system to system.
▪ Typically, high-level programs, such as DBMS software
programs, access records by using certain commands, such as:
▪ Open, Reset, Find/Locate, Read/Get, FindNext, Delete, Modify,
Insert, Close, Scan....
Cont…
▪ Open→ prepares the file for reading or writing.
▪ Reset→ sets the file pointer of an open file to the beginning
of the file.
▪ Find/ Locate→ searches for the first record that satisfies a
search condition.
▪ Read/Get→ copies the current record from the buffer to a
program variable in the user program.
▪ FindNext→ searches for the next record in the file that
satisfies the search condition.
▪ Delete→ deletes the current record and (eventually) updates
the file on disk to reflect the deletion.
Cont…
▪ Modify→ modifies some field values for the current record
and (eventually) updates the file on disk to reflect the
modification.
▪ Insert→ inserts a new record in the file by locating the block
where the record is to be inserted, transferring that block into
a main memory buffer (if it is not already there), writing the
record into the buffer, and (eventually) writing the buffer to
disk to reflect the insertion.
▪ Close→ completes the file access by releasing the buffers
and performing any other needed cleanup operations.
▪ Scan → if the file has just been opened or reset, scan returns
the first record; otherwise it returns the next record.
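The record-at-a-time operations above can be sketched as a toy Python class. Everything here is hypothetical: the class RecordFile, its method names (which simply mirror Open, Reset, Find, FindNext, Read, Modify, Delete, Insert, Close and Scan) and the sample records. A Python list stands in for the disk blocks and buffers, and a deleted record is marked with None rather than physically removed.

```python
from typing import Callable, Optional

Record = dict
Condition = Callable[[Record], bool]


class RecordFile:
    """Hypothetical toy file of records exposing record-at-a-time operations.

    A Python list stands in for the disk blocks; a deleted slot holds None
    (a deletion marker) instead of the remaining records being shifted.
    """

    def __init__(self, records: list[Record]):
        self._slots: list[Optional[Record]] = [dict(r) for r in records]
        self._pos = -1                       # current record; -1 means "before the first"
        self._cond: Optional[Condition] = None
        self._is_open = False

    def open(self) -> None:                  # Open: prepare the file for use
        self._is_open = True
        self.reset()

    def reset(self) -> None:                 # Reset: back to the beginning of the file
        self._pos = -1

    def _next_match(self, start: int, cond: Optional[Condition]) -> Optional[Record]:
        # Check each remaining, non-deleted record against the full selection condition
        for i in range(start, len(self._slots)):
            rec = self._slots[i]
            if rec is not None and (cond is None or cond(rec)):
                self._pos = i
                return rec
        self._pos = len(self._slots)         # past the end: no current record
        return None

    def find(self, cond: Condition) -> Optional[Record]:   # Find: first match
        self._cond = cond
        return self._next_match(0, cond)

    def find_next(self) -> Optional[Record]:                # FindNext: next match
        return self._next_match(self._pos + 1, self._cond)

    def scan(self) -> Optional[Record]:                     # Scan: next record, no condition
        return self._next_match(self._pos + 1, None)

    def read(self) -> Record:                # Read/Get: copy current record to a variable
        return dict(self._slots[self._pos])

    def modify(self, **changes) -> None:     # Modify: change fields of the current record
        self._slots[self._pos].update(changes)

    def delete(self) -> None:                # Delete: mark the current record as deleted
        self._slots[self._pos] = None

    def insert(self, record: Record) -> None:  # Insert: heap-style append at the end
        self._slots.append(dict(record))

    def close(self) -> None:                 # Close: release state
        self._is_open = False
        self._cond = None


employees = RecordFile([
    {"ssn": 1, "name": "Abebe", "dept": 5},
    {"ssn": 2, "name": "Sara", "dept": 4},
    {"ssn": 3, "name": "Hana", "dept": 5},
])
employees.open()
rec = employees.find(lambda r: r["dept"] == 5)   # selection condition: dept = 5
while rec is not None:
    print(employees.read()["name"])              # Abebe, then Hana
    rec = employees.find_next()
employees.close()
```

Marking deletions instead of shifting records keeps Delete cheap; a real file would reclaim the space later during reorganization.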
Cont…

▪ At this point, it is worthwhile to note the difference between the terms file organization and access method.
▪ File organization → refers to the organization of the data of a file into records, blocks and access structures, including the way records and blocks are placed on the storage medium and interlinked.
▪ Access method → provides a group of operations (such as those listed above) that can be applied to a file to retrieve or update its records.
File Organizations
▪ There are several primary file organization types, which
determine how the file records are physically placed on the disk.
▪ Heap file (unordered file) → places the records on disk in no particular
order → by appending new records at the end of the file.
▪ Sorted file (sequential file) → keeps the records ordered by the value of a
particular field (called the sorting key).
▪ Hashed file → uses a hash function applied to a particular field (called the
hash key) to determine a record’s placement on disk.
▪ Secondary organization or auxiliary access structures → allow efficient access to file records based on fields other than those used for the primary file organization.
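The three primary organizations differ mainly in how a record is found. The Python sketch below is an illustration under simple assumptions: lists stand in for disk files, the EMPLOYEE records are hypothetical, and the hash function is a plain division-remainder function chosen only for the example. A heap file needs a linear scan, a sorted file allows binary search on the sorting key, and a hashed file goes straight to one bucket.

```python
import bisect

# Hypothetical EMPLOYEE records: (ssn, name), in arrival order
records = [(305, "Sara"), (112, "Abebe"), (478, "Hana"), (221, "Kebede"), (390, "Meron")]

# Heap file: new records are appended; a search must scan record by record
heap = list(records)


def heap_search(key):
    probes = 0
    for ssn, name in heap:
        probes += 1
        if ssn == key:
            return name, probes
    return None, probes


# Sorted file: ordered on the sorting key (ssn), so binary search applies
sorted_file = sorted(records)
sorted_keys = [ssn for ssn, _ in sorted_file]


def sorted_search(key):
    i = bisect.bisect_left(sorted_keys, key)
    if i < len(sorted_keys) and sorted_keys[i] == key:
        return sorted_file[i][1]
    return None


# Hashed file: a hash function on the hash key (ssn) picks the bucket directly
NUM_BUCKETS = 3


def h(key):
    return key % NUM_BUCKETS   # assumed division-remainder hash function


buckets = [[] for _ in range(NUM_BUCKETS)]
for ssn, name in records:
    buckets[h(ssn)].append((ssn, name))


def hashed_search(key):
    for ssn, name in buckets[h(key)]:   # only one bucket is examined
        if ssn == key:
            return name
    return None


print(heap_search(390))     # ('Meron', 5): every record was examined
print(sorted_search(390))   # Meron
print(hashed_search(390))   # Meron
```

The trade-off shows up on insertion: the heap file just appends, the sorted file must keep its order, and the hashed file must place the record in the right bucket.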
Decisions to be done: Indexing (see the Decisions list above)
▪ Indexes are a powerful tool used in the background of a database to speed up querying.
▪ Indexes speed up queries by providing a way to quickly look up the requested data.
▪ Simply put, an index stores search-key values together with pointers to the table rows (or blocks) that contain them.

▪ Design decisions about indexing
▪ Whether to index an attribute: the general rules for creating an index on an attribute are that
◦ The attribute must either be a key, or
◦ There must be some query that uses that attribute either in a selection or a join condition.
▪ What attribute or attributes to index on: an index can be constructed on a single attribute, or on more than one attribute if it is a composite index.
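The effect of these decisions can be observed with a minimal sketch in Python's built-in sqlite3 module. The employee table, its data and the index names (idx_emp_dept, idx_emp_dept_salary) are hypothetical, and the exact wording of SQLite's EXPLAIN QUERY PLAN output varies between SQLite versions. Before the index exists the query is answered by a full table scan; afterwards SQLite searches the index. The second index is a composite index on two attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMPLOYEE table with 1,000 rows spread over 10 departments
conn.execute("CREATE TABLE employee (ssn INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(i, f"emp{i}", i % 10, 1000 + i) for i in range(1, 1001)],
)


def plan(sql: str) -> str:
    """Return SQLite's query plan; the last column of each row is the readable detail."""
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT name FROM employee WHERE dept_id = 3"
before = plan(query)          # no index on dept_id yet: a full table scan

# dept_id appears in a selection condition, so it is a candidate for an index
conn.execute("CREATE INDEX idx_emp_dept ON employee (dept_id)")
after = plan(query)           # now a search using idx_emp_dept

# Composite index on two attributes, for queries that filter on both
conn.execute("CREATE INDEX idx_emp_dept_salary ON employee (dept_id, salary)")
composite = plan("SELECT name FROM employee WHERE dept_id = 3 AND salary > 1500")

print(before)
print(after)
print(composite)
```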
Design decisions about indexing Con’t…
▪ Whether to set up a clustered index: at most one index per table can be a primary or clustering index.
▪ In many RDBMSs, this is specified by the keyword CLUSTER. (If the attribute is a key, a primary index is created, whereas a clustering index is created if the attribute is not a key.)
▪ If a table requires several indexes, the decision about which one should be the primary or clustering index depends on whether keeping the table ordered on that attribute is needed.
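A clustering index on a non-key ordering field can be sketched as follows. This is a simplified illustration, not a DBMS implementation: the blocking factor of 3 and the employee data are hypothetical, and a Python dict stands in for the ordered index file, which a real system would search with binary search. The index holds one entry per distinct dept_no, pointing to the first block that contains that value; because the file is ordered on dept_no, the remaining matches follow in consecutive blocks.

```python
BLOCK_SIZE = 3  # records per block (hypothetical blocking factor)

# File physically ordered on the non-key field dept_no (the clustering field)
records = [(1, "Abebe"), (1, "Sara"), (2, "Hana"), (2, "Kebede"), (2, "Meron"),
           (3, "Lulit"), (4, "Dawit"), (4, "Selam")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Clustering index: one entry per DISTINCT dept_no -> first block holding that value
cluster_index = {}
for b, block in enumerate(blocks):
    for dept_no, _ in block:
        cluster_index.setdefault(dept_no, b)


def employees_in(dept_no):
    """Jump to the first block for dept_no, then read forward while the value matches."""
    if dept_no not in cluster_index:
        return []
    result = []
    for block in blocks[cluster_index[dept_no]:]:
        for d, name in block:
            if d == dept_no:
                result.append(name)
            elif d > dept_no:
                return result  # file is ordered, so no later block can match
            # d < dept_no: earlier values sharing the first block are skipped
    return result


print(cluster_index)      # {1: 0, 2: 0, 3: 1, 4: 2}
print(employees_in(2))    # ['Hana', 'Kebede', 'Meron']
```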
Cont…
▪ Design decisions about indexing
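▪ Whether to use a hash index over a tree index: most RDBMSs use B+-trees for indexing by default. B+-trees support both equality and range queries on the search key, whereas hash indexes work well only for equality conditions.
▪ Whether to use dynamic hashing for the file: for files that are very volatile, i.e., those that grow and shrink continuously, one of the dynamic hashing schemes would be suitable.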
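The hash-versus-tree decision comes down to the kinds of conditions queries use. In this Python sketch, a dict plays the role of a hash index and a sorted list of (salary, ssn) entries plays the role of the ordered leaf level of a B+-tree. Both are simplifications, and the salary data is hypothetical. Equality lookups are cheap with hashing, but a range condition forces the hash index to examine every entry, while the ordered index jumps to the first qualifying key and reads forward.

```python
import bisect

# Hypothetical data: ssn -> salary
salaries = {101: 4200, 102: 3100, 103: 5800, 104: 2500, 105: 4700}

# Hash index on salary: a dict maps each salary value to the ssns having it
hash_index: dict[int, list[int]] = {}
for ssn, sal in salaries.items():
    hash_index.setdefault(sal, []).append(ssn)

# Ordered index on salary: sorted (salary, ssn) entries, like the leaf level of a B+-tree
ordered_index = sorted((sal, ssn) for ssn, sal in salaries.items())
ordered_keys = [sal for sal, _ in ordered_index]


def equality_hash(value: int) -> list[int]:
    return hash_index.get(value, [])                 # one bucket lookup


def range_hash(low: int, high: int) -> list[int]:
    # Hashing scatters keys, so every entry has to be examined for a range
    return sorted(ssn for sal, ssns in hash_index.items() if low <= sal <= high for ssn in ssns)


def range_ordered(low: int, high: int) -> list[int]:
    # Jump to the first key >= low, then read consecutive entries up to high
    start = bisect.bisect_left(ordered_keys, low)
    end = bisect.bisect_right(ordered_keys, high)
    return sorted(ssn for _, ssn in ordered_index[start:end])


print(equality_hash(5800))         # [103]
print(range_hash(3000, 5000))      # [101, 102, 105]
print(range_ordered(3000, 5000))   # [101, 102, 105]
```

Both approaches return the same answer; the difference is how much of the index must be read to get it.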
Index Types
▪ Index
▪ Dense
▪ Sparse
▪ Single Level
▪ Multilevel
▪ Single level ordered indexes
▪ Primary index,
▪ Secondary, and
▪ Clustering
▪ Multilevel indexes
▪ Search tree
▪ B-tree
▪ B+-tree
Dense Vs Sparse Index
Dense:
▪ An index entry is created for every search-key value, i.e., for every record.
▪ Each entry contains the search-key value and a pointer to the actual record.
▪ Large index size
▪ Less time needed to locate arbitrary data
Sparse:
▪ One index entry for each block.
▪ Index entries exist for only some of the data records (typically the first record of each block).
▪ Small index size
▪ More time needed to locate arbitrary data
▪ Data records must be sorted (clustered) on the search key and arranged in blocks
▪ Faster writes, since fewer index entries must be maintained
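The two index styles can be compared on a tiny sorted file. In this Python sketch the records and the block size of 2 are hypothetical, and lists stand in for the data and index files. The dense index has one entry per record. The sparse index has one entry per block (the block's first, or anchor, key), so a lookup finds the last anchor not greater than the search key and then searches inside that single block.

```python
import bisect

BLOCK_SIZE = 2  # records per block (hypothetical)

# Data file sorted on the search key (required for a sparse index)
records = [(10, "A"), (20, "B"), (30, "C"), (40, "D"), (50, "E"), (60, "F")]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Dense index: one (key, block number) entry for EVERY record
dense = [(key, b) for b, block in enumerate(blocks) for key, _ in block]
dense_keys = [k for k, _ in dense]

# Sparse index: one entry per BLOCK, holding the block's first (anchor) key
sparse = [(block[0][0], b) for b, block in enumerate(blocks)]
sparse_keys = [k for k, _ in sparse]


def lookup_dense(key):
    i = bisect.bisect_left(dense_keys, key)
    if i < len(dense_keys) and dense_keys[i] == key:
        return dict(blocks[dense[i][1]])[key]   # the entry points straight at the record's block
    return None


def lookup_sparse(key):
    i = bisect.bisect_right(sparse_keys, key) - 1   # last anchor <= key
    if i < 0:
        return None
    for k, value in blocks[sparse[i][1]]:           # search inside that one block
        if k == key:
            return value
    return None


print(len(dense), len(sparse))            # 6 3
print(lookup_dense(40), lookup_sparse(40))  # D D
```

The sparse index is half the size here, at the cost of a short search inside the block, which matches the size/time trade-off listed above.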
Cont…
Dense indexing (figure)
Sparse indexing (figure)
Reading assignment (have at least a high-level understanding of the following)
▪ Primary index
▪ Clustering index
▪ Secondary index
▪ Multilevel index (search tree, B-tree, B+-tree)
▪ Dynamic multilevel indexes using B-Trees & B+ Trees
▪ Indexes on multiple keys
▪ Query optimization
Thank you !!!
