unit ii to v dbms b.com

The document discusses database integrity and normalization, explaining concepts such as functional dependency, normalization forms (1NF, 2NF, 3NF, BCNF), and their significance in reducing data redundancy and anomalies. It also covers different types of file organizations, including sequential, heap, hash, B+ tree, and clustered file organizations, detailing their advantages and disadvantages. Overall, the document provides a comprehensive overview of how to structure and manage data effectively in databases.

Uploaded by

dhamoder

UNIT-II: DATABASE INTEGRITY AND NORMALISATION:

1. FUNCTIONAL DEPENDENCY
• A functional dependency (FD) is a constraint between two sets of attributes in a relation.
It says that if two tuples have the same values for attributes A1, A2, ..., An, then those
two tuples must also have the same values for attributes B1, B2, ..., Bm.
• A functional dependency is represented by an arrow sign (→), that is, X→Y, where X
functionally determines Y. The left-hand side attributes determine the values of the
attributes on the right-hand side.
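The definition above can be checked against sample data. The sketch below is only an illustration: the function name fd_holds and the sample rows are assumptions, not part of the notes. Note that a check on sample rows can only show that an FD is not violated by that data; the FD itself is a rule about every valid state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (the tuples of the relation);
    lhs and rhs are lists of attribute names.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample data, for illustration only.
employees = [
    {"EmpNum": 111, "DeptNum": 10, "DeptName": "Sales"},
    {"EmpNum": 222, "DeptNum": 10, "DeptName": "Sales"},
    {"EmpNum": 333, "DeptNum": 20, "DeptName": "Accounts"},
]

print(fd_holds(employees, ["DeptNum"], ["DeptName"]))  # DeptNum -> DeptName
print(fd_holds(employees, ["DeptName"], ["EmpNum"]))   # DeptName -> EmpNum
```

In this sample, DeptNum→DeptName is not violated, while DeptName→EmpNum is violated because two employees share the department name "Sales".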
Fully Functional Dependency:
If X and Y are attribute sets of a relation, Y is fully functionally dependent on X if Y is
functionally dependent on X but not on any proper subset of X.
Partial Functional Dependency:
A functional dependency X→Y is a partial dependency if Y is functionally dependent on
X and Y can also be determined by some proper subset of X.
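The difference between a full and a partial dependency can be sketched in code by testing every non-empty proper subset of X. The helper names and the invoice rows below are assumptions made for illustration, and the check is only as good as the sample data it is run on.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_fully_dependent(rows, lhs, rhs):
    """True if lhs -> rhs holds and no non-empty proper subset of lhs determines rhs."""
    if not fd_holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if fd_holds(rows, list(subset), rhs):
                return False  # a proper subset determines rhs: partial dependency
    return True


# Hypothetical invoice lines, for illustration only.
inv_lines = [
    {"InvNum": 1, "LineNum": 1, "Qty": 5, "InvDate": "2024-01-05"},
    {"InvNum": 1, "LineNum": 2, "Qty": 2, "InvDate": "2024-01-05"},
    {"InvNum": 2, "LineNum": 1, "Qty": 7, "InvDate": "2024-01-09"},
]

key = ["InvNum", "LineNum"]
print(is_fully_dependent(inv_lines, key, ["Qty"]))      # full dependency
print(is_fully_dependent(inv_lines, key, ["InvDate"]))  # partial: InvNum alone is enough
```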
2. NORMALIZATION
Normalization is a technique for identifying anomalies in existing relations and removing
them by dividing a relation into smaller relations.
Advantages of Normalization:
• Requires less storage space.
• Reduces data redundancy in a database.
• Eliminates serious manipulation anomalies (insertion, update and deletion anomalies).
First Normal Form (1NF):
A relation is in 1NF if the intersection of each row and column contains one and only one
value.
Example (not in 1NF, because EmpDegrees holds a set of values):
EmpNum   EmpPhone       EmpDegrees
111      040-23840112
222      040-23987654   { BA, BSc, PhD }
333      040-23456789   { BSc, MSc }
Transformation into 1NF:
Employee(EmpNum, EmpPhone)
EmpNum   EmpPhone
111      040-23840112
222      040-23987654
333      040-23456789
EmployeeDegree(EmpNum, EmpDegrees)
EmpNum   EmpDegrees
222      BA
222      BSc
222      PhD
333      BSc
333      MSc
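The 1NF transformation of this example can be sketched as a small function that splits the multi-valued EmpDegrees column into its own relation. The function name to_first_normal_form is an assumption; the data is taken from the example table.

```python
def to_first_normal_form(rows):
    """Split (EmpNum, EmpPhone, [degrees]) rows into two 1NF relations."""
    employee = []         # Employee(EmpNum, EmpPhone)
    employee_degree = []  # EmployeeDegree(EmpNum, EmpDegrees)
    for emp_num, phone, degrees in rows:
        employee.append((emp_num, phone))
        # One row per degree, so every cell holds a single value.
        for degree in degrees:
            employee_degree.append((emp_num, degree))
    return employee, employee_degree


unnormalized = [
    (111, "040-23840112", []),
    (222, "040-23987654", ["BA", "BSc", "PhD"]),
    (333, "040-23456789", ["BSc", "MSc"]),
]

employee, employee_degree = to_first_normal_form(unnormalized)
for row in employee_degree:
    print(row)
```

Employee 111 has no degrees, so it appears in Employee but contributes no rows to EmployeeDegree.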

Second Normal Form (2NF):


Partial Dependency: A partial dependency exists when an attribute B is functionally dependent
on an attribute A, and A is only a component of a multipart candidate key.
2NF: A relation is in 2NF if it is in 1NF and every non-key attribute is fully dependent on each
candidate key. That is, a relation in 2NF does not have any partial dependencies.

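Example: Consider this InvLine table (in 1NF):
InvLine(InvNum, LineNum, ProdNum, Qty, InvDate)
Assuming the key is (InvNum, LineNum), InvDate depends on InvNum alone, which is a partial
dependency.
Transformation into 2NF:
We can improve the database by decomposing the above relation into two relations:
(InvNum, LineNum, ProdNum, Qty)
(InvNum, InvDate)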
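The 2NF decomposition amounts to two projections of the original table. The sketch below assumes the key is (InvNum, LineNum) and uses made-up rows; the helper name project is also an assumption. Joining the two projections on InvNum gives back the original rows, so no information is lost.

```python
def project(rows, attrs):
    """Project rows (dicts) onto attrs, dropping duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Hypothetical InvLine rows, for illustration only.
inv_line = [
    {"InvNum": 1, "LineNum": 1, "ProdNum": "P10", "Qty": 5, "InvDate": "2024-01-05"},
    {"InvNum": 1, "LineNum": 2, "ProdNum": "P20", "Qty": 2, "InvDate": "2024-01-05"},
    {"InvNum": 2, "LineNum": 1, "ProdNum": "P10", "Qty": 7, "InvDate": "2024-01-09"},
]

lines = project(inv_line, ["InvNum", "LineNum", "ProdNum", "Qty"])
invoices = project(inv_line, ["InvNum", "InvDate"])  # InvDate now stored once per invoice

# Natural join on InvNum to confirm the decomposition is lossless for this data.
rejoined = [
    {**line, **inv}
    for line in lines
    for inv in invoices
    if line["InvNum"] == inv["InvNum"]
]
print(len(invoices), rejoined == inv_line)
```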

Third Normal Form (3NF):


Transitive dependency: A condition where A, B, and C are attributes of a relation such that if
A→B and B→C, then C is transitively dependent on A via B.
3NF: A relation that is in first and second normal form, and in which no non-key attribute is
transitively dependent on a candidate key.
General definition of 3NF: A relation schema R is in 3NF if every nonprime attribute of R is:
1) fully functionally dependent on every key of R, and
2) non-transitively dependent on every key of R.
Example: Consider an Employee relation:
Employee(EmpNum, EmpName, DeptNum, DeptName)
Here EmpNum→DeptNum and DeptNum→DeptName, so DeptName is transitively dependent on
EmpNum via DeptNum.
Transformation into 3NF:
(EmpNum, EmpName, DeptNum)
(DeptNum, DeptName)

Boyce-Codd Normal Form (BCNF): (stronger than 3NF)

Determinant: Refers to the attribute or group of attributes on the left-hand side of the arrow
of a functional dependency.
Ex: Consider an FD, EmpNum→EmpEmail. Here, EmpNum is a determinant of EmpEmail.
BCNF: A relation is in BCNF if and only if every determinant is a candidate key.
Example:

Transformation into BCNF:

Note: Any relation that is in BCNF is also in 3NF.
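The BCNF rule can be tested mechanically with attribute closure. The sketch below is an illustration under assumptions: the function names are made up, the FDs are the ones suggested by the Employee example in the 3NF section, and the test used is the usual working form of the rule, namely that the closure of every determinant of a non-trivial FD must contain all attributes of the relation.

```python
def closure(attrs, fds):
    """Return the set of attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we gain the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the non-trivial FDs whose determinant does not determine every attribute."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


# Employee(EmpNum, EmpName, DeptNum, DeptName) with assumed FDs.
employee_attrs = ["EmpNum", "EmpName", "DeptNum", "DeptName"]
employee_fds = [
    (["EmpNum"], ["EmpName", "DeptNum"]),
    (["DeptNum"], ["DeptName"]),
]
print(bcnf_violations(employee_attrs, employee_fds))

# After decomposition, the (DeptNum, DeptName) relation has no violation.
dept_fds = [(["DeptNum"], ["DeptName"])]
print(bcnf_violations(["DeptNum", "DeptName"], dept_fds))
```

DeptNum determines only DeptName, so in the undivided Employee relation it is a determinant that is not a key, and the FD DeptNum→DeptName is reported.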
3. DIFFERENT TYPES OF FILE ORGANIZATIONS
File – A file is a named collection of related information that is recorded on secondary storage
such as magnetic disks, magnetic tapes and optical disks.
What is File Organization?
File Organization refers to the logical relationships among the various records that constitute
the file, particularly with respect to the means of identification and access to any specific record.
In simple terms, storing the records of a file in a certain order is called file organization. File
Structure refers to the format of the label and data blocks and of any logical control record.
Types of File Organizations
• Sequential File Organization
• Heap File Organization
• Hash File Organization
• B+ Tree File Organization
• Clustered File Organization
Sequential File Organization
The simplest method of file organization is the sequential method. In this method the records
are stored one after another in a sequential manner.
Advantages
• Fast and efficient when a large amount of data is processed in sequence.
• Simple design.
• Files can easily be stored on magnetic tapes, i.e. a cheaper storage mechanism.
Disadvantages
• Time is wasted because we cannot jump directly to the required record; we have to move
through the records sequentially.
• The sorted file method is inefficient, as it takes time and space to sort the records.
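A sorted sequential file can be sketched with an in-memory list standing in for the file. The class and method names below are assumptions, and real systems work with disk blocks rather than Python lists. The sketch shows both costs mentioned above: an insert has to keep the order, and a lookup reads records one after another.

```python
import bisect


class SequentialFile:
    """Records kept in ascending key order, as in a sorted sequential file."""

    def __init__(self):
        self.records = []  # list of (key, data) tuples, sorted by key

    def insert(self, key, data):
        # Keeping the order may shift every later record.
        bisect.insort(self.records, (key, data))

    def scan_for(self, key):
        """Read records in order; return (data, number_of_records_read)."""
        reads = 0
        for k, data in self.records:
            reads += 1
            if k == key:
                return data, reads
            if k > key:
                break  # sorted order lets the scan stop early
        return None, reads


f = SequentialFile()
for key, data in [(30, "C"), (10, "A"), (20, "B")]:
    f.insert(key, data)

print(f.records)
print(f.scan_for(30))  # the last record needs a full scan
```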
Heap File Organization
Heap File Organization works with data blocks. In this method records are inserted at the
end of the file, into the data blocks. No sorting or ordering is required in this method. If a data
block is full, the new record is stored in some other block. The other data block need not be
the very next data block; it can be any block in the memory. It is the responsibility of the DBMS
to store and manage the new records.

Advantages
• Fetching and retrieving records is faster than in a sequential file, but only in the case of
small databases.
• When a large amount of data needs to be loaded into the database at one time, this
method of file organization is best suited.
Disadvantages
• Problem of unused memory blocks.
• Inefficient for larger databases.
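A heap file can be sketched as a list of fixed-size blocks with no ordering. The class name, the block size and the sample records are assumptions, and for simplicity the sketch always opens a new block at the end when the last one is full, whereas a DBMS may pick any block with free space.

```python
class HeapFile:
    """Unordered records stored in fixed-size blocks."""

    def __init__(self, block_size=2):
        self.block_size = block_size
        self.blocks = [[]]

    def insert(self, record):
        # No sorting: the record goes into the last block, or a new one if it is full.
        if len(self.blocks[-1]) >= self.block_size:
            self.blocks.append([])
        self.blocks[-1].append(record)

    def find(self, key):
        # No ordering or index: every block may have to be read.
        for block_no, block in enumerate(self.blocks):
            for record in block:
                if record[0] == key:
                    return block_no, record
        return None


heap = HeapFile(block_size=2)
for record in [(42, "a"), (7, "b"), (19, "c"), (3, "d"), (88, "e")]:
    heap.insert(record)

print(heap.blocks)
print(heap.find(3))
```

Inserts are cheap because nothing is reordered, but a search may touch every block, which is why the method becomes inefficient for larger databases.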
Hash File Organization:
Hashing is an efficient technique to directly compute the location of the desired data on the
disk without using an index structure. Data is stored in the data blocks whose address is
generated by using a hash function. The memory location where these records are stored is
called a data block or data bucket.
• Data bucket – Data buckets are the memory locations where the records are stored. These
buckets are also considered as the unit of storage.
• Hash Function – A hash function is a mapping function that maps the set of all search keys to
actual record addresses. Generally, the hash function uses the primary key to generate the
hash index, i.e. the address of the data block. The hash function can be anything from a simple
mathematical function to a complex one.
• Hash Index – The prefix of an entire hash value is taken as a hash index. Every hash index has
a depth value to signify how many bits are used for computing a hash function. With a depth
of n, these bits can address 2^n buckets. When all these bits are consumed, the depth value is
increased by one and twice the number of buckets is allocated.
[Diagram: how a hash function maps search keys to data bucket addresses]
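The mapping from search key to data bucket can be sketched with a fixed number of buckets. The class name, the bucket count and the hash function (key modulo the number of buckets) are assumptions chosen for illustration; the growing hash index with a depth value described above is not modelled here.

```python
class HashFile:
    """Static hashing: the bucket address is computed from the search key."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def bucket_address(self, key):
        # Hypothetical hash function: key modulo the number of buckets.
        return key % len(self.buckets)

    def insert(self, key, data):
        self.buckets[self.bucket_address(key)].append((key, data))

    def find(self, key):
        # Only one bucket is examined, whatever the size of the file.
        for k, data in self.buckets[self.bucket_address(key)]:
            if k == key:
                return data
        return None


hf = HashFile(num_buckets=4)
for key, data in [(111, "A"), (222, "B"), (333, "C"), (115, "D")]:
    hf.insert(key, data)

print(hf.bucket_address(111), hf.bucket_address(115))  # both keys share a bucket
print(hf.find(222))
```

Keys 111 and 115 hash to the same bucket, so a lookup still has to compare keys inside that bucket.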

B+ Tree File Organization

A B+ tree, as the name suggests, uses a tree-like structure to store records in a file. It uses
the concept of key indexing, where the primary key is used to sort the records. For each primary
key, an index value is generated and mapped to the record. The index of a record is the address
of the record in the file.
A B+ tree is similar to a binary search tree, with the difference that a node can have more than
two children. All the information is stored in the leaf nodes, and the intermediate nodes act as
pointers to the leaf nodes. The leaf nodes always remain a sorted sequential linked list.
Advantages
• Tree traversal is easier and faster.
• Searching becomes easy, as all records are stored only in the leaf nodes and are kept as a
sorted sequential linked list.
• There is no restriction on B+ tree size. It may grow or shrink as the size of the data
increases or decreases.
Disadvantages
• Inefficient for static tables.
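The two ideas that matter most here, records only in sorted linked leaves and upper nodes that only guide the search, can be sketched with a heavily simplified structure. This is not a full B+ tree: it has a single index level, is built once from a non-empty set of records, and does not split or merge nodes. All names and the leaf size are assumptions.

```python
import bisect


class Leaf:
    def __init__(self, records):
        self.records = records  # sorted list of (key, data)
        self.next = None        # link to the next leaf


def build(records, leaf_size=2):
    """Build linked leaves plus one index level (records must be non-empty)."""
    records = sorted(records)
    leaves = [Leaf(records[i:i + leaf_size]) for i in range(0, len(records), leaf_size)]
    for left, right in zip(leaves, leaves[1:]):
        left.next = right
    # Index entries: the first key of every leaf except the first one.
    separators = [leaf.records[0][0] for leaf in leaves[1:]]
    return separators, leaves


def search(separators, leaves, key):
    # The index only points to the right leaf; the data lives in the leaf.
    leaf = leaves[bisect.bisect_right(separators, key)]
    for k, data in leaf.records:
        if k == key:
            return data
    return None


def range_scan(separators, leaves, low, high):
    # Find the starting leaf, then follow the leaf links in key order.
    leaf = leaves[bisect.bisect_right(separators, low)]
    result = []
    while leaf is not None:
        for k, data in leaf.records:
            if k > high:
                return result
            if k >= low:
                result.append((k, data))
        leaf = leaf.next
    return result


separators, leaves = build([(20, "d"), (5, "a"), (15, "c"), (10, "b"), (25, "e")])
print(separators)
print(search(separators, leaves, 20))
print(range_scan(separators, leaves, 8, 20))
```

The range scan shows why the linked leaves are useful: once the first leaf is found, the remaining records are read in order without going back to the index.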
Cluster File Organization
In cluster file organization, two or more related tables/records are stored within the same
file, known as a cluster. These files have two or more tables in the same data block, and the key
attributes which are used to map these tables together are stored only once.
Thus it lowers the cost of searching and retrieving various records in different files, as they are
now combined and kept in a single cluster.
For example, we have two tables or relations, Employee and Department. These tables are
related to each other.
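For the Employee and Department example, a cluster can be sketched as one entry per department that holds the department record together with its employees, so the joining key is stored only once. The function name, the choice of DeptNum as the cluster key and the sample rows are assumptions made for illustration.

```python
from collections import defaultdict


def build_clusters(departments, employees):
    """Group Department and Employee records by the shared key DeptNum."""
    clusters = defaultdict(lambda: {"department": None, "employees": []})
    for dept_num, dept_name in departments:
        clusters[dept_num]["department"] = dept_name
    for emp_num, emp_name, dept_num in employees:
        # DeptNum is the cluster key, so it is not repeated in each employee record.
        clusters[dept_num]["employees"].append((emp_num, emp_name))
    return dict(clusters)


# Hypothetical sample data, for illustration only.
departments = [(10, "Sales"), (20, "Accounts")]
employees = [(111, "Ravi", 10), (222, "Sita", 20), (333, "Anil", 10)]

clusters = build_clusters(departments, employees)
print(clusters[10])
```

Reading cluster 10 returns the department and all of its employees in one access, which is the saving described above for records that are frequently retrieved together.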
File Organization defines how file records are mapped onto disk blocks. We have four types
of File Organization to organize file records −

Heap File Organization


When a file is created using Heap File Organization, the Operating System allocates
memory area to that file without any further accounting details. File records can be placed
anywhere in that memory area. It is the responsibility of the software to manage the records.
Heap File does not support any ordering, sequencing, or indexing on its own.
Sequential File Organization
Every file record contains a data field (attribute) to uniquely identify that record. In
sequential file organization, records are placed in the file in some sequential order based on the
unique key field or search key. Practically, it is not possible to store all the records sequentially
in physical form.
Hash File Organization
Hash File Organization uses Hash function computation on some fields of the records.
The output of the hash function determines the location of disk block where the records are to
be placed.
Clustered File Organization
Clustered file organization is not considered good for large databases. In this mechanism,
related records from one or more relations are kept in the same disk block, that is, the
ordering of records is not based on primary key or search key.