Database Management System (DBMS) Notes
Database
A database is a collection of information that is organized so that
it can be easily accessed, managed and updated.
Characteristics
Traditionally, data was organized in file formats. DBMS was a new
concept then, and research focused on overcoming the deficiencies of
that traditional style of data management. A modern DBMS has the
following characteristics −
Real-world entity − A modern DBMS is more realistic and uses real-world
entities to design its architecture. It models their behavior and attributes too.
For example, a school database may use students as an entity and their age
as an attribute.
Relation-based tables − DBMS allows entities and relations among them
to form tables. A user can understand the architecture of a database just by
looking at the table names.
Isolation of data and application − A database system is entirely different
from its data. The database system is an active entity, whereas data is passive:
it is what the database system works on and organizes. A DBMS also stores
metadata, which is data about data, to ease its own processing.
Less redundancy − DBMS follows the rules of normalization, which split a
relation when any of its attributes has redundant values. Normalization is a
mathematically grounded, systematic process that reduces data redundancy.
Consistency − Consistency is a state where every relation in a database
remains consistent. There exist methods and techniques that can detect
an attempt to leave the database in an inconsistent state. A DBMS can provide
greater consistency as compared to earlier forms of data storing
applications like file-processing systems.
Query Language − A DBMS is equipped with a query language, which makes it
more efficient to retrieve and manipulate data. A user can apply as many
different filtering options as required to retrieve a set of data. This was
not possible with traditional file-processing systems.
ACID Properties − DBMS follows the concepts
of Atomicity, Consistency, Isolation, and Durability (normally shortened as
ACID). These concepts are applied to transactions, which manipulate data
in a database. ACID properties help the database stay healthy in
multi-transactional environments and in case of failure.
Multiuser and Concurrent Access − DBMS supports a multi-user
environment and allows users to access and manipulate data in parallel.
Although the DBMS restricts transactions when users attempt to handle
the same data item, users are generally unaware of these restrictions.
Multiple views − DBMS offers multiple views for different users. A user
who is in the Sales department will have a different view of the database than a
person working in the Production department. This feature enables
users to have a focused view of the database according to their
requirements.
Security − Features like multiple views offer security to some extent where
users are unable to access data of other users and departments. DBMS
offers methods to impose constraints while entering data into the database
and retrieving the same at a later stage. DBMS offers many different levels
of security features, which enable multiple users to have different views
with different features. For example, a user in the Sales department cannot
see the data that belongs to the Purchase department. Additionally, the DBMS can
control how much of the Sales department's data is displayed to the user.
Because users reach the data through the DBMS and its access controls rather
than opening it directly as ordinary files, it is much harder for miscreants to
bypass these protections.
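The Query Language and Multiple views characteristics above can be sketched with Python's built-in sqlite3 module, with SQLite standing in for a general DBMS. This is a minimal illustration, not a prescribed design: the orders table, its rows, and the view names are hypothetical. Note that SQLite has no GRANT statement, so the views here only tailor what each department sees; in a DBMS with a privilege system, users could additionally be restricted to their view.

```python
import sqlite3

# Hypothetical orders table shared by two departments.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, department TEXT, item TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (department, item, amount) VALUES (?, ?, ?)",
    [("Sales", "Laptop", 900), ("Production", "Steel", 400),
     ("Sales", "Mouse", 20), ("Sales", "Monitor", 300)],
)

# Query language: several filters combined declaratively, no file-scanning code.
big_sales = conn.execute(
    "SELECT item FROM orders WHERE department = ? AND amount >= ? ORDER BY item",
    ("Sales", 100),
).fetchall()

# Multiple views: each department sees only its own slice of the same table.
conn.execute("CREATE VIEW sales_view AS SELECT item, amount FROM orders WHERE department = 'Sales'")
conn.execute("CREATE VIEW production_view AS SELECT item FROM orders WHERE department = 'Production'")
sales_rows = conn.execute("SELECT COUNT(*) FROM sales_view").fetchone()[0]
production = conn.execute("SELECT item FROM production_view").fetchall()

print(big_sales)   # [('Laptop',), ('Monitor',)]
print(production)  # [('Steel',)]
```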
Users
A typical DBMS has users with different rights and permissions who use
it for different purposes. Some users retrieve data and some back it up.
The users of a DBMS can be broadly categorized as follows −
Advantages of DBMS
The database management system has a number of advantages as compared to
the traditional computer file-based processing approach. The database
administrator (DBA) must keep these benefits or capabilities in mind while
designing databases and monitoring the DBMS. The main advantages of DBMS are
described below.
Sharing of Data
In DBMS, data can be shared by authorized users of the organization. The
database administrator manages the data and gives rights to users to access the
data. Many users can be authorized to access the same piece of
information simultaneously. Remote users can also share the same data.
Similarly, the data of the same database can be shared between different
application programs.
Data Consistency
Data consistency is obtained by controlling data redundancy. If a data
item appears only once, any update to its value has to be performed only once
and the updated value is immediately available to all users. When the DBMS
controls redundancy in this way, the database system can enforce consistency.
Integration of Data
In a database management system, data is stored in tables. A single
database contains multiple tables, and relationships can be created between
tables (or associated data entities). This makes it easy to retrieve and update
data.
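As a minimal sketch of integration (again using Python's sqlite3 module with SQLite standing in for a general DBMS; the department and employee tables are hypothetical), two related tables can be combined in one query through the column they share:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER REFERENCES department(dept_id));
INSERT INTO department VALUES (1, 'Sales'), (2, 'Production');
INSERT INTO employee VALUES (10, 'Asha', 1), (11, 'Ravi', 2), (12, 'Meena', 1);
""")

# Related tables are combined on the shared dept_id column.
rows = conn.execute("""
    SELECT e.name, d.name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    WHERE d.name = 'Sales'
    ORDER BY e.name
""").fetchall()
print(rows)  # [('Asha', 'Sales'), ('Meena', 'Sales')]
```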
Integrity Constraints
Integrity constraints, or consistency rules, can be applied to the database so that only
correct data can be entered into it. The constraints may be applied to a
data item within a single record, or they may be applied to relationships between
records.
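A minimal sketch of both kinds of integrity constraint, assuming SQLite via Python's sqlite3 module and hypothetical student and course tables. The CHECK and NOT NULL rules apply to single data items, while the REFERENCES clause (a foreign key) constrains the relationship between records, so a student cannot point to a course that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE student (
    roll_no   INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    age       INTEGER CHECK (age >= 0),          -- rule on one data item
    course_id TEXT REFERENCES course(course_id)  -- rule between records
);
INSERT INTO course VALUES ('DB101', 'Databases');
""")

def accepted(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted((1, "Asha", 19, "DB101")),   # valid
    accepted((2, "Ravi", -5, "DB101")),   # negative age violates CHECK
    accepted((3, None, 20, "DB101")),     # missing name violates NOT NULL
    accepted((4, "Meena", 20, "XX999")),  # no such course: foreign key violation
]
print(results)  # [True, False, False, False]
```

SQLite leaves foreign-key enforcement off by default for backward compatibility, which is why the sketch enables it explicitly; many other DBMSs enforce declared foreign keys without such a switch.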
Forms
A form is a very important object of a DBMS. You can create forms easily and
quickly in a DBMS. Once a form is created, it can be used many times and it can
be modified easily. The created forms are also saved along with the database
and behave like a software component. A form provides an easy, user-friendly
way to enter data into the database, edit data, and display data from the database.
Non-technical users can also perform various operations on the database
through forms without going into the technical details of the database.
Report Writers
Most DBMSs provide report writer tools used to create reports. Users can
create reports easily and quickly. Once a report is created, it can be
used many times and it can be modified easily. The created reports are also
saved along with the database and behave like a software component.
Control Over Concurrency
In a computer file-based system, if two users are allowed to access data
simultaneously, it is possible that they will interfere with each other. For
example, if both users attempt to perform update operation on the same record,
then one may overwrite the values recorded by the other. Most database
management systems have concurrency-control subsystems so that
transactions are always recorded accurately.
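The effect of a concurrency-control subsystem can be sketched with SQLite through Python's sqlite3 module. SQLite allows only one writer at a time on a database (many server DBMSs lock at a finer grain, such as individual rows). In this hypothetical example, two connections to the same file play two users updating the same stock record; the second writer is refused until the first commits, so neither update overwrites the other. The file name and table are assumptions for illustration.

```python
import os
import sqlite3
import tempfile

# Two connections to one database file stand in for two users (hypothetical stock table).
path = os.path.join(tempfile.mkdtemp(), "shop.db")
user1 = sqlite3.connect(path, timeout=0, isolation_level=None)  # autocommit; explicit BEGIN below
user2 = sqlite3.connect(path, timeout=0, isolation_level=None)
user1.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
user1.execute("INSERT INTO stock VALUES ('pen', 10)")

user1.execute("BEGIN IMMEDIATE")  # user1 takes the database's write lock
user1.execute("UPDATE stock SET qty = qty - 3 WHERE item = 'pen'")

try:
    user2.execute("BEGIN IMMEDIATE")  # timeout=0: fail at once instead of waiting
    blocked = False
except sqlite3.OperationalError:  # "database is locked"
    blocked = True
if user2.in_transaction:  # defensive: make sure user2 holds no half-open transaction
    user2.execute("ROLLBACK")

user1.execute("COMMIT")  # releases the lock

# Now user2 can update, and it works on user1's committed value (7), not a stale 10.
user2.execute("BEGIN IMMEDIATE")
user2.execute("UPDATE stock SET qty = qty - 2 WHERE item = 'pen'")
user2.execute("COMMIT")

final_qty = user1.execute("SELECT qty FROM stock WHERE item = 'pen'").fetchone()[0]
print(blocked, final_qty)  # True 5
```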
Backup and Recovery Procedures
In a computer file-based system, the user creates backups of data regularly
to protect valuable data from damage due to failures of the computer
system or application program. This is very time-consuming if the amount of
data is large. Most DBMSs provide 'backup and recovery' subsystems
that automatically back up the data and restore it if required.
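Python's sqlite3 module exposes SQLite's online backup API as Connection.backup (Python 3.7 and later), which gives a minimal sketch of the backup-and-restore cycle. The payroll table is hypothetical, and a real deployment would write the backup to separate storage rather than to a second in-memory database:

```python
import sqlite3

live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE payroll (emp TEXT, salary INTEGER)")
live.execute("INSERT INTO payroll VALUES ('Asha', 50000)")
live.commit()

# Take a consistent copy of the whole database.
backup = sqlite3.connect(":memory:")
live.backup(backup)

# Simulate damage to the live data.
live.execute("DELETE FROM payroll")
live.commit()

# Restore by copying the backup back over the live database.
backup.backup(live)
restored = live.execute("SELECT emp, salary FROM payroll").fetchall()
print(restored)  # [('Asha', 50000)]
```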
Data Independence
The separation of the database's data structure from the application programs that
use the data is called data independence. In a DBMS, you can often change the
structure of the database without modifying the application programs.
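A minimal sketch of data independence, assuming SQLite through Python's sqlite3 module; the customer table and the list_customers routine are hypothetical. Because the routine names the columns it needs, adding a column and an index leaves its result unchanged:

```python
import sqlite3

def list_customers(conn):
    """Hypothetical application routine: it names the columns it needs."""
    return conn.execute("SELECT name, city FROM customer ORDER BY name").fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customer (name, city) VALUES ('Asha', 'Pune')")
before = list_customers(conn)

# Change the database structure: add a column and an index.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("CREATE INDEX idx_customer_city ON customer(city)")  # storage-level change
after = list_customers(conn)

print(before == after, after)  # True [('Asha', 'Pune')]
```

This only holds while applications avoid depending on details such as column positions; a routine using SELECT * would see the new column.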
Disadvantages of DBMS
The disadvantages of the database approach are summarized as follows:
1. Cost
2. Complexity
An organization has many employees who can perform many tasks outside
their own domain, but working with a DBMS is not easy for them. A team of
technical staff who understand the DBMS is required, and the company has to
pay them handsome salaries too.
4. Database Failure
In a DBMS, all the files are stored in a single database, so the database
becomes a single point of failure. Any accidental failure of a
component may cause loss of valuable data. This is a serious concern
for big firms.
A DBMS requires disk storage for the data, and sometimes you need to
purchase extra space to store your data. Sometimes you also need a
dedicated machine for better database performance. These machines
and storage space add to hardware costs.
6. Size
Data conversion may be required at any time, and the organization has to take
this step. Surprisingly, the cost of data conversion can exceed the combined
cost of the DBMS software and hardware. Trained staff is needed
to convert data to the new system. This is a key reason why many
organizations still work with their old DBMS: the cost of
data conversion is high.
8. Currency Maintenance
9. Performance
Traditional file systems work well for small organizations and give
good performance. A DBMS, by contrast, can perform poorly for small-scale
firms because its general-purpose overhead makes it slower.
File Organization
Related data and information are stored collectively in file formats. A file
is a sequence of records stored in binary format. A disk drive is
formatted into several blocks that can store records. File records are
mapped onto those disk blocks.
File Organization defines how file records are mapped onto disk blocks.
We have four types of File Organization to organize file records −
Heap File Organization
When a file is created using Heap File Organization, the Operating
System allocates a memory area to that file without any further
accounting details. File records can be placed anywhere in that memory
area. It is the responsibility of the software to manage the records. Heap
File does not support any ordering, sequencing, or indexing on its own.
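To make the idea concrete, here is a toy heap file in Python. The HeapFile class, its fixed number of slots per block, and the sample records are hypothetical simplifications (a real DBMS manages pages, free space, and record identifiers far more carefully), but it shows the defining traits: a record goes into any free slot, nothing is kept in order, and a lookup must scan every block.

```python
from typing import Optional

class HeapFile:
    """Toy heap file: fixed-size blocks, records placed in any free slot, no ordering or index."""

    def __init__(self, slots_per_block: int = 2):
        self.slots_per_block = slots_per_block
        self.blocks: list[list[Optional[tuple]]] = []

    def insert(self, record: tuple) -> tuple[int, int]:
        # Reuse the first free slot found; otherwise allocate a new block.
        for b, block in enumerate(self.blocks):
            for s, slot in enumerate(block):
                if slot is None:
                    block[s] = record
                    return (b, s)
        self.blocks.append([None] * self.slots_per_block)
        self.blocks[-1][0] = record
        return (len(self.blocks) - 1, 0)

    def delete(self, rid: tuple[int, int]) -> None:
        b, s = rid
        self.blocks[b][s] = None  # leaves a hole that a later insert may fill

    def scan(self, predicate) -> list[tuple]:
        # Without ordering or an index, every lookup is a full scan of all blocks.
        return [r for block in self.blocks for r in block if r is not None and predicate(r)]

heap = HeapFile()
r1 = heap.insert((3, "Ravi"))
heap.insert((1, "Asha"))
heap.insert((2, "Meena"))
heap.delete(r1)
r4 = heap.insert((4, "John"))  # lands in the hole left by r1
print(r4, heap.scan(lambda r: r[0] > 1))  # (0, 0) [(4, 'John'), (2, 'Meena')]
```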
Conventionally, data was stored and processed using traditional file processing
systems. In these traditional file systems, each file is independent of the others, and data
in different files can be integrated only by writing an individual program for each
application. The data and the application programs that use the data are so arranged
that any change to the data requires modifying all the programs that use it. This
is because each file is hard-coded with specific information like data type, data size, etc.
Sometimes it is not even possible to identify all the programs using the data; they are
identified on a trial-and-error basis.
A file processing system of an organization is shown in the figure below. Each functional area
in the organization creates, processes and disseminates its own files. Applications such as
inventory and payroll generate separate files that do not communicate with each other.
No doubt such an organization was simple to operate and had better local control, but
the data of the organization is dispersed throughout the functional sub-systems. These
days, databases are preferred because of the many disadvantages of traditional file systems.
1) Data Redundancy: Since each application has its own data file, the same data may
have to be recorded and stored in many files. For example, the personnel file and payroll file
both contain data on employee name, designation, etc. The result is unnecessary
duplicate or redundant data items. This redundancy requires additional
storage space, costs extra time and money, and requires additional effort to keep all
files up to date.
3) Lack of Data Integration: Since independent data files exist, users face difficulty in
getting information on any ad hoc query that requires accessing the data stored in many
files. In such a case complicated programs have to be developed to retrieve data from
every file or the users have to manually collect the required information.
4) Program Dependence: The reports produced by the file processing system are
program dependent, which means if any change in the format or structure of data and
records in the file is to be made, the programs have to be modified correspondingly. Also, a
new program will have to be developed to produce a new report.
6) Limited Data Sharing: There are limited data sharing possibilities with the
traditional file system. Each application has its own private files and users have little
choice to share the data outside their own applications. Complex programs must
be written to obtain data from several incompatible files.
7) Poor Data Control: There was no centralised control at the data element level,
hence a traditional file system is decentralised in nature. The same
data field may have multiple names, defined by different departments of an
organization and depending on the file it is in. This leads to different
meanings of a data field in different contexts, or the same meaning for different fields. This
causes poor data control.
8) Problem of Security: It is very difficult to enforce security checks and access rights
in a traditional file system, since application programs are added in an ad hoc manner.
Data: The whole data in the system is stored in a single database. The data in the
database is both shared and integrated. Sharing of data means individual pieces of
data in the database are shared among different users, and every user can access the
same piece of data, possibly for different purposes. Integration of data means the
database can be thought of as a unification of several otherwise distinct files, with
redundancy among those files controlled.
Hardware: The hardware consists of the secondary storage devices like disks, drums
and so on, where the database resides together with other devices. There are two types of
hardware. The first is the processor and main memory, which support running the
DBMS. The second is the secondary storage devices, i.e., hard disk, magnetic disk,
etc., that are used to hold the stored data.
Software: A layer or interface of software exists between the physical database and the
users. This layer is called the DBMS. All requests from the users to access the database
are handled by the DBMS. Thus, the DBMS shields the database users from hardware
details. Furthermore, the DBMS provides the other facilities like accessing and
updating the data in the files and adding and deleting files itself.
Users: The users are the people interacting with the database system in any way. There
are four types of users interacting with the database systems. These are Application
Programmers, online users, end users or naive users and finally the Database
Administrator (DBA).
The Database Systems provide the following advantages over the traditional file system.
2) Data consistency: The problem of updating multiple files in traditional file system
leads to inaccurate data as different files may contain different information of the same
data item at a given point of time. This causes incorrect or contradictory information to
its users. In database systems, this problem of inconsistent data is automatically solved
by controlling the redundancy.
3) Program data independence: The traditional file systems are generally data
dependent, which implies that the data organization and access strategies are dictated
by the needs of the specific application and the application programs are developed
accordingly. However, the database systems provide an independence between the file
system and application program, that allows for changes at one level of the data without
affecting others. This property of database systems allows the data to be changed without
changing the application programs that process the data.
4) Sharing of data: In database systems, the data is centrally controlled and can be
shared by all authorized users. Sharing of data means not only that existing
application programs can share the data in the database, but also that new application
programs can be developed to operate on the existing data. Furthermore, the
requirements of the new application programs may be satisfied without creating any
new file.
6) Improved data integrity: Data integrity means that the data contained in the
database is both accurate and consistent. The centralized control property allows
adequate checks to be incorporated to provide data integrity. One integrity check that
should be incorporated in the database is to ensure that if there is a reference to certain
object, that object must exist.
7) Improved security: Database security means protecting the data contained in the
database from unauthorised users. The DBA ensures that proper access procedures are
followed, including proper authentication schemes for access to the DBMS and additional
checks before permitting access to sensitive data. The level of security could be
different for various types of data and operations.
10) Improved backup and recovery facility: Through its backup and recovery
subsystem, the database system provides the facilities for recovering from hardware or
software failures. The recovery subsystem of the database system ensures that the
database is restored to the state it was in before the program started executing, in case
of system crash.
12) Data quality is high: The quality of data in database systems is very high as
compared to traditional file systems. This is possible due to the presence of tools and
processes in the database system.
13) Good data accessibility and responsiveness: The database systems provide
query languages or report writers that allow the users to ask ad hoc queries to obtain
the needed information immediately, without the requirement to write application
programs (as in case of file system), that access the information from the database. This
is possible due to integration in database systems.
14) Concurrency control: The database systems are designed to manage simultaneous
(concurrent) access of the database by many users. They also prevent any loss of
information or loss of integrity due to these concurrent accesses.
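Item 10 above, together with the Atomicity property in ACID, can be illustrated with a minimal sketch in Python's sqlite3 module (SQLite standing in for a general DBMS; the account table and transfer function are hypothetical). A transfer that fails halfway is rolled back, leaving the database in the state it had before the transaction began:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one transaction (all or nothing)."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 500)  # would overdraw A, so the CHECK fails on the second update
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, balances)  # False {'A': 100, 'B': 50}
```

The credit to B had already been applied when the debit failed; the rollback undoes it, so no half-finished transfer remains.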
1) Complexity increases: The data structure may become more complex because of
the centralised database supporting many applications in an organization. This may lead
to difficulties in its management and may require professionals for management.
2) Requirement of more disk space: The wide functionality and more complexity
increase the size of DBMS. Thus, it requires much more space to store and run than the
traditional file system.
4) Cost of conversion: The cost of conversion from old file-system to new database
system is very high. In some cases the cost of conversion is so high that the cost of
DBMS and extra hardware becomes insignificant. It also includes the cost of training
manpower and hiring the specialized manpower to convert and run the system.
6) Need for backup and recovery: For a database system to be accurate and available
at all times, a procedure must be developed and used for providing backup copies
to all its users when damage occurs.
8) More installation and management cost: Big, complete database
systems are more costly. They require trained manpower to operate the system and have
additional annual maintenance and support costs.
Database Schema
A database schema is the skeleton structure that represents the logical view of
the entire database. It defines how the data is organized and how the relations
among them are associated. It formulates all the constraints that are to be
applied on the data.
A database schema defines its entities and the relationship among them. It
contains a descriptive detail of the database, which can be depicted by means of
schema diagrams. It’s the database designers who design the schema to help
programmers understand the database and make it useful.
SCHEMAS
Physical Database Schema: This schema pertains to the actual storage
of data and its form of storage like files, indices, etc. It defines how the
data will be stored in a secondary storage.
Logical Database Schema: This schema defines all the logical
constraints that need to be applied on the data stored. It defines tables,
views, and integrity constraints.
Database Instance
It is important that we distinguish these two terms individually. Database
schema is the skeleton of database. It is designed when the database doesn't
exist at all. Once the database is operational, it is very difficult to make any
changes to it. A database schema does not contain any data or information. A
database instance is a state of operational database with data at any given time.
It contains a snapshot of the database. Database instances tend to change with
time. A DBMS ensures that every instance (state) is a valid state, by
diligently following all the validations, constraints, and conditions that the
database designers have imposed.
The data in the database at a particular moment in time is called a database
state or snapshot. It is also called the current set of occurrences or
instances in the database.
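The distinction between schema and instance can be sketched with SQLite through Python's sqlite3 module; the student table is hypothetical. The schema is stored once as the CREATE TABLE statement (SQLite keeps it in its sqlite_master catalog), while the instance is whatever rows exist at the moment a snapshot is taken:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: defined once, holds no data.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'student'").fetchone()[0]

def instance(conn):
    """Snapshot of the current state: the rows present right now."""
    return conn.execute("SELECT roll_no, name FROM student ORDER BY roll_no").fetchall()

state_1 = instance(conn)
conn.execute("INSERT INTO student VALUES (1, 'Asha')")
state_2 = instance(conn)
conn.execute("INSERT INTO student VALUES (2, 'Ravi')")
state_3 = instance(conn)

print(state_1, state_2, state_3)  # [] [(1, 'Asha')] [(1, 'Asha'), (2, 'Ravi')]
print(schema)  # the schema text is unchanged by the inserts
```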
Three Level Architecture of DBMS
Following are the three levels of database architecture,
1. Physical Level
2. Conceptual Level
3. External Level
A database system normally contains a lot of data in addition to users’ data. For
example, it stores data about data, known as metadata, to locate and retrieve
data easily. It is rather difficult to modify or update a set of metadata once it is
stored in the database. But as a DBMS expands, it needs to change over time to
satisfy the requirements of the users. If all the data were interdependent, this would
become a tedious and highly complex job. Metadata itself follows a layered
architecture, so that when we change data at one layer, it does not affect the
data at another level. The layers are independent but mapped to each other.
The concept of data independence can be defined as the capacity to
change the schema at one level of a database system without having to
change the schema at the next higher level.
We can define two types of data independence:
Database Administrator
One of the main reasons for using DBMSs is to have central control of both
the data and the programs that access those data. A person who has such central
control over the system is called a database administrator (DBA). The functions of
a DBA include:
• Schema definition. The DBA creates the original database schema by executing
a set of data definition statements in the DDL.
• Storage structure and access-method definition.
• Schema and physical-organization modification. The DBA carries out changes
to the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
• Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a
special system structure that the database system consults whenever someone
attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
◦ Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
◦ Ensuring that enough free disk space is available for normal operations,
and upgrading disk space as required.
◦ Monitoring jobs running on the database and ensuring that performance
is not degraded by very expensive tasks submitted by some users.
Entity
The basic object that the ER model represents is
an entity, which is a thing in the real world with an independent existence. An
entity may be an object with a physical existence (for example, a particular person, car,
house, or employee) or it may be an object with a conceptual existence (for
instance, a company, a job, or a university course).
Attributes
Entities are represented by means of their properties called
attributes. All attributes have values. For example, a student entity
may have name, class, and age as attributes. There exists a domain
or range of values that can be assigned to attributes. For example,
a student's name cannot be a numeric value. It has to be
alphabetic. A student's age cannot be negative, etc.
Types of Attributes
1. Simple attribute: Simple attributes are atomic values, which
cannot be divided further. For example, a student's phone
number is an atomic value of 10 digits.
2. Composite attribute: Composite attributes are made of more
than one simple attribute. For example, a student's complete
name may have first_name and last_name.
3. Derived attribute: Derived attributes are the attributes that
do not exist in the physical database, but their values are
derived from other attributes present in the database. For
example, average_salary in a department should not be saved
directly in the database; instead, it can be derived. For another
example, age can be derived from date_of_birth.
4. Single-value attribute: Single-value attributes contain
a single value. For example: Social_Security_Number.
5. Multi-value attribute: Multi-value attributes may contain
more than one value. For example, a person can have more
than one phone number, email_address, etc.
These attribute types can be combined as follows:
simple single-valued attributes
simple multi-valued attributes
composite single-valued attributes
composite multi-valued attributes
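These attribute types map naturally onto a record definition. The following Python sketch is illustrative only; the Name and Student classes and their fields are hypothetical. It shows simple and composite single-valued attributes, a multi-valued attribute stored as a list, and a derived attribute (age) computed from date_of_birth instead of being stored:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Name:                      # composite attribute: made of simple parts
    first_name: str
    last_name: str

@dataclass
class Student:
    roll_no: int                 # simple, single-valued
    name: Name                   # composite, single-valued
    date_of_birth: date          # stored attribute
    phones: list[str] = field(default_factory=list)  # simple, multi-valued

    def age(self, today: date) -> int:
        """Derived attribute: computed from date_of_birth, not stored."""
        had_birthday = (today.month, today.day) >= (self.date_of_birth.month, self.date_of_birth.day)
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

s = Student(1, Name("Asha", "Rao"), date(2004, 8, 15), ["9876543210", "9123456780"])
print(s.age(date(2024, 8, 14)), s.age(date(2024, 8, 15)), len(s.phones))  # 19 20 2
```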
Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here,
Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a
relationship too can have attributes. These attributes are called descriptive
attributes.
A relationship type R among n entity types E1, E2, ..., En defines a set of
associations (a relationship set) among entities from these entity types. As in the
case of entity types and entity sets, a relationship type and its corresponding
relationship set are customarily referred to by the same name, R.
Degree of Relationship
The number of participating entity types in a relationship defines the degree of the
relationship. For example, the WORKS_FOR relationship between EMPLOYEE and
DEPARTMENT is of degree two.
A relationship of degree two is called binary, and one of degree three is called
ternary. An example of a ternary relationship is SUPPLY.
Role Names and Recursive Relationships. Each entity type that participates
in a relationship type plays a particular role in the relationship. The role name
signifies the role that a participating entity from the entity type plays in each
relationship instance, and helps to explain what the relationship means. For example, in the
WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and
DEPARTMENT plays the role of department or employer.
Role names are not technically necessary in relationship types where all the
participating entity types are distinct, since each participating entity type name can be
used as the role name. However, in some cases the same entity type participates
more than once in a relationship type in different roles. In such cases the role name
becomes essential for distinguishing the meaning of the role that each participating
entity plays. Such relationship types are called recursive relationships.
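A recursive relationship can be sketched in SQLite (via Python's sqlite3 module) as a table whose foreign key refers back to itself. The employee table, the supervisor_id column, and the role aliases worker and boss are hypothetical; the aliases play the part of role names, telling the two participations of EMPLOYEE apart:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# EMPLOYEE participates twice: once as supervisor, once as supervisee.
conn.execute("""
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          TEXT,
    supervisor_id INTEGER REFERENCES employee(emp_id)  -- role: supervisor
)
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Meena", None), (2, "Asha", 1), (3, "Ravi", 1)])

# Role names (worker, boss) distinguish the two uses of the same table.
pairs = conn.execute("""
    SELECT worker.name, boss.name
    FROM employee AS worker JOIN employee AS boss ON worker.supervisor_id = boss.emp_id
    ORDER BY worker.name
""").fetchall()
print(pairs)  # [('Asha', 'Meena'), ('Ravi', 'Meena')]
```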
Mapping Cardinalities
Cardinality defines the number of entities in one entity set, which can be
associated with the number of entities of other set via relationship set.
One-to-one: One entity from entity set A can be associated with at most
one entity of entity set B and vice versa.
One-to-many: One entity from entity set A can be associated with more
than one entity of entity set B; however, an entity from entity set B can
be associated with at most one entity from entity set A.
Many-to-one: More than one entity from entity set A can be associated
with at most one entity of entity set B; however, an entity from entity set
B can be associated with more than one entity from entity set A.
Many-to-many: One entity from A can be associated with more than one
entity from B and vice versa.
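These cardinalities are commonly realized in a relational schema as sketched below (SQLite via Python's sqlite3 module; all table names are hypothetical). A many-to-many relationship gets its own table whose primary key is the pair of participating keys, a one-to-one relationship can be enforced with a UNIQUE foreign key, and a plain, non-unique foreign key (not shown) would give one-to-many:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE course  (cid TEXT PRIMARY KEY);

-- Many-to-many: a separate relationship table; each (sid, cid) pair appears once.
CREATE TABLE enrolls (
    sid INTEGER REFERENCES student(sid),
    cid TEXT    REFERENCES course(cid),
    PRIMARY KEY (sid, cid)
);

-- One-to-one: UNIQUE on the foreign key allows at most one locker per student.
CREATE TABLE locker (
    locker_no INTEGER PRIMARY KEY,
    sid       INTEGER UNIQUE REFERENCES student(sid)
);

INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO course  VALUES ('DB'), ('OS');
INSERT INTO enrolls VALUES (1, 'DB'), (1, 'OS'), (2, 'DB');
INSERT INTO locker  VALUES (100, 1);
""")

per_student = conn.execute("SELECT sid, COUNT(*) FROM enrolls GROUP BY sid ORDER BY sid").fetchall()
per_course = conn.execute("SELECT cid, COUNT(*) FROM enrolls GROUP BY cid ORDER BY cid").fetchall()

try:
    conn.execute("INSERT INTO locker VALUES (101, 1)")  # a second locker for Asha
    one_to_one_enforced = False
except sqlite3.IntegrityError:
    one_to_one_enforced = True

print(per_student, per_course, one_to_one_enforced)
# [(1, 2), (2, 1)] [('DB', 2), ('OS', 1)] True
```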
ER DIAGRAM REPRESENTATION
An E-R diagram consists of the following major components:
• Rectangles divided into two parts represent entity sets. The first part
contains the name of the entity set. The second part contains the names of
all the attributes of the entity set.
• Diamonds represent relationship sets.
• Undivided rectangles represent the attributes of a relationship set. Attributes
that are part of the primary key are underlined.
• Lines link entity sets to relationship sets.
• Dashed lines link attributes of a relationship set to the relationship set.
• Double lines indicate total participation of an entity in a relationship set.
• Double diamonds represent identifying relationship sets linked to weak
entity sets.
Relationship
Relationships are represented by diamond-shaped boxes. The name of the relationship
is written inside the diamond-box. All the entities (rectangles) participating in a
relationship are connected to it by a line.
Many-to-many: The following image reflects that more than one instance
of an entity on the left and more than one instance of an entity on the
right can be associated with the relationship. It depicts many-to-many
relationship.
Participation Constraints
Total Participation: Each entity is involved in the relationship. Total
participation is represented by double lines.
Partial participation: Not all entities are involved in the relationship.
Partial participation is represented by single lines.
DATA MODELS
Data models define how the logical structure of a database is modeled. Data
Models are fundamental entities to introduce abstraction in a DBMS. Data
models define how data is connected to each other and how they are processed
and stored inside the system.
The very first data models could be flat data models, where all the data used are
kept in the same plane. Earlier data models were not so scientific, hence
they were prone to introduce lots of duplication and update anomalies.
Historically, in database design, three models are commonly used. They are,
Hierarchical Model
Network Model
Relational Model
Relational model
The most common model, the relational model sorts data
into tables, also known as relations, each of which consists
of columns and rows. Each column lists an attribute of the
entity in question, such as price, zip code, or birth date.
The set of values that an attribute may take is called its domain. A
particular attribute or combination of attributes is chosen as
a primary key that can be referred to in other tables, where
it is called a foreign key.
Hierarchical model
The hierarchical model organizes data into a tree-like
structure, where each record has a single parent or root.
Sibling records are sorted in a particular order. That order
is used as the physical order for storing the database. This
model is good for describing many real-world relationships.
This model was primarily used by IBM’s Information
Management System (IMS) in the 60s and 70s, but hierarchical
databases are rarely seen today due to certain operational inefficiencies.
Network model
The network model builds on the hierarchical model by
allowing many-to-many relationships between linked
records, implying multiple parent records. Based on
mathematical set theory, the model is constructed with sets
of related records. Each set consists of one owner or parent
record and one or more member or child records. A record
can be a member or child in multiple sets, allowing this
model to convey complex relationships.
Characteristic | Hierarchical model | Network model | Relational model
Data structure | One-to-many or one-to-one relationships | Allowed the network model to support many-to-many relationships | One-to-one, one-to-many, and many-to-many relationships
Data structure | Based on parent-child relationship | A record can have many parents as well as many children | Based on relational data structures
The relational model is very simple and elegant; a database is a collection of one
or more relations, where each relation is a table with rows and columns. This
simple tabular representation enables even novice users to understand the
contents of a database and it permits the use of simple, high-level languages to
query the data. The major advantages of the relational model over the older
data models are its simple data representation and the ease with which even
complex queries can be expressed.
Attributes
Each attribute Ai is the name of a role played by some domain D in the relation
schema R.
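Domain
A domain D is the set of permissible atomic values for an attribute. The
domain of attribute Aj is written dom(Aj).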
Tuples / Records
A single row of a table, which contains a single record for that relation, is
called a tuple.
In relational model terminology, all the rows are called tuples or records
of the relation. Consider a table STUDENT. This table has six rows, which
means there are six tuples or records in this table.
Relation schema
“The relation schema describes the column headers for the table
or relation”. A relation schema R denoted by R (A1, A2, A3…An), is
made up of a relation name R and a list of attributes A1, A2, A3… An.
Each attribute Aj is the name of a role played by some domain D in the
relation schema R. D is called the domain of Aj and is denoted by dom(Aj).
A relation schema is used to describe a relation R, and R is called the
name of this relation.
Relation
“A relation is defined as a set of tuples”.
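The schema/relation distinction can be sketched in Python. This is a minimal illustration, assuming a hypothetical schema STUDENT(STUDENT_ID, NAME, AGE) whose domains are modelled as simple Python predicates; a real DBMS declares domains through column types and constraints instead. Because the relation is built as a set, a repeated tuple is absorbed rather than stored twice.

```python
# Hypothetical schema STUDENT(STUDENT_ID, NAME, AGE); each domain is modelled
# as a membership test that returns True for permitted values.
STUDENT_SCHEMA = {
    "STUDENT_ID": lambda v: isinstance(v, int) and v > 0,
    "NAME": lambda v: isinstance(v, str) and v != "",
    "AGE": lambda v: isinstance(v, int) and 0 <= v <= 150,
}
ATTRIBUTES = tuple(STUDENT_SCHEMA)  # the ordered attribute list A1, ..., An


def make_relation(rows):
    """Return a relation (a set of tuples) after checking arity and domains."""
    relation = set()
    for row in rows:
        if len(row) != len(ATTRIBUTES):
            raise ValueError(f"expected {len(ATTRIBUTES)} values, got {row!r}")
        for attr, value in zip(ATTRIBUTES, row):
            if not STUDENT_SCHEMA[attr](value):
                raise ValueError(f"{value!r} is not in dom({attr})")
        relation.add(tuple(row))
    return frozenset(relation)


student = make_relation([
    (1, "Asha", 20),
    (2, "Ben", 22),
    (1, "Asha", 20),  # duplicate row: absorbed, because a relation is a set
])
print(len(student))  # 2
```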
Characteristics of relations
Keys
Any attribute in the table which uniquely identifies each record in the
table is called a key. It can be a single attribute or a combination of
attributes. For example, in a STUDENT table, STUDENT_ID is a key,
since it is unique for each student. In a PERSON table, the passport
number, driving license number, phone number, SSN and email address are
all keys, since each of them is unique for each person.
Keys are a very important part of a relational database. They are used to
establish and identify relationships between tables. They also ensure that each
record within a table can be uniquely identified by a combination of one or
more fields within the table.
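Primary Key
A primary key is the candidate key chosen to uniquely identify each record
in a table. Its value cannot be NULL.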
Candidate Key
Candidate keys are defined as the set of fields from which the
primary key can be selected. A candidate key is an attribute or set of
attributes that can act as a primary key for a table to uniquely identify
each record in that table.
Foreign key
In a company there would be different departments: Accounting, Human
Resources (HR), Development, Quality, etc. An employee who works for that
company works in a specific department. But we know that employee and
department are two different entities, so we do not store the department
information in the employee table. Instead, we link these two tables
by means of the primary key of one of the tables; in this case, we pick the
primary key of the DEPARTMENT table, DEPARTMENT_ID, and add it as a new
attribute/column in the EMPLOYEE table. Now DEPARTMENT_ID is a foreign key
in the EMPLOYEE table, and the two tables are related.
Note: the names of the attribute in the two tables can be different. What
matters is how the foreign key is declared when the tables are actually created.
Super Key
Super Key is defined as a set of attributes within a table that uniquely
identifies each record within a table. Super Key is a superset of Candidate
key.
A superkey is a combination of columns that uniquely identifies any row
within a relational database management system (RDBMS) table. A
candidate key is a closely related concept in which the superkey is reduced
to the minimum number of columns required to uniquely identify each row.
Super keys:
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys:
{Emp_SSN}
{Emp_Number}
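The superkey and candidate-key lists above can be reproduced mechanically. This is a sketch under stated assumptions: the employee rows are hypothetical, and testing uniqueness on sample data can only rule a key out. Whether a set of attributes is really a key is a design decision about all possible rows, not a property of one snapshot.

```python
from itertools import combinations

ATTRS = ("Emp_SSN", "Emp_Number", "Emp_Name")

# Hypothetical sample rows; two employees share a name, so Emp_Name is not a key.
EMPLOYEES = [
    {"Emp_SSN": "123-45-6789", "Emp_Number": 101, "Emp_Name": "Ravi"},
    {"Emp_SSN": "987-65-4321", "Emp_Number": 102, "Emp_Name": "Mei"},
    {"Emp_SSN": "555-12-3456", "Emp_Number": 103, "Emp_Name": "Ravi"},
]


def is_superkey(rows, attrs):
    """True if no two rows share the same values for every attribute in attrs."""
    seen = set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


# Every attribute combination that identifies rows uniquely is a superkey.
superkeys = [
    frozenset(combo)
    for size in range(1, len(ATTRS) + 1)
    for combo in combinations(ATTRS, size)
    if is_superkey(EMPLOYEES, combo)
]

# A candidate key is a minimal superkey: no proper subset of it is a superkey.
candidate_keys = [k for k in superkeys if not any(other < k for other in superkeys)]

print(len(superkeys))                             # 6
print(sorted(sorted(k) for k in candidate_keys))  # [['Emp_Number'], ['Emp_SSN']]
```

The six superkeys found are exactly the six listed above, and the minimality filter leaves the two candidate keys.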
Compound key
A compound key is a key formed by combining more than one attribute/column of the
same table. These columns may or may not be keys in other tables. The
compound key acts as a primary key only when all of its columns are taken
together; individually, those columns are not keys. In other words, a
unique record is fetched from the table only if we combine more than one
column. If we use them individually, we will not get a unique record.
In the table above, STUDENT_ID 100 alone gives us multiple courses. To know
about a particular course, we need both STUDENT_ID and COURSE_ID. In this
case, both IDs are primary keys of their own tables, but in the STUDENT_COURSE
table they form a primary key only when combined. Hence they form a
compound key.
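A compound primary key can be declared directly in SQL. The sketch below uses SQLite through Python's standard `sqlite3` module; the course codes C1 and C2 are hypothetical. It shows that STUDENT_ID 100 alone matches two rows, while a repeated (STUDENT_ID, COURSE_ID) pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT_COURSE (
        STUDENT_ID INTEGER NOT NULL,
        COURSE_ID  TEXT    NOT NULL,
        PRIMARY KEY (STUDENT_ID, COURSE_ID)   -- compound key
    )
""")
conn.executemany(
    "INSERT INTO STUDENT_COURSE VALUES (?, ?)",
    [(100, "C1"), (100, "C2"), (101, "C1")],
)

# STUDENT_ID alone is not unique: student 100 is enrolled in two courses.
courses_of_100 = conn.execute(
    "SELECT COURSE_ID FROM STUDENT_COURSE WHERE STUDENT_ID = 100 ORDER BY COURSE_ID"
).fetchall()
print(courses_of_100)  # [('C1',), ('C2',)]

# The pair (STUDENT_ID, COURSE_ID) is unique, so a repeated pair is rejected.
try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (100, 'C1')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)  # True
```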
Composite key
A composite key is similar to a compound key, but the columns that are part of
a composite key are always keys in that table.
A key that consists of two or more attributes that together uniquely identify an
entity occurrence is called a composite key. However, each attribute that makes up
the composite key is not a simple key on its own.
Unique key
A unique key is just like a primary key, with one difference: a primary key
enforces the NOT NULL constraint, but a unique key does not enforce the NOT NULL
constraint in the relation. In other words, a unique key makes a column accept
only unique values, plus NULL. How many NULLs are allowed depends on the DBMS:
Microsoft SQL Server allows only one NULL in a unique column, whereas
PostgreSQL, MySQL and SQLite allow several.
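The NULL behaviour of a unique key can be observed in SQLite via Python's standard `sqlite3` module. This is a sketch with a hypothetical PERSON table and email values; SQLite treats NULLs as distinct for UNIQUE, so several NULLs are accepted, while a repeated non-NULL value is refused. Running the same inserts on SQL Server would reject the second NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PERSON (
        PERSON_ID INTEGER PRIMARY KEY,
        EMAIL     TEXT UNIQUE          -- unique key: duplicates rejected, NULL allowed
    )
""")
conn.execute("INSERT INTO PERSON VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO PERSON VALUES (2, NULL)")
conn.execute("INSERT INTO PERSON VALUES (3, NULL)")  # SQLite accepts a second NULL


def rejected(sql):
    """Run one INSERT and report whether SQLite refused it."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO PERSON VALUES (4, 'a@example.com')"))  # True: duplicate value
print(rejected("INSERT INTO PERSON VALUES (5, NULL)"))             # False: NULL is allowed
```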
Non-key Attribute
Non-key attributes are attributes other than candidate key attributes in a table.
Non-prime Attribute
A non-prime attribute is an attribute that is not part of any candidate key
(a prime attribute is one that belongs to some candidate key).
Relational database
A relational database is a collection of data items organized as a set of
formally-described tables from which data can be accessed or
reassembled in many different ways without having to reorganize the
database tables. The relational database was invented by E. F. Codd at
IBM in 1970.
The leading RDBMS products are Oracle, IBM's DB2 and Microsoft's SQL
Server. Despite repeated challenges by competing technologies, as well
as the claim by some experts that no current RDBMS has fully
implemented relational principles, the majority of new corporate
databases are still being created and managed with an RDBMS.
An RDBMS stores data in a collection of tables, which might be related by
common fields (database table columns). An RDBMS also provides relational
operators to manipulate the data stored in the database tables.
Constraints
Domain Constraints –
Domain constraints specify the set of values an attribute can
take. The value of each attribute X must be an atomic value from the domain
of X.
The data types associated with domains include integer, character, string,
date, time, currency, etc. An attribute value must belong to the
corresponding domain. Consider the example below –
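As an illustration of how a DBMS enforces a domain, here is a sketch using SQLite through Python's `sqlite3` module. The STUDENT table and its 0 to 150 age range are hypothetical. The explicit `typeof` test is there because SQLite uses flexible typing and would otherwise store a string such as 'twenty' in an INTEGER column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical STUDENT table: the column types and CHECK clause act as domains.
conn.execute("""
    CREATE TABLE STUDENT (
        STUDENT_ID INTEGER NOT NULL,
        NAME       TEXT    NOT NULL,
        AGE        INTEGER NOT NULL
                   CHECK (typeof(AGE) = 'integer' AND AGE BETWEEN 0 AND 150)
    )
""")


def insert(row):
    """Try to insert a row; return False if a domain (CHECK/NOT NULL) rule fails."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(insert((1, "Asha", 20)))        # True
print(insert((2, "Ben", 200)))        # False: outside 0..150
print(insert((3, "Chen", "twenty")))  # False: not an integer
```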
Tuple Uniqueness Constraints –
A relation is defined as a set of tuples. All tuples or all rows in a relation
must be unique or distinct. If the tuple uniqueness constraint is applied to
a relation, then all the rows of that table must be unique, i.e. it
cannot contain duplicate rows. For example,
Key Constraints –
A relation is defined as a set of tuples. By definition all the elements of a set are
distinct; hence, all the tuples in a relation must also be distinct. This means that
no two tuples can have the same combination of values for all their attributes. A
key constraint is a statement that a certain subset of the fields of a relation is a
unique identifier for a tuple.
There are three types of key constraints that are most common.
A FOREIGN KEY constraint prevents any action that would destroy links
between tables with corresponding data values. A foreign key in one
table points to a primary key in another table. Foreign key constraints
prevent actions that would leave rows with foreign key values for which no
matching primary key value exists. Foreign key constraints are used to
enforce referential integrity.
Integrity Constraints
The entity integrity constraint states that primary keys can't be null. There must
be a proper value in the primary key field.
This is because the primary key value is used to identify individual rows in a
table. If there were null values for primary keys, we could not identify
those rows.
On the other hand, fields other than the primary key can contain null values.
A null value means that the value for that field is unknown. A null value is
different from zero or a blank space.
In the Car Rental database, each car in the Car table must have a proper and
unique Reg_No. There might be a car whose rate is unknown (maybe the car is
broken or brand new), i.e. its Rate field has a null value. See the picture
below.
The entity integrity constraint ensures that a specific row in a table can be
identified.
Examples
Rule 1. You can't delete any of the rows in the CarType table that are visible in
the picture since all the car types are in use in the Car table.
Rule 2. You can't change any of the model_ids in the CarType table since all the
car types are in use in the Car table.
Rule 3. The values that you can enter in the model_id field in the Car table must
be in the model_id field in the CarType table.
Rule 4. The model_id field in the Car table can have a null value, which means
that the car type of that car is not known.
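The entity-integrity rule and Rules 1 to 4 can be checked against a real engine. This sketch uses SQLite via Python's `sqlite3` module; the sample cars, rates and car types are hypothetical, and only the table and column names (Car, CarType, Reg_No, Rate, model_id) come from the text. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, and `NOT NULL` is spelled out on Reg_No because SQLite, unlike most DBMSs, otherwise lets a non-INTEGER primary key hold NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE CarType (
        model_id INTEGER PRIMARY KEY,
        name     TEXT NOT NULL
    );
    CREATE TABLE Car (
        Reg_No   TEXT NOT NULL PRIMARY KEY,           -- entity integrity
        Rate     REAL,                                -- NULL means the rate is unknown
        model_id INTEGER REFERENCES CarType(model_id) -- referential integrity
    );
    INSERT INTO CarType VALUES (1, 'Sedan'), (2, 'Van');
    INSERT INTO Car VALUES ('ABC-123', 45.0, 1), ('XYZ-789', NULL, 2);
""")


def violates(sql):
    """Run one statement; return True if SQLite rejects it as a constraint violation."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


checks = {
    "entity integrity: NULL Reg_No": "INSERT INTO Car VALUES (NULL, 30.0, 1)",
    "rule 1: delete a car type in use": "DELETE FROM CarType WHERE model_id = 1",
    "rule 2: change a model_id in use": "UPDATE CarType SET model_id = 9 WHERE model_id = 2",
    "rule 3: unknown model_id in Car": "INSERT INTO Car VALUES ('NEW-001', 50.0, 7)",
    "rule 4: NULL model_id in Car": "INSERT INTO Car VALUES ('NEW-002', 50.0, NULL)",
}
results = {label: violates(sql) for label, sql in checks.items()}
for label, was_rejected in results.items():
    print(f"{label}: {'rejected' if was_rejected else 'accepted'}")
```

The first four statements are rejected and the Rule 4 insert is accepted.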
Relational Algebra
Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output. It uses operators
to perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output. Relational algebra is
performed recursively on a relation and intermediate results are also
considered relations.
The relational algebra is a theoretical language with operations that work on
one or more relations to define another relation without changing the original
relation(s).
When using relational algebra, the user has to specify both what is required
and what procedure or steps to follow to obtain the required output. Both the
relational algebra and the relational calculus are formal, non-user-friendly
languages. They have been used as the basis for other, higher-level Data
Manipulation Languages (DMLs) for relational databases. They illustrate the
basic operations required of any DML and serve as the standard of comparison
for other relational languages.
Clauses can be connected by the standard Boolean operators and, or, and
not to form a general selection condition. For example, to select the
tuples for all employees who either work in department 4 and make over
$25,000 per year, or work in department 5 and make over $30,000, we
can specify the following SELECT operation:
$$\sigma_{(Dno = 4\ \text{AND}\ Salary > 25000)\ \text{OR}\ (Dno = 5\ \text{AND}\ Salary > 30000)}(\text{EMPLOYEE})$$
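The selection above can be mimicked on a small in-memory relation. This is a minimal sketch, assuming hypothetical EMPLOYEE tuples of the form (Name, Dno, Salary) and a hypothetical `select` helper; the predicate mirrors the Boolean condition term for term.

```python
# Hypothetical EMPLOYEE tuples: (Name, Dno, Salary).
EMPLOYEE = {
    ("Smith", 5, 30000),
    ("Wong", 5, 40000),
    ("Zelaya", 4, 25000),
    ("Wallace", 4, 43000),
}


def select(relation, predicate):
    """σ_predicate(relation): the tuples of relation for which predicate is true."""
    return {t for t in relation if predicate(t)}


result = select(
    EMPLOYEE,
    lambda t: (t[1] == 4 and t[2] > 25000) or (t[1] == 5 and t[2] > 30000),
)
print(sorted(result))  # [('Wallace', 4, 43000), ('Wong', 5, 40000)]
```

Smith and Zelaya are excluded because their salaries equal, rather than exceed, the thresholds.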
The results of relational algebra operations are also relations, but without
any name. The rename operation allows us to name the output relation. The
rename operation is denoted by the small Greek letter rho (ρ).
Notation: $\rho_x(E)$
where the result of expression E is saved with the name x.
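Union (∪)
It performs binary union between two given relations and is defined as:
r ∪ s = { t | t ∈ r or t ∈ s }
Notation: r ∪ s
where r and s are either database relations or relation result sets
(temporary relations).
Intersection (∩)
Notation: r ∩ s
The result contains all tuples that are in both r and s.
Set difference (−)
The result of the set difference operation is the set of tuples that are
present in one relation but not in the second relation.
Notation: r − s
Finds all the tuples that are present in r but not in s.
For union, intersection and set difference, r and s must be union-compatible:
they must have the same number of attributes, and corresponding attributes
must have compatible domains.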
Union and intersection are associative operations; that is,
R ∪ (S ∪ T) = (R ∪ S) ∪ T and (R ∩ S) ∩ T = R ∩ (S ∩ T).
Intersection can be expressed in terms of union and set
difference as follows:
R ∩ S = ((R ∪ S) − (R − S)) − (S − R)
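Python's built-in sets behave like relations for these operators, so the identities can be checked directly. This is a sketch with two hypothetical union-compatible relations (both have two attributes with the same domains).

```python
# Two union-compatible relations: same arity, same domains (hypothetical data).
r = {("A", 1), ("B", 2), ("C", 3)}
s = {("B", 2), ("C", 3), ("D", 4)}
t = {("C", 3), ("E", 5)}

union = r | s         # r ∪ s
intersection = r & s  # r ∩ s
difference = r - s    # r − s: in r but not in s

# Intersection rebuilt using only union and difference.
rebuilt = ((r | s) - (r - s)) - (s - r)
print(rebuilt == intersection)  # True

# Associativity of union and intersection.
print((r | (s | t)) == ((r | s) | t), ((r & s) & t) == (r & (s & t)))  # True True
```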
Cartesian product (×)
Notation: r × s
where r and s are relations and their output is defined as:
r × s = { q t | q ∈ r and t ∈ s }
Division (÷)
Notation: R1(Z) ÷ R2(Y), where Z = X ∪ Y.
The result contains the tuples over X that appear in R1 in combination with
every tuple from R2(Y).
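Both operations can be sketched over Python sets of tuples. This is a minimal illustration, assuming that Y is stored as the last columns of R1 and X as the remaining leading columns (a positional simplification; real relations identify attributes by name). The COMPLETED and REQUIRED relations are hypothetical: dividing them answers "which students completed every required course?".

```python
def product(r, s):
    """r × s: concatenate every tuple q of r with every tuple t of s."""
    return {q + t for q in r for t in s}


def divide(r1, r2, y_arity):
    """R1(Z) ÷ R2(Y), assuming Y is the last y_arity columns of R1 and X the rest.

    Returns the X-tuples that appear in R1 combined with every tuple of R2.
    """
    if y_arity < 1:
        raise ValueError("Y must have at least one attribute")
    xs = {t[:-y_arity] for t in r1}
    return {x for x in xs if all(x + y in r1 for y in r2)}


# Hypothetical data: COMPLETED(Student, Course) ÷ REQUIRED(Course).
COMPLETED = {
    ("Ann", "DB"), ("Ann", "OS"),
    ("Bob", "DB"),
    ("Cat", "DB"), ("Cat", "OS"), ("Cat", "AI"),
}
REQUIRED = {("DB",), ("OS",)}

print(len(product({(1,), (2,)}, REQUIRED)))    # 4
print(sorted(divide(COMPLETED, REQUIRED, 1)))  # [('Ann',), ('Cat',)]
```

Bob is missing because he appears with DB but not with OS.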
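The general form of a JOIN operation on two relations R(A1, A2, ..., An)
and S(B1, B2, ..., Bm) is
$R \bowtie_{\langle\text{join condition}\rangle} S$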
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, ...,
An, B1, B2, ... , Bm) in that order; Q has one tuple for each combination
of tuples—one from R and one from S—whenever the combination
satisfies the join condition. This is the main difference between
CARTESIAN PRODUCT and JOIN. In JOIN, only combinations of tuples
satisfying the join condition appear in the result, whereas in the
CARTESIAN PRODUCT all combinations of tuples are included in the
result. The join condition is specified on attributes from the two relations
R and S and is evaluated for each combination of tuples. Each tuple
combination for which the join condition evaluates to TRUE is included in
the resulting relation Q as a single combined tuple.
A general join condition is of the form
<condition> AND <condition> AND...AND <condition>
where each <condition> is of the form Ai θ Bj, Ai is an attribute of R, Bj
is an attribute of S, Ai and Bj have the same domain, and θ (theta) is one
of the comparison operators {=, <, ≤, >, ≥, ≠}.
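A join is a Cartesian product followed by a filter on the join condition, which a short Python sketch makes explicit. The EMPLOYEE(Name, Dno) and DEPARTMENT(Dnumber, Dname) relations and the `join` helper are hypothetical; each condition is a triple (i, θ, j) meaning Ai θ Bj, with attributes addressed by position, and all triples are ANDed as in the general form above.

```python
import operator

# Hypothetical relations: EMPLOYEE(Name, Dno) and DEPARTMENT(Dnumber, Dname).
EMPLOYEE = {("Ann", 1), ("Bob", 2), ("Cat", 3)}
DEPARTMENT = {(1, "Sales"), (2, "HR")}


def join(r, s, conditions):
    """R ⋈ S: combine q from r with t from s when every (i, θ, j) has q[i] θ t[j]."""
    return {
        q + t
        for q in r
        for t in s
        if all(theta(q[i], t[j]) for i, theta, j in conditions)
    }


# Equijoin on Dno = Dnumber (θ is '=').
equi = join(EMPLOYEE, DEPARTMENT, [(1, operator.eq, 0)])
# Theta join with '<': pair each employee with departments numbered above theirs.
less_than = join(EMPLOYEE, DEPARTMENT, [(1, operator.lt, 0)])
print(sorted(equi))  # [('Ann', 1, 1, 'Sales'), ('Bob', 2, 2, 'HR')]
```

With an empty condition list every combination passes, which gives the Cartesian product; Cat has no matching department, so it is dropped from the equijoin.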
Types of Joins
Theta join (θ)
Notation: $R1 \bowtie_{\theta} R2$
where R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1,
B2, ..., Bn) such that the attributes have nothing in common, that is,
R1 ∩ R2 = Φ. Theta join can use all kinds of comparison operators.
Outer Joins
Theta Join, Equijoin, and Natural Join are called inner joins. An inner join
includes only those tuples with matching attributes and the rest are
discarded in the resulting relation. Therefore, we need to use outer joins
to include all the tuples from the participating relations in the resulting
relation. There are three kinds of outer joins: left outer join, right outer
join, and full outer join.
A join that includes rows even if they do not have related rows in
the joined table is called an outer join.
In other words, an OUTER JOIN lists all the entries of ONE of the tables
(LEFT or RIGHT) or of BOTH tables (FULL), including entries that have no
match; the missing values from the other table are filled with NULL.
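Outer joins can be sketched the same way, padding unmatched tuples with `None` in place of NULL. This is a minimal illustration with hypothetical EMPLOYEE(Name, Dno) and DEPARTMENT(Dnumber, Dname) data; following SQL, a `None` join value is treated as never matching. A right outer join is a left outer join with the arguments swapped (with the column order reversed).

```python
def _match(a, b):
    # Like SQL, a NULL (None) join value never matches anything.
    return a is not None and b is not None and a == b


def left_outer_join(r, s, i, j, s_arity):
    """All tuples of r; unmatched ones are padded with None for s's attributes."""
    result = set()
    for q in r:
        matches = [t for t in s if _match(q[i], t[j])]
        if matches:
            result.update(q + t for t in matches)
        else:
            result.add(q + (None,) * s_arity)
    return result


def full_outer_join(r, s, i, j, r_arity, s_arity):
    """Left outer join plus the unmatched tuples of s, padded on the left."""
    result = left_outer_join(r, s, i, j, s_arity)
    for t in s:
        if not any(_match(q[i], t[j]) for q in r):
            result.add((None,) * r_arity + t)
    return result


EMPLOYEE = {("Ann", 1), ("Bob", 2), ("Cat", 3)}
DEPARTMENT = {(1, "Sales"), (2, "HR"), (4, "Legal")}

left = left_outer_join(EMPLOYEE, DEPARTMENT, 1, 0, s_arity=2)
full = full_outer_join(EMPLOYEE, DEPARTMENT, 1, 0, r_arity=2, s_arity=2)
```

Here `left` keeps Cat (with no department), and `full` additionally keeps the Legal department, which has no employees.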
SELF JOIN
A self join is a join in which a table is joined with itself (this models a
unary, or recursive, relationship), especially when the table has a FOREIGN
KEY that references its own PRIMARY KEY. Joining a table to itself means
that each row of the table is combined with itself and with every other
row of the table, and the join condition then keeps only the related pairs.
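A typical self join pairs each employee with their manager. The sketch below uses SQLite via Python's `sqlite3` module; the MANAGER_ID column and the sample rows are hypothetical. The two aliases `e` and `m` let the same EMPLOYEE table play both roles in one query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        EMP_ID     INTEGER PRIMARY KEY,
        EMP_NAME   TEXT NOT NULL,
        MANAGER_ID INTEGER REFERENCES EMPLOYEE(EMP_ID)  -- FK to its own PK
    );
    INSERT INTO EMPLOYEE VALUES
        (1, 'Asha', NULL), (2, 'Ben', 1), (3, 'Chen', 1), (4, 'Dev', 2);
""")

# Alias e is the employee row, alias m is the manager row of the same table.
pairs = conn.execute("""
    SELECT e.EMP_NAME, m.EMP_NAME
    FROM EMPLOYEE AS e
    JOIN EMPLOYEE AS m ON e.MANAGER_ID = m.EMP_ID
    ORDER BY e.EMP_ID
""").fetchall()
print(pairs)  # [('Ben', 'Asha'), ('Chen', 'Asha'), ('Dev', 'Ben')]
```

Asha has no manager, so this inner self join leaves her out; a LEFT JOIN would keep her with a NULL manager name.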
Relational Calculus
Relational calculus is a non-procedural query language which, instead of
algebra, uses mathematical predicate calculus. Relational calculus is not
related to the differential and integral calculus of mathematics; it takes
its name from a branch of symbolic logic termed predicate calculus. When
applied to databases, it is found in two
forms. These are:
Tuple relational calculus which was originally proposed by Codd in the
year 1972 and
Domain relational calculus which was proposed by Lacroix and Pirotte in
the year 1977.
In first-order logic or predicate calculus, a predicate is a truth-valued
function with arguments. When we substitute values for the
arguments, the function yields an expression, called a proposition, which
is either true or false.
{t | EMPLOYEE(t) AND t.DEPT_ID = 10} – this selects all the tuples of
employees who work for department 10.
For example, select EMP_ID and EMP_NAME of employees who work for
department 10
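Other concepts of TRC, such as free variables, bound variables and
well-formed formulas (WFFs), remain the same in DRC. The only difference is
that DRC is based on the attributes of a relation: its variables range over
attribute domains rather than over whole tuples.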
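Python set comprehensions read much like calculus expressions, which helps show the TRC/DRC difference. This is a sketch assuming a hypothetical schema EMPLOYEE(EMP_ID, EMP_NAME, DEPT_ID) with made-up rows. In the TRC version the variable t ranges over whole tuples; in the DRC version the variables i, n and d range over attribute values, and d plays the role of the existentially bound variable. The DRC result answers the query "select EMP_ID and EMP_NAME of employees who work for department 10".

```python
from collections import namedtuple

# Hypothetical schema EMPLOYEE(EMP_ID, EMP_NAME, DEPT_ID).
Employee = namedtuple("Employee", ["EMP_ID", "EMP_NAME", "DEPT_ID"])
EMPLOYEE = {
    Employee(1, "Asha", 10),
    Employee(2, "Ben", 20),
    Employee(3, "Chen", 10),
}

# TRC: {t | EMPLOYEE(t) AND t.DEPT_ID = 10}. The variable t ranges over whole tuples.
trc_result = {t for t in EMPLOYEE if t.DEPT_ID == 10}

# DRC: {<i, n> | ∃d (<i, n, d> ∈ EMPLOYEE AND d = 10)}.
# The variables i, n and d range over attribute values; d is bound by ∃.
drc_result = {(i, n) for (i, n, d) in EMPLOYEE if d == 10}

print(sorted(drc_result))  # [(1, 'Asha'), (3, 'Chen')]
```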