
Database Note

Uploaded by serinajena2206
Copyright
© All Rights Reserved

Module 1

DATABASE USES IN REAL LIFE:


ENTERPRISE:
An enterprise DBMS is one that 100 to 10,000 users can access simultaneously.
Businesses and big companies use it to handle their vast data sets. Such a database allows
businesses to increase their productivity. It can serve large organizations
with thousands of employees and busy web servers with lakhs of people accessing them
simultaneously online.
Typically, a DBMS is managed by a database administrator, or DBA, who is a specialist in a particular
software product. The DBA instructs the system to load, retrieve, or change data in the database, and
also decides who can access the data and what commands each user can run.
MANUFACTURING:
Manufacturing organizations make various kinds of products and sell them regularly. To keep data
about their products, such as bills, purchases, quantities, and supply chain management, a
database management system (DBMS) is used.
1. Railway Reservation System –
In the railway reservation system, the database is needed to store records of ticket bookings
and the status of train arrivals and departures. Additionally, if trains are late, people learn
about it through database updates.
2. Library Management System –
There are lots of books in the library, so it is difficult to keep records of all the books
in a register or copy. Therefore, a database management system (DBMS) is used to maintain
all the data related to the name of the book, its issue date, its availability, and its
author.
3. Banking –
A database management system is used to store customers' transaction data in the
database.
4. Education Sector –
Nowadays, examinations are conducted online by many schools and colleges. They manage all
examination data through a database management system (DBMS). In addition, students'
registration details, grades, courses, fees, attendance, results, and so on are all stored
in the database.
5. Credit Card Transactions –
A database management system is used for credit card purchases and for generating
monthly statements.
6. Social Media Sites –
We all use social media sites to connect with friends and to share our views with the
world. Every day, many people sign up for social media accounts like Pinterest, Facebook,
Twitter, and Google Plus. With the help of a database management system, all user data is
stored in the database, and we are able to connect with others.
7. Telecommunications –
No telecommunication company can function without a DBMS. A database management system
is essential for these companies to store call details and monthly postpaid bills in the
database.
8. Finance –
A database management system is used to store information about sales, holdings, and
purchases of financial instruments, such as stocks and bonds, in a database.
9. Online Shopping –
Nowadays, online shopping has become a major trend. Nobody wants to visit a shop and waste
their time. Everybody wants to shop from home through online shopping websites (for
example, Amazon, Flipkart, Snapdeal). All the products are sold and added only with the
help of a database management system (DBMS). Invoices, payments, and purchase information
are all handled with the help of a DBMS.
10. Human Resource Management –
Big firms or organizations have many workers or employees working under them. They store
data about employees' salaries, taxes, and work with the help of a database management
system (DBMS).
11. Airline Reservation System –
This system is similar to the railway reservation system. It also uses a database
management system to store records of flight departures, arrivals, and delay status.
12. Healthcare –
DBMS is used in healthcare to manage patient data, medical records, and billing information.
DATABASE PURPOSES:
A DBMS was developed to overcome the following problems of conventional file processing systems.
1) Data redundancy and inconsistency:
The same information may be written in several files. This redundancy leads to higher
storage and access costs. It may also lead to inconsistency, because an update made to one
copy may not be made to the others.
2) Difficulty in accessing data:
Conventional file processing systems do not allow data to be retrieved in a convenient
and efficient manner according to the user's needs.
3) Data isolation:
Because data are scattered across various files, and the files may be in different formats,
writing new application programs to retrieve the appropriate data is difficult.
4) Integrity problems:
Developers enforce data validation by adding appropriate code to the various application
programs. However, when new constraints are added, it is difficult to change all the
programs to enforce them.
5) Atomicity:
A transaction must either complete fully or have no effect at all. It is difficult to ensure
this in a file processing system when a failure occurs due to power failure, network
problems, etc.
6) Concurrent access:
In a file processing system, it is difficult for several users to access the same file for
transactions at the same time without interfering with one another.
7) Security problems:
A file processing system provides little or no security to protect the data from
unauthorized access.
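A DBMS solves the atomicity problem with transactions. The sketch below uses Python's built-in sqlite3 module and a hypothetical `account` table; a CHECK constraint stands in for a failure in the middle of a transfer. The connection's context manager commits if the block succeeds and rolls back if an exception escapes, so the transfer applies both updates or neither.

```python
import sqlite3

# In-memory database with a hypothetical account table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts as one atomic transaction."""
    try:
        # 'with conn' commits on success and rolls back if an exception escapes.
        with conn:
            conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
            # Debiting more than the balance violates the CHECK constraint and raises.
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))

print(transfer(conn, 1, 2, 30), balances(conn))   # True {1: 70, 2: 80}
print(transfer(conn, 1, 2, 500), balances(conn))  # False {1: 70, 2: 80}
```

In the failed transfer, the credit to account 2 had already been executed, but the rollback undid it, so no money was created or lost.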
DATA MODELS :
A data model in a database management system (DBMS) is a collection of conceptual tools used to
describe the structure of a database. Data models give us a clear picture of the data, which helps
us create the actual database. They take us from the design of the data to its proper
implementation.
Types of Data Models:
1. Conceptual Data Model
2. Representational Data Model
3. Physical Data Model
1. Conceptual Data Model :
The conceptual data model describes the database at a very high level and is useful for
understanding the needs or requirements of the database. It is this model that is used in the
requirement-gathering process, i.e., before the database designers start building a particular
database. One popular such model is the entity-relationship model (ER model). The ER model
focuses on entities, relationships, and attributes, which are used by database designers. Using
this model, discussions can be held even with non-technical users and stakeholders, and their
requirements can be understood.
Entity-Relationship Model( ER Model):
It is a high-level data model that is used to define data and the relationships among them. It
is essentially a conceptual design of a database that makes it easy to design a view of the data.
Components of ER Model:
1. Entity: An entity is a real-world object. It can be a person, place, object,
class, etc. Entities are represented by rectangles in an ER diagram.
2. Attributes: An attribute describes a property of an entity. Attributes are
represented by ellipses in an ER diagram. For a Student, attributes can be Age, Roll Number,
or Marks.
3. Relationship: Relationships define associations among different entities.
Diamonds (rhombuses) are used to show relationships.
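One common way to carry these ER components into a relational design is sketched below with Python's sqlite3 module: each entity becomes a table, its attributes become columns, and a many-to-many relationship becomes its own table holding the keys of both entities. The Student, Course, and Enrolls names and the sample rows are hypothetical, and here Marks is modeled as an attribute of the Enrolls relationship (marks per course) rather than of Student; other mappings are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Entity: Student (rectangle); its attributes become columns.
CREATE TABLE student (
    roll_no INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    age     INTEGER
);
-- Entity: Course.
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    title     TEXT NOT NULL
);
-- Relationship: Enrolls (diamond); marks is an attribute of the relationship.
CREATE TABLE enrolls (
    roll_no   INTEGER REFERENCES student(roll_no),
    course_id TEXT REFERENCES course(course_id),
    marks     INTEGER,
    PRIMARY KEY (roll_no, course_id)
);
INSERT INTO student VALUES (1, 'Asha', 20), (2, 'Ravi', 21);
INSERT INTO course VALUES ('DB101', 'Database Systems');
INSERT INTO enrolls VALUES (1, 'DB101', 88), (2, 'DB101', 75);
""")

# Follow the relationship from students to the courses they are enrolled in.
rows = conn.execute("""
    SELECT s.name, c.title, e.marks
    FROM student s
    JOIN enrolls e ON s.roll_no = e.roll_no
    JOIN course c ON c.course_id = e.course_id
    ORDER BY s.roll_no
""").fetchall()
print(rows)  # [('Asha', 'Database Systems', 88), ('Ravi', 'Database Systems', 75)]
```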
Characteristics of a conceptual data model:
• It offers organization-wide coverage of the business concepts.
• This type of data model is designed and developed for a business audience.
• The conceptual model is developed independently of hardware specifications, like data
storage capacity or location, and of software specifications, like the DBMS vendor and technology.
• The focus is to represent data as a user will see it in the “real world.”
Conceptual data models, also known as domain models, create a common vocabulary for all
stakeholders by establishing basic concepts and scope.

2. Representational Data Model :


This type of data model represents only the logical part of the database and does not
represent its physical structure. The representational data model allows us to focus
primarily on the design part of the database. A popular representational model is the
relational model, which uses relational algebra and relational calculus as its formal query
languages. In the relational model, we use tables to represent our data and the relationships
between them. It is a theoretical concept whose practical implementation is done in the
physical data model.
The advantage of using a representational data model is that it provides the foundation for
the physical model.
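The relational idea that a table is a set of tuples, and that a query is an operation on such sets, can be sketched in plain Python. The `select` (σ) and `project` (π) functions and the EMPLOYEE data are hypothetical illustrations, not a DBMS API; the comment shows the SQL query that corresponds to the same relational algebra expression.

```python
# A relation: a tuple of attribute names plus a list of rows (tuples).
ATTRS = ("emp_id", "name", "dept", "salary")
EMPLOYEE = [
    (1, "Meena", "HR", 40000),
    (2, "Arun", "IT", 55000),
    (3, "Divya", "IT", 62000),
]

def select(relation, predicate):
    """σ: keep only the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs, wanted):
    """π: keep only the wanted columns and drop duplicate rows (set semantics)."""
    idx = [attrs.index(a) for a in wanted]
    result = []
    for row in relation:
        t = tuple(row[i] for i in idx)
        if t not in result:
            result.append(t)
    return result

# π_name(σ_dept='IT'(EMPLOYEE)) corresponds to:
#   SELECT DISTINCT name FROM employee WHERE dept = 'IT';
it_rows = select(EMPLOYEE, lambda r: r[ATTRS.index("dept")] == "IT")
it_names = project(it_rows, ATTRS, ["name"])
print(it_names)  # [('Arun',), ('Divya',)]
```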
3. Physical Data Model :
The physical data model is used to implement the relational data model in practice. Ultimately,
all data in a database is stored physically on a secondary storage device such as disks and tapes,
in the form of files, records, and other data structures. The physical model holds all the
information on the format of the files, the structure of the databases, the presence of external
data structures, and their relation to each other. Here, tables are stored in a way that allows
them to be accessed efficiently. To come up with a good physical model, we first have to work out
the relational model well. Structured Query Language (SQL) is used to put relational algebra into
practice.
This data model describes HOW the system will be implemented using a specific DBMS.
It is typically created by DBAs and developers. Its purpose is the actual implementation of the
database.

Characteristics of a physical data model:


The physical data model describes the data needed for a single project or application, though it
may be integrated with other physical data models based on project scope.
The data model contains relationships between tables that address the cardinality and nullability
of the relationships.
It is developed for a specific version of a DBMS, location, data storage, or technology to be used
in the project.
Columns should have exact data types, lengths, and default values assigned.
Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.
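The characteristics above (exact types and lengths, defaults, nullability, primary and foreign keys, indexes, views) are what a physical model pins down. A sketch for a hypothetical department/employee schema, using SQLite through Python's sqlite3 module. SQLite treats declared lengths such as VARCHAR(50) as hints and does not enforce them, and it checks foreign keys only after `PRAGMA foreign_keys = ON`; other DBMSs behave differently, which is why a physical model is tied to a specific product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id   INTEGER PRIMARY KEY,
    dept_name VARCHAR(50) NOT NULL UNIQUE
);
CREATE TABLE employee (
    emp_id    INTEGER PRIMARY KEY,
    name      VARCHAR(100) NOT NULL,
    dept_id   INTEGER NOT NULL REFERENCES department(dept_id),  -- many employees per department
    salary    DECIMAL(10, 2) DEFAULT 0,                          -- default value
    joined_on DATE                                               -- nullable column
);
CREATE INDEX idx_employee_dept ON employee(dept_id);
CREATE VIEW employee_public AS SELECT emp_id, name, dept_id FROM employee;
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee (emp_id, name, dept_id) VALUES (1, 'Kiran', 10);
""")

# The omitted salary column received its default value.
default_salary = conn.execute("SELECT salary FROM employee WHERE emp_id = 1").fetchone()[0]
print(default_salary)  # 0

# A row pointing at a non-existent department is rejected by the foreign key.
fk_rejected = False
try:
    conn.execute("INSERT INTO employee (emp_id, name, dept_id) VALUES (2, 'Lata', 99)")
except sqlite3.IntegrityError:
    fk_rejected = True
print(fk_rejected)  # True
```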
Some Other Data Models :
1. Hierarchical Model :
The hierarchical model is one of the oldest data models; it was developed by IBM in the
1960s. In a hierarchical model, data are viewed as a collection of records, or segments, that
form a hierarchical relation. The data is organized into a tree-like structure in which each
record has exactly one parent record (except the root) and may have many children. The
segments are connected by logical associations; although each path from the root is a chain,
the overall structure fans out into multiple branches. These logical associations are called
directional associations.
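The tree structure of the hierarchical model can be sketched with nested Python dictionaries: every segment has exactly one parent, and reaching a record means navigating down from the root. The company, department, and employee segments are hypothetical, and `path_to` is an illustrative helper, not part of any DBMS.

```python
# Each node is a segment with a name and a list of child segments.
company = {
    "name": "Company",
    "children": [
        {"name": "Dept: Sales", "children": [
            {"name": "Emp: Kiran", "children": []},
            {"name": "Emp: Lata", "children": []},
        ]},
        {"name": "Dept: IT", "children": [
            {"name": "Emp: Arun", "children": []},
        ]},
    ],
}

def path_to(node, target, path=()):
    """Depth-first search from the root; returns the root-to-target path or None."""
    path = path + (node["name"],)
    if node["name"] == target:
        return path
    for child in node["children"]:
        found = path_to(child, target, path)
        if found:
            return found
    return None

print(path_to(company, "Emp: Arun"))  # ('Company', 'Dept: IT', 'Emp: Arun')
```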
2. Network Model:
The network model was formalized by the Database Task Group (DBTG) in the 1960s. This model is a
generalization of the hierarchical model: a segment can have multiple parent segments. The
segments are grouped into levels, but logical associations can exist between segments belonging
to any level. Often, there is a many-to-many logical association between two
segments.
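By contrast with a tree, a network model lets a record have several parents. A rough sketch with hypothetical supplier and part records: the links form a many-to-many graph that can be navigated from either side. The plain Python sets are only an illustration of the link structure, not how a network DBMS stores its sets.

```python
from collections import defaultdict

# Owner -> member links; a part can belong to several suppliers (multiple parents).
supplies = [("S1", "bolt"), ("S1", "nut"), ("S2", "bolt"), ("S2", "washer")]

parts_of = defaultdict(set)      # navigate supplier -> parts
suppliers_of = defaultdict(set)  # navigate part -> suppliers
for supplier, part in supplies:
    parts_of[supplier].add(part)
    suppliers_of[part].add(supplier)

print(sorted(suppliers_of["bolt"]))  # ['S1', 'S2'] (two parents for one record)
print(sorted(parts_of["S2"]))        # ['bolt', 'washer']
```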
3. Object-Oriented Data Model:
In the Object-Oriented Data Model, data and their relationships are contained in a single structure
which is referred to as an object in this data model. In this, real-world problems are represented as
objects with different attributes. Objects can have multiple relationships with one another.
Basically, it is a combination of object-oriented programming and the relational database model.
4. Flat Data Model:
The flat data model consists of a single two-dimensional array of data elements that does not
contain any duplicate elements. Its drawback is that it cannot store a large amount of data;
that is, the tables cannot be very large.
5. Context Data Model :
The Context data model is simply a data model which consists of more than one data model. For
example, the Context data model consists of ER Model, Object-Oriented Data Model, etc. This
model allows users to do more than any single data model can do on its own.
6. Semi-Structured Data Model:
Semi-Structured data models deal with the data in a flexible way. Some entities may have extra
attributes and some entities may have some missing attributes. Basically, you can represent data
here in a flexible way.
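JSON is a common format for semi-structured data. In this sketch with hypothetical student records, one record has extra attributes and another is missing some, yet all three sit in the same collection; code that reads them must allow for absent fields, here with Python's `dict.get`.

```python
import json

raw = """
[
  {"roll_no": 1, "name": "Asha", "email": "asha@example.com"},
  {"roll_no": 2, "name": "Ravi", "phones": ["98765", "91234"]},
  {"roll_no": 3}
]
"""
students = json.loads(raw)

# Attributes vary per record, so read optional ones with a default.
for s in students:
    print(s["roll_no"], s.get("name", "<no name>"), s.get("email", "-"))

# The union of attributes seen across all records.
all_keys = sorted({key for s in students for key in s})
print(all_keys)  # ['email', 'name', 'phones', 'roll_no']
```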
Advantages of Data Models :
1. Data Models help us in representing data accurately.
2. It helps us in finding the missing data and also in minimizing Data Redundancy.
3. Data Model provides data security in a better way.
4. The data model is detailed enough to be used for building the physical database.
5. The information in the data model can be used for defining the relationship between
tables, primary and foreign keys, and stored procedures.
Disadvantages of Data Models:
1. In the case of a vast database, sometimes it becomes difficult to understand the data
model.
2. You must have the proper knowledge of SQL to use physical models.
3. Even a small change made in the structure may require modification of the entire application.
4. There is no standard data manipulation language across DBMSs.
5. To develop a data model, one should know the characteristics of how the data is physically stored.

Three Schema Architecture :


In a database management system (DBMS), a schema refers to the logical structure of
the database, which tells us how data is stored and accessed. Architecture refers to the overall
design and organization of the database. The three schema architecture separates the logical and
physical aspects of the DBMS. This allows changing one layer of the database without affecting the
other layers. It also helps to maintain data integrity and consistency.
The three layers of a three-schema architecture are:
• External layer
• Conceptual layer
• Internal layer
1. External Schema :
In DBMS, the External layer provides a logical view of the database. The External layer is the
database portion that users can access and use. It is the topmost layer designed to provide a user-
friendly database interface.
Let us understand this through an example of an Employee Management System, where an
employee logs into the system and the system shows that employee's details. The employee sees
only this view, not the rest of the database.
2. Conceptual schema:
The conceptual schema is the section of the database that describes all the different data sets
and how they differ from one another. It represents the structure of the whole database. In an
employee database, for example, it describes the columns, or attributes, of the tables.
It can also be called a high-level representation of a database. The conceptual schema is mostly
represented by the Entity-Relationship Model (ER Model), which uses symbols to represent the
data elements and relationships visually for a specific system. In an ER Model, the database is
represented by ER Diagram.
Now let us consider the Employee Management System. The ER Diagram for the system would
look like the following.
This ER Diagram illustrates the relationships among the Employee, Department, Employee's Role,
and Login System.
3. Internal schema:
The Internal schema, also known as Physical Schema, is a database section where all the data is kept
and arranged. Here we need to decide where data should be stored and how it should be stored.
The key features of the internal schema include the following:
• It determines how the data is stored in the database.
• It creates indexes on the data so that records can be accessed quickly.
• It can compress the data so that it takes up less space without losing any
information.
• It divides large tables into smaller partitions for better management and performance.
• It also includes security features that help protect the data from breaches and attacks.
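In SQL terms, the three levels roughly correspond to different kinds of objects. A minimal sketch with Python's sqlite3 module, assuming a hypothetical `employee` table: the table definition plays the role of the conceptual schema, an index is an internal-level (storage) decision, and a view gives ordinary users an external schema that hides the salary column. Dropping the index changes only the internal level, and queries through the view still return the same rows, which is the data independence the architecture aims for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual level: logical structure of the data.
CREATE TABLE employee (
    emp_id INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    dept   TEXT NOT NULL,
    salary INTEGER NOT NULL
);
INSERT INTO employee VALUES (1, 'Kiran', 'Sales', 40000), (2, 'Arun', 'IT', 55000);

-- Internal level: an access-path choice that users never see directly.
CREATE INDEX idx_employee_dept ON employee(dept);

-- External level: a view that hides salary from ordinary users.
CREATE VIEW employee_directory AS SELECT emp_id, name, dept FROM employee;
""")

before = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id").fetchall()
conn.execute("DROP INDEX idx_employee_dept")  # change the internal level only
after = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id").fetchall()
print(before == after, after)  # True [(1, 'Kiran', 'Sales'), (2, 'Arun', 'IT')]
```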

DBMS INSTANCES:
Consider an online Employee Management System as our database. The database schema is designed
before the database is created, so that in the future, if we make a change to one layer, it does
not affect the other layers. A DBMS instance refers to the information stored in the database at
a particular moment in time.
Some steps in creating a schema in a DBMS include:
• Determining what kind of data is to be stored in the database.
• Once the purpose of the database is determined, defining the tables.
• Within each table, defining the columns.
• Assigning each column a data type that specifies the kind of information kept there. Dates,
numbers, and text are some of the data types.
• Defining the relationships between the tables in the database.
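The steps above produce a schema; the instance is whatever rows happen to be stored at a given moment. A small sketch with Python's sqlite3 module and a hypothetical `employee` table: the stored CREATE TABLE text (the schema) stays the same while the instance changes after each insert and delete.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema: defined once (table, columns, data types).
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, joined_on DATE)")

def instance():
    """Snapshot of the data stored right now."""
    return conn.execute("SELECT * FROM employee ORDER BY emp_id").fetchall()

snapshots = [instance()]                                     # t0: empty instance
conn.execute("INSERT INTO employee VALUES (1, 'Kiran', '2024-01-15')")
snapshots.append(instance())                                 # t1: one row
conn.execute("DELETE FROM employee WHERE emp_id = 1")
snapshots.append(instance())                                 # t2: empty again

# SQLite keeps the original CREATE statement in its catalog table.
schema = conn.execute("SELECT sql FROM sqlite_master WHERE name = 'employee'").fetchone()[0]
print(schema)     # the unchanged CREATE TABLE statement
print(snapshots)  # [[], [(1, 'Kiran', '2024-01-15')], []]
```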
Database Schema Designs:
The process of developing a structure for storing and organizing data in a database is known as
database schema design.
Following are some of the database schema designs:
• Hierarchical schema design arranges data in a tree-like structure.
• Network schema design allows for more complex relationships, where each record can have
multiple parent and child records.
• Relational schema design organizes data into tables with rows and columns.
• Dimensional schema design uses a star schema to organize data into fact tables and
dimension tables.
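A dimensional (star) schema keeps measurable events in a central fact table and descriptive attributes in the surrounding dimension tables. A minimal sketch with Python's sqlite3 module and hypothetical sales data: the fact table holds foreign keys to the dimensions plus a measure (`amount`), and a typical analytical query joins a dimension and aggregates the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    amount     INTEGER  -- the measure
);
INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Laptop', 'Electronics');
INSERT INTO dim_date VALUES (202401, 2024, 1), (202402, 2024, 2);
INSERT INTO fact_sales VALUES (1, 202401, 50), (2, 202401, 900), (1, 202402, 70);
""")

# Total sales per product category.
totals = conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales f JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
print(totals)  # [('Electronics', 900), ('Stationery', 120)]
```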
Advantages:
There are several benefits to using a three-schema architecture in a DBMS. Some of these
benefits include:
• One of the main advantages of a DBMS's three schemas is data independence. The three
layers are distinct from each other, so we can make changes to one layer without affecting
the other layers.
• Each schema can scale independently, which can enhance the performance of the database
and let it handle more traffic at the same time.
• It is simpler to maintain and change each layer individually in a three-schema design due to
the separation of the layers.
Disadvantages:
Despite the many benefits of the three-schema architecture, it has a few disadvantages:
• It can be difficult and expensive for big companies because it takes a lot of work
to set up and maintain.
• It can cause slowdowns and errors if data is not mapped correctly between the
different layers.
• It can be hard to make sure that only the right people can access sensitive
information.
Database Languages in DBMS:
A DBMS has appropriate languages and interfaces to express database queries and updates.
Database languages can be used to read, store and update the data in the database.
Types of Database Languages:

1. Data Definition Language (DDL):


DDL stands for Data Definition Language. It is used to define database structure or pattern.
It is used to create schema, tables, indexes, constraints, etc. in the database.
Using the DDL statements, you can create the skeleton of the database.
Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
Here are some tasks that come under DDL:
• Create: It is used to create objects in the database.
• Alter: It is used to alter the structure of the database.
• Drop: It is used to delete objects from the database.
• Truncate: It is used to remove all records from a table.
• Rename: It is used to rename an object.
• Comment: It is used to add comments to the data dictionary.
These commands are used to update the database schema that's why they come under Data
definition language.
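A sketch of DDL commands run against SQLite from Python's sqlite3 module. SQLite supports CREATE, DROP, and ALTER TABLE with ADD COLUMN and RENAME TO, but it has no TRUNCATE or COMMENT statement; a `DELETE FROM table` without a WHERE clause is its usual substitute for truncation (strictly a DML statement). The `staff`/`employee` names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def tables():
    """Names of the user tables currently defined (read from SQLite's catalog)."""
    return [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]

conn.execute("CREATE TABLE staff (id INTEGER PRIMARY KEY, name TEXT)")   # CREATE
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")                  # ALTER: add a column
conn.execute("ALTER TABLE staff RENAME TO employee")                     # RENAME
conn.execute("INSERT INTO employee (name) VALUES ('Kiran')")
conn.execute("DELETE FROM employee")                                     # stands in for TRUNCATE

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [r[1] for r in conn.execute("PRAGMA table_info(employee)")]
before_drop = tables()
print(before_drop, columns)  # ['employee'] ['id', 'name', 'email']

conn.execute("DROP TABLE employee")                                      # DROP
after_drop = tables()
print(after_drop)  # []
```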
2. Data Manipulation Language (DML):
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a
database. It handles user requests.
Here are some tasks that come under DML:
• Select: It is used to retrieve data from a database.
• Insert: It is used to insert data into a table.
• Update: It is used to update existing data within a table.
• Delete: It is used to delete records (all rows, or only those matching a condition) from a table.
• Merge: It performs an UPSERT operation, i.e., insert or update.
• Call: It is used to call a PL/SQL or Java subprogram.
• Explain Plan: It shows the execution plan (access path) the DBMS will use for a query.
• Lock Table: It controls concurrency.
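The four core DML statements, sketched with Python's sqlite3 module on a hypothetical `book` table. Note that DELETE removes only the rows matching its WHERE clause; without a WHERE clause it removes every row. The `?` placeholders pass values separately from the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, copies INTEGER)")

# INSERT: add rows.
conn.executemany("INSERT INTO book (title, copies) VALUES (?, ?)",
                 [("DBMS Concepts", 3), ("SQL Basics", 1), ("Data Models", 2)])

# UPDATE: change existing rows that match the condition.
conn.execute("UPDATE book SET copies = copies - 1 WHERE title = ?", ("SQL Basics",))

# DELETE: remove only the rows that match the WHERE clause.
conn.execute("DELETE FROM book WHERE copies = 0")

# SELECT: retrieve data.
remaining = conn.execute("SELECT title, copies FROM book ORDER BY id").fetchall()
print(remaining)  # [('DBMS Concepts', 3), ('Data Models', 2)]
conn.commit()
```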
3. Data Control Language (DCL):
DCL stands for Data Control Language. It is used to control access to the data stored in the database.
Whether DCL statements are transactional depends on the DBMS: in PostgreSQL, for example, they can be
rolled back, while in Oracle they commit implicitly.
Here are some tasks that come under DCL:
• Grant: It is used to give users access privileges to a database.
• Revoke: It is used to take back permissions from users.
Privileges that can be granted and revoked include
CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE, and SELECT.
4. Transaction Control Language (TCL):
TCL is used to manage the changes made by DML statements. DML statements can be grouped into a
logical transaction.
Here are some tasks that come under TCL:
• Commit: It is used to save the transaction in the database.
• Rollback: It is used to restore the database to its state as of the last Commit.
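COMMIT and ROLLBACK issued explicitly, as a sketch with Python's sqlite3 module. Opening the connection with `isolation_level=None` stops the module from starting transactions on its own, so the BEGIN, COMMIT, and ROLLBACK statements below are exactly what the database sees. The `orders` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transaction control
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO orders (item) VALUES ('pen')")
conn.execute("COMMIT")      # the pen order is saved permanently

conn.execute("BEGIN")
conn.execute("INSERT INTO orders (item) VALUES ('laptop')")
conn.execute("ROLLBACK")    # undoes the uncommitted laptop order

items = [r[0] for r in conn.execute("SELECT item FROM orders")]
print(items)  # ['pen']
```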
SQL :
SQL stands for Structured Query Language. It is used for storing and managing data in a relational
database management system (RDBMS). It is the standard language for relational database systems. It
enables a user to create, read, update, and delete relational databases and tables. All major RDBMSs,
such as MySQL, Informix, Oracle, MS Access, and SQL Server, use SQL as their standard database
language. SQL allows users to query the database in a number of ways, using English-like
statements.
Rules:
SQL keywords are not case sensitive; generally, they are written in uppercase.
SQL statements are not tied to text lines: a single SQL statement can be written on one line or
spread over multiple text lines. Using SQL statements, you can perform most of the actions in a database.
SQL is based on tuple relational calculus and relational algebra.
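A quick check of these rules with Python's sqlite3 module and a hypothetical `student` table: keywords in any case give the same result, and one statement may span several lines. The rule covers keywords only; whether comparisons on data values ignore case depends on the DBMS and its collation settings, and in SQLite `=` on text is case sensitive by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER, name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Asha')")

upper = conn.execute("SELECT name FROM student WHERE roll_no = 1").fetchall()
lower = conn.execute("select name from student where roll_no = 1").fetchall()

# The same statement spread over several lines.
multi = conn.execute("""
    SELECT name
      FROM student
     WHERE roll_no = 1
""").fetchall()

# Data values are a different matter: 'asha' does not equal 'Asha' here.
data_match = conn.execute("SELECT COUNT(*) FROM student WHERE name = 'asha'").fetchone()[0]
print(upper == lower == multi, data_match)  # True 0
```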
SQL process:
When an SQL command is executed by an RDBMS, the system figures out the best way to
carry out the request, and the SQL engine determines how to interpret the task.
Various components are involved in this process, such as the optimization engine, the query
engine, the query dispatcher, and the classic query engine. All non-SQL queries are handled by the
classic query engine, but the SQL query engine does not handle logical files.
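Most RDBMSs let you see the plan the optimizer chose. A sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module, on a hypothetical `emp` table: after an index on `dept` is created, the planner reports that it will search the index instead of scanning the whole table. The exact wording of the plan text differs between SQLite versions, so the comments show typical output only, and the check looks for the index name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, dept TEXT, name TEXT)")
conn.executemany("INSERT INTO emp (dept, name) VALUES (?, ?)",
                 [("IT" if i % 2 else "HR", f"e{i}") for i in range(1000)])

query = "SELECT name FROM emp WHERE dept = 'IT'"

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

print(plan(query))  # typically ['SCAN emp'] (full table scan)
conn.execute("CREATE INDEX idx_emp_dept ON emp(dept)")
print(plan(query))  # typically ['SEARCH emp USING INDEX idx_emp_dept (dept=?)']
uses_index = any("idx_emp_dept" in detail for detail in plan(query))
```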

DATABASE DESIGN:
Database design is a collection of processes that facilitate the design, development,
implementation, and maintenance of enterprise data management systems. Properly designed
databases are easy to maintain, improve data consistency, and are cost-effective in terms of disk
storage space. The database designer decides how the data elements correlate and what data must be
stored. The main objective of database design in a DBMS is to produce the logical and physical design
models of the proposed database system. The logical model concentrates on the data requirements
and the data to be stored, independent of physical considerations. It does not concern itself with how
the data will be stored or where it will be stored physically. The physical data design model involves
translating the logical design of the database onto physical media using hardware resources and
software systems such as database management systems (DBMS).

Database development life cycle:


The database development life cycle has a number of stages that are followed when developing
database systems. The steps in the development life cycle do not necessarily have to be followed
strictly in sequence. On small database systems, the process of database design is usually very
simple and does not involve many steps. To fully appreciate the above diagram, let's look at the
individual components listed in each step for an overview of the design process in a DBMS.
Requirements analysis:
• Planning – This stage is concerned with planning the entire database development life cycle.
It takes into consideration the information systems strategy of the organization.
• System definition – This stage defines the scope and boundaries of the proposed database
system.

Database designing:
• Logical model – This stage is concerned with developing a database model based on
requirements. The entire design is on paper, without any physical implementation or specific
DBMS considerations.
• Physical model – This stage implements the logical model of the database, taking into account
the DBMS and physical implementation factors.
Implementation:
• Data conversion and loading – This stage is concerned with importing and converting data from
the old system into the new database.
• Testing – This stage is concerned with identifying errors in the newly implemented system.
It checks the database against the requirement specifications.
Database engine:
A database engine (or storage engine) is the underlying software component that a database
management system (DBMS) uses to create, read, update and delete (CRUD) data from a database.
Most database management systems include their own application programming interface (API) that
allows the user to interact with their underlying engine without going through the user interface of
the DBMS.
The term "database engine" is frequently used interchangeably with "database server" or "database
management system". A "database instance" refers to the processes and memory structures of the
running database engine.
Three parts that make up the database system are:
• Query Processor
• Storage Manager
• Disk Storage
1. Query Processor:
As the name implies, the query processor handles query processing; put simply, it executes the
user's query. In this way, the query processor makes data access simple and easy for users of the
database system. Its primary duty is to execute queries successfully. The query processor transforms
(or interprets) the requests provided by the user's application program into instructions that a
computer can understand.
Components of the Query Processor:
1. DDL Interpreter:
DDL stands for Data Definition Language. As the name implies, the DDL interpreter interprets
DDL statements, such as those used in schema definitions (CREATE, DROP, etc.). This
interpretation yields a set of tables containing the metadata (data about data), which is kept
in the data dictionary. The data dictionary is part of disk storage, which is covered in a later
section.
2. DML Compiler:
DML stands for Data Manipulation Language. In keeping with its name, the DML compiler converts
DML statements such as SELECT, UPDATE, and DELETE into low-level instructions, or machine-readable
object code, so that they can be executed. The optimization of queries is another function of
the DML compiler. A single query can typically be translated into a number of evaluation plans,
so some optimization is needed to select the evaluation plan with the lowest cost out of all the
options. This process is known as query optimization and is carried out by the DML compiler.
Simply put, query optimization determines the most effective technique to carry out a query.
3. Embedded DML Pre-compiler:
Before the query evaluation, the embedded DML commands in the application program
(such as SELECT, FROM, etc., in SQL) must be pre-compiled into standard procedural
calls (program instructions that the host language can understand). Therefore, the DML
statements which are embedded in an application program must be converted into routine
calls by the Embedded DML Pre-compiler.
4. Query Evaluation Engine:
It takes the evaluation plan for the query, runs it, and then returns the result.
Simply put, the query evaluation engine evaluates the SQL commands used to access the
database's contents and returns the result of the query. In a nutshell, it is in charge of
analyzing the queries and running the object code that the DML compiler produces.
Apache Drill and Presto are a few examples of query evaluation engines.
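The query optimization step described for the DML compiler, choosing the cheapest of several evaluation plans, can be illustrated with a toy cost model. Everything here is a simplifying assumption for illustration (a table of 1000 disk blocks, a B+-tree index of height 3, one block read per matching row); real optimizers use much richer statistics and cost formulas.

```python
# Hypothetical statistics (assumptions for illustration only).
TABLE_BLOCKS = 1000   # blocks the table occupies on disk
INDEX_HEIGHT = 3      # levels to descend in a B+-tree index

def cost_full_scan():
    """Read every block of the table once."""
    return TABLE_BLOCKS

def cost_index_lookup(matching_rows):
    """Descend the index, then (pessimistically) read one block per matching row."""
    return INDEX_HEIGHT + matching_rows

def choose_plan(matching_rows):
    """Return the name of the cheapest plan and the cost of every plan considered."""
    plans = {
        "full table scan": cost_full_scan(),
        "index lookup": cost_index_lookup(matching_rows),
    }
    best = min(plans, key=plans.get)
    return best, plans

print(choose_plan(100))   # ('index lookup', {'full table scan': 1000, 'index lookup': 103})
print(choose_plan(5000))  # ('full table scan', {'full table scan': 1000, 'index lookup': 5003})
```

The same query gets a different plan depending on how many rows are expected to match, which is why optimizers depend on statistics about the data.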
2. Storage Manager:
The storage manager is a program that acts as a conduit between the queries made and the data kept
in the database. Another name for it is Database Control System. By applying constraints and
running DCL instructions, it maintains the database's consistency and integrity. It is in charge of
retrieving, storing, updating, and removing data from the database.

Components of Storage Manager:
• Integrity Manager: Whenever there is any change in the database, the integrity manager checks
the integrity constraints.
• Authorization Manager: The authorization manager verifies that the user is valid and
authorized for the specific query or request.
• File Manager: All the files and data structures of the database are managed by this component.
• Transaction Manager: It is responsible for keeping the database consistent before and after
transactions. Concurrent processes are generally controlled by this component.
• Buffer Manager: The buffer manager transfers data between secondary storage (disk) and main
memory and manages the cache memory.
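A buffer manager keeps recently used disk blocks in main memory and decides which block to evict when the buffer is full. A minimal sketch using least-recently-used (LRU) replacement, which is one common policy among several; `read_block_from_disk` is a hypothetical stand-in for real disk I/O, and the sketch ignores pinning and writing back modified (dirty) blocks.

```python
from collections import OrderedDict

def read_block_from_disk(block_id):
    """Hypothetical stand-in for a real disk read."""
    return f"<data of block {block_id}>"

class BufferManager:
    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()   # block_id -> block data, least recently used first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.frames:
            self.frames.move_to_end(block_id)   # hit: mark as most recently used
            return self.frames[block_id]
        self.disk_reads += 1                     # miss: fetch from disk
        data = read_block_from_disk(block_id)
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)      # evict the least recently used block
        self.frames[block_id] = data
        return data

buf = BufferManager(capacity=2)
for b in [1, 2, 1, 3, 2]:
    buf.get(b)
print(buf.disk_reads, list(buf.frames))  # 4 [3, 2]
```

Block 1 is read from disk once and then served from memory; reading block 3 evicts block 2, the least recently used, so the final request for block 2 goes to disk again.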
3. Disk Storage:
A DBMS can use various kinds of Data Structures as a part of physical system implementation in
the form of disk storage.
Components of Disk Storage:
• Data Dictionary:
It contains the metadata (data about data): each object of the database has some information
about its structure, and the data dictionary is a repository containing these details about the
structure of the database objects.
• Data Files:
This component stores the data in files.
• Indices:
These indices are used to access and retrieve data quickly and efficiently.
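Why an index speeds up retrieval can be sketched with a sorted list of (key, position) pairs searched by binary search, compared with a linear scan of the data file. This is a simplified stand-in for the B+-tree or hash indexes real DBMSs use; the employee records are hypothetical.

```python
import bisect

# "Data file": records stored in insertion order.
records = [(105, "Lata"), (101, "Kiran"), (110, "Arun"), (103, "Meena")]

# "Index": (key, position in the data file) pairs kept sorted by key.
index = sorted((emp_id, pos) for pos, (emp_id, _) in enumerate(records))
keys = [k for k, _ in index]

def lookup_scan(emp_id):
    """Linear scan: may examine every record."""
    for rec in records:
        if rec[0] == emp_id:
            return rec
    return None

def lookup_index(emp_id):
    """Binary search on the index, then one direct access to the data file."""
    i = bisect.bisect_left(keys, emp_id)
    if i < len(keys) and keys[i] == emp_id:
        return records[index[i][1]]
    return None

print(lookup_index(110), lookup_scan(110))  # (110, 'Arun') (110, 'Arun')
```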

[Figure: Steps in query processing: query → parser and translator → relational algebra expression → optimizer (using metadata) → execution plan → evaluation engine (accessing the data) → query output]

DATABASE USER:
Database users interact with data to update, read and modify the given information on a daily basis.
There are various types of database users and we will learn in detail about them.
Database users can be divided into the following types:
• End Users
i. Naive users / Parametric users
ii. Sophisticated users
• Application Programmers, Specialized Users, or Back-End Developers
• System Analysts
• Database Administrators (DBA)
• Temporary Users or Casual Users

End Users/Parametric Users :


These users access the database from the front end with the help of a pre-developed application.
They have little knowledge about the design and working of databases.
There are two types of end users:
1. Naive Users: Naive users are those who don't have any database knowledge.
They depend on pre-developed applications, like Bank Management Systems, Library
Management Systems, Hospital Management Systems, and Railway Ticket Booking
Systems (IRCTC), to get the desired result.
2. Sophisticated Users: These users interact with the system without writing application
programs and may have separate databases for personal use. They pass each query directly to
the query processor.
There are two ways for them to interact with the system:
o They use structured query language (SQL) to run queries on the database.
o They use data analysis software tools. Data engineers and data scientists, for example,
are familiar with databases and work this way.

Application Programmers/Back-End Developers :


These programmers write the code for application programs that use the database. The
application programmer builds the application according to user requirements and can also write
control software that runs an entire computer system. Application programs can be written in
programming languages like C#, Java, or other .NET languages, and focus on business, engineering,
and scientific problems.
Application Programmers are divided into four different types:
• Web Developers
• Computer Hardware Programmers
• Database Developers
• Software Developers
Examples of software developed by application programmers:
• Content access software
• Educational Software
• Information Worker Software
• Media Development Software
• Product Engineering Software
• Enterprise Software
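Application programs usually hide the database behind ordinary functions written in a host language. The sketch below assumes a hypothetical library management application (the `book` table, the `LibraryApp` class, and its methods are illustrative names, not from any real system). It uses Python's `sqlite3` module with `?` placeholders, so values are passed to the DBMS separately from the SQL text, and uses the connection as a context manager so each change is committed or rolled back as a unit.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Book:
    book_id: int
    title: str
    author: str
    available: bool

class LibraryApp:
    """Hypothetical application layer: end users call these methods,
    and the SQL stays hidden inside them."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS book ("
            "book_id INTEGER PRIMARY KEY, title TEXT NOT NULL, "
            "author TEXT NOT NULL, available INTEGER NOT NULL DEFAULT 1)"
        )

    def add_book(self, title, author):
        with self.conn:  # commit on success, roll back on error
            cur = self.conn.execute(
                "INSERT INTO book (title, author) VALUES (?, ?)", (title, author)
            )
        return cur.lastrowid

    def issue_book(self, book_id):
        with self.conn:
            cur = self.conn.execute(
                "UPDATE book SET available = 0 WHERE book_id = ? AND available = 1",
                (book_id,),
            )
        return cur.rowcount == 1  # False if the book is missing or already issued

    def search_by_author(self, author):
        cur = self.conn.execute(
            "SELECT book_id, title, author, available FROM book "
            "WHERE author = ? ORDER BY book_id",
            (author,),
        )
        return [Book(r[0], r[1], r[2], bool(r[3])) for r in cur]

app = LibraryApp(sqlite3.connect(":memory:"))
first = app.add_book("DBMS Basics", "Author A")
app.add_book("SQL Practice", "Author A")
print(app.issue_book(first))  # first issue succeeds
print(app.issue_book(first))  # second issue is refused
print(app.search_by_author("Author A"))
```

End users of such a program only see methods like `issue_book`; the table layout and the SQL remain the application programmer's concern.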

System Analyst:
A System Analyst is also known as a business technology analyst. These professionals are
responsible for the design, structure, and properties of databases. The application programmer uses
the specifications provided by the system analyst to construct the software that is used by end users.
The analyst gathers information from stakeholders as well as end users to understand their
requirements and translates them into functional specifications for the new system.
Typical roles of System Analysts:
• They serve as team leaders.
• They are responsible for managing projects.
• They supervise the lower-level information staff.

Database Administrator (DBA):


The DBA is a person or group of people responsible for managing the database and solving the
complex problems that arise in it. The DBA uses the database to find the information they need and
plans the goals of the database. They anticipate future needs and provide solutions for end users,
which is why the role is associated with high-level management.
For example:
o Handling data loss.
o Securing the privacy of data.
o Monitoring the recovery and backup of the database.

Temporary Users/Casual Users :


These users use the database for testing and have access only for a limited time. According to
business requirements, they add or update a small amount of information in the database with the
help of a database administrator. Limiting and supervising their access helps maintain the security
and integrity of data.
For example:
o High-level managers with little knowledge of DBMS can be temporary users.
DATABASE ADMINISTRATOR :
A Database Administrator (DBA) is an individual responsible for controlling,
maintaining, coordinating, and operating a database management system. Managing, securing, and
taking care of the database systems is their prime responsibility. They are in charge of
authorizing access to the database, coordinating and monitoring its use, capacity planning,
installation, and acquiring software and hardware resources as and when needed. Their role
also spans configuration, database design, migration, security, troubleshooting, backup, and
data recovery. Database administration is a key function in any firm or organization that
relies on one or more databases. DBAs are in overall command of the database system.
Types of Database Administrator (DBA) :
1. Administrative DBA :
Their job is to maintain the server and keep it functional. They are concerned with data
backups, security, troubleshooting, replication, migration, etc.
2. Data Warehouse DBA :
They perform the duties of an administrative DBA but are also responsible for merging data from
various sources into the data warehouse. They also design the warehouse and clean and scrub data
before loading it.
3. Cloud DBA :
Many companies now prefer to store their data on cloud storage, as it reduces the chance of
data loss and provides an extra layer of data security and integrity. Cloud DBAs manage the
databases hosted on such cloud platforms.
4. Development DBA :
They build and develop queries, stored procedures, etc. that meet the firm's or organization's
needs. They are skilled at programming.
5. Application DBA :
They particularly manage all requirements of application components that interact with the
database and accomplish activities such as application installation and coordination,
application upgrades, database cloning, data load process management, etc.
6. Architect :
They are responsible for designing schemas, such as building tables. They work to build a
structure that meets organizational needs. The design is further used by developers and
development DBAs to design and implement real applications.
7. OLAP DBA:
They design and build multi-dimensional cubes for decision support or OLAP
systems.
8. Data Modeler:
In general, a data modeler is in charge of a portion of a data architect’s duties. A data
modeler is typically not regarded as a DBA, but this is not a hard and fast rule.
9. Task-Oriented DBA :
To concentrate on a specific DBA task, large businesses may hire highly specialised DBAs.
They are quite uncommon outside of big corporations. Recovery and backup DBA, whose
responsibility it is to guarantee that the databases of businesses can be recovered, is an
example of a task-oriented DBA. However, this specialism is not present in the majority of
firms. Where they exist, task-oriented DBAs make sure that highly qualified professionals
work on crucial DBA tasks.
10. Database Analyst:
This position doesn’t actually have a set definition. Junior DBAs may occasionally be
referred to as database analysts. A database analyst occasionally performs functions that are
comparable to those of a database architect. The term “Data Administrator” is also used to
describe database analysts and data analysts. Additionally, some businesses occasionally
refer to database administrators as data analysts.
Importance of Database Administrator (DBA) :
• The DBA manages and controls the three levels of the DBMS architecture: the internal level,
the conceptual level, and the external level. In discussion with the whole user community, the
DBA defines the overall (world) view of the database and then provides external views for
different users and applications.
• The DBA is responsible for maintaining the integrity and security of the database by
restricting unauthorized users. The DBA grants permissions to database users and maintains a
profile of each user of the database.
• The DBA is also accountable for keeping the database protected and secure and for keeping
any chance of data loss to a minimum.
• The DBA reduces the risk of data loss by backing up the data at regular intervals.
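Regular backups can be sketched with the online backup method of Python's `sqlite3.Connection` (`Connection.backup`, available since Python 3.7). The `employee` table and the `take_backup` function are hypothetical, and both copies live in memory only to keep the example self-contained; in practice the backup would go to separate storage and be scheduled at regular intervals, which this sketch does not show.

```python
import sqlite3

# Hypothetical "production" database (in memory for the sketch).
prod = sqlite3.connect(":memory:")
with prod:
    prod.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
    prod.executemany("INSERT INTO employee (name) VALUES (?)", [("Asha",), ("Ravi",)])

def take_backup(source):
    """Copy the whole database into a fresh connection using SQLite's online backup API."""
    target = sqlite3.connect(":memory:")  # a real DBA would use a file or separate storage
    source.backup(target)
    return target

backup_copy = take_backup(prod)

# Simulate data loss on the production database...
with prod:
    prod.execute("DELETE FROM employee")

# ...and recover by restoring the backup over it.
backup_copy.backup(prod)
print(prod.execute("SELECT name FROM employee ORDER BY id").fetchall())
```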
Role and Duties of Database Administrator (DBA) :
1. Decides hardware –
The DBA chooses economical hardware that best suits the organization, based on its cost,
performance, and efficiency. Hardware is the interface between end users and the
database.
2. Manages data integrity and security –
Data integrity needs to be checked and managed accurately, and data must be protected from
unauthorized use. The DBA monitors relationships within the data to maintain data integrity.
3. Database Accessibility –
The DBA is responsible for granting permission to access the data in the database and
decides who has the right to change its content.
4. Database design –
The DBA is responsible for logical design, physical design, external model
design, and integrity and security control.
5. Database implementation –
The DBA implements the DBMS and checks the loading of the database at the time of implementation.
6. Query processing performance –
DBA enhances query processing by improving speed, performance, and accuracy.
7. Tuning Database Performance –
If users are not able to get data quickly and accurately, the organization may lose
business. By tuning SQL commands, the DBA can enhance the performance of the database.
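One common tuning step is to check how the DBMS plans to run a query and to add an index when it is scanning the whole table. The sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python's `sqlite3` module on a hypothetical `orders` table. The exact wording of the plan text differs between SQLite versions, but without the index it normally reports a scan of `orders`, and with the index a search using `idx_orders_customer`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"C{i % 100}", i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT amount FROM orders WHERE customer = 'C42'"

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable 'detail' column.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

before = plan(QUERY)  # normally a full scan of the table
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
after = plan(QUERY)   # normally a search through the new index

print("before:", before)
print("after: ", after)
```

Other DBMSs offer equivalent commands, such as `EXPLAIN` in MySQL and PostgreSQL.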
Various responsibilities of Database Administrator (DBA) :
• Designing the overall database schema (tables and fields).
• Selecting and installing database software and hardware.
• Deciding on access methods and data storage.
• Selecting appropriate DBMS software such as Oracle, SQL Server, or MySQL.
• Designing recovery procedures.
• Deciding user access levels and the security checks for accessing, modifying, or
manipulating data.
• Specifying techniques for monitoring database performance.
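DBMSs such as Oracle, SQL Server, and MySQL record access levels with SQL `GRANT` and `REVOKE` statements. SQLite has no user accounts, so the idea is shown here in plain Python instead: a hypothetical privilege catalog (`PRIVILEGES`) that the DBA updates and that is consulted before each request. The user names, tables, and functions are illustrative only.

```python
# Hypothetical privilege catalog kept by the DBA: user -> table -> allowed actions.
PRIVILEGES = {
    "clerk":   {"account": {"SELECT"}},
    "manager": {"account": {"SELECT", "UPDATE"}, "loan": {"SELECT"}},
}

def grant(user, table, action):
    """Record a privilege, roughly what 'GRANT action ON table TO user' does."""
    PRIVILEGES.setdefault(user, {}).setdefault(table, set()).add(action)

def revoke(user, table, action):
    """Remove a privilege, roughly what 'REVOKE action ON table FROM user' does."""
    PRIVILEGES.get(user, {}).get(table, set()).discard(action)

def check_access(user, table, action):
    """Security check performed before a request is executed."""
    return action in PRIVILEGES.get(user, {}).get(table, set())

print(check_access("clerk", "account", "UPDATE"))  # not granted yet
grant("clerk", "account", "UPDATE")
print(check_access("clerk", "account", "UPDATE"))  # granted
revoke("clerk", "account", "UPDATE")
print(check_access("clerk", "account", "UPDATE"))  # revoked again
```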
DATABASE HISTORY:
The history of databases dates back long before computers were invented. In the past, data was
stored in journals, libraries, and filing cabinets, taking up space and making it difficult to find and
back up. The spread of computers in the early 1960s marked the beginning of computerised
databases. Charles Bachman designed the first database known as the Integrated Data Store,
followed by the Information Management System developed by IBM. These databases were
navigational, requiring users to navigate from record to record through the database to find the
information they wanted. The two main models of this navigational database were the hierarchical model developed
by IBM and the network model introduced at the Conference on Data Systems Languages
(CODASYL).
The 1970s saw the release of a paper by E. F. Codd entitled "A Relational Model of Data for Large
Shared Data Banks." This paper marked the beginning of the relational database, which shows the
relationship between different data records and is more space-efficient, making it cost-effective for
data storage. This led to the creation of INGRES, a relational database system, at the University of
California, Berkeley, which used the QUEL query language. IBM developed its own relational
database prototype, System R, which was the first to implement Structured Query Language (SQL).
The 1980s marked a time of growth and standardisation for relational databases, with the
navigational database models fading. The commercialisation of relational systems saw a rise in their
use and popularity, with SQL becoming the standard language for databases. Another noteworthy
event was the emergence of Object-oriented database management systems (OODBMS), which
viewed data as objects and worked with programming languages that supported the object-oriented
approach.
The 1990s saw the rise of the World Wide Web, fueling demand for client-server database systems,
and the exponential growth of the database industry. MySQL was created in 1995, offering an
alternative to the database systems of large companies like Oracle and Microsoft. Object-oriented
database systems also grew more popular in the 1990s.
The term NoSQL was coined in 1998, referring to databases that use a query language other than
SQL to store and retrieve data. NoSQL databases are useful for unstructured data and saw growth in
the 2000s, allowing for faster processing of larger, more varied datasets. NoSQL databases are more
flexible than the traditional relational databases that had risen in the preceding decades.
The 2010s saw a rise in data awareness, with the emergence of big data and increased emphasis on
data protection. This led to the development of automation software as a popular tool when
interacting with databases. With the need to collect, organise, and make use of large amounts of
data, distributed databases became more popular, storing data across multiple physical locations
instead of in one place. The importance of data protection was highlighted by legislation like GDPR
and the NIS directive.
In conclusion, the history of databases is a rich and evolving one that has impacted and been
impacted by the advancements of technology, and the need to manage and protect large amounts of
data efficiently and effectively.
DBMS Architecture:
The DBMS design depends upon its architecture. The basic client/server architecture is used to deal
with a large number of PCs, web servers, database servers and other components that are connected
with networks. The client/server architecture consists of many PCs and workstations, which are
connected via the network. DBMS architecture depends upon how users are connected to the
database to get their requests done.
1. Centralised architecture:
A centralized database is stored at a single location such as a mainframe computer. It is maintained
and modified from that location only and is usually accessed over a network connection such as a
LAN or WAN. The centralized database is used by organisations such as colleges, companies, banks
etc.
In this architecture, all the information for the organisation is stored in a single
database. This database is known as the centralized database.
Advantages:
Some advantages of a Centralized Database Management System are:
• The data integrity is maximised as the whole database is stored at a single physical location.
This means that it is easier to coordinate the data and it is as accurate and consistent as
possible.
• The data redundancy is minimal in the centralised database. All the data is stored together
and not scattered across different locations. So, it is easier to make sure there is no redundant
data available.
• Since all the data is in one place, there can be stronger security measures around it. So, the
centralised database is much more secure.
• Data is easily portable because it is stored at the same place.
• The centralized database is cheaper than other types of databases as it requires less power
and maintenance.
• All the information in the centralized database can be easily accessed from the same location
and at the same time.

Disadvantages:
Some disadvantages of a Centralized Database Management System are:
• Since all the data is at one location, it takes more time to search and access it. If the network
is slow, this process takes even more time.
• There is a lot of data access traffic for the centralized database. This may create a bottleneck.
• Since all the data is at the same location, if multiple users try to access it simultaneously it
creates a problem. This may reduce the efficiency of the system.
• If there are no database recovery measures in place and a system failure occurs, then all the
data in the database may be lost.
2. Client-server Architecture of DBMS:
We first talk about client/server architecture in general, and then we look at how DBMSs use it. In
order to handle computing settings with a high number of PCs, workstations, file servers, printers,
database servers, etc., the client/server architecture was designed.
A network connects various pieces of software and hardware, including email and web server
software. The aim is to define specialized servers with particular functionality. For instance, a
number of PCs or small workstations can be connected as clients to a file server that maintains the
files of the client machines. Another machine can be designated as a printer server by being
connected to various printers; all print requests from clients are then directed to this machine. The
category of specialized servers also includes web servers and email servers. Many client machines
can utilize the resources offered by specialized servers. The client machines provide the user with
the appropriate interfaces to use these servers, as well as local processing power to run local
applications.
This idea can be applied to various types of software, where specialist applications, like a CAD
(computer-aided design) package, are kept on particular server computers and made available to a
variety of clients. Some machines would be client sites only (for example, diskless workstations, or
workstations and PCs with disks that have only client software installed).
The idea of client/server architecture assumes an underlying framework made up of many PCs
and workstations, as well as a smaller number of mainframe computers, connected via LANs and
other types of computer networks. In this system, a client is often a user machine that offers local
processing and user interface capabilities. When a client needs access to extra features, like
database access, that are not available on that system, it connects to a server that offers those
features. A server is a computer system that includes both hardware and software that can offer
client computer services like file access, printing, archiving, or database access. Some machines
install both client and server software, while others install only client software. However, it is
more common for client and server software to run on separate machines. On this underlying
client/server framework, two fundamental DBMS architectures, two-tier and three-tier, were
developed.
Two-Tier Client Server Architecture:
Here, the term "two-tier" refers to the architecture's two layers: the Client layer and the Data layer.
There are a number of client computers in the client layer that can contact the database server. The
API on the client computer uses JDBC or some other method to link the computer to the
database server. This is needed because clients and database servers may be at different physical
locations.
Three-Tier Client-Server Architecture:
In this architecture, an additional layer, the Business Logic Layer, serves as a link between the
Client layer and the Data layer. Unlike a two-tier architecture, where client applications send their
queries directly to the database server, here the application programs are processed in the business
logic layer, that is, on the application server itself.
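The layering can be sketched as three Python classes, assuming Python's built-in `sqlite3` module stands in for the database server. The classes, the `product` table, and the in-stock business rule are hypothetical; the point is that `Client` only holds a reference to `ApplicationServer`, and only `ApplicationServer` talks to `DataLayer`.

```python
import sqlite3

class DataLayer:
    """Tier 3: the database server (an in-memory SQLite database stands in for it)."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        with self.conn:
            self.conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, stock INTEGER)")
            self.conn.executemany("INSERT INTO product (name, stock) VALUES (?, ?)",
                                  [("Pen", 10), ("Notebook", 0)])

    def query(self, sql, params=()):
        return self.conn.execute(sql, params).fetchall()

class ApplicationServer:
    """Tier 2: business logic layer. Only this tier talks to the database."""
    def __init__(self, data_layer):
        self._db = data_layer

    def available_products(self):
        # Hypothetical business rule: only list products that are in stock.
        rows = self._db.query("SELECT name FROM product WHERE stock > 0 ORDER BY name")
        return [name for (name,) in rows]

class Client:
    """Tier 1: presentation layer. It knows the application server, not the database."""
    def __init__(self, app_server):
        self.app = app_server

    def show_catalog(self):
        return "Available: " + ", ".join(self.app.available_products())

client = Client(ApplicationServer(DataLayer()))
print(client.show_catalog())
```

In a two-tier design the client would call `DataLayer.query` (in practice, a JDBC or ODBC connection) directly, and the in-stock rule would live in the client application.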
Parallel database:
Nowadays organizations need to handle a huge amount of data with a high transfer rate. For such
requirements, the client-server or centralized system is not efficient. With the need to improve the
efficiency of the system, the concept of the parallel database comes into the picture. A parallel database
system seeks to improve the performance of the system through parallelism.
Need:
Multiple resources like CPUs and disks are used in parallel. The operations are performed
simultaneously, as opposed to serial processing. A parallel server can allow access to a single
database by users on multiple machines. It can also parallelize many operations, such as data
loading, query processing, building indexes, and evaluating queries.
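The partition, process, and combine pattern behind parallel query processing can be sketched in Python. This is an illustration under stated assumptions: the data is a hypothetical list of column values, partitions are made round-robin, and `ThreadPoolExecutor` from the standard library plays the role of the parallel workers. Because of CPython's global interpreter lock, threads do not actually run this pure-Python arithmetic in parallel; a real parallel DBMS would run each partition on its own CPU, disk, or node.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    """Split the data into n partitions, round-robin."""
    return [data[i::n] for i in range(n)]

def partial_sum(part):
    """Work done independently on each partition (e.g. by one CPU/disk pair)."""
    return sum(part)

def parallel_sum(data, workers=4):
    parts = partition(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, parts))
    return sum(partials)  # combine step

salaries = list(range(1, 100_001))  # hypothetical column values
print(parallel_sum(salaries))
print(parallel_sum(salaries) == sum(salaries))
```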

Advantages:
Here, we will discuss the advantages of parallel databases. Let’s have a look.
1. Performance Improvement:
By connecting multiple resources like CPU and disks in parallel we can significantly
increase the performance of the system.
2. High Availability:
In the parallel database, nodes have less contact with each other, so the failure of one
node doesn’t cause the failure of the entire system. This amounts to significantly higher
database availability.
3. Proper Resource Utilization:
Due to parallel execution, CPUs are kept busy instead of sitting idle. Thus, resources are
utilized properly.
4. Increased Reliability:
When one site fails, the execution can continue with another available site that has a
copy of the data, making the system more reliable.
Performance Measurement of Databases:
Here, we will emphasize performance measurement factors such as Speedup and Scale-up. Let’s
understand them one by one with the help of examples.
Speedup:
The ability to execute the tasks in less time by increasing the number of resources is called Speedup.
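Speedup is usually measured as a ratio. If a task takes time T_S on the original system and T_L on a system with N times the resources, speedup = T_S / T_L, and the speedup is linear when this ratio equals N. Scale-up compares the time for the original task on the original system (T_S) with the time for an N-times larger task on an N-times larger system (T_L); scale-up = T_S / T_L is linear when it equals 1. The timings below are hypothetical numbers chosen only to show the arithmetic.

```python
def speedup(t_small, t_large):
    """Same task: time on the original system / time on a system with N times the resources."""
    return t_small / t_large

def scaleup(t_small, t_large):
    """Time for the original task on the original system /
    time for an N-times larger task on an N-times larger system."""
    return t_small / t_large

def classify(value, ideal):
    """Compare a measured ratio with its ideal (N for speedup, 1.0 for scale-up)."""
    if abs(value - ideal) < 1e-9:
        return "linear"
    return "sub-linear" if value < ideal else "super-linear"

# Hypothetical measurements with 4 times the CPUs and disks (N = 4), in seconds.
N = 4
s = speedup(t_small=120.0, t_large=40.0)
u = scaleup(t_small=120.0, t_large=150.0)
print(s, classify(s, ideal=N))
print(u, classify(u, ideal=1.0))
```

With these numbers the speedup is 3.0 rather than the ideal 4, and the scale-up is 0.8 rather than 1, so both are sub-linear.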

Distributed Database System in DBMS:


A distributed database is essentially a database that is dispersed across numerous sites, i.e., on
various computers or over a network of computers, and is not restricted to a single system. A
distributed database system is spread across several locations with distinct physical components.
This can be necessary when different people from all over the world need to access a certain
database. It must be handled such that, to users, it seems to be a single database.
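The idea that a distributed database should look like a single database can be sketched with two in-memory SQLite databases (via Python's `sqlite3` module) standing in for two sites. This is a simplified, homogeneous setup assumed for illustration: the hypothetical `customer` relation is horizontally fragmented by city, `insert_customer` routes each row to its site, and `query_all` sends the same query to every site and merges the rows, so the caller never names a site.

```python
import sqlite3

# Two hypothetical sites, each holding a horizontal fragment of the same relation.
sites = {"Delhi": sqlite3.connect(":memory:"), "Mumbai": sqlite3.connect(":memory:")}
for site_conn in sites.values():
    site_conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, city TEXT)")

def insert_customer(cid, name, city):
    """Route each row to the site that stores its fragment (fragmented by city)."""
    with sites[city] as conn:
        conn.execute("INSERT INTO customer VALUES (?, ?, ?)", (cid, name, city))

def query_all(sql, params=()):
    """Run the same query at every site and merge the rows, so the user
    sees one logical database without knowing where each row lives."""
    results = []
    for conn in sites.values():
        results.extend(conn.execute(sql, params).fetchall())
    return sorted(results)

insert_customer(1, "Asha", "Delhi")
insert_customer(2, "Ravi", "Mumbai")
insert_customer(3, "Meena", "Delhi")

print(query_all("SELECT id, name FROM customer"))
print(sites["Mumbai"].execute("SELECT COUNT(*) FROM customer").fetchone())
```

Merging by simple concatenation only works for plain row-returning queries; aggregates such as `COUNT` or `SUM` would need a combining step, which a real distributed DBMS handles as part of distributed query processing.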
Types:
1. Homogeneous Database: A homogeneous database stores data uniformly across all
locations. All sites utilize the same operating system, database management system, and data
structures. They are therefore simple to handle.
2. Heterogeneous Database: With a heterogeneous distributed database, many locations may
employ various software and schema, which may cause issues with queries and transactions.
Moreover, one site might not even be aware of the existence of the other sites. Various
operating systems and database applications may be used by various machines. They could
even use different database data models. Translations are therefore necessary for
communication across various sites.
Uses for distributed databases:
• Corporate management information systems.
• Multimedia applications.
• Hotel chains, military command systems, etc.
• Production control systems.
Characteristics of distributed databases:
Distributed databases are logically connected to one another when they are part of a collection, and
they frequently form a single logical database. Data is physically stored across several sites and is
separately handled in distributed databases. Each site's processors are connected to one another
through a network, but they are not set up for multiprocessing.
A common misconception is that a distributed database is equivalent to a loosely coupled file
system. In reality, it is considerably more complex than that. Although distributed databases use
transaction processing, they are not the same as transaction processing systems.
Generally speaking, distributed databases have the following characteristics:
• Location independence
• Distributed query processing
• Distributed transaction management
• Hardware independence
• Operating system independence
• Network independence
• Transaction transparency
• DBMS independence
Types of DBMS Architecture:

Database architecture can be seen as a single tier or multi-tier.


But logically, database architecture is of three types:
• 1-Tier Architecture
• 2-Tier Architecture
• 3-Tier Architecture
1-Tier Architecture:
In this architecture, the database is directly available to the user. The user works directly on
the DBMS, and any changes made here are applied directly to the database itself. It doesn’t
provide a handy tool for end users. The 1-Tier architecture is used for developing local
applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture:
The 2-Tier architecture is the same as basic client-server. In the two-tier architecture, applications on
the client end can directly communicate with the database on the server side. For this interaction,
APIs such as ODBC and JDBC are used. The user interfaces and application programs run on the
client side. The server side is responsible for providing functionality such as query processing and
transaction management. To communicate with the DBMS, the client-side application establishes a
connection with the server side.
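Transaction management on the server side can be sketched with Python's `sqlite3` module. SQLite is an embedded library rather than a network server, so here an in-memory connection only stands in for the database server; the `account` table, its `CHECK` constraint, and the `transfer` function are hypothetical. The client-side function issues two updates, and the DBMS either commits both or, when the second one violates the constraint, rolls back both.

```python
import sqlite3

# Stand-in for the database server; a real two-tier client would connect
# over the network through an API such as ODBC or JDBC.
server = sqlite3.connect(":memory:")
with server:
    server.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, "
                   "balance REAL NOT NULL CHECK (balance >= 0))")
    server.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500.0), (2, 100.0)])

def transfer(conn, src, dst, amount):
    """Client-side application logic; the DBMS makes the two updates atomic."""
    try:
        with conn:  # commit if the block succeeds, roll back if an exception escapes
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed, so the whole transaction was rolled back

def balances(conn):
    return dict(conn.execute("SELECT acc_no, balance FROM account"))

print(transfer(server, 1, 2, 200.0), balances(server))
print(transfer(server, 2, 1, 1000.0), balances(server))
```

The credit is deliberately applied before the debit, so the failed transfer shows a statement that had already run being undone by the rollback.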

3-Tier Architecture:
The 3-Tier architecture contains another layer between the client and server. In this architecture,
the client can’t directly communicate with the server. The application on the client end interacts with an
application server, which further communicates with the database system. The end user has no idea about
the existence of the database beyond the application server. The database also has no idea about any
other user beyond the application. The 3-Tier architecture is used for large web applications.
Characteristics and Benefits of a Database:
There are a number of characteristics that distinguish the database approach from the file-based
system or approach. This chapter describes the benefits (and features) of the database system.
• Self-describing nature of a database system:
A database system is referred to as self-describing because it not only contains the database
itself, but also metadata which defines and describes the data and relationships between
tables in the database. This information is used by the DBMS software or database users if
needed. This separation of data and information about the data makes a database system
totally different from the traditional file-based system in which the data definition is part of
the application programs.
• Insulation between program and data:
In the file-based system, the structure of the data files is defined in the application programs
so if a user wants to change the structure of a file, all the programs that access that file might
need to be changed as well. On the other hand, in the database approach, the data structure is
stored in the system catalogue and not in the programs. Therefore, one change is all that is
needed to change the structure of a file. This insulation between the programs and data is also
called program-data independence.
• Support for multiple views of data:
A database supports multiple views of data. A view is a subset of the database, which is
defined and dedicated for particular users of the system. Multiple users in the system might
have different views of the system. Each view might contain only the data of interest to a
user or group of users.
• Sharing of data and multiuser system:
Current database systems are designed for multiple users. That is, they allow many users to
access the same database at the same time. This access is achieved through features
called concurrency control strategies. These strategies ensure that the data accessed are
always correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement from those in the past
which restricted usage to one person at a time.
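The self-describing nature and the support for multiple views can both be seen in a small SQLite sketch (via Python's `sqlite3` module). The `employee` table and the two views are hypothetical. `sqlite_master` and `PRAGMA table_info` are SQLite's own catalog interfaces; other DBMSs expose similar metadata, for example through `INFORMATION_SCHEMA` views. Concurrency control is not shown; a single connection is used throughout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
                 [("Asha", "HR", 50000), ("Ravi", "IT", 65000), ("Meena", "IT", 70000)])

# Multiple views: the HR view hides salaries, the IT view shows only IT staff.
conn.execute("CREATE VIEW hr_view AS SELECT emp_id, name, dept FROM employee")
conn.execute("CREATE VIEW it_view AS SELECT name, salary FROM employee WHERE dept = 'IT'")

# Self-describing: the schema itself is stored as data in the catalog (sqlite_master).
catalog = conn.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()

# Column metadata for a table, also read from the catalog.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

print(catalog)
print(columns)
print(conn.execute("SELECT * FROM it_view ORDER BY name").fetchall())
```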
DATABASE DESIGNER:
A database designer is in charge of designing, developing, implementing, and maintaining a company's
data management systems. One of the most important responsibilities of a database designer is to
form relationships between various elements of data and give it a logical structure.
KEY RESPONSIBILITIES OF A DATABASE DESIGNER:
• Understand the organisation's data to skillfully carry out the company's database design projects.
• Install and configure a relational database management system on the company's server.
• Design database schemas and create databases for various projects of the company.
• Handle the creation of new users, define roles and privileges, and grant access to them.
• Assist application development teams in connecting easily to the databases.
• Track the performance of the databases and fix issues quickly to ensure smooth functioning.
• Use the best techniques for enhanced scalability and efficiency of large databases.
• Understand complex problems, devise solutions, and transform them into software requirements.
• Conduct data research and query large and complex datasets to provide the best data modelling.
Data Independence:
Data independence can be explained using the three-schema architecture.
Data independence refers to the characteristic of being able to modify the schema at one level of the database system without
altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence:
• Logical data independence refers to the characteristic of being able to change the conceptual schema
without having to change the external schema.
• Logical data independence is used to separate the external level from the conceptual level.
• If we make any changes in the conceptual view of the data, the user view of the data is not
affected.
• Logical data independence occurs at the user interface level.
2. Physical Data Independence:
• Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
• If we change the storage size of the database server, the conceptual
structure of the database is not affected.
• Physical data independence is used to separate the conceptual level from the internal level.
• Physical data independence occurs at the logical interface level.
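Both kinds of data independence can be checked with a small sketch in Python's `sqlite3` module, assuming a hypothetical `student` table and a view `cse_students` that plays the role of an external schema. Adding an index changes only the internal (physical) level, and adding a column changes the conceptual level; in both cases the query written against the view keeps returning the same result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany("INSERT INTO student (name, dept) VALUES (?, ?)",
                 [("Asha", "CSE"), ("Ravi", "ECE"), ("Meena", "CSE")])

# External schema: a view that some application queries.
conn.execute("CREATE VIEW cse_students AS SELECT roll_no, name FROM student WHERE dept = 'CSE'")
EXTERNAL_QUERY = "SELECT name FROM cse_students ORDER BY roll_no"
baseline = conn.execute(EXTERNAL_QUERY).fetchall()

# Physical change (internal level): add an index, i.e. a new access path.
conn.execute("CREATE INDEX idx_student_dept ON student(dept)")
after_physical = conn.execute(EXTERNAL_QUERY).fetchall()

# Logical change (conceptual level): add a new column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
after_logical = conn.execute(EXTERNAL_QUERY).fetchall()

print(baseline == after_physical == after_logical, baseline)
```

Not every conceptual change is absorbed this way: dropping or renaming a column that the view depends on would still require the view to be changed.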

Interfaces in DBMS:
A database management system (DBMS) interface is a user interface that allows for the ability to
input queries to a database without using the query language itself. User-friendly interfaces provided
by DBMS may include the following:
• Menu-Based Interfaces
• Forms-Based Interfaces
• Graphical User Interfaces
• Natural Language Interfaces
• Speech Input and Output Interfaces
• Interfaces for Parametric Users
• Interfaces for the Database Administrator (DBA)
1. Menu-Based Interfaces:
These interfaces present the user with lists of options (called menus) that lead the user through the
formation of a request. The basic advantage of using menus is that they remove the burden of
remembering specific commands and the syntax of any query language. The query is composed
step by step by picking options from a menu that is shown by the system.
Pull-down menus are a very popular technique in Web-based interfaces. They are also often used
in browsing interfaces which allow a user to look through the contents of a database in an
exploratory and unstructured manner.
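The step-by-step composition can be sketched as a function that turns numbered menu choices into an SQL string. The menus, the `book` and `member` tables, and `build_query` are hypothetical. Because table and column names come only from the menu entries, the user never types query-language syntax, and an out-of-range choice is rejected instead of being guessed.

```python
# Hypothetical menus shown to the user.
TABLE_MENU = ["book", "member"]
COLUMN_MENU = {"book": ["title", "author", "year"], "member": ["name", "city"]}
ORDER_MENU = ["ASC", "DESC"]

def pick(options, choice):
    """Return the option for a 1-based menu choice, rejecting invalid numbers."""
    if not 1 <= choice <= len(options):
        raise ValueError(f"invalid menu option: {choice}")
    return options[choice - 1]

def build_query(table_choice, column_choices, order_choice):
    """Compose an SQL query step by step from menu selections."""
    if not column_choices:
        raise ValueError("pick at least one column")
    table = pick(TABLE_MENU, table_choice)                           # step 1: table
    columns = [pick(COLUMN_MENU[table], c) for c in column_choices]  # step 2: columns
    order = pick(ORDER_MENU, order_choice)                           # step 3: sort order
    return f"SELECT {', '.join(columns)} FROM {table} ORDER BY {columns[0]} {order}"

# Table 1 (book), columns 1 and 2 (title, author), order 2 (DESC).
print(build_query(1, [1, 2], 2))
```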
2. Forms-Based Interfaces:
A forms-based interface displays a form to each user. Users can fill out all of the form entries to
insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve
matching data for the remaining entries. Such forms are usually designed and programmed for
naive users who have little database expertise. Many DBMSs have form
specification languages, which are special languages that help specify such forms.
Example: SQL Forms is a form-based language that specifies queries using a form designed in
conjunction with the relational database schema.
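The behaviour described above (fill every entry to insert, fill some entries to retrieve the rest) can be sketched in Python with `sqlite3`. The form fields, the `student` table, and `submit_form` are hypothetical, and a real forms package would also handle layout and validation. A blank entry is represented by `None` or by leaving the field out; only field names from the fixed `FIELDS` list ever reach the SQL text, and the values are passed as parameters.

```python
import sqlite3

FIELDS = ("roll_no", "name", "dept")  # hypothetical form layout, one field per column

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

def submit_form(form):
    """form maps field names to values; a missing key or None is a blank entry.
    All entries filled  -> insert a new row.
    Some entries filled -> use them as conditions and retrieve matching rows."""
    filled = {f: form.get(f) for f in FIELDS if form.get(f) is not None}
    if len(filled) == len(FIELDS):
        with conn:
            conn.execute("INSERT INTO student VALUES (?, ?, ?)",
                         tuple(filled[f] for f in FIELDS))
        return "inserted"
    # Only names from FIELDS are placed in the SQL text; values are bound as parameters.
    where = " AND ".join(f"{f} = ?" for f in filled) or "1 = 1"
    sql = f"SELECT {', '.join(FIELDS)} FROM student WHERE {where} ORDER BY roll_no"
    return conn.execute(sql, tuple(filled.values())).fetchall()

submit_form({"roll_no": 1, "name": "Asha", "dept": "CSE"})
submit_form({"roll_no": 2, "name": "Ravi", "dept": "ECE"})
print(submit_form({"dept": "CSE"}))  # roll_no and name left blank
```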
3. Graphical User Interface:
A GUI typically displays a schema to the user in diagrammatic form. The user then can specify a
query by manipulating the diagram. In many cases, GUIs utilise both menus and forms. Most GUIs
use a pointing device, such as a mouse, to pick a certain part of the displayed schema diagram.
4. Natural Language Interfaces:
These interfaces accept requests written in English or some other language and attempt to
understand them. A natural language interface has its own schema, which is similar to the database
conceptual schema, as well as a dictionary of important words.
The natural language interface refers to the words in its schema as well as to the set of standard
words in its dictionary to interpret the request. If the interpretation is successful, the interface
generates a high-level query corresponding to the natural language request and submits it to the DBMS for
processing; otherwise, a dialogue is started with the user to clarify the request. The main
disadvantage is that the capabilities of this type of interface are still limited.
5. Speech Input and Output Interfaces:
Limited use of speech, both as an input query and as an answer to a question or the result of a
request, is becoming commonplace. Applications with limited vocabularies, such as inquiries for
telephone directories, flight arrivals/departures, and bank account information, allow speech for
input and output so that ordinary people can access this information.
Speech input is detected using a set of predefined words and used to set up the parameters that are
supplied to the queries. For output, a similar conversion from text or numbers into speech takes
place.
6. Interfaces for Parametric Users:
Interfaces for parametric users provide a small set of commands that can be executed with a minimum of
keystrokes. They are generally used for bank transactions such as transferring money, where the same
operations are performed repeatedly.
7. Interfaces for Database Administrators (DBA):
Most database systems contain privileged commands that can be used only by the DBA’s staff.
These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of databases.
EF Codd’s Rules in DBMS:
Codd’s rules were proposed by the computer scientist Dr. Edgar F. Codd, who also invented the
relational model for database management. These rules are meant to ensure data integrity,
consistency, and usability. This set of rules basically signifies the characteristics and requirements of
a relational database management system (RDBMS). In this section, we will learn about the various
Codd’s rules.
Codd’s Rules in DBMS:
Rule 1: The Information Rule:
All information stored in a database, whether it is user data or metadata, must be represented as a
value in a cell of a table. In other words, everything within the database is organized in a table
layout.
Rule 2: The Guaranteed Access Rule:
Each data element is guaranteed to be accessible logically with a combination of the table name,
primary key (row value), and attribute name (column value).
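To make the rule concrete, here is a minimal sketch using Python's built-in sqlite3 module (SQLite is only a convenient stand-in; these notes do not assume any particular DBMS). The STUDENT table and the helper `get_value` are hypothetical. The helper reaches a single value using nothing but a table name, a primary-key value, and a column name.

```python
import sqlite3

# In-memory database with a hypothetical STUDENT table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, SNAME TEXT, ADDRESS TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(1, "Shyam", "Delhi"), (2, "Rakesh", "Kolkata"), (3, "Suraj", "Delhi")],
)


def get_value(conn, table, key_column, key_value, column):
    """Fetch one cell using only the table name, primary-key value and column name.

    Identifiers cannot be bound as SQL parameters, so they are checked against
    a small allow-list before being placed in the statement.
    """
    allowed = {"STUDENT": {"STUD_NO", "SNAME", "ADDRESS"}}
    if table not in allowed or {key_column, column} - allowed[table]:
        raise ValueError("unknown table or column")
    row = conn.execute(
        f"SELECT {column} FROM {table} WHERE {key_column} = ?", (key_value,)
    ).fetchone()
    return None if row is None else row[0]


print(get_value(conn, "STUDENT", "STUD_NO", 2, "ADDRESS"))  # Kolkata
```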
Rule 3: Systematic Treatment of NULL Values:
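NULL values must be treated systematically and uniformly: they represent missing or inapplicable
information, are distinct from zero and from an empty string, and are handled the same way
regardless of data type.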
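The following sketch, again using Python's sqlite3 module with a hypothetical EMP table, shows what uniform NULL treatment looks like in SQL: NULL is distinct from 0 and from the empty string, a comparison with NULL is unknown rather than true or false, and IS NULL is the consistent way to test for it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (ID INTEGER PRIMARY KEY, BONUS INTEGER, PHONE TEXT)")
# Row 1: zero bonus and empty phone; row 2: both values unknown (NULL).
conn.executemany("INSERT INTO EMP VALUES (?, ?, ?)", [(1, 0, ""), (2, None, None)])

# NULL is not equal to anything, not even another NULL: the result is unknown (None).
eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
# '= NULL' therefore matches no rows; 'IS NULL' is the uniform test.
by_eq = conn.execute("SELECT COUNT(*) FROM EMP WHERE BONUS = NULL").fetchone()[0]
by_is = conn.execute("SELECT COUNT(*) FROM EMP WHERE BONUS IS NULL").fetchone()[0]
# NULL is distinct from 0: only row 1 has a bonus equal to 0.
zero_rows = conn.execute("SELECT COUNT(*) FROM EMP WHERE BONUS = 0").fetchone()[0]
# Aggregates over a column skip NULLs, whatever the column's type ('' is not NULL).
counts = conn.execute("SELECT COUNT(*), COUNT(BONUS), COUNT(PHONE) FROM EMP").fetchone()

print(eq_null, by_eq, by_is, zero_rows, counts)  # None 0 1 1 (2, 1, 1)
```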
Rule 4: Active Online Catalog Rule:
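The database catalog, which contains metadata about the database, must itself be stored as tables in
the same relational database management system, so that authorized users can query it with the same
relational language they use for ordinary data.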
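SQLite, used here through Python's sqlite3 module as an example, keeps its catalog in a table called `sqlite_master`, which is read with an ordinary SELECT. The STUDENT table and index below are hypothetical, and catalog table names differ between DBMS products.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, SNAME TEXT)")
conn.execute("CREATE INDEX idx_sname ON STUDENT (SNAME)")

# The catalog is itself a table, queried with an ordinary SELECT.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master WHERE tbl_name = 'STUDENT' ORDER BY name"
).fetchall()
print(catalog)  # [('table', 'STUDENT'), ('index', 'idx_sname')]

# SQLite also reports column metadata as rows: (cid, name, type, notnull, default, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
print(columns)  # ['STUD_NO', 'SNAME']
```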
Rule 5: The Comprehensive Data Sublanguage Rule:
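The system must support at least one relational language with a well-defined syntax that can be used
both interactively and within application programs, and that covers data definition, view definition,
data manipulation (querying and modifying information), integrity constraints, authorization, and
transaction boundaries. SQL is the usual example of such a language.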
Rule 6: The View Updating Rule:
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete:
The system must support high-level insertion, update, and deletion: a single statement should be
able to insert, update, or delete a whole set of rows, not just one row at a time.
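In SQL this rule shows up as set-oriented UPDATE and DELETE statements. A minimal sketch with Python's sqlite3 module; the EMPLOYEE table and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, DNO INTEGER, SALARY REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 5, 30000), ("222", 5, 40000), ("333", 4, 25000)],
)

# One statement updates every qualifying row; no row-by-row loop is needed.
updated = conn.execute("UPDATE EMPLOYEE SET SALARY = SALARY * 1.10 WHERE DNO = 5").rowcount
# Likewise, one DELETE removes a whole set of rows.
deleted = conn.execute("DELETE FROM EMPLOYEE WHERE SALARY < 30000").rowcount
print(updated, deleted)  # 2 1
```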
Rule 8: Physical Data Independence:
Application programs and activities should remain unaffected when changes are made to the
physical storage structures or methods.
Rule 9: Logical Data Independence:
Application programs and activities should remain unaffected when changes are made to the logical
structure of the data, such as adding new tables or columns or making other information-preserving
changes to tables.
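Full logical data independence is hard to achieve in practice, but the simplest case, extending the schema, can be sketched with Python's sqlite3 module. The STUDENT and COURSE tables and the `student_names` function (standing in for an application program) are hypothetical. The program keeps working because it names the columns it needs instead of relying on SELECT *.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, SNAME TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Shyam')")


def student_names(conn):
    """The 'application': lists names, naming only the columns it uses."""
    return [name for (name,) in conn.execute("SELECT SNAME FROM STUDENT ORDER BY STUD_NO")]


before = student_names(conn)
# Logical change to the schema: a new column and a new table.
conn.execute("ALTER TABLE STUDENT ADD COLUMN EMAIL TEXT")
conn.execute("CREATE TABLE COURSE (COURSE_NO TEXT PRIMARY KEY, TITLE TEXT)")
after = student_names(conn)
print(before == after)  # True
```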
Rule 10: Integrity Independence:
Integrity constraints should be specified separately from application programs and stored in the
catalog. They should be automatically enforced by the database system.
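A minimal sketch of an integrity constraint that lives in the schema rather than in application code, using Python's sqlite3 module and a hypothetical ACCOUNT table. The DBMS rejects the violating update by itself, and the constraint can be read back from the catalog. The exact error text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is declared once in the schema, not repeated in every program.
conn.execute(
    "CREATE TABLE ACCOUNT ("
    " ACC_NO INTEGER PRIMARY KEY,"
    " BALANCE REAL NOT NULL CHECK (BALANCE >= 0))"
)
conn.execute("INSERT INTO ACCOUNT VALUES (1, 500)")

try:
    # Any program attempting to violate the rule is stopped by the DBMS itself.
    conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE - 800 WHERE ACC_NO = 1")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)  # wording varies, e.g. "CHECK constraint failed: ..."

print(rejected)  # True
# The constraint is stored in the catalog together with the table definition.
stored_sql = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'ACCOUNT'"
).fetchone()[0]
print("CHECK (BALANCE >= 0)" in stored_sql)  # True
```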
Rule 11: Distribution Independence:
The distribution of data across multiple locations should be invisible to users, and the database
system should handle the distribution transparently.

Rule 12: Non-Subversion Rule:
If the system provides a low-level interface that gives record-at-a-time access, that interface must
not be able to damage the system or bypass the security and integrity constraints that are enforced
through the high-level relational language.
The Database System Environment:
A DBMS is a complex software system. In this section we discuss the types of software
components that constitute a DBMS and the types of computer system software with which the
DBMS interacts.
1.DBMS Component Modules:
The database and the DBMS catalog are usually stored on disk. Access to the disk is controlled
primarily by the operating system (OS), which schedules disk read/write. Many DBMSs have their
own buffer management module to schedule disk read/write, because this has a considerable effect
on performance. Reducing disk read/write improves performance considerably. A higher-
level stored data manager module of the DBMS controls access to DBMS information that is
stored on disk, whether it is part of the database or the catalog.
Let us consider the top part of Figure 2.3 first. It shows interfaces for the DBA staff, casual users
who work with interactive interfaces to formulate queries, application programmers who create
programs using some host programming languages, and parametric users who do data entry work by
supplying parameters to predefined transactions. The DBA staff works on defining the database and
tuning it by making changes to its definition using the DDL and other privileged commands.
The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions
of the schemas (meta-data) in the DBMS catalog. The catalog includes information such as the
names and sizes of files, names and data types of data items, storage details of each file, mapping
information among schemas, and constraints. In addition, the catalog stores many other types of
information that are needed by the DBMS modules, which can then look up the catalog information
as needed.

Casual users and persons with occasional need for information from the database interact using
some form of interface, which we call the interactive query interface in Figure 2.3. We have not
explicitly shown any menu-based or form-based interaction that may be used to generate the
interactive query automatically. These queries are parsed and validated for correctness of the query
syntax, the names of files and data elements, and so on by a query compiler that compiles them into
an internal form. This internal query is subjected to query optimization (discussed in Chapters 19
and 20). Among other things, the query optimizer is concerned with the rearrangement and possible
reordering of operations, elimination of redundancies, and use of correct algorithms and indexes
during execution. It consults the system catalog for statistical and other physical information about
the stored data and generates executable code that performs the necessary operations for the query
and makes calls on the runtime processor.
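The effect of the optimizer consulting the catalog can be observed in SQLite (through Python's sqlite3 module) with EXPLAIN QUERY PLAN. In this sketch, with a hypothetical EMPLOYEE table, the same query gets a different access path once an index is recorded in the catalog, while its result stays the same, which is also a small illustration of physical data independence (Rule 8). The plan text depends on the SQLite version, so the comments only show typical output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, NAME TEXT, DNO INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(str(i), f"emp{i}", i % 10) for i in range(1000)],
)
QUERY = "SELECT NAME FROM EMPLOYEE WHERE DNO = ?"


def plan(sql, params):
    """Return the optimizer's chosen plan as text (last column of each plan row)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


plan_before = plan(QUERY, (4,))
result_before = sorted(conn.execute(QUERY, (4,)).fetchall())

# Adding an index changes only the access paths recorded in the catalog.
conn.execute("CREATE INDEX idx_emp_dno ON EMPLOYEE (DNO)")
plan_after = plan(QUERY, (4,))
result_after = sorted(conn.execute(QUERY, (4,)).fetchall())

print(plan_before)  # typically a full table scan, e.g. "SCAN EMPLOYEE"
print(plan_after)   # typically "SEARCH EMPLOYEE USING INDEX idx_emp_dno (DNO=?)"
print(result_before == result_after)  # True
```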
Application programmers write programs in host languages such as Java, C, or C++ that are
submitted to a precompiler. The precompiler extracts DML commands from an application
program written in a host programming language. These commands are sent to the DML compiler
for compilation into object code for database access. The rest of the program is sent to the host
language compiler. The object codes for the DML commands and the rest of the program are linked,
forming a canned transaction whose executable code includes calls to the runtime database
processor. Canned transactions are executed repeatedly by parametric users, who simply supply the
parameters to the transactions. Each execution is considered to be a separate transaction. An
example is a bank withdrawal transaction where the account number and the amount may be
supplied as parameters.
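A canned transaction can be sketched as a function whose SQL is fixed and whose only inputs are the parameters a parametric user supplies. This example uses Python's sqlite3 module; the ACCOUNT table, account number 101, and the `withdraw` function are hypothetical. Each call runs as its own transaction: it commits on success and rolls back if the withdrawal is refused.

```python
import sqlite3


def make_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ACCOUNT (ACC_NO INTEGER PRIMARY KEY, BALANCE REAL NOT NULL)")
    conn.execute("INSERT INTO ACCOUNT VALUES (101, 1000.0)")
    conn.commit()
    return conn


def withdraw(conn, acc_no, amount):
    """Canned transaction: the SQL never changes, only the parameters do."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT BALANCE FROM ACCOUNT WHERE ACC_NO = ?", (acc_no,)
        ).fetchone()
        if row is None:
            raise ValueError("no such account")
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ACC_NO = ?",
            (amount, acc_no),
        )
    return conn.execute(
        "SELECT BALANCE FROM ACCOUNT WHERE ACC_NO = ?", (acc_no,)
    ).fetchone()[0]


conn = make_bank()
print(withdraw(conn, 101, 250.0))  # 750.0
try:
    withdraw(conn, 101, 5000.0)
except ValueError as exc:
    print(exc)  # insufficient funds
```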
In the lower part of Figure 2.3, the runtime database processor executes (1) the privileged
commands, (2) the executable query plans, and (3) the canned transactions with runtime parameters.
It works with the system catalog and may update it with statistics. It also works with the stored data
manager, which in turn uses basic operating system services for carrying out low-level input/output
(read/write) operations between the disk and main memory. The runtime database processor handles
other aspects of data transfer, such as management of buffers in the main memory. Some DBMSs
have their own buffer management module while others depend on the OS for buffer management.
We have shown concurrency control and backup and recovery systems separately as a module in
this figure. They are integrated into the working of the runtime database processor for purposes of
transaction management.
It is now common to have the client program that accesses the DBMS running on a separate
computer from the computer on which the database resides. The former is called the client
computer, running DBMS client software, and the latter is called the database server. In some cases,
the client accesses a middle computer, called the application server, which in turn accesses the
database server. We elaborate on this topic in Section 2.5.
Figure 2.3 is not meant to describe a specific DBMS; rather, it illustrates typical DBMS
modules. The DBMS interacts with the operating system when disk accesses—to the database or to
the catalog—are needed. If the computer system is shared by many users, the OS will schedule
DBMS disk access requests and DBMS processing along with other processes. On the other hand, if
the computer system is mainly dedicated to running the database server, the DBMS will control
main memory buffering of disk pages. The DBMS also interfaces with compilers for general-
purpose host programming languages, and with application servers and client programs running on
separate machines through the system network interface.

2. Database System Utilities:


In addition to possessing the software modules just described, most DBMSs have database
utilities that help the DBA manage the database system. Common utilities have the following types
of functions:
Loading:
A loading utility is used to load existing data files—such as text files or sequential files—into the
database. Usually, the current (source) format of the data file and the desired (target) database file
structure are specified to the utility, which then automatically reformats the data and stores it in the
database. With the proliferation of DBMSs, transferring data from one DBMS to another is
becoming common in many organizations. Some vendors are offering products that generate the
appropriate loading programs, given the existing source and target database storage descriptions
(internal schemas). Such tools are also called conversion tools. For the hierarchical DBMS called
IMS (IBM) and for many network DBMSs including IDMS (Computer Associates), SUPRA
(Cincom), and IMAGE (HP), the vendors or third-party companies are making a variety of
conversion tools available (e.g., Cincom’s SUPRA Server SQL) to transform data into the relational
model.
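A loading utility essentially reads records in a source format, reformats them, and stores them in the target structure. The sketch below does this in Python for a hypothetical semicolon-separated text file with DD/MM/YYYY dates, loading it into a hypothetical EMPLOYEE table in SQLite with ISO dates. Real loading tools are driven by format descriptions rather than hand-written code, so treat this only as an illustration of the idea.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical source file: semicolon-separated, dates written as DD/MM/YYYY.
SOURCE_TEXT = (
    "ssn;name;birth_date;salary\n"
    "123;John Smith;09/01/1965;30000\n"
    "456;Alicia Zelaya;19/07/1968;25000\n"
)


def load_employees(conn, source):
    """Read source records, reformat them to the target layout, and insert them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS EMPLOYEE "
        "(SSN TEXT PRIMARY KEY, NAME TEXT, BDATE TEXT, SALARY REAL)"
    )
    rows = []
    for rec in csv.DictReader(source, delimiter=";"):
        # Target format: ISO dates (YYYY-MM-DD) and numeric salaries.
        bdate = datetime.strptime(rec["birth_date"], "%d/%m/%Y").date().isoformat()
        rows.append((rec["ssn"], rec["name"], bdate, float(rec["salary"])))
    with conn:  # store all records in a single transaction
        conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
print(load_employees(conn, io.StringIO(SOURCE_TEXT)))  # 2
print(conn.execute("SELECT BDATE FROM EMPLOYEE WHERE SSN = '123'").fetchone())  # ('1965-01-09',)
```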
Backup:
A backup utility creates a backup copy of the database, usually by dumping the entire database onto
tape or other mass storage medium. The backup copy can be used to restore the database in case of
catastrophic disk failure. Incremental backups are also often used, where only changes since the
previous backup are recorded. Incremental backup is more complex, but saves storage space.
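As an illustration of a full backup and restore, Python's sqlite3 module exposes SQLite's online backup API through `Connection.backup` (Python 3.7 and later). The ACCOUNT table is hypothetical, and in-memory databases stand in for the database file and the backup medium. This sketch covers only a full backup, not the incremental kind described above.

```python
import sqlite3

db = sqlite3.connect(":memory:")  # stands in for the live database
db.execute("CREATE TABLE ACCOUNT (ACC_NO INTEGER PRIMARY KEY, BALANCE REAL)")
db.executemany("INSERT INTO ACCOUNT VALUES (?, ?)", [(1, 100.0), (2, 250.0)])
db.commit()

# Full backup: copy every page of the database to the backup target.
backup_copy = sqlite3.connect(":memory:")  # stands in for tape or other storage
db.backup(backup_copy)

# Simulate losing the original, then restore a fresh database from the copy.
db.close()
restored = sqlite3.connect(":memory:")
backup_copy.backup(restored)
total = restored.execute("SELECT SUM(BALANCE) FROM ACCOUNT").fetchone()[0]
print(total)  # 350.0
```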
Database storage reorganization:
This utility can be used to reorganize a set of database files into different file organizations, and
create new access paths to improve performance.
Performance monitoring:
Such a utility monitors database usage and provides statistics to the DBA. The DBA uses the
statistics in making decisions such as whether or not to reorganize files or whether to add or drop
indexes to improve performance.
Other utilities may be available for sorting files, handling data compression, monitoring access by
users, interfacing with the network, and performing other functions.
3. Tools, Application Environments, and Communications Facilities:
Other tools are often available to database designers, users, and the DBMS. CASE tools are used in
the design phase of database systems. Another tool that can be quite useful in large organizations is
an expanded data dictionary (or data repository) system. In addition to storing catalog information
about schemas and constraints, the data dictionary stores other information, such as design
decisions, usage standards, application program descriptions, and user information. Such a system is
also called an information repository. This information can be accessed directly by users or the DBA
when needed. A data dictionary utility is similar to the DBMS catalog, but it includes a wider
variety of information and is accessed mainly by users rather than by the DBMS software.
Application development environments, such as PowerBuilder (Sybase) or JBuilder (Borland), have
been quite popular. These systems provide an environment for developing database applications and
include facilities that help in many facets of database systems, including database design, GUI
development, querying and updating, and application program development.
The DBMS also needs to interface with communications software, whose function is to allow users
at locations remote from the database system site to access the database through computer
terminals, workstations, or personal computers. These are connected to the database site through
data communications hardware such as Internet routers, phone lines, long-haul networks, local
networks, or satellite communication devices. Many commercial database systems have
communication packages that work with the DBMS. The integrated DBMS and data
communications system is called a DB/DC system. In addition, some distributed DBMSs are
physically distributed over multiple machines. In this case, communications networks are needed to
connect the machines. These are often local area networks (LANs), but they can also be other types
of networks.

Module 2
ER Diagram:
 ER Diagram stands for Entity Relationship Diagram, also known as ERD. It is a diagram that
displays the relationships of entity sets stored in a database. In other words, ER diagrams help to
explain the logical structure of databases.
 ER diagrams are created based on three basic concepts: entities, attributes and relationships.
 ER diagrams use rectangles to represent entities, ovals to define attributes and diamond
shapes to represent relationships.
 At first look, an ER diagram looks very similar to a flowchart. However, an ER diagram includes
many specialized symbols, and their meanings make this model unique. The purpose of an ER
diagram is to represent the structure of the entities in an enterprise and the relationships among them.
ER Model:
 ER Model stands for Entity Relationship Model. It is a high-level conceptual data model.
The ER model helps to systematically analyze data requirements to produce a well-designed
database.
 The ER model represents real-world entities and the relationships between them. Creating an ER
model is considered a best practice before implementing your database.
History of ER models:
ER diagrams are visual tools that help to represent the ER model. Peter Chen proposed the ER
model in 1976 to create a uniform convention that can be used for both relational and network
databases. He aimed to use the ER model as a conceptual modeling approach.
Why use ER Diagrams?
 Here are the main reasons for using ER diagrams:
 They help you define terms related to entity relationship modeling
 They provide a preview of how all your tables should connect and what fields will be on each
table
 They help to describe entities, attributes, and relationships
 ER diagrams are translatable into relational tables, which allows you to build databases quickly
 ER diagrams can be used by database designers as a blueprint for implementing data in specific
software applications
 The database designer gains a better understanding of the information to be contained in the
database with the help of an ER diagram
 An ER diagram allows you to communicate the logical structure of the database to users
Facts about ER Diagram Model:
 Some useful facts about the ER diagram model:
 The ER model allows you to draw a database design
 It is an easy-to-use graphical tool for modeling data
 It is widely used in database design
 It is a graphical representation of the logical structure of a database
 It helps you identify the entities that exist in a system and the relationships between those
entities
Symbols Used in ER Model:
ER Model is used to model the logical view of the system from a data perspective which consists of
these symbols:
 Rectangles: Rectangles represent Entities in the ER Model.
 Ellipses: Ellipses represent Attributes in the ER Model.
 Diamond: Diamonds represent Relationships among Entities.
 Lines: Lines link attributes to entities and connect entity sets to relationship types.
 Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
 Double Rectangle: Double Rectangle represents a Weak Entity.

Components of ER Diagram:
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database System.
ER Diagram Examples:
For example, in a University database, we might have entities for Students, Courses, and Lecturers.
The Students entity can have attributes like Rollno, Name, and DeptID. It might have relationships
with Courses and Lecturers.

WHAT IS ENTITY?
An entity is a real-world thing, either living or non-living, that can be distinctly identified. It is
anything in the enterprise that is to be represented in our database. It may be a physical thing or
simply a fact about the enterprise or an event that happens in the real world.
An entity can be a place, person, object, event or concept about which data is stored in the database.
Every entity is made up of some ‘attributes’ which represent that entity, and an entity set normally
has a unique key that identifies each of its entities (weak entities, described below, are the exception).
Examples of entities:
 Person: Employee, Student, Patient
 Place: Store, Building
 Object: Machine, product, and Car
 Event: Sale, Registration, Renewal
 Concept: Account, Course
Notation of an Entity:

Entity set (Student):
An entity set is a group of entities of a similar kind. Entities are represented by their properties,
which are also called attributes. Every entity in the set has the same attributes, but each entity has
its own values for them. For example, a student entity may have name, age, and class as attributes.
Example of Entities:
A university may have some departments. All these departments employ various lecturers and offer
several programs.
Each program is made up of some courses. Students register in a particular program and enroll in
various courses. Each course is taught by a lecturer from the relevant department, and each lecturer
teaches various groups of students.

Relationship:
A relationship is an association among two or more entities. For example, Tom works in the
Chemistry department.
Entities take part in relationships. We can often identify relationships with verbs or verb phrases.
For example:
 You are attending this lecture
 I am giving the lecture
Just like entities, we can classify relationships according to relationship types:
 A student attends a lecture
 A lecturer is giving a lecture.
Weak Entities :
A weak entity is a type of entity that does not have its own key attribute. It can be identified
uniquely only by also considering the primary key of another (owner) entity. For this reason, a weak
entity set must have total participation in its identifying relationship.

Strong Entity Set | Weak Entity Set
Always has a primary key. | Does not have enough attributes to build a primary key.
Represented by a rectangle symbol. | Represented by a double rectangle symbol.
Contains a primary key, shown with an underline. | Contains a partial key, shown with a dashed underline.
Also called a dominant entity set. | Also called a subordinate entity set.
Its primary key is one of its own attributes and identifies each member. | Its primary key is the combination of the primary key of the strong (owner) entity set and its own partial key.
A relationship between two strong entity sets is shown with a diamond symbol. | The relationship between a strong and a weak entity set (the identifying relationship) is shown with a double diamond symbol.
The line connecting it to a relationship is single. | The line connecting it to its identifying relationship is double.
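In a relational schema, a weak entity set is typically mapped to a table whose primary key combines the owner's primary key with the weak entity's partial key. The sketch below uses Python's sqlite3 module with hypothetical EMPLOYEE and DEPENDENT tables: the same dependent name may appear under different employees but not twice under the same one. ON DELETE CASCADE is one way (a design choice, not a requirement) to express that a dependent cannot exist without its owner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, NAME TEXT)")
# Weak entity: DEPENDENT_NAME is only a partial key, unique per employee.
conn.execute(
    "CREATE TABLE DEPENDENT ("
    " ESSN TEXT NOT NULL REFERENCES EMPLOYEE(SSN) ON DELETE CASCADE,"
    " DEPENDENT_NAME TEXT NOT NULL,"
    " RELATIONSHIP TEXT,"
    " PRIMARY KEY (ESSN, DEPENDENT_NAME))"  # owner's key + partial key
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)", [("111", "Ann"), ("222", "Raj")])
# The same dependent name may appear under different owners...
conn.executemany(
    "INSERT INTO DEPENDENT VALUES (?, ?, ?)",
    [("111", "Alice", "Daughter"), ("222", "Alice", "Spouse")],
)
# ...but not twice under the same owner.
try:
    conn.execute("INSERT INTO DEPENDENT VALUES ('111', 'Alice', 'Sister')")
except sqlite3.IntegrityError:
    print("duplicate (ESSN, DEPENDENT_NAME) rejected")

# Deleting the owner removes its dependents as well.
conn.execute("DELETE FROM EMPLOYEE WHERE SSN = '111'")
print(conn.execute("SELECT COUNT(*) FROM DEPENDENT").fetchone()[0])  # 1
```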
Attributes:
Attributes are the properties that define the entity type. For example, Roll_No, Name, DOB, Age,
Address, and Mobile_No are the attributes that define entity type Student. In an ER diagram, an
attribute is represented by an oval.

1. Key Attribute:
The attribute that uniquely identifies each entity in the entity set is called the key attribute. For
example, Roll_No will be unique for each student. In an ER diagram, a key attribute is represented
by an oval with its name underlined.

2. Composite Attribute:
An attribute composed of several other attributes is called a composite attribute. For example, the
Address attribute of the Student entity type consists of Street, City, State, and Country. In an ER
diagram, a composite attribute is represented by an oval connected to the ovals of its components.

3. Multivalued Attribute:
An attribute that can hold more than one value for a given entity is called a multivalued attribute.
For example, Phone_No (a student can have more than one). In an ER diagram, a multivalued
attribute is represented by a double oval.

4. Derived Attribute:
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. For example, Age can be derived from DOB. In an ER diagram, a derived attribute is
represented by a dashed oval.
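As a rough sketch, the four kinds of attributes can be mirrored in Python data classes. The field names follow the Student example above, but the classes and sample values are hypothetical: Address is a composite attribute, the list of phone numbers is multivalued, and age is computed from DOB instead of being stored. Passing `today` explicitly keeps the result reproducible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Address:
    """Composite attribute: built from smaller component attributes."""
    street: str
    city: str
    state: str
    country: str


@dataclass
class Student:
    roll_no: int      # key attribute: must be unique per student (not enforced here)
    name: str
    dob: date
    address: Address  # composite attribute
    phone_nos: list[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: computed from DOB instead of being stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


s = Student(
    roll_no=1,
    name="Shyam",
    dob=date(2003, 8, 15),
    address=Address("MG Road", "Delhi", "Delhi", "India"),
    phone_nos=["123456789", "987654321"],
)
print(s.age(date(2024, 8, 14)), s.age(date(2024, 8, 15)))  # 20 21
```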
The Complete Entity Type Student with its Attributes can be represented as:

Relationship Type and Relationship Set:


A Relationship Type represents the association between entity types. For example, ‘Enrolled in’ is a
relationship type that exists between entity types Student and Course. In an ER diagram, a relationship
type is represented by a diamond connected to the entities by lines.

A set of relationships of the same type is known as a relationship set. The following relationship set
depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as registered in C3.
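A relationship set can be modelled directly as a set of pairs, one per relationship instance. The sketch below uses the enrolments listed above (the entity sets themselves are assumed for illustration). Every pair must combine an existing student with an existing course, so the relationship set is a subset of the Cartesian product Student × Course.

```python
from itertools import product

# Entity sets (assumed for illustration) and the relationship set from the text.
students = {"S1", "S2", "S3"}
courses = {"C1", "C2", "C3"}
enrolled_in = {("S1", "C2"), ("S2", "C1"), ("S3", "C3")}

# Each relationship instance pairs existing entities, so the relationship
# set must be a subset of the Cartesian product Student x Course.
all_pairs = set(product(students, courses))
print(enrolled_in <= all_pairs)  # True


def courses_of(student):
    """Courses related to one student through 'Enrolled in'."""
    return sorted(course for (s, course) in enrolled_in if s == student)


print(courses_of("S1"))  # ['C2']
```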

Degree of a Relationship Set:


The number of different entity sets participating in a relationship set is called the degree of a
relationship set.
1. Unary Relationship: When only ONE entity set participates in a relationship, the
relationship is called a unary relationship. For example, one person is married to only one person.
2. Binary Relationship: When TWO entity sets participate in a relationship, the
relationship is called a binary relationship. For example, a Student is enrolled in a Course.

3. n-ary Relationship: When n entity sets participate in a relationship, the relationship is
called an n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One:
When each entity in each entity set can take part only once in the relationship, the cardinality is
one-to-one. Let us assume that a male can marry one female and a female can marry one male. So the
relationship will be one-to-one.
The minimum number of tables needed to represent this relationship is 2.
Using Sets, it can be represented as:

2.One-to-Many:
In a one-to-many mapping, an entity in the first entity set can be related to more than one entity in
the second set, while each entity in the second set is related to at most one entity in the first. The
minimum number of tables needed is 2. Let us assume that one surgery department can accommodate
many doctors. So the cardinality will be 1 to M: one department has many doctors.
Using sets, one-to-many cardinality can be represented as:

3.Many-to-One: When entities in one entity set can take part only once in the relationship set and
entities in other entity sets can take part more than once in the relationship set, cardinality is many
to one. Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students but
for one student, there will be only one course.
Using Sets, it can be represented as:

In this case, each student is taking only 1 course but 1 course has been taken by many students.
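Given a binary relationship set as a set of (left, right) pairs, its cardinality can be read off by counting how often each entity appears. The `cardinality` function below is a hypothetical helper. A single set of pairs only shows what that particular instance looks like; a real cardinality constraint must hold for every valid state of the database.

```python
from collections import Counter


def cardinality(pairs):
    """Classify a binary relationship set given as (left, right) pairs.

    The left side is 'many' if some right entity is paired with several left
    entities, and vice versa. Only this instance is inspected.
    """
    per_left = Counter(left for left, _ in pairs)     # partners of each left entity
    per_right = Counter(right for _, right in pairs)  # partners of each right entity
    left = "many" if any(n > 1 for n in per_right.values()) else "one"
    right = "many" if any(n > 1 for n in per_left.values()) else "one"
    return f"{left}-to-{right}"


# Many-to-one from the text: each student takes one course, a course has many students.
takes = {("S1", "C1"), ("S2", "C1"), ("S3", "C2")}
print(cardinality(takes))  # many-to-one
```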
4. Many-to-Many: When entities in all entity sets can take part more than once in the relationship,
the cardinality is many-to-many. Let us assume that a student can take more than one course and one
course can be taken by many students. So the relationship will be many-to-many.
The minimum number of tables needed to represent this relationship is 3 (one for each entity set and
one for the relationship).
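One way to see why a many-to-many relationship needs a third table: a foreign-key column in STUDENT could hold only one course per row, and one in COURSE only one student per row. A minimal sketch with Python's sqlite3 module and hypothetical STUDENT, COURSE and ENROLLED_IN tables, where the relationship table's primary key combines both entity keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, SNAME TEXT);
CREATE TABLE COURSE (COURSE_NO TEXT PRIMARY KEY, TITLE TEXT);
-- Third table for the M:N relationship; its key combines both entity keys.
CREATE TABLE ENROLLED_IN (
    STUD_NO INTEGER REFERENCES STUDENT(STUD_NO),
    COURSE_NO TEXT REFERENCES COURSE(COURSE_NO),
    PRIMARY KEY (STUD_NO, COURSE_NO)
);
INSERT INTO STUDENT VALUES (1, 'Shyam'), (2, 'Rakesh');
INSERT INTO COURSE VALUES ('C001', 'DBMS'), ('C005', 'Networks');
INSERT INTO ENROLLED_IN VALUES (1, 'C001'), (1, 'C005'), (2, 'C001');
""")

# One student takes many courses...
per_student = conn.execute(
    "SELECT COUNT(*) FROM ENROLLED_IN WHERE STUD_NO = 1").fetchone()[0]
# ...and one course has many students.
per_course = conn.execute(
    "SELECT COUNT(*) FROM ENROLLED_IN WHERE COURSE_NO = 'C001'").fetchone()[0]
print(per_student, per_course)  # 2 2
```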
A Sample Database Application:
In this section we describe a sample database application, called COMPANY, which serves to
illustrate the basic ER model concepts and their use in schema design. We list the data
requirements for the database here, and then create its conceptual schema step-by-step as we
introduce the modeling concepts of the ER model. The COMPANY database keeps track of a
company’s employees, departments, and projects. Suppose that after the requirements
collection and analysis phase, the database designers provide the following description of
the miniworld—the part of the company that will be represented in the database.
The company is organized into departments. Each department has a unique name, a unique
number, and a particular employee who manages the department. We keep track of the start
date when that employee began managing the department. A department may have several
locations.
A department controls a number of projects, each of which has a unique name, a unique
number, and a single location.
We store each employee’s name, Social Security number, address, salary, sex (gender), and
birth date. An employee is assigned to one department, but may work on several projects,
which are not necessarily controlled by the same department. We keep track of the current
number of hours per week that an employee works on each project. We also keep track of the
direct supervisor of each employee (who is another employee).
We want to keep track of the dependents of each employee for insurance purposes. We keep
each dependent’s first name, sex, birth date, and relationship to the employee.
Figure 7.2 shows how the schema for this database application can be displayed by means of
the graphical notation known as ER diagrams. This figure will be explained gradually as the
ER model concepts are presented. We describe the step-by-step process of deriving this
schema from the stated requirements—and explain the ER diagrammatic notation—as we
introduce the ER model concepts.
ER Design Issues:
In the previous sections on data modeling, we learned to design an ER diagram. We also
discussed different ways of defining entity sets and the relationships among them, and we looked at
the shapes that represent a relationship, an entity, and its attributes. However, users often
misunderstand the elements of an ER diagram and its design process. This leads to an unnecessarily
complex ER diagram and to design issues that do not reflect the characteristics of the real-world
enterprise being modelled.
Here, we will discuss the basic design issues of an ER database schema in the following points:
1) Use of Entity Set vs Attributes:
The choice between an entity set and an attribute depends on the structure of the real-world
enterprise that is being modelled and the semantics associated with its attributes. A common mistake
is to use the primary key of one entity set as an attribute of another entity set; a relationship
should be used instead. The primary key attributes of the participating entity sets are already
implicit in the relationship set, so the connection is expressed there rather than as an extra attribute.
2) Use of Entity Set vs. Relationship Sets:
It can be difficult to decide whether an object is best expressed as an entity set or as a relationship
set. A useful guideline is to use a relationship set to describe an action that occurs between
entities. If an object needs to be represented as a relationship set, it is better not to mix it with
an entity set.
3) Use of Binary vs n-ary Relationship Sets:
Generally, the relationships described in the databases are binary relationships. However,
non-binary relationships can be represented by several binary relationships. For example,
we can create and represent a ternary relationship 'parent' that may relate to a child, his
father, as well as his mother. Such relationship can also be represented by two binary
relationships, i.e., mother and father, that may relate to their child. Thus, it is possible to
represent a non-binary relationship by a set of distinct binary relationships.
4) Placing Relationship Attributes:
The cardinality ratio is a useful guide for placing relationship attributes. It is better to
associate the attributes of a one-to-one or one-to-many relationship set with one of the
participating entity sets (for one-to-many, the entity set on the 'many' side) instead of with the
relationship set. Attributes of a many-to-many relationship set, however, must stay with the
relationship set. The decision to make a given attribute a relationship attribute or an entity
attribute should reflect the characteristics of the real-world enterprise that is being modelled.
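As a sketch of these placement rules, here is part of the COMPANY example mapped to tables in SQLite through Python's sqlite3 module. Table and column names and the sample rows are assumptions for illustration, and MANAGES is treated here as one-to-one. Its start date is placed on DEPARTMENT, while HOURS of the many-to-many WORKS_ON relationship stays on the relationship table because its value depends on the (employee, project) pair.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN TEXT PRIMARY KEY, NAME TEXT);
-- 1:1 MANAGES: its attribute (start date) can sit on a participating entity.
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    DNAME TEXT UNIQUE,
    MGR_SSN TEXT REFERENCES EMPLOYEE(SSN),
    MGR_START_DATE TEXT
);
CREATE TABLE PROJECT (PNUMBER INTEGER PRIMARY KEY, PNAME TEXT UNIQUE);
-- M:N WORKS_ON: HOURS depends on the (employee, project) pair, so it
-- belongs to the relationship table, not to EMPLOYEE or PROJECT.
CREATE TABLE WORKS_ON (
    ESSN TEXT REFERENCES EMPLOYEE(SSN),
    PNO INTEGER REFERENCES PROJECT(PNUMBER),
    HOURS REAL,
    PRIMARY KEY (ESSN, PNO)
);
INSERT INTO EMPLOYEE VALUES ('111', 'Ann');
INSERT INTO DEPARTMENT VALUES (5, 'Research', '111', '2020-05-22');
INSERT INTO PROJECT VALUES (1, 'ProductX'), (2, 'ProductY');
INSERT INTO WORKS_ON VALUES ('111', 1, 32.5), ('111', 2, 7.5);
""")

# Ann has a different HOURS value per project; one EMPLOYEE.HOURS column could not hold both.
rows = conn.execute(
    "SELECT PNO, HOURS FROM WORKS_ON WHERE ESSN = '111' ORDER BY PNO").fetchall()
print(rows)  # [(1, 32.5), (2, 7.5)]
```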
Types of Keys in the Relational Model:
Keys are one of the basic requirements of a relational database model. They are widely used to
identify tuples (rows) uniquely in a table. We also use keys to set up relationships among the
columns and tables of a relational database.
Different Types of Keys in the Relational Model
1. Candidate Key
2. Primary Key
3. Super Key
4. Alternate Key
5. Foreign Key
6. Composite Key
1.Candidate Key:
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For example,
STUD_NO in the STUDENT relation.
 It is a minimal super key: no attribute can be removed from it without losing uniqueness.
 It must contain unique values.
 In SQL, a candidate key that is not chosen as the primary key is usually declared UNIQUE, and
such a column may contain NULL values (see the note below).
 Every table must have at least one candidate key.
 A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
Table STUDENT
STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

 The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE:
STUD_NO TEACHER_NO COURSE_NO

1 001 C001

2 056 C005

Note: In SQL Server, a UNIQUE constraint on a nullable column allows NULL in that column only once.
That is why the STUD_PHONE attribute can be a candidate key here, whereas a primary key attribute
can never hold a NULL value.
2.Primary Key:
There can be more than one candidate key in a relation, out of which one is chosen as the primary
key. For example, STUD_NO and STUD_PHONE are both candidate keys for relation STUDENT, but only
STUD_NO is chosen as the primary key.
 It is a unique key: it identifies exactly one tuple (record).
 It has no duplicate values.
 It cannot be NULL.
 A primary key need not be a single column; more than one column together can also be the
primary key of a table.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE), STUD_NO is the primary key

Table STUDENT
STUD_NO SNAME ADDRESS PHONE

1 Shyam Delhi 123456789

2 Rakesh Kolkata 223365796

3 Suraj Delhi 175468965

3.Super Key:
The set of attributes that can uniquely identify a tuple is known as a super key. For example,
STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a set of one or more attributes that
identifies rows in a table.
 Adding zero or more attributes to a candidate key generates a super key.
 Every candidate key is a super key, but not every super key is a candidate key.
 The extra (non-key) attributes included in a super key may contain NULL values.
Example:
Consider the table shown above.
STUD_NO+PHONE is a super key.
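The definitions of super key and candidate key can be checked mechanically against a table instance. The hypothetical helpers below do this for the STUDENT table shown above. An instance can only disprove a key: in these three rows SNAME also happens to be unique, yet names are not a key, because keys are chosen from the meaning of the data for every possible state, not from one snapshot.

```python
from itertools import combinations

# The STUDENT instance from the notes, one dict per tuple.
STUDENT = [
    {"STUD_NO": 1, "SNAME": "Shyam", "ADDRESS": "Delhi", "PHONE": "123456789"},
    {"STUD_NO": 2, "SNAME": "Rakesh", "ADDRESS": "Kolkata", "PHONE": "223365796"},
    {"STUD_NO": 3, "SNAME": "Suraj", "ADDRESS": "Delhi", "PHONE": "175468965"},
]


def is_superkey(rows, attrs):
    """True if no two rows share the same values for all of attrs."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return len(distinct) == len(rows)


def is_candidate_key(rows, attrs):
    """A super key none of whose proper non-empty subsets is a super key."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )


print(is_superkey(STUDENT, ("STUD_NO", "PHONE")))       # True
print(is_candidate_key(STUDENT, ("STUD_NO", "PHONE")))  # False: STUD_NO alone suffices
print(is_candidate_key(STUDENT, ("STUD_NO",)))          # True
print(is_superkey(STUDENT, ("ADDRESS",)))               # False: Delhi repeats
print(is_superkey(STUDENT, ("SNAME",)))                 # True here, but only by chance
```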
4.Alternate Key:
The candidate keys other than the primary key are called alternate keys.
 Every candidate key that is not chosen as the primary key is an alternate key.
 An alternate key is sometimes called a secondary key.
 Like any candidate key, it must contain unique values.
 SNAME and ADDRESS are not alternate keys, because they are not candidate keys (ADDRESS, for
example, repeats the value Delhi).
Example:
Consider the table shown above. STUD_NO and PHONE are both candidate keys for relation STUDENT;
since STUD_NO is chosen as the primary key, PHONE is an alternate key.

5.Foreign Key:
If an attribute can only take values that are present as values of some other attribute, it is a
foreign key to the attribute it refers to. The relation being referenced is called the referenced
relation, and the corresponding attribute is called the referenced attribute. The relation that
refers to the referenced relation is called the referencing relation, and the corresponding attribute
is called the referencing attribute. The referenced attribute should be the primary key of the
referenced relation.
 It is the primary key in one table and appears as an ordinary (non-key) attribute in another table.
 It links two relations (tables) together.
 It acts as a cross-reference between the tables.
 For example, DNO is the primary key in the DEPT table and a foreign key in EMP.
Example:
Refer to table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in the STUDENT relation.
Table STUDENT_COURSE:
STUD_NO  TEACHER_NO  COURSE_NO
1        005         C001
2        056         C005

It is worth noting that, unlike the primary key of a relation, a foreign key can be NULL and may contain duplicate values, i.e., it need not satisfy the uniqueness constraint. For example, STUD_NO need not be unique in STUDENT_COURSE: a student enrolled in several courses appears in several rows with the same STUD_NO. However, STUD_NO in the STUDENT relation is a primary key, so it must always be unique and cannot be NULL.
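The foreign-key rules above can be tried out with Python's built-in `sqlite3` module. This is a sketch, not a reference implementation: the second STUDENT_COURSE row for student 1 and the NULL row are hypothetical data added to show duplicates and NULLs, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

# In-memory database; SQLite checks foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO INTEGER PRIMARY KEY,
        SNAME   TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE STUDENT_COURSE (
        STUD_NO    INTEGER REFERENCES STUDENT(STUD_NO),  -- foreign key
        TEACHER_NO TEXT,
        COURSE_NO  TEXT NOT NULL
    )
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Shyam"), (2, "Rakesh"), (3, "Suraj")])

# Duplicates are allowed in the referencing column: student 1 takes two courses.
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, '005', 'C001')")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, '056', 'C005')")  # hypothetical row
# A NULL foreign key is also accepted: it refers to no student at all.
conn.execute("INSERT INTO STUDENT_COURSE VALUES (NULL, '005', 'C002')")  # hypothetical row

# A STUD_NO that does not exist in STUDENT is rejected.
try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (9, '005', 'C001')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)  # True
```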

6.Composite Key:
Sometimes a table has no single column/attribute that uniquely identifies all of its records. In that case, a combination of two or more columns/attributes can be used to identify the rows uniquely. A poorly chosen combination can still produce duplicate values in rare cases, so we need a set of attributes that is guaranteed to identify every row uniquely.
• Two or more attributes are used together to make a composite key.
• A composite key can act as the primary key when no single attribute can serve as one.
• Different combinations of attributes identify rows with different reliability; only a combination that is always unique is a valid key.

Example:
FULLNAME + DOB can be combined to identify a student, provided no two students share both the same full name and the same date of birth.
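A composite key is declared in SQL by listing several columns in one PRIMARY KEY clause. The sketch below uses Python's `sqlite3`; the STUDENT_INFO table, its CITY column, and the sample rows are all hypothetical. Each attribute may repeat on its own; only the (FULLNAME, DOB) pair must be unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: neither FULLNAME nor DOB is unique alone,
# so the pair is declared as a composite primary key.
conn.execute("""
    CREATE TABLE STUDENT_INFO (
        FULLNAME TEXT NOT NULL,
        DOB      TEXT NOT NULL,      -- ISO date string, e.g. '2004-05-17'
        CITY     TEXT,
        PRIMARY KEY (FULLNAME, DOB)
    )
""")
rows = [
    ("Ravi Kumar", "2004-05-17", "Delhi"),
    ("Ravi Kumar", "2005-01-02", "Noida"),    # same name, different DOB: allowed
    ("Asha Rao",   "2004-05-17", "Kolkata"),  # same DOB, different name: allowed
]
conn.executemany("INSERT INTO STUDENT_INFO VALUES (?, ?, ?)", rows)


def try_insert(row):
    """Return True if the row was accepted, False if the composite key rejected it."""
    try:
        conn.execute("INSERT INTO STUDENT_INFO VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(("Ravi Kumar", "2004-05-17", "Pune")))  # False: this pair already exists
```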

Weak Entity Set in ER diagrams:


An entity type should have a key attribute that uniquely identifies each entity in the entity set, but for some entity types no key attribute can be defined. These are called weak entity types.
The entity sets which do not have sufficient attributes to form a primary key are known as weak entity sets, and the entity sets which have a primary key are known as strong entity sets.
Because weak entities have no primary key, they cannot be identified on their own, so they depend on another entity (known as the owner entity). A weak entity has a total participation constraint (existence dependency) in its identifying relationship with the owner entity. Weak entity types have partial keys: a partial key is a set of attributes that, combined with the owner's primary key, distinguishes and identifies the tuples of the weak entity.
Note: A weak entity always has total participation, but a strong entity may not. A weak entity depends on a strong entity for its existence. Unlike a strong entity, a weak entity does not have a primary key; it has a partial (discriminator) key.

Weak entities are represented with a double rectangle in the ER diagram, and the identifying relationship between a strong and a weak entity is represented with a double diamond. Partial key attributes are underlined with a dashed (dotted) line.

Example-1:
In the ER diagram below, 'Payment' is the weak entity, 'Loan Payment' is the identifying relationship, and 'Payment Number' is the partial key. The primary key of Loan together with the partial key is used to identify each payment record.

Example-2:
The existence of rooms depends entirely on the existence of a hotel, so ROOM can be seen as a weak entity owned by HOTEL.
Example-3:
A bank account of a particular bank has no existence if the bank no longer exists.
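One common way to map the Loan/Payment weak entity to tables is sketched below with Python's `sqlite3`. The table and column names and the sample rows are assumptions for illustration: the weak entity's primary key combines the owner's key with the partial key, and `ON DELETE CASCADE` models existence dependency, so payments disappear with their loan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # needed for SQLite to enforce REFERENCES/CASCADE

# Owner (strong) entity.
conn.execute("CREATE TABLE LOAN (LOAN_NO TEXT NOT NULL PRIMARY KEY, AMOUNT INTEGER)")

# Weak entity: PAYMENT_NO is only a partial key, so the primary key
# borrows the owner's key. ON DELETE CASCADE models existence dependency.
conn.execute("""
    CREATE TABLE PAYMENT (
        LOAN_NO    TEXT    NOT NULL REFERENCES LOAN(LOAN_NO) ON DELETE CASCADE,
        PAYMENT_NO INTEGER NOT NULL,
        AMOUNT     INTEGER,
        PRIMARY KEY (LOAN_NO, PAYMENT_NO)
    )
""")
conn.execute("INSERT INTO LOAN VALUES ('L-1', 5000), ('L-2', 8000)")
conn.executemany("INSERT INTO PAYMENT VALUES (?, ?, ?)", [
    ("L-1", 1, 500), ("L-1", 2, 500),  # payment numbers restart for each loan
    ("L-2", 1, 800),
])

# Deleting the owner removes its dependent payments too.
conn.execute("DELETE FROM LOAN WHERE LOAN_NO = 'L-1'")
remaining = conn.execute("SELECT LOAN_NO, PAYMENT_NO FROM PAYMENT").fetchall()
print(remaining)  # [('L-2', 1)]
```

Note that payment number 1 occurs for both loans; it is only unique together with LOAN_NO, which is exactly what a partial key means.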
Extended ER features:
Using the ER model for large, complex data makes database design complicated. To reduce this complexity, Generalization, Specialization, and Aggregation were introduced into the ER model. They support data abstraction, a mechanism that hides the details of a set of objects. The ER model extended with these concepts is called the Enhanced ER (EER) model. The new concepts are:
• Generalization
• Specialization
• Aggregation
Generalization:
Generalization is the process of extracting common properties from a set of entities and creating a
generalized entity from it. It is a bottom-up approach in which two or more entities can be
generalized to a higher-level entity if they have some attributes in common. For Example,
STUDENT and FACULTY can be generalized to a higher-level entity called PERSON as shown in
Figure 1. In this case, common attributes like P_NAME, and P_ADD become part of a
higher entity (PERSON), and specialized attributes like S_FEE become part of a specialized entity
(STUDENT).
Generalization is also called the "bottom-up approach".

Specialization:
In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-down
approach where the higher-level entity is specialized into two or more lower-level entities. For
Example, an EMPLOYEE entity in an Employee management system can be specialized into
DEVELOPER, TESTER, etc. as shown in Figure 2. In this case, common attributes like E_NAME,
E_SAL, etc. become part of a higher entity (EMPLOYEE), and specialized attributes like
TES_TYPE become part of a specialized entity (TESTER).
Specialization is also called the "top-down approach".

Inheritance:
Inheritance is an important feature of generalization and specialization.
• Attribute inheritance: lower-level entities inherit the attributes of the higher-level entity they are derived from; the higher-level entity does not inherit the attributes of its lower-level entities.
• Participation inheritance: relationships in which the higher-level entity set participates are also inherited by its lower-level entity sets.
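Attribute inheritance behaves much like class inheritance in a programming language. The Python sketch below is only an analogy, not an ER construct: Person, Student, and Faculty mirror the PERSON/STUDENT/FACULTY generalization, and F_SALARY is a hypothetical attribute since the notes give none for FACULTY. Inheritance runs one way: a Student has P_NAME and P_ADD, but a Person has no S_FEE.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:            # higher-level (generalized) entity
    p_name: str
    p_add: str


@dataclass
class Student(Person):   # lower-level entity: inherits p_name and p_add
    s_fee: int


@dataclass
class Faculty(Person):   # lower-level entity; f_salary is a hypothetical attribute
    f_salary: int


def attribute_names(cls):
    """List the attributes an entity type has, inherited ones first."""
    return [f.name for f in fields(cls)]


print(attribute_names(Student))  # ['p_name', 'p_add', 's_fee']
print(attribute_names(Person))   # ['p_name', 'p_add']
```

Participation inheritance corresponds to the fact that any code written to accept a `Person` also accepts a `Student` or a `Faculty`.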
Aggregation:
A basic ER diagram cannot represent a relationship between an entity and a relationship, which some scenarios require. In those cases, a relationship together with its participating entities is aggregated into a higher-level entity. Aggregation is an abstraction through which we can represent relationships as higher-level entity sets.
For example, an employee working on a project may require some machinery. So a REQUIRE relationship is needed between the relationship WORKS_FOR and the entity MACHINERY. Using aggregation, the WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is aggregated into a single entity, and the relationship REQUIRE is created between the aggregated entity and MACHINERY.
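One common relational mapping of this aggregation is sketched below with Python's `sqlite3`; the ID column names and sample values are hypothetical. REQUIRE references the key of WORKS_FOR, so machinery can only be recorded for an employee/project pair that actually exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE EMPLOYEE  (EMP_ID  TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE PROJECT   (PROJ_ID TEXT NOT NULL PRIMARY KEY);
    CREATE TABLE MACHINERY (MACH_ID TEXT NOT NULL PRIMARY KEY);

    -- The WORKS_FOR relationship, identified by the keys of its two entities.
    CREATE TABLE WORKS_FOR (
        EMP_ID  TEXT NOT NULL REFERENCES EMPLOYEE(EMP_ID),
        PROJ_ID TEXT NOT NULL REFERENCES PROJECT(PROJ_ID),
        PRIMARY KEY (EMP_ID, PROJ_ID)
    );

    -- REQUIRE links the aggregated (EMPLOYEE, PROJECT) pair to MACHINERY.
    CREATE TABLE REQUIRE (
        EMP_ID  TEXT NOT NULL,
        PROJ_ID TEXT NOT NULL,
        MACH_ID TEXT NOT NULL REFERENCES MACHINERY(MACH_ID),
        PRIMARY KEY (EMP_ID, PROJ_ID, MACH_ID),
        FOREIGN KEY (EMP_ID, PROJ_ID) REFERENCES WORKS_FOR(EMP_ID, PROJ_ID)
    );

    INSERT INTO EMPLOYEE  VALUES ('E1');
    INSERT INTO PROJECT   VALUES ('P1'), ('P2');
    INSERT INTO MACHINERY VALUES ('M1');
    INSERT INTO WORKS_FOR VALUES ('E1', 'P1');
    INSERT INTO REQUIRE   VALUES ('E1', 'P1', 'M1');  -- E1 works on P1: allowed
""")


def require_allowed(emp, proj, mach):
    """Try to record that (emp, proj) requires mach; False if no such WORKS_FOR pair."""
    try:
        conn.execute("INSERT INTO REQUIRE VALUES (?, ?, ?)", (emp, proj, mach))
        return True
    except sqlite3.IntegrityError:
        return False


print(require_allowed("E1", "P2", "M1"))  # False: E1 does not work on P2
```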

Structural Constraints of Relationships in ER Model

To understand structural constraints, we must look at cardinality ratios and participation constraints.
Cardinality Ratios of relationships: In an ER diagram, entities are denoted by rectangles and relationships by diamonds.

Numbers (represented by 1, M, and N) are written on the lines that connect relationships and entities. These are called cardinality ratios. They give the maximum number of entities of one set that can be associated with an entity of the other set through relationship R.

Types of Cardinality:
There are four types of cardinality for a binary relationship R between entity sets A and B:
1. One-to-one (1:1) – Each entity in A is associated with at most one entity in B, and each entity in B with at most one entity in A.
2. One-to-many (1:N) – An entity in A can be associated with many (more than one) entities in B, but each entity in B is associated with at most one entity in A.
3. Many-to-one (N:1) – Each entity in A is associated with at most one entity in B, but an entity in B can be associated with many entities in A.
4. Many-to-many (M:N) – An entity in A can be associated with many entities in B, and an entity in B can be associated with many entities in A.

Participation Constraints: Participation constraints tell us whether participation in a relationship is total or partial. When every entity in an entity set participates in the relationship, it is called total participation. When some entities in the entity set do not participate in the relationship, it is called partial participation.
Structural Constraints: Structural constraints are also called structural properties of a database management system (DBMS). Cardinality ratios and participation constraints taken together are called structural constraints. They are called constraints because these limitations must be imposed on the data for the DBMS to be consistent with the requirements.
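Both structural constraints can be read off a relationship set stored as (a, b) pairs. The sketch below is illustrative; the helper names and the CUSTOMER/PLACES/ORDER data are hypothetical. It classifies what it observes, so a small sample may look 1:1 even when the schema allows 1:N; in a real design the constraint is a rule of the schema, not a property of one instance.

```python
from collections import Counter


def cardinality(pairs):
    """Classify a binary relationship set of (a, b) pairs, a from set A and b from set B."""
    a_many = max(Counter(a for a, _ in pairs).values(), default=0) > 1  # some A relates to several B's
    b_many = max(Counter(b for _, b in pairs).values(), default=0) > 1  # some B relates to several A's
    return {(False, False): "1:1", (True, False): "1:N",
            (False, True): "N:1", (True, True): "M:N"}[(a_many, b_many)]


def is_total(entities, pairs, side):
    """Total participation: every entity on `side` (0 = A, 1 = B) appears in some pair."""
    return set(entities) <= {p[side] for p in pairs}


# Hypothetical CUSTOMER -PLACES- ORDER data.
CUSTOMERS = {"C1", "C2", "C3"}
ORDERS = {"O1", "O2", "O3"}
PLACES = {("C1", "O1"), ("C1", "O2"), ("C2", "O3")}

print(cardinality(PLACES))             # 1:N
print(is_total(CUSTOMERS, PLACES, 0))  # False: C3 places no order (partial participation)
print(is_total(ORDERS, PLACES, 1))     # True: every order has a customer (total participation)
```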
Relational Data Model
The relational data model is the primary data model, used widely around the world for data storage and processing. It is simple, and it has all the properties and capabilities required to process data with storage efficiency.
INFORMAL DEFINITIONS:
RELATION:
• A table of values
• a set of rows
• a set of columns
• Each row represents an entity or relationship
• Each row has a value of an item or set of items that uniquely identifies that row in the table
• Sometimes row-ids or sequential numbers are assigned to identify the rows in the table
• Each column is typically called by its column name, column header, or attribute name
FORMAL DEFINITIONS:
• Schema of a Relation: R (A1, A2, ....., An)
• Relation schema R is defined over attributes A1, A2, ....., An
• Each attribute Ai has a domain D, denoted by dom(Ai)
• R is called the name of this relation
• Degree (or arity) of a relation is the number of attributes n of its relation schema
Example -
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa), a relation of degree 7
• STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string, Office_phone: string, Age: integer, Gpa: real)
FORMAL DEFINITIONS:
• A tuple is an ordered list of values.
• A relation may be regarded as a set of tuples (rows).
• Columns in a table are also called attributes of the relation.
• CUSTOMER (Cust-id, Cust-name, Address, Phone#)
• <632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
FORMAL DEFINITIONS:
• Domain: A domain may have a data type or a format defined for it.
Example -
"USA_phone_numbers" is the set of 10-digit phone numbers valid in the U.S.
FORMAL DEFINITIONS
• Relation (or relation state) r of the relation schema R(A1, A2, ... , An), also denoted by r(R), is a set of n-tuples r = {t1, t2, ... , tm}
• Each tuple t is an ordered list of n values t = <v1, v2, ... , vn>, where each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or is a special NULL value
Definition
• R: schema of the relation
• r(R) - r of R: a specific "value" or population of R
• R is also called the intension of a relation
• r is also called the extension of a relation
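The formal definitions translate almost directly into code. In this minimal sketch (an illustration, not a DBMS API) the schema is a mapping from attribute name to a predicate that stands in for dom(Ai), a relation state is a Python set of tuples, and NULL is represented by `None`. The second tuple is hypothetical. Because a state is a set, the order in which tuples are listed does not matter.

```python
# Relation schema CUSTOMER(Cust-id, Cust-name, Address, Phone#): each attribute
# maps to a predicate standing in for its domain dom(Ai).
CUSTOMER = {
    "Cust-id":   lambda v: isinstance(v, int) and v > 0,
    "Cust-name": lambda v: isinstance(v, str),
    "Address":   lambda v: isinstance(v, str),
    "Phone#":    lambda v: isinstance(v, str),
}


def degree(schema):
    """Degree (arity) = number of attributes n in the schema."""
    return len(schema)


def is_valid_state(schema, r):
    """r(R) is a set of n-tuples; every vi must be in dom(Ai) or be NULL (None)."""
    domains = list(schema.values())
    return all(
        len(t) == degree(schema)
        and all(v is None or in_dom(v) for v, in_dom in zip(t, domains))
        for t in r
    )


t1 = (632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000")
t2 = (632896, "Jane Doe", None, None)  # hypothetical tuple with NULL values
r = {t1, t2}

print(degree(CUSTOMER))                                  # 4
print(is_valid_state(CUSTOMER, r))                       # True
print(is_valid_state(CUSTOMER, {("x", "Bob", "", "")}))  # False: "x" is not in dom(Cust-id)
print({t1, t2} == {t2, t1})                              # True: tuple order does not matter
```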
Characteristics of relation:
1. Ordering of tuples in a relation: A relation is defined as a set of tuples. Mathematically, elements of a set have no order among them; hence, tuples in a relation do not have any particular order. In other words, a relation is not sensitive to the ordering of tuples.
2. Ordering of values within a tuple: According to the preceding definition of a relation, an n-tuple is an ordered list of n values, so the ordering of values in a tuple, and hence of attributes in a relation schema, is important. However, at a more abstract level, the order of attributes and their values is not important as long as the correspondence between attributes and values is maintained.
3. Values and NULLs in the tuples: Each value in a tuple is an atomic value; that is, it is not divisible into components within the framework of the basic relational model. Hence, composite and multivalued attributes are not allowed. This model is sometimes called the flat relational model. An important concept is that of NULL values, which are used to represent the values of attributes that may be unknown or may not apply to a tuple.
4. Interpretation (meaning) of a relation: The relation schema can be interpreted as a declaration or a type of assertion. For example, the schema of the STUDENT relation asserts that, in general, a student entity has a name, home phone, address, and GPA. Each tuple in the relation can then be interpreted as a fact, i.e., a particular instance of that assertion.
Concepts:
Tables: In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema: A relation schema describes the relation name (table name), its attributes, and their names.
Relation key: Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
Attribute domain: Every attribute has some pre-defined value scope, known as the attribute domain.
Constraints:
Every relation has some conditions that must hold for it to be a valid relation. These conditions are called relational integrity constraints.
There are four main integrity constraints:
• Key constraints
• Entity constraints
• Domain constraints
• Referential integrity constraints
A.Key Constraints:
There must be at least one minimal subset of attributes in the relation that can identify a tuple uniquely. This minimal subset of attributes is called a key for that relation. If there is more than one such minimal subset, they are called candidate keys.
Key constraints force that:
• in a relation with a key attribute, no two tuples can have identical values for the key attributes;
• a key attribute cannot have NULL values.
Key constraints are also referred to as entity constraints.
Types of Keys in Relational Model:

Keys are one of the basic requirements of a relational database model. They are widely used to identify the tuples (rows) of a table uniquely. We also use keys to set up relationships among the columns and tables of a relational database.

• Candidate Key
• Primary Key
• Super Key
• Alternate Key
• Foreign Key
• Composite Key
1.Candidate Key:
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key, for example STUD_NO in the STUDENT relation.
• It is a minimal super key: a super key from which no attribute can be removed without losing uniqueness.
• It must contain unique values.
• In SQL, a candidate key enforced with a UNIQUE constraint may contain NULL values (see the note below), whereas a primary key may not.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is a candidate key for relation STUDENT.
Table STUDENT
STUD_NO  SNAME   ADDRESS  PHONE
1        Shyam   Delhi    123456789
2        Rakesh  Kolkata  223365796
3        Suraj   Delhi    175468965
• A candidate key can be simple (a single attribute) or composite.
Example:
{STUD_NO, COURSE_NO} is a composite candidate key for relation STUDENT_COURSE.

Table STUDENT_COURSE
STUD_NO  TEACHER_NO  COURSE_NO
1        001         C001
2        056         C005

Note: In SQL Server, a UNIQUE constraint on a nullable column allows the value NULL in that column only once. That is why STUD_PHONE can still be declared as a (UNIQUE) candidate key even though it may hold a NULL, whereas a primary key attribute can never hold NULL.
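How NULLs interact with a UNIQUE candidate key is DBMS-specific. The sketch below uses Python's `sqlite3`; unlike the SQL Server behavior in the note above, SQLite treats each NULL in a UNIQUE column as distinct, so it accepts several. Duplicate non-NULL phones and duplicate primary-key values are rejected in both systems. The STUDENT columns follow the examples; the extra rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        STUD_NO    INTEGER NOT NULL PRIMARY KEY,
        SNAME      TEXT,
        STUD_PHONE TEXT UNIQUE          -- candidate key enforced by UNIQUE; nullable
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Shyam', '123456789')")
conn.execute("INSERT INTO STUDENT VALUES (2, 'Rakesh', NULL)")
conn.execute("INSERT INTO STUDENT VALUES (3, 'Suraj', NULL)")  # a second NULL: SQLite accepts it


def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


print(rejected("INSERT INTO STUDENT VALUES (4, 'Aman', '123456789')"))  # True: phone already used
print(rejected("INSERT INTO STUDENT VALUES (1, 'Atul', '999999999')"))  # True: STUD_NO 1 exists
```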
2.Primary Key:
A relation can have more than one candidate key, and one of them is chosen as the primary key. For example, STUD_NO and STUD_PHONE are both candidate keys for relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of the candidate keys).
• It is a unique key.
• It identifies exactly one tuple (record).
• Its values are unique: no two tuples share the same primary-key value.
• It cannot be NULL.
• A primary key need not be a single column; two or more columns together can form the primary key of a table.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE), STUD_NO is a primary key
Table STUDENT
STUD_NO  SNAME   ADDRESS  PHONE
1        Shyam   Delhi    123456789
2        Rakesh  Kolkata  223365796
3        Suraj   Delhi    175468965

3.Super Key:
The set of attributes that can uniquely identify a tuple is known as a super key, for example STUD_NO, (STUD_NO, SNAME), etc. A super key is a set of one or more attributes whose combined values identify each row of a table.
• Adding zero or more attributes to a candidate key gives a super key.
• Every candidate key is a super key, but not every super key is a candidate key.
• Attributes added beyond the candidate key may contain NULL values; the candidate-key part is what guarantees uniqueness.
Example:
Consider the table shown above.
STUD_NO+PHONE is a super key.
4.Alternate Key:
A candidate key other than the primary key is called an alternate key.
• All candidate keys that are not chosen as the primary key are alternate keys.
• It is sometimes called a secondary key.
• Like any candidate key, its values must be unique, so it can identify a record on its own.
• SNAME and ADDRESS are not alternate keys, because they are not candidate keys: ADDRESS repeats ("Delhi"), and names can repeat in general.
Example:
Consider the table shown above.
STUD_NO and PHONE are both candidate keys for relation STUDENT; STUD_NO is chosen as the primary key, so PHONE is an alternate key.

5.Foreign Key:
If an attribute can only take values that are present as values of some other attribute, it is a foreign key to the attribute it refers to. The relation being referenced is called the referenced relation, and its attribute is the referenced attribute; the relation that refers to it is called the referencing relation, and its attribute is the referencing attribute. The referenced attribute must be the primary key (or at least a candidate key) of the referenced relation.
• The same attribute acts as the primary key in one table and as a foreign key in another.
• It links two relations (tables).
• It acts as a cross-reference between the tables.
• For example, DNO is the primary key of the DEPT table and a foreign key (a non-key attribute) in EMP.
Example:
Refer to table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO in the STUDENT relation.
Table STUDENT_COURSE
STUD_NO  TEACHER_NO  COURSE_NO
1        005         C001
2        056         C005

It is worth noting that, unlike the primary key of a relation, a foreign key can be NULL and may contain duplicate values, i.e., it need not satisfy the uniqueness constraint. For example, STUD_NO need not be unique in STUDENT_COURSE: a student enrolled in several courses appears in several rows with the same STUD_NO. However, STUD_NO in the STUDENT relation is a primary key, so it must always be unique and cannot be NULL.
6.Composite Key:
Sometimes a table has no single column/attribute that uniquely identifies all of its records. In that case, a combination of two or more columns/attributes can be used to identify the rows uniquely. A poorly chosen combination can still produce duplicate values in rare cases, so we need a set of attributes that is guaranteed to identify every row uniquely.
• Two or more attributes are used together to make a composite key.
• A composite key can act as the primary key when no single attribute can serve as one.
• Different combinations of attributes identify rows with different reliability; only a combination that is always unique is a valid key.
Example:
FULLNAME + DOB can be combined to identify a student, provided no two students share both the same full name and the same date of birth.

B.Entity integrity:
An entity is any person, place, or thing to be recorded in a database. Each table represents an entity,
and each row of a table represents an instance of that entity. For example, if order is an entity,
the orders table represents the idea of an order and each row in the table represents a specific order.
To identify each row in a table, the table must have a primary key. The primary key is a unique
value that identifies each row. This requirement is called the entity integrity constraint.
For example, the orders table primary key is order_num. The order_num column holds a unique
system-generated order number for each row in the table. To access a row of data in
the orders table, use the following SELECT statement:
SELECT * FROM orders WHERE order_num = 1001;
Using the order number in the WHERE clause of this statement enables you to access a row easily
because the order number uniquely identifies that row. If the table allowed duplicate order numbers,
it would be almost impossible to access one single row because all other columns of this table allow
duplicate values.
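The orders example can be reproduced with Python's `sqlite3`. Only the `orders` table name, the `order_num` column, and the SELECT statement come from the text; the `customer` and `order_date` columns and the sample rows are hypothetical. Other columns may repeat freely, but the primary key rejects a second order 1001, so the SELECT always returns at most one row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        order_num  INTEGER PRIMARY KEY,  -- from the text; the other columns are hypothetical
        customer   TEXT,
        order_date TEXT
    )
""")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1001, "cust-A", "2024-01-05"),
    (1002, "cust-A", "2024-01-05"),  # non-key columns may repeat
])

try:
    conn.execute("INSERT INTO orders VALUES (1001, 'cust-B', '2024-02-01')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)  # the primary key keeps order numbers unique

rows = conn.execute("SELECT * FROM orders WHERE order_num = 1001").fetchall()
print(len(rows))  # 1
```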
C.Domain Constraints:
Attributes have specific values in real-world scenarios. For example, age can only be a non-negative integer. The relational model applies the same kind of constraint to the attributes of a relation: every attribute is bound to a specific range of values (its domain). For example, age cannot be less than zero, and telephone numbers cannot contain characters other than the digits 0-9.
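In SQL, domain constraints beyond the column type are commonly written as CHECK constraints. A sketch with Python's `sqlite3`, assuming a hypothetical PERSON table: the GLOB pattern `'*[^0-9]*'` matches any string that contains a non-digit character, so `NOT GLOB` accepts digit-only phone numbers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE PERSON (
        NAME  TEXT    NOT NULL,
        AGE   INTEGER CHECK (AGE >= 0),                       -- age cannot be negative
        PHONE TEXT    CHECK (PHONE NOT GLOB '*[^0-9]*')       -- digits 0-9 only
    )
""")


def accepted(age, phone):
    """Insert a row; return False if a CHECK (domain) constraint rejects it."""
    try:
        conn.execute("INSERT INTO PERSON VALUES ('x', ?, ?)", (age, phone))
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted(20, "9876543210"))   # True
print(accepted(-3, "9876543210"))   # False: age below zero
print(accepted(20, "98765-43210"))  # False: '-' is not a digit
```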
D.Referential integrity Constraints
Referential integrity constraints work on the concept of foreign keys. A foreign key is an attribute of one relation that refers to a key attribute of another (or the same) relation.
The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then the referenced key value must exist.
Example:
In the above example, we have two relations, Customer and Billing.
The tuple for CustomerID = 1 is referenced twice in the Billing relation, so we know that CustomerName = Google has a billing amount of $300.
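The Customer/Billing example can be reproduced with Python's `sqlite3`. The original rows are not available here, so the invoice numbers, the split of Google's $300 into 100 + 200, and Amazon as CustomerID 2 are hypothetical. With `PRAGMA foreign_keys = ON` (which SQLite requires), deleting a customer that Billing still references is refused, while an unreferenced customer can be deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("""
    CREATE TABLE Billing (
        InvoiceNo  INTEGER PRIMARY KEY,
        CustomerID INTEGER REFERENCES Customer(CustomerID),
        Amount     INTEGER
    )
""")
# Hypothetical rows: Google (CustomerID 1) is referenced twice in Billing.
conn.executemany("INSERT INTO Customer VALUES (?, ?)", [(1, "Google"), (2, "Amazon")])
conn.executemany("INSERT INTO Billing VALUES (?, ?, ?)", [(101, 1, 100), (102, 1, 200)])

total = conn.execute("SELECT SUM(Amount) FROM Billing WHERE CustomerID = 1").fetchone()[0]
print(total)  # 300

try:
    conn.execute("DELETE FROM Customer WHERE CustomerID = 1")
    deleted = True
except sqlite3.IntegrityError:
    deleted = False  # Google is still referenced by Billing, so the delete is refused

conn.execute("DELETE FROM Customer WHERE CustomerID = 2")  # unreferenced: allowed
print(deleted)  # False
```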
Operations in Relational Model:
Four basic operations performed on the relational database model are
• Insert
• Update
• Delete
• Select
Insert, Update, and Delete change the data, while Select retrieves it. Whenever one of these operations is applied, the integrity constraints specified on the relational database schema must never be violated.
Insert Operation:
The insert operation supplies attribute values for a new tuple to be inserted into a relation. Insert is used to add data to the relation.

Update Operation:
Update allows you to change the values of some attributes in existing tuples. In the below-given relation table, CustomerName = 'Apple' is updated from Inactive to Active.
Delete Operation:
To specify deletion, a condition on the attributes of the relation selects the tuple(s) to be deleted. Delete is used to remove tuples from the table.

In the above-given example, CustomerName = "Apple" is deleted from the table.

The Delete operation could violate referential integrity if the deleted tuple is referenced by foreign keys from other tuples in the database.
Select Operation:
Select allows you to retrieve a specific subset of the data: the tuples that satisfy a given condition.

In the above-given example, CustomerName = "Amazon" is selected.
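The four operations map to SQL's INSERT, UPDATE, DELETE, and SELECT statements. A sketch with Python's `sqlite3` that mirrors the Apple and Amazon examples; the Status column and the Google row are assumptions, since the original tables are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical Customer table with a Status column, mirroring the examples above.
conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Status TEXT)")

# Insert: supply attribute values for new tuples.
conn.executemany("INSERT INTO Customer VALUES (?, ?, ?)",
                 [(1, "Google", "Active"), (2, "Amazon", "Active"), (3, "Apple", "Inactive")])

# Update: change attribute values of the existing tuples chosen by a condition.
conn.execute("UPDATE Customer SET Status = 'Active' WHERE CustomerName = 'Apple'")
after_update = conn.execute("SELECT Status FROM Customer WHERE CustomerName = 'Apple'").fetchone()

# Delete: a condition selects the tuples to remove.
conn.execute("DELETE FROM Customer WHERE CustomerName = 'Apple'")

# Select: retrieve only the tuples that satisfy a condition.
selected = conn.execute("SELECT * FROM Customer WHERE CustomerName = 'Amazon'").fetchall()
print(after_update, selected)  # ('Active',) [(2, 'Amazon', 'Active')]
```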


Advantages of Relational Database Model:
• Simplicity: The relational data model is simpler than the hierarchical and network models.
• Structural independence: The relational database is concerned only with data and not with how it is physically structured, so changes in the storage structure do not affect how the DBMS accesses the data.
• Easy to use: The relational model is easy to use, as tables consisting of rows and columns are natural and simple to understand.
• Query capability: It makes it possible for a high-level query language like SQL to avoid complex database navigation.
• Data independence: The structure of a relational database can be changed without necessarily changing the applications that use it.
• Scalable: A relational database can be enlarged in the number of records (rows) and fields (columns) to enhance its usability.
Disadvantages of Relational Model:
Some relational databases have limits on field lengths that cannot be exceeded.
Relational databases can sometimes become complex as the amount of data grows and the relations between pieces of data become more complicated. Complex relational database systems may lead to isolated databases where the information cannot be shared from one system to another.
Relational Algebra in DBMS
Relational algebra in DBMS is a procedural query language. Queries in relational algebra are performed using operators. Relational algebra is the foundation of the modern language SQL and of modern database management systems such as Oracle Database, Microsoft SQL Server, IBM Db2, etc.

What is Relational Algebra in DBMS?


Relational algebra was introduced in 1970 by Edgar F. Codd, the father of the relational model. It is called a procedural query language because the programmer/user has to specify two things: "What to Do" and "How to Do".
Suppose our data is stored in a database; relational algebra is then used to access the data from the database.
First, the query has to specify which data to access ("What to Do"), but it also has to specify the method/procedure for accessing it ("How to Do"), i.e., the sequence of operations that retrieves the data from the database.
Types of Relational Operations in DBMS
In relational algebra, we have two types of operations:
• Basic Operations
• Derived Operations
Applying these operations over relations/tables will give us new relations as output.

Basic Operations:
Six fundamental operations are mentioned below. The majority of data retrieval operations are carried out by
these. Let's know them one by one.
But, before moving into detail, let's have two tables or we can say relations STUDENT(ROLL, NAME,
AGE) and EMPLOYEE(EMPLOYEE_NO, NAME, AGE) which will be used in the below examples.
STUDENT
ROLL NAME AGE
1 Aman 20
2 Atul 18
3 Baljeet 19
4 Harsh 20
5 Prateek 21
6 Prateek 23
EMPLOYEE
EMPLOYEE_NO NAME AGE
E-1 Anant 20
E-2 Ashish 23
E-3 Baljeet 25
E-4 Harsh 20
E-5 Pranav 22

Select (σ):
Select operation is done by Selection Operator which is represented by "sigma"(σ). It is used to
retrieve tuples(rows) from the table where the given condition is satisfied. It is a unary
operator means it requires only one operand.
Notation : σ p(R)
Where σ is used to represent SELECTION
R is used to represent RELATION
p is the selection condition (a logical formula)
Let's understand this with an example:
Suppose we want the row(s) from STUDENT Relation where "AGE" is 20
σ AGE=20 (STUDENT)
This will return the following output:
ROLL NAME AGE
1 Aman 20
4 Harsh 20
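Selection is easy to model when a relation is a pair of (attribute names, set of tuples); using a Python set gives the relational model's "no duplicate tuples, no tuple order" behavior. This is an illustrative sketch, and `select` is a hypothetical helper, not a library function.

```python
# STUDENT relation from the table above, as (attribute names, set of tuples).
STUDENT = (
    ("ROLL", "NAME", "AGE"),
    {(1, "Aman", 20), (2, "Atul", 18), (3, "Baljeet", 19),
     (4, "Harsh", 20), (5, "Prateek", 21), (6, "Prateek", 23)},
)


def select(relation, predicate):
    """σ_p(R): keep the tuples for which predicate(row) is true; attributes are unchanged."""
    attrs, rows = relation
    return attrs, {t for t in rows if predicate(dict(zip(attrs, t)))}


attrs, rows = select(STUDENT, lambda r: r["AGE"] == 20)
print(attrs, sorted(rows))  # ('ROLL', 'NAME', 'AGE') [(1, 'Aman', 20), (4, 'Harsh', 20)]
```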

Project (∏):
Project operation is done by Projection Operator which is represented by "pi"(∏). It is used to
retrieve certain attributes(columns) from the table. It is also known as vertical partitioning as it
separates the table vertically. It is also a unary operator.
Notation : ∏ a(r)
Where ∏ is used to represent PROJECTION
r is used to represent RELATION
a is the attribute list
Let's understand this with an example:
Suppose we want the names of all students from STUDENT Relation.
∏ NAME(STUDENT)
This will return the following output:
NAME
Aman
Atul
Baljeet
Harsh
Prateek

As you can see from the above output, projection eliminates duplicates: "Prateek" appears twice in STUDENT but only once here.


For multiple attributes, we can separate them using a ",".
∏ ROLL,NAME(STUDENT)
Above code will return two columns, ROLL and NAME.
ROLL NAME
1 Aman
2 Atul
3 Baljeet
4 Harsh
5 Prateek
6 Prateek
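Projection in the same set-based model: keeping only some attribute positions and collecting the results into a set is exactly what removes the duplicate "Prateek". A minimal sketch; `project` is a hypothetical helper.

```python
# STUDENT relation from the table above, as (attribute names, set of tuples).
STUDENT = (
    ("ROLL", "NAME", "AGE"),
    {(1, "Aman", 20), (2, "Atul", 18), (3, "Baljeet", 19),
     (4, "Harsh", 20), (5, "Prateek", 21), (6, "Prateek", 23)},
)


def project(relation, names):
    """∏_names(R): keep only the listed attributes; building a set drops duplicate tuples."""
    attrs, rows = relation
    idx = [attrs.index(n) for n in names]
    return tuple(names), {tuple(t[i] for i in idx) for t in rows}


_, name_rows = project(STUDENT, ["NAME"])
print(sorted(n for (n,) in name_rows))  # ['Aman', 'Atul', 'Baljeet', 'Harsh', 'Prateek']

_, roll_name = project(STUDENT, ["ROLL", "NAME"])
print(len(roll_name))                   # 6: ROLL keeps the two Prateek rows apart
```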

Union (∪):
Union operation is done by the union operator, represented by "∪". It is the same as the union operator from set theory, i.e., it selects all tuples from both relations, with the restriction that both relations must be union-compatible: they must have the same number of attributes, with compatible domains. It is a binary operator, as it requires two operands.
Notation: R ∪ S
Where R is the first relation
S is the second relation
If the relations are not union-compatible, their union is not defined, i.e., the operation is not allowed.
Let's have an example to clarify the concept:
Suppose we want all the names from STUDENT and EMPLOYEE relation.
∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)
NAME
Aman
Anant
Ashish
Atul
Baljeet
Harsh
Pranav
Prateek
As we can see from the above output it also eliminates duplicates.
Set Difference (-):
Set Difference as its name indicates is the difference between two relations (R-S). It is denoted by a
"Hyphen"(-) and it returns all the tuples(rows) which are in relation R but not in relation S. It is also
a binary operator.
Notation : R - S
Where R is the first relation
S is the second relation
Just like union, set difference requires both relations to be union-compatible (the same number of attributes, with compatible domains).
Let's take an example where we would like to know the names of students who are in STUDENT
Relation but not in EMPLOYEE Relation.
∏ NAME(STUDENT) - ∏ NAME(EMPLOYEE)
This will give us the following output:
NAME
Aman
Atul
Prateek
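Union and set difference map directly onto Python's set operators `|` and `-`. In this sketch the union-compatibility test is simplified to "same number of attributes" (domain compatibility is not checked), and the helper names are hypothetical. The inputs are the two projected NAME relations from the examples above.

```python
# ∏ NAME(STUDENT) and ∏ NAME(EMPLOYEE) as (attributes, set of tuples).
STUDENT_NAMES = (("NAME",), {("Aman",), ("Atul",), ("Baljeet",), ("Harsh",), ("Prateek",)})
EMPLOYEE_NAMES = (("NAME",), {("Anant",), ("Ashish",), ("Baljeet",), ("Harsh",), ("Pranav",)})


def check_compatible(r, s):
    """Simplified union-compatibility test: same degree (number of attributes)."""
    if len(r[0]) != len(s[0]):
        raise ValueError("relations are not union-compatible")


def union(r, s):
    check_compatible(r, s)
    return r[0], r[1] | s[1]


def difference(r, s):
    check_compatible(r, s)
    return r[0], r[1] - s[1]


print(sorted(n for (n,) in union(STUDENT_NAMES, EMPLOYEE_NAMES)[1]))
# ['Aman', 'Anant', 'Ashish', 'Atul', 'Baljeet', 'Harsh', 'Pranav', 'Prateek']
print(sorted(n for (n,) in difference(STUDENT_NAMES, EMPLOYEE_NAMES)[1]))
# ['Aman', 'Atul', 'Prateek']
```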

Cartesian product (X):


Cartesian product is denoted by the "X" symbol. Let's say we have two relations R and S. The Cartesian product combines every tuple (row) from R with every tuple from S. It may sound complicated, but an example makes it clear.
Notation: R X S
Where R is the first relation
S is the second relation
As we can see from the notation it is also a binary operator.
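Let's combine the two relations STUDENT and EMPLOYEE. STUDENT has 6 tuples and EMPLOYEE has 5, so the result has 6 × 5 = 30 tuples; only the first 10 (those for Aman and Atul) are shown below.
STUDENT X EMPLOYEE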
ROLL NAME AGE EMPLOYEE_NO NAME AGE
1 Aman 20 E-1 Anant 20
1 Aman 20 E-2 Ashish 23
1 Aman 20 E-3 Baljeet 25
1 Aman 20 E-4 Harsh 20
1 Aman 20 E-5 Pranav 22
2 Atul 18 E-1 Anant 20
2 Atul 18 E-2 Ashish 23
2 Atul 18 E-3 Baljeet 25
2 Atul 18 E-4 Harsh 20
2 Atul 18 E-5 Pranav 22
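The Cartesian product in the set-based model is a nested loop that concatenates tuples, which confirms the 6 × 5 = 30 count. A minimal sketch; `product` is a hypothetical helper. Because NAME and AGE occur in both schemas, the result schema contains them twice; on paper they would be qualified as STUDENT.NAME, EMPLOYEE.NAME, and so on.

```python
STUDENT = (
    ("ROLL", "NAME", "AGE"),
    {(1, "Aman", 20), (2, "Atul", 18), (3, "Baljeet", 19),
     (4, "Harsh", 20), (5, "Prateek", 21), (6, "Prateek", 23)},
)
EMPLOYEE = (
    ("EMPLOYEE_NO", "NAME", "AGE"),
    {("E-1", "Anant", 20), ("E-2", "Ashish", 23), ("E-3", "Baljeet", 25),
     ("E-4", "Harsh", 20), ("E-5", "Pranav", 22)},
)


def product(r, s):
    """R X S: concatenate every tuple of R with every tuple of S."""
    (r_attrs, r_rows), (s_attrs, s_rows) = r, s
    # NAME and AGE appear twice in the result schema (STUDENT.NAME, EMPLOYEE.NAME, ...).
    return r_attrs + s_attrs, {t + u for t in r_rows for u in s_rows}


attrs, rows = product(STUDENT, EMPLOYEE)
print(len(attrs), len(rows))                         # 6 30
print((1, "Aman", 20, "E-1", "Anant", 20) in rows)   # True
```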
Rename (ρ):
Rename operation is denoted by "Rho"(ρ). As its name suggests, it is used to rename the output relation. Rename is a unary operator.
Notation: ρ(R,S)
Where R is the new relation name
S is the old relation name (or a relational algebra expression, as in the example below)
Let's have an example to clarify this
Suppose we are fetching the names of students from STUDENT relation. We would like to rename
this relation as STUDENT_NAME.
ρ(STUDENT_NAME,∏ NAME(STUDENT))
STUDENT_NAME

NAME
Aman
Atul
Baljeet
Harsh
Prateek
As you can see, this output relation is named "STUDENT_NAME".
Takeaway:
• Select (σ) is used to retrieve tuples (rows) based on certain conditions.
• Project (∏) is used to retrieve attributes (columns) from the relation.
• Union (∪) is used to retrieve all the tuples from two relations.
• Set Difference (-) is used to retrieve the tuples which are present in R but not in S (R-S).
• Cartesian product (X) is used to combine each tuple from the first relation with each tuple from the second relation.
• Rename (ρ) is used to rename the output relation.

Derived Operations:
Also known as extended operations, these operations can be derived from the basic operations and are hence named derived operations.
These include three operations:
• Join Operations
• Intersection Operations
• Division Operations
Join Operations:
Join operations in DBMS are binary operations that combine two relations (and can be chained to combine more).
They are further classified into two types: Inner Join and Outer Join.
First, let's have two relations EMPLOYEE consisting of E_NO, E_NAME, CITY and EXPERIENCE.
EMPLOYEE table contains employee's information such as id, name, city, and experience of employee(In
Years). The other relation is DEPARTMENT consisting
of D_NO, D_NAME, E_NO and MIN_EXPERIENCE.
DEPARTMENT table defines the mapping of an employee to their department. It contains Department
Number, Department Name, Employee Id of the employee working in that department, and the minimum
experience required(In Years) to be in that department.
EMPLOYEE
E_NO E_NAME CITY EXPERIENCE
E-1 Ram Delhi 04
E-2 Varun Chandigarh 09
E-3 Ravi Noida 03
E-4 Amit Bangalore 07

DEPARTMENT
D_NO D_NAME E_NO MIN_EXPERIENCE
D-1 HR E-1 03
D-2 IT E-2 05
D-3 Marketing E-3 02

Also, let's have the Cartesian Product of the above two relations. It will be much easier to understand Join
Operations when we have the Cartesian Product.

E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE


E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-2 IT E-2 05
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-2 IT E-2 05
E-3 Ravi Noida 03 D-3 Marketing E-3 02
E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02
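The 12 rows above can be reproduced with a short Python sketch. It assumes each relation is a list of dicts (a representation chosen here for illustration) and pairs every EMPLOYEE tuple with every DEPARTMENT tuple, so both E_NO values are kept, as in the table. The helper name `cartesian_product` is hypothetical.

```python
# EMPLOYEE and DEPARTMENT as in the tables above (years stored as int).
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def cartesian_product(r, s):
    """R X S: pair every tuple of r with every tuple of s (nothing is filtered)."""
    return [(tr, ts) for tr in r for ts in s]


product = cartesian_product(EMPLOYEE, DEPARTMENT)
print(len(product))  # 12, i.e. 4 x 3
for tr, ts in product:
    print(tr["E_NO"], tr["E_NAME"], tr["EXPERIENCE"], ts["D_NO"], ts["E_NO"], ts["MIN_EXPERIENCE"])
```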

Inner Join:
An Inner Join returns only those tuples that satisfy a given condition. It is classified into three
types: Theta Join, Equi Join and Natural Join.

Theta Join (θ):
Theta Join combines two relations using a condition. This condition is represented by the
symbol theta (θ) and can use any comparison operator, such as =, <, >, <=, >= or <>.
Notation : R ⋈θ S
Where R is the first relation
S is the second relation
Let's have a simple example to understand this.
Suppose we want a relation where EXPERIENCE from EMPLOYEE >= MIN_EXPERIENCE from
DEPARTMENT.
EMPLOYEE ⋈θ DEPARTMENT, where θ is EMPLOYEE.EXPERIENCE >= DEPARTMENT.MIN_EXPERIENCE
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-1 Ram Delhi 04 D-3 Marketing E-3 02
E-2 Varun Chandigarh 09 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-2 Varun Chandigarh 09 D-3 Marketing E-3 02
E-3 Ravi Noida 03 D-1 HR E-1 03
E-3 Ravi Noida 03 D-3 Marketing E-3 02
E-4 Amit Bangalore 07 D-1 HR E-1 03
E-4 Amit Bangalore 07 D-2 IT E-2 05
E-4 Amit Bangalore 07 D-3 Marketing E-3 02
To get this result, check each tuple of the Cartesian Product: if EXPERIENCE >= MIN_EXPERIENCE, insert
that tuple into the output relation. Only (E-1, D-2) and (E-3, D-2) are dropped, because D-2 requires 5 years.
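The "filter the Cartesian Product" procedure just described can be sketched in Python, assuming the same list-of-dicts representation as a modelling choice; `theta_join` is a hypothetical helper, and θ is passed in as an ordinary function of one EMPLOYEE tuple and one DEPARTMENT tuple.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def theta_join(r, s, theta):
    """R ⋈θ S: the pairs from R X S for which the predicate theta holds."""
    return [(tr, ts) for tr in r for ts in s if theta(tr, ts)]


result = theta_join(EMPLOYEE, DEPARTMENT,
                    lambda e, d: e["EXPERIENCE"] >= d["MIN_EXPERIENCE"])
print(len(result))  # 10
for e, d in result:
    print(e["E_NO"], e["EXPERIENCE"], d["D_NO"], d["MIN_EXPERIENCE"])
```

Because θ is just a function here, the same helper also expresses an equi join (pass an equality test) or any other non-equi join.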
Equi Join:
Equi Join is a special case of Theta Join in which the condition can only
contain equality (=) comparisons.
A join whose condition uses any operator other than "=" is called a non-equi join.
Let's have an example where we would like to join EMPLOYEE and DEPARTMENT relation
where E_NO from EMPLOYEE = E_NO from DEPARTMENT.
EMPLOYEE ⋈θ DEPARTMENT, where θ is EMPLOYEE.E_NO = DEPARTMENT.E_NO
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME E_NO MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR E-1 03
E-2 Varun Chandigarh 09 D-2 IT E-2 05
E-3 Ravi Noida 03 D-3 Marketing E-3 02
To get this result, check each tuple of the Cartesian Product and insert it into the output relation only if
both E_NO values are the same. E-4 has no matching department, so it does not appear.
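Because an equi join only tests equality, an implementation does not have to scan the whole Cartesian Product: it can index one relation by the join attribute and probe that index. The sketch below swaps in this hash-join technique (a standard strategy, but not something these notes describe), using hypothetical helper names and the list-of-dicts representation as assumptions.

```python
from collections import defaultdict

EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def equi_join(r, s, r_attr, s_attr):
    """R ⋈ S on R.r_attr = S.s_attr, implemented as a simple hash join."""
    index = defaultdict(list)
    for ts in s:                      # build: group S tuples by their join value
        index[ts[s_attr]].append(ts)
    # probe: one dictionary lookup per R tuple; .get() avoids inserting empty keys
    return [(tr, ts) for tr in r for ts in index.get(tr[r_attr], [])]


result = equi_join(EMPLOYEE, DEPARTMENT, "E_NO", "E_NO")
for e, d in result:
    print(e["E_NO"], e["E_NAME"], d["D_NO"], d["D_NAME"], d["E_NO"])
```

The output has the same three tuples as the table above; both E_NO values are still available because each result is a (EMPLOYEE tuple, DEPARTMENT tuple) pair.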

Natural Join (⋈):
A Natural Join takes no explicit comparison condition: it implicitly equates every pair of attributes
that have the same name in both relations, so unlike a Cartesian Product it does not keep every combination of tuples.
A Natural Join can be performed only if the two relations share at least one common attribute.
Furthermore, the common attributes must have the same name and domain.
The result contains only the tuples whose values agree on the common attributes, and each common
attribute appears only once (the duplicate column is removed).
In practice, Natural Join is usually performed on a foreign key.
Notation : R ⋈ S
Where R is the first relation
S is the second relation
Let's say we want to join EMPLOYEE and DEPARTMENT relation with E_NO as a common
attribute.
Notice that E_NO has the same name and the same domain in both relations (in both, E_NO is a
string).
EMPLOYEE ⋈ DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
Unlike the Equi Join above, which produced two E_NO columns, this result has only one E_NO
column. This is because Natural Join automatically keeps a single copy of each common
attribute.
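A minimal Python sketch of a natural join follows, assuming the list-of-dicts representation and that every tuple of a relation has the same keys; `natural_join` is a hypothetical helper. It finds the common attribute names automatically and keeps one copy of each.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def natural_join(r, s):
    """R ⋈ S: equate all same-named attributes and keep one copy of each."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # assumes all tuples of a relation share the same keys
    result = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                # R's attributes first, then only those S attributes that are not common.
                result.append({**tr, **{k: v for k, v in ts.items() if k not in common}})
    return result


result = natural_join(EMPLOYEE, DEPARTMENT)
print(list(result[0]))  # ['E_NO', 'E_NAME', 'CITY', 'EXPERIENCE', 'D_NO', 'D_NAME', 'MIN_EXPERIENCE']
for row in result:
    print(row)
```

If the two relations had no common attribute, `all()` over an empty set is true, so this sketch would return the Cartesian Product, which matches the usual definition of natural join in that case.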
Outer Join:
Whereas an Inner Join includes only the tuples that satisfy the given condition, an Outer Join also
includes some or all of the tuples that don't satisfy it, filling in the attributes they lack with NULL.
There are three types of Outer Join:
• Left Outer Join
• Right Outer Join
• Full Outer Join.
Left Outer Join:
Left Outer Join returns the matching tuples (tuples that have a partner in both relations) and the tuples
that are present only in the Left Relation, here R.
For a tuple of R that has no matching tuple in S, the attributes/columns of the Right Relation, here S, are
set to NULL in the output relation.
Let's understand this a bit more using an example:
EMPLOYEE ⟕θ DEPARTMENT, where θ is EMPLOYEE.E_NO = DEPARTMENT.E_NO
Here we are combining EMPLOYEE and DEPARTMENT relation with the constraint that
EMPLOYEE's E_NO must be equal to DEPARTMENT's E_NO.
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL

All the tuples from the left relation, EMPLOYEE, are present. E-4 does not satisfy the given condition
(no DEPARTMENT tuple has E_NO = E-4), yet it is still included in the output relation, because an Outer
Join also keeps tuples that have no match. That is why E-4's DEPARTMENT attributes (D_NO, D_NAME,
MIN_EXPERIENCE) are NULL.
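A Python sketch of the left outer join, assuming the list-of-dicts representation and using `None` to stand for NULL. Like the table above, it keeps a single E_NO column; `left_outer_join` is a hypothetical helper.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def left_outer_join(r, s, attr):
    """R ⟕ S on R.attr = S.attr; unmatched R tuples get None (NULL) for S's other attributes."""
    s_attrs = [k for k in s[0] if k != attr] if s else []  # assumes uniform keys in s
    result = []
    for tr in r:
        matches = [ts for ts in s if ts[attr] == tr[attr]]
        if matches:
            for ts in matches:
                result.append({**tr, **{k: ts[k] for k in s_attrs}})
        else:
            # No partner in S: keep the R tuple and pad S's attributes with None.
            result.append({**tr, **dict.fromkeys(s_attrs)})
    return result


result = left_outer_join(EMPLOYEE, DEPARTMENT, "E_NO")
for row in result:
    print(row)
```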
Right Outer Join:
Right Outer Join returns the matching tuples and the tuples that are present only in the Right
Relation, here S.
It works like the Left Outer Join in reverse: for a tuple of S that has no matching tuple in R, the
attributes of the Left Relation, here R, are set to NULL in the output relation.
We will combine EMPLOYEE and DEPARTMENT relations with the same constraint as above.
EMPLOYEE ⟖θ DEPARTMENT, where θ is EMPLOYEE.E_NO = DEPARTMENT.E_NO
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02

Every tuple of the DEPARTMENT relation has a matching E_NO in the EMPLOYEE relation, so no EMPLOYEE
attribute is NULL in the result. E-4 does not appear, because it exists only in the left relation.
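A matching Python sketch of the right outer join, under the same assumptions (lists of dicts, `None` as NULL, a single copy of the join attribute). The helper `right_outer_join` and the extra department D-4 whose employee E-9 does not exist are hypothetical; the extra row is added only to show the NULL padding that the real data never triggers.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def right_outer_join(r, s, attr):
    """R ⟖ S on R.attr = S.attr; unmatched S tuples get None (NULL) for R's other attributes."""
    r_attrs = [k for k in r[0] if k != attr] if r else []  # assumes uniform keys in r
    result = []
    for ts in s:
        matches = [tr for tr in r if tr[attr] == ts[attr]]
        if not matches:
            # No partner in R: keep S's join value and pad R's other attributes with None.
            matches = [{attr: ts[attr], **dict.fromkeys(r_attrs)}]
        for tr in matches:
            result.append({**tr, **{k: v for k, v in ts.items() if k != attr}})
    return result


result = right_outer_join(EMPLOYEE, DEPARTMENT, "E_NO")
print(len(result))  # 3, and no value is None

# Hypothetical extra department, only to show the NULL padding on the left side.
extra = DEPARTMENT + [{"D_NO": "D-4", "D_NAME": "Finance", "E_NO": "E-9", "MIN_EXPERIENCE": 1}]
print(right_outer_join(EMPLOYEE, extra, "E_NO")[-1])
```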
Full Outer Join:
Full Outer Join returns all the tuples from both relations. For a tuple that has no matching tuple in the
other relation, the other relation's attributes are set to NULL in the output relation.
Again, combine the EMPLOYEE and DEPARTMENT relations with the same constraint.
EMPLOYEE ⟗θ DEPARTMENT, where θ is EMPLOYEE.E_NO = DEPARTMENT.E_NO
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 NULL NULL NULL
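A Python sketch of the full outer join, again assuming lists of dicts, `None` for NULL and a single copy of the join attribute; `full_outer_join` is a hypothetical helper. It first emits the matches plus the unmatched left tuples, then appends any right tuples that never found a partner.

```python
EMPLOYEE = [
    {"E_NO": "E-1", "E_NAME": "Ram", "CITY": "Delhi", "EXPERIENCE": 4},
    {"E_NO": "E-2", "E_NAME": "Varun", "CITY": "Chandigarh", "EXPERIENCE": 9},
    {"E_NO": "E-3", "E_NAME": "Ravi", "CITY": "Noida", "EXPERIENCE": 3},
    {"E_NO": "E-4", "E_NAME": "Amit", "CITY": "Bangalore", "EXPERIENCE": 7},
]
DEPARTMENT = [
    {"D_NO": "D-1", "D_NAME": "HR", "E_NO": "E-1", "MIN_EXPERIENCE": 3},
    {"D_NO": "D-2", "D_NAME": "IT", "E_NO": "E-2", "MIN_EXPERIENCE": 5},
    {"D_NO": "D-3", "D_NAME": "Marketing", "E_NO": "E-3", "MIN_EXPERIENCE": 2},
]


def full_outer_join(r, s, attr):
    """R ⟗ S on R.attr = S.attr: matches plus unmatched tuples of both sides, padded with None."""
    r_attrs = [k for k in r[0] if k != attr] if r else []  # assumes uniform keys
    s_attrs = [k for k in s[0] if k != attr] if s else []
    result, matched_s = [], set()  # matched_s: positions of S tuples that found a partner
    for tr in r:
        found = False
        for i, ts in enumerate(s):
            if tr[attr] == ts[attr]:
                result.append({**tr, **{k: ts[k] for k in s_attrs}})
                matched_s.add(i)
                found = True
        if not found:
            result.append({**tr, **dict.fromkeys(s_attrs)})  # left-only tuple
    for i, ts in enumerate(s):
        if i not in matched_s:  # right-only tuple
            result.append({attr: ts[attr], **dict.fromkeys(r_attrs),
                           **{k: ts[k] for k in s_attrs}})
    return result


result = full_outer_join(EMPLOYEE, DEPARTMENT, "E_NO")
for row in result:
    print(row)
```

With this data every department is matched, so the full outer join has the same four rows as the left outer join.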

Intersection (∩):
Intersection is performed by the intersection operator (∩). It is the same as the intersection
operator from set theory, i.e., it selects all the tuples that are present in both relations, so the two
relations must be union-compatible. It is a binary operator, as it requires two operands, and its result
contains no duplicate tuples. It is a derived operation because R ∩ S = R - (R - S).
Notation : R ∩ S
Where R is the first relation
S is the second relation
Let's have an example to clarify the concept:
Suppose we want the names that are present in both the STUDENT and the EMPLOYEE relations (the
relations we used in Basic Operations).
∏ NAME(STUDENT) ∩ ∏ NAME(EMPLOYEE)
NAME
Baljeet
Harsh
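Because intersection is derived, it can be computed with set difference alone. This Python sketch shows the projection followed by R - (R - S). The STUDENT and EMPLOYEE tables from Basic Operations are not reproduced in this section, so the rows below are hypothetical, chosen only so that the result matches the table above; `project` and `intersection` are illustrative helper names.

```python
def project(relation, attr):
    """∏ attr(relation): keep one attribute; the set drops duplicate values."""
    return {row[attr] for row in relation}


def intersection(r, s):
    """R ∩ S written with set difference only: R - (R - S)."""
    return r - (r - s)


# Hypothetical rows (the real tables belong to the Basic Operations section).
STUDENT = [{"NAME": "Baljeet"}, {"NAME": "Harsh"}, {"NAME": "Asha"}]
EMPLOYEE = [{"NAME": "Harsh"}, {"NAME": "Baljeet"}, {"NAME": "Vikram"}, {"NAME": "Harsh"}]

names = intersection(project(STUDENT, "NAME"), project(EMPLOYEE, "NAME"))
print(sorted(names))  # ['Baljeet', 'Harsh']
```

The duplicate "Harsh" row disappears after projection, which is why the result has no duplicates.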

Division (÷):
The Division operation is represented by the division operator (÷ or /) and is used in queries that
involve keywords such as "every" or "all".
Notation : R(X,Y)/S(Y)
Here,
R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.
X and Y are the attributes/columns present in the relations. Each of them may stand for several
attributes, but keep in mind that the attributes of S must be a proper subset of the attributes of R.
The result contains every value x of X such that, for every tuple y in S, the tuple <x, y> is present
in R.
This is easier to understand with an example:
Let's have two relations, ENROLLED and COURSE. ENROLLED consists of two attributes,
STUDENT_ID and COURSE_ID, and records which students are enrolled in which courses.
COURSE contains the list of available courses.
Here the attributes/columns of COURSE are a proper subset of the attributes/columns of
ENROLLED, hence the Division operation can be used.

ENROLLED
STUDENT_ID COURSE_ID
Student_1 DBMS
Student_2 DBMS
Student_1 OS
Student_3 OS
COURSE
COURSE_ID
DBMS
OS
Now the query is to return the STUDENT_ID of students who are enrolled in every course.
ENROLLED(STUDENT_ID, COURSE_ID)/COURSE(COURSE_ID)
This will return the following relation as output.
STUDENT_ID
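Student_1
Only Student_1 is enrolled in both DBMS and OS; Student_2 is enrolled only in DBMS and Student_3 only in OS.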
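The division above can be checked with a small Python sketch that models relations as lists of tuples (an assumption for illustration). `divide` tests the definition directly; `divide_basic` uses the standard rewrite in basic operations, ∏X(R) - ∏X((∏X(R) × S) - R), which is why division counts as a derived operation. Both helper names are hypothetical.

```python
ENROLLED = [("Student_1", "DBMS"), ("Student_2", "DBMS"),
            ("Student_1", "OS"), ("Student_3", "OS")]
COURSE = [("DBMS",), ("OS",)]


def divide(r, s):
    """R(X, Y) ÷ S(Y): each x whose pairs in R cover every y in S (direct check)."""
    pairs = set(r)
    required = {y for (y,) in s}
    return {x for x, _ in pairs if all((x, y) in pairs for y in required)}


def divide_basic(r, s):
    """Same result from basic operations: ∏X(R) - ∏X((∏X(R) × S) - R)."""
    xs = {x for x, _ in r}                                # ∏X(R)
    ys = {y for (y,) in s}
    missing = {(x, y) for x in xs for y in ys} - set(r)   # (∏X(R) × S) - R
    return xs - {x for x, _ in missing}                   # drop every x with a missing course


print(divide(ENROLLED, COURSE))        # {'Student_1'}
print(divide_basic(ENROLLED, COURSE))  # {'Student_1'}
```

Here `missing` is {(Student_2, OS), (Student_3, DBMS)}, so Student_2 and Student_3 are removed and only Student_1 remains.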
Summary:
Relational algebra operations are divided into two main categories: Basic and Derived.
• Basic consists of six operations: SELECT, PROJECT, UNION, SET DIFFERENCE,
CARTESIAN PRODUCT, RENAME.
• Derived consists of three operations: JOINS, INTERSECTION, DIVISION.
Joins are of two types: Inner Join and Outer Join.
• Inner Join is further classified into three types:
i. Theta Join
ii. Equi Join
iii. Natural Join.
• Outer Join also consists of three types:
i. Left Outer Join
ii. Right Outer Join
iii. Full Outer Join.

Advantages:
• Expressive Power: Extended operators allow for more complex queries and transformations
that cannot be easily expressed using basic relational algebra operations.
• Data Reduction: Aggregation operators, such as SUM, AVG, COUNT, and MAX, can
reduce the amount of data that needs to be processed and displayed.
• Data Transformation: Extended operators can be used to transform data into different
formats, such as pivoting rows into columns or vice versa.
• More Efficient: Extended operators can be more efficient than expressing the same query in
terms of basic relational algebra operations, since they can take advantage of specialized
algorithms and optimizations.

Disadvantages:
• Complexity: Extended operators can be more difficult to understand and use than basic
relational algebra operations. They require a deeper understanding of the underlying data and
the operators themselves.
• Performance: Some extended operators, such as outer joins, can be expensive in terms of
performance, especially when dealing with large data sets.
• Non-standardized: There is no universal set of extended operators, and different relational
database management systems may implement them differently or not at all.
• Data Integrity: Some extended operators, such as aggregate functions, can cause data integrity
problems if not used properly. For example, SQL's AVG ignores NULL values, so averaging a column
that contains NULLs can give a different result from what the user expected (for instance, if NULL
was meant to count as zero).
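The AVG pitfall can be demonstrated with Python's built-in `sqlite3` module and an in-memory database. The table name `scores` and its values are hypothetical. SQL's AVG skips NULLs, while wrapping the column in COALESCE makes NULL count as zero; COUNT(*) and COUNT(column) differ in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (val INTEGER)")  # hypothetical one-column table
conn.executemany("INSERT INTO scores (val) VALUES (?)", [(10,), (20,), (None,)])

# AVG ignores the NULL row: (10 + 20) / 2
avg_ignoring_nulls = conn.execute("SELECT AVG(val) FROM scores").fetchone()[0]
# COALESCE turns NULL into 0, so all three rows count: (10 + 20 + 0) / 3
avg_nulls_as_zero = conn.execute("SELECT AVG(COALESCE(val, 0)) FROM scores").fetchone()[0]
# COUNT(*) counts rows; COUNT(val) counts only non-NULL values
row_count = conn.execute("SELECT COUNT(*), COUNT(val) FROM scores").fetchone()
conn.close()

print(avg_ignoring_nulls)  # 15.0
print(avg_nulls_as_zero)   # 10.0
print(row_count)           # (3, 2)
```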
Takeaway:
• Theta Join (θ) combines two relations based on a condition.
• Equi Join is a type of Theta Join where only the equality condition (=) is used.
• Natural Join (⋈) combines two relations based on a common attribute (preferably a foreign
key).
• Left Outer Join (⟕) returns the matching tuples and the tuples that are only present in the left
relation.
• Right Outer Join (⟖) returns the matching tuples and the tuples that are only present in the
right relation.
• Full Outer Join (⟗) returns all the tuples present in the left and right relations.
Conclusion:
Relational Algebra in DBMS is a theoretical model that forms the foundation of SQL. It
consists of a set of mathematical operations on relations.
