Database Note
DBMS INSTANCES:
We are considering an Online Employee Management system as our example database. The database
schema is designed before the database itself is created, so that in future, if we make any change to one
layer, it does not affect the other layers. A DBMS instance refers to the information stored in the database
at any particular instant of time.
Some steps in creating a schema in DBMS include:
Determining what kind of data is to be stored in the database.
Once the purpose of the database is determined, the next step is to define the tables.
Within each table, our next step is to define the columns of the tables.
Each column has a data type specifying the kind of information that will be kept there. Dates,
numbers, and text are some common data types.
There are relationships between the tables in the database that must be defined.
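As a rough SQL sketch of these steps for the Online Employee Management example: the table names, columns and data types below are assumptions made only for this illustration, not definitions taken from the notes.
-- Illustrative DDL only: names and data types are assumptions.
CREATE TABLE DEPARTMENT (
    DEPT_ID   INT          PRIMARY KEY,   -- unique identifier for each department
    DEPT_NAME VARCHAR(50)  NOT NULL       -- text data type
);
CREATE TABLE EMPLOYEE (
    EMP_ID    INT          PRIMARY KEY,   -- unique identifier for each employee
    EMP_NAME  VARCHAR(100) NOT NULL,      -- text data type
    HIRE_DATE DATE,                       -- date data type
    SALARY    DECIMAL(10,2),              -- numeric data type
    DEPT_ID   INT,
    FOREIGN KEY (DEPT_ID) REFERENCES DEPARTMENT(DEPT_ID)  -- relationship between the tables
);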
Database Schema Designs:
The process of developing a structure for storing and organizing data in a database is known as
database schema design.
Following are some of the database schema designs:
Hierarchical schema design arranges data in a tree-like structure, where each record has a single parent.
Network schema design allows for more complex relationships, where each record can have
multiple parent and child records.
Relational schema design organizes data into tables with rows and columns.
Dimensional schema design uses a star schema to organize data into fact tables and
dimension tables.
Advantages:
There are several benefits to using a three-schema architecture in DBMS. Some of these
benefits include:
One of the main advantages of a DBMS's three schemas is its data independence. All three
layers are distinct from each other. So we can make changes to one layer without affecting
other layers.
Each layer can scale independently, which can enhance the performance of the database
and let it handle more traffic at the same time.
It is simpler to maintain and change each layer individually in a three-schema design due to
the separation of the layers.
Disadvantages:
Despite the many benefits of the three-schema architecture, it has a few disadvantages:
This method can be difficult and expensive for big companies because it takes a lot of work
to set up and maintain.
It can also cause slow-downs and mistakes if the data is not converted correctly between the
different parts.
Sometimes, it can also be hard to make sure only the right people can access sensitive
information.
Database Languages in DBMS:
A DBMS has appropriate languages and interfaces to express database queries and updates.
Database languages can be used to read, store and update the data in the database.
Types of Database Languages:
The four commonly recognized types are Data Definition Language (DDL), Data Manipulation Language (DML), Data Control Language (DCL), and Transaction Control Language (TCL).
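A minimal, hedged illustration of each language type in SQL; the EMPLOYEE table and the user name hr_user are assumptions made only for this sketch.
-- DDL: defines the database structure
CREATE TABLE EMPLOYEE (EMP_ID INT PRIMARY KEY, EMP_NAME VARCHAR(100));
-- DML: reads and updates the data
INSERT INTO EMPLOYEE (EMP_ID, EMP_NAME) VALUES (1, 'Asha');
SELECT EMP_NAME FROM EMPLOYEE WHERE EMP_ID = 1;
-- DCL: controls access rights (hr_user is a hypothetical account)
GRANT SELECT ON EMPLOYEE TO hr_user;
-- TCL: controls transactions
COMMIT;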
DATABASE DESIGN:
Database design is a collection of processes that facilitate the designing, development,
implementation and maintenance of enterprise data management systems. Properly designed
databases are easy to maintain, improve data consistency and are cost effective in terms of disk
storage space. The database designer decides how the data elements correlate and what data must be
stored. The main objectives of database design in DBMS are to produce logical and physical design
models of the proposed database system. The logical model concentrates on the data requirements
and the data to be stored, independent of physical considerations. It does not concern itself with how
or where the data will be stored physically. The physical data design model involves translating the
logical design of the database onto physical media using hardware resources and software systems
such as database management systems (DBMS).
Database designing:
Logical model – This stage is concerned with developing a database model based on
requirements. The entire design is on paper without any physical implementations or specific
DBMS considerations.
Physical model – This stage implements the logical model of the database taking into account
the DBMS and physical implementation factors.
Implementation:
Data conversion and loading – this stage of relational database design is concerned with
importing and converting data from the old system into the new database.
Testing – this stage is concerned with the identification of errors in the newly implemented
system. It checks the database against requirement specifications.
Database engine:
A database engine (or storage engine) is the underlying software component that a database
management system (DBMS) uses to create, read, update and delete (CRUD) data from a database.
Most database management systems include their own application programming interface (API) that
allows the user to interact with their underlying engine without going through the user interface of
the DBMS.
The term "database engine" is frequently used interchangeably with "database server" or "database
management system". A "database instance" refers to the processes and memory structures of the
running database engine.
Three Parts that make up the Database System are:
Query Processor
Storage Manager
Disk Storage
1. Query Processor:
The query processing is handled by the query processor, as the name implies. It executes the
user's query, to put it simply. In this way, the query processor aids the database system in
making data access simple and easy. The query processor's primary duty is to successfully
execute the query. The Query Processor transforms (or interprets) the user's application
program-provided requests into instructions that a computer can understand.
Components of the Query Processor:
1. DDL Interpreter:
Data Definition Language is what DDL stands for. As implied by the name, the DDL
Interpreter interprets DDL statements, such as those used in schema definitions (CREATE,
DROP, etc.). This interpretation yields a set of tables whose meta-data (data about data) is
kept in the data dictionary. In essence, the data dictionary is a part of disk storage, which is
covered in a later section of these notes.
2. DML Compiler:
Data Manipulation Language is what DML stands for. In keeping with its name, the DML
Compiler converts DML statements such as SELECT, UPDATE, and DELETE into
low-level instructions, or simply machine-readable object code, to enable execution. The
optimization of queries is another function of the DML compiler: a single query can
typically be translated into a number of evaluation plans, so some optimization is needed
to select the evaluation plan with the lowest cost out of all the options. This process,
known as query optimization, is carried out by the DML compiler. Simply put, query
optimization determines the most effective way to carry out a query.
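As a hedged illustration, many DBMSs (for example PostgreSQL and MySQL) expose the evaluation plan chosen by the optimizer through an EXPLAIN statement; the exact syntax and output are system specific, and the STUDENT table used here is only an assumption.
-- Ask the DBMS which evaluation plan it chose for this query.
EXPLAIN
SELECT SNAME
FROM STUDENT
WHERE ADDRESS = 'Delhi';
-- Depending on catalog statistics and available indexes, the optimizer may pick
-- an index scan or a full table scan; EXPLAIN shows the lowest-cost plan it selected.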
3. Embedded DML Pre-compiler:
Before the query evaluation, the embedded DML commands in the application program
(such as SELECT, FROM, etc., in SQL) must be pre-compiled into standard procedural
calls (program instructions that the host language can understand). Therefore, the DML
statements which are embedded in an application program must be converted into routine
calls by the Embedded DML Pre-compiler.
4. Query Evaluation Engine:
It takes the evaluation plan for the query, runs it, and then returns the result. Simply said,
the query evaluation engine evaluates the SQL commands used to access the database's
contents and returns the result of the query. In a nutshell, it is in charge of analyzing the
queries and running the object code that the DML Compiler produces. Apache Drill,
Presto, and other query engines are a few examples.
2. Storage Manager:
The Storage Manager is the component that acts as a conduit between the queries issued and the data kept
in the database. It is also known as the Database Control System. By applying the integrity constraints and
running the DCL instructions, it maintains the database's consistency and integrity. It is in charge of
retrieving, storing, updating, and removing data from the database.
[Figure: DBMS architecture diagram showing the optimizer, metadata and data storage]
DATABASE USER:
Database users interact with data to update, read and modify the given information on a daily basis.
There are various types of database users and we will learn in detail about them.
Database users can be divided into the following types −
End Users
i. Naive users / Parametric users
ii. Sophisticated users
Application Programmer or Specialized users or Back-End Developer
System Analyst
Database Administrator (DBA)
Temporary Users or Casual Users
System Analyst:
A System Analyst is also known as a business technology analyst. These professionals are
responsible for the design, structure, and properties of databases. The application programmer uses
the specifications provided by the system analyst to construct the software that is used by end users.
The analyst gathers information from the stakeholders as well as end users to understand their
requirements and translate them into functional specifications for the new system.
Roles of System Analysts:
They serve as team leaders.
They are responsible for managing projects.
They are the supervisor who manages the lower-level information Staff.
Disadvantages of a Centralized DBMS:
Some disadvantages of a Centralized Database Management System are −
Since all the data is at one location, it takes more time to search and access it. If the network
is slow, this process takes even more time.
There is a lot of data access traffic for the centralized database. This may create a bottleneck
situation.
Since all the data is at the same location, if multiple users try to access it simultaneously it
creates a problem. This may reduce the efficiency of the system.
If there are no database recovery measures in place and a system failure occurs, then all the
data in the database will be destroyed.
2. Client-Server Architecture of DBMS:
We first talk about client/server architecture in general, and then we look at how DBMSs use it. In
order to handle computing settings with a high number of PCs, workstations, file servers, printers,
database servers, etc., the client/server architecture was designed.
A network connects various pieces of software and hardware, including email and web server
software. The aim is to define specialized servers with a particular functionality. For instance, it is
feasible to link a number of PCs or small workstations as clients to a file server that manages the
client machines' files. Another machine can be designated as a printer server by connecting it to
numerous printers; all print requests from clients are then directed to this machine. Web servers and
email servers also fall into the category of specialized servers. Many client machines can utilize the
resources offered by specialized servers. The client devices provide the user with the appropriate
user interfaces for these servers, as well as local processing power to run local applications.
This idea can be applied to various types of software, where specialist applications, like a CAD
(computer-aided design) package, are kept on particular server computers and made available to a
variety of clients. Some machines (such as workstations or PCs whose disks only have client
software installed) would be client sites only.
The idea of client/server architecture presupposes an underlying framework made up of many PCs
and workstations, as well as a smaller number of mainframe computers, connected via LANs and
other types of computer networks. In this setup, a client is typically a user machine that offers local
processing and user interface capabilities. When a client needs access to extra functionality, such as
database access, that is not available on that machine, it connects to a server that offers the
functionality. A server is a system that includes both hardware and software and can offer client
computers services such as file access, printing, archiving, or database access. Some machines
install both client and server software, while others install only client software; it is more typical,
however, for client and server software to run on separate machines. The two-tier and three-tier
DBMS architectures were developed on this underlying client/server framework.
Two-Tier Client Server Architecture:
Here, the term "two-tier" refers to our architecture's two layers-the Client layer and the Data layer.
There are a number of client computers in the client layer that can contact the database server. The
API on the client computer uses JDBC or some other connection mechanism to link the computer to
the database server, because clients and database servers may be at different physical locations.
Three-Tier Client-Server Architecture:
The Business Logic Layer is an additional layer that serves as a link between the Client layer and
the Data layer. Unlike the two-tier architecture, where queries are processed directly in the database
server, here the application programs are processed in the application server itself.
Parallel database:
Nowadays organizations need to handle a huge amount of data with a high transfer rate. For such
requirements, a client-server or centralized system is not efficient. With the need to improve the
efficiency of the system, the concept of the parallel database comes into the picture. A parallel database
system seeks to improve the performance of the system through parallelization.
Need:
Multiple resources like CPUs and Disks are used in parallel. The operations are performed
simultaneously, as opposed to serial processing. A parallel server can allow access to a single
database by users on multiple machines. It also performs many parallelization operations like data
loading, query processing, building indexes, and evaluating queries.
Advantages:
Here, we will discuss the advantages of parallel databases. Let’s have a look.
1. Performance Improvement:
By connecting multiple resources like CPU and disks in parallel we can significantly
increase the performance of the system.
2. High Availability:
In a parallel database, nodes have less contact with each other, so the failure of one
node does not cause the failure of the entire system. This amounts to significantly higher
database availability.
3. Proper Resource Utilization:
Due to parallel execution, the CPU will never be idle. Thus, resources are properly
utilized.
4. Increased Reliability:
When one site fails, execution can continue with another available site which has a
copy of the data, making the system more reliable.
Performance Measurement of Databases:
Here, we will emphasize the performance measurement factor-like Speedup and Scale-up. Let’s
understand it one by one with the help of examples.
Speedup:
The ability to execute the tasks in less time by increasing the number of resources is called Speedup.
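A common way to express this (the notation below is an assumption of these notes, not something the source defines) is Speedup = T_original / T_parallel, where T_original is the elapsed time on the smaller system and T_parallel is the elapsed time for the same task on the larger parallel system. For example, if a batch of queries takes 100 seconds on one CPU and 25 seconds on four CPUs, the speedup is 100 / 25 = 4, which is called linear speedup.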
3-Tier Architecture:
The 3-Tier architecture contains another layer between the client and the server. In this architecture,
the client cannot directly communicate with the server. The application on the client end interacts
with an application server, which further communicates with the database system. The end user has
no idea about the existence of the database beyond the application server, and the database has no
idea about any user beyond the application. The 3-Tier architecture is used in the case of large web
applications.
Characteristics and Benefits of a Database:
There are a number of characteristics that distinguish the database approach from the file-based
system or approach. This chapter describes the benefits (and features) of the database system.
Self-describing nature of a database system:
A database system is referred to as self-describing because it not only contains the database
itself, but also metadata which defines and describes the data and relationships between
tables in the database. This information is used by the DBMS software or database users if
needed. This separation of data and information about the data makes a database system
totally different from the traditional file-based system in which the data definition is part of
the application programs.
Insulation between program and data:
In the file-based system, the structure of the data files is defined in the application programs
so if a user wants to change the structure of a file, all the programs that access that file might
need to be changed as well. On the other hand, in the database approach, the data structure is
stored in the system catalogue and not in the programs. Therefore, one change is all that is
needed to change the structure of a file. This insulation between the programs and data is also
called program-data independence.
Support for multiple views of data:
A database supports multiple views of data. A view is a subset of the database, which is
defined and dedicated for particular users of the system. Multiple users in the system might
have different views of the system. Each view might contain only the data of interest to a
user or group of users.
Sharing of data and multiuser system:
Current database systems are designed for multiple users. That is, they allow many users to
access the same database at the same time. This access is achieved through features
called concurrency control strategies. These strategies ensure that the data accessed are
always correct and that data integrity is maintained.
The design of modern multiuser database systems is a great improvement from those in the past
which restricted usage to one person at a time.
DATABASE DESIGNER:
A database designer is in charge of designing, developing, implementing and maintaining a company's
data management systems. One of the most important responsibilities of a database designer is to
form relationships between various elements of data and give them a logical structure.
KEY RESPONSIBILITIES OF A DATABASE DESIGNER:
Understand the organisation's data to skillfully carry out the company's database design projects.
Install and configure relational database management system on the company's server.
Design database schemas and create databases for varied projects of the company.
Handle the creation of new users, define roles and privileges and grant access to them.
Assist application development teams to easily connect to the databases.
Track the performance of the databases and fix issues quickly to facilitate smooth functioning.
Use the best techniques for enhanced scalability and efficiency of large databases.
Understand complex problems, devise solutions and transform them into software requirements.
Conduct data research and query large and complex datasets to provide the best data modelling.
Data Independence:
Data independence can be explained using the three-schema architecture.
Data independence refers to the characteristic of being able to modify the schema at one level of the database system without
altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence:
Logical data independence refers to the characteristic of being able to change the conceptual schema
without having to change the external schema.
Logical data independence is used to separate the external level from the conceptual view.
If we make any changes to the conceptual view of the data, the user view of the data is not
affected.
Logical data independence occurs at the user interface level.
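A minimal SQL sketch of this idea, assuming a hypothetical STUDENT table with columns SNAME and PHONE: the external view stays valid even when the conceptual schema is extended.
-- External schema: a view used by one group of users.
CREATE VIEW STUDENT_CONTACTS AS
SELECT SNAME, PHONE
FROM STUDENT;
-- Conceptual schema change: a new column is added to the base table.
-- The view above, and the applications that query it, are unaffected.
ALTER TABLE STUDENT ADD COLUMN EMAIL VARCHAR(100);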
2. Physical Data Independence:
Physical data independence can be defined as the capacity to change the internal schema without
having to change the conceptual schema.
If we make any changes to the storage of the database system server, the conceptual
structure of the database is not affected.
Physical data independence is used to separate conceptual levels from the internal levels.
Physical data independence occurs at the logical interface level.
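A hedged sketch of physical data independence using the same assumed STUDENT table: adding an index changes only the internal (storage) level, while the conceptual schema that users and programs see is untouched.
-- Internal schema change only: how rows are stored and accessed on disk.
CREATE INDEX idx_student_sname ON STUDENT (SNAME);
-- No table, view, or application query has to change because of this.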
Interfaces in DBMS:
A database management system (DBMS) interface is a user interface that allows for the ability to
input queries to a database without using the query language itself. User-friendly interfaces provided
by DBMS may include the following:
Menu-Based Interfaces
Forms-Based Interfaces
Graphical User Interfaces
Natural Language Interfaces
Speech Input and Output Interfaces
Interfaces for Parametric Users
Interfaces for the Database Administrator (DBA)
1.Menu-Based Interfaces:
These interfaces present the user with lists of options (called menus) that lead the user through the
formation of a request. The basic advantage of using menus is that they remove the burden of
remembering specific commands and the syntax of a query language. The query is composed
step by step by picking options from a menu that is shown by the system.
Pull-down menus are a very popular technique in Web-based interfaces. They are also often used
in browsing interfaces which allow a user to look through the contents of a database in an
exploratory and unstructured manner.
2.Forms-Based Interfaces:
A forms-based interface displays a form to each user. Users can fill out all of the form entries to
insert new data, or they can fill out only certain entries, in which case the DBMS will retrieve
matching data for the remaining entries. These forms are usually designed and programmed for
users who have little technical expertise. Many DBMSs have form specification languages, which
are special languages that help specify such forms.
Example: SQL Forms is a form-based language that specifies queries using a form designed in
conjunction with the relational database schema.
3.Graphical User Interface:
A GUI typically displays a schema to the user in diagrammatic form. The user can then specify a
query by manipulating the diagram. In many cases, GUIs utilize both menus and forms. Most GUIs
use a pointing device, such as a mouse, to pick a certain part of the displayed schema diagram.
4.Natural Language Interfaces:
These interfaces accept requests written in English or some other language and attempt to
understand them. A Natural language interface has its own schema, which is similar to the database
conceptual schema as well as a dictionary of important words.
The natural language interface refers to the words in its schema as well as to the set of standard
words in a dictionary to interpret the request. If the interpretation is successful, the interface
generates a high-level query corresponding to the natural language and submits it to the DBMS for
processing, otherwise, a dialogue is started with the user to clarify any provided condition or
request. The main disadvantage of this is that the capabilities of this type of interface are not that
advanced.
5.Speech Input and Output Interfaces:
Limited use of speech, whether as an input query or as an answer to a question or result of a
request, is becoming commonplace. Applications with a limited vocabulary, such as inquiries for
telephone directories, flight arrivals/departures, and bank account information, allow speech for
input and output so that ordinary people can access this information.
Speech input is detected using predefined words and used to set up the parameters that are
supplied to the queries. For output, a similar conversion from text or numbers into speech takes
place.
6.Interface for Parametric Users:
Interfaces for Parametric Users contain some commands that can be handled with a minimum of
keystrokes. It is generally used in bank transactions for transferring money. These operations are
performed repeatedly.
7.Interfaces for Database Administrators (DBA):
Most database systems contain privileged commands that can be used only by the DBA’s staff.
These include commands for creating accounts, setting system parameters, granting account
authorization, changing a schema, and reorganizing the storage structures of databases.
EF Codd’s Rules in DBMS:
Codd’s rules were proposed by the computer scientist Dr. Edgar F. Codd, who also invented the
relational model for database management. These rules are meant to ensure data integrity,
consistency, and usability, and they signify the characteristics and requirements of a relational
database management system (RDBMS). In this section, we will learn about the various Codd’s
rules.
Codd’s Rules in DBMS:
Rule 1: The Information Rule:
All information, whether it is user information or metadata, that is stored in a database must be
entered as a value in a cell of a table. It is said that everything within the database is organized in a
table layout.
Rule 2: The Guaranteed Access Rule:
Each data element is guaranteed to be accessible logically with a combination of the table name,
primary key (row value), and attribute name (column value).
Rule 3: Systematic Treatment of NULL Values:
Every Null value in a database must be given a systematic and uniform treatment.
Rule 4: Active Online Catalog Rule:
The database catalog, which contains metadata about the database, must be stored and accessed
using the same relational database management system.
Rule 5: The Comprehensive Data Sublanguage Rule:
A crucial component of any efficient database system is its ability to offer an easily understandable
data manipulation language (DML) that facilitates defining, querying, and modifying information
within the database.
Rule 6: The View Updating Rule:
All views that are theoretically updatable must also be updatable by the system.
Rule 7: High-level Insert, Update, and Delete:
A successful database system must possess the feature of facilitating high-level insertions, updates,
and deletions that can grant users the ability to conduct these operations with ease through a single
query.
Rule 8: Physical Data Independence:
Application programs and activities should remain unaffected when changes are made to the
physical storage structures or methods.
Rule 9: Logical Data Independence :
Application programs and activities should remain unaffected when changes are made to the logical
structure of the data, such as adding or modifying tables.
Rule 10: Integrity Independence:
Integrity constraints should be specified separately from application programs and stored in the
catalog. They should be automatically enforced by the database system.
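As a hedged sketch of this rule, the constraints below are declared in the schema (and therefore stored in the catalog) instead of being coded inside application programs; the EMP and DEPT tables and their columns are assumptions.
CREATE TABLE EMP (
    EMP_ID INT PRIMARY KEY,
    SALARY DECIMAL(10,2) CHECK (SALARY >= 0),  -- integrity rule kept with the schema
    DNO    INT REFERENCES DEPT(DNO)            -- referential integrity, enforced by the DBMS
);
-- Assumes a DEPT table with primary key DNO already exists.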
Rule 11: Distribution Independence:
The distribution of data across multiple locations should be invisible to users, and the database
system should handle the distribution transparently.
Casual users and persons with occasional need for information from the database interact using
some form of interface, which we call the interactive query interface in Figure 2.3. We have not
explicitly shown any menu-based or form-based interaction that may be used to generate the
interactive query automatically. These queries are parsed and validated for correctness of the query
syntax, the names of files and data elements, and so on by a query compiler that compiles them into
an internal form. This internal query is subjected to query optimization (discussed in Chapters 19
and 20). Among other things, the query optimizer is concerned with the rearrangement and possible
reordering of operations, elimination of redundancies, and use of correct algorithms and indexes
during execution. It consults the system catalog for statistical and other physical information about
the stored data and generates executable code that performs the necessary operations for the query
and makes calls on the runtime processor.
Application programmers write programs in host languages such as Java, C, or C++ that are
submitted to a precompiler. The precompiler extracts DML commands from an application
program written in a host programming language. These commands are sent to the DML compiler
for compilation into object code for database access. The rest of the program is sent to the host
language compiler. The object codes for the DML commands and the rest of the program are linked,
forming a canned transaction whose executable code includes calls to the runtime database
processor. Canned transactions are executed repeatedly by parametric users, who simply supply the
parameters to the transactions. Each execution is considered to be a separate transaction. An
example is a bank withdrawal transaction where the account number and the amount may be
supplied as parameters.
In the lower part of Figure 2.3, the runtime database processor executes (1) the privileged
commands, (2) the executable query plans, and (3) the canned transactions with runtime parameters.
It works with the system catalog and may update it with statistics. It also works with the stored data
manager, which in turn uses basic operating system services for carrying out low-level input/output
(read/write) operations between the disk and main memory. The runtime database processor handles
other aspects of data transfer, such as management of buffers in the main memory. Some DBMSs
have their own buffer management module while others depend on the OS for buffer management.
We have shown concurrency control and backup and recovery systems separately as a module in
this figure. They are integrated into the working of the runtime database processor for purposes of
transaction management.
It is now common to have the client program that accesses the DBMS running on a separate
computer from the computer on which the database resides. The former is called the client
computer running a DBMS client software and the latter is called the database server. In some cases,
the client accesses a middle computer, called the application server, which in turn accesses the
database server. We elaborate on this topic in Section 2.5.
Figure 2.3 is not meant to describe a specific DBMS; rather, it illustrates typical DBMS
modules. The DBMS interacts with the operating system when disk accesses—to the database or to
the catalog—are needed. If the computer system is shared by many users, the OS will schedule
DBMS disk access requests and DBMS processing along with other processes. On the other hand, if
the computer system is mainly dedicated to running the database server, the DBMS will control
main memory buffering of disk pages. The DBMS also interfaces with compilers for general-
purpose host programming languages, and with application servers and client programs running on
separate machines through the system network interface.
Module 2
ER Diagram:
ER Diagram stands for Entity Relationship Diagram, also known as an ERD. It is a diagram that
displays the relationships of the entity sets stored in a database. In other words, ER diagrams help to
explain the logical structure of databases.
ER diagrams are created based on three basic concepts: entities, attributes and relationships.
ER Diagrams contain different symbols that use rectangles to represent entities, ovals to define
attributes and diamond shapes to represent relationships.
At first look, an ER diagram looks very similar to a flowchart. However, an ER diagram includes
many specialized symbols, and their meanings make this model unique. The purpose of an ER
diagram is to represent the entity framework infrastructure.
ER Model:
ER Model stands for Entity Relationship Model, a high-level conceptual data model. The ER model
helps to systematically analyze data requirements to produce a well-designed database.
The ER Model represents real-world entities and the relationships between them. Creating an ER
model is therefore considered a best practice to complete before implementing your database.
History of ER models:
ER diagrams are visual tools that are used to represent the ER model. Peter Chen proposed the ER
diagram in 1976 to create a uniform convention that could be used for relational databases and
networks. He aimed to use the ER model as a conceptual modeling approach.
Why use ER Diagrams?
Here, are prime reasons for using the ER Diagram
Helps you to define terms related to entity relationship modeling
Provide a preview of how all your tables should connect, what fields are going to be on each
table
Helps to describe entities, attributes, relationships
ER diagrams are translatable into relational tables which allows you to build databases quickly
ER diagrams can be used by database designers as a blueprint for implementing data in specific
software applications
The database designer gains a better understanding of the information to be contained in the
database with the help of an ER diagram
An ER diagram allows you to communicate the logical structure of the database to users
Facts about ER Diagram Model:
Now in this ERD Diagram Tutorial, let’s check out some interesting facts about ER Diagram
Model:
ER model allows you to draw Database Design
It is an easy to use graphical tool for modeling data
Widely used in Database Design
It is a GUI representation of the logical structure of a Database
It helps you to identify the entities which exist in a system and the relationships between those
entities
Symbols Used in ER Model:
ER Model is used to model the logical view of the system from a data perspective which consists of
these symbols:
Rectangles: Rectangles represent Entities in the ER Model.
Ellipses: Ellipses represent Attributes in the ER Model.
Diamond: Diamonds represent Relationships among Entities.
Lines: Lines link attributes to entity sets and entity sets to relationship types.
Double Ellipse: Double Ellipses represent Multi-Valued Attributes.
Double Rectangle: Double Rectangle represents a Weak Entity.
Components of ER Diagram:
ER Model consists of Entities, Attributes, and Relationships among Entities in a Database System.
ER Diagram Examples:
For example, in a University database, we might have entities for Students, Courses, and Lecturers.
Students entity can have attributes like Rollno, Name, and DeptID. They might have relationships
with Courses and Lecturers.
WHAT IS ENTITY?
An entity is a real-world thing, either living or non-living, that is easily recognizable and
distinguishable. It is anything in the enterprise that is to be represented in our database. It may be a
physical thing, or simply a fact about the enterprise, or an event that happens in the real world.
An entity can be a place, person, object, event or concept which stores data in the database. An
entity must have attributes and a unique key. Every entity is made up of some ‘attributes’ which
represent that entity.
Examples of entities:
Person: Employee, Student, Patient
Place: Store, Building
Object: Machine, product, and Car
Event: Sale, Registration, Renewal
Concept: Account, Course
Notation of an Entity:
Entity set (Student):
An entity set is a group of similar kinds of entities. It may contain entities whose attributes share
similar values. Entities are represented by their properties, which are also called attributes. All
attributes have their separate values. For example, a student entity may have a name, age, and class
as attributes.
Example of Entities:
A university may have some departments. All these departments employ various lecturers and offer
several programs.
Some courses make up each program. Students register in a particular program and enroll in various
courses. A lecturer from a specific department teaches each course, and each lecturer teaches a
different group of students.
Relationship:
Relationship is nothing but an association among two or more entities. E.g., Tom works in the
Chemistry department.
Entities take part in relationships. We can often identify relationships with verbs or verb phrases.
For example:
You are attending this lecture
I am giving the lecture
Just like entities, we can classify relationships according to relationship types:
A student attends a lecture
A lecturer is giving a lecture.
Weak Entities :
A weak entity is a type of entity which doesn’t have its own key attribute. It can be identified uniquely by
considering the primary key of another entity. For that, weak entity sets need to have total participation
in the identifying relationship.
1. Key Attribute:
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For
example, Roll_No will be unique for each student. In an ER diagram, the key attribute is represented
by an oval with the attribute name underlined.
2. Composite Attribute:
An attribute composed of many other attributes is called a composite attribute. For example, the
Address attribute of the student entity type consists of Street, City, State, and Country. In an ER
diagram, the composite attribute is represented by an oval comprising other ovals.
3. Multivalued Attribute:
An attribute consisting of more than one value for a given entity. For example, Phone_No (can be
more than one for a given student). In ER diagram, a multivalued attribute is represented by a
double oval.
4. Derived Attribute:
An attribute that can be derived from other attributes of the entity type is known as a derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, the derived attribute is represented
by a dashed oval.
The Complete Entity Type Student with its Attributes can be represented as:
A set of relationships of the same type is known as a relationship set. The following relationship set
depicts S1 as enrolled in C2, S2 as enrolled in C1, and S3 as enrolled in C3.
3. n-ary Relationship: When there are n entity sets participating in a relation, the relationship is
called an n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known
as cardinality. Cardinality can be of different types:
1. One-to-One:
When each entity in each entity set can take part only once in the relationship, the cardinality is
one-to-one. Let us assume that a male can marry one female and a female can marry one male. So
the relationship will be one-to-one.
The total number of tables that can be used in this case is 2.
Using Sets, it can be represented as:
2. One-to-Many:
In one-to-many mapping, an entity on one side can be related to more than one entity on the other
side, and the total number of tables that can be used in this case is 2. Let us assume that one surgical
department can accommodate many doctors. So the cardinality will be 1 to M: one department has
many doctors.
Using sets, one-to-many cardinality can be represented as:
3.Many-to-One: When entities in one entity set can take part only once in the relationship set and
entities in other entity sets can take part more than once in the relationship set, cardinality is many
to one. Let us assume that a student can take only one course but one course can be taken by many
students. So the cardinality will be n to 1. It means that for one course there can be n students but
for one student, there will be only one course.
Using Sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many students.
4. Many-to-Many: When entities in all entity sets can take part more than once in the relationship
cardinality is many to many. Let us assume that a student can take more than one course and one
course can be taken by many students. So the relationship will be many to many.
The total number of tables that can be used in this case is 3.
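As a hedged sketch of this many-to-many case with three tables, the relationship itself becomes a table whose key combines the keys of the two entity sets; CNAME and the exact data types are assumptions made for this example.
CREATE TABLE STUDENT (STUD_NO INT PRIMARY KEY, SNAME VARCHAR(100));
CREATE TABLE COURSE  (COURSE_NO CHAR(4) PRIMARY KEY, CNAME VARCHAR(100));
CREATE TABLE ENROLLMENT (                        -- the third table represents the relationship
    STUD_NO   INT     REFERENCES STUDENT(STUD_NO),
    COURSE_NO CHAR(4) REFERENCES COURSE(COURSE_NO),
    PRIMARY KEY (STUD_NO, COURSE_NO)             -- a student takes many courses, a course has many students
);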
A Sample Database Application:
In this section we describe a sample database application, called COMPANY, which serves to
illustrate the basic ER model concepts and their use in schema design. We list the data
requirements for the database here, and then create its conceptual schema step-by-step as we
introduce the modeling concepts of the ER model. The COMPANY database keeps track of a
company’s employees, departments, and projects. Suppose that after the requirements
collection and analysis phase, the database designers provide the following description of
the miniworld—the part of the company that will be represented in the database.
The company is organized into departments. Each department has a unique name, a unique
number, and a particular employee who manages the department. We keep track of the start
date when that employee began managing the department. A department may have several
locations.
A department controls a number of projects, each of which has a unique name, a unique
number, and a single location.
We store each employee’s name, Social Security number, address, salary, sex (gender), and
birth date. An employee is assigned to one department, but may work on several projects,
which are not necessarily controlled by the same department. We keep track of the current
number of hours per week that an employee works on each project. We also keep track of the
direct supervisor of each employee (who is another employee).
We want to keep track of the dependents of each employee for insurance purposes. We keep
each dependent’s first name, sex, birth date, and relationship to the employee.
Figure 7.2 shows how the schema for this database application can be displayed by means of
the graphical notation known as ER diagrams. This figure will be explained gradually as the
ER model concepts are presented. We describe the step-by-step process of deriving this
schema from the stated requirements—and explain the ER diagrammatic notation—as we
introduce the ER model concepts.
ER Design Issues:
In the previous sections on data modeling, we learned to design an ER diagram. We also
discussed different ways of defining entity sets and relationships among them, and the various
shapes that represent a relationship, an entity, and its attributes. However, users often misunderstand
these elements and the design process of the ER diagram. This leads to a complex ER diagram
structure and to issues that do not meet the characteristics of the real-world enterprise being
modelled.
Here, we will discuss the basic design issues of an ER database schema in the following points:
1) Use of Entity Set vs Attributes:
The use of an entity set or an attribute depends on the structure of the real-world enterprise
that is being modelled and the semantics associated with its attributes. It is a mistake to use
the primary key of one entity set as an attribute of another entity set; instead, a relationship
should be used. Also, the primary-key attributes of the participating entity sets are implicit in
the relationship set, so they do not need to be repeated as attributes of the relationship set.
2) Use of Entity Set vs. Relationship Sets:
It can be difficult to decide whether an object is best expressed as an entity set or a relationship
set. To determine the right use, the user needs to designate a relationship set for describing an
action that occurs between entities. If the object needs to be represented as a relationship set,
then it is better not to mix it with an entity set.
3) Use of Binary vs n-ary Relationship Sets:
Generally, the relationships described in the databases are binary relationships. However,
non-binary relationships can be represented by several binary relationships. For example,
we can create and represent a ternary relationship 'parent' that may relate to a child, his
father, as well as his mother. Such a relationship can also be represented by two binary
relationships, i.e., mother and father, each relating to the child. Thus, it is possible to
represent a non-binary relationship by a set of distinct binary relationships.
4) Placing Relationship Attributes:
The cardinality ratios can be an effective measure in the placement of relationship
attributes. It is better to associate the attributes of one-to-one or one-to-many relationship
sets with one of the participating entity sets, rather than with the relationship set. The
decision of placing a specified attribute as a relationship attribute or an entity attribute
should reflect the characteristics of the real-world enterprise that is being modelled.
Types of Keys in Relational Model :
Keys are one of the basic requirements of a relational database model. They are widely used to identify
tuples (rows) uniquely in a table. We also use keys to set up relations amongst various columns
and tables of a relational database.
Different Types of Keys in the Relational Model
1. Candidate Key
2. Primary Key
3. Super Key
4. Alternate Key
5. Foreign Key
6. Composite Key
1.Candidate Key:
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For Example,
STUD_NO in STUDENT relation.
It is a minimal super key.
It is a super key with no redundant attributes.
The minimal set of attributes that can uniquely identify a record.
It must contain unique values.
It can contain NULL values.
Every table must have at least a single candidate key.
A table can have multiple candidate keys but only one primary key.
The value of the Candidate Key is unique and may be null for a tuple.
There can be more than one candidate key in a relationship.
Example:
STUD_NO is the candidate key for relation STUDENT.
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE:
STUD_NO   TEACHER_NO   COURSE_NO
1         001          C001
2         056          C005
Note: In SQL Server, a unique constraint that has a nullable column allows the value ‘null‘ in that
column only once. That’s why the PHONE attribute can be a candidate key here, but a NULL value
can never appear in the primary key attribute.
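A minimal sketch of the STUDENT relation above in SQL; the data types are assumptions. The primary key and the UNIQUE constraint mark the two candidate keys.
CREATE TABLE STUDENT (
    STUD_NO INT          PRIMARY KEY,  -- candidate key chosen as the primary key
    SNAME   VARCHAR(100),
    ADDRESS VARCHAR(200),
    PHONE   CHAR(10)     UNIQUE        -- the other candidate (alternate) key
);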
2.Primary Key:
There can be more than one candidate key in a relation, out of which one can be chosen as the primary
key. For example, STUD_NO as well as PHONE are candidate keys for the relation STUDENT, but
STUD_NO can be chosen as the primary key (only one out of many candidate keys).
It is a unique key.
It can identify only one tuple (a record) at a time.
It has no duplicate values, it has unique values.
It cannot be NULL.
A primary key is not necessarily a single column; more than one column can together form
the primary key of a table.
Example:
STUDENT table -> STUDENT(STUD_NO, SNAME, ADDRESS, PHONE), where STUD_NO is the primary key
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
3.Super Key:
The set of attributes that can uniquely identify a tuple is known as Super Key. For Example,
STUD_NO, (STUD_NO, STUD_NAME), etc. A super key is a group of single or multiple keys that
identifies rows in a table. It supports NULL values.
Adding zero or more attributes to the candidate key generates the super key.
A candidate key is a super key but vice versa is not true.
Super Key values may also be NULL.
Example:
Consider the table shown above.
STUD_NO+PHONE is a super key.
4.Alternate Key:
The candidate key other than the primary key is called an alternate key.
All the keys which are not primary keys are called alternate keys.
It is also called a secondary key.
Like any candidate key, its values uniquely identify records.
Eg:- PHONE is an alternate key for the STUDENT relation (SNAME and ADDRESS cannot be alternate keys because they need not be unique).
Example:
Consider the table shown above.
STUD_NO, as well as PHONE both,
are candidate keys for relation STUDENT but
PHONE will be an alternate key
(only one out of many candidate keys).
5.Foreign Key:
If an attribute can only take values that are present as values of some other attribute, it is a foreign
key to the attribute to which it refers. The relation being referenced is called the referenced relation
and the corresponding attribute is called the referenced attribute; the relation that refers to the
referenced relation is called the referencing relation and the corresponding attribute is called the
referencing attribute. The referenced attribute of the referenced relation should be its primary key.
It is a key that acts as a primary key in one table and as a secondary key in another table.
It combines two or more relations (tables) at a time.
It acts as a cross-reference between the tables.
For example, DNO is a primary key in the DEPT table and a foreign key in the EMP table.
Example:
Refer Table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.
Table STUDENT_COURSE:
STUD_NO   TEACHER_NO   COURSE_NO
1         005          C001
2         056          C005
It may be worth noting that, unlike the primary key of a relation, a foreign key can be NULL and may
contain duplicate values, i.e., it need not follow the uniqueness constraint. For example, STUD_NO in
the STUDENT_COURSE relation need not be unique, since the same student may appear in several
tuples. However, STUD_NO in the STUDENT relation is a primary key, so it must always be unique
and cannot be NULL.
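A hedged sketch of the STUDENT_COURSE relation referencing STUDENT; it assumes the STUDENT table sketched earlier, and the data types are assumptions.
CREATE TABLE STUDENT_COURSE (
    STUD_NO    INT REFERENCES STUDENT(STUD_NO),  -- foreign key: may repeat and may be NULL
    TEACHER_NO CHAR(3),
    COURSE_NO  CHAR(4)
);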
6.Composite Key:
Sometimes, a table might not have a single column/attribute that uniquely identifies all the records
of a table. To uniquely identify rows of a table, a combination of two or more columns/attributes
can be used. It still can give duplicate values in rare cases. So, we need to find the optimal set of
attributes that can uniquely identify rows in a table.
It acts as a primary key if there is no primary key in a table
Two or more attributes are used together to make a composite key.
Different combinations of attributes may give different accuracy in terms of identifying
the rows uniquely.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.
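A hedged sketch of the FULLNAME + DOB example; the table name and the other columns are assumptions.
CREATE TABLE STUDENT_DETAILS (
    FULLNAME VARCHAR(100),
    DOB      DATE,
    ADDRESS  VARCHAR(200),
    PRIMARY KEY (FULLNAME, DOB)   -- the two columns together identify each row
);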
Weak entities are represented with a double rectangular box in the ER diagram, and the identifying
relationships are represented with a double diamond. Partial key attributes are underlined with a
dashed line.
Example-1:
In the below ER Diagram, ‘Payment’ is the weak entity. ‘Loan Payment’ is the identifying
relationship and ‘Payment Number’ is the partial key. Primary Key of the Loan along with the
partial key would be used to identify the records.
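One possible relational mapping of this weak entity, given as a hedged sketch; the column names and types are assumptions. The key of PAYMENT combines the owner's primary key with the partial key.
CREATE TABLE LOAN (
    LOAN_NO INT PRIMARY KEY,
    AMOUNT  DECIMAL(12,2)
);
CREATE TABLE PAYMENT (
    LOAN_NO        INT REFERENCES LOAN(LOAN_NO),  -- key of the identifying (owner) entity
    PAYMENT_NUMBER INT,                           -- partial key of the weak entity
    PAY_DATE       DATE,
    PRIMARY KEY (LOAN_NO, PAYMENT_NUMBER)         -- owner's key + partial key
);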
Example-2:
The existence of rooms is entirely dependent on the existence of a hotel. So room can be seen as the
weak entity of the hotel.
Example-3:
The bank account of a particular bank has no existence if the bank doesn’t exist anymore.
Extended ER features:
Using the ER model for larger databases creates a lot of complexity in the design, so in order to
minimize that complexity, Generalization, Specialization, and Aggregation were introduced into the
ER model. They are used for data abstraction, in which an abstraction mechanism is used to hide the
details of a set of objects. These new concepts, added in the Enhanced ER model, are:
Generalization
Specialization
Aggregation
Generalization:
Generalization is the process of extracting common properties from a set of entities and creating a
generalized entity from it. It is a bottom-up approach in which two or more entities can be
generalized to a higher-level entity if they have some attributes in common. For Example,
STUDENT and FACULTY can be generalized to a higher-level entity called PERSON as shown in
Figure 1. In this case, common attributes like P_NAME, and P_ADD become part of a
higher entity (PERSON), and specialized attributes like S_FEE become part of a specialized entity
(STUDENT).
Generalization is also called the "bottom-up approach".
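One possible relational mapping of the PERSON generalization, given as a hedged sketch; P_ID and F_SALARY are assumptions introduced only for this example.
CREATE TABLE PERSON (
    P_ID   INT PRIMARY KEY,
    P_NAME VARCHAR(100),          -- common attributes live in the higher-level entity
    P_ADD  VARCHAR(200)
);
CREATE TABLE STUDENT (
    P_ID  INT PRIMARY KEY REFERENCES PERSON(P_ID),  -- shares the parent key
    S_FEE DECIMAL(10,2)           -- specialized attribute
);
CREATE TABLE FACULTY (
    P_ID     INT PRIMARY KEY REFERENCES PERSON(P_ID),
    F_SALARY DECIMAL(10,2)        -- illustrative specialized attribute
);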
Specialization:
In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-down
approach where the higher-level entity is specialized into two or more lower-level entities. For
Example, an EMPLOYEE entity in an Employee management system can be specialized into
DEVELOPER, TESTER, etc. as shown in Figure 2. In this case, common attributes like E_NAME,
E_SAL, etc. become part of a higher entity (EMPLOYEE), and specialized attributes like
TES_TYPE become part of a specialized entity (TESTER).
Specialization is also called the "top-down approach".
Inheritance:
It is an important feature of generalization and specialization.
Attribute inheritance: allows lower-level entities to inherit the attributes of higher-level
entities.
Participation inheritance: relationships involving a higher-level entity set are also inherited
by the lower-level entity sets.
Aggregation:
An ER diagram is not capable of representing the relationship between an entity and a relationship
which may be required in some scenarios. In those cases, a relationship with its corresponding
entities is aggregated into a higher-level entity. Aggregation is an abstraction through which we can
represent relationships as higher-level entity sets.
For Example, an Employee working on a project may require some machinery. So, REQUIRE
relationship is needed between the relationship WORKS_FOR and entity MACHINERY. Using
aggregation, WORKS_FOR relationship with its entities EMPLOYEE and PROJECT is aggregated
into a single entity and relationship REQUIRE is created between the aggregated entity and
MACHINERY.
There are numbers (represented by M and N) written above the lines which connect relationships
and entities. These are called cardinality ratios. These represent the maximum number of entities
that can be associated with each other through relationship, R.
Types of Cardinality :
There can be 4 types of cardinality –
1. One-to-one (1:1) – When one entity in each entity set takes part at most once in the
relationship, the cardinality is one-to-one.
2. One-to-many (1: N) – If entities in the first entity set take part in the relationship set at
most once and entities in the second entity set take part many times (at least twice), the
cardinality is said to be one-to-many.
3. Many-to-one (N:1) – If entities in the first entity set take part in the relationship set many
times (at least twice), while entities in the second entity set take part at most once, the
cardinality is said to be many-to-one.
4. Many-to-many (N: N) – The cardinality is said to be many to many if entities in both the
entity sets take part many times (at least twice) in the relationship set.
Structural Constraints : Structural Constraints are also called Structural properties of a database
management system (DBMS). Cardinality Ratios and Participation Constraints taken together are
called Structural Constraints. The name constraints refer to the fact that such limitations must be
imposed on the data, for the DBMS system to be consistent with the requirements.
Relation Data Model
Relational data model is the primary data model, which is used widely around the world for data
storage and processing. This model is simple and it has all the properties and capabilities required to
process data with storage efficiency.
INFORMAL DEFINITIONS:
RELATION:
A table of values
a set of rows
a set of columns
Each row represents an entity or relationship
Each row has a value of an item or set of items that uniquely identifies that row in the table
Sometimes row-ids or sequential numbers are assigned to identify the rows in the table
Each column typically is called by its column name, column header or attribute name
FORMAL DEFINITIONS:
Schema of a Relation: R(A1, A2, ..., An)
Relation schema R is defined over attributes A1, A2, ..., An
Each attribute Ai is the name of a role played by some domain D in the relation schema R
D is called the domain of Ai and is denoted by dom(Ai)
R is called the name of this relation
Degree (or arity) of a relation is the number of attributes n of its relation schema
Example:
STUDENT(Name, Ssn, Home_phone, Address, Office_phone, Age, Gpa)
STUDENT(Name: string, Ssn: string, Home_phone: string, Address: string, Office_phone: string, Age: integer, Gpa: real)
FORMAL DEFINITIONS:
• A tuple is an ordered set of values.
• A relation may be regarded as a set of tuples (rows).
• Columns in a table are also called attributes of the relation.
• Example:
CUSTOMER(Cust-id, Cust-name, Address, Phone#)
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
FORMAL DEFINITIONS:
• Domain: A domain may have a data type or a format defined for it.
• Example: "USA_phone_numbers" is the set of 10-digit phone numbers valid in the U.S.
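Some systems (PostgreSQL, for instance) let such a domain be declared directly with CREATE DOMAIN; support varies by DBMS, so treat the following as a hedged sketch rather than portable SQL.
CREATE DOMAIN USA_PHONE_NUMBER AS CHAR(10)
    CHECK (VALUE SIMILAR TO '[0-9]{10}');   -- exactly ten digits
-- The domain can then be reused as a column type, e.g.:
-- CREATE TABLE CUSTOMER (Cust_id INT PRIMARY KEY, Phone USA_PHONE_NUMBER);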
FORMAL DEFINITIONS
• A relation (or relation state) r of the relation schema R(A1, A2, ..., An), also denoted by r(R), is a set of n-tuples r = {t1, t2, ..., tm}.
• Each tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each value vi, 1 ≤ i ≤ n, is an element of dom(Ai) or is a special NULL value.
Definition:
• R: schema of the relation
• r(R) - r of R: a specific "value" or population of R
• R is also called the intension of a relation
• r is also called the extension of a relation
Characteristics of relation:
1. Ordering of tuples in a relation: A relation is defined as a set of tuples. Mathematically,
elements of a set have no order among them; hence, tuples in a relation do not have any
particular order. In other words, a relation is not sensitive to the ordering of tuples.
2. Ordering of values within a tuple: According to the preceding definition of a relation, an
n-tuple is an ordered list of n values, so the ordering of values in a tuple, and hence of
attributes in a relation schema, is important. However, at a more abstract level, the order of
attributes and their values is not that important as long as the correspondence between
attributes and values is maintained.
3. Values and NULLs in the tuples: Each value in a tuple is an atomic value; that is, it is not
divisible into components within the framework of the basic relational model. Hence, composite
and multivalued attributes are not allowed. This model is sometimes called the flat relational
model. An important concept is that of NULL values, which are used to represent the values
of attributes that may be unknown or may not apply to a tuple.
4. Interpretation (meaning) of a relation: The relation schema can be interpreted as a declaration
or a type of assertion. For example, the schema of the STUDENT relation asserts that, in
general, a student entity has a name, home phone, address, and GPA. Each tuple in the relation
can then be interpreted as a particular instance of the assertion.
Concepts:
Tables: In the relational data model, relations are saved in the format of tables. This format stores the relationship among entities. A table has rows and columns, where rows represent records and columns represent the attributes.
Tuple: A single row of a table, which contains a single record for that relation, is called a tuple.
Relation instance: A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
Relation schema: A relation schema describes the relation name (table name), attributes, and their names.
Relation key: Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
Attribute domain: Every attribute has some pre-defined value scope, known as the attribute domain.
Constraints:
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints.
The main integrity constraints are:
Key constraints
Entity integrity constraints
Domain constraints
Referential integrity constraints
A.Key Constraints:
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called a key for that relation. If there is more than one such minimal subset, each is called a candidate key.
Key constraints enforce that:
in a relation with a key attribute, no two tuples can have identical values for the key attributes;
a key attribute cannot have NULL values.
Key constraints are also referred to as entity constraints.
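A minimal SQL sketch of how key constraints are declared for the STUDENT relation used in the examples below (data types are assumed for illustration):

CREATE TABLE STUDENT (
    STUD_NO INT PRIMARY KEY,     -- key attribute: unique and never NULL
    SNAME   VARCHAR(50),
    ADDRESS VARCHAR(100),
    PHONE   VARCHAR(15) UNIQUE   -- another candidate key, declared with UNIQUE
);

-- The second INSERT is rejected because STUD_NO 1 already exists (key constraint).
INSERT INTO STUDENT (STUD_NO, SNAME) VALUES (1, 'Aman');
INSERT INTO STUDENT (STUD_NO, SNAME) VALUES (1, 'Atul');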
Types of Keys in Relational Model :
Keys are one of the basic requirements of a relational database model. They are widely used to identify tuples (rows) uniquely in a table. We also use keys to set up relationships between the columns and tables of a relational database.
Candidate Key
Primary Key
Super Key
Alternate Key
Foreign Key
Composite Key
1.Candidate Key:
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For example, STUD_NO in the STUDENT relation.
It is a minimal super key, i.e., a super key with no redundant attributes.
Its values must be unique for every tuple.
A candidate key that is not chosen as the primary key may contain NULL values (see the note on SQL Server below).
Every table must have at least one candidate key, and there can be more than one candidate key in a relation.
A table can have multiple candidate keys but only one primary key.
Example:
STUD_NO is the candidate key for relation STUDENT.
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
The candidate key can be simple (having only one attribute) or composite as well.
Example:
{STUD_NO, COURSE_NO} is a composite
candidate key for relation STUDENT_COURSE.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO
1 001 C001
2 056 C005
Note: In SQL Server, a UNIQUE constraint on a nullable column allows the value NULL in that column only once. That is why the PHONE attribute can serve as a candidate key here, but it could not contain a NULL value if it were chosen as the primary key attribute.
2.Primary Key:
There can be more than one candidate key in a relation, out of which one is chosen as the primary key. For example, STUD_NO as well as PHONE are candidate keys for the relation STUDENT, but STUD_NO can be chosen as the primary key (only one out of many candidate keys).
It is a unique key.
It identifies exactly one tuple (a record) at a time.
Its values are unique; duplicates are not allowed.
It cannot be NULL.
A primary key is not necessarily a single column; more than one column can together form the primary key of a table.
Example:
STUDENT table -> Student(STUD_NO, SNAME,
ADDRESS, PHONE) , STUD_NO is a primary key
Table STUDENT
STUD_NO SNAME ADDRESS PHONE
3.Super Key:
The set of attributes that can uniquely identify a tuple is known as a super key. For example, STUD_NO and (STUD_NO, STUD_NAME). A super key is a group of one or more attributes that identifies the rows in a table.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but the reverse is not true.
The attributes added on top of the candidate key may contain NULL values.
Example:
Consider the STUDENT table shown above.
{STUD_NO, PHONE} is a super key.
4.Alternate Key:
A candidate key other than the primary key is called an alternate key.
All candidate keys which are not chosen as the primary key are alternate keys.
It is also called a secondary key.
Example:
Consider the STUDENT table shown above.
STUD_NO as well as PHONE are candidate keys for the relation STUDENT; if STUD_NO is chosen as the primary key, PHONE becomes an alternate key (only one of the candidate keys can be the primary key).
5.Foreign Key:
If an attribute can only take values which are present as values of some other attribute, it is a foreign key to the attribute to which it refers. The relation being referenced is called the referenced relation and the corresponding attribute is called the referenced attribute; the relation which refers to the referenced relation is called the referencing relation and the corresponding attribute is called the referencing attribute. The referenced attribute of the referenced relation should be its primary key.
It is a key that acts as a primary key in one table and as a foreign key in another table.
It combines two or more relations (tables) at a time.
Foreign keys act as a cross-reference between the tables.
For example, DNO is a primary key in the DEPT table and a non-key attribute in EMP.
Example:
Refer Table STUDENT shown above.
STUD_NO in STUDENT_COURSE is a
foreign key to STUD_NO in STUDENT relation.
Table STUDENT_COURSE
STUD_NO TEACHER_NO COURSE_NO
1 005 C001
2 056 C005
It may be worth noting that, unlike the primary key of a relation, a foreign key can be NULL and may contain duplicate values, i.e., it need not follow the uniqueness constraint. For example, STUD_NO in the STUDENT_COURSE relation need not be unique, because the same student can appear in several course rows. However, STUD_NO in the STUDENT relation is a primary key: it must always be unique and can never be NULL.
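A hedged SQL sketch of this foreign key, assuming the STUDENT table from the earlier key-constraint sketch (data types are illustrative):

CREATE TABLE STUDENT_COURSE (
    STUD_NO    INT,              -- referencing attribute: values must exist in STUDENT.STUD_NO
    TEACHER_NO VARCHAR(10),
    COURSE_NO  VARCHAR(10),
    FOREIGN KEY (STUD_NO) REFERENCES STUDENT (STUD_NO)
);
-- Unlike a primary key, STUD_NO here may repeat across rows and may be NULL.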
6.Composite Key:
Sometimes a table has no single column/attribute that uniquely identifies all of its records. In that case, a combination of two or more columns/attributes can be used to uniquely identify the rows. The goal is to find the minimal set of attributes whose combined values are unique for every row; a sketch of how such a key is declared follows the example below.
A composite key acts as the primary key when no single attribute is unique on its own.
Two or more attributes are used together to make a composite key.
Different combinations of attributes may differ in how reliably they identify the rows uniquely.
Example:
FULLNAME + DOB can be combined
together to access the details of a student.
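A minimal sketch of declaring such a composite key in SQL, reusing the STUDENT_COURSE attributes from the candidate-key example (types are assumed):

CREATE TABLE STUDENT_COURSE (
    STUD_NO    INT,
    TEACHER_NO VARCHAR(10),
    COURSE_NO  VARCHAR(10),
    -- neither column is unique on its own; together they identify each enrolment
    PRIMARY KEY (STUD_NO, COURSE_NO)
);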
B.Entity integrity:
An entity is any person, place, or thing to be recorded in a database. Each table represents an entity,
and each row of a table represents an instance of that entity. For example, if order is an entity,
the orders table represents the idea of an order and each row in the table represents a specific order.
To identify each row in a table, the table must have a primary key. The primary key is a unique
value that identifies each row. This requirement is called the entity integrity constraint.
For example, the orders table primary key is order_num. The order_num column holds a unique
system-generated order number for each row in the table. To access a row of data in
the orders table, use the following SELECT statement:
SELECT * FROM orders WHERE order_num = 1001;
Using the order number in the WHERE clause of this statement enables you to access a row easily
because the order number uniquely identifies that row. If the table allowed duplicate order numbers,
it would be almost impossible to access one single row because all other columns of this table allow
duplicate values.
C.Domain Constraints:
Attributes take specific values in real-world scenarios. For example, age can only be a positive integer. Similar constraints are imposed on the attributes of a relation: every attribute is bound to a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
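Domain constraints are commonly expressed in SQL through data types and CHECK constraints; a hedged sketch (the PERSON table is illustrative):

CREATE TABLE PERSON (
    NAME  VARCHAR(50),
    -- the INT type already restricts AGE to whole numbers; CHECK narrows the domain further
    AGE   INT CHECK (AGE >= 0),
    -- CHAR(10) limits the phone number to exactly 10 characters
    PHONE CHAR(10)
);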
D.Referential integrity Constraints
Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key
attribute of a relation that can be referred in other relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different or same
relation, then that key element must exist.
Example:
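As an illustrative sketch (reusing the STUDENT and STUDENT_COURSE tables sketched earlier, where STUDENT_COURSE.STUD_NO references STUDENT.STUD_NO), the constraint rejects a referencing value that has no counterpart in the referenced relation:

INSERT INTO STUDENT (STUD_NO, SNAME) VALUES (1, 'Aman');

INSERT INTO STUDENT_COURSE (STUD_NO, COURSE_NO)
VALUES (1, 'C001');   -- accepted: STUD_NO 1 exists in STUDENT

INSERT INTO STUDENT_COURSE (STUD_NO, COURSE_NO)
VALUES (9, 'C001');   -- rejected: there is no STUDENT with STUD_NO 9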
Update Operation:
Modify allows you to change the values of some attributes in existing tuples. For example, in a CUSTOMER relation the tuple with CustomerName = 'Apple' could be updated from Inactive to Active.
Delete Operation:
To specify deletion, a condition on the attributes of the relation selects the tuple to be deleted.
Delete is used to delete tuples from the table.
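In SQL, these operations correspond to the UPDATE and DELETE statements; a hedged sketch (the CUSTOMER table and its Status column are assumed for illustration):

-- Modify: change the status of the customer named 'Apple' from Inactive to Active
UPDATE CUSTOMER
SET    Status = 'Active'
WHERE  CustomerName = 'Apple';

-- Delete: remove the tuples selected by the condition
DELETE FROM CUSTOMER
WHERE  CustomerName = 'Apple';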
Basic Operations:
The six fundamental operations are listed below; the majority of data retrieval operations are carried out by these. Let's go through them one by one.
Before moving into the details, let's define two relations, STUDENT(ROLL, NAME, AGE) and EMPLOYEE(EMPLOYEE_NO, NAME, AGE), which will be used in the examples below.
STUDENT
ROLL NAME AGE
1 Aman 20
2 Atul 18
3 Baljeet 19
4 Harsh 20
5 Prateek 21
6 Prateek 23
EMPLOYEE
EMPLOYEE_NO NAME AGE
E-1 Anant 20
E-2 Ashish 23
E-3 Baljeet 25
E-4 Harsh 20
E-5 Pranav 22
Select (σ):
The select operation is done by the Selection Operator, which is represented by "sigma"(σ). It is used to retrieve the tuples(rows) from the table for which the given condition is satisfied. It is a unary operator, which means it requires only one operand.
Notation : σ p(R)
Where σ is used to represent SELECTION
R is used to represent the RELATION
p is the selection condition (a logical formula)
Let's understand this with an example:
Suppose we want the row(s) from STUDENT Relation where "AGE" is 20
σ AGE=20 (STUDENT)
This will return the following output:
ROLL NAME AGE
1 Aman 20
4 Harsh 20
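In SQL, selection corresponds to the WHERE clause; a sketch of the same query over the STUDENT relation (assuming it exists as a table):

SELECT *              -- keep all attributes; selection only filters rows
FROM   STUDENT
WHERE  AGE = 20;      -- equivalent of σ AGE=20 (STUDENT)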
Project (∏):
Project operation is done by Projection Operator which is represented by "pi"(∏). It is used to
retrieve certain attributes(columns) from the table. It is also known as vertical partitioning as it
separates the table vertically. It is also a unary operator.
Notation : ∏ a(r)
Where ∏ is used to represent PROJECTION
r is used to represent RELATION
a is the attribute list
Let's understand this with an example:
Suppose we want the names of all students from STUDENT Relation.
∏ NAME(STUDENT)
This will return the following output:
NAME
Aman
Atul
Baljeet
Harsh
Prateek
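In SQL, projection corresponds to the column list of SELECT; DISTINCT is needed to match the duplicate elimination of relational algebra (a sketch):

SELECT DISTINCT NAME   -- equivalent of ∏ NAME(STUDENT); removes the duplicate 'Prateek'
FROM   STUDENT;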
Union (∪):
Union operation is done by Union Operator which is represented by "union"(∪). It is the same as the
union operator from set theory, i.e., it selects all tuples from both relations but with the exception
that for the union of two relations/tables both relations must have the same set of Attributes. It is
a binary operator as it requires two operands.
Notation: R ∪ S
Where R is the first relation
S is the second relation
If the relations do not have the same set of attributes (i.e., they are not union-compatible), the union of such relations is not defined.
Let's have an example to clarify the concept:
Suppose we want all the names from STUDENT and EMPLOYEE relation.
∏ NAME(STUDENT) ∪ ∏ NAME(EMPLOYEE)
NAME
Aman
Anant
Ashish
Atul
Baljeet
Harsh
Pranav
Prateek
As we can see from the above output it also eliminates duplicates.
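The same query written as a SQL sketch; UNION removes duplicates just like the algebra operator (UNION ALL would keep them):

SELECT NAME FROM STUDENT
UNION
SELECT NAME FROM EMPLOYEE;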
Set Difference (-):
Set Difference as its name indicates is the difference between two relations (R-S). It is denoted by a
"Hyphen"(-) and it returns all the tuples(rows) which are in relation R but not in relation S. It is also
a binary operator.
Notation : R - S
Where R is the first relation
S is the second relation
Just like union, set difference also requires both relations to have the same set of attributes.
Let's take an example where we would like to know the names of students who are in STUDENT
Relation but not in EMPLOYEE Relation.
∏ NAME(STUDENT) - ∏ NAME(EMPLOYEE)
This will give us the following output:
NAME
Aman
Atul
Prateek
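Many SQL dialects expose set difference as EXCEPT (Oracle calls it MINUS); a sketch of the query above:

SELECT NAME FROM STUDENT
EXCEPT                 -- names in STUDENT that do not appear in EMPLOYEE
SELECT NAME FROM EMPLOYEE;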
Rename (ρ):
The rename operation is done by the Rename Operator, represented by "rho"(ρ). It gives a name to the output relation (or renames its attributes) and is a unary operator.
Notation : ρ X(R)
Where X is the new relation name
R is the relation or expression being renamed
For example, ρ STUDENT_NAME(∏ NAME(STUDENT)) projects the names from STUDENT and names the result:
NAME
Aman
Atul
Baljeet
Harsh
Prateek
As you can see, this output relation is named "STUDENT_NAME".
Takeaway:
Select (σ) is used to retrieve tuples(rows) based on certain conditions.
Project (∏) is used to retrieve attributes(columns) from the relation.
Union (∪) is used to retrieve all the tuples from two relations.
Set Difference (-) is used to retrieve the tuples which are present in R but not in S(R-S).
Cartesian product (X) is used to combine each tuple from the first relation with each tuple from the
second relation.
Rename (ρ) is used to rename the output relation.
Derived Operations:
Also known as extended operations, these operations can be derived from basic operations and
hence named Derived Operations.
These include three operations:
Join Operations
Intersection operations
Division operations.
Join Operations:
Join operations in DBMS are binary operations that allow us to combine two or more relations.
They are further classified into two types: Inner Join and Outer Join.
First, let's have two relations EMPLOYEE consisting of E_NO, E_NAME, CITY and EXPERIENCE.
EMPLOYEE table contains employee's information such as id, name, city, and experience of employee(In
Years). The other relation is DEPARTMENT consisting
of D_NO, D_NAME, E_NO and MIN_EXPERIENCE.
DEPARTMENT table defines the mapping of an employee to their department. It contains Department
Number, Department Name, Employee Id of the employee working in that department, and the minimum
experience required(In Years) to be in that department.
EMPLOYEE
E_NO E_NAME CITY EXPERIENCE
E-1 Ram Delhi 04
E-2 Varun Chandigarh 09
E-3 Ravi Noida 03
E-4 Amit Bangalore 07
DEPARTMENT
D_NO D_NAME E_NO MIN_EXPERIENCE
D-1 HR E-1 03
D-2 IT E-2 05
D-3 Marketing E-3 02
Inner Join:
When we perform an Inner Join, only those tuples are returned that satisfy the given condition. It is further classified into three types: Theta Join, Equi Join and Natural Join.
Outer Join:
Unlike Inner Join, Outer Join also includes some/all of the tuples that do not satisfy the condition, filling in the missing attributes with NULL. It is classified into three types: Left Outer Join, Right Outer Join and Full Outer Join.
Left Outer Join:
Left Outer Join returns the matching tuples and the tuples which are only present in the Left Relation, here R (EMPLOYEE). Combining EMPLOYEE and DEPARTMENT on the condition EMPLOYEE.E_NO = DEPARTMENT.E_NO gives:
EMPLOYEE ⟕EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 - - -
As you can see here, all the tuples from the left relation, i.e., EMPLOYEE, are present. E-4 does not satisfy the given condition (its E_NO has no match in DEPARTMENT), yet it is still included in the output relation. This is because an Outer Join also includes tuples which don't satisfy the condition; that is why the Left Outer Join marks E-4's corresponding attributes from DEPARTMENT as NULL.
Right Outer Join:
Right Outer Join returns the matching tuples and the tuples which are only present in the Right Relation, here S (DEPARTMENT).
If a tuple of the right relation has no matching tuple, the attributes of the Left Relation, here R, are made NULL in the output relation.
We will combine the EMPLOYEE and DEPARTMENT relations with the same condition as above.
EMPLOYEE ⟖EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
As all the tuples from the DEPARTMENT relation have a corresponding E_NO in the EMPLOYEE relation, no tuple in the output contains NULL values.
Full Outer Join:
Full Outer Join returns all the tuples from both relations. However, where there is no matching tuple, the respective attributes are made NULL in the output relation.
Again, we combine the EMPLOYEE and DEPARTMENT relations with the same condition.
EMPLOYEE ⟗EMPLOYEE.E_NO = DEPARTMENT.E_NO DEPARTMENT
E_NO E_NAME CITY EXPERIENCE D_NO D_NAME MIN_EXPERIENCE
E-1 Ram Delhi 04 D-1 HR 03
E-2 Varun Chandigarh 09 D-2 IT 05
E-3 Ravi Noida 03 D-3 Marketing 02
E-4 Amit Bangalore 07 - - -
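In SQL, these joins map onto the JOIN ... ON syntax; a hedged sketch over the EMPLOYEE and DEPARTMENT relations (FULL OUTER JOIN is not supported by every DBMS, e.g. MySQL):

-- Inner join: only employees that have a matching department
SELECT *
FROM   EMPLOYEE E
INNER JOIN DEPARTMENT D ON E.E_NO = D.E_NO;

-- Left outer join: every employee; department columns are NULL for E-4
SELECT *
FROM   EMPLOYEE E
LEFT JOIN DEPARTMENT D ON E.E_NO = D.E_NO;

-- Right outer join: every department and its matching employee
SELECT *
FROM   EMPLOYEE E
RIGHT JOIN DEPARTMENT D ON E.E_NO = D.E_NO;

-- Full outer join: all tuples from both sides
SELECT *
FROM   EMPLOYEE E
FULL OUTER JOIN DEPARTMENT D ON E.E_NO = D.E_NO;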
Intersection (∩):
Intersection operation is done by the Intersection Operator which is represented by "intersection"(∩). It is the same as the intersection operator from set theory, i.e., it selects all the tuples which are present in both relations. It is a binary operator as it requires two operands. Also, it eliminates duplicates.
Notation : R ∩ S
Where R is the first relation
S is the second relation
Let's have an example to clarify the concept:
Suppose we want the names which are present in STUDENT as well as in EMPLOYEE relation,
Relations we used in Basic Operations.
∏ NAME(STUDENT) ∩ ∏ NAME(EMPLOYEE)
NAME
Baljeet
Harsh
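In SQL, set intersection is typically written with INTERSECT (not available in every dialect); a sketch:

SELECT NAME FROM STUDENT
INTERSECT              -- names present in both relations
SELECT NAME FROM EMPLOYEE;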
Division (÷):
Division Operation is represented by "division"(÷ or /) operator and is used in queries that involve
keywords "every", "all", etc.
Notation : R(X,Y)/S(Y)
Here,
R is the first relation from which data is retrieved.
S is the second relation that will help to retrieve the data.
X and Y are the attributes/columns of the relations. A relation can have multiple attributes, but keep in mind that the attributes of S must be a proper subset of the attributes of R.
For each value of X in R, the division returns that X only if it is paired in R with every value of Y that appears in S.
This is easier to grasp with an example:
Let's have two relations, ENROLLED and COURSE. ENROLLED consist of two attributes
STUDENT_ID and COURSE_ID. It denotes the map of students who are enrolled in given courses.
COURSE contains the list of courses available.
See, here attributes/columns of COURSE relation are a proper subset of attributes/columns of
ENROLLED relation. Hence Division operation can be used here.
ENROLLED
STUDENT_ID COURSE_ID
Student_1 DBMS
Student_2 DBMS
Student_1 OS
Student_3 OS
COURSE
COURSE_ID
DBMS
OS
Now the query is to return the STUDENT_ID of students who are enrolled in every course.
ENROLLED(STUDENT_ID, COURSE_ID)/COURSE(COURSE_ID)
This will return the following relation as output.
STUDENT_ID
Student_1
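SQL has no direct division operator; a common way to express "enrolled in every course" is a double NOT EXISTS. A hedged sketch over the ENROLLED and COURSE relations above:

SELECT DISTINCT E1.STUDENT_ID
FROM   ENROLLED E1
WHERE  NOT EXISTS (
         -- a course that this student is NOT enrolled in
         SELECT C.COURSE_ID
         FROM   COURSE C
         WHERE  NOT EXISTS (
                  SELECT 1
                  FROM   ENROLLED E2
                  WHERE  E2.STUDENT_ID = E1.STUDENT_ID
                    AND  E2.COURSE_ID  = C.COURSE_ID
                )
       );
-- returns Student_1, the only student enrolled in every course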
Operations are divided into two main categories: Basic and Derived.
Basic consists of six Operations: SELECT, PROJECT, UNION, SET DIFFERENCE,
CARTESIAN PRODUCT, RENAME.
Derived consists of three operations: JOINS, INTERSECTION, DIVISION.
Joins are of two types: Inner Join and Outer Join.
Inner Join is further classified into three types:
i. Theta Join
ii. Equi Join
iii. Natural Join.
Outer Join also consists of three types:
i. Left Outer Join
ii. Right Outer Join
iii. Full Outer Join.
Advantages:
Expressive Power: Extended operators allow for more complex queries and transformations
that cannot be easily expressed using basic relational algebra operations.
Data Reduction: Aggregation operators, such as SUM, AVG, COUNT, and MAX, can
reduce the amount of data that needs to be processed and displayed.
Data Transformation: Extended operators can be used to transform data into different
formats, such as pivoting rows into columns or vice versa.
More Efficient: Extended operators can be more efficient than expressing the same query in
terms of basic relational algebra operations, since they can take advantage of specialized
algorithms and optimizations.
Disadvantages:
Complexity: Extended operators can be more difficult to understand and use than basic
relational algebra operations. They require a deeper understanding of the underlying data and
the operators themselves.
Performance: Some extended operators, such as outer joins, can be expensive in terms of
performance, especially when dealing with large data sets.
Non-standardized: There is no universal set of extended operators, and different relational
database management systems may implement them differently or not at all.
Data Integrity: Some extended operators, such as aggregate functions, can introduce
potential problems with data integrity if not used properly. For example, using AVG on a
column that contains null values can result in unexpected or incorrect results.
Takeaway:
Theta Join (θ) combines two relations based on a condition.
Equi Join is a type of Theta Join where only equality condition (=) is used.
Natural Join (⋈) combines two relations based on a common attribute (preferably foreign
key).
Left Outer Join (⟕) returns the matching tuples and tuples which are only present in the left
relation.
Right Outer Join (⟖) returns the matching tuples and tuples which are only present in the
right relation.
Full Outer Join (⟗) returns all the tuples present in the left and right relations.
Conclusion:
Relational Algebra in DBMS is a theoretical model that forms the foundation of SQL. It comprises different mathematical operations.