Introduction to Database Systems Overview

The document introduces databases and database management systems. It defines key terms like data, metadata, DBMS and compares file-based systems to database systems. Database systems offer advantages like reduced redundancy, data sharing and integrity. The document also describes database components including hardware, software, users and procedures.

INTRODUCTION TO DATABASE AND ITS ENVIRONMENT

1.1 Definition of terms

Data management focuses on data collection, storage and retrieval, and constitutes a
core activity for any organization. To generate relevant information efficiently you need
quick access to the data (raw facts) from which the required information is produced.
Efficient data management requires the use of a computer database. A database is a
shared, integrated computer structure that houses a collection of:
End-user data: raw facts of interest to the end user.
Metadata: data about data. It describes the data characteristics and the set of
relationships that link the data found within the database.
The database resembles a very well organized electronic filing cabinet in which
powerful software, referred to as the DBMS, helps manage the cabinet's contents.
DBMS: Database Management System, the software that enables the creation and
management of the database.

1.2 Database vs. file based system

File based system


Consider a savings bank enterprise that keeps information about all customers and
savings accounts in permanent system files at the bank. The bank will need a number of
applications, e.g.

i. A program to debit or credit an account
ii. A program to add a new account
iii. A program to find the balance of an account
iv. A program to generate monthly statements
v. Any new program added as per the bank's requirements

In such a typical file-processing system, more and more files and application programs
are added to the system over time. Such a scheme has a number of major disadvantages:
1. Data redundancy and inconsistency - Since the files and application programs
are created by different programmers over a long period of time, the files are
likely to have different formats and the programs may be written in several
programming languages. Moreover, the same piece of information may be
duplicated in several files. This redundancy leads to higher storage and access
costs. It may also lead to inconsistency, i.e. the various copies of the same data
may no longer agree.
2. Difficulty in accessing - Suppose that one of the bank officers needs to find out
the names of all customers who live within the city's 78 phone-code area. The officer
would ask the data processing department to generate such a list. Such a request
may not have been anticipated when the system was originally designed, and the
only options available are:
• Extract the data manually
• Write the necessary application program
Neither option allows the data to be accessed conveniently and efficiently.
3. Data isolation - Since data is scattered in various files, and the files may be in
different formats, it may be difficult to write new application programs to
retrieve the appropriate data.
4. Concurrent access anomalies - Interaction of concurrent updates may result in
inconsistent data, e.g. if two customers withdraw funds, say 50/= and 100/=, from an
account at about the same time, the result of the concurrent execution may leave
the account in an incorrect state.
5. Security problems - Not every user of the database system should be able to
access all the data. Since application programs are added to the system in an
ad-hoc manner, it is difficult to enforce security constraints.
6. Integrity - The data values stored in the database must satisfy certain types of
consistency constraints, e.g. the balance of a bank account may never fall below a
prescribed value such as 5,000/=. In a file-based system these constraints are
enforced by adding appropriate code to the various application programs. When
new constraints are added, every one of those programs has to be changed to
enforce them.
Conclusion
These difficulties, among others, have prompted the development of the DBMS.
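
The concurrent access anomaly in item 4 can be sketched in a few lines. This is only an illustration under assumed values: the opening balance of 500 and the helper name withdraw_unsafely are hypothetical, and the two "programs" are simulated sequentially rather than with real threads.

```python
# Hypothetical illustration of a lost update (concurrent access anomaly).

def withdraw_unsafely(balance_seen: int, amount: int) -> int:
    """Compute a new balance from a previously read (possibly stale) value."""
    return balance_seen - amount

balance = 500  # assumed opening balance

# Both programs read the balance before either one writes it back.
seen_by_a = balance
seen_by_b = balance

balance = withdraw_unsafely(seen_by_a, 50)   # program A writes 450
balance = withdraw_unsafely(seen_by_b, 100)  # program B overwrites it with 400

interleaved_result = balance
serial_result = 500 - 50 - 100  # result of running the withdrawals one at a time

print(interleaved_result, serial_result)
```

The interleaved run loses the first withdrawal, which is the kind of incorrect state a DBMS is expected to prevent through concurrency control.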
Database system
Unlike the file system with its many separate and unrelated files, the database consists of
logically related data stored in a single data repository. The problems inherent in file
systems make using a database system very desirable; the database therefore
represents a change in the way end-user data are stored, accessed and managed.

Advantages of the Database Systems


1. Centralized Control - Through the DBA it is possible to enforce centralized
management and control of data. Necessary modifications can therefore be made
without affecting other applications, which meets the data independence
requirement of a DBMS.

2. Reduction of redundancies - Unnecessary duplication of data is avoided, which
reduces the total amount of data required and consequently the storage space.
It also eliminates the extra processing needed to trace the required data in a large
mass of data, and it eliminates inconsistencies. Any redundancies that do exist in the
DBMS are controlled, and the system ensures that the multiple copies are consistent.

3. Shared data - In a DBMS, data under its control can be shared by a number of
application programs and users, e.g. backups.

4. Integrity - Centralized control can also ensure that adequate checks are
incorporated in the DBMS to provide data integrity. Data integrity means
that the data contained in the database is both accurate and consistent, e.g.
an employee's age must lie within a prescribed range.

5. Security - Only authorized people may access confidential data. The DBA ensures
that proper access procedures are followed, including proper authentication schemes
for access to the DBMS and additional checks before permitting access to sensitive
data. Different levels of security can be implemented for various types of data or
operations.

6. Conflict Resolution - The DBA is in a position to resolve the conflicting
requirements of various users and applications. This is done by choosing the
best file structure and access method to get optimum performance, for example
by classifying applications into critical and less critical applications.

7. Data Independence - This involves both logical and physical independence. Logical
data independence means that the conceptual schema can be changed without
affecting the existing external schemas. Physical data independence means that
the physical storage structures or devices used for storing the data can be changed
without necessitating a change in the conceptual view or any of the external views.
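
The integrity advantage (item 4) can be sketched with Python's built-in sqlite3 module. The table and column names are assumptions made for illustration; the 5,000 minimum balance is the figure used in the file-based discussion earlier. The point is that the rule is declared once in the database rather than repeated in every application program.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# The constraint lives in the schema, not in each application program.
conn.execute(
    "CREATE TABLE account ("
    " account_no INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 5000))"
)
conn.execute("INSERT INTO account VALUES (1, 8000)")

def try_withdraw(account_no: int, amount: int) -> bool:
    """Return True if the withdrawal was accepted, False if the DBMS rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_no = ?",
                (amount, account_no),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_first = try_withdraw(1, 2000)   # 8000 -> 6000, allowed
ok_second = try_withdraw(1, 2000)  # would give 4000, rejected by the CHECK
final_balance = conn.execute(
    "SELECT balance FROM account WHERE account_no = 1"
).fetchone()[0]
print(ok_first, ok_second, final_balance)
```

Any program that updates the table is subject to the same check, so a new constraint does not require changing each application.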

Disadvantages of Database Systems

1. Cost - in terms of:
• The DBMS software
• Purchasing or developing application software
• Hardware
• Workspace (disks for storage)
• Migration (movement from traditional separate systems to an integrated one)

2. Centralization Problems
Adequate backup is required in case of failure.
Centralization increases the severity of security breaches and of disruption to the
operation of the organization caused by downtime and failures.

3. Complexity of backup and recovery

[Figure: File System Environment. The Personnel, Sales and Accounts departments each
maintain their own separate files (employees, customers, sales, inventory, accounts).]

[Figure: Database System Environment. The Personnel, Sales and Accounting departments
access a single integrated database (Employees, Customers, Sales, Inventory, Accounts)
through the DBMS.]

The database eliminates most of the file systems' data inconsistencies, anomalies and
structural dependency problems. The current generation of DBMS software stores not
only the data structures in a central location but also stores the relationships between
the database components. The DBMS also takes care of defining all the required
access paths of the required component.
Database components

The term database system refers to an organization of components that define and
regulate the collection, storage, management and use of data within a database
environment. The database system is composed of 5 major parts, i.e.

a. Hardware
b. Software
c. People
d. Procedures
e. Data
Hardware
This identifies all the system's physical devices, e.g. the computer, peripherals, storage
devices etc.

Software
This is the collection of programs used by the computers within the database system.
i. Operating system (OS) - manages all hardware components and makes it possible
for all other software to run on the computer.
ii. The DBMS - manages the database within the database system, e.g. Oracle,
DB2, MS Access etc.
iii. Application programs and utilities - used to access and manipulate data in the
DBMS.

People
These are all the database system's users:
1. Systems administrator - Oversees the database system's general operations.
2. Database administrator (DBA) - Manages the use of the DBMS and ensures that
the database is functioning properly. The DBA's functions include:

i. Schema definition - The original database schema is created by writing a set
of definitions, which are translated by the DDL compiler into a set of tables that
are permanently stored in the data dictionary.
ii. Storage structure and access method definition - Appropriate storage
structures and access methods are created by writing a set of definitions, which are
translated by the data storage and definition language compiler.
iii. Schema and physical organisation modification - Modifications to either the
database schema or the description of the physical storage organisation are
accomplished by writing a set of definitions, which are used by either the
DDL compiler or the data storage and definition language compiler to
generate modifications to the appropriate internal system tables, e.g. the data
dictionary.
iv. Granting authorization for data access - This regulates which parts of
the database each user can access.
v. Integrity constraint specification - Integrity constraints are kept in a special
system structure that the database manager consults whenever an update takes
place in the system.

3. Database designers - These are the database architects who design the database
structure.
4. Systems analysts and programmers (application programmers) - They design and
implement the application programs. They design and create the data entry screens,
reports and procedures through which users access and manipulate the database's data.
5. End users - These are the people who use the application programs to run the
organization's daily operations. They fall into the following classes:

i. Sophisticated users - These interact with the system without writing programs.
They form their requests in a database query language.
ii. Specialized users - These write specialized database applications that do not fit
into the traditional data processing framework, e.g. CAD systems, knowledge-based
and expert systems.
iii. Application programmers - These interact with the system through the DML
and application programs.
iv. Naive users - Unsophisticated users who interact with the system by invoking one
of the permanent application programs that have been written previously.

Procedures
• These are the instructions and rules that govern the design and use of the database
system.
• They enforce the standards by which business is conducted within the organisation
and with customers.
• They also ensure that there is an organized way to monitor and audit both the data
that enter the database and the information that is generated through the use of such
data.

Data
This covers the collection of facts stored in the database. Since data is the raw
material from which information is generated, determining what data is to be
stored in the database and how that data is to be organized is a vital part of the
database designer's job.

1.3 Database languages

A DBMS is software used to build, maintain and control database systems. It allows a
systematic approach to the storage and retrieval of data in a computer.
Most DBMSs have several major components, which include the following:

1. Data Definition Language (DDL) - These are commands used for creating and
altering the structure of the database.
The structure comprises field names, field sizes, the type of data for each field and the
file organization technique. The DDL commands are used to create new objects, alter the
structure of existing ones or completely remove objects from the system.

2. Data Manipulation Language (DML) - This is the user language interface and is used for
accessing and modifying the contents of the database. These commands allow access
to and manipulation of data for output. They include commands for adding, inserting,
deleting, sorting, displaying, printing etc. These are the most frequently used commands
once the database has been created.
Interactive DML - The SQL DML includes a query language based on both relational
algebra and relational calculus. It includes commands to insert tuples into, delete
tuples from and modify tuples in the database.
Embedded DML - This is designed for use within general-purpose programming
languages such as PL/I, COBOL, Pascal, Fortran and C.

3. Data Control Language (DCL) - These are commands used to control access to the
database in response to DML commands. It acts as an interface between the DML and
the OS. It provides security and control over the data.

4. Query Languages - A query language is a formalized method of constructing queries in a
database system. It provides the means by which the user interrogates the database for
data without using conventional programs. For relational databases, Structured Query
Language (SQL) has emerged as the standard language. Almost all DBMSs use SQL, running
on machines ranging from microcomputers to large mainframes. Besides the DDL and DML,
SQL provides:

i. View definition and authorization - The SQL DDL includes commands for defining views
and for specifying access rights to relations and views.
ii. Integrity - The SQL DDL includes commands for specifying integrity constraints that the
data stored in the database must satisfy. Updates that violate integrity constraints are
disallowed.
iii. Transaction control - SQL includes commands for specifying the beginning and ending
of transactions. Several implementations also allow explicit locking of data for
concurrency control.
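
A minimal sketch of the language categories above, using Python's built-in sqlite3 module. The customer table and its rows are invented for illustration. SQLite has no GRANT or REVOKE statements, so DCL is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the structure (field names and data types).
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT)"
)

# DML: insert, modify and delete rows (tuples).
conn.executemany(
    "INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Nairobi"), (2, "Brian", "Mombasa"), (3, "Carol", "Nairobi")],
)
conn.execute("UPDATE customer SET city = 'Kisumu' WHERE customer_id = 2")
conn.execute("DELETE FROM customer WHERE customer_id = 3")
conn.commit()  # transaction control: make the changes permanent

# Query: interrogate the database without writing a conventional program.
names = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY name")]

# Transaction control: an uncommitted change can be undone.
conn.execute("DELETE FROM customer")
conn.rollback()
remaining = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]

print(names, remaining)
```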
Chapter two
DBMS Database Models
A database model defines the logical design and structure of a database and defines how data
will be stored, accessed and updated in a database management system. While the Relational
Model is the most widely used database model, there are other models too:

• Hierarchical Model
• Network Model
• Entity-relationship Model
• Relational Model

Hierarchical Model

This database model organises data into a tree-like structure with a single root, to which all
the other data is linked. The hierarchy starts from the root data and expands like a tree, adding
child nodes to the parent nodes.

In this model, a child node will only have a single parent node.

This model efficiently describes many real-world relationships, like the index of a book,
recipes etc.

In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data; for example, one department can have many
courses, many professors and of course many students.
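
The single-parent rule can be sketched as a small tree structure. The Node class and the sample names below are hypothetical, chosen only to mirror the department example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    # Exactly one parent (None only for the root), as the hierarchical model requires.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Attach a new child; the child records this node as its only parent."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Walk up the single chain of parents to the root."""
        parts = []
        node: Optional["Node"] = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return " > ".join(reversed(parts))

department = Node("Department")
courses = department.add_child("Courses")
professors = department.add_child("Professors")
databases = courses.add_child("Database Systems")

print(databases.path())
```

Because every node has one parent, there is exactly one path from the root to any record, which is what makes access simple but rules out many-to-many relationships.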
Network Model

This is an extension of the Hierarchical model. In this model data is organised more like a
graph, and a child node is allowed to have more than one parent node.

In this database model the data is more interrelated, because more relationships can be
established between records. As a result, accessing the data is also easier and faster. This
database model was used to map many-to-many data relationships.

This was the most widely used database model before the Relational Model was introduced.

Relational Model

In this model, data is organised in two-dimensional tables and the relationship is maintained by
storing a common field.

This model was introduced by E. F. Codd in 1970, and since then it has become the most widely
used database model.

The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in rows of that table.

Hence, tables are also known as relations in relational model.

In the coming tutorials we will learn how to design tables, normalize them to reduce data
redundancy and how to use Structured Query language to access data from tables.
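
As a small sketch of two relations linked by a common field, the following uses Python's built-in sqlite3 module. The department and student tables and their contents are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (
        dept_id   INTEGER PRIMARY KEY,
        dept_name TEXT
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER REFERENCES department(dept_id)  -- the common field
    );
    INSERT INTO department VALUES (10, 'Computer Science'), (20, 'Mathematics');
    INSERT INTO student VALUES (1, 'Akon', 10), (2, 'Bkon', 20), (3, 'Ckon', 10);
    """
)

# The relationship is recovered by matching the common field in both tables.
rows = conn.execute(
    "SELECT s.name, d.dept_name"
    " FROM student AS s JOIN department AS d ON s.dept_id = d.dept_id"
    " ORDER BY s.student_id"
).fetchall()

for name, dept_name in rows:
    print(name, dept_name)
```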
Basic Concepts of ER Model in DBMS
As described under Database Models, the Entity-relationship model is a model used for the
design and representation of relationships between data.

The main data objects are termed Entities, with their details defined as attributes. Some of
these attributes are important and are used to identify the entity, and different entities are
related using relationships.

In short, to understand the ER Model, we must understand:

• Entity and Entity Set
• Attributes and types of attributes
• Keys
• Relationships

Let's take an example to explain everything. For a School Management Software, we will have
to store Student information, Teacher information, Classes, Subjects taught in each class etc.

ER Model: Entity and Entity Set


Considering the above example, Student is an entity, Teacher is an entity, and similarly Class,
Subject etc. are also entities.

An Entity is generally a real-world object which has characteristics and holds relationships in a
DBMS.

If a Student is an Entity, then the complete dataset of all the students is the Entity Set.

ER Model: Attributes

If a Student is an Entity, then student's roll no., student's name, student's age, student's gender
etc will be its attributes.

An attribute can be of many types. Here are the different types of attributes defined in the ER
model:

1. Simple attribute: An attribute whose values are atomic and cannot be broken down further.
For example, student's age.
2. Composite attribute: A composite attribute is made up of more than one simple attribute. For
example, student's address will contain house no., street name, pincode etc.
3. Derived attribute: An attribute that is not stored in the database but is derived from
other attributes. For example, the average age of students in a class.
4. Single-valued attribute: As the name suggests, it holds a single value for each entity.
For example, student's roll no.
5. Multi-valued attribute: It can hold multiple values for a single entity. For example, a
student may have more than one phone number.
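
The attribute types listed above can be sketched with plain Python data classes. The class and field names are hypothetical, and the derived attribute shown here is a student's age computed from a stored date of birth, one common instance of derivation.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Address:
    """Composite attribute: made up of several simple attributes."""
    house_no: str
    street: str
    pincode: str

@dataclass
class Student:
    roll_no: int                 # simple, single-valued
    name: str                    # simple, single-valued
    date_of_birth: date          # stored; age is derived from it
    address: Address             # composite
    phone_numbers: list[str] = field(default_factory=list)  # multi-valued

    def age_on(self, today: date) -> int:
        """Derived attribute: computed on demand, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)

student = Student(
    roll_no=1,
    name="Akon",
    date_of_birth=date(2006, 5, 20),
    address=Address("12", "Main Street", "110001"),
    phone_numbers=["9876723452", "9991165674"],
)
print(student.age_on(date(2024, 5, 19)), student.age_on(date(2024, 5, 20)))
```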

ER Model: Keys/database keys

Keys are a very important part of the relational database model. They are used to establish and
identify relationships between tables and also to uniquely identify any record or row of data
inside a table.

A key can be a single attribute or a group of attributes, where the combination acts as a key.

Why do we need a key?

In real-world applications, the number of tables required for storing the data is huge, and the
different tables are related to each other as well.

Tables also store a lot of data. A table generally extends to thousands of records, stored
unsorted and unorganised.
To fetch a particular record from such a dataset you have to apply some conditions. But if
duplicate data is present, a condition may match several rows and you cannot tell which one is
the record you wanted.

To avoid all this, keys are defined so that any row of data in a table can be identified easily.

Let's try to understand all the keys using a simple example. Take a simple Student table, with
fields student_id, name, phone and age.

student_id   name   phone        age
1            Akon   9876723452   17
2            Akon   9991165674   19
3            Bkon   7898756543   18
4            Ckon   8987867898   19
5            Dkon   9990080080   17

Super Key

A Super Key is defined as a set of attributes within a table that can uniquely identify each
record within the table. Every candidate key is a super key, and a super key may contain extra
attributes beyond a candidate key.

In the table defined above, the super keys include student_id, (student_id, name), phone
etc.

The first one is simple: student_id is unique for every row of data, hence it can be used to
identify each row uniquely.

Next comes (student_id, name). The names of two students can be the same, but their
student_id can't be the same, hence this combination can also be a key.

Similarly, the phone number of every student will be unique, hence phone can also be a key.

So they are all super keys.

Candidate Key

A candidate key is a minimal set of fields which can uniquely identify each record
in a table. It is an attribute or a set of attributes that can act as a Primary Key for a table to
uniquely identify each record in that table. There can be more than one candidate key.
In our example, student_id and phone are both candidate keys for the table Student.

• A candidate key can never be NULL or empty, and its value should be unique.
• There can be more than one candidate key for a table.
• A candidate key can be a combination of more than one column (attribute).

Primary Key

Primary key is a candidate key that is most appropriate to become the main key for any table. It
is a key that can uniquely identify each record in a table.

For the table Student we can make the student_id column as the primary key.
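
The difference between a super key and a candidate key can be checked mechanically against the sample Student rows. The function names below are hypothetical, and the check is only a sketch: it looks at these five rows, whereas a real key is a statement about all valid data.

```python
from itertools import combinations

COLUMNS = ("student_id", "name", "phone", "age")
ROWS = [
    (1, "Akon", "9876723452", 17),
    (2, "Akon", "9991165674", 19),
    (3, "Bkon", "7898756543", 18),
    (4, "Ckon", "8987867898", 19),
    (5, "Dkon", "9990080080", 17),
]

def is_super_key(attrs: tuple[str, ...]) -> bool:
    """True if the chosen attributes take a distinct value combination in every row."""
    positions = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in positions) for row in ROWS]
    return len(set(projected)) == len(ROWS)

def is_candidate_key(attrs: tuple[str, ...]) -> bool:
    """True if attrs is a super key and no proper subset of it is one (minimality)."""
    if not is_super_key(attrs):
        return False
    return not any(
        is_super_key(subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_super_key(("student_id", "name")), is_candidate_key(("student_id", "name")))
```

Here (student_id, name) is a super key but not a candidate key, because student_id alone already identifies every row.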

Composite Key

A key that consists of two or more attributes that together uniquely identify any record in a
table is called a Composite key. The attributes which together form the Composite key are not
keys independently or individually.
For example, consider a Score table which stores the marks scored by a student in a
particular subject.

In this table student_id and subject_id together form the primary key, hence it is a
composite key.
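
A minimal sketch of the Score table with a composite primary key, using Python's built-in sqlite3 module. The marks column name and the sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE score ("
    " student_id INTEGER NOT NULL,"
    " subject_id INTEGER NOT NULL,"
    " marks INTEGER,"
    " PRIMARY KEY (student_id, subject_id))"  # composite key
)

conn.execute("INSERT INTO score VALUES (1, 101, 70)")
conn.execute("INSERT INTO score VALUES (1, 102, 85)")  # same student, other subject
conn.execute("INSERT INTO score VALUES (2, 101, 60)")  # same subject, other student

try:
    # Same (student_id, subject_id) pair as the first row: violates the key.
    conn.execute("INSERT INTO score VALUES (1, 101, 99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM score").fetchone()[0]
print(duplicate_rejected, row_count)
```

Neither student_id nor subject_id is unique on its own in this table; only the pair is.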

Secondary or Alternative key

The candidate keys which are not selected as the primary key are known as secondary keys or
alternative keys.

Non-key Attributes

Non-key attributes are the attributes or fields of a table other than the candidate key
attributes/fields.

Non-prime Attributes

Non-prime attributes are attributes other than the Primary Key attribute(s).


ER Model: Relationships

When an Entity is related to another Entity, they are said to have a relationship. For example, a
Class entity is related to a Student entity because students study in classes, hence this is a
relationship.

Depending upon the number of entities involved, a degree is assigned to a relationship.

For example, if 2 entities are involved, it is said to be a Binary relationship; if 3 entities are
involved, it is said to be a Ternary relationship, and so on.

ER Diagrams
An ER Diagram is a visual representation of data that describes how data is related to each
other. In the ER Model, we break data down into entities and attributes and set up relationships
between entities; all of this can be represented visually using the ER diagram.

For example, a diagram with Developer, Website and Visitor entities conveys at a glance that
a Developer develops a Website, whereas a Visitor visits a Website.

Components of ER Diagram

Entities, Attributes, Relationships etc. form the components of an ER Diagram, and there are
defined symbols and shapes to represent each one of them.

Let's see how we can represent these in our ER Diagram.

Entity

Simple rectangular box represents an Entity.


Relationships between Entities - Weak and Strong

Rhombus is used to setup relationships between two or more entities.

Attributes for any Entity

Ellipse is used to represent attributes of any entity. It is connected to the entity.

Weak Entity

A weak Entity is represented using double rectangular boxes. It is generally connected to another
entity.

Key Attribute for any Entity

To represent a Key attribute, the attribute name inside the Ellipse is underlined.
Derived Attribute for any Entity

Derived attributes are those which are derived based on other attributes, for example, age can be
derived from date of birth.

To represent a derived attribute, another dotted ellipse is created inside the main ellipse.

Multivalued Attribute for any Entity

Double Ellipse, one inside another, represents the attribute which can have multiple values.

Composite Attribute for any Entity

A composite attribute is the attribute, which also has attributes.

ER Diagram: Entity

An Entity can be any object, place, person or class. In an ER Diagram, an entity is represented
using a rectangle. Consider the example of an Organisation: Employee, Manager, Department,
Product and many more can be taken as entities in an Organisation.
A rhombus drawn between two entities represents a relationship.

ER Diagram: Weak Entity

A weak entity is an entity that depends on another entity. A weak entity doesn't have any key
attribute of its own. A double rectangle is used to represent a weak entity.

ER Diagram: Attribute

An Attribute describes a property or characteristic of an entity. For example, Name, Age,
Address etc. can be attributes of a Student. An attribute is represented using an ellipse.
ER Diagram: Key Attribute

A Key attribute represents the main characteristic of an Entity. It is used to represent a
Primary key. An ellipse with the text underlined represents a Key Attribute.

ER Diagram: Composite Attribute

An attribute can also have its own attributes. Such attributes are known as Composite
attributes.
ER Diagram: Relationship

A Relationship describes a relation between entities. A relationship is represented using a
diamond or rhombus.

There are three types of relationship that exist between Entities.

1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship

ER Diagram: Binary Relationship

Binary Relationship means a relation between two Entities. This is further divided into four
types.

One to One Relationship

This type of relationship is rarely seen in the real world.

For example, one student can enroll for only one course and a course will also have only one
student. This is not what you will usually see in real-world relationships.

One to Many Relationship

In this relationship one entity is associated with many entities of the other type. For example,
1 student can opt for many courses, but a course can only have 1 student. This sounds odd, but
it is what the relationship states.

Many to One Relationship

It reflects the business rule that many entities can be associated with just one entity. For
example, a Student enrolls for only one Course but a Course can have many Students.

Many to Many Relationship

In this relationship one student can enroll for more than one course, and a
course can have more than 1 student enrolled in it.
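
A many-to-many relationship is commonly stored in relational tables through a third, linking table. The sketch below assumes this approach, which the text has not introduced, and uses hypothetical table names with Python's built-in sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- Linking table: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Akon'), (2, 'Bkon');
    INSERT INTO course  VALUES (10, 'OS'), (20, 'CN');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)

# One student, many courses.
courses_of_akon = [
    row[0]
    for row in conn.execute(
        "SELECT c.title FROM enrollment AS e"
        " JOIN course AS c ON c.course_id = e.course_id"
        " WHERE e.student_id = 1 ORDER BY c.title"
    )
]

# One course, many students.
students_in_os = [
    row[0]
    for row in conn.execute(
        "SELECT s.name FROM enrollment AS e"
        " JOIN student AS s ON s.student_id = e.student_id"
        " WHERE e.course_id = 10 ORDER BY s.name"
    )
]

print(courses_of_akon, students_in_os)
```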

ER Diagram: Recursive Relationship

When an Entity is related with itself it is known as Recursive Relationship.


ER Diagram: Ternary Relationship

Relationship of degree three is called Ternary relationship.

A Ternary relationship involves three entities. In such relationships we always consider two
entities together and then look at the third.

For example, take three related entities: Company, Product and Sector. To understand the
relationship better or to define rules around the model, we should relate two entities and
then derive the third one.

A Company produces many Products / each Product is produced by exactly one Company.
A Company operates in only one Sector / each Sector has many Companies operating in it.

Considering the above two rules, we see that although the complete relationship
involves three entities, we are looking at two entities at a time.

Normalization of Database
Database Normalization is a technique for organizing the data in a database. Normalization is a
systematic approach of decomposing tables to eliminate data redundancy (repetition) and
undesirable characteristics like Insertion, Update and Deletion Anomalies. It is a multi-step
process that puts data into tabular form, removing duplicated data from the relation tables.

Normalization is used mainly for two purposes:

• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.

Problems Without Normalization

If a table is not properly normalized and has data redundancy, it will not only eat up extra
storage space but will also make it difficult to handle and update the database without
data loss. Insertion, Updation and Deletion Anomalies are very frequent if a database is not
normalized. To understand these anomalies let us take the example of a Student table.

rollno   name   branch   hod     office_tel
401      Akon   CSE      Mr. X   53337
402      Bkon   CSE      Mr. X   53337
403      Ckon   CSE      Mr. X   53337
404      Dkon   CSE      Mr. X   53337

In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields
branch, hod(Head of Department) and office_tel is repeated for the students who are in the
same branch in the college, this is Data Redundancy.

Insertion Anomaly

Suppose for a new admission, until and unless a student opts for a branch, data of the student
cannot be inserted, or else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will
be repeated for all those 100 students.

These scenarios are nothing but Insertion anomalies.

Updation Anomaly

What if Mr. X leaves the college? or is no longer the HOD of computer science department? In
that case all the student records will have to be updated, and if by mistake we miss any record, it
will lead to data inconsistency. This is Updation anomaly.

Deletion Anomaly

In our Student table, two different kinds of information are kept together: Student
information and Branch information. Hence, at the end of the academic year, if student records
are deleted, we will also lose the branch information. This is Deletion anomaly.
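
The update and deletion anomalies can be reproduced with Python's built-in sqlite3 module. The second half of the sketch assumes the usual remedy of keeping branch information in its own table; the table names student_flat and branch_info and the replacement HOD "Mr. Y" are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_flat (
        rollno INTEGER PRIMARY KEY, name TEXT,
        branch TEXT, hod TEXT, office_tel TEXT
    );
    INSERT INTO student_flat VALUES
        (401, 'Akon', 'CSE', 'Mr. X', '53337'),
        (402, 'Bkon', 'CSE', 'Mr. X', '53337'),
        (403, 'Ckon', 'CSE', 'Mr. X', '53337'),
        (404, 'Dkon', 'CSE', 'Mr. X', '53337');
    """
)

# Updation anomaly: the HOD changes, but one record is missed by mistake.
conn.execute("UPDATE student_flat SET hod = 'Mr. Y' WHERE rollno IN (401, 402, 403)")
hods_flat = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT hod FROM student_flat WHERE branch = 'CSE' ORDER BY hod"
    )
]

# Assumed remedy: store branch facts once, in a separate table.
conn.executescript(
    """
    CREATE TABLE branch_info (branch TEXT PRIMARY KEY, hod TEXT, office_tel TEXT);
    CREATE TABLE student (
        rollno INTEGER PRIMARY KEY, name TEXT,
        branch TEXT REFERENCES branch_info(branch)
    );
    INSERT INTO branch_info VALUES ('CSE', 'Mr. X', '53337');
    INSERT INTO student SELECT rollno, name, branch FROM student_flat;
    """
)
conn.execute("UPDATE branch_info SET hod = 'Mr. Y' WHERE branch = 'CSE'")  # one row
hods_split = [row[0] for row in conn.execute("SELECT hod FROM branch_info")]

# Deleting all students no longer deletes the branch information.
conn.execute("DELETE FROM student")
branches_left = conn.execute("SELECT COUNT(*) FROM branch_info").fetchone()[0]

print(hods_flat, hods_split, branches_left)
```

In the flat table the same branch ends up with two different HODs, while in the split design the change is a single-row update and the branch row survives the deletion of its students.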

Normalization Rule

Normalization rules are divided into the following normal forms:

1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form

First Normal Form (1NF)

For a table to be in the First Normal Form, it should follow these 4 rules:

1. It should only have single (atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain.
3. All the columns in a table should have unique names.
4. The order in which data is stored does not matter.

The 1st Normal Form expects you to design your table in such a way that it can easily be
extended and it is easier for you to retrieve data from it whenever required.
If the tables in a database are not even in the 1st Normal Form, it is considered bad database
design.

Rules for First Normal Form


The first normal form expects you to follow a few simple rules while designing your database,
and they are:

Rule 1: Single Valued Attributes

Each column of your table should be single valued which means they should not contain multiple
values. We will explain this with help of an example later, let's see the other rules for now.

Rule 2: Attribute Domain should not change

This is more of a "Common Sense" rule. In each column the values stored must be of the same
kind or type.

For example: if you have a column dob to save the dates of birth of a set of people, then you
must not save the names of some of them in that column along with the dates of birth of others.
It should hold only a date of birth for all the records/rows.

Rule 3: Unique name for Attributes/Columns

This rule expects that each column in a table should have a unique name. This is to avoid
confusion at the time of retrieving data or performing any other operation on the stored data.

If two or more columns have the same name, the DBMS cannot tell which one a query refers to.

Rule 4: Order doesn't matter

This rule says that the order in which you store the data in your table doesn't matter.

Time for an Example


Although all the rules are self-explanatory, let's still take an example where we create a table
to store student data: each student's roll no., their name and the names of the subjects they
have opted for.

Here is our table, with some sample data added to it.

roll_no  name  subject
101      Akon  OS, CN
103      Ckon  Java
102      Bkon  C, C++

Our table already satisfies 3 of the 4 rules: all our column names are unique, we have stored
data in the order we wanted to, and we have not intermixed different types of data in columns.

But out of the 3 different students in our table, 2 have opted for more than 1 subject, and we
have stored the subject names in a single column. As per the 1st Normal Form, each column
must contain atomic values.

How to solve this Problem?

It's very simple, because all we have to do is break the values into atomic values.

Here is our updated table and it now satisfies the First Normal Form.

roll_no  name  subject
101      Akon  OS
101      Akon  CN
103      Ckon  Java
102      Bkon  C
102      Bkon  C++

By doing so, a few values are repeated, but the values for the subject column are now atomic
for each record/row.

Using the First Normal Form, data redundancy increases, as the same data will appear in some
columns across multiple rows, but each row as a whole will be unique.
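
The split into atomic values can be sketched in a few lines of Python. This is only an illustration: the helper name to_first_normal_form and the comma-separated input format are assumptions made for this example, not something the tutorial defines.

```python
# Rows as they appear in the un-normalized table: the subject column
# holds a comma-separated list, which breaks the atomic-value rule.
unnormalized = [
    (101, "Akon", "OS, CN"),
    (103, "Ckon", "Java"),
    (102, "Bkon", "C, C++"),
]


def to_first_normal_form(rows):
    """Return one row per (roll_no, name, subject) combination."""
    atomic_rows = []
    for roll_no, name, subjects in rows:
        for subject in subjects.split(","):
            atomic_rows.append((roll_no, name, subject.strip()))
    return atomic_rows


normalized = to_first_normal_form(unnormalized)
for row in normalized:
    print(row)
```

Each output row now carries exactly one subject, and no two rows are identical.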

Second Normal Form (2NF)


For a table to be in the Second Normal Form,

1. It should be in the First Normal form.


2. And, it should not have Partial Dependency.

What is Partial Dependency? Before answering that, let's first understand what a dependency
in a table is.

What is Dependency?

Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).

student_id name reg_no branch address

In this table, student_id is the primary key and will be unique for every row, hence we can use
student_id to fetch any row of data from this table.

Even in a case where student names are the same, if we know the student_id we can easily fetch
the correct record.

student_id name reg_no branch address


10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat

Hence we can say a Primary Key for a table is the column or a group of columns(composite
key) which can uniquely identify each record in the table.

I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if
I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is
student_id, and every other column depends on it, or can be fetched using it.

This is Dependency and we also call it Functional Dependency.
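
To make the idea concrete, here is a small Python sketch that checks whether one set of columns determines another in some sample rows. The function name determines is made up for this example. Keep in mind that sample data can only disprove a dependency; whether it holds in general is a design decision.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always come with
    equal values in the rhs columns, for the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # setdefault stores the first value seen for this key and
        # returns the stored one on later visits.
        if seen.setdefault(key, value) != value:
            return False
    return True


students = [
    {"student_id": 10, "name": "Akon", "reg_no": "07-WY", "branch": "CSE", "address": "Kerala"},
    {"student_id": 11, "name": "Akon", "reg_no": "08-WY", "branch": "IT", "address": "Gujarat"},
]

id_determines_branch = determines(students, ["student_id"], ["branch"])
name_determines_branch = determines(students, ["name"], ["branch"])
```

Here student_id determines branch, while name does not, because the two students named Akon belong to different branches.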


What is Partial Dependency?

Now that we know what dependency is, we are in a better state to understand what partial
dependency is.

For a simple table like Student, a single column like student_id can uniquely identify all the
records in the table.

But this is not true all the time. So now let's extend our example to see if more than 1 column
together can act as a primary key.

Let's create another table for Subject, which will have subject_id and subject_name fields
and subject_id will be the primary key.

subject_id subject_name
1 Java
2 C++
3 Php

Now we have a Student table with student information and another table Subject for storing
subject information.

Let's create another table Score, to store the marks obtained by students in the respective
subjects. We will also be saving name of the teacher who teaches that subject along with marks.

score_id student_id subject_id marks teacher


1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher

In the Score table we are saving the student_id to know which student's marks these are and
subject_id to know which subject the marks are for.

Together, student_id + subject_id form a Candidate Key for this table, which can be the
Primary key.

Confused about how this combination can be a primary key?

See, if I ask you to get me marks of student with student_id 10, can you get it from this table?
No, because you don't know for which subject. And if I give you subject_id, you would not
know for which student. Hence we need student_id + subject_id to uniquely identify any
row.

But where is Partial Dependency?


Now if you look at the Score table, we have a column named teacher which depends only on the
subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so on.

Now as we just discussed that the primary key for this table is a composition of two columns
which is student_id & subject_id but the teacher's name only depends on subject, hence the
subject_id, and has nothing to do with student_id.

This is Partial Dependency, where an attribute in a table depends on only a part of the primary
key and not on the whole key.

How to remove Partial Dependency?

There can be many different solutions for this, but our objective is to remove the teacher's
name from the Score table.

The simplest solution is to remove the teacher column from the Score table and add it to the
Subject table. Hence, the Subject table will become:

subject_id subject_name teacher


1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher

And our Score table is now in the second normal form, with no partial dependency.

score_id student_id subject_id marks


1 10 1 70
2 10 2 75
3 11 1 80

Quick Recap

1. For a table to be in the Second Normal form, it should be in the First Normal form and it should
not have Partial Dependency.
2. Partial Dependency exists, when for a composite primary key, any attribute in the table depends
only on a part of the primary key and not on the complete primary key.
3. To remove Partial dependency, we can divide the table, remove the attribute which is causing
partial dependency, and move it to some other table where it fits in well.
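
The result of this split can be tried out with Python's built-in sqlite3 module. This is a sketch only: SQLite stands in for whichever RDBMS you use, and the column types and the REFERENCES constraint are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT,
        teacher      TEXT
    );
    CREATE TABLE Score (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER,
        subject_id INTEGER REFERENCES Subject(subject_id),
        marks      INTEGER
    );
    INSERT INTO Subject VALUES (1, 'Java', 'Java Teacher'),
                               (2, 'C++',  'C++ Teacher'),
                               (3, 'Php',  'Php Teacher');
    INSERT INTO Score VALUES (1, 10, 1, 70), (2, 10, 2, 75), (3, 11, 1, 80);
""")

# The teacher is stored once per subject and recovered with a join,
# so it no longer repeats in every Score row.
rows = conn.execute("""
    SELECT sc.student_id, su.subject_name, sc.marks, su.teacher
    FROM Score AS sc
    JOIN Subject AS su ON su.subject_id = sc.subject_id
    ORDER BY sc.score_id
""").fetchall()
```

If a teacher changes, only one row in Subject has to be updated, and every score for that subject picks up the new name through the join.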

So let's use the same example, where we have 3 tables, Student, Subject and Score.

Student Table
student_id name reg_no branch address
10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat
12 Bkon 09-WY IT Rajasthan

Subject Table

subject_id subject_name teacher


1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher

Score Table

score_id student_id subject_id marks


1 10 1 70
2 10 2 75
3 11 1 80

In the Score table, we need to store some more information, which is the exam name and total
marks, so let's add 2 more columns to the Score table.

score_id student_id subject_id marks exam_name total_marks

Third Normal Form (3NF)

A table is said to be in the Third Normal Form when,

1. It is in the Second Normal form.


2. And, it doesn't have Transitive Dependency.

What is Transitive Dependency?

With exam_name and total_marks added to our Score table, it saves more data now. Primary
key for our Score table is a composite key, which means it's made up of two attributes or
columns → student_id + subject_id.

Our new column exam_name depends on both student and subject. For example, a mechanical
engineering student will have a Workshop exam but a computer science student won't. And for
some subjects you have Practical exams and for some you don't. So we can say that exam_name is
dependent on both student_id and subject_id.

And what about our second new column total_marks? Does it depend on our Score table's
primary key?

Well, the column total_marks depends on exam_name, as the total score changes with the exam
type. For example, practicals carry fewer marks while theory exams carry more.

But exam_name is just another column in the Score table. It is not a primary key or even a part
of the primary key, and total_marks depends on it.

This is Transitive Dependency: a non-prime attribute depends on another non-prime attribute
rather than directly on the prime attributes or primary key.

How to remove Transitive Dependency?

Again the solution is very simple. Take out the columns exam_name and total_marks from
Score table and put them in an Exam table and use the exam_id wherever required.

Score Table: In 3rd Normal Form

score_id student_id subject_id marks exam_id

The new Exam table

exam_id exam_name total_marks


1 Workshop 200
2 Mains 70
3 Practicals 30
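
A possible way to see the 3rd Normal Form layout at work is the sqlite3 sketch below. The Exam rows are the ones from the table above, but the Score rows (marks and exam_id values) are invented sample data, since the text does not list them; the column types are assumptions as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Exam (
        exam_id     INTEGER PRIMARY KEY,
        exam_name   TEXT,
        total_marks INTEGER
    );
    CREATE TABLE Score (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER,
        subject_id INTEGER,
        marks      INTEGER,
        exam_id    INTEGER REFERENCES Exam(exam_id)
    );
    INSERT INTO Exam VALUES (1, 'Workshop', 200), (2, 'Mains', 70), (3, 'Practicals', 30);
    -- Assumed sample scores, chosen only to exercise the join.
    INSERT INTO Score VALUES (1, 10, 1, 65, 2), (2, 10, 1, 25, 3), (3, 11, 1, 150, 1);
""")

# total_marks is looked up through exam_id instead of being repeated
# in every Score row.
rows = conn.execute("""
    SELECT sc.score_id, sc.marks, e.exam_name, e.total_marks
    FROM Score AS sc
    JOIN Exam AS e ON e.exam_id = sc.exam_id
    ORDER BY sc.score_id
""").fetchall()
```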

Advantage of removing Transitive Dependency

The advantages of removing transitive dependency are:

• The amount of data duplication is reduced.
• Data integrity is achieved.

Introduction to SQL
Structured Query Language (SQL) is a database query language used for storing and managing
data in a Relational DBMS. SQL was the first commercial language introduced for E. F. Codd's
Relational model of database. Today almost all RDBMS (MySQL, Oracle, Informix, Sybase, MS
Access) use SQL as the standard database query language. SQL is used to perform all types of
data operations in an RDBMS.

SQL Command
SQL defines the following ways to manipulate data stored in an RDBMS.

DDL: Data Definition Language

This includes changes to the structure of the table like creation of table, altering table, deleting a
table etc.

In many RDBMS (Oracle and MySQL, for example), DDL commands are auto-committed. That means
all the changes are saved permanently in the database as soon as the command runs.

Command   Description
create    to create a new table or database
alter     to alter the structure of a table
truncate  to delete all data from a table
drop      to drop a table
rename    to rename a table

DML: Data Manipulation Language


DML commands are used for manipulating the data stored in the table and not the table itself.

DML commands are not auto-committed like DDL commands. Changes made by them are not permanent
until they are committed, and they can be rolled back.

Command  Description
insert   to insert a new row
update   to update an existing row
delete   to delete a row
merge    to merge two rows or two tables

TCL: Transaction Control Language

These commands keep a check on other commands and their effect on the database. They can
annul changes made by other commands by rolling the data back to its original state, and they
can also make any temporary change permanent.

Command Description
commit to permanently save
rollback to undo change
savepoint to save temporarily

DCL: Data Control Language

Data control language consists of the commands to grant authority to, and take it back from,
any database user.

Command  Description
grant    to grant a permission or right
revoke   to take back a permission

DQL: Data Query Language

Data query language is used to fetch data from tables based on conditions that we can easily
apply.

Command Description
select retrieve records from one or more table

SQL: create command


create is a DDL SQL command used to create a table or a database in relational database
management system.

Creating a Database
To create a database in RDBMS, create command is used. Following is the syntax,

CREATE DATABASE <DB_NAME>;

Example for creating Database


CREATE DATABASE Test;

The above command will create a database named Test, which will be an empty schema without
any table.

To create tables in this newly created database, we can again use the create command.

Creating a Table
create command can also be used to create tables. Now when we create a table, we have to
specify the details of the columns of the tables too. We can specify the names and datatypes of
various columns in the create command itself.

Following is the syntax,

CREATE TABLE <TABLE_NAME>
(
    column_name1 datatype1,
    column_name2 datatype2,
    column_name3 datatype3,
    column_name4 datatype4
);

create table command will tell the database system to create a new table with the given table
name and column information.

Example for creating Table


CREATE TABLE Student(
student_id INT,
name VARCHAR(100),
age INT);

The above command will create a new table with the name Student in the current database with 3
columns, namely student_id, name and age. The column student_id will only store integers,
name will hold up to 100 characters and age will again store only integer values.
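
If you want to try this without installing a database server, the same statement can be run through Python's built-in sqlite3 module. This is a sketch under that assumption; PRAGMA table_info is specific to SQLite and is used here only to list the columns that were created.

```python
import sqlite3

# An in-memory database that disappears when the program ends.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        student_id INT,
        name       VARCHAR(100),
        age        INT
    )
""")

# Each row returned by PRAGMA table_info is
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Student)")]
print(columns)
```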

If you are currently not logged into your database in which you want to create the table then you
can also add the database name along with table name, using a dot operator .

For example, if we have a database with name Test and we want to create a table Student in it,
then we can do so using the following query:

CREATE TABLE Test.Student(
    student_id INT,
    name VARCHAR(100),
    age INT);

Most commonly used datatypes for Table columns

Here we have listed some of the most commonly used datatypes used for columns in tables.

Datatype  Use
INT       used for columns which will store integer values.
FLOAT     used for columns which will store floating-point values.
DOUBLE    used for columns which will store floating-point values, typically with higher
          precision than FLOAT.
VARCHAR   used for columns which will store variable-length strings (letters, digits and
          other characters).
CHAR      used for columns which will store fixed-length character values (a single
          character when no length is given).
DATE      used for columns which will store date values.
TEXT      used for columns which will store text which is generally long in length. For
          example, if you create a table for storing profile information of a social networking
          website, then for the about me section you can have a column of type TEXT.

SQL: ALTER command


The alter command is used for altering the table structure, such as:

• to add a column to an existing table
• to rename any existing column
• to change the datatype of any column or to modify its size
• to drop a column from the table

ALTER Command: Add a new Column
Using the ALTER command we can add a column to any existing table. Following is the syntax,

ALTER TABLE table_name ADD(
    column_name datatype);

Here is an Example for this,

ALTER TABLE student ADD(
    address VARCHAR(200)
);

The above command will add a new column address to the table student, which will hold data
of type VARCHAR, that is, a string of up to 200 characters.

ALTER Command: Add multiple new Columns


Using ALTER command we can even add multiple new columns to any existing table. Following
is the syntax,

ALTER TABLE table_name ADD(
    column_name1 datatype1,
    column_name2 datatype2,
    column_name3 datatype3);

Here is an Example for this,

ALTER TABLE student ADD(
    father_name VARCHAR(60),
    mother_name VARCHAR(60),
    dob DATE);

The above command will add three new columns to the student table.

ALTER Command: Add Column with default value


ALTER command can add a new column to an existing table with a default value too. The default
value is used when no value is inserted in the column. Following is the syntax,

ALTER TABLE table_name ADD(
    column_name1 datatype1 DEFAULT some_value
);

Here is an Example for this,

ALTER TABLE student ADD(
    dob DATE DEFAULT '01-Jan-99'
);

The above command will add a new column with a preset default value to the table student.
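
The effect on rows that already exist can be checked with a short sqlite3 sketch. Note the assumptions: SQLite is used as a stand-in, it adds one column per statement with ADD COLUMN instead of the parenthesised ADD( ) form used in this section, and the date literal is written as '1999-01-01' for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INT, name VARCHAR(100), age INT)")
conn.execute("INSERT INTO student VALUES (101, 'Adam', 15)")

# One column without a default, one column with a default.
conn.execute("ALTER TABLE student ADD COLUMN address VARCHAR(200)")
conn.execute("ALTER TABLE student ADD COLUMN dob DATE DEFAULT '1999-01-01'")

# The row inserted before the ALTER statements now has both columns.
row = conn.execute(
    "SELECT student_id, name, age, address, dob FROM student"
).fetchone()
print(row)
```

The existing row shows NULL (None in Python) for address, which has no default, and the preset default value for dob.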

ALTER Command: Modify an existing Column


ALTER command can also be used to modify data type of any existing column. Following is the
syntax,

ALTER TABLE table_name MODIFY(
    column_name datatype
);

Here is an Example for this,

ALTER TABLE student MODIFY(
    address VARCHAR(300));

Remember we added a new column address in the beginning? The above command will modify
the address column of the student table, to now hold up to 300 characters.

ALTER Command: Rename a Column


Using ALTER command you can rename an existing column. Following is the syntax,

ALTER TABLE table_name RENAME COLUMN
    old_column_name TO new_column_name;

Here is an example for this,

ALTER TABLE student RENAME COLUMN
    address TO location;

The above command will rename the address column to location.


ALTER Command: Drop a Column
ALTER command can also be used to drop or remove columns. Following is the syntax,

ALTER TABLE table_name DROP(
    column_name);

Here is an example for this,

ALTER TABLE student DROP(
    address);

The above command will drop the address column from the table student.

Truncate, Drop or Rename a Table


This section covers the DDL commands which are used to empty, remove or rename tables.

TRUNCATE command
The TRUNCATE command removes all the records from a table, but it does not destroy the
table's structure. In systems such as MySQL, using TRUNCATE on a table also re-initializes its
auto-increment primary key. Following is its syntax,

TRUNCATE TABLE table_name;

Here is an example explaining it,

TRUNCATE TABLE student;

The above query will delete all the records from the table student.

Among the DML commands, we will study the DELETE command, which is similar to the TRUNCATE
command, and we will also look at the difference between the two there.

DROP command
DROP command completely removes a table from the database. This command will also destroy
the table structure and the data stored in it. Following is its syntax,

DROP TABLE table_name;

Here is an example explaining it,

DROP TABLE student;

The above query will delete the student table completely. DROP can also be used on databases,
to delete the complete database. For example, to drop a database,

DROP DATABASE Test;

The above query will drop the database with name Test from the system.

RENAME query
RENAME command is used to set a new name for any existing table. Following is the syntax,

RENAME TABLE old_table_name TO new_table_name;

Here is an example explaining it.

RENAME TABLE student to students_info;

The above query will rename the table student to students_info.

Using INSERT SQL command


Data Manipulation Language (DML) statements are used for managing data in a database. DML
commands are not auto-committed, which means changes made by a DML command are not
permanent until committed and can be rolled back.

Talking about the Insert command, whenever we post a Tweet on Twitter, the text is stored in
some table, and as we post a new tweet, a new record gets inserted in that table.

INSERT command
The INSERT command is used to insert data into a table. Following is its general syntax,

INSERT INTO table_name VALUES(data1, data2, ...);

Let's see an example,

Consider a table student with the following fields.

s_id  name  age

INSERT INTO student VALUES(101, 'Adam', 15);

The above command will insert a new record into the student table.

s_id  name  age
101   Adam  15

Insert value into only specific columns

We can use the INSERT command to insert values for only some specific columns of a row. We
can specify the column names along with the values to be inserted like this,

INSERT INTO student(s_id, name) VALUES(102, 'Alex');

The above SQL query will only insert the s_id and name values in the newly inserted record.

Insert NULL value to a column

Both the statements below will insert a NULL value into the age column of the student table, as
long as the age column has no default value defined.

INSERT INTO student(s_id, name) VALUES(102, 'Alex');

Or,

INSERT INTO student VALUES(102, 'Alex', null);

The first statement supplies only two column values and leaves the other column NULL; the
second sets it to NULL explicitly.

s_id  name  age
101   Adam  15
102   Alex

Insert Default value to a column


Suppose the column age in our table has a default value of 14. The DEFAULT keyword inserts
that value:

INSERT INTO student VALUES(103, 'Chris', default);

s_id  name   age
101   Adam   15
102   Alex
103   Chris  14

Also, if you run the query below, which names the columns and leaves out age, it will insert
the default value into the age column, whatever the default value may be.

INSERT INTO student(s_id, name) VALUES(103, 'Chris');
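
The three cases (a full row, an explicit NULL, and a column left out so that its default applies) can be compared side by side with sqlite3. In this sketch the default of 14 is declared when the table is created, which is an assumption made to keep the example short, and the ? placeholders are the sqlite3 module's way of passing values. SQLite does not accept the DEFAULT keyword inside VALUES, so only the column-list form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (s_id INT, name VARCHAR(100), age INT DEFAULT 14)"
)

# Full row: every column gets an explicit value.
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (101, "Adam", 15))
# Explicit NULL: this overrides the column default.
conn.execute("INSERT INTO student VALUES (?, ?, ?)", (102, "Alex", None))
# Column list without age: the column default (14) is used.
conn.execute("INSERT INTO student (s_id, name) VALUES (?, ?)", (103, "Chris"))

rows = conn.execute("SELECT s_id, name, age FROM student ORDER BY s_id").fetchall()
print(rows)
```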

Using UPDATE SQL command


Let's take an example of a real-world problem. These days, Facebook provides an option for
Editing your status update, how do you think it works? Yes, using the Update SQL command.

Let's learn about the syntax and usage of the UPDATE command.

UPDATE command
UPDATE command is used to update any record of data in a table. Following is its general syntax,

UPDATE table_name SET column_name = new_value WHERE some_condition;


WHERE is used to add a condition to any SQL query, we will soon study about it in detail.

Let's take a sample table student,

s_id  name   age
101   Adam   15
102   Alex
103   Chris  14

UPDATE student SET age=18 WHERE s_id=102;

s_id  name   age
101   Adam   15
102   Alex   18
103   Chris  14

In the above statement, if we do not use the WHERE clause, then our update query will update age
for all the rows of the table to 18.

Updating Multiple Columns

We can also update values of multiple columns using a single UPDATE statement.

UPDATE student SET name='Abhi', age=17 WHERE s_id=103;


The above command will update two columns of the record which has s_id 103.

s_id name age


101 Adam 15
102 Alex 18
103 Abhi 17

UPDATE Command: Incrementing Integer Value

When we have to update any integer value in a table, then we can fetch and update the value in
the table in a single statement.

For example, if we have to update the age column of the student table every year for every
student, then we can simply run the following UPDATE statement:

UPDATE student SET age = age+1;

As you can see, we have used age = age + 1 to increment the value of age by 1.

NOTE: This style only works for numeric columns.
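
The difference between an UPDATE with and without a WHERE clause shows up clearly in the number of rows affected. Here is a sqlite3 sketch under the assumption that SQLite stands in for your RDBMS; rowcount is an attribute of the Python cursor, not part of SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (s_id INT, name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", None), (103, "Chris", 14)],
)

# With WHERE: only the matching row changes.
one = conn.execute("UPDATE student SET age = 18 WHERE s_id = ?", (102,)).rowcount
# Without WHERE: every row in the table is updated.
everyone = conn.execute("UPDATE student SET age = age + 1").rowcount

ages = conn.execute("SELECT s_id, age FROM student ORDER BY s_id").fetchall()
print(one, everyone, ages)
```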

Using DELETE SQL command


When you ask any question in Studytonight's Forum it gets saved into a table. And using the
Delete option, you can even delete a question asked by you. How do you think that works? Yes,
using the Delete DML command.

Let's study about the syntax and the usage of the Delete command.

DELETE command
DELETE command is used to delete data from a table.

Following is its general syntax,

DELETE FROM table_name;

Let's take a sample table student:

s_id name age


101 Adam 15
102 Alex 18
103 Abhi 17

Delete all Records from a Table


DELETE FROM student;

The above command will delete all the records from the table student.

Delete a particular Record from a Table

In our student table if we want to delete a single record, we can use the WHERE clause to provide
a condition in our DELETE statement.

DELETE FROM student WHERE s_id=103;

The above command will delete the record where s_id is 103 from the table student.

S_id S_Name age


101 Adam 15
102 Alex 18

Isn't DELETE same as TRUNCATE?

The TRUNCATE command is different from the DELETE command. The DELETE command removes rows
from a table (all of them when no WHERE clause is given), whereas the TRUNCATE command not only
deletes all the records stored in the table, but also re-initializes the table (like a newly
created table).

For example, if you have a table with 10 rows and an auto_increment primary key, and you use
the DELETE command to delete all the rows, it will delete all the rows but will not re-initialize
the primary key. Hence if you insert any row after using the DELETE command, the
auto_increment primary key will start from 11. In the case of the TRUNCATE command, the primary
key is re-initialized, and it will again start from 1.
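
The DELETE half of this comparison can be observed with sqlite3. Two caveats for this sketch: SQLite has no TRUNCATE command, so only DELETE is shown, and the table name note with its AUTOINCREMENT column is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE note (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
conn.executemany("INSERT INTO note (body) VALUES (?)", [("a",), ("b",), ("c",)])

# Remove every row; the auto-increment counter is left untouched.
deleted = conn.execute("DELETE FROM note").rowcount

# The next generated key continues after the highest key ever used.
new_id = conn.execute("INSERT INTO note (body) VALUES ('d')").lastrowid
print(deleted, new_id)
```

Three rows are deleted, and the new row receives id 4 rather than 1.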

Commit, Rollback and Savepoint SQL commands

Transaction Control Language (TCL) commands are used to manage transactions in the database.
They manage the changes made to the data in a table by DML statements, and they also allow
statements to be grouped together into logical transactions.

COMMIT command
COMMIT command is used to permanently save any transaction into the database.

When we use any DML command like INSERT, UPDATE or DELETE, the changes made by these
commands are not permanent. Until they are committed, they can be rolled back.

To avoid that, we use the COMMIT command to mark the changes as permanent.

Following is commit command's syntax,

COMMIT;

ROLLBACK command
This command restores the database to the last committed state. It is also used with the
SAVEPOINT command to jump to a savepoint in an ongoing transaction.

If we have used the UPDATE command to make some changes in the database, and realise that
those changes were not required, then we can use the ROLLBACK command to roll back those
changes, if they were not committed using the COMMIT command.

Following is the rollback command's syntax. Used on its own, ROLLBACK undoes everything since
the last commit; with TO, it returns to a named savepoint,

ROLLBACK;
ROLLBACK TO savepoint_name;

SAVEPOINT command
The SAVEPOINT command is used to mark a point within a transaction so that you can roll back
to that point whenever required.

Following is the savepoint command's syntax,

SAVEPOINT savepoint_name;

In short, using this command we can name the different states of our data in any table and then
roll back to that state using the ROLLBACK command whenever required.

Using Savepoint and Rollback

Following is the table class,

id name
1 Abhi
2 Adam
4 Alex

Let's use some SQL queries on the above table and see the results.

INSERT INTO class VALUES(5, 'Rahul');

COMMIT;

UPDATE class SET name = 'Abhijit' WHERE id = 5;

SAVEPOINT A;

INSERT INTO class VALUES(6, 'Chris');

SAVEPOINT B;

INSERT INTO class VALUES(7, 'Bravo');

SAVEPOINT C;

SELECT * FROM class;


NOTE: SELECT statement is used to show the data stored in the table.

The resultant table will look like,

id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris
7 Bravo

Now let's use the ROLLBACK command to roll back the state of data to the savepoint B.

ROLLBACK TO B;
SELECT * FROM class;

Now our class table will look like,

id name
1 Abhi
2 Adam
4 Alex
5 Abhijit
6 Chris

Now let's again use the ROLLBACK command to roll back the state of data to the savepoint A

ROLLBACK TO A;

SELECT * FROM class;

Now the table will look like,

id name
1 Abhi
2 Adam
4 Alex
5 Abhijit

So now you know how the commands COMMIT, ROLLBACK and SAVEPOINT work.
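
The whole sequence can be replayed with Python's sqlite3 module, as sketched below. The assumptions: SQLite stands in for the RDBMS, the connection is opened with isolation_level=None so that the BEGIN and COMMIT statements written here are the only transaction boundaries, and the helper ids is made up to read back the table.

```python
import sqlite3

# Autocommit mode: transactions start and end only where we say so.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INT, name VARCHAR(50))")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)", [(1, "Abhi"), (2, "Adam"), (4, "Alex")]
)


def ids():
    """Return the ids currently visible in the class table."""
    return [row[0] for row in conn.execute("SELECT id FROM class ORDER BY id")]


conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")

conn.execute("BEGIN")
conn.execute("UPDATE class SET name = 'Abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
before = ids()

conn.execute("ROLLBACK TO B")   # undoes the insert of id 7
after_b = ids()
conn.execute("ROLLBACK TO A")   # undoes the insert of id 6
after_a = ids()
conn.execute("COMMIT")          # keeps the update made before savepoint A
```

The update to 'Abhijit' survives both rollbacks because it was made before savepoint A.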

Using GRANT and REVOKE


Data Control Language(DCL) is used to control privileges in Database. To perform any
operation in the database, such as for creating tables, sequences or views, a user needs privileges.
Privileges are of two types,

• System: This includes permissions for creating a session, table, etc. and all other types of
  system privileges.
• Object: This includes permissions for any command or query to perform any operation
  on the database tables.

In DCL we have two commands,

• GRANT: Used to provide any user access privileges or other privileges for the database.
• REVOKE: Used to take back permissions from any user.

Allow a User to create session

When we create a user in SQL, it is not even allowed to log in and create a session until and
unless proper permissions/privileges are granted to the user.

The following command can be used to grant the session creating privileges.

GRANT CREATE SESSION TO username;

Allow a User to create table

To allow a user to create tables in the database, we can use the below command,

GRANT CREATE TABLE TO username;

Provide user with space on tablespace to store table

Allowing a user to create a table is not enough to start storing data in that table. We also must
provide the user with privileges to use the available tablespace for their table and data.

ALTER USER username QUOTA UNLIMITED ON SYSTEM;

The above command will alter the user details and will provide it access to unlimited tablespace
on system.

NOTE: Generally unlimited quota is provided to Admin users.

Grant all privilege to a User

sysdba is a set of privileges which has all the permissions in it. So if we want to provide all the
privileges to any user, we can simply grant them the sysdba permission.

GRANT sysdba TO username;

Grant permission to create any table

By default a user can create tables only in their own schema. We can grant a user the privilege
to create tables in any schema using the below command,

GRANT CREATE ANY TABLE TO username;


Grant permission to drop any table

As the title suggests, if you want to allow a user to drop any table from the database, then grant
this privilege to the user,

GRANT DROP ANY TABLE TO username;

To take back Permissions

And, if you want to take back the privileges from any user, use the REVOKE command.

REVOKE CREATE TABLE FROM username;

Using the WHERE SQL clause


The WHERE clause is used to specify a condition while retrieving, updating or deleting data
from a table. This clause is used mostly with SELECT, UPDATE and DELETE queries.

When we specify a condition using the WHERE clause then the query executes only for those
records for which the condition specified by the WHERE clause is true.

Syntax for WHERE clause

Here is how you can use the WHERE clause with a DELETE statement, or any other statement,

DELETE FROM table_name WHERE [condition];

The WHERE clause comes after the table name in the statement (and before any GROUP BY or
ORDER BY clause) to specify a condition for execution.

Time for an Example

Consider a table student,

s_id  name   age  address
101   Adam   15   Chennai
102   Alex   18   Delhi
103   Abhi   17   Bangalore
104   Ankit  22   Mumbai

Now we will use the SELECT statement to display data of the table, based on a condition, which
we will add to our SELECT query using the WHERE clause.

Let's write a simple SQL query to display the record for student with s_id as 101.

SELECT s_id,
name,
age,
address
FROM student WHERE s_id = 101;

Following will be the result of the above query.

s_id  name  age  address
101   Adam  15   Chennai

Applying condition on Text Fields

In the above example we have applied a condition to an integer value field, but what if we want
to apply the condition on the name field? In that case we must enclose the value in single quotes ' '.
Some databases even accept double quotes, but single quotes are accepted by all.

SELECT s_id,
name,
age,
address
FROM student WHERE name = 'Adam';

Following will be the result of the above query.

s_id  name  age  address
101   Adam  15   Chennai

Operators for WHERE clause condition


Following is a list of operators that can be used while specifying the WHERE clause condition.

Operator  Description
=         Equal to
!=        Not Equal to
<         Less than
>         Greater than
<=        Less than or Equal to
>=        Greater than or Equal to
BETWEEN   Between a specified range of values
LIKE      This is used to search for a pattern in value.
IN        In a given set of values
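
A few of these operators can be tried against the student table with the sqlite3 sketch below. SQLite is assumed as the database, and the helper names is invented for the example; it simply appends a condition to a fixed SELECT and passes the values through ? placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (s_id INT, name VARCHAR(100), age INT, address VARCHAR(200))"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (101, "Adam", 15, "Chennai"),
        (102, "Alex", 18, "Delhi"),
        (103, "Abhi", 17, "Bangalore"),
        (104, "Ankit", 22, "Mumbai"),
    ],
)


def names(condition, params=()):
    """Return the names of the students matching the WHERE condition."""
    sql = "SELECT name FROM student WHERE " + condition + " ORDER BY s_id"
    return [row[0] for row in conn.execute(sql, params)]


by_id = names("s_id = ?", (101,))
between = names("age BETWEEN ? AND ?", (17, 18))     # both ends included
in_set = names("address IN (?, ?)", ("Delhi", "Mumbai"))
not_equal = names("age != ?", (18,))
```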

SQL LIKE clause


The LIKE clause is used in the condition of an SQL query with the WHERE clause. It compares
data with an expression using wildcard operators to match the pattern given in the condition.

Wildcard operators

There are two wildcard operators that are used in the LIKE clause.

• Percent sign %: represents zero, one or more than one character.
• Underscore sign _: represents exactly one character.

Example of LIKE clause

Consider the following Student table.

s_id  s_name  age
101   Adam    15
102   Alex    18
103   Abhi    17

SELECT * FROM Student WHERE s_name LIKE 'A%';

The above query will return all records where s_name starts with the character 'A'.

s_id  s_name  age
101   Adam    15
102   Alex    18
103   Abhi    17

Using _ and %

SELECT * FROM Student WHERE s_name LIKE '_d%';

The above query will return all records from the Student table where s_name contains 'd' as the
second character.

s_id  s_name  age
101   Adam    15

Using % only

SELECT * FROM Student WHERE s_name LIKE '%x';

The above query will return all records from the Student table where s_name contains 'x' as the
last character.

s_id  s_name  age
102   Alex    18
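
The three patterns can be checked together with a small sqlite3 sketch. SQLite is assumed here, and the helper matching is made up for the example. One difference to be aware of: SQLite's LIKE ignores case for ASCII letters by default, which may not be true of your RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INT, s_name VARCHAR(100), age INT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)],
)


def matching(pattern):
    """Return the names that match the given LIKE pattern."""
    sql = "SELECT s_name FROM Student WHERE s_name LIKE ? ORDER BY s_id"
    return [row[0] for row in conn.execute(sql, (pattern,))]


starts_with_a = matching("A%")   # 'A' followed by anything
second_is_d = matching("_d%")    # any one character, then 'd'
ends_with_x = matching("%x")     # anything, ending in 'x'
```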

ORDER BY Clause
The ORDER BY clause is used with the SELECT statement for arranging retrieved data in sorted
order. By default it sorts the retrieved data in ascending order. To sort the data in
descending order, the DESC keyword is used with the ORDER BY clause.

Syntax of Order By

SELECT column-list|* FROM table-name ORDER BY column-name ASC | DESC;

Using default Order by

Consider the following Emp table,

eid  name   age  salary
401  Anu    22   9000
402  Shane  29   8000
403  Rohan  34   6000
404  Scott  44   10000
405  Tiger  35   8000

SELECT * FROM Emp ORDER BY salary;

The above query will return the resultant data in ascending order of the salary.

eid name age salary


403 Rohan 34 6000
402 Shane 29 8000
405 Tiger 35 8000
401 Anu 22 9000
404 Scott 44 10000

Using Order by DESC

Consider the Emp table described above,

SELECT * FROM Emp ORDER BY salary DESC;

The above query will return the resultant data in descending order of the salary.

eid name age salary


404 Scott 44 10000
401 Anu 22 9000
405 Tiger 35 8000
402 Shane 29 8000
403 Rohan 34 6000
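
Shane and Tiger have the same salary, and SQL does not promise any particular order among rows that tie on the sort column. A minimal sketch, using Python's built-in sqlite3 module, of adding a second sort column as a tie-breaker (the choice of eid as tie-breaker is an assumption made for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [(401, "Anu", 22, 9000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 6000),
     (404, "Scott", 44, 10000), (405, "Tiger", 35, 8000)],
)

# Ascending salary; rows with equal salary are ordered by ascending eid.
ascending = [r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary, eid")]

# Descending salary; rows with equal salary are ordered by descending eid.
descending = [
    r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary DESC, eid DESC")
]
print(ascending, descending)
```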

Group By Clause
Group by clause is used to group the results of a SELECT query based on one or more columns. It
is typically used with aggregate functions such as COUNT, SUM, AVG, MIN and MAX to compute
one value per group.

Syntax for using Group by in a statement.

SELECT column_name, function(column_name)
FROM table_name
WHERE condition
GROUP BY column_name

Example of Group by in a Statement

Consider the following Emp table.

eid name age salary


401 Anu 22 9000
402 Shane 29 8000
403 Rohan 34 6000
404 Scott 44 9000
405 Tiger 35 8000

Here we want to find the name and age of employees grouped by their salaries. In other words,
we group the employees by salary, so the result has one row per distinct salary, shown alongside
the name and age of one employee who has that salary.

GROUP BY groups together the rows that share the same value in the given column.

Note that name and age are neither listed in GROUP BY nor wrapped in an aggregate function.
Many database systems reject such a query. Systems that accept it (for example SQLite, or
MySQL with ONLY_FULL_GROUP_BY disabled) take name and age from an arbitrary row of each
group, so the names shown below are not guaranteed.

SQL query for the above requirement will be,

SELECT name, age
FROM Emp GROUP BY salary

Result will be,

name age
Rohan 34
Shane 29
Anu 22

Example of Group by in a Statement with WHERE clause

Consider the following Emp table

eid name age salary


401 Anu 22 9000
402 Shane 29 8000
403 Rohan 34 6000
404 Scott 44 9000
405 Tiger 35 8000

SQL query will be,

SELECT name, salary
FROM Emp
WHERE age > 25
GROUP BY salary

Result will be.

name salary
Rohan 6000
Shane 8000
Scott 9000

You must remember that the Group By clause comes after the WHERE clause and before any
HAVING or Order by clause in the SQL query.
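
Grouping is most useful together with aggregate functions, because then every selected column is well defined for the whole group. A minimal sketch using Python's built-in sqlite3 module and the Emp rows from the example (the variable names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [(401, "Anu", 22, 9000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 6000),
     (404, "Scott", 44, 9000), (405, "Tiger", 35, 8000)],
)

# One row per salary: how many employees earn it and the youngest age among them.
per_salary = conn.execute(
    "SELECT salary, COUNT(*), MIN(age) FROM Emp GROUP BY salary ORDER BY salary"
).fetchall()

# WHERE removes rows before grouping, so Anu (age 22) is not counted.
per_salary_over_25 = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp WHERE age > 25 GROUP BY salary ORDER BY salary"
).fetchall()
print(per_salary, per_salary_over_25)
```

The clause order in both statements is WHERE, then GROUP BY, then ORDER BY.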

HAVING Clause
Having clause is used with Group by to filter groups. The WHERE clause filters individual rows
before they are grouped; the HAVING clause filters the groups afterwards, usually with a
condition on an aggregate function.

Syntax for HAVING clause is,

SELECT column_name, function(column_name)
FROM table_name
WHERE column_name condition
GROUP BY column_name
HAVING function(column_name) condition

Example of SQL Statement using HAVING

Consider the following Sale table.

oid order_name previous_balance customer


11 ord1 2000 Alex
12 ord2 1000 Adam
13 ord3 2000 Abhi
14 ord4 1000 Adam
15 ord5 2000 Alex

Suppose we want to find the customer whose total previous_balance, summed over all of that
customer's orders, is more than 3000.

We will use the below SQL query,

SELECT *
FROM sale GROUP BY customer
HAVING sum(previous_balance) > 3000

Result will be,

oid order_name previous_balance customer


11 ord1 2000 Alex
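The main objective of the above SQL query was to find the name of the customer whose
previous_balance, summed over all the previous sales made to that customer, is more than 3000.
Only Alex qualifies (2000 + 2000 = 4000), hence we get the first row in the table for customer
Alex. Because the query selects * while grouping by customer, the values of oid, order_name and
previous_balance come from a single row of the group, and only systems that permit
non-aggregated columns in a grouped query will accept it.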
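
A more portable way to write the same request is to select only the grouping column and the aggregate. The following is a minimal sketch using Python's built-in sqlite3 module with the Sale rows from the example; the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sale (oid INTEGER, order_name TEXT, previous_balance INTEGER, customer TEXT)"
)
conn.executemany(
    "INSERT INTO Sale VALUES (?, ?, ?, ?)",
    [(11, "ord1", 2000, "Alex"), (12, "ord2", 1000, "Adam"), (13, "ord3", 2000, "Abhi"),
     (14, "ord4", 1000, "Adam"), (15, "ord5", 2000, "Alex")],
)

# Total balance per customer, before any group is filtered out.
totals = conn.execute(
    "SELECT customer, SUM(previous_balance) FROM Sale "
    "GROUP BY customer ORDER BY customer"
).fetchall()

# HAVING keeps only the groups whose total exceeds 3000.
over_3000 = conn.execute(
    "SELECT customer, SUM(previous_balance) FROM Sale "
    "GROUP BY customer HAVING SUM(previous_balance) > 3000"
).fetchall()
print(totals, over_3000)
```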

DISTINCT keyword
The distinct keyword is used with SELECT statement to retrieve unique values from the table.
Distinct removes duplicate rows from the result of the query; the records stored in the table
are not changed.

Syntax for DISTINCT Keyword


SELECT DISTINCT column-name FROM table-name;

Example using DISTINCT Keyword

Consider the following Emp table. As you can see in the table below, there is employee name,
along with employee salary and age.

In the table below, multiple employees have the same salary, so we will be using DISTINCT
keyword to list down distinct salary amount, that is currently being paid to the employees.

eid name age salary
401 Anu 22 5000
402 Shane 29 8000
403 Rohan 34 10000
404 Scott 44 10000
405 Tiger 35 8000

SELECT DISTINCT salary FROM Emp;

The above query will return only the unique salary from Emp table.

salary
5000
8000
10000

AND & OR operator


The AND and OR operators are used with the WHERE clause to make more precise conditions for
fetching data from database by combining more than one condition together.

AND operator
AND operator is used to set multiple conditions with the WHERE clause in SELECT, UPDATE
or DELETE SQL queries.

Example of AND operator

Consider the following Emp table

eid name age salary
401 Anu 22 5000
402 Shane 29 8000
403 Rohan 34 12000
404 Scott 44 10000
405 Tiger 35 9000

SELECT * FROM Emp WHERE salary < 10000 AND age > 25;

The above query will return records where salary is less than 10000 and age is greater than 25.
We have used the AND operator to specify two conditions with the WHERE clause, and a record
must satisfy both of them to be returned.

eid name age salary


402 Shane 29 8000
405 Tiger 35 9000

OR operator
OR operator is also used to combine multiple conditions with WHERE clause. The difference
between AND and OR lies in how many of the conditions a record must satisfy.

When we use AND to combine two or more conditions, only the records satisfying all the
specified conditions will be in the result.

But in case of the OR operator, at least one of the specified conditions must be satisfied
by a record for it to be in the resultset.

Example of OR operator

Consider the following Emp table

eid name age salary
401 Anu 22 5000
402 Shane 29 8000
403 Rohan 34 12000
404 Scott 44 10000
405 Tiger 35 9000

SELECT * FROM Emp WHERE salary > 10000 OR age > 25;

The above query will return records where either salary is greater than 10000 or age is greater
than 25.

eid name age salary
402 Shane 29 8000
403 Rohan 34 12000
404 Scott 44 10000
405 Tiger 35 9000
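
When AND and OR appear in the same WHERE clause, AND is evaluated first unless parentheses say otherwise. A minimal sketch of the difference, using Python's built-in sqlite3 module and the Emp rows above (the helper name eids and the extra condition age < 30 are assumptions made for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [(401, "Anu", 22, 5000), (402, "Shane", 29, 8000), (403, "Rohan", 34, 12000),
     (404, "Scott", 44, 10000), (405, "Tiger", 35, 9000)],
)

def eids(where):
    # Return the eid of every row matching the condition, in eid order.
    sql = "SELECT eid FROM Emp WHERE " + where + " ORDER BY eid"
    return [row[0] for row in conn.execute(sql)]

# Read as: salary > 10000 OR (age > 25 AND age < 30)
without_parens = eids("salary > 10000 OR age > 25 AND age < 30")

# Parentheses force the OR to be evaluated first.
with_parens = eids("(salary > 10000 OR age > 25) AND age < 30")
print(without_parens, with_parens)
```

Writing the parentheses explicitly, even when they match the default, makes the intended grouping clear to the reader.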

SQL JOIN
SQL Join is used to fetch data from two or more tables, which are joined to appear as a single
set of data. It combines columns from two or more tables by using values common to both
tables.

JOIN Keyword is used in SQL queries for joining two or more tables. Joining n tables requires
at least (n-1) join conditions. A table can also be joined to itself, which is known as a
Self Join.

Types of JOIN
Following are the types of JOIN that we can use in SQL:

• Cross
• Inner (including Natural)
• Outer (Left, Right and Full)

Cross JOIN or Cartesian Product
This type of JOIN returns the cartesian product of rows from the tables in Join. It will return a
table which consists of records which combines each row from the first table with each row of
the second table.

Cross JOIN Syntax is,

SELECT column-name-list
FROM
table-name1 CROSS JOIN table-name2;

Example of Cross JOIN

Following is the class table,

ID NAME
1 abhi
2 adam
4 alex

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI

Cross JOIN query will be,

SELECT * FROM
class CROSS JOIN class_info;

The resultset table will look like,

ID NAME ID Address
1 abhi 1 DELHI
2 adam 1 DELHI
4 alex 1 DELHI
1 abhi 2 MUMBAI
2 adam 2 MUMBAI
4 alex 2 MUMBAI
1 abhi 3 CHENNAI
2 adam 3 CHENNAI
4 alex 3 CHENNAI

As you can see, this join returns the cross product of all the records present in both the tables.
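
The size of a cross join is the product of the two row counts, here 3 x 3 = 9. A minimal sketch that checks this with Python's built-in sqlite3 module (variable names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE class_info (id INTEGER, address TEXT)")
conn.executemany("INSERT INTO class VALUES (?, ?)", [(1, "abhi"), (2, "adam"), (4, "alex")])
conn.executemany(
    "INSERT INTO class_info VALUES (?, ?)", [(1, "DELHI"), (2, "MUMBAI"), (3, "CHENNAI")]
)

# Every row of class is paired with every row of class_info.
cross_rows = conn.execute("SELECT * FROM class CROSS JOIN class_info").fetchall()

# Listing both tables separated by a comma, with no condition, gives the same pairs.
comma_rows = conn.execute("SELECT * FROM class, class_info").fetchall()
print(len(cross_rows), len(comma_rows))
```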

INNER Join or EQUI Join


This is a simple JOIN in which the result is based on matched data as per the equality condition
specified in the SQL query.

Inner Join Syntax is,

SELECT column-name-list FROM
table-name1 INNER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

Example of INNER JOIN

Consider a class table,

ID NAME
1 abhi
2 adam
3 alex
4 anu

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI

Inner JOIN query will be,

SELECT * FROM class INNER JOIN class_info ON class.id = class_info.id;

The resultset table will look like,

ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
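
The same inner join can be run as a minimal sketch with Python's built-in sqlite3 module. The join condition is written with ON, and the column list is spelled out so that the shared id appears only once; these choices are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE class_info (id INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)", [(1, "abhi"), (2, "adam"), (3, "alex"), (4, "anu")]
)
conn.executemany(
    "INSERT INTO class_info VALUES (?, ?)", [(1, "DELHI"), (2, "MUMBAI"), (3, "CHENNAI")]
)

# Only ids present in both tables survive; anu (id 4) has no address row.
inner_rows = conn.execute(
    "SELECT class.id, class.name, class_info.address "
    "FROM class INNER JOIN class_info ON class.id = class_info.id "
    "ORDER BY class.id"
).fetchall()
print(inner_rows)
```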

Natural JOIN

Natural Join is a type of Inner join which matches rows on the columns that have the same name
and same datatype in both the tables to be joined. Each such common column appears only once
in the result.

The syntax for Natural Join is,

SELECT * FROM
table-name1 NATURAL JOIN table-name2;

Example of Natural JOIN

Here is the class table,

ID NAME
1 abhi
2 adam
3 alex
4 anu

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI

Natural join query will be,

SELECT * from class NATURAL JOIN class_info;

The resultset table will look like,

ID NAME Address
1 abhi DELHI
2 adam MUMBAI
3 alex CHENNAI
In the above example, both the tables being joined have an ID column (same name and same
datatype), hence the result of the Natural Join of these two tables consists of the records
for which the value of ID matches in both the tables.

OUTER JOIN
Outer Join is based on both matched and unmatched data. Outer Joins subdivide further into,

1. Left Outer Join
2. Right Outer Join
3. Full Outer Join

LEFT Outer Join

The left outer join returns a resultset table with the matched data from the two tables, followed
by the remaining (unmatched) rows of the left table with null in the right table's columns.

Syntax for Left Outer Join is,

SELECT column-name-list FROM
table-name1 LEFT OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

To specify a condition, we use the ON keyword with Outer Join.

Left outer Join Syntax for Oracle is,

SELECT column-name-list FROM
table-name1, table-name2
WHERE table-name1.column-name = table-name2.column-name(+);

Example of Left Outer Join

Here is the class table,

ID NAME
1 abhi
2 adam
3 alex
4 anu
5 ashish

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
7 NOIDA
8 PANIPAT

Left Outer Join query will be,

SELECT * FROM class LEFT OUTER JOIN class_info ON (class.id = class_info.id);

The resultset table will look like,

ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
4 anu null null
5 ashish null null
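
A minimal sketch of the same left outer join with Python's built-in sqlite3 module, where SQL null shows up as Python's None. The second query, which uses IS NULL to list only the unmatched rows of the left table, is an addition made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE class_info (id INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [(1, "abhi"), (2, "adam"), (3, "alex"), (4, "anu"), (5, "ashish")],
)
conn.executemany(
    "INSERT INTO class_info VALUES (?, ?)",
    [(1, "DELHI"), (2, "MUMBAI"), (3, "CHENNAI"), (7, "NOIDA"), (8, "PANIPAT")],
)

# Every class row is kept; missing class_info columns come back as None.
left_rows = conn.execute(
    "SELECT class.id, class.name, class_info.id, class_info.address "
    "FROM class LEFT OUTER JOIN class_info ON class.id = class_info.id "
    "ORDER BY class.id"
).fetchall()

# Students without any address row: keep only the rows the join could not match.
unmatched = [
    row[0]
    for row in conn.execute(
        "SELECT class.name FROM class LEFT OUTER JOIN class_info "
        "ON class.id = class_info.id WHERE class_info.id IS NULL ORDER BY class.id"
    )
]
print(left_rows, unmatched)
```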

RIGHT Outer Join

The right outer join returns a resultset table with the matched data from the two tables being
joined, followed by the remaining (unmatched) rows of the right table with null in the left
table's columns.

Syntax for Right Outer Join is,

SELECT column-name-list FROM
table-name1 RIGHT OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

Right outer Join Syntax for Oracle is,

SELECT column-name-list FROM
table-name1, table-name2
WHERE table-name1.column-name(+) = table-name2.column-name;

Example of Right Outer Join

Once again the class table,


ID NAME
1 abhi
2 adam
3 alex
4 anu
5 ashish

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
7 NOIDA
8 PANIPAT

Right Outer Join query will be,

SELECT * FROM class RIGHT OUTER JOIN class_info ON (class.id = class_info.id);

The resultant table will look like,

ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
null null 7 NOIDA
null null 8 PANIPAT

Full Outer Join

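The full outer join returns a resultset table with the matched data of the two tables, followed
by the unmatched rows of the left table (with null in the right table's columns) and then the
unmatched rows of the right table (with null in the left table's columns).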

Syntax of Full Outer Join is,

SELECT column-name-list FROM
table-name1 FULL OUTER JOIN table-name2
ON table-name1.column-name = table-name2.column-name;

Example of Full outer join is,

The class table,


ID NAME
1 abhi
2 adam
3 alex
4 anu
5 ashish

and the class_info table,

ID Address
1 DELHI
2 MUMBAI
3 CHENNAI
7 NOIDA
8 PANIPAT

Full Outer Join query will be like,

SELECT * FROM class FULL OUTER JOIN class_info ON (class.id = class_info.id);

The resultset table will look like,

ID NAME ID Address
1 abhi 1 DELHI
2 adam 2 MUMBAI
3 alex 3 CHENNAI
4 anu null null
5 ashish null null
null null 7 NOIDA
null null 8 PANIPAT
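
Not every database system accepts the FULL OUTER JOIN keyword (MySQL, for instance, does not). A common workaround, sketched below with Python's built-in sqlite3 module, combines a left outer join with the unmatched rows of the other table using UNION ALL. This is one possible emulation, not the only one, and the aliases c and i are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE class_info (id INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO class VALUES (?, ?)",
    [(1, "abhi"), (2, "adam"), (3, "alex"), (4, "anu"), (5, "ashish")],
)
conn.executemany(
    "INSERT INTO class_info VALUES (?, ?)",
    [(1, "DELHI"), (2, "MUMBAI"), (3, "CHENNAI"), (7, "NOIDA"), (8, "PANIPAT")],
)

full_sql = """
SELECT c.id, c.name, i.id, i.address
FROM class AS c LEFT OUTER JOIN class_info AS i ON c.id = i.id
UNION ALL
SELECT c.id, c.name, i.id, i.address
FROM class_info AS i LEFT OUTER JOIN class AS c ON c.id = i.id
WHERE c.id IS NULL
"""
# First part: all class rows, matched or not.
# Second part: only the class_info rows that found no partner in class.
full_rows = conn.execute(full_sql).fetchall()
print(len(full_rows))
```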

What is Relational Algebra?


Every database management system must define a query language to allow users to access the
data stored in the database. Relational Algebra is a procedural query language used to query the
database tables to access data in different ways.
In relational algebra, the input is a relation (the table from which data has to be accessed) and
the output is also a relation (a temporary table holding the data asked for by the user).

Relational Algebra works on the whole table at once, so we do not have to use loops etc. to
iterate over the rows (tuples) of data one by one. All we have to do is name the relation and
the operation to apply to it, and the result is a new relation containing the requested data.

The primary operations that we can perform using relational algebra are:

1. Select
2. Project
3. Union
4. Set Difference
5. Cartesian product
6. Rename

Select Operation (σ)


This is used to fetch the rows (tuples) of a table (relation) that satisfy a given condition.

Syntax: σp(r)

Where σ is the selection operator, r is the name of the relation (the table in which you want
to look for data), and p is the selection predicate, a propositional logic formula that
specifies the conditions the data must satisfy. In the predicate, one can use comparison
operators like =, <, > etc., and combine conditions with connectives such as and, or and not.

Let's take the Student table as an example and fetch data for students with age more than 17.

σage > 17 (Student)

This will fetch the tuples(rows) from table Student, for which age will be greater than 17.

You can also combine two conditions with operators such as and, or etc., for example,

σage > 17 and gender = 'Male' (Student)

This will return the tuples (rows) from table Student for male students of age more
than 17. (Assume the Student table also has a gender attribute.)
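
The Select operation can be mimicked in a few lines of Python, as a sketch rather than a real query engine. A relation is modelled as a set of named tuples, and the function name select, the sample rows and the gender values are all assumptions made for illustration.

```python
from collections import namedtuple

Student = namedtuple("Student", ["name", "age", "gender"])

# A relation is modelled as a set of tuples, so duplicate tuples cannot occur.
students = {
    Student("Adam", 15, "Male"),
    Student("Alex", 18, "Male"),
    Student("Abhi", 17, "Male"),
    Student("Anu", 18, "Female"),
}

def select(relation, predicate):
    # sigma_p(r): keep exactly the tuples for which the predicate p holds.
    return {t for t in relation if predicate(t)}

# sigma age > 17 (Student)
older = select(students, lambda t: t.age > 17)

# sigma age > 17 and gender = 'Male' (Student)
older_males = select(students, lambda t: t.age > 17 and t.gender == "Male")
print(sorted(t.name for t in older), sorted(t.name for t in older_males))
```

The result of select is again a set of tuples with the same attributes, which mirrors the rule that the output of a relational algebra operation is itself a relation.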

Project Operation (∏)


Project operation is used to project only a certain set of attributes of a relation. In simple
words, if you want to see only the names of all the students in the Student table, then you
can use the Project operation.

It will only project or show the columns or attributes asked for, and will also remove duplicate
rows from the result.

Syntax: ∏A1, A2...(r)

where A1, A2 etc are attribute names(column names).

For example,

∏Name, Age (Student)

Above statement will show us only the Name and Age columns for all the rows of data in
Student table.
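
A minimal Python sketch of the Project operation, with a relation modelled as a set of named tuples. The function name project and the sample rows are assumptions made for illustration; the point to notice is that projecting onto Age alone leaves fewer rows than the table has, because duplicates are removed.

```python
from collections import namedtuple

Student = namedtuple("Student", ["name", "age"])

students = {
    Student("Adam", 15),
    Student("Alex", 18),
    Student("Abhi", 17),
    Student("Anu", 18),
}

def project(relation, *attrs):
    # pi_A1,A2,...(r): keep only the named attributes of each tuple.
    # Building a set removes the duplicates that dropping columns can create.
    return {tuple(getattr(t, a) for a in attrs) for t in relation}

name_and_age = project(students, "name", "age")   # four distinct (name, age) pairs
ages_only = project(students, "age")              # 18 appears once, not twice
print(sorted(name_and_age), sorted(ages_only))
```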

Union Operation (∪)


This operation returns the tuples that are present in either of two relations (tables) or
temporary relations (results of other operations).
For this operation to work, the relations (tables) specified should have the same number of
attributes (columns) and the same attribute domains. Also, duplicate tuples are automatically
eliminated from the result.

Syntax: A ∪ B

where A and B are relations.

For example, if we have two tables RegularClass and ExtraClass, both have a column student
to save name of student, then,

∏Student(RegularClass) ∪ ∏Student(ExtraClass)

Above operation will give us the names of students who are attending regular classes, extra
classes, or both, eliminating repetition.

Set Difference (-)


This operation is used to find data present in one relation and not present in the second relation.
Like the Union operation, it takes two relations with the same number of attributes and the
same attribute domains.

Syntax: A - B

where A and B are relations.

For example, if we want to find name of students who attend the regular class but not the extra
class, then, we can use the below operation:

∏Student(RegularClass) - ∏Student(ExtraClass)
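
Union and Set Difference correspond directly to Python's set operators | and -. A minimal sketch, with student names invented for illustration and each relation already projected onto the single attribute student:

```python
# Each relation holds one-attribute tuples (student,).
regular_class = {("Adam",), ("Alex",), ("Abhi",)}
extra_class = {("Alex",), ("Anu",)}

# Union: students in the regular class, the extra class, or both.
# Alex is in both relations but appears only once in the result.
either = regular_class | extra_class

# Set Difference: students in the regular class who are not in the extra class.
regular_only = regular_class - extra_class

# Difference is not symmetric: swapping the operands gives another answer.
extra_only = extra_class - regular_class
print(sorted(either), sorted(regular_only), sorted(extra_only))
```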

Cartesian Product (X)


This pairs every tuple of one relation (table) with every tuple of another relation, producing a
single combined relation from which data can then be fetched.

Syntax: A X B

For example, if we want to find the information for Regular Class and Extra Class which are
conducted during morning, then, we can use the following operation:

σtime = 'morning' (RegularClass X ExtraClass)


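For the above query to work, the combined relation must have the attribute time. If both
RegularClass and ExtraClass have an attribute named time, it has to be qualified with the
relation name (for example RegularClass.time) so that the condition is not ambiguous.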
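
A minimal Python sketch of a Cartesian product followed by a selection, using itertools.product. The subjects and times are invented for illustration, and the condition is applied to the time attribute of both relations, which is one way to read the example.

```python
from itertools import product

# Tuples are (subject, time) in both relations.
regular_class = [("Maths", "morning"), ("Physics", "evening")]
extra_class = [("Chess", "morning"), ("Music", "evening")]

# RegularClass X ExtraClass: each combined tuple is
# (regular subject, regular time, extra subject, extra time).
pairs = {r + e for r, e in product(regular_class, extra_class)}

# Selection on the product: both classes are held in the morning.
# Positions 1 and 3 are the two time attributes, which must be told apart.
morning = {t for t in pairs if t[1] == "morning" and t[3] == "morning"}
print(len(pairs), morning)
```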

Rename Operation (ρ)


This operation is used to rename the output relation of any query operation that returns a
result, like Select, Project etc., or simply to rename a relation (table).

Syntax: ρ(RelationNew, RelationOld)

Apart from these common operations, Relational Algebra is also used for Join operations like,

• Natural Join
• Outer Join
• Theta Join etc.

What is Relational Calculus?


Contrary to Relational Algebra, which is a procedural query language that fetches data and
also explains how it is done, Relational Calculus is a non-procedural query language and has no
description of how the query will work or how the data will be fetched. It only focuses on what
to do, and not on how to do it.

Relational Calculus exists in two forms:

1. Tuple Relational Calculus (TRC)
2. Domain Relational Calculus (DRC)

Tuple Relational Calculus (TRC)

In tuple relational calculus, we work on filtering tuples based on the given condition.

Syntax: { T | Condition }

In this form of relational calculus, we define a tuple variable, specify the table(relation) name in
which the tuple is to be searched for, along with a condition.

We can also specify a column name using the . (dot) operator with the tuple variable, to get
only a certain attribute (column) in the result.

A tuple variable is nothing but a name. It can be anything, and generally we use a single
letter for this, so let's say T is a tuple variable.

To specify the name of the relation(table) in which we want to look for data, we do the
following:

Relation(T), where T is our tuple variable.

For example if our table is Student, we would put it as Student(T)

Then comes the condition part. To specify a condition on a particular attribute (column), we
use the . (dot) operator with the tuple variable. For example, in table Student, if we want to
get data for students with age greater than 17, then we can write it as,

T.age > 17, where T is our tuple variable.

Putting it all together, if we want to use Tuple Relational Calculus to fetch names of students,
from table Student, with age greater than 17, then, for T being our tuple variable,

{ T.name | Student(T) AND T.age > 17 }
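
Python's set comprehension reads almost like the tuple relational calculus expression above, which makes it a convenient way to experiment. This is a loose analogy and not a query language; the sample rows are invented for illustration.

```python
from collections import namedtuple

Row = namedtuple("Row", ["name", "age"])

# The relation Student as a set of tuples.
student = {Row("Adam", 15), Row("Alex", 18), Row("Abhi", 17)}

# { T.name | Student(T) AND T.age > 17 }
# "for T in student" plays the role of Student(T); the "if" part is the condition.
names_over_17 = {T.name for T in student if T.age > 17}
print(names_over_17)
```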

Domain Relational Calculus (DRC)

In domain relational calculus, the variables range over the domains of the attributes, that is,
over individual column values, and not over whole tuples.

Syntax: { c1, c2, c3, ..., cn | F(c1, c2, c3, ... ,cn)}

where c1, c2... etc. are domain variables, one for each attribute (column), and F defines the
formula including the condition for fetching the data.

For example,

{ < name, age > | < name, age > ∈ Student ∧ age > 17 }

Again, the above query will return the names and ages of the students in the table Student who
are older than 17.
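
In the same spirit, the domain relational calculus expression can be imitated with a Python set comprehension in which the loop variables name and age stand for individual attribute values instead of a whole tuple. This is a sketch with invented sample rows.

```python
# The relation Student as a set of (name, age) pairs.
student = {("Adam", 15), ("Alex", 18), ("Abhi", 17)}

# { <name, age> | <name, age> ∈ Student ∧ age > 17 }
# The variables name and age range over the values of the two attributes.
over_17 = {(name, age) for (name, age) in student if age > 17}
print(over_17)
```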

https://www.studytonight.com/dbms/sql-function.php
