Database
What is data?
• Data is a collection of characters, numbers, and other symbols that
represents values of some situations or variables
• Examples:
• Name, age, gender, contact details, etc., of a person
• Transactions data generated through banking, ticketing, shopping, etc. whether
online or offline
• Images, graphics, animations, audio, video
• Documents and web pages
• Online posts, comments and messages
• Signals generated by sensors
• Satellite data including meteorological data, communication data, earth observation
data, etc.
• Let us take an example of a school that maintains data about its students,
along with their attendance record and guardian details.
• Limitations of manual record keeping in such a case:
• Entry of student details (Roll number and name) in the new attendance register
when the student is promoted to the next class.
• Writing student details on each month’s attendance page where inconsistency
may happen due to incorrectly written names, skipped student records, etc.
• Loss of data in case attendance register is lost or damaged.
• Erroneous calculation while consolidating attendance record manually.
• Finding information in a huge volume of papers, or deleting/modifying an
entry, is difficult in a pen-and-paper approach
• To overcome the hassles faced in manual record keeping, it is desirable to
store attendance records and student details in separate data files on a
computerised system, so that office staff and teachers can:
• Simply copy the student details to the new attendance file from the old attendance file
when students are promoted to next class
• Find any data about student or guardian
• Add more details to existing data whenever a new student joins the school
• Modify stored data like details of student or guardian whenever required
• Remove/delete data whenever a student leaves the school
File System
• A file can be understood as a container to store data in a computer
• Files can be stored on the storage device of a computer system
• Contents of a file can be texts, computer program code, comma separated
values (CSV), etc. Likewise, pictures, audios/videos, web pages are also files
• Files stored on a computer can be accessed directly and searched for
desired data
• But to access data of a file through software, for example, to display
monthly attendance report on school website, one has to write computer
programs to access data from files
• In the school example, we need to store data about students and
attendance in two separate files
• Table 8.1 shows the contents of STUDENT file which has six
columns, as detailed below:
• RollNumber – Roll number of the student
• SName – Name of the student
• SDateofBirth – Date of birth of the student
• GName – Name of the guardian
• GPhone – Phone number of the student's guardian
• GAddress – Address of the guardian of the student
• Table 8.2 shows another file called ATTENDANCE which has four
columns, as detailed below:
• AttendanceDate – Date for which attendance was marked
• RollNumber – Roll number of the student
• SName – Name of the student
• AttendanceStatus – Marked as P (present) or A (absent)
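To illustrate the point that a program must be written to get anything out of a data file, here is a minimal Python sketch that reads an ATTENDANCE file in CSV form and counts the days each student was present. The sample rows and the function name `count_present` are made up for illustration; only the column names come from the description above.

```python
import csv
import io

# Hypothetical contents of the ATTENDANCE file in CSV form (sample rows only).
ATTENDANCE_CSV = """AttendanceDate,RollNumber,SName,AttendanceStatus
2000-01-01,1,Asha,P
2000-01-01,2,Ravi,A
2000-01-02,1,Asha,P
2000-01-02,2,Ravi,P
"""

def count_present(csv_text):
    """Return a dict mapping RollNumber to the number of days marked P."""
    present = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        roll = int(row["RollNumber"])
        present.setdefault(roll, 0)
        if row["AttendanceStatus"] == "P":
            present[roll] += 1
    return present

report = count_present(ATTENDANCE_CSV)
print(report)
```

Even this small report needs code that knows the exact column names and layout of the file, which is the kind of effort a file system leaves to the programmer.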
Limitations of a File System
• Difficulty in Access
• Files themselves do not provide any mechanism to retrieve data
• Data maintained in a file system are accessed through application programs
• Data Redundancy
• Redundancy means same data are duplicated in different places (files)
• In our example, student names are maintained in both the files. Besides, in
Table 8.1, students with roll numbers 3 and 5 have same guardian name and
therefore same guardian name is maintained twice.
• Redundancy leads to excess storage use and may cause data inconsistency also
• Data Inconsistency
• Data inconsistency occurs when same data maintained in different places do not
match
• If a student wants the spelling of her name changed, it must be changed in the
SName column of both the files. If it is changed in only one file, the two files
no longer match.
• Data Isolation
• Both the files presented at Table 8.1 (STUDENT) and at Table 8.2 (ATTENDANCE)
are related to students
• But there is no link or mapping between them
• The school will have to write separate programs to access these two files
• This is because data mapping is not supported in file system
• Data Dependence
• Data are stored in a specific format or structure in a file
• If the structure or format itself is changed, all the existing application programs
accessing that file also need to be changed
• Otherwise, the programs may not work correctly
• Controlled Data Sharing
• There can be different categories of users, like teachers, office staff and parents.
Ideally, not every user should be able to access all the data.
• As an example, guardians and office staff should only be able to see the student
attendance data, not modify/delete it
• It means these users should be given limited access (read only) to the
ATTENDANCE file
• Only the teacher should be able to update the attendance data
• It is very difficult to enforce this kind of access control in a file system while
accessing files through application programs
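The redundancy and inconsistency problems listed above can be seen in a small Python sketch. The names and rows are invented sample data; the only assumption taken from the text is that SName is stored in both the STUDENT and the ATTENDANCE file.

```python
# Hypothetical rows from the two files; SName is stored in both (redundancy).
student = {1: "Asha Verma", 2: "Ravi Kumar"}   # STUDENT: RollNumber -> SName
attendance = [                                 # ATTENDANCE rows
    ("2000-01-01", 1, "Asha Verma", "P"),
    ("2000-01-01", 2, "Ravi Kumar", "A"),
]

# The spelling is corrected in the STUDENT file only.
student[1] = "Aasha Verma"

# The ATTENDANCE file still holds the old spelling, so the two copies disagree.
mismatches = [
    (roll, name, student[roll])
    for _, roll, name, _ in attendance
    if student[roll] != name
]
print(mismatches)
```

Nothing in the file system itself detects this mismatch; a separate checking program like the one above has to be written and run.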
Database management system
• A database management system (DBMS), or database system for short,
is software that can be used to create and manage databases
• A DBMS lets users or application programs create a database and store,
manage, update/modify and retrieve data in that database
• Some examples of open source and commercial DBMS include MySQL,
Oracle, PostgreSQL, SQL Server, Microsoft Access, MongoDB
• The DBMS serves as an interface between the database and end users
or application programs
• Retrieving data from a database through special type of commands is
called querying the database
File System to DBMS
• Let us now design a database to store data of those two files
• We know that tables in a database are linked or related through one or
more common columns or fields
• In our example, the STUDENT (Table 8.1) file and ATTENDANCE (Table
8.2) file have RollNumber and SName as common field names
• In order to convert these two files into a database, we need to
incorporate the following changes:
Key Concepts in DBMS
• Database Schema
• Database Schema is the design of a database
• It is the skeleton of the database that represents the structure (table names
and their fields/columns), the type of data each column can hold, constraints
on the data to be stored (if any), and the relationships among the tables
• Data Constraint
• Sometimes we put certain restrictions or limitations on the type of data that
can be inserted in one or more columns of a table
• This is done by specifying one or more constraints on that column(s) while
creating the tables
• For example, one can define the constraint that the column mobile number
can only have non-negative integer values of exactly 10 digits
• Query
• A query is a request to a database for obtaining information in a desired way. A
query can be made to get data from one table or from a combination of tables
• For example, “find names of all those students present on Attendance Date
2000-01-02” is a query to the database
• Data Manipulation
• Modification of a database consists of three operations, viz. insertion, deletion
and update
• Database Instance
• When we define the database structure or schema, the database is
empty, i.e., it holds no data yet
• After loading data, the state or snapshot of the database at any given time is
the database instance
• Meta-data or Data Dictionary
• The database schema along with various constraints on the data is stored by
DBMS in a database catalog or dictionary, called meta-data
• Meta-data is data about data.
• Database Engine
• Database engine is the underlying component or set of programs used by a
DBMS to create database and handle various queries for data retrieval and
manipulation.
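The concepts above can be tried out in a short sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database instead of MySQL, purely for convenience; the reduced STUDENT table, the CHECK rule for a 10-digit phone number and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: table name, columns, data types, and a constraint on GPhone
# (exactly 10 characters, none of them a non-digit).
conn.execute("""
    CREATE TABLE STUDENT (
        RollNumber INTEGER PRIMARY KEY,
        SName      TEXT NOT NULL,
        GPhone     TEXT CHECK (length(GPhone) = 10
                               AND GPhone NOT GLOB '*[^0-9]*')
    )
""")

# Right after defining the schema, the database instance is empty.
empty_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]

# Data manipulation: insertion of a valid row.
conn.execute("INSERT INTO STUDENT VALUES (1, 'Asha', '9876543210')")

# A row that violates the constraint is rejected by the database engine.
try:
    conn.execute("INSERT INTO STUDENT VALUES (2, 'Ravi', '12345')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Query: the current instance holds only the valid row.
names = [row[0] for row in conn.execute("SELECT SName FROM STUDENT")]

# Meta-data: SQLite keeps its catalog in the sqlite_master table.
catalog = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(empty_count, rejected, names, catalog)
```

The constraint is declared once in the schema, so every program that inserts data is held to the same rule without repeating the check itself.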
Relational data model
• A data model describes the structure of the database, including how
data are defined and represented, relationships among data, and the
constraints
• The most commonly used data model is Relational Data Model
• Other types of data models include object-oriented data model, entity-
relationship data model, document model and hierarchical data model
• In the relational data model, data is organized into tables (i.e. rows
and columns)
• These tables are also known as relations
• A row in a table represents a relationship among a set of values
• A row of a relation is called a tuple or record
• A column represents a field/attribute of the relation under
which information will be stored
• A column of a relation is called an attribute or field
• Number of columns (attributes) in a relation is called degree
• Number of records in a relation is called cardinality
• It is important to note here that relations in a database are not
independent tables, but are associated with each other
• For example, relation ATTENDANCE has attribute RollNumber which
links it with corresponding student record in relation STUDENT
• Similarly, attribute GUID is placed in the STUDENT table for
extracting guardian details of a particular student
• If linking attributes are not there in appropriate relations, it will not
be possible to keep the database in correct state and retrieve valid
information from the database
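Degree, cardinality and the linking attribute can be sketched as follows, again using Python's sqlite3 module with an in-memory SQLite database as a stand-in for MySQL. The reduced table layouts and the two sample students are assumptions; the sketch assumes ATTENDANCE refers to a student only through RollNumber.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE STUDENT (RollNumber INTEGER PRIMARY KEY, SName TEXT);
    CREATE TABLE ATTENDANCE (
        AttendanceDate   TEXT,
        RollNumber       INTEGER REFERENCES STUDENT(RollNumber),
        AttendanceStatus TEXT
    );
    INSERT INTO STUDENT VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO ATTENDANCE VALUES
        ('2000-01-02', 1, 'P'),
        ('2000-01-02', 2, 'A');
""")

cur = conn.execute("SELECT * FROM ATTENDANCE")
degree = len(cur.description)       # number of attributes (columns)
cardinality = len(cur.fetchall())   # number of tuples (rows)

# RollNumber links each attendance record to its student record.
present = conn.execute("""
    SELECT S.SName
    FROM STUDENT S
    JOIN ATTENDANCE A ON A.RollNumber = S.RollNumber
    WHERE A.AttendanceDate = '2000-01-02' AND A.AttendanceStatus = 'P'
""").fetchall()

print(degree, cardinality, present)
```

Without the RollNumber column in ATTENDANCE, the last query would have no way to tell which student each attendance row belongs to.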
Three Important Properties of a Relation
| 101 | Aaliya |
| 102 | Kritika |
| 103 | Shabbir |
| 104 | Gurpreet |
| 105 | Joseph |
| 106 | Sanya |
| 107 | Vergese |
| 108 | Nachaobi |
| 109 | Daribha |
| 110 | Tanya |
+-------+----------+
10 rows in set (0.00 sec)
(B) Renaming of columns
• To rename a column in the displayed output, give it an alias using the AS keyword
• The following query displays the EName column as Name for all the employees:
mysql> SELECT EName AS Name FROM EMPLOYEE;
+----------+
| Name |
+----------+
| Aaliya |
| Kritika |
| Shabbir |
| Gurpreet |
| Joseph |
| Sanya |
| Vergese |
| Nachaobi |
| Daribha |
| Tanya |
+----------+
10 rows in set (0.00 sec)
• Example: Select names of all employees along with their annual income
(calculated as Salary*12). While displaying the query result, rename the
column EName as Name
mysql> SELECT EName AS Name, Salary*12 FROM EMPLOYEE;
• Observe that in the output, Salary*12 is displayed as the column name
for the Annual Income column. In the output table, we can use alias to
rename that column as Annual Income as shown below:
mysql> SELECT EName AS Name, Salary*12 AS 'Annual Income'
-> FROM EMPLOYEE;
• Note: Annual Income will not be added as a new column in the database
table. It is just for displaying the output of the query
• If an aliased column name contains a space, as in the case of Annual Income, it
should be enclosed in quotes as 'Annual Income'
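The aliasing behaviour can be checked with a small sketch that uses Python's sqlite3 module and an in-memory SQLite database rather than MySQL. The EmpNo column name, the salaries and the department codes are made-up sample values. The alias is written in double quotes here, the standard SQL form for an identifier containing a space.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EmpNo INTEGER, EName TEXT, Salary INTEGER, DeptId TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (101, "Aaliya", 10000, "D02"),    # hypothetical salary and department
    (102, "Kritika", 60000, "D01"),
])

cur = conn.execute(
    'SELECT EName AS Name, Salary*12 AS "Annual Income" FROM EMPLOYEE'
)
headers = [col[0] for col in cur.description]   # column names in the output
rows = cur.fetchall()

# The table itself is unchanged: the alias exists only in the query result.
stored = [info[1] for info in conn.execute("PRAGMA table_info(EMPLOYEE)")]

print(headers, rows, stored)
```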
(C) DISTINCT clause
• The SELECT statement when combined with DISTINCT clause, returns
records without repetition (distinct records)
• For example, while retrieving a department number from employee
relation, there can be duplicate values as many employees are
assigned to the same department
• To select unique department number for all the employees, we use
DISTINCT as shown below
mysql> SELECT DISTINCT DeptId FROM EMPLOYEE;
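The effect of DISTINCT can be seen by running the query with and without it. This is a sketch using Python's sqlite3 module with an in-memory SQLite database in place of MySQL; the EmpNo column name and the department assignments are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (EmpNo INTEGER, EName TEXT, DeptId TEXT)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?)", [
    (101, "Aaliya", "D02"),     # hypothetical department assignments
    (102, "Kritika", "D01"),
    (103, "Shabbir", "D01"),
    (104, "Gurpreet", "D02"),
])

# Without DISTINCT: one value per employee, so departments repeat.
all_depts = [r[0] for r in conn.execute("SELECT DeptId FROM EMPLOYEE")]

# With DISTINCT: each department number appears once.
distinct_depts = [
    r[0] for r in conn.execute("SELECT DISTINCT DeptId FROM EMPLOYEE")
]

print(all_depts, distinct_depts)
```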
(D) WHERE clause
• The WHERE clause is used to retrieve data that meet some specified conditions
• Following query gives distinct salaries of the employees working in the
department number D01:
mysql> SELECT DISTINCT Salary
-> FROM EMPLOYEE
-> WHERE DeptId='D01';
• Relational operators (=, <, <=, >, >=, !=) can be used to specify such conditions
• The logical operators AND, OR, and NOT are used to combine multiple
conditions
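A minimal sketch of these operators, using Python's sqlite3 module with an in-memory SQLite database instead of MySQL. The EmpNo column name, salaries and department codes are hypothetical values chosen only to make the conditions visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE EMPLOYEE (EmpNo INTEGER, EName TEXT, Salary INTEGER, DeptId TEXT)"
)
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    (101, "Aaliya", 10000, "D02"),    # hypothetical sample rows
    (102, "Kritika", 60000, "D01"),
    (103, "Shabbir", 45000, "D01"),
    (104, "Gurpreet", 4000, "D03"),
    (105, "Joseph", 34000, "D03"),
])

def names(where):
    """Return the sorted EName values of rows matching a WHERE condition."""
    query = "SELECT EName FROM EMPLOYEE WHERE " + where
    return sorted(r[0] for r in conn.execute(query))

high_paid = names("Salary >= 45000")                    # relational operator
d03_over_5000 = names("Salary > 5000 AND DeptId = 'D03'")   # both must hold
d02_or_low = names("DeptId = 'D02' OR Salary < 5000")       # either may hold
not_d01 = names("NOT (DeptId = 'D01')")                 # condition negated

print(high_paid, d03_over_5000, d02_or_low, not_d01)
```

The helper `names` builds the query by string concatenation only because the conditions here are fixed literals; values supplied by users should be passed as query parameters instead.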
• Examples:
• Display all the details of those employees of D03 department who
earn more than 5000
mysql> SELECT * FROM EMPLOYEE
-> WHERE Salary > 5000 AND DeptId = 'D03';
• The following query selects records of all the employees except
Aaliya.