Introduction to Database Systems
⮚ View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. That is, the system hides certain details of how the data are
stored and maintained.
1. Data Abstraction
Because many database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users' interactions with the
system:
• Physical level. The lowest level of data abstraction, which describes how the data are actually
stored.
• Logical level. The next-higher level of abstraction that describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. The logical level of
abstraction is used by database administrators, who must decide what information is to be
kept in the database.
• View level. The highest level of data abstraction, which describes only part of the
entire database. Many users of the database system do not need all the information; instead,
they need to access only a part of the database. The system may provide many views for the
same database. Views also provide a security mechanism to prevent some users from
accessing parts of the database.
For example, clerks in the university registrar's office can see only the part of the database
that has information about students; they cannot access information about the salaries of
instructors.
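The registrar example can be sketched with a SQL view. This is a minimal illustration using Python's built-in sqlite3 module; the table names, column names and the student row are assumptions made for the example. SQLite has no user accounts, so the sketch only shows what the view exposes; in a multi-user DBMS the clerks would be granted access to the view and not to the underlying tables.

```python
import sqlite3

# Hypothetical miniature university database (names are illustrative).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT);
    CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT,
                             dept_name TEXT, salary REAL);
    INSERT INTO student VALUES (1, 'Shankar', 'Comp. Sci.');
    INSERT INTO instructor VALUES (22222, 'Einstein', 'Physics', 95000);

    -- View level: the registrar's clerks see student data only.
    CREATE VIEW registrar_view AS
        SELECT id, name, dept_name FROM student;
""")

cursor = conn.execute("SELECT * FROM registrar_view")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)  # no salary column is visible through the view
print(view_rows)
```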
2. Instances and Schemas
Databases change over time as information is inserted and deleted.
▪ Instance - The collection of information stored in the database at a particular moment
is called an instance of the database.
▪ Schema - The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all. A database schema corresponds to the
variable declarations (along with associated type definitions) in a program. Each
variable has a particular value at a given instant. The values of the variables in a
program at a point in time correspond to an instance of a database schema. Database
systems have several schemas, partitioned according to the levels of abstraction.
▪ Physical schema - The physical schema describes the database design at the physical
level.
▪ Logical schema - The logical schema describes the database design at the logical level.
▪ View Schema - A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of the database.
▪ Data Independence - Data independence is the property of a DBMS that lets you change
the database schema at one level of the system without having to change the schema
at the next higher level. It keeps the data separate from the programs that make use
of it.
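The analogy between a schema and the variable declarations of a program can be made concrete. In this minimal sketch the class declaration plays the role of the schema, and the lists of objects are two instances of the database at different moments. The class name, field names and the second instructor are assumptions chosen for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class Instructor:
    # The "schema": field names and types, which change rarely.
    id: int
    name: str
    salary: float


# Two "instances": the stored contents at two different points in time.
instance_monday = [Instructor(22222, "Einstein", 95000.0)]
instance_friday = instance_monday + [Instructor(10101, "Srinivasan", 65000.0)]

# The schema is the same for both instances.
schema_fields = [f.name for f in fields(Instructor)]
print(schema_fields, len(instance_monday), len(instance_friday))
```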
Two kinds of data independence arise from these levels of
abstraction:
1. Physical level data independence
2. Logical level data independence
Physical Level Data Independence - The ability to modify the physical schema without
altering the conceptual (logical) schema. Such modifications are usually made for
optimization purposes. E.g., the conceptual structure of the database is not affected
by a change in the storage size of the database system server.
Logical Level Data Independence - The ability to modify the logical schema without
affecting the external schema or the application programs. The user's view of the data
is not affected by changes to the conceptual view of the data. Such changes may include
inserting or deleting attributes, or altering the table structures, entities or
relationships in the logical schema.
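Both kinds of independence can be observed in a small experiment. The sketch below uses Python's sqlite3 module with hypothetical table, view and index names. The "application" queries only a view, so adding an index (a physical change) or adding a column to the base table (a logical change) leaves its results unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, salary REAL);
    INSERT INTO instructor VALUES (22222, 'Einstein', 95000);
    CREATE VIEW instructor_names AS SELECT id, name FROM instructor;
""")


def app_query(connection):
    """The 'application program': it knows only the view."""
    return connection.execute(
        "SELECT id, name FROM instructor_names ORDER BY id"
    ).fetchall()


before = app_query(conn)

# Physical-level change: add an index (an access-path decision).
conn.execute("CREATE INDEX idx_instructor_name ON instructor(name)")
after_physical_change = app_query(conn)

# Logical-level change: add an attribute to the base table.
conn.execute("ALTER TABLE instructor ADD COLUMN office TEXT")
after_logical_change = app_query(conn)

print(before == after_physical_change == after_logical_change)
```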
3. Data Models - A data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints. A data model provides a way to
describe the design of a database at the physical, logical, and view levels.
• Relational Model. The relational model uses a collection of tables to represent both data
and the relationships among those data. Each table has multiple columns, and each column
has a unique name. Tables are also known as relations. The relational model is an example of
a record-based model.
Record-based models are so named because the database is structured in fixed-format records
of several types. Each table contains records of a particular type. Each record type defines a
fixed number of fields, or attributes. The columns of the table correspond to the attributes of
the record type.
• Entity-Relationship Model - The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects. An entity is a “thing” or
“object” in the real world that is distinguishable from other objects.
• Object-Based Data Model - Object-oriented programming (especially in Java, C++, or C#)
has become the dominant software-development methodology. This led to the development of
an object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.
• Semi-structured Data Model - The semi-structured data model permits the specification of
data where individual data items of the same type may have different sets of attributes. This is
in contrast to the data models mentioned earlier, where every data item of a particular type
must have the same set of attributes. The Extensible Markup Language (XML) is widely
used to represent semi-structured data.
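As a small illustration of semi-structured data, the XML fragment below holds two items of the same type that carry different attribute sets. The element names and the second instructor are invented for the example; the parsing uses Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Two <instructor> items of the same type with different sets of attributes.
xml_text = """
<instructors>
  <instructor><name>Einstein</name><dept>Physics</dept><salary>95000</salary></instructor>
  <instructor><name>Curie</name><dept>Physics</dept><email>curie@example.org</email></instructor>
</instructors>
"""

root = ET.fromstring(xml_text)

# Collect the attribute (child element) names of each item.
attribute_sets = [sorted(child.tag for child in item) for item in root]
print(attribute_sets)
```

In a relational table both rows would need the same columns, with NULLs for the missing values.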
⮚ Relational Databases
A relational database is based on the relational model. It uses a collection of tables to
represent both data and the relationships among those data. It also includes a DML and DDL.
1. Tables - Each table has multiple columns, and each column has a unique name.
The figure below represents a sample relational database comprising two tables: one shows
details of university instructors and the other shows details of the various university
departments. The first table, the instructor table, shows, for example, that an instructor
named Einstein with ID 22222 is a member of the Physics department and has an
annual salary of $95,000. The second table, department, shows, for example, that the
Physics department is located in the Watson building and has a budget of $90,000.
Because the department name appears in both tables, the two tables are related to each other.
● The relational model is an example of a record-based model.
● Record-based models are so named because the database is structured in fixed-format
records of several types.
● Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of
the record type.
● The relational model hides low-level implementation details from database developers
and users.
● It is possible to create schemas in the relational model that have problems such as
unnecessarily duplicated information. For example, suppose we store the department
budget as an attribute of the instructor record. Then, whenever the value of a particular
budget (say, the one for the Physics department) changes, that change must be
reflected in the records of all instructors associated with the Physics department.
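The cost of duplicated information can be seen by counting how many rows a budget change touches. The sketch below compares a design that repeats the budget in every instructor row with one that stores it once per department. The table names and the second instructor are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Problematic design: the department budget is repeated per instructor.
    CREATE TABLE instructor_flat (id INTEGER PRIMARY KEY, name TEXT,
                                  dept_name TEXT, budget REAL);
    INSERT INTO instructor_flat VALUES
        (22222, 'Einstein', 'Physics', 90000),
        (33456, 'Gold',     'Physics', 90000);

    -- Alternative design: the budget is stored once, in department.
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, building TEXT,
                             budget REAL);
    INSERT INTO department VALUES ('Physics', 'Watson', 90000);
""")

flat = conn.execute(
    "UPDATE instructor_flat SET budget = 100000 WHERE dept_name = 'Physics'")
rows_touched_flat = flat.rowcount          # one row per Physics instructor

separate = conn.execute(
    "UPDATE department SET budget = 100000 WHERE dept_name = 'Physics'")
rows_touched_separate = separate.rowcount  # a single department row

print(rows_touched_flat, rows_touched_separate)
```

If any one of the repeated rows were missed, the database would hold two different budgets for the same department.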
⮚ Database Architecture
Database System Structure: The architecture of a database system is greatly influenced
by the underlying computer system on which the database system runs. A database
architecture can be single-tier or multi-tier (centralized or client-server).
Logically, a multi-tier database architecture is usually one of two types: 2-tier or
3-tier.
● Most users of a database system today are not present at the site of the database system,
but connect to it through a network.
● In a 1-tier architecture, the database is directly available to the user, who works
directly on the DBMS.
● Any changes made here are applied directly to the database itself. This architecture
is used for developing local applications, where programmers can communicate directly
with the database for a quick response.
● In a 2-tier architecture, the basic client-server architecture, applications on the client
end communicate directly with the database on the server side. For this interaction,
APIs such as ODBC and JDBC are used.
● The basic client/server architecture is used to deal with a large number of PCs, web
servers, database servers and other components that are connected with networks.
● The client/server architecture consists of many PCs and a workstation which are
connected via the network.
● The user interfaces and application programs are run on the client-side.
● The server side is responsible for providing functionality such as query processing and
transaction management.
● To communicate with the DBMS, the client-side application establishes a connection with
the server side.
● In contrast, in a three-tier architecture, the client machine acts as merely a front end and
does not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface.
● The application server in turn communicates with a database system to access data. The
business logic of the application, which says what actions to carry out under what
conditions, is embedded in the application server, instead of being distributed across
multiple clients.
● Three-tier applications are more appropriate for large applications, and for applications
that run on the World Wide Web.
⮚ Transaction Management
1. Atomicity:
● Several operations on the database form a single logical unit of work. Consider an
example of funds transfer, in which one department account (say A) is debited and
another department account (say B) is credited. It is essential that either both the credit
and debit occur, or that neither occur. This all-or-none requirement is called atomicity.
In the absence of failures, all transactions complete successfully, and atomicity is
achieved easily.
2. Consistency:
● In addition, it is essential that the execution of the funds transfer preserve the
consistency of the database. That is, the value of the sum of the balances of A and B
must be preserved. This correctness requirement is called consistency.
3. Durability:
● Finally, after the successful execution of a funds transfer, the new values of the
balances of accounts A and B must persist, despite the possibility of system failure.
This persistence requirement is called durability.
4. Recovery Manager: Ensuring the atomicity and durability properties is the
responsibility of the database system itself; specifically, of the recovery manager.
5. Failure recovery:
● Because of various types of failure, a transaction may not always complete its
execution successfully. If we are to ensure the atomicity property, a failed transaction
must have no effect on the state of the database.
● The database must be restored to the state in which it was before the transaction in
question started executing. The database system must therefore perform failure
recovery, that is, detect system failures and restore the database to the state that existed
prior to the occurrence of the failure.
6. Concurrency-control manager:
● When several transactions update the database concurrently, the consistency of data
may no longer be preserved, even though each individual transaction is correct. It is the
responsibility of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database. The transaction
manager consists of the concurrency-control manager and the recovery manager.
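The funds-transfer example can be sketched with Python's sqlite3 module. The account table, the non-negative balance rule and the function names are assumptions made for illustration. The transfer credits B first and then debits A, so that a failed debit shows the earlier credit being undone: either both updates take effect or neither does, and the sum of the balances is preserved.

```python
import sqlite3


def make_db():
    """Create an in-memory database with accounts A and B."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, "
                 "balance REAL CHECK (balance >= 0))")
    conn.executemany("INSERT INTO account VALUES (?, ?)",
                     [("A", 500.0), ("B", 200.0)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit; undo both on any failure."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back.
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account name."""
    return dict(conn.execute("SELECT name, balance FROM account"))
```

A call such as transfer(conn, "A", "B", 1000) on the data above fails at the debit step, and balances(conn) afterwards still reports the values from before the call.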
2. DATA MODELS
❖ TYPES OF RELATIONSHIPS
● One-to-one:
A single record in one table is related to a single record in another table.
E.g., one department can have only one manager; each person has one passport, and each
passport is assigned to one person.
● One-to-many:
A single record in one table can be related to multiple records in another table.
E.g., one department may have many employees, but each employee belongs to only one
department.
● Many-to-one:
Multiple records in one table are related to a single record in another table.
E.g., many students may be assigned to one professor, but each student has only one
professor.
● Many-to-many:
Multiple records in one table are related to multiple records in another table.
E.g., students can enroll in multiple courses, and each course can have multiple students
enrolled.
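One common way to realize these relationship types in relational tables is sketched below with Python's sqlite3 module. All table names, column names and sample rows are hypothetical. A UNIQUE foreign key gives one-to-one, a plain foreign key gives one-to-many (many-to-one when read from the other side), and a junction table gives many-to-many.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-one: UNIQUE allows at most one passport per person.
    CREATE TABLE passport (
        passport_no TEXT PRIMARY KEY,
        person_id INTEGER UNIQUE REFERENCES person(person_id));

    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    -- One-to-many: many employee rows may share one dept_id.
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY, name TEXT,
        dept_id INTEGER REFERENCES department(dept_id));

    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT);
    -- Many-to-many: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id));
""")

conn.execute("INSERT INTO person VALUES (1, 'Asha')")
conn.execute("INSERT INTO passport VALUES ('P100', 1)")


def second_passport_allowed():
    """Try to give person 1 a second passport; the UNIQUE rule rejects it."""
    try:
        conn.execute("INSERT INTO passport VALUES ('P200', 1)")
        return True
    except sqlite3.IntegrityError:
        return False


conn.executemany("INSERT INTO student VALUES (?, ?)",
                 [(1, "Ravi"), (2, "Meera")])
conn.executemany("INSERT INTO course VALUES (?, ?)",
                 [(10, "Databases"), (20, "Networks")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)",
                 [(1, 10), (1, 20), (2, 10)])

courses_of_student_1 = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1").fetchone()[0]
students_in_course_10 = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_id = 10").fetchone()[0]
```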
4. Constraints:
● Constraints are conditions applied to the data.
● They provide data integrity.
Example: The rule that a student can borrow a maximum of 2 books from the library is
applied as a constraint on the student database.
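The library rule can be enforced by the database itself. The sketch below is one possible approach, using a SQLite trigger through Python's sqlite3 module; the loan table, the trigger name and the borrow helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE loan (student_id INTEGER, book_id INTEGER,
                       PRIMARY KEY (student_id, book_id));

    -- Reject an insert when the student already holds two books.
    CREATE TRIGGER max_two_books
    BEFORE INSERT ON loan
    WHEN (SELECT COUNT(*) FROM loan
          WHERE student_id = NEW.student_id) >= 2
    BEGIN
        SELECT RAISE(ABORT, 'a student may borrow at most 2 books');
    END;
""")


def borrow(student_id, book_id):
    """Record a loan; return False if the constraint rejects it."""
    try:
        conn.execute("INSERT INTO loan VALUES (?, ?)", (student_id, book_id))
        return True
    except sqlite3.Error:
        return False


results = [borrow(1, 101), borrow(1, 102), borrow(1, 103)]
print(results)  # the third loan for student 1 is refused
```

Because the rule lives in the database, every application that inserts loans is subject to it.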
⮚ Business Rules
● Definition: A business rule is a statement of a discrete operational business policy or
practice within a specific organization that constrains the business.
● It is intended to control or influence the behaviour of the business.
● A database designer relies on concepts such as entities, attributes and
relationships to build a data model, but these alone are not sufficient to describe a
system completely.
● Business rules may define actors and prescribe how they should behave by setting
constraints, and they help to manage business change in the system.
o Characteristics of Business Rules:
1. Atomicity: A rule should define exactly one aspect of the system environment.
E.g., a college should have students in it.
2. Business format: A rule should be expressed in business terms understandable to
business people.
E.g., ER diagram, object diagram, etc.
3. Business ownership: Each rule is governed by a businessperson who is responsible for
verifying it, enforcing it, and monitoring the need for change.
E.g., the end user or customer is responsible for the requirements they submit.
4. Classification: Each rule can be classified by its data and constraints.
5. Business formalism: Each rule can be implemented in the related information system.
Business rules should be consistent and non-redundant.
EXAMPLES:
A student may take admission to a college.
One subject is taught by only one professor.
A class consists of a minimum of 60 and a maximum of 80 students.
o Types of Business Rules:
1. DEFINITIONS:
Define business terms. Definitions are incorporated in the system's data dictionary.
E.g., A professor is someone who teaches students.
2. FACTS:
Connect business terms in ways that make business sense. Facts are implemented as
relationships between various data entities.
E.g., A professor may have students.
3. CONSTRAINTS:
Show how business terms are connected with each other.
Constraints usually state how many of one data entity can be related to another data entity.
E.g., Each professor may teach up to four subjects.
4. DERIVATIONS:
Enable new knowledge or actions. Derivations are often implemented as formulas and
triggers.
E.g., A student's pending fees are the total fees minus the fees paid.
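The constraint and derivation examples translate directly into small checks and formulas. The following is a minimal sketch; the function names and the validation of the paid amount are assumptions made for illustration. Pending fees are computed as total fees minus fees paid, so that the result is never negative.

```python
MAX_SUBJECTS_PER_PROFESSOR = 4  # constraint: up to four subjects each


def can_assign_subject(current_subject_count):
    """Constraint rule: allow a new subject only below the limit."""
    return current_subject_count < MAX_SUBJECTS_PER_PROFESSOR


def pending_fees(total_fees, fees_paid):
    """Derivation rule: pending fees = total fees - fees paid."""
    if fees_paid < 0 or fees_paid > total_fees:
        # Assumed sanity check, not part of the stated rule.
        raise ValueError("fees paid must be between 0 and the total fees")
    return total_fees - fees_paid


print(can_assign_subject(3), can_assign_subject(4))
print(pending_fees(50000, 20000))
```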
⮚ The Evolution of Data Models
Managing data has always been essential, and data models originated to
solve the problems of file systems. The main data models in DBMS are:
1. Hierarchical Model
● In the hierarchical model, a collection of relations forms a hierarchy with a
tree-like structure.
● The relationships are defined in parent-child form.
● One of the first and most popular hierarchical database systems is the Information
Management System (IMS), developed by IBM.
Example
The hierarchy shows that an Employee can be an Intern, on Contract or Full-Time. Sub-levels
show that a Full-Time Employee can be hired as a Writer, Senior Writer or Editor.
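The employee hierarchy can be written down as a nested structure. This is a minimal sketch in Python; the dictionary layout and the path_to helper are illustrative choices, not a description of how IMS stores data. Every node has exactly one parent, so there is exactly one path from the root to any node.

```python
# Each key is a node; its value holds that node's children.
employee_tree = {
    "Employee": {
        "Intern": {},
        "Contract": {},
        "Full-Time": {"Writer": {}, "Senior Writer": {}, "Editor": {}},
    }
}


def path_to(tree, target, prefix=()):
    """Return the root-to-node path for target, or None if it is absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(children, target, path)
        if found:
            return found
    return None


print(path_to(employee_tree, "Editor"))
```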
Advantages
Disadvantages
● Implementation is complex.
● This model suffers from insert, update and delete anomalies.
● Maintenance is difficult, since a change in the database may require
changes to the entire database structure.
2. Network Model
● The hierarchical model creates a tree with parent/child relationships, whereas
the network model uses a graph with links.
● Relationships are defined in the form of links, and the model handles many-to-many
relationships. This means that a record can have more than one parent.
Example
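As a minimal sketch of the idea, with hypothetical record names, the links of a network model can be written as a graph in which one record is reachable from more than one parent:

```python
# Each key is an owner record; its value lists the records it links to.
links = {
    "Department": ["Employee"],
    "Project": ["Employee"],
    "Employee": [],
}


def parents_of(record):
    """Return every record that links to the given record."""
    return sorted(owner for owner, members in links.items()
                  if record in members)


print(parents_of("Employee"))  # more than one parent, unlike a tree
```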
Advantages
Disadvantages
● Pointers bring complexity, since the records are based on pointers and graphs.
● Changes to the database are not easy to make, which makes it hard to achieve
structural independence.
3. Relational Model
● A relational model groups data into one or more tables. These tables are related to each
other through common attributes (columns).
● The data is represented in the form of rows and columns, i.e. tables.
Example
Let us see an example of two relations <Employee> and <Department> linked to each other
by DepartmentID, which is a foreign key of the <Employee> table and the primary key
of the <Department> table.
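The Employee and Department relations can be sketched with Python's sqlite3 module. Only DepartmentID comes from the description above; the other column names and the sample rows are assumptions. The join follows the foreign key, and the database rejects an employee whose DepartmentID has no matching department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Department (
        DepartmentID INTEGER PRIMARY KEY,
        DepartmentName TEXT);
    CREATE TABLE Employee (
        EmployeeID INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        DepartmentID INTEGER REFERENCES Department(DepartmentID));
    INSERT INTO Department VALUES (1, 'Editorial');
    INSERT INTO Employee VALUES (100, 'Amit', 1);
""")

# Follow the foreign key from Employee to Department.
joined = conn.execute("""
    SELECT e.EmployeeName, d.DepartmentName
    FROM Employee AS e
    JOIN Department AS d ON e.DepartmentID = d.DepartmentID
""").fetchall()


def add_employee(emp_id, name, dept_id):
    """Insert an employee; return False if the foreign key is violated."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)",
                     (emp_id, name, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(joined)
```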
Advantages
● A well-designed relational schema avoids the issues seen in the previous two models,
i.e. the update, insert and delete anomalies.
● Changes in the database do not require changes to the complete database.
● A relational model is easy to implement.
● A relational model is easy to maintain.
Disadvantages
● Inefficiencies can stay hidden and surface only when the database holds large volumes of data.
● The overhead of the relational data model may require more powerful
hardware and devices.
4. E-R model
● An E-R model is the logical representation of data as objects and the relationships
among them.
● These objects are known as entities, and a relationship is an association between these
entities.
Advantages
Disadvantages
● The ER model can represent only limited relationships compared to other models, and it is
not possible to indicate primary keys and foreign keys where they are expected.
● ER models can be difficult to modify once they are created. Any changes made to the
model may require extensive rework, which can be time-consuming and expensive.
● ER models do not provide support for business rules, which can make it difficult to
ensure data integrity and enforce constraints.
5. Object-Oriented Model
● The Object-Oriented Model in DBMS (OODM) is the data model in which data is stored
in the form of objects.
● In the Object-Oriented Model, the data and the data relationships are stored together
in a single entity known as an object.
● An Object-Oriented Database Management System is built on top of the Object-Oriented
Model.
● We can use the Object-Oriented Model in DBMS to store real-world entities. Here, we
can store pictures, audio, video, and other types of data that were difficult
to store with the relational approach.
● This model works with object-oriented programming languages such as Python, Java,
VB.NET and Perl.
Advantages
Disadvantages