
Introduction to Database Systems

Uploaded by

sahilcarry42121

MODULE 1

1: INTRODUCTION TO DATABASES AND TRANSACTIONS

⮚ What is a database system.


● A database-management system (DBMS) is a collection of interrelated data and a set of
programs to access those data.
● The collection of data, usually referred to as the database, contains information relevant to
an enterprise.
● The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
● Database systems are designed to manage large bodies of information.
● Management of data involves both defining structures for storage of information and
providing mechanisms for the manipulation of information.
● In addition, the database system must ensure the safety of the information stored, despite
system crashes or attempts at unauthorized access. If data are to be shared among several
users, the system must avoid possible anomalous results.

⮚ Purpose of Database System


Keeping organizational information in a file-processing system has a number of major
disadvantages, which led to the use of DBMSs:
• Data redundancy and inconsistency. Since different programmers create the files and
application programs over a long period, the various files are likely to have different
structures and the programs may be written in several programming languages. Moreover, the
same information may be duplicated in several places (files). For example, if a student has a
double major (say, music and mathematics) the address and telephone number of that
student may appear in a file that consists of student records of students in the Music
department and in a file that consists of student records of students in the Mathematics
department. This redundancy leads to higher storage and access cost. In addition, it may lead
to data inconsistency; that is, the various copies of the same data may no longer agree. For
example, a changed student address may be reflected in the Music department records but not
elsewhere in the system.
• Difficulty in accessing data. Suppose that one of the university clerks needs to find out the
names of all students who live within a particular postal-code area. The clerk asks the
data-processing department to generate such a list. Because the designers of the original
system did not anticipate this request, there is no application program on hand to meet it.
There is, however, an application program to generate the list of all students. The university
clerk now has two choices: either obtain the list of all students and extract the needed
information manually, or ask a programmer to write the necessary application program. Both
alternatives are obviously unsatisfactory. Suppose that such a program is written, and that,
several days later, the same clerk needs to trim that list to include only those students who
have taken at least 60 credit hours. As expected, a program to generate such a list does not
exist. Again, the clerk has the preceding two options, neither of which is satisfactory. The
point here is that conventional file-processing environments do not
allow needed data to be retrieved in a convenient and efficient manner. More responsive
data-retrieval systems are required for general use.
• Data isolation. Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
• Integrity problems. The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department,
and records the balance amount in each account. Suppose also that the university requires that
the account balance of a department may never fall below zero. Developers enforce these
constraints in the system by adding appropriate code in the various application programs.
However, when new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from different
files.
• Atomicity problems. A computer system, like any other device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the consistent
state that existed prior to the failure. Example: Consider a program to transfer Rs 500 from
the account balance of department A to the account balance of department B. If a system
failure occurs during the execution of the program, it is possible that the Rs 500 was removed
from the balance of department A but was not credited to the balance of department B,
resulting in an inconsistent database state. Clearly, it is essential to database consistency that
either both the credit and debit occur, or that neither occur. That is, the funds transfer must be
atomic—it must happen in its entirety or not at all. It is difficult to ensure atomicity in a
conventional file-processing system.
• Concurrent-access anomalies. For the sake of overall performance of the system and faster
response, many systems allow multiple users to update the data simultaneously. Indeed, today,
the largest Internet retailers may have millions of accesses per day to their data by shoppers.
In such an environment, interaction of concurrent updates is possible and may result in
inconsistent data.
• Security problems. Not every user of the database system should be able to access all the
data. For example, in a university, payroll personnel need to see only that part of the database
that has financial information. They do not need access to information about academic
records. But, since application programs are added to the file-processing system in an ad hoc
manner, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems.
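The integrity problem described above (a department balance that may never fall below zero) can be handled by declaring the rule once in the schema instead of coding it into every application program. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS; the table name dept_account, its columns, and the starting balance are hypothetical.

```python
import sqlite3

# In-memory database; dept_account is a hypothetical table used for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE dept_account (
        dept_name TEXT PRIMARY KEY,
        balance   INTEGER NOT NULL CHECK (balance >= 0)  -- rule declared once, in the schema
    )
    """
)
conn.execute("INSERT INTO dept_account VALUES ('Music', 1000)")
conn.commit()


def withdraw(dept_name, amount):
    """Try to debit an account; return True if the DBMS accepted the update."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE dept_account SET balance = balance - ? WHERE dept_name = ?",
                (amount, dept_name),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an update that would make the balance negative.
        return False


def balance(dept_name):
    """Return the current balance of one department account."""
    row = conn.execute(
        "SELECT balance FROM dept_account WHERE dept_name = ?", (dept_name,)
    ).fetchone()
    return row[0]
```

Any program that updates the table is subject to the same check, so adding a new constraint means changing the schema rather than every program.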

⮚ View of Data
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with
an abstract view of the data. That is, the system hides certain details of how the data are
stored and maintained.
1. Data Abstraction
Since many database-system users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users' interactions with the
system:
• Physical level. The lowest level of data abstraction, which describes how the data are actually
stored.

• Logical level. The next-higher level of abstraction that describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. The logical level of
abstraction is used by database administrators, who must decide what information is to be
kept in the database.

• View level. The highest level of data abstraction, which describes only part of the
entire database. Many users of the database system do not need all the information; instead,
they need to access only a part of the database. The system may provide many views for the
same database. Views also provide a security mechanism that prevents some users from
accessing parts of the database.
For example, clerks in the university registrar's office can see only that part of the database
that has information about students; they cannot access information about salaries of
instructors.
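A view of this kind can be sketched in SQL. The example below is an illustration only, assuming Python's built-in sqlite3 module; the view name instructor_public is hypothetical, and the instructor columns (ID, name, dept_name, salary) are assumed from the sample data used in these notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT,
        salary    INTEGER
    );
    INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', 95000);

    -- A view that exposes every column except salary.
    CREATE VIEW instructor_public AS
        SELECT ID, name, dept_name FROM instructor;
    """
)

# Users of the view see only the columns the view defines.
cursor = conn.execute("SELECT * FROM instructor_public")
view_columns = [col[0] for col in cursor.description]
view_rows = cursor.fetchall()
```

SQLite has no user permissions, so this sketch shows only the restricted shape of the data; in a multi-user DBMS, access would typically be granted on the view and not on the underlying table.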
2. Instances and Schemas
Databases change over time as information is inserted and deleted.
▪ Instance - The collection of information stored in the database at a particular moment
is called an instance of the database.
▪ Schema - The overall design of the database is called the database schema.
Schemas are changed infrequently, if at all. A database schema corresponds to the
variable declarations (along with associated type definitions) in a program. Each
variable has a particular value at a given instant. The values of the variables in a
program at a point in time correspond to an instance of a database schema. Database
systems have several schemas, partitioned according to the levels of abstraction.
▪ Physical schema - The physical schema describes the database design at the physical
level.
▪ Logical schema- The logical schema describes the database design at the logical level.
▪ View Schema - A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of the database.
▪ Data Independence - Data independence is the property of a DBMS that lets you change
the database schema at one level of the system without having to change the schema at the
next higher level. It keeps the data separate from the programs that use it.
Two levels of data independence arise from these levels of abstraction:
1. Physical level data independence
2. Logical level data independence
Physical Level Data Independence - The ability to modify the physical schema without
altering the conceptual (logical) schema. Such changes are usually made for optimization,
e.g., the conceptual structure of the database is not affected by a change in the storage
size of the database server.

Logical Level Data Independence - The ability to modify the logical schema without
affecting the external schema or the application programs. The user view of the data is not
affected by changes to the conceptual view of the data. These changes may include inserting
or deleting attributes, or altering table structures, entities or relationships in the
logical schema.
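Both kinds of data independence can be observed on a small scale. The sketch below assumes Python's built-in sqlite3 module; the student table, the student_names view, the index name, and the sample row are hypothetical. An index is added (a physical-level change) and a column is added (a logical-level change), and the program that reads through the view gets the same answer before and after.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (ID TEXT PRIMARY KEY, name TEXT);
    INSERT INTO student VALUES ('S1', 'Asha');
    -- External (view-level) schema used by the application program.
    CREATE VIEW student_names AS SELECT ID, name FROM student;
    """
)


def read_view():
    """The 'application program': it knows only the view, not the base table."""
    return conn.execute("SELECT ID, name FROM student_names ORDER BY ID").fetchall()


before = read_view()

# Physical-level change: a new index alters how data is accessed, not what is stored.
conn.execute("CREATE INDEX idx_student_name ON student(name)")

# Logical-level change: a new attribute is added to the base table.
conn.execute("ALTER TABLE student ADD COLUMN address TEXT")

after = read_view()
```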
3. Data Models – A data model is a collection of conceptual tools for describing data, data
relationships, data semantics, and consistency constraints. A data model provides a way to
describe the design of a database at the physical, logical, and view levels.

Data models can be classified into four categories:

• Relational Model. The relational model uses a collection of tables to represent both data
and the relationships among those data. Each table has multiple columns, and each column
has a unique name. Tables are also known as relations. The relational model is an example of
a record-based model.
Record-based models are so named because the database is structured in fixed-format records
of several types. Each table contains records of a particular type. Each record type defines a
fixed number of fields, or attributes. The columns of the table correspond to the attributes of
the record type.
• Entity-Relationship Model - The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects. An entity is a “thing” or
“object” in the real world that is distinguishable from other objects.
• Object-Based Data Model - Object-oriented programming (especially in Java, C++, or C#)
has become the dominant software-development methodology. This led to the development of
an object-oriented data model that can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and relational data model.
• Semistructured Data Model - The semistructured data model permits the specification of
data where individual data items of the same type may have different sets of attributes. This is
in contrast to the data models mentioned earlier, where every data item of a particular type
must have the same set of attributes. The Extensible Markup Language (XML) is widely
used to represent semistructured data.
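To illustrate semistructured data, the sketch below parses a small XML fragment with Python's standard xml.etree.ElementTree module. The element names and the sample values are hypothetical; the point is that two items of the same type (student) carry different sets of attributes.

```python
import xml.etree.ElementTree as ET

# Two <student> items of the same type with different sets of fields.
xml_text = """
<students>
  <student>
    <name>Asha</name>
    <phone>555-0100</phone>
  </student>
  <student>
    <name>Ravi</name>
    <email>ravi@example.com</email>
    <major>Music</major>
  </student>
</students>
""".strip()

root = ET.fromstring(xml_text)

# Map each student's name to the sorted list of fields that record actually carries.
fields_by_student = {
    s.findtext("name"): sorted(child.tag for child in s)
    for s in root.findall("student")
}
```

A relational table would force both students into the same set of columns; here each record simply lists the fields it has.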

⮚ Relational Databases
A relational database is based on the relational model. It uses a collection of tables to
represent both data and the relationships among those data. It also includes a DML and a DDL.
1. Tables - Each table has multiple columns and each column has a unique name.
The sample relational database in the figure comprises two tables: one shows
details of university instructors and the other shows details of the various university
departments. The first table, the instructor table, shows, for example, that an instructor
named Einstein with ID 22222 is a member of the Physics department and has an
annual salary of $95,000. The second table, department, shows, for example, that the
Physics department is located in the Watson building and has a budget of $90,000. The
department name appears in both tables, which is how the two tables are related to each other.
● The relational model is an example of a record-based model.
● Record-based models are so named because the database is structured in fixed-format
records of several types.
● Each table contains records of a particular type. Each record type defines a fixed
number of fields, or attributes. The columns of the table correspond to the attributes of
the record type.
● The relational model hides low-level implementation details from database developers
and users.
● It is possible to create schemas in the relational model that have problems such as
unnecessarily duplicated information. For example, suppose we store the department
budget as an attribute of the instructor record. Then, whenever the value of a particular
budget (say, the one for the Physics department) changes, that change must be
reflected in the records of all instructors associated with the Physics department.

✔ Data Manipulation Language


● The SQL query language is nonprocedural. A query takes as input several tables
(possibly only one) and always returns a single table. Here is an example of an SQL
query that finds the names of all instructors in the History department:
select instructor.name
from instructor
where instructor.dept_name = 'History';
● The query specifies that those rows from the table instructor where the dept_name is
History must be retrieved, and the name attribute of these rows must be displayed.
✔ Data Definition Language
● SQL provides a rich DDL that allows one to define tables, integrity constraints,
assertions, etc.
● For instance, the following SQL DDL statement defines the department table:
create table department
(dept_name char(20),
building char(15),
budget numeric(12,2));
● Execution of the above DDL statement creates the department table with three
columns: dept_name, building, and budget, each of which has a specific data type
associated with it.
● In addition, the DDL statement updates the data dictionary, which contains
metadata. The schema of a table is an example of metadata.
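The effect of a DDL statement on the data dictionary can be observed directly. The sketch below assumes Python's built-in sqlite3 module, where the catalog table sqlite_master and the PRAGMA table_info command play the role of the data dictionary; other database systems expose their metadata through different catalogs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The DDL statement for the department table, as given in the notes.
conn.execute(
    """
    CREATE TABLE department (
        dept_name CHAR(20),
        building  CHAR(15),
        budget    NUMERIC(12,2)
    )
    """
)

# The catalog now lists the new table: this is metadata, not data.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(department)")]
column_names = [name for name, _declared_type in columns]
```

No rows have been inserted into department at this point; everything the queries return describes the schema alone.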

✔ Database Access from Application Programs


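● Application programs are written in a host language such as C, C++, or Java. To access
the database, DML statements need to be executed from the host language. There are two ways to
do this:
● By providing an application program interface (a set of procedures) that can be used
to send DML and DDL statements to the database and retrieve the results.
● By extending the host language syntax to embed DML calls within the host language
program; a preprocessor, called the DML precompiler, converts the embedded statements
to normal procedure calls in the host language.
● The Open Database Connectivity (ODBC) standard for use with the C language is a
commonly used application program interface standard.
● The Java Database Connectivity (JDBC) standard provides corresponding features
for the Java language.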
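The application-program-interface approach looks much the same in any host language: open a connection, send a statement, retrieve the results. The sketch below uses Python's built-in sqlite3 module as an analogue of ODBC or JDBC, not as either of those standards. The Einstein row comes from the notes; the two History instructors and the helper name instructors_in are hypothetical.

```python
import sqlite3

# 1. Open a connection to the database (in-memory here).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER);
    INSERT INTO instructor VALUES ('22222', 'Einstein', 'Physics', 95000);
    -- Hypothetical rows added so the History query returns something.
    INSERT INTO instructor VALUES ('10001', 'Rao',   'History', 60000);
    INSERT INTO instructor VALUES ('10002', 'Mehta', 'History', 62000);
    """
)


def instructors_in(dept_name):
    """Send a DML query through the API and return the instructor names."""
    # 2. Send the statement; '?' is a placeholder filled in by the API,
    #    which avoids building SQL text by string concatenation.
    cur = conn.execute(
        "SELECT instructor.name FROM instructor "
        "WHERE instructor.dept_name = ? ORDER BY instructor.name",
        (dept_name,),
    )
    # 3. Retrieve the results as host-language values.
    return [row[0] for row in cur.fetchall()]


history_names = instructors_in("History")
```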

⮚ Database Architecture
Database System Structure: The architecture of a database system is greatly influenced
by the underlying computer system on which the database system runs. Database
architecture can be single-tier or multi-tier (centralized or client-server). Logically,
multi-tier database architecture is of two types: 2-tier architecture and 3-tier
architecture.
● Most users of a database system today are not present at the site of the database system,
but connect to it through a network.
● In 1-tier architecture, the database is directly available to the user. It means the user
works directly on the DBMS.
● Any changes done here are made directly on the database itself. It is used for
development of local applications, where programmers can communicate directly
with the database for a quick response.
● In 2-tier architecture, or the basic client-server architecture, applications on the client
end communicate directly with the database at the server side. For this interaction,
APIs such as ODBC and JDBC are used.
● The basic client/server architecture is used to deal with a large number of PCs, web
servers, database servers and other components that are connected by networks.
● The client/server architecture consists of many PCs and a workstation that are
connected via the network.
● The user interfaces and application programs run on the client side.
● The server side is responsible for providing functionality such as query processing and
transaction management.
● To communicate with the DBMS, the client-side application establishes a connection with
the server side.
● In contrast, in a three-tier architecture, the client machine acts as merely a front end and
does not contain any direct database calls. Instead, the client end communicates with an
application server, usually through a forms interface.
● The application server in turn communicates with a database system to access data. The
business logic of the application, which says what actions to carry out under what
conditions, is embedded in the application server, instead of being distributed across
multiple clients.
● Three-tier applications are more appropriate for large applications, and for applications
that run on the World Wide Web.
⮚ Transaction Management
1. Atomicity:
● Several operations on the database form a single logical unit of work. Consider an
example of funds transfer, in which one department account (say A) is debited and
another department account (say B) is credited. It is essential that either both the credit
and debit occur, or that neither occur. This all-or-none requirement is called atomicity.
In the absence of failures, all transactions complete successfully, and atomicity is
achieved easily.
2. Consistency:
● In addition, it is essential that the execution of the funds transfer preserve the
consistency of the database. That is, the value of the sum of the balances of A and B
must be preserved. This correctness requirement is called consistency.
3. Durability:
● Finally, after the successful execution of a funds transfer, the new values of the
balances of accounts A and B must persist, despite the possibility of system failure.
This persistence requirement is called durability.
4. Recovery Manager: Ensuring the atomicity and durability properties is the
responsibility of the database system itself; specifically, of the recovery manager.
5. Failure recovery:
● Because of various types of failure, a transaction may not always complete its
execution successfully. If we are to ensure the atomicity property, a failed transaction
must have no effect on the state of the database.
● The database must be restored to the state in which it was before the transaction in
question started executing. The database system must therefore perform failure
recovery, that is, detect system failures and restore the database to the state that existed
prior to the occurrence of the failure.
6. Concurrency-control manager:
● When several transactions update the database concurrently, the consistency of data
may no longer be preserved, even though each individual transaction is correct. It is the
responsibility of the concurrency-control manager to control the interaction among the
concurrent transactions, to ensure the consistency of the database. The transaction
manager consists of the concurrency-control manager and the recovery manager.
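The funds-transfer example can be sketched as a single transaction. The code below assumes Python's built-in sqlite3 module; the account table, the starting balances, and the fail_midway flag (which simulates a failure between the debit and the credit) are hypothetical. It illustrates atomicity and consistency only: an in-memory database cannot demonstrate durability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE account (
        dept    TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    );
    INSERT INTO account VALUES ('A', 1000);
    INSERT INTO account VALUES ('B', 200);
    """
)


def transfer(src, dst, amount, fail_midway=False):
    """Debit src and credit dst as one unit of work; return True on commit."""
    try:
        with conn:  # commit if the block finishes, roll back if it raises
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE dept = ?",
                (amount, src),
            )
            if fail_midway:
                # Simulated failure after the debit but before the credit.
                raise RuntimeError("simulated crash between debit and credit")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE dept = ?",
                (amount, dst),
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        # The rollback has already undone the partial debit.
        return False


def balances():
    """Return the current balances as a {dept: balance} dictionary."""
    return dict(conn.execute("SELECT dept, balance FROM account"))
```

After a failed transfer the balances are exactly what they were before it started, and after a successful one the sum of the two balances is unchanged.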
2: DATA MODELS

⮚ The importance of Data Models


● A data model is a structure that captures all the required details of the data,
such as the name of the data, the size of the data, its relationships with other data, and the
constraints applied to the data.
● In other words, a data model is an overview of a software system that describes how data
can be represented and accessed once the system is fully implemented.
Data models define data elements and the relationships among them for a
specified system.
● A data model gives an idea of how the final system or software will look when
development is completed.
● This concept is like real-world modelling, in which, before constructing any project
(buildings, bridges, towers), engineers create a model of it that gives an idea of how the
project will look after construction.
Importance of Data Models:
● A data model is a set of concepts that can be used to describe the structure of data in a
database.
● Data models are used to support the development of information systems by providing
the definition and format of the data to be involved in the future system.
● A data model acts as a guideline for development and also gives an idea about possible
alternatives to achieve the targeted solution.
● A data model can sometimes be referred to as a data structure, especially in the context of
programming languages.
Advantages of Data Models:
● A data model reduces the risk of future failure by defining the structure of the
data in advance.
● Because we get an idea of the final system at the beginning of development, we can
reduce the cost of the project through proper planning and cost estimation before the actual
system is developed.
● Data repetition and data type incompatibilities can be detected and removed with the help of
a data model.
● We can improve the Graphical User Interface (GUI) of the system by modelling it and getting
the model approved by its future users, so that the system is simple for them to operate and
the entire system is effective.

⮚ Basic Building blocks


● The basic building blocks of any data model are entities, attributes, relationships and
constraints.
1. Entity:
● Entities are real-world objects. An entity can be a person, place, object, event or concept.
Entities are represented by a rectangle containing the entity name.
● An entity has its own independent existence in the real world.
E.g.: A Student, Faculty, Subject having independent existence.
● An entity may be an object with a physical existence, or it may have only a logical existence.
E.g.: Entities like Department, Section, Adult (age > 18) may have a physical existence or
only a logical existence.
2. Attributes:
● An attribute is a characteristic of an entity. It is represented by an ellipse
with the attribute name in it.
● A particular entity will have some value for each of its attributes. E.g., an Employee
has attributes name, age, phone, etc.
3. Relationships:
● A relationship describes the association between two or more entities.
● It is represented by a diamond containing the relationship name.
● A data model generally uses three kinds of relationships: one-to-many, many-to-many and
one-to-one.
Example: The relationship between the two entities Student and Class is many-to-many.
● The degree of a relationship is the number of participating entity types in that
relationship.

❖ TYPES OF RELATIONSHIPS
● One-to-one:
A single record in one table is related to a single record in another table.
E.g., one department can have only one manager; each person has one passport, and each
passport is assigned to one person.
● One-to-many:
A single record in one table can be related to multiple records in another table.
E.g., one department may have many employees, but each employee belongs to only one
department.
● Many-to-one:
Multiple records in one table are related to a single record in another table.
E.g., many students may be assigned to one professor, but each student has only one
professor.
● Many-to-many:
Multiple records in one table are related to multiple records in another table.
E.g., students can enroll in multiple courses, and each course can have multiple students
enrolled.
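In a relational database, a many-to-many relationship such as the student and course example is commonly stored in a third table that holds one row per pair. The sketch below assumes Python's built-in sqlite3 module; the table names (student, course, enrollment), the helper functions, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Junction table: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course  VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)


def courses_of(student_name):
    """One student, many courses."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM student s
        JOIN enrollment e ON e.student_id = s.student_id
        JOIN course c     ON c.course_id  = e.course_id
        WHERE s.name = ?
        ORDER BY c.title
        """,
        (student_name,),
    )
    return [title for (title,) in rows]


def students_in(course_title):
    """One course, many students."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM course c
        JOIN enrollment e ON e.course_id  = c.course_id
        JOIN student s    ON s.student_id = e.student_id
        WHERE c.title = ?
        ORDER BY s.name
        """,
        (course_title,),
    )
    return [name for (name,) in rows]
```

A one-to-many relationship needs no junction table: a foreign-key column on the "many" side is enough.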
4. Constraints:
● Constraints are conditions applied to the data.
● They provide data integrity.
Example: "A student can borrow a maximum of 2 books from the library" is applied as a
constraint on the student database.

⮚ Business Rules
● Definition: A business rule is a statement of a discrete operational business policy or
practice within a specific organization that constrains the business.
● It is intended to control or influence the behaviour of the business.
● A database designer uses concepts such as entities, attributes and
relationships to build a data model, but these alone are not sufficient to describe a
system completely.
● Business rules may define actors and prescribe how they should behave by setting
constraints, and they help to manage business change in the system.
o Characteristics of Business Rules:
1. Atomicity: A rule should define exactly one aspect of the system environment.
E.g.: A college should have students in it.
2. Business format: A rule should be expressed in business terms understandable to
business people.
E.g.: ER diagram, object diagram, etc.
3. Business ownership: Each rule is governed by a businessperson who is responsible for
verifying it, enforcing it, and monitoring the need for change.
E.g.: The end user or customer is responsible for the requirements they submit.
4. Classification: Each rule can be classified by its data and constraints.
5. Business formalism: Each rule can be implemented in the related information system.
Business rules should be consistent and non-redundant.
EXAMPLES:
A student may take admission to college
One subject is taught by only one professor
A class consists of minimum 60 and maximum 80 students
o Types of Business Rules:
1. DEFINITIONS:
Define business terms. Definitions are incorporated in the system's data dictionary.
E.g., A professor is someone who teaches students.
2. FACTS:
Connect business terms in ways that make business sense. Facts are implemented as
relationships between various data entities.
E.g., A professor may have students.
3. CONSTRAINTS:
Show how business rules restrict the ways in which business terms are connected with each other.
Constraints usually state how many of one data entity can be related to another data entity.
E.g., Each professor may teach up to four subjects.
4. DERIVATIONS:
Enable new knowledge or actions. Derivations are often implemented as formulas and
triggers.
E.g., A student's pending fees are the total fees minus the fees paid.
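A derivation can be implemented as a formula that the database computes on demand instead of storing. The sketch below assumes Python's built-in sqlite3 module and takes pending fees to mean total fees minus fees paid; the student_fees table, the pending view, and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student_fees (
        student    TEXT PRIMARY KEY,
        total_fees INTEGER NOT NULL,
        fees_paid  INTEGER NOT NULL
    );
    INSERT INTO student_fees VALUES ('Asha', 50000, 30000);
    INSERT INTO student_fees VALUES ('Ravi', 50000, 50000);

    -- Derivation: pending fees are computed from stored values, never stored themselves.
    CREATE VIEW pending AS
        SELECT student, total_fees - fees_paid AS pending_fees
        FROM student_fees;
    """
)

# Read the derived values as a {student: pending_fees} dictionary.
pending_fees = dict(conn.execute("SELECT student, pending_fees FROM pending"))
```

Because the value is derived, it cannot drift out of step with the stored totals and payments.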
⮚ The Evolution of Data Models
– Managing data has always been essential, and data models originated to
solve the problems of file systems. Here are the data models in DBMS –

1. Hierarchical Model
● In the Hierarchical Model, records are organized by a collection of parent-child relations that
form a tree-like structure.
● Each relationship is of the parent-child type.
● One of the first and most popular hierarchical systems is the Information Management
System (IMS), developed by IBM.
Example
The hierarchy shows that an Employee can be an Intern, on Contract or Full-Time. Sub-levels
show that a Full-Time Employee can be hired as a Writer, Senior Writer or Editor:
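The hierarchy just described can be written down as a tree in which every node has exactly one parent. The following is a minimal sketch in Python using nested dictionaries; the helper name path_to is hypothetical, and this is an illustration of the tree shape, not of how a hierarchical DBMS such as IMS stores records.

```python
# Each node maps a name to its children; every child has exactly one parent.
hierarchy = {
    "Employee": {
        "Intern": {},
        "Contract": {},
        "Full-Time": {
            "Writer": {},
            "Senior Writer": {},
            "Editor": {},
        },
    }
}


def path_to(target, tree=hierarchy, prefix=()):
    """Return the root-to-node path for target, or None if it is absent."""
    for name, children in tree.items():
        path = prefix + (name,)
        if name == target:
            return path
        found = path_to(target, children, path)
        if found is not None:
            return found
    return None
```

Reaching any record means walking down from the root along a single path, which is why access in this model is navigational.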

Advantages

● The design of the hierarchical model is simple.
● It provides data integrity, since it is based on the parent/child relationship.
● Data sharing is feasible since the data is stored in a single database.
● The model works well even for large volumes of data.

Disadvantages

● Implementation is complex.
● The model suffers from insert, update and delete anomalies.
● Maintenance is difficult, since a change in the database may require
changes to the entire database structure.

2. Network Model
● The Hierarchical Model creates a hierarchical tree with parent/child relationships, whereas
the Network Model uses a graph of records connected by links.
● Relationships are defined in the form of links, and the model handles many-to-many
relations. This means that a record can have more than one parent.

Advantages

● The Network Model is easy to design.
● The model can handle one-to-one, one-to-many and many-to-many relationships.
● It isolates the program from other details.
● It is based on standards and conventions.

Disadvantages

● Pointers bring complexity, since the records are based on pointers and graphs.
● Changes to the database are not easy, which makes it hard to achieve structural
independence.

3. Relational Model
● A relational model groups data into one or more tables. These tables are related to each
other through common attributes (keys).
● The data is represented in the form of rows and columns, i.e. tables.
Example
Consider two relations, <Employee> and <Department>, linked to each other
by DepartmentID, which is a foreign key of the <Employee> table and the primary key
of the <Department> table.
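The link between the two relations can be sketched as follows, assuming Python's built-in sqlite3 module. Only the table names and DepartmentID come from the notes; the other column names, the sample rows, and the helper add_employee are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Department (
        DepartmentID   INTEGER PRIMARY KEY,   -- primary key of Department
        DepartmentName TEXT
    );
    CREATE TABLE Employee (
        EmployeeID   INTEGER PRIMARY KEY,
        EmployeeName TEXT,
        DepartmentID INTEGER REFERENCES Department(DepartmentID)  -- foreign key
    );
    INSERT INTO Department VALUES (1, 'Editorial');
    INSERT INTO Employee   VALUES (100, 'Asha', 1);
    """
)

# The shared DepartmentID column is what lets the two tables be joined.
joined = conn.execute(
    """
    SELECT e.EmployeeName, d.DepartmentName
    FROM Employee e
    JOIN Department d ON e.DepartmentID = d.DepartmentID
    """
).fetchall()


def add_employee(emp_id, name, dept_id):
    """Insert an employee; return False if the department does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Employee VALUES (?, ?, ?)", (emp_id, name, dept_id)
            )
        return True
    except sqlite3.IntegrityError:
        # The foreign key rejected a DepartmentID with no matching Department row.
        return False
```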

Advantages

● A well-designed relational schema avoids the problems seen in the previous two models,
i.e. the update, insert and delete anomalies.
● Changes in the database do not require changes to the complete database.
● Implementation of a Relational Model is easy.
● Maintaining a Relational Model is not a tiresome task.

Disadvantages

● Inefficiencies in the database design can stay hidden and surface only when the model
holds large volumes of data.
● The overhead of the relational data model brings the cost of more powerful
hardware and devices.
4. E-R Model
● An E-R model is the logical representation of data as objects and the relationships
among them.
● These objects are known as entities, and a relationship is an association between
entities.

Advantages

● It is simple to draw an ER diagram when we know the entities and relationships. It is an
effective communication tool.
● The ER model can be easily integrated with the relational model.
● ER designs are logical and hence easy to design and
understand. They show database capabilities, such as how tables, keys and columns are used
to find a solution to the given question.

Disadvantages

● The ER model can represent limited relationships compared to other models, and it may not be
possible to indicate primary keys and foreign keys where they are expected.
● ER models can be difficult to modify once they are created. Any changes made to the
model may require extensive rework, which can be time-consuming and expensive.
● ER models do not provide support for business rules, which can make it difficult to
ensure data integrity and enforce constraints.

5. Object-Oriented Data Model

● The Object-Oriented Data Model (OODM) is the data model in which data is stored
in the form of objects.
● The data and the data relationships are stored together in a single entity known as an
object.
● An Object-Oriented Database Management System is built on top of the Object-Oriented
Model.
● We can use the Object-Oriented Model to store real-world entities. Here, we
can store pictures, audio, video, and other types of data that were
difficult to store with the earlier relational approach.
● This model works with object-oriented programming languages such as Python, Java,
VB.NET and Perl.

● Here Transport, Bus, Ship, and Plane are objects.
● Bus has Road Transport as its attribute.
● Ship has Water Transport as its attribute.
● Plane has Air Transport as its attribute.
● The Transport object is the base object, and the Bus, Ship, and Plane objects derive
from it.
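The Transport example maps directly onto classes in an object-oriented language. The sketch below uses plain Python classes; the attribute name mode and the method describe are hypothetical, and the code illustrates the object structure only, not an object-oriented DBMS.

```python
class Transport:
    """Base object: data (mode) and behaviour (describe) live together."""

    mode = "Unspecified"

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"{self.name}: {self.mode}"


class Bus(Transport):
    mode = "Road Transport"


class Ship(Transport):
    mode = "Water Transport"


class Plane(Transport):
    mode = "Air Transport"


# Every derived object inherits describe() from the base object.
fleet = [Bus("Bus"), Ship("Ship"), Plane("Plane")]
descriptions = [t.describe() for t in fleet]
```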

Advantages

● Database integrity can be achieved.
● Structural and database independence is created.
● We can store pictures, audio, video, and other types of data that were
hard to store earlier.

Disadvantages

● It has complex navigational data access.
● There is a steep learning curve.
● Transactions might be slow.

⮚ Degree of Data Abstraction


● For the system to be usable, it must retrieve data efficiently. The need for efficiency has
led designers to use complex data structures to represent data in the database. Developers hide
this complexity from users through several levels of abstraction, to simplify users'
interactions with the system:
● Physical level: The lowest level of data abstraction which describes how the data is
actually stored.
● Logical level/ Conceptual level: The next-higher level of abstraction that describes
what data are stored in the database, and what relationships exist among those data. The
logical level thus describes the entire database in terms of a small number of relatively
simple structures. The logical level of abstraction is used by database
administrators, who must decide what information is to be kept in the database.
● View level: The highest level of abstraction describes only part of the entire database.
Even though the logical level uses simpler structures, complexity remains because of
the variety of information stored in a large database. Many users of the database system
do not need all this information; instead, they need to access only a part of the database.
The view level of abstraction exists to simplify their interaction with the system. The
system may provide many views for the same database.
type customer = record
    customer-id: string;
    customer-name: string;
    customer-street: string;
    customer-city: string;
end;
● This code defines a new record type called customer with four fields. Each field has a
name and a type associated with it.
● A banking enterprise may have several such record types, including account, with
fields account-number and balance, and employee, with fields employee-name and
salary.
● At the physical level, a customer, account, or employee record can be described as a
block of consecutive storage locations (for example, words or bytes). The language
compiler hides this level of detail from programmers.
● The database system hides many of the lowest level storage details from database
programmers.
● Database administrators, on the other hand, may be aware of certain details of the
physical organization of the data.
● At the logical level, each such record is described by a type definition, and the
interrelationship of these record types is defined as well.
● Programmers using a programming language work at this level of abstraction.
● Similarly, database administrators usually work at this level of abstraction.
● Finally, at the view level, computer users see a set of application programs that hide
details of the data types. Similarly, at the view level, several views of the database are
defined, and database users see these views.
● The views also provide a security mechanism to prevent users from accessing certain
parts of the database.
