DBMS Module 1

This document provides an overview of database management systems (DBMS), highlighting their importance in efficiently storing and retrieving data compared to traditional file systems. It discusses the advantages and disadvantages of using a DBMS, the roles of database users and administrators, and various data models and architectures. Key concepts such as data abstraction, schemas, and the three-schema architecture are also covered to illustrate how databases are structured and managed.

Uploaded by

vspranav88

Database System Concepts and Applications
Module I - DBMS
Knowledge is not free… You have to pay attention!
● Introduction to databases
● File Systems vs. DBMS
● Advantages and Disadvantages of using DBMS Approach
● Database administrators and users
● Data Abstraction
● Schemas
● Instances
● Types of Data Models
● Three Schema Architecture and Data Independence
● Database Languages and Interfaces
Introduction to databases…
A database-management system (DBMS) is a collection of interrelated data
and a set of programs to access those data.
The collection of data, usually referred to as the database, contains information
relevant to an enterprise.
The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient.
A database is a collection of data, typically describing the activities of one or
more related organizations. For example, a university database might contain
information about the following: Entities such as students, faculty, courses, and
classrooms. Relationships between entities, such as students’ enrollment in
courses, faculty teaching courses, and the use of rooms for courses.
A database management system, or DBMS, is software designed to assist in
maintaining and utilizing large collections of data.
Database-System Applications
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other
accounting information.
◦ Human resources: For information about employees, salaries, payroll
taxes, and benefits, and for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking
production of items in factories, inventories of items in warehouses and stores,
and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking,
generation of recommendation lists, and maintenance of online product
evaluations.

• Telecommunication: For keeping records of calls made, generating monthly
bills, maintaining balances on prepaid calling cards, and storing information
about the communication networks.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking
transactions.
◦ Credit card transactions: For purchases on credit cards and generation of
monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of
financial instruments such as stocks and bonds; also for storing real-time market
data to enable online trading by customers and automated trading by the firm.

• Universities: For student information, course registrations, and grades (in
addition to standard enterprise information such as human resources and
accounting).

• Airlines: For reservations and schedule information. Airlines were among the
first to use databases in a geographically distributed manner.
File System VS DBMS
A company has a large collection (say, 500 GB) of data on employees, departments,
products, sales, and so on. This data is accessed concurrently by several employees.
Questions about the data must be answered quickly, changes made to the data by
different users must be applied consistently, and access to certain parts of the data
(e.g., salaries) must be restricted. We can try to deal with this data management
problem by storing the data in a collection of operating system files. This approach
has many drawbacks, including the following:

● We probably do not have 500 GB of main memory to hold all the data. We
must therefore store data in a storage device such as a disk or tape and bring
relevant parts into main memory for processing as needed.
● Even if we have 500 GB of main memory, on computer systems with 32-bit
addressing, we cannot refer directly to more than about 4 GB of data! We have
to program some method of identifying all data items.
● We have to write special programs to answer each question that users may want to ask
about the data. These programs are likely to be complex because of the large volume of
data to be searched.
● We must protect the data from inconsistent changes made by different users accessing
the data concurrently. If programs that access the data are written with such concurrent
access in mind, this adds greatly to their complexity. We must ensure that data is restored
to a consistent state if the system crashes while changes are being made.
● Operating systems provide only a password mechanism for security. This is not
sufficiently flexible to enforce security policies in which different users have permission to
access different subsets of the data.

The typical file-processing system is supported by a conventional operating system. The
system stores permanent records in various files, and it needs different application programs
to extract records from, and add records to, the appropriate files. Before database
management systems (DBMSs) were introduced, organizations usually stored information in
such systems.
Keeping organizational information in a file-processing system has a number of
major disadvantages:
● Data redundancy and inconsistency
● Difficulty in accessing data
● Data isolation
● Integrity problems - consistency constraints
● Atomicity problems
● Concurrent-access anomalies
● Security problems

A DBMS is a piece of software that is designed to make the preceding tasks easier.
By storing data in a DBMS, rather than as a collection of operating system files, we
can use the DBMS’s features to manage the data in a robust and efficient manner. As
the volume of data and the number of users grow—hundreds of gigabytes of data
and thousands of users are common in current corporate databases—DBMS support
becomes indispensable.
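As one illustration of what the DBMS takes over from application code, the sketch below uses Python's built-in sqlite3 module to make a two-step funds transfer atomic. The account table, its CHECK constraint, and the transfer function are hypothetical names chosen for this example; they do not come from the slides.

```python
import sqlite3

# Hypothetical schema: the CHECK constraint is a consistency constraint
# that forbids negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return True only if both updates succeed."""
    try:
        # "with conn" commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # valid: A has enough funds
failed = transfer(conn, "A", "B", 500)  # the debit would violate the CHECK
balances = dict(conn.execute("SELECT id, balance FROM account"))
```

In the failed call the credit to B has already run when the debit from A is rejected, so the rollback is what keeps the two balances consistent. With plain files, the application would have to undo the first write itself.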
ADVANTAGES OF A DBMS
● Data independence: Application programs should be as independent as possible from
details of data representation and storage. The DBMS can provide an abstract view of
the data to insulate application code from such details.
● Data integrity and security: If data is always accessed through the DBMS, the DBMS
can enforce integrity constraints on the data. For example, before inserting salary
information for an employee, the DBMS can check that the department budget is not
exceeded. Also, the DBMS can enforce access controls that govern what data is visible
to different classes of users.
● Data administration: When several users share the data, centralizing the
administration of data can offer significant improvements. Experienced professionals
who understand the nature of the data being managed, and how different groups of
users use it, can be responsible for organizing the data representation to minimize
redundancy and for fine-tuning the storage of the data to make retrieval efficient.
● Concurrent access and crash recovery: A DBMS schedules concurrent accesses to
the data in such a manner that users can think of the data as being accessed by only
one user at a time. Further, the DBMS protects users from the effects of system
failures.
● Efficient data access: A DBMS utilizes a variety of sophisticated techniques to store
and retrieve data efficiently. This feature is especially important if the data is stored
on external storage devices.
● Reduced application development time: Clearly, the DBMS supports many
important functions that are common to many applications accessing data stored in
the DBMS. This, in conjunction with the high-level interface to the data, facilitates
quick development of applications. Such applications are also likely to be more
robust than applications developed from scratch because many important tasks are
handled by the DBMS instead of being implemented by the application.
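The budget check mentioned under data integrity can be sketched with Python's sqlite3 module. This is only one possible way to express it: the department and employee tables, the trigger name, and the add_employee helper are assumptions made for the example, and other systems would state the same rule with different syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dname TEXT PRIMARY KEY, budget REAL NOT NULL);
CREATE TABLE employee (
    ename  TEXT PRIMARY KEY,
    dname  TEXT NOT NULL,
    salary REAL NOT NULL
);
-- Hypothetical rule: total salaries in a department may not exceed its budget.
CREATE TRIGGER salary_within_budget
BEFORE INSERT ON employee
WHEN ((SELECT COALESCE(SUM(salary), 0) FROM employee WHERE dname = NEW.dname)
      + NEW.salary
      > (SELECT budget FROM department WHERE dname = NEW.dname))
BEGIN
    SELECT RAISE(ABORT, 'department budget exceeded');
END;
INSERT INTO department VALUES ('Sales', 100000);
""")


def add_employee(ename, dname, salary):
    """Return True if the DBMS accepted the row, False if it rejected it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO employee VALUES (?, ?, ?)", (ename, dname, salary)
            )
        return True
    except sqlite3.DatabaseError:
        # Raised when the trigger aborts the insert.
        return False


first = add_employee("Asha", "Sales", 60000)
second = add_employee("Ravi", "Sales", 50000)  # 60000 + 50000 > 100000
total = conn.execute("SELECT SUM(salary) FROM employee").fetchone()[0]
```

Because the rule lives in the database, every program that inserts employees is subject to it, without each program having to repeat the check.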
DISADVANTAGES OF A DBMS
● A DBMS is a complex piece of software, optimized for certain kinds of workloads (e.g.,
answering complex queries or handling many concurrent requests), and its performance
may not be adequate for certain specialized applications.

○ Examples include applications with tight real-time constraints or applications with just
a few well-defined critical operations for which efficient custom code must be written.

● An application may need to manipulate the data in ways not supported by the query
language. In such a situation, the abstract view of the data presented by the DBMS does
not match the application’s needs, and actually gets in the way.

○ As an example, relational databases do not support flexible analysis of text data. If
specialized performance or data manipulation requirements are central to an
application, the application may choose not to use a DBMS, especially if the added
benefits of a DBMS are not required.

● In most situations calling for large-scale data management, however, DBMSs have
become an indispensable tool.
Database Users and Administrators
A primary goal of a database system is to retrieve information from and store new
information into the database. People who work with a database can be categorized
as database users or database administrators.
Database Users and User Interfaces
There are four different types of database-system users, differentiated by the way
they expect to interact with the system. Different types of user interfaces have been
designed for the different types of users.
● Naive users
● Application programmers
● Sophisticated users
● Specialized users
● Naive users are unsophisticated users who interact with the system by invoking
one of the application programs that have been written previously. Naive users
may also simply read reports generated from the database. For example, consider a
student who, during the class registration period, wishes to register for a class by
using a Web interface. Such a user connects to a Web application program that
runs at a Web server. The application first verifies the identity of the user, and
allows her to access a form where she enters the desired information.
● Application programmers are computer professionals who write application
programs. Application programmers can choose from many tools to develop user
interfaces. Rapid application development (RAD) tools are tools that enable an
application programmer to construct forms and reports with minimal
programming effort.
● Sophisticated users interact with the system without writing programs. Instead,
they form their requests either using a database query language or by using
tools such as data analysis software. Analysts who submit queries to explore
data in the database fall in this category.
● Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework.
Among these applications are computer-aided design systems, knowledge-base
and expert systems, systems that store data with complex data types (for
example, graphics data and audio data), and environment-modeling systems.
Database Administrator

One of the main reasons for using DBMSs is to have central control of both the data
and the programs that access those data. A person who has such central control over
the system is called a database administrator (DBA). The functions of a DBA include:
● Schema definition. The DBA creates the original database schema by executing
a set of data definition statements in the DDL.
● Storage structure and access-method definition.
● Granting of authorization for data access. By granting different types of
authorization, the database administrator can regulate which parts of the
database various users can access. The authorization information is kept in a
special system structure that the database system consults whenever someone
attempts to access the data in the system.
● Schema and physical-organization modification. The DBA carries out changes to
the schema and physical organization to reflect the changing needs of the
organization, or to alter the physical organization to improve performance.
● Routine maintenance. Examples of the database administrator’s routine
maintenance activities are:
○ Periodically backing up the database, either onto tapes or onto remote
servers, to prevent loss of data in case of disasters such as flooding.
○ Ensuring that enough free disk space is available for normal operations, and
upgrading disk space as required.
○ Monitoring jobs running on the database and ensuring that performance is
not degraded by very expensive tasks submitted by some users.
DATA ABSTRACTION
A database system is a collection of interrelated data and a set of
programs that allow users to access and modify these data. A major
purpose of a database system is to provide users with an abstract
view of the data. That is, the system hides certain details of how the
data are stored and maintained.
Since many database-system users are not computer trained,
developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system:
● Physical level
● Logical level
● View level
● Physical level. The lowest level of abstraction
describes how the data are actually stored. The
physical level describes complex low-level data
structures in detail.
● Logical level. The next-higher level of abstraction
describes what data are stored in the database,
and what relationships exist among those data.
● View level. The highest level of abstraction
describes only part of the entire database. Many
users of the database system need to access only a
part of the database. The view level of abstraction
exists to simplify their interaction with the system.
The system may provide many views for the same
database.
INSTANCES & SCHEMAS
● Databases change over time as information is inserted and deleted. The
collection of information stored in the database at a particular moment
is called an instance of the database.
● The overall design of the database is called the database schema.
● Database systems have several schemas, partitioned according to the
levels of abstraction.
○ The physical schema describes the database design at the physical
level.
○ The logical schema describes the database design at the logical level.
○ A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of the
database.
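The difference between a schema and an instance can be seen in a small sketch with Python's sqlite3 module. The three-column Students table and the helper names schema and instance are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")


def schema(conn):
    """Column names of Students: part of the (logical) schema."""
    return [row[1] for row in conn.execute("PRAGMA table_info(Students)")]


def instance(conn):
    """Rows stored in Students right now: the instance at this moment."""
    return conn.execute("SELECT * FROM Students ORDER BY sid").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

# Insertions and deletions change the instance, not the schema.
conn.execute("INSERT INTO Students VALUES ('s1', 'Jones', 3.4)")
conn.execute("INSERT INTO Students VALUES ('s2', 'Smith', 3.8)")
conn.execute("DELETE FROM Students WHERE sid = 's1'")

schema_after = schema(conn)
instance_after = instance(conn)
```

The schema plays a role similar to a type declaration in a program, while the instance is like the value a variable holds at a given time.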
DATA MODELS
Underlying the structure of a database is the data model: a collection
of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way
to describe the design of a database at the physical, logical, and view
levels.
The data models can be classified into four different categories:
1. Relational Model.
2. Entity-Relationship Model.
3. Object-Based Data Model.
4. Semistructured Data Model.
1. Relational Model: uses a collection of tables to represent both
data and the relationships among those data. Each table has
multiple columns, and each column has a unique name. Tables
are also known as relations. The relational model is an
example of a record-based model. Record-based models are
so named because the database is structured in fixed-format
records of several types. Each table contains records of a
particular type. Each record type defines a fixed number of
fields, or attributes. The columns of the table correspond to
the attributes of the record type.
2. Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection
of basic objects, called entities, and relationships among these
objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects. The entity-relationship model
is widely used in database design.
3. Object-Based Data Model: Object-oriented programming has become the dominant
software-development methodology. This led to the development
of an object-oriented data model that can be seen as extending
the E-R model with notions of encapsulation, methods, and object
identity. The object-relational data model combines features of
the object-oriented data model and relational data model.
4. Semistructured Data Model: permits the specification of data
where individual data items of the same type may have
different sets of attributes. This is in contrast to the data
models mentioned earlier, where every data item of a
particular type must have the same set of attributes. The
Extensible Markup Language (XML) is widely used to
represent semistructured data.
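A short sketch of semistructured data, using Python's standard xml.etree.ElementTree module. The people document and its element names are invented for the example.

```python
import xml.etree.ElementTree as ET

# Two items of the same type (<person>) with different sets of attributes.
doc = """
<people>
  <person><name>Asha</name><email>asha@example.org</email></person>
  <person><name>Ravi</name><phone>555-0100</phone><office>B12</office></person>
</people>
"""

root = ET.fromstring(doc)

# Collect the attribute names that each person actually carries.
attribute_sets = [sorted(child.tag for child in person) for person in root]

# Only the attributes present in every item could become fixed columns
# without leaving gaps.
shared = set(attribute_sets[0]) & set(attribute_sets[1])
```

In a record-based model both persons would need the same fixed set of fields, so the missing values would have to be stored as nulls or moved to separate tables.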
Three Schema Architecture
1. The physical schema specifies additional storage details. Essentially, the
physical schema summarizes how the relations described in the conceptual
schema are actually stored on secondary storage devices such as disks and
tapes. We must decide what file organizations to use to store the relations, and
create auxiliary data structures called indexes to speed up data retrieval
operations. A sample physical schema for the university database follows:
a. Store all relations as unsorted files of records. (A file in a DBMS is either a
collection of records or a collection of pages, rather than a string of
characters as in an operating system.)
b. Create indexes on the first column of the Students, Faculty, and Courses
relations, the sal column of Faculty, and the capacity column of Rooms.
Decisions about the physical schema are based on an understanding of how the
data is typically accessed. The process of arriving at a good physical schema is
called physical database design.
2. The conceptual schema (logical schema) describes the stored data in terms of the data
model of the DBMS. In a relational DBMS, the conceptual schema describes all relations
that are stored in the database.
In our sample university database, these relations contain information about entities,
such as students and faculty, and about relationships, such as students’ enrollment in
courses. All student entities can be described using records in a Students relation, as we
saw earlier. In fact, each collection of entities and each collection of relationships can be
described as a relation, leading to the following conceptual schema:
● Students(sid: string, name: string, login: string, age: integer, gpa: real)
● Faculty(fid: string, fname: string, sal: real)
● Courses(cid: string, cname: string, credits: integer)
● Rooms(rno: integer, address: string, capacity: integer)
● Enrolled(sid: string, cid: string, grade: string)
● Teaches(fid: string, cid: string)
● Meets_In(cid: string, rno: integer, time: string)
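The conceptual schema above could be declared to a relational DBMS roughly as follows. This sketch uses Python's sqlite3 module and assumes a simple type mapping (string to TEXT, integer to INTEGER, real to REAL). The last relation is spelled Meets_In because an unquoted SQL identifier cannot contain a space. Keys are left out because the slides do not state them.

```python
import sqlite3

# One CREATE TABLE statement per relation of the conceptual schema.
DDL = """
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Rooms    (rno INTEGER, address TEXT, capacity INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Meets_In (cid TEXT, rno INTEGER, time TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Read back the attributes of one relation to confirm the declaration.
students_columns = [
    (row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")
]
```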
3. External schemas, which usually are also in terms of the data model of the DBMS,
allow data access to be customized (and authorized) at the level of individual users
or groups of users. Any given database has exactly one conceptual schema and one
physical schema because it has just one set of stored relations, but it may have
several external schemas, each tailored to a particular group of users.
Each external schema consists of a collection of one or more views and relations
from the conceptual schema.
A view is conceptually a relation, but the records in a view are not stored in the
DBMS. Rather, they are computed using a definition for the view, in terms of
relations stored in the DBMS. The external schema design is guided by end user
requirements. For example, we might want to allow students to find out the names
of faculty members teaching courses, as well as course enrollments. This can be
done by defining the following view:
Courseinfo(cid: string, fname: string, enrollment: integer)
A user can treat a view just like a relation and ask questions about the records in the
view. Even though the records in the view are not stored explicitly, they are
computed as needed. We did not include Courseinfo in the conceptual schema
because we can compute Courseinfo from the relations in the conceptual schema,
and to store it in addition would be redundant. Such redundancy, in addition to the
wasted space, could lead to inconsistencies.
For example, if Courseinfo were stored, a tuple could be inserted into the Enrolled
relation, indicating that a particular student has enrolled in some course, without
incrementing the value in the enrollment field of the corresponding record of Courseinfo.
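A sketch of how Courseinfo might be defined as a view, using Python's sqlite3 module. The slides give only the view's attributes, so the SELECT statement below is an assumed definition, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Faculty  (fid TEXT, fname TEXT, sal REAL);
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Assumed definition: one row per taught course with its enrollment count.
CREATE VIEW Courseinfo AS
SELECT T.cid AS cid, F.fname AS fname,
       (SELECT COUNT(*) FROM Enrolled E WHERE E.cid = T.cid) AS enrollment
FROM Teaches T JOIN Faculty F ON F.fid = T.fid;

INSERT INTO Faculty  VALUES ('f1', 'Rao', 90000);
INSERT INTO Teaches  VALUES ('f1', 'CS101');
INSERT INTO Enrolled VALUES ('s1', 'CS101', 'A');
""")

before = conn.execute("SELECT * FROM Courseinfo").fetchall()

# A new enrollment touches only the stored relation Enrolled.
conn.execute("INSERT INTO Enrolled VALUES ('s2', 'CS101', 'B')")

after = conn.execute("SELECT * FROM Courseinfo").fetchall()
```

Since the view is computed when it is queried, the enrollment count cannot fall out of step with Enrolled, which is the inconsistency a stored copy would risk.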
Data Independence
Data independence is achieved through use of the three levels of data abstraction.
Relations in the external schema (view relations) are in principle generated on demand from the
relations corresponding to the conceptual schema. If the underlying data is reorganized, that is, the
conceptual schema is changed, the definition of a view relation can be modified so that the same
relation is computed as before.
For example, suppose that the Faculty relation in our university database is replaced by the following
two relations:
Faculty_public(fid: string, fname: string, office: integer)
Faculty_private(fid: string, sal: real)
Intuitively, some confidential information about faculty has been placed in a separate relation and
information about offices has been added. The Courseinfo view relation can be redefined in terms of
Faculty_public and Faculty_private, which together contain all the information in Faculty, so that a
user who queries Courseinfo will get the same answers as before.
Thus users can be shielded from changes in the logical structure of the data, or changes in
the choice of relations to be stored. This property is called logical data independence.
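Logical data independence can be sketched with Python's sqlite3 module by building the database twice, once with the original Faculty relation and once with the split relations, and running the same user query against both. The view definition and sample rows are assumptions. Because this version of Courseinfo needs only fname, the redefined view reads from Faculty_public alone.

```python
import sqlite3

COMMON = """
CREATE TABLE Teaches  (fid TEXT, cid TEXT);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Teaches  VALUES ('f1', 'CS101');
INSERT INTO Enrolled VALUES ('s1', 'CS101', 'A');
"""

OLD_FACULTY = """
CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL);
INSERT INTO Faculty VALUES ('f1', 'Rao', 90000);
"""

NEW_FACULTY = """
CREATE TABLE Faculty_public  (fid TEXT, fname TEXT, office INTEGER);
CREATE TABLE Faculty_private (fid TEXT, sal REAL);
INSERT INTO Faculty_public  VALUES ('f1', 'Rao', 214);
INSERT INTO Faculty_private VALUES ('f1', 90000);
"""

# Assumed view definition; only the source of fname differs between versions.
VIEW = """
CREATE VIEW Courseinfo AS
SELECT T.cid AS cid, F.fname AS fname,
       (SELECT COUNT(*) FROM Enrolled E WHERE E.cid = T.cid) AS enrollment
FROM Teaches T JOIN {faculty} F ON F.fid = T.fid;
"""

# The query an application issues; it never changes.
USER_QUERY = "SELECT cid, fname, enrollment FROM Courseinfo ORDER BY cid"


def answers(faculty_ddl, faculty_table):
    """Build one version of the database and run the unchanged user query."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(COMMON + faculty_ddl + VIEW.format(faculty=faculty_table))
    return conn.execute(USER_QUERY).fetchall()


old_answers = answers(OLD_FACULTY, "Faculty")
new_answers = answers(NEW_FACULTY, "Faculty_public")
```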
In turn, the conceptual schema insulates users from changes in the physical storage of the
data. This property is referred to as physical data independence.
The conceptual schema hides details such as how the data is actually laid out on disk, the
file structure, and the choice of indexes. As long as the conceptual schema remains the
same, we can change these storage details without altering applications.
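Physical data independence can be sketched in the same way: the example below, again using Python's sqlite3 module with invented sample rows, adds an index on the sal column of Faculty and reruns an unchanged query. The index name faculty_sal is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Faculty (fid TEXT, fname TEXT, sal REAL)")
conn.executemany(
    "INSERT INTO Faculty VALUES (?, ?, ?)",
    [("f1", "Rao", 90000), ("f2", "Iyer", 120000), ("f3", "Das", 75000)],
)

# The application's query refers only to the conceptual schema.
QUERY = "SELECT fname FROM Faculty WHERE sal > 80000 ORDER BY fname"

without_index = conn.execute(QUERY).fetchall()

# A purely physical change: an auxiliary structure to speed up retrieval.
conn.execute("CREATE INDEX faculty_sal ON Faculty (sal)")

with_index = conn.execute(QUERY).fetchall()
```

The index may change how fast the answer is found, but not the answer itself or the text of the query.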
Database Languages
A query is a statement requesting the retrieval of information. The portion of
a DML that involves information retrieval is called a query language.
A database system provides
● a data-definition language to specify the database schema
● a data-manipulation language to express database queries and updates.

A data-manipulation language (DML) is a language that enables users to
access or manipulate data as organized by the appropriate data model.
The types of access are:
• Retrieval of information stored in the database [SELECT]
• Insertion of new information into the database [INSERT INTO]
• Deletion of information from the database [DELETE FROM]
• Modification of information stored in the database [UPDATE … SET]
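The four types of access can be tried out with a few SQL statements issued through Python's sqlite3 module. The Students table and its rows are sample data assumed for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT PRIMARY KEY, name TEXT, gpa REAL)")

# Insertion of new information
conn.executemany(
    "INSERT INTO Students VALUES (?, ?, ?)",
    [("s1", "Jones", 3.4), ("s2", "Smith", 3.8), ("s3", "Guldu", 2.0)],
)

# Modification of stored information
conn.execute("UPDATE Students SET gpa = 3.6 WHERE sid = 's1'")

# Deletion of information
conn.execute("DELETE FROM Students WHERE sid = 's3'")

# Retrieval of information (a query)
rows = conn.execute(
    "SELECT sid, gpa FROM Students WHERE gpa >= 3.5 ORDER BY sid"
).fetchall()
```

Each statement says what data is wanted or changed, not how to locate it in storage.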
A data-definition language (DDL) is also used to specify additional properties of the data. We specify
the storage structure and access methods used by the database system by a set of statements in a
special type of DDL called a data storage and definition language. These statements define the
implementation details of the database schemas, which are usually hidden from the users. The data
values stored in the database must satisfy certain consistency constraints.
● Domain Constraints. A domain of possible values must be associated with every
attribute (for example, integer types, character types, date/time types). Declaring an
attribute to be of a particular domain acts as a constraint on the values that it can
take. Domain constraints are the most elementary form of integrity constraint.
● Referential Integrity. There are cases where we wish to ensure that a value that
appears in one relation for a given set of attributes also appears in a certain set of
attributes in another relation (referential integrity). For example, the department
listed for each course must be one that actually exists. More precisely, the dept_name
value in a course record must appear in the dept_name attribute of some record of the
department relation. Database modifications can cause violations of referential
integrity.
● Assertions. An assertion is any condition that the database must always satisfy.
Domain constraints and referential-integrity constraints are special forms of assertions.
However, there are many constraints that we cannot express by using only these
special forms. For example, “Every department must have at least five courses offered
every semester” must be expressed as an assertion. When an assertion is created, the
system tests it for validity. If the assertion is valid, then any future modification to the
database is allowed only if it does not cause that assertion to be violated.
● Authorization. We may want to differentiate among the users as far as the type of
access they are permitted on various data values in the database. These
differentiations are expressed in terms of authorization, the most common being: read
authorization, which allows reading, but not modification, of data; insert authorization,
which allows insertion of new data, but not modification of existing data; update
authorization, which allows modification, but not deletion, of data; and delete
authorization, which allows deletion of data. We may assign the user all, none, or a
combination of these types of authorization.
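The domain and referential-integrity constraints described above can be sketched with Python's sqlite3 module. The course and department tables follow the example in the text, while course_id, credits, and the try_insert helper are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, and the CHECK clause stands in for a domain declaration because SQLite is otherwise lenient about column types. Assertions and authorization are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE department (dept_name TEXT PRIMARY KEY);
CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    -- Referential integrity: the department must exist.
    dept_name TEXT NOT NULL REFERENCES department(dept_name),
    -- Domain constraint: credits must be a positive integer.
    credits   INTEGER CHECK (typeof(credits) = 'integer' AND credits > 0)
);
INSERT INTO department VALUES ('Physics');
""")


def try_insert(row):
    """Return True if the DBMS accepts the course row, False otherwise."""
    try:
        with conn:
            conn.execute("INSERT INTO course VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


valid = try_insert(("PHY-101", "Physics", 4))
bad_reference = try_insert(("BIO-101", "Biology", 4))     # no such department
bad_domain = try_insert(("PHY-102", "Physics", "four"))   # not an integer
stored = conn.execute("SELECT course_id FROM course").fetchall()
```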
The DDL, just like any other programming language, gets as input some instructions
(statements) and generates some output.
The output of the DDL is placed in the data dictionary, which contains metadata— that is,
data about data.
● The data dictionary is considered to be a special type of table that can only be accessed
and updated by the database system itself (not a regular user). The database system
consults the data dictionary before reading or modifying actual data.
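As a concrete case, SQLite keeps its metadata in a catalog table named sqlite_master. The sketch below, using Python's sqlite3 module, shows DDL statements producing catalog entries. The Courses table and the index name are sample objects assumed for the example, and other systems organize their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL statements: their output is recorded as metadata in the catalog.
conn.execute("CREATE TABLE Courses (cid TEXT, cname TEXT, credits INTEGER)")
conn.execute("CREATE INDEX courses_cid ON Courses (cid)")

# Reading the catalog: data about data.
catalog = conn.execute(
    "SELECT type, name, tbl_name FROM sqlite_master ORDER BY name"
).fetchall()

# A direct write to the catalog by a regular user is refused by default.
try:
    conn.execute(
        "INSERT INTO sqlite_master (type, name) VALUES ('table', 'Fake')"
    )
    write_rejected = False
except sqlite3.Error:
    write_rejected = True
```

Unlike the stricter description above, SQLite lets ordinary users read its catalog; only direct updates are reserved for the system.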
Module 1 Revision Questions
2 mark questions
1. List the advantages and applications of DBMS.
2. Define instances and schemas of a database.
3. What are the duties of a database administrator?
4. What is a relational model?
5. Explain the advantages and disadvantages of DBMS approach.
6. Differentiate between database schema and database instance.
7. Define procedural DML and declarative DML.
5 mark questions
1. What is meant by data independence? Explain three schema architecture.
2. List different data models in detail.
3. What are the disadvantages of a file-processing system?
4. What are the different data models? Write about database users.
5. What is data independence? Differentiate between physical data independence and
logical data independence.
6. What are three levels of abstraction?
7. Write a comparison on file system and DBMS.
Thank You
Happy Learning!
