Unit 1
The primary goal of a DBMS is to provide a way to store and retrieve database
information that is both convenient and efficient. Database systems are designed
to manage large bodies of information.
In addition, the database system must ensure the safety of the information
stored, despite system crashes or attempts at unauthorized access.
If data are to be shared among several users, the system must avoid possible
anomalous (deviating from what is expected) results. Because information is so
important in most organizations, computer scientists have developed a large body
of concepts and techniques for managing data.
• Enterprise information:
◦ Accounting: For payments, receipts, account balances, assets and other accounting
information.
◦ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of
items in factories, inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data plus online order tracking, generation of
recommendation lists, and maintenance of online product evaluations.
◦ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds; also for storing real-time market data to enable
online trading by customers and automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition
to standard enterprise information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first
to use databases in a geographically distributed manner.
As an example, a university application program might need to:
• Assign grades to students, compute grade point averages (GPA), and generate
transcripts
Keeping organizational information in a file-processing system has several
major disadvantages:
• Data redundancy and inconsistency. Since different programmers create the files
and application programs over a long period, the various files are likely to have
different structures and the programs may be written in several programming
languages.
• Difficulty in accessing data. Suppose that one of the university clerks needs to find
out the names of all students who live within a particular postal-code area. The clerk
asks the data-processing department to generate such a list. Because the designers of
the original system did not anticipate this request, there is no application program on
hand to meet it.
• Data isolation. Because data are scattered in various files, and files may be in
different formats, writing new application programs to retrieve the appropriate data
is difficult.
• Integrity problems. The data values stored in the database must satisfy certain
types of consistency constraints.
• Atomicity problems. A computer system, like any other device, is subject to failure.
In many applications, it is crucial that, if a failure occurs, the data be restored to the
consistent state that existed prior to the failure.
• Concurrent-access anomalies. For the sake of overall performance of the system
and faster response, many systems allow multiple users to update the data
simultaneously.
• Security problems. Not every user of the database system should be able to access
all the data. For example, in a university, payroll personnel need to see only that part
of the database that has financial information.
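The concurrent-access anomaly listed above can be illustrated with a small Python sketch. The account, the starting balance of 500, and the two withdrawals are illustrative assumptions that do not come from these notes, and the interleaving is simulated by hand rather than with real concurrent users.

```python
# Deterministic simulation of a "lost update" (hypothetical amounts).

def withdraw_interleaved(balance, amount_a, amount_b):
    """Both clerks read the balance before either one writes it back."""
    read_a = balance              # clerk A reads the old balance
    read_b = balance              # clerk B reads the same old balance
    balance = read_a - amount_a   # clerk A writes its result
    balance = read_b - amount_b   # clerk B writes, overwriting A's update
    return balance

def withdraw_serial(balance, amount_a, amount_b):
    """Each withdrawal finishes before the next one starts."""
    balance = balance - amount_a
    balance = balance - amount_b
    return balance

print(withdraw_interleaved(500, 50, 100))  # 400: the withdrawal of 50 is lost
print(withdraw_serial(500, 50, 100))       # 350: the correct result
```

A database system avoids the first outcome by supervising concurrent updates so that the result matches some serial order.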
View of Data
A database system is a collection of interrelated data and a set of programs that allow
users to access and modify these data.
A major purpose of a database system is to provide users with an abstract view of the
data. That is, the system hides certain details of how the data are stored and
maintained.
Data Abstraction
The need for efficiency has led designers to use complex data structures to represent
data in the database.
Since many database-system users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users’
interactions with the system:
• Physical level. The lowest level of abstraction describes how the data are actually
stored. The physical level describes complex low-level data structures in detail.
Example:
When we access data, we may get back a single value or a table of data. Moreover, by the
term "relational database" we visualize a table of rows and columns. But at the physical
level, these tables are stored on hard drives, typically located in a secure data
center.
[Figure: a Google data center. The racks hold the hard disk drives on which the data
are stored.]
• Logical level. The next-higher level of abstraction describes what data are stored in
the database, and what relationships exist among those data. The logical level thus
describes the entire database in terms of a small number of relatively simple
structures. Although implementation of the simple structures at the logical level may
involve complex physical-level structures, the user of the logical level does not need
to be aware of this complexity. This is referred to as physical data independence.
Database administrators, who must decide what information to keep in the database,
use the logical level of abstraction.
Example:
Suppose we have data about a few products (product id, product name, and manufacturing
date) and another set of data about customers (customer id, customer name, and
customer address). At the logical level, we organize these data into proper tables of
products and customers. After that, we can write a join to show which product
has been ordered by which customer.
• View level. The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database.
Many users of the database system do not need all this information; instead, they need
to access only a part of the database.
The view level of abstraction exists to simplify their interaction with the system.
Example:
Continuing the example from the logical level, suppose a customer wants to view
the order history: the customer sees only the orders he or she has placed in the past.
A shop owner, on the other hand, needs to see the products that are on the order list,
and so sees a table containing all the information about the products and the customers
to whom they need to be delivered.
The system may provide many views for the same database. Figure 1.1 shows the
relationship among the three levels of abstraction.
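The logical-level and view-level examples above can be sketched with Python's built-in sqlite3 module. The orders table, all column names, and the sample rows are assumptions made for illustration; the notes name only the product and customer data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: what data are stored and how they are related.
    create table product  (product_id integer primary key, product_name text, mfg_date text);
    create table customer (customer_id integer primary key, customer_name text,
                           customer_address text);
    -- Hypothetical table linking customers to the products they ordered.
    create table orders   (order_id integer primary key,
                           customer_id integer references customer(customer_id),
                           product_id integer references product(product_id));

    insert into product  values (1, 'Kettle', '2023-01-10'), (2, 'Lamp', '2023-02-05');
    insert into customer values (10, 'Asha', '12 Hill Road'), (11, 'Ravi', '7 Lake View');
    insert into orders   values (100, 10, 1), (101, 11, 2), (102, 10, 2);

    -- View level: the shop owner sees products together with delivery details.
    create view delivery_list as
        select p.product_name, c.customer_name, c.customer_address
        from orders o
        join product p on p.product_id = o.product_id
        join customer c on c.customer_id = o.customer_id;
""")

def order_history(customer_id):
    """View-level access for one customer: only that customer's own orders."""
    rows = conn.execute(
        "select p.product_name from orders o "
        "join product p on p.product_id = o.product_id "
        "where o.customer_id = ? order by o.order_id",
        (customer_id,),
    )
    return [name for (name,) in rows]

print(order_history(10))
print(conn.execute("select * from delivery_list").fetchall())
```

Neither the customer nor the shop owner needs to know how the three tables are laid out on disk; each works only with the part of the database that the view exposes.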
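Many high-level programming languages support the notion of a structured type. For
example, we may describe a record as follows:
type instructor = record
ID : char (5); name : char (20);
dept_name : char (20); salary : numeric (8,2);
end;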
This code defines a new record type called instructor with four fields. Each field has a
name and a type associated with it. A university organization may have several such
record types, including
• course, with fields course_id, title, dept_name, and credits
• student, with fields ID, name, dept_name, and tot_cred
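As a rough analogue, the same record types can be written as Python dataclasses. This is only a sketch: the Python field types are assumptions (the notes give a type only for ID), and Python does not enforce lengths such as char (5).

```python
from dataclasses import dataclass, fields
from decimal import Decimal

@dataclass
class Instructor:
    ID: str            # char (5) in the notes; the length is not enforced here
    name: str
    dept_name: str
    salary: Decimal

@dataclass
class Course:
    course_id: str
    title: str
    dept_name: str
    credits: int

@dataclass
class Student:
    ID: str
    name: str
    dept_name: str
    tot_cred: int

# Each field has a name and a type, as in the record declaration.
print([(f.name, f.type) for f in fields(Instructor)])
```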
The overall design of the database is called the database schema. Schemas are
changed infrequently, if at all.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual
tools for describing data, data relationships, data semantics, and consistency
constraints.
A data model provides a way to describe the design of a database at the physical,
logical, and view levels.
• Relational Model. The relational model uses a collection of tables to represent both
data and the relationships among those data. Each table has multiple columns, and
each column has a unique name. Tables are also known as relations. The relational
model is an example of a record-based model. The relational data model is the most
widely used data model, and a vast majority of current database systems are based on
the relational model.
Database Languages
• Procedural DMLs require a user to specify what data are needed and how to get
those data.
2. Data-Definition Language
We specify the storage structure and access methods used by the database system by
a set of statements in a special type of DDL called a data storage and definition
language. These statements define the implementation details of the database
schemas, which are usually hidden from the users (data abstraction).
The data values stored in the database must satisfy certain consistency constraints.
• Assertions. An assertion is any condition that the database must always satisfy.
Domain constraints and referential-integrity constraints are special forms of
assertions. However, there are many constraints that we cannot express by using only
these special forms. For example, “Every department must have at least five courses
offered every semester” must be expressed as an assertion.
• Authorization. We may want to differentiate among the users as far as the type of
access they are permitted on various data values in the database. These
differentiations are expressed in terms of authorization, the most common being:
read authorization, which allows reading, but not modification, of data; insert
authorization, which allows insertion of new data, but not modification of existing
data; update authorization, which allows modification, but not deletion, of data; and
delete authorization, which allows deletion of data. We may assign the user all,
none, or a combination of these types of authorization.
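The domain and referential-integrity constraints mentioned above can be sketched with Python's sqlite3 module. The table layout and sample values are assumptions for illustration. General assertions such as the "five courses" rule are not shown, because SQLite has no create assertion statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")   # SQLite checks foreign keys only when enabled
conn.executescript("""
    create table department (
        dept_name text primary key,
        budget    numeric check (budget >= 0)            -- domain-style constraint
    );
    create table instructor (
        ID        text primary key,
        name      text not null,
        dept_name text references department(dept_name)  -- referential integrity
    );
    insert into department values ('History', 50000);
""")

def try_sql(statement):
    """Run one statement; return 'ok', or 'rejected' if a constraint is violated."""
    try:
        with conn:                      # commit on success, roll back on error
            conn.execute(statement)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"

print(try_sql("insert into instructor values ('1', 'A', 'History')"))  # ok
print(try_sql("insert into instructor values ('2', 'B', 'Physics')"))  # rejected: no such department
print(try_sql("insert into department values ('Music', -1)"))          # rejected: negative budget
```

The database system, not the application program, checks the constraints on every update.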
Relational Databases
A relational database is based on the relational model and uses a collection of tables
to represent both data and the relationships among those data. It also includes a DML
and DDL.
Tables
Each table has multiple columns and each column has a unique name. Figure 1.2
presents a sample relational database comprising two tables: one shows details of
university instructors and the other shows details of the various university
departments.
Data-Manipulation Language
The SQL query language is nonprocedural. A query takes as input several tables
(possibly only one) and always returns a single table. Here is an example of an SQL
query that finds the names of all instructors in the History department:
select instructor.name
from instructor
where instructor.dept_name = 'History';
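To make the word "nonprocedural" concrete, the sketch below answers the History-department question twice using Python's sqlite3 module: once by stating what is wanted in SQL, and once by spelling out how to find it row by row. The sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table instructor (ID text, name text, dept_name text, salary numeric);
    insert into instructor values
        ('1', 'Hypothetical A', 'History', 60000),
        ('2', 'Hypothetical B', 'Physics', 90000),
        ('3', 'Hypothetical C', 'History', 62000);
""")

# Nonprocedural: state WHAT is wanted; the database decides how to find it.
declarative = [name for (name,) in conn.execute(
    "select instructor.name from instructor "
    "where instructor.dept_name = 'History' order by instructor.ID")]

# Procedural style: spell out HOW, scanning every row and testing it by hand.
procedural = []
for _id, name, dept_name, _salary in conn.execute("select * from instructor order by ID"):
    if dept_name == "History":
        procedural.append(name)

print(declarative)
print(declarative == procedural)
```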
Data-Definition Language
SQL provides a rich DDL that allows one to define tables, integrity constraints,
assertions, etc.
For instance, the following SQL DDL statement defines the department table:
create table department
(dept_name char (20), building char (15), budget numeric (12,2));
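In numeric (5,2), 5 is the maximum total number of digits (counting the digits after
the decimal point) and 2 is the maximum number of digits after the decimal point.
Example:
100.12 --> ok
10.012 --> cannot be stored exactly (three digits after the decimal point); depending
on the system, the value is rounded or rejected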
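The precision and scale rule can be checked with a short Python sketch. The helper fits_numeric is hypothetical (it is not part of SQL or of any database library) and assumes a finite decimal literal as input.

```python
from decimal import Decimal

def fits_numeric(text, precision, scale):
    """Return True if the literal `text` can be stored exactly in numeric(precision, scale)."""
    _sign, digits, exponent = Decimal(text).as_tuple()
    frac_digits = max(0, -exponent)              # digits after the decimal point
    int_digits = max(0, len(digits) + exponent)  # digits before the decimal point
    return frac_digits <= scale and int_digits <= precision - scale

print(fits_numeric("100.12", 5, 2))   # True
print(fits_numeric("10.012", 5, 2))   # False: three digits after the decimal point
print(fits_numeric("1000.5", 5, 2))   # False: four digits before the decimal point
```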
SQL does not support actions such as input from users, output to displays, or
communication over the network. Such computations and actions must be written in a
host language, such as C, C++, or Java, with embedded SQL queries that access the
data in the database. Application programs are programs that are used to interact
with the database in this fashion.
Example:
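A minimal sketch of such an application program follows. It uses Python as the host language, which is an assumption (the notes name C, C++, and Java), together with the standard sqlite3 module. The table contents are hypothetical.

```python
import sqlite3

def open_demo_db():
    """Create a small in-memory database (hypothetical sample rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table instructor (ID text, name text, dept_name text)")
    conn.executemany(
        "insert into instructor values (?, ?, ?)",
        [("1", "Hypothetical A", "History"), ("2", "Hypothetical B", "Physics")],
    )
    return conn

def instructors_in(conn, dept_name):
    """The embedded SQL query; the ? placeholder carries the user's input."""
    rows = conn.execute(
        "select name from instructor where dept_name = ? order by name", (dept_name,)
    )
    return [name for (name,) in rows]

def main():
    """Host-language part: read input from the user and write output to the display."""
    conn = open_demo_db()
    dept_name = input("Department: ")
    for name in instructors_in(conn, dept_name):
        print(name)
```

Calling main() runs the program interactively. The SQL statement only retrieves data, while the input and output are handled by the host language.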
Database Design
Next, the designer chooses a data model, and by applying the concepts of the chosen
data model, translates these requirements into a conceptual schema of the database.
The process of moving from an abstract data model to the implementation of the
database proceeds in two final design phases. In the logical-design phase, the
designer maps the high-level conceptual schema onto the implementation data model.
In the subsequent physical-design phase, the designer specifies the physical features
of the database, which include the form of file organization and the internal storage
structures.
Example:
The initial specification of user requirements may be based on interviews with the
database users, and on the designer’s own analysis of the organization.
The description that arises from this design phase serves as the basis for specifying
the conceptual structure of the database.
• Each department has a list of courses it offers. Each course has associated with it a
course_id, title, dept_name, and credits, and may also have associated
prerequisites.
• Instructors are identified by their unique ID. Each instructor has a name, an
associated department (dept_name), and a salary.
• Students are identified by their unique ID. Each student has a name, an associated
major department (dept_name), and tot_cred (total credit hours the student has earned
thus far).
• The university maintains a list of classrooms, specifying the name of the building,
room number, and room capacity.
• The university maintains a list of all classes (sections) taught. Each section is
identified by a course_id, sec_id, year, and semester, and has associated with it a
building, room number, and time_slot_id (the time slot when the class
meets).
• The department has a list of teaching assignments specifying, for each instructor,
the sections the instructor is teaching.
• The university has a list of all student course registrations, specifying, for each
student, the courses and the associated sections that the student has taken (registered
for).
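One possible translation of the requirements above into tables is sketched below with Python's sqlite3 module. It is an assumption-laden illustration, not the definitive design: the table names prereq, teaches, and takes, the column types, and the choice of keys are not given in these notes.

```python
import sqlite3

SCHEMA = """
create table classroom (building text, room_number text, capacity integer,
    primary key (building, room_number));
create table department (dept_name text primary key, building text, budget numeric);
create table course (course_id text primary key, title text,
    dept_name text references department(dept_name), credits integer);
create table prereq (course_id text references course(course_id),
    prereq_id text references course(course_id),
    primary key (course_id, prereq_id));
create table instructor (ID text primary key, name text,
    dept_name text references department(dept_name), salary numeric);
create table student (ID text primary key, name text,
    dept_name text references department(dept_name), tot_cred integer);
create table section (course_id text references course(course_id),
    sec_id text, semester text, year integer,
    building text, room_number text, time_slot_id text,
    primary key (course_id, sec_id, semester, year),
    foreign key (building, room_number) references classroom(building, room_number));
create table teaches (ID text references instructor(ID),
    course_id text, sec_id text, semester text, year integer,
    primary key (ID, course_id, sec_id, semester, year),
    foreign key (course_id, sec_id, semester, year)
        references section(course_id, sec_id, semester, year));
create table takes (ID text references student(ID),
    course_id text, sec_id text, semester text, year integer,
    primary key (ID, course_id, sec_id, semester, year),
    foreign key (course_id, sec_id, semester, year)
        references section(course_id, sec_id, semester, year));
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that were created, one per requirement in the list above.
tables = sorted(row[0] for row in
                conn.execute("select name from sqlite_master where type = 'table'"))
print(tables)
```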
This simplified model is meant to help you understand the conceptual ideas of database
design.
There are several ways to draw E-R diagrams. One of the most popular is to use the
Unified Modeling Language (UML).
The E-R diagram indicates that there are two entity sets, instructor and department,
with attributes as outlined earlier. The diagram also shows a relationship member
between instructor and department.
Normalization
To understand the need for normalization, let us look at what can go wrong in a bad
database design. Among the undesirable properties that a bad design may have are:
• Repetition of information
Database Architecture
The figure below depicts the various components of a database system and the
connections among them.
The architecture of a database system is greatly influenced by the underlying
computer system on which the database system runs.
Most users of a database system today are connected to it through a network. We can,
therefore, differentiate between client machines, on which remote database users
work, and server machines, on which the database system runs.
1-Tier Architecture
o In this architecture, the database is directly available to the user: the
user works directly on the DBMS itself.
o Any changes made here are applied directly to the database. This
architecture does not provide a handy tool for end users.
o The 1-Tier architecture is used for developing local applications, where
programmers can communicate directly with the database for a quick
response.
2-Tier Architecture
3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and
server. In this architecture, the client can't directly communicate with
the server.
o The application on the client-end interacts with an application server
which further communicates with the database system.
o The end user has no idea about the existence of the database beyond the
application server. The database, in turn, has no idea about any user
beyond the application.
o The 3-Tier architecture is used for large web applications and for other
applications that run on the World Wide Web.
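The three tiers can be sketched as three Python classes. This is a simplification under stated assumptions: in a real deployment each tier usually runs on a separate machine and communicates over a network, whereas here plain method calls stand in for those connections, and the class names, product table, and sample row are hypothetical.

```python
import sqlite3

class DatabaseServer:
    """Data tier: the only component that touches the database."""
    def __init__(self):
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("create table product (name text, price integer)")
        self._conn.execute("insert into product values ('Lamp', 20)")

    def query(self, sql, params=()):
        return self._conn.execute(sql, params).fetchall()

class ApplicationServer:
    """Middle tier: business logic; talks to the database on the client's behalf."""
    def __init__(self, db):
        self._db = db

    def price_of(self, product_name):
        rows = self._db.query("select price from product where name = ?", (product_name,))
        return rows[0][0] if rows else None

class Client:
    """Client tier: knows only the application server, never the database."""
    def __init__(self, app):
        self._app = app

    def show_price(self, product_name):
        price = self._app.price_of(product_name)
        if price is None:
            return f"{product_name}: not found"
        return f"{product_name}: {price}"

client = Client(ApplicationServer(DatabaseServer()))
print(client.show_price("Lamp"))
```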
Database Users and Administrators
A primary goal of a database system is to retrieve information from and store new information
into the database. People who work with a database can be categorized as database users or
database administrators.
Database Users:
Users are differentiated by the way they expect to interact with the system:
Application programmers:
o Application programmers are computer professionals who write
application programs. Application programmers can choose from
many tools to develop user interfaces.
o Example: Rapid application development (RAD) tools are tools that
enable an application programmer to construct forms and reports
without writing a program.
Sophisticated users:
o Sophisticated users interact with the system without writing
programs. Instead, they form their requests in a database query
language.
o Example: They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the
storage manager understands.
Specialized users :
o Specialized users are sophisticated users who write specialized
database applications that do not fit into the traditional data-
processing framework.
o Among these applications are computer-aided design systems,
knowledge base and expert systems, systems that store data with
complex data types (for example, graphics data and audio data), and
environment-modeling systems.
Naïve users :
o Naive users are unsophisticated users who interact with the system
by invoking one of the application programs that have been written
previously.
o For example, a bank teller who needs to transfer $50 from account A
to account B invokes a program called transfer. This program asks
the teller for the amount of money to be transferred, the account
from which the money is to be transferred, and the account to which
the money is to be transferred.
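A transfer program of this kind might look like the following sketch, written with Python's sqlite3 module. The account table, the starting balances, and the rule that balances may not go negative are assumptions for illustration. The with conn block makes the two updates atomic: either both are committed or both are rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table account (name text primary key,
                          balance integer check (balance >= 0));
    insert into account values ('A', 100), ('B', 20);
""")

def transfer(amount, source, target):
    """Move `amount` from source to target; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("update account set balance = balance + ? where name = ?",
                         (amount, target))
            conn.execute("update account set balance = balance - ? where name = ?",
                         (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

def balances():
    """Return the current balance of every account as a dictionary."""
    return dict(conn.execute("select name, balance from account order by name"))

print(transfer(50, "A", "B"), balances())    # True {'A': 50, 'B': 70}
print(transfer(500, "A", "B"), balances())   # False {'A': 50, 'B': 70}
```

In the second call the credit to B succeeds but the debit from A violates the balance check, so the whole transfer is undone. The teller never has to deal with a half-finished transfer.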
Query Processor:
The query processor accepts a query from the user and answers it by accessing the
database.
Parts of Query processor:
1. DDL interpreter
a. Interprets DDL statements and records the definitions in the
data dictionary.
2. DML compiler
a. Translates DML statements in a query language into low-level
instructions that the query evaluation engine understands.
b. A query can usually be translated into any of a number of alternative
evaluation plans that all give the same result. The DML compiler also
performs query optimization: it selects the lowest-cost plan from among
the alternatives.
3. Query evaluation engine
a. Executes the low-level instructions generated by the DML
compiler.
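The idea of alternative evaluation plans can be observed in SQLite, used here through Python's sqlite3 module as one concrete system. SQLite's explain query plan statement reports the plan its optimizer chose. The table, the index name, and the query are assumptions for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table instructor (ID text, name text, dept_name text, salary numeric)")

QUERY = "select name from instructor where dept_name = 'History'"

def plan(sql):
    """Return the human-readable steps SQLite chose for evaluating `sql`."""
    return [row[-1] for row in conn.execute("explain query plan " + sql)]

before = plan(QUERY)   # no index exists yet: a scan of the whole table
conn.execute("create index instructor_dept on instructor (dept_name)")
after = plan(QUERY)    # with the index available, an index search is typically chosen

print(before)
print(after)
```

The query text is identical in both calls. Only the available access paths changed, and the optimizer picked a different plan for the same result.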
One of the main reasons for using DBMSs is to have central control of the data and the programs
that access those data. A person with such central control over the system is called a database
administrator (DBA).