1.4.4 The SQL Data-Manipulation Language
The SQL query language is nonprocedural. A query takes as
input several tables (possibly only one) and always returns a
single table. Here is an example of an SQL query that finds
the names of all instructors in the History department:
select instructor.name
from instructor
where instructor.dept_name = 'History';
The query specifies that those rows from the table instructor
where the dept_name is History must be retrieved, and the
name attribute of these rows must be displayed. The result
of executing this query is a table with a single column
labeled name and a set of rows, each of which contains the
name of an instructor whose dept_name is History. If the
query is run on the table in
Figure 1.1, the result consists of
two rows, one with the name El Said and the other with the
name Califieri.
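The query above can be tried out with a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database system. The rows below are a small illustrative sample, not a copy of Figure 1.1: only the two History names come from the text, and the ID values and the other instructor names are assumptions.

```python
import sqlite3

# In-memory database holding a few sample rows (an assumed subset,
# not the full instructor table of Figure 1.1).
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table instructor (ID text primary key, name text, dept_name text)"
)
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("10101", "Instructor A", "Computer Science"),  # hypothetical name
        ("12121", "Instructor B", "Finance"),            # hypothetical name
        ("32343", "El Said", "History"),                 # ID is an assumption
        ("58583", "Califieri", "History"),               # ID is an assumption
    ],
)

# The query from the text, unchanged. Its result is itself a table:
# one column (name) and one row per matching instructor.
cursor = conn.execute(
    "select instructor.name "
    "from instructor "
    "where instructor.dept_name = 'History'"
)
history_names = sorted(row[0] for row in cursor)
print(history_names)
```

The result is sorted only to make the printed order predictable; the query itself imposes no ordering on the rows.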
Queries may involve information from more than one
table. For instance, the following query finds the instructor
ID and department name of all instructors associated with a
department with a budget of more than $95,000.
select instructor.ID, department.dept_name
from instructor, department
where instructor.dept_name = department.dept_name
and department.budget > 95000;
If the preceding query were run on the tables in Figure 1.1,
the system would find that there are two departments with
a budget greater than $95,000, namely Computer Science and
Finance; there are five instructors in these departments.
Thus, the result consists of a table with two columns (ID,
dept_name) and five rows: (12121, Finance), (45565,
Computer Science), (10101, Computer Science), (83821,
Computer Science), and (76543, Finance).
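The two-table query can be checked in the same way. The following sketch again uses Python's sqlite3 module with assumed sample data: the five instructor IDs and their departments come from the result listed above, while the instructor names, the History row, and the exact budget figures are illustrative assumptions chosen so that only Computer Science and Finance exceed $95,000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table department (dept_name text primary key, budget numeric);
    create table instructor (ID text primary key, name text, dept_name text);
    """
)
# Budget values are assumptions; only "more than 95000" matters here.
conn.executemany(
    "insert into department values (?, ?)",
    [("Computer Science", 100000), ("Finance", 120000), ("History", 50000)],
)
# Names are hypothetical placeholders; IDs and departments follow the text.
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("12121", "Instructor A", "Finance"),
        ("45565", "Instructor B", "Computer Science"),
        ("10101", "Instructor C", "Computer Science"),
        ("83821", "Instructor D", "Computer Science"),
        ("76543", "Instructor E", "Finance"),
        ("32343", "El Said", "History"),  # filtered out by the budget test
    ],
)

rows = conn.execute(
    """
    select instructor.ID, department.dept_name
    from instructor, department
    where instructor.dept_name = department.dept_name
      and department.budget > 95000
    """
).fetchall()
result = sorted(rows)
for row in result:
    print(row)
```

The History instructor is included only to show that rows from departments at or below the budget threshold do not appear in the result.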
1.4.5 Database Access from Application
Programs
Nonprocedural query languages such as SQL are not as
powerful as a universal Turing machine; that is, there are
some computations that are possible using a general-purpose
programming language but are not possible using
SQL. SQL also does not support actions such as input from
users, output to displays, or communication over the
network. Such computations and actions must be written in
a host language, such as C/C++, Java, or Python, with
embedded SQL queries that access the data in the
database. Application programs are programs that are
used to interact with the database in this fashion. Examples
in a university system are programs that allow students to
register for courses, generate class rosters, calculate
student GPA, generate payroll checks, and perform other
tasks.
To access the database, DML statements need to
be sent from the host to the database where they
will be executed. This is most commonly done by using an
application-program interface (set of procedures) that can
be used to send DML and DDL statements to the database
and retrieve the results. The Open Database Connectivity
(ODBC) standard defines application program interfaces for
use with C and several other languages. The Java Database
Connectivity (JDBC) standard defines a corresponding
interface for the Java language.
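As a rough illustration of this pattern, the sketch below uses Python as the host language and its standard sqlite3 module, which follows Python's own DB-API convention. It is neither ODBC nor JDBC, but it plays the same role: the host program sends DDL and DML statements through procedure calls and retrieves the results. The function name instructor_names and the sample rows are assumptions made for the example.

```python
import sqlite3


def instructor_names(conn, dept_name):
    """Send a parameterized query to the database and return the names."""
    cursor = conn.execute(
        "select name from instructor where dept_name = ? order by name",
        (dept_name,),  # value supplied by the host program at run time
    )
    return [name for (name,) in cursor.fetchall()]


conn = sqlite3.connect(":memory:")

# DDL statement sent through the API.
conn.execute(
    "create table instructor (ID text primary key, name text, dept_name text)"
)
# DML statements sent through the API (sample rows, IDs assumed).
conn.executemany(
    "insert into instructor values (?, ?, ?)",
    [
        ("32343", "El Said", "History"),
        ("58583", "Califieri", "History"),
        ("12121", "Instructor A", "Finance"),  # hypothetical name
    ],
)
conn.commit()

# Input, output, and control flow live in the host language;
# only the data access itself is expressed in SQL.
names = instructor_names(conn, "History")
print(names)
```

Passing the department name as a parameter, rather than pasting it into the SQL string, keeps user-supplied input separate from the statement text.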
1.5 Database Design
Database systems are designed to manage large bodies of
information. These large bodies of information do not exist
in isolation. They are part of the operation of some
enterprise whose end product may be information from the
database or may be some device or service for which the
database plays only a supporting role.
Database design mainly involves the design of the
database schema. The design of a complete database
application environment that meets the needs of the
enterprise being modeled requires attention to a broader set
of issues. In this text, we focus on the writing of database
queries and the design of database schemas, but discuss
application design later, in
Chapter 9.
A high-level data model provides the database designer
with a conceptual framework in which to specify the data
requirements of the database users and how the database
will be structured to fulfill these requirements. The initial
phase of database design, then, is to characterize fully the
data needs of the prospective database users. The database
designer needs to interact extensively with domain experts
and users to carry out this task. The outcome of this phase
is a specification of user requirements.
Next, the designer chooses a data model, and by applying
the concepts of the chosen data model, translates these
requirements into a conceptual schema of the database.
The schema developed at this conceptual-design phase
provides a detailed overview of the enterprise. The designer
reviews the schema to confirm that all data requirements
are indeed satisfied and are not in conflict with one another.
The designer can also examine the design to remove any
redundant features. The focus at this point is on describing
the data and their relationships, rather than on specifying
physical storage details.
In terms of the relational model, the conceptual-design
process involves decisions on what attributes we want to
capture in the database and how to group these attributes
to form the various tables. The "what" part is basically a
business decision, and we shall not discuss it further in this
text. The "how" part is mainly a computer-science problem.
There are principally two ways to tackle the problem. The
first one is to use the entity-relationship model (Chapter 6);
the other is to employ a set of algorithms (collectively
known as normalization) that takes as input the set of all
attributes and generates a set of tables (Chapter 7).
A fully developed conceptual schema indicates the
functional requirements of the enterprise. In a
specification of functional requirements, users
describe the kinds of operations (or transactions)
that will be performed on the data. Example
operations include modifying or updating data, searching for
and retrieving specific data, and deleting data. At this stage
of conceptual design, the designer can review the schema
to ensure it meets functional requirements.
The process of moving from an abstract data model to the
implementation of the database proceeds in two final design
phases. In the logical-design phase, the designer maps
the high-level conceptual schema onto the implementation
data model of the database system that will be used. The
designer uses the resulting system-specific database
schema in the subsequent physical-design phase, in
which the physical features of the database are specified.
These features include the form of file organization and the
internal storage structures; they are discussed in Chapter 13.
1.6 Database Engine
A database system is partitioned into modules that deal
with each of the responsibilities of the overall system. The
functional components of a database system can be broadly
divided into the storage manager, the query processor
components, and the transaction management component.
The storage manager is important because databases
typically require a large amount of storage space. Corporate
databases commonly range in size from hundreds of
gigabytes to terabytes of data. A gigabyte is approximately
1 billion bytes, or 1000 megabytes (more precisely, 1024
megabytes), while a terabyte is approximately 1 trillion
bytes or 1 million megabytes (more precisely, 1024
gigabytes). The largest enterprises have databases that
reach into the multi-petabyte range (a petabyte is 1024
terabytes). Since the main memory of computers cannot
store this much information, and since the contents of main
memory are lost in a system crash, the information is stored
on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and
from disk is slow relative to the speed of the central
processing unit, it is imperative that the database system
structure the data so as to minimize the need to move data
between disk and main memory. Increasingly, solid-state
disks (SSDs) are being used for database storage. SSDs are
faster than traditional disks but also more costly.
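The gap between the approximate (decimal) and precise (binary) unit sizes mentioned above can be worked out directly. The short sketch below uses the 1024-based definitions from the text; the constant names are chosen only for this example.

```python
# Binary unit sizes in bytes, each 1024 times the previous one.
MEGABYTE = 1024 ** 2
GIGABYTE = 1024 * MEGABYTE
TERABYTE = 1024 * GIGABYTE
PETABYTE = 1024 * TERABYTE

# Relative gap between the precise size and the "1 billion" or
# "1 trillion" approximations.
gigabyte_gap = (GIGABYTE - 10 ** 9) / 10 ** 9
terabyte_gap = (TERABYTE - 10 ** 12) / 10 ** 12

print(GIGABYTE)  # 1073741824
print(TERABYTE)  # 1099511627776
print(f"gigabyte gap: {gigabyte_gap:.1%}, terabyte gap: {terabyte_gap:.1%}")
```

The approximation drifts further from the precise value with each step up in unit, which is why the distinction matters more at terabyte and petabyte scale.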
The query processor is important because it helps the
database system to simplify and facilitate access to data.
The query processor allows database users to obtain good
performance while being able to work at the view level and
not be burdened with understanding the physical-level
details of the implementation of the system. It is the job of
the database system to translate updates and queries
written in a nonprocedural language, at the logical level,
into an efficient sequence of operations at the physical
level.
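One way to glimpse this translation is to ask a database system how it intends to execute a query. The sketch below uses SQLite through Python's sqlite3 module and its EXPLAIN QUERY PLAN statement; the plan text is specific to SQLite and varies between versions, and the index name idx_instructor_dept is an assumption made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table instructor (ID text primary key, name text, dept_name text);
    -- A physical-level structure the logical query never mentions.
    create index idx_instructor_dept on instructor (dept_name);
    """
)

# The query says only WHAT is wanted; the plan shows HOW SQLite
# proposes to find it at the physical level.
plan = conn.execute(
    "explain query plan "
    "select name from instructor where dept_name = 'History'"
).fetchall()

# The last column of each plan row is a human-readable description.
plan_details = [row[-1] for row in plan]
for detail in plan_details:
    print(detail)
```

The query text stays the same whether or not the index exists; only the chosen plan changes, which is the separation between logical and physical levels described above.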
The transaction manager is important because it allows
application developers to treat a sequence of database
accesses as if they were a single unit that either happens in
its entirety or not at all. This permits application developers
to think at a higher level of abstraction about the
application without needing to be concerned with
the lower-level details of managing the effects of concurrent
access to the data and of system failures.
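The all-or-nothing behavior can be illustrated with a small sketch, again using Python's sqlite3 module. The scenario (moving budget between two departments), the function name transfer_budget, and the budget values are assumptions made for the example. A check constraint makes the second update fail, and the rollback undoes the first update as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table department ("
    "dept_name text primary key, "
    "budget integer check (budget >= 0))"
)
conn.executemany(
    "insert into department values (?, ?)",
    [("History", 50000), ("Music", 80000)],  # assumed sample budgets
)
conn.commit()


def transfer_budget(conn, source, target, amount):
    """Move budget between departments as one unit of work."""
    # Used as a context manager, the connection commits if the block
    # succeeds and rolls back if any statement raises an exception.
    with conn:
        conn.execute(
            "update department set budget = budget + ? where dept_name = ?",
            (amount, target),
        )
        conn.execute(
            "update department set budget = budget - ? where dept_name = ?",
            (amount, source),
        )


try:
    # History holds only 50000, so the second update violates the check
    # constraint; the credit to Music is rolled back along with it.
    transfer_budget(conn, "History", "Music", 60000)
except sqlite3.IntegrityError:
    pass
budgets_after_failure = dict(
    conn.execute("select dept_name, budget from department")
)

transfer_budget(conn, "History", "Music", 10000)  # succeeds as a whole
budgets_after_success = dict(
    conn.execute("select dept_name, budget from department")
)
print(budgets_after_failure, budgets_after_success)
```

The application code never has to undo the first update by hand; it only states where the unit of work begins and ends.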
While database engines were traditionally centralized
computer systems, today parallel processing is key for
handling very large amounts of data efficiently. Modern
database engines pay a lot of attention to parallel data
storage and parallel query processing.