Database for Business

Chapter 1

• Relational Database Management System – Concept (Data, Data Types, Character, Field, Record, File, Database, Information, RDBMS), ACID properties, Important Terms (Entity, Attribute, Primary Key, Foreign Key, Candidate Key, Entity Integrity, Referential Integrity, Table, Relation, Views, SQL, Data Dictionary, Schema, Metadata), Advantages of SQL, Types of SQL commands (DDL, DCL, DML, TCL).

• Database for Business Performance Improvement – OLAP & OLTP, Data Lake, Data Warehousing (Concept, Features, Architecture & Analytical Techniques – Roll up, Drill Down, Slicing, Pivot), Data Mining and Forecasting, Data Mart, Data Backup (Concept & Types).

SSR 04/12/2025
Data

• In general, data is any set of characters that is gathered and translated for some purpose, usually analysis. Data that is not put into context is of no use to a human or a computer.
• There are multiple types of data. Some of the more common types of data include the following:
  • Single character
  • Boolean (true or false)
  • Text (string)
  • Number (integer or floating-point)
  • Picture
  • Sound
  • Video

• In computing, data is information that has been translated into a form that is efficient for movement or processing. Relative to today's computers and transmission media, data is information converted into binary digital form. It is acceptable to use "data" as either a singular or a plural subject. Raw data is a term used to describe data in its most basic digital format.
• The concept of data in the context of computing has its roots in the work of Claude Shannon, an American mathematician known as the father of information theory. He ushered in binary digital concepts based on applying two-value Boolean logic to electronic circuits. Binary digit formats underlie CPUs, semiconductor memories and disk drives, as well as many of the peripheral devices common in computing today. Early computer input for both control and data took the form of punch cards, followed by magnetic tape and the hard disk.

• Computer data is information processed or stored by a computer. This information may be in the form of text documents, images, audio clips, software programs, or other types of data. Computer data may be processed by the computer's CPU and is stored in files and folders on the computer's hard disk.
How data is stored

• Computers represent data, including video, images, sounds and text, as binary values using patterns of just two digits: 1 and 0. A bit is the smallest unit of data and represents a single binary value (0 or 1). A byte is eight binary digits (bits) long. Storage and memory are measured in multiples of bytes, such as megabytes and gigabytes.
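To make the bit/byte relationship concrete, here is a minimal Python sketch that encodes one character as a byte and shows its eight binary digits. It assumes ASCII encoding, where a plain letter such as "A" fits in exactly one byte; the function name is illustrative.

```python
def char_to_bits(ch: str) -> str:
    """Return the eight binary digits (one byte) that encode an ASCII character."""
    data = ch.encode("ascii")          # one byte per ASCII character
    if len(data) != 1:
        raise ValueError("expected exactly one ASCII character")
    return format(data[0], "08b")      # zero-padded to 8 bits

bits = char_to_bits("A")
print(bits, len(bits))  # 01000001 8
```

The letter "A" has code 65, which is 01000001 in eight bits, so one character occupies one byte.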
Hierarchy of Data

• Data are the principal resources of an organization. Data stored in computer systems form a hierarchy extending from a single bit to a database, the major record-keeping entity of a firm. Each higher rung of this hierarchy is organized from the components below it.
• Data are logically organized into:
  1. Bits (characters)
  2. Fields
  3. Records
  4. Files
  5. Databases

• Bit (Character) - a bit is the smallest unit of data representation (the value of a bit may be 0 or 1). Eight bits make a byte, which can represent a character or a special symbol in a character code.
• Field - a field consists of a grouping of characters. A data field represents an attribute (a characteristic or quality) of some entity (object, person, place, or event).
• Record - a record represents a collection of attributes that describe a real-world entity. A record consists of fields, with each field describing an attribute of the entity.
• File - a group of related records. Files are frequently classified by the application for which they are primarily used (for example, an employee file). A primary key in a file is the field (or fields) whose value identifies a record among the others in a data file.
• Database - an integrated collection of logically related records or files. A database consolidates records previously stored in separate files into a common pool of data records that provides data for many applications. The data is managed by systems software called a database management system (DBMS). The data stored in a database is independent of the application programs using it and of the types of secondary storage devices on which it is stored.
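The field → record → file levels and the role of a primary key can be sketched in a few lines of Python. This is an illustration only, assuming a hypothetical employee file; the class, field names and values are made up.

```python
from dataclasses import dataclass

@dataclass
class EmployeeRecord:
    # Each attribute below is one field of the record.
    emp_id: int        # primary key field
    name: str
    department: str

# A file is a group of related records (here, a hypothetical employee file).
employee_file = [
    EmployeeRecord(101, "Asha", "Sales"),
    EmployeeRecord(102, "Ravi", "Finance"),
]

# The primary key value identifies one record among the others in the file.
index = {rec.emp_id: rec for rec in employee_file}
print(index[102].name)  # Ravi
```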
Information

• Information is the summarization of data. Technically, data are raw facts and figures that are processed into information, such as summaries and totals. But since information can also be the raw data for the next job or person, the two terms cannot be precisely separated, and both are often used interchangeably. It may be helpful to view information in terms of how it is structured and used, namely: data, text, spreadsheets, pictures, voice and video. Data are discretely defined fields. Text is a collection of words. Spreadsheets are data in matrix (row and column) form. Pictures are lists of vectors or frames of bits. Voice is a continuous stream of sound waves. Video is a sequence of image frames.
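As a small illustration of raw facts being processed into information (summaries and totals), the Python sketch below totals hypothetical sales figures per region. The data values are invented for the example.

```python
from collections import defaultdict

# Raw data: individual sales facts (region, amount); values are hypothetical.
sales = [("North", 120.0), ("South", 80.0), ("North", 45.5), ("South", 20.0)]

# Processing: summarize the raw facts into totals per region (information).
totals = defaultdict(float)
for region, amount in sales:
    totals[region] += amount

print(dict(totals))  # {'North': 165.5, 'South': 100.0}
```

The totals could in turn become the raw data for the next job, such as a yearly report.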
Centralized, Decentralized and Distributed Systems
• 1. CENTRALIZED SYSTEMS:
Centralized systems use a client/server architecture in which one or more client nodes are directly connected to a central server. This is the most commonly used type of system in many organisations: a client sends a request to the company's server and receives the response.

• Example – Wikipedia. Consider a massive server to which we send our requests and which responds with the article that we requested. Suppose we enter the search term ‘junk food’ in the Wikipedia search bar. This search term is sent as a request to the Wikipedia servers (mostly located in Virginia, U.S.A.), which then respond with the articles ranked by relevance. In this situation, we are the client node and the Wikipedia servers are the central server.

• Advantages of Centralized System –
  • Easy to physically secure – It is easy to secure and service the server and client nodes by virtue of their location.
  • Smooth and elegant personal experience – A client has a dedicated system which he uses (for example, a personal computer) and the company has a similar system which can be modified to suit custom needs.
  • Dedicated resources (memory, CPU cores, etc.)
  • More cost efficient for small systems up to a certain limit – As central systems take less funds to set up, they have an edge when small systems have to be built.
  • Quick updates are possible – There is only one machine to update.
  • Easy detachment of a node from the system – Just remove the connection of the client node from the server.
• Disadvantages of Centralized System –
  • Highly dependent on network connectivity – The system can fail if the nodes lose connectivity, as there is only one central node.
  • No graceful degradation of the system – Failure of the entire system is abrupt.
  • Less possibility of data backup – If the server node fails and there is no backup, you lose the data straight away.
  • Difficult server maintenance – There is only one server node and, for availability reasons, it is inefficient and unprofessional to take the server down for maintenance. So, updates have to be done on the fly (hot updates), which is difficult and the system
DECENTRALIZED SYSTEMS:

• These systems have been gaining a lot of popularity, primarily because of the massive hype around Bitcoin, and many organisations are now trying to find applications for them.
• In decentralized systems, every node makes its own decision. The final behavior of the system is the aggregate of the decisions of the individual nodes. Note that there is no single entity that receives and responds to the request.

• Example – Bitcoin. Let's take Bitcoin as an example because it is the most popular use case of decentralized systems. No single entity/organisation owns the Bitcoin network. The network is the sum of all the nodes, which talk to each other to maintain the amount of bitcoin every account holder has.

• Advantages of Decentralized System –
  • Minimal problem of performance bottlenecks occurring – The entire load gets balanced across all the nodes, leading to minimal or no bottleneck situations.
  • High availability – Some nodes (computers, mobiles, servers) are always available/online for work, leading to high availability.
  • More autonomy and control over resources – As each node controls its own behavior, it has better autonomy, leading to more control over resources.
• Disadvantages of Decentralized System –
  • Difficult to achieve global big tasks – There is no chain of command to direct others to perform certain tasks.
  • No regulatory oversight
  • Difficult to know which node failed – Each node must be pinged for availability checking, and partitioning of work has to be done to actually find out which node failed by checking the expected output against what the node generated.
  • Difficult to know which node responded – When a request is served by a decentralised system, it is actually served by one of the nodes in the system, but it is difficult to find out which node indeed served the request.
DISTRIBUTED SYSTEMS:

• In distributed systems, the work is divided among multiple nodes that communicate and coordinate with each other to accomplish a common task. Although no single machine does all the work, the user sees the system as one single, coherent system.

• Example – Google search system. Each request is worked upon by hundreds of computers which crawl the web and return the relevant results. To the user, Google appears to be one system, but it is actually multiple computers working together to accomplish one single task (returning the results to the search query).

• Advantages of Distributed System –
  • Lower latency than a centralized system – Distributed systems are spread over a wide geographical area, so a request can be served by a nearby node and it takes less time to get a response.
• Disadvantages of Distributed System –
  • Difficult to achieve consensus
  • The conventional way of logging events by the absolute time they occur is not possible here, because the nodes do not share a single clock.
Types of Data Processing

• Batch Processing:
This is one of the most widely used types of data processing and is also known as Serial/Sequential, Stacked/Queued or offline processing. The fundamental principle of this type of processing is that different jobs of different users are processed in the order received. Once the stacking of jobs is complete, they are sent for processing while maintaining the same order. Processing a large volume of data together helps reduce the processing cost, making data processing economical. Batch processing is a method where the information to be organized is sorted into groups to allow for efficient and sequential processing.
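The order-preserving behaviour of batch processing can be sketched with a first-in, first-out queue in Python. This is a simplified model, assuming jobs are plain labels; the user and job names are hypothetical.

```python
from collections import deque

# Jobs from different users are stacked in the order they arrive (FIFO).
job_queue = deque()
for user, job in [("u1", "payroll"), ("u2", "billing"), ("u1", "report")]:
    job_queue.append((user, job))

def run_batch(queue):
    """Process every queued job sequentially, preserving arrival order."""
    processed = []
    while queue:
        user, job = queue.popleft()    # oldest job first
        processed.append(f"{user}:{job}")
    return processed

results = run_batch(job_queue)
print(results)  # ['u1:payroll', 'u2:billing', 'u1:report']
```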

• Real-time processing:
As the name suggests, this method is used for carrying out processing in real time. It is required where results must be displayed immediately or in the lowest time possible. The data fed to the software is used almost instantaneously for processing. This type of data processing requires an internet connection, and data is stored/used online. No lag is expected or acceptable, and the receiving and processing of a transaction are carried out simultaneously. The technique can respond almost immediately to various signals to acquire and process information. Because it needs better hardware and software capabilities, this method is costlier than batch processing and involves high maintenance and upfront costs attributed to very advanced technology and computing power. Time saved is maximum in this case, as the output is seen in real time. Examples include banking transactions, ticket booking for flights, trains and movies, and rental agencies.

• Online Processing:
This processing method is a part of the automatic processing method and is at times known as direct or random access processing. Under this method, a job received by the system is processed at the same time it is received. It is often mixed up with real-time processing. This system features random and rapid input of transactions and direct, user-demanded access to databases/content when needed. It utilizes Internet connections and equipment directly attached to a computer, which allows the data to be stored in one place and used at an altogether different place. Cloud computing can be considered an example that uses this type of processing. It is used mainly for information recording and research.

• Distributed Processing:
This method is commonly utilized by remote workstations connected to one big central workstation or server. ATMs are good examples of this data processing method. All the end machines run fixed software located at a particular place and make use of exactly the same information and sets of instructions.
Flat Files

• A flat file, also known as a text database, is a type of database that stores data in a plain text format. Flat file databases were developed and implemented in the early 1970s by IBM.
• Flat files are typically text files that have all word processing and structure markup removed. A flat file features a table with a single record per line. The different columns in a record use a tab or comma to delimit the fields. Unlike a relational database, a flat file database does not have multiple tables. The information contained in flat files does not have associated paths or folders.
• All the records are stored in one place, and the database can be set up with a number of standard office applications. The database is easy to understand, and it is easy to sort the records. Records can also be viewed or extracted with simple criteria.
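A comma-delimited flat file, with one record per line, can be read, sorted and filtered with Python's standard `csv` module. In this sketch an in-memory `io.StringIO` stands in for a file on disk, and the employee data is hypothetical.

```python
import csv
import io

# A flat file: one record per line, fields delimited by commas.
flat_file = io.StringIO(
    "emp_id,name,department\n"
    "101,Asha,Sales\n"
    "102,Ravi,Finance\n"
    "103,Meena,Sales\n"
)

records = list(csv.DictReader(flat_file))

# Sorting the records and extracting them with a simple criterion.
by_name = sorted(records, key=lambda r: r["name"])
sales_staff = [r["name"] for r in records if r["department"] == "Sales"]

print([r["name"] for r in by_name])  # ['Asha', 'Meena', 'Ravi']
print(sales_staff)                   # ['Asha', 'Meena']
```

All records live in a single table. Relating them to, say, a separate department table would need manual matching in code, which is where a relational database helps.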
Database

• Database, also called electronic database, is any collection of data, or information, specially organized for rapid search and retrieval by a computer. Databases are structured to facilitate the storage, retrieval, modification, and deletion of data in conjunction with various data-processing operations.
• A database is stored as a file or a set of files. The information in these files may be broken down into records, each of which consists of one or more fields.
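The four operations listed above (storage, retrieval, modification and deletion) can be sketched with Python's built-in `sqlite3` module. This is a minimal example, assuming an in-memory SQLite database; the table and its data are hypothetical.

```python
import sqlite3

# In-memory database; a real one would be stored as a file, e.g. "company.db".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")

# Storage: each row is a record, each column is a field.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(101, "Asha", "Sales"), (102, "Ravi", "Finance")])

# Modification and deletion.
conn.execute("UPDATE employee SET dept = 'Marketing' WHERE emp_id = 101")
conn.execute("DELETE FROM employee WHERE emp_id = 102")

# Retrieval.
rows = conn.execute("SELECT emp_id, name, dept FROM employee").fetchall()
print(rows)  # [(101, 'Asha', 'Marketing')]
```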

Database Management System


• Database Management System (DBMS) is software for storing and retrieving users' data while applying appropriate security measures. It consists of a group of programs which manipulate the database. The DBMS accepts a request for data from an application and instructs the operating system to provide the specific data. In large systems, a DBMS helps users and other third-party software to store and retrieve data.
• DBMS allows users to create their own databases as per their requirements. The term “DBMS” includes the user of the database and other application programs. It provides an interface
Advantages of DBMS

• DBMS offers a variety of techniques to store & retrieve data,
• DBMS serves as an efficient handler to balance the needs of multiple applications using the same data,
• Uniform administration procedures for data,
• Application programmers are never exposed to details of data representation and storage,
• A DBMS uses various powerful functions to store and retrieve data efficiently,
• Offers Data Integrity and Security,
• The DBMS applies integrity constraints to get a high level of protection against prohibited access to data,
• A DBMS schedules concurrent access to the data in such a manner that only one user can change the same data at a time,
• Reduced Application Development Time.
Disadvantages of DBMS

• The cost of the hardware and software of a DBMS is quite high, which increases the budget of your organization,
• Most database management systems are complex systems, so training is required for users to use the DBMS,
• In some organizations, all data is integrated into a single database, which can be damaged by an electric failure or if the database is corrupted on the storage media,
• Use of the same program at a time by many users sometimes leads to the loss of some data,
• DBMS can't perform sophisticated calculations.
Why RDBMS?

• A relational database organizes data into tables which can be linked (or related) based on data common to each. This capability enables you to retrieve an entirely new table from data in one or more tables with a single query. It also allows you and your business to better understand the relationships among all available data and gain new insights for making better decisions or identifying new opportunities.

• The primary benefit of the relational database approach is the ability to create meaningful information by joining the tables. Joining tables allows you to understand the relationships between the data, or how the tables connect. SQL includes the ability to count, add, group, and also combine queries. SQL can perform basic math and subtotal functions and logical transformations. Analysts can order the results by date, name, or any column.
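A join that counts, groups, subtotals and orders results in a single query can be sketched with SQLite through Python's `sqlite3` module. The `customer` and `orders` tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         cust_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Join on the common column (cust_id), then count and subtotal per customer.
summary = conn.execute("""
    SELECT c.name, COUNT(o.order_id) AS n_orders, SUM(o.amount) AS total
    FROM customer AS c
    JOIN orders AS o ON o.cust_id = c.cust_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(summary)  # [('Asha', 2, 350.0), ('Ravi', 1, 75.0)]
```

The result is a new table that exists in neither source table. It is derived from the data the two tables share.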
Features of RDBMS

• First of all, its number one feature is the ability to store data in tables. Storing data in a structured form can significantly reduce iteration time.
• Data persists in the form of rows and columns, and a primary key can be defined to uniquely identify rows.
• It creates indexes for quicker data retrieval.
• It allows for various types of data integrity: (i) Entity Integrity, wherein no duplicate rows exist in a table; (ii) Domain Integrity, which enforces valid entries for a given column by restricting the type, the format, or the range of values; (iii) Referential Integrity, which prevents the deletion of rows that are in use by other records; and (iv) User-Defined Integrity, which provides specific business rules that do not fall into the above three.
• It also allows for the creation of virtual tables (views), which provide a safe means to present data while securing sensitive content.
• Common column implementation and multi-user accessibility are also included in the RDBMS features.
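Index creation, one of the features listed above, can be sketched in SQLite. This assumes a hypothetical `product` table. The example only confirms that the index is recorded by the DBMS and that the query runs. Whether the query planner actually uses the index is up to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (prod_id INTEGER PRIMARY KEY, name TEXT, category TEXT)")
conn.executemany("INSERT INTO product (name, category) VALUES (?, ?)",
                 [(f"item{i}", "A" if i % 2 else "B") for i in range(1000)])

# An index on a frequently searched column, to speed up lookups on that column.
conn.execute("CREATE INDEX idx_product_category ON product (category)")

indexes = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'product'")]
count_a = conn.execute("SELECT COUNT(*) FROM product WHERE category = 'A'").fetchone()[0]
print(indexes, count_a)  # ['idx_product_category'] 500
```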
Advantages of RDBMS

• Data is stored only once, and hence multiple record changes are not required. Deletion and modification of data also become simpler, and storage efficiency is very high.
• Complex queries can be carried out using the Structured Query Language. Terms like ‘Insert’, ‘Update’, ‘Delete’, ‘Create’ and ‘Drop’ are keywords in SQL that help in accessing particular data of choice.
• Better security is offered by the creation of tables. Certain tables can be protected by this system. Users can set access barriers to limit access to the available content. This is very useful in companies where a manager can decide which data is provided to the employees and customers. Thus, a customized level of data protection can be enabled.
• Provision for future requirements, as new data can easily be added and appended to the existing tables and can be made consistent with the previously available content. This is a feature that no flat file database has.
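The "provision for future requirements" point, together with the Create and Drop keywords, can be sketched in SQLite. This assumes a hypothetical `customer` table that later needs an e-mail column. The existing table is extended in place, and the existing row is kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO customer (cust_id, name) VALUES (1, 'Asha')")

# New requirement: store an e-mail address. Extend the existing table in place.
conn.execute("ALTER TABLE customer ADD COLUMN email TEXT")
conn.execute("UPDATE customer SET email = 'asha@example.com' WHERE cust_id = 1")
row = conn.execute("SELECT cust_id, name, email FROM customer").fetchone()
print(row)  # (1, 'Asha', 'asha@example.com')

# DROP removes the table (its structure and its data) entirely.
conn.execute("DROP TABLE customer")
remaining = conn.execute("SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'").fetchone()[0]
print(remaining)  # 0
```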

Atomicity:
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is no midway, i.e. transactions do not occur partially. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations.
Abort: If a transaction aborts, changes made to the database are not visible.
Commit: If a transaction commits, changes made are visible.

Consistency:
This means that integrity constraints must be maintained so that the database is consistent before and after the transaction. It refers to the correctness of a database.
For example, consider a transaction T that transfers 100 from one account (balance 500) to another (balance 200). The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent.
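Atomicity and consistency can be sketched with the transfer example above, using SQLite through Python's `sqlite3` module. The account IDs "X" and "Y" and the `transfer` helper are hypothetical. Used as a context manager, a `sqlite3` connection commits when the block succeeds and rolls back when it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("X", 500), ("Y", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction: all of it or none of it."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_id = ?",
                         (amount, src))
            cur = conn.execute("UPDATE account SET balance = balance + ? WHERE acc_id = ?",
                               (amount, dst))
            if cur.rowcount != 1:
                raise ValueError(f"unknown account {dst!r}")  # abort: undo the debit too
        return True
    except (sqlite3.IntegrityError, ValueError):
        return False

def total(conn):
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]

ok = transfer(conn, "X", "Y", 100)          # commit: X = 400, Y = 300
bad_dst = transfer(conn, "X", "Z", 100)     # abort: the debit of X is rolled back
overdraft = transfer(conn, "X", "Y", 1000)  # abort: CHECK (balance >= 0) fails
balances = dict(conn.execute("SELECT acc_id, balance FROM account ORDER BY acc_id"))
print(ok, bad_dst, overdraft, balances, total(conn))
# True False False {'X': 400, 'Y': 300} 700
```

The failed transfer to the unknown account "Z" is the atomicity case. Its first UPDATE had already run, but the rollback undoes it, so the total stays at 700.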

Isolation:
This property ensures that multiple transactions can occur concurrently without leading to an inconsistent database state. Transactions occur independently without interference. Changes occurring in a particular transaction will not be visible to any other transaction until that change has been written to memory or committed. This property ensures that executing transactions concurrently results in a state equivalent to the one that would be achieved if they were executed serially in some order.

Durability:
This property ensures that once the transaction has completed execution, the updates and modifications to the database are written to disk and persist even if a system failure occurs. These updates become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are never lost.

• The differences between databases and flat files are given below:
  • Databases provide more flexibility, whereas flat files provide less flexibility.
  • Database systems provide data consistency, whereas flat files cannot provide data consistency.
  • Databases are more secure than flat files.
  • Databases support DML and DDL, whereas flat files cannot support these.
  • There is less data redundancy in a database, whereas there is more data redundancy in flat files.
Concepts of RDBMS:

• Entity:
  • An object that is uniquely identifiable is called an entity.
  • An entity can be of two types:
    • Tangible Entity: Tangible entities are those entities which exist physically in the real world. Example: Person, car, etc.
    • Intangible Entity: Intangible entities are those entities which exist only logically and have no physical existence. Example: Bank Account, etc.

• Entity Type: The entity type is a collection of entities having similar attributes.

• Attribute / Columns:
In relational databases, attributes are the describing characteristics or properties that define all items pertaining to a certain category; an attribute applies to all cells of a column. The rows represent data sets applied to a single entity, to uniquely identify each item.
  • Simple attributes: Class, Age.
  • Composite attributes: Name, Address, DOB.
  • Single-valued attributes: Roll No.
  • Multi-valued attributes: Mobile Number.
  • Derived attributes: DOB -> Age (Age is derived from DOB).
  • Key attributes: ID.

• Tuples / Rows:
A single entry in a table is called a Tuple or Record or Row. A tuple in a table represents a set of related data.

• Table:
In the relational database model, a table is a collection of data elements organized in terms of rows and columns. A table is also considered a convenient representation of relations.
RDBMS Keys

• SUPER KEY:
Super Key is a set of attributes whose set of values can uniquely identify an entity instance in the entity set. It contains one or more attributes. It is the broadest definition of unique identifiers of an entity in an entity set.
The combination of “SSN” and “Name” is a super key of the customer entity set.

• CANDIDATE KEY:
Candidate key is a set of one or more attributes whose set of values can uniquely identify an entity instance in the entity set. No attribute in the candidate key can be omitted without destroying the uniqueness property of the candidate key. It is a minimal Super Key. When a database is built in database software, the software allows only one candidate key to be used as the unique identifier of an entity for an entity set.

• Example: (SSN, Name) is NOT a candidate key, because taking out “Name” still leaves “SSN”, which can uniquely identify an entity. “SSN” is a candidate key of customer.
• Example: Both “SSN” and “License #” are candidate keys of the Driver entity set.
(Figure: customer entity set with attributes SSN, Customer-name, Customer-street and Customer-city.)

• Overall, Super Key is the broadest unique identifier; Candidate Key is a subset of Super Key; and Primary Key is a subset of Candidate Key. In practice, we would first look for Super Keys. Then we look for Candidate Keys based on experience and common sense. If there is only one Candidate Key, it naturally will be designated as the Primary Key. If we find more than one Candidate Key, then we can designate any one of them as the Primary Key.
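The super key / candidate key test can be sketched in Python over sample rows. Keys are a design decision about all possible data, so checking a sample like this one can only rule candidates out, not prove them. The customer rows and function names are hypothetical.

```python
from itertools import combinations

# Hypothetical customer rows (a sample, not all possible data).
customers = [
    {"SSN": "111", "Name": "Asha", "City": "Pune"},
    {"SSN": "222", "Name": "Ravi", "City": "Pune"},
    {"SSN": "333", "Name": "Asha", "City": "Delhi"},
]

def is_super_key(rows, attrs):
    """True if the combination of attribute values is unique across all rows."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(values)) == len(values)

def is_candidate_key(rows, attrs):
    """A super key from which no attribute can be removed (a minimal super key)."""
    if not is_super_key(rows, attrs):
        return False
    return all(not is_super_key(rows, subset)
               for r in range(1, len(attrs))
               for subset in combinations(attrs, r))

print(is_super_key(customers, ("SSN", "Name")))      # True
print(is_candidate_key(customers, ("SSN", "Name")))  # False: "SSN" alone is enough
print(is_candidate_key(customers, ("SSN",)))         # True
```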

• PRIMARY KEY:
The Primary Key is an attribute or a set of attributes that uniquely identifies a specific instance of an entity. Every entity in the data model must have a primary key whose values uniquely identify instances of the entity.
To qualify as a primary key for an entity, an attribute must have the following properties:
  • It must have a non-null value for each instance of the entity.
  • The value must be unique for each instance of an entity.
  • The values must not change or become null during the life of each entity instance.

Primary and Foreign keys are the most basic components on which relational theory is based. Each entity must have an attribute or attributes, the primary key, whose values uniquely identify each instance of the entity. Every child entity must have an attribute, the foreign key, that completes the association with the parent entity.

• FOREIGN KEY:
A Foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. Every relationship in the model must be supported by a foreign key.
Every dependent and category (subtype) entity in the model must have a foreign key for each relationship in which it participates. Foreign keys are formed in dependent and subtype entities by migrating the entire primary key from the parent or generic entity. If the primary key is composite, it may not be split.

• COMPOSITE KEY:
When a primary key is created from a combination of 2 or more columns, the primary key is called a composite key. Each column may not be unique by itself within the database table, but when combined with the other column(s) in the composite key, the combination is unique.
Referential integrity refers to the accuracy and consistency of data within a relationship.
In relationships, data is linked between two or more tables. This is achieved by having the foreign key (in the associated table) reference a primary key value (in the primary, or parent, table). Because of this, we need to ensure that data on both sides of the relationship remain intact.

So, referential integrity requires that, whenever a foreign key value is used, it must reference a valid, existing primary key in the parent table.

For example, if we delete the row with primary key 15 in a primary table, we need to be sure that there is no foreign key in any related table with the value 15. We should only be able to delete a primary key if there are no associated rows. Otherwise, we would end up with an orphaned record.

• Referential integrity will prevent users from:
  • Adding rows to a related table if there is no associated row in the primary table.
  • Changing values in a primary table that result in orphaned records in a related table.
  • Deleting rows from a primary table if there are matching related rows.
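The three rules above can be sketched in SQLite, reusing the key value 15 from the example. The `department`/`employee` tables and the `violates` helper are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department (dept_id)
    );
    INSERT INTO department VALUES (15, 'Sales');
    INSERT INTO employee VALUES (1, 'Asha', 15);
""")

def violates(sql):
    """Return True if the statement is rejected by the foreign-key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

print(violates("INSERT INTO employee VALUES (2, 'Ravi', 99)"))            # True: no department 99
print(violates("UPDATE department SET dept_id = 16 WHERE dept_id = 15"))  # True: would orphan employee 1
print(violates("DELETE FROM department WHERE dept_id = 15"))              # True: employee 1 refers to it
```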
Integrity Constraints

• 1. Domain constraints:
Domain constraints can be defined as the definition of a valid set of values for an attribute.
The data type of a domain includes string, character, integer, time, date, currency, etc.
The value of the attribute must be available in the corresponding domain.

• 2. Entity integrity constraints:
The entity integrity constraint states that a primary key value can't be null.
This is because the primary key value is used to identify individual rows in a relation, and if the primary key has a null value, then we can't identify those rows.
A table can contain null values in fields other than the primary key field.

• 3. Referential Integrity Constraints:
A referential integrity constraint is specified between two tables.
In referential integrity constraints, if a foreign key in Table 1 refers to the Primary Key of Table 2, then every value of the Foreign Key in Table 1 must either be null or be available in Table 2.

• 4. Key constraints:
Keys are the set of attributes used to uniquely identify an entity within its entity set.
An entity set can have multiple keys, but only one of them will be the primary key. A primary key must contain unique, non-null values in the relational table.
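Domain, entity integrity and key constraints can be sketched as SQLite table constraints. This assumes a hypothetical `student` table. SQLite's column types are only loosely enforced, so the domain rule is written as an explicit `CHECK`. The primary key is declared `NOT NULL` because SQLite otherwise tolerates NULLs in non-integer primary keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll_no TEXT NOT NULL PRIMARY KEY,         -- entity integrity: no NULL key
        email   TEXT UNIQUE,                       -- key constraint: another unique key
        age     INTEGER CHECK (age BETWEEN 5 AND 100)  -- domain constraint
    )
""")
conn.execute("INSERT INTO student VALUES ('R1', 'a@example.com', 20)")

def rejected(params):
    """Return True if the insert violates one of the table's constraints."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", params)
        return False
    except sqlite3.IntegrityError:
        return True

print(rejected((None, 'b@example.com', 21)))  # True: NULL primary key
print(rejected(('R1', 'c@example.com', 22)))  # True: duplicate primary key
print(rejected(('R2', 'a@example.com', 23)))  # True: duplicate unique e-mail
print(rejected(('R3', 'd@example.com', 250))) # True: age outside the domain
```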
Relations in DBMS:

• Relation is sometimes used to refer to a table in a relational database but is more commonly used to describe the relationships that can be created between those tables in a relational database.

In relational databases, a relationship exists between two tables when one of them has a foreign key that references the primary key of the other table. This single fact allows relational databases to split and store data in different tables, yet still link the disparate data items together. It is one of the features that makes relational databases such powerful and efficient stores of information. Relation may also be known as relationship.

• One-to-One Relationship: 1 Student has 1 ID.
• One-to-Many or Many-to-One Relationship: 1 Person has n Accounts.
• Many-to-Many Relationship: x Customers can buy y Products.
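A one-to-many relationship needs only a foreign key on the "many" side. A many-to-many relationship such as customers and products is commonly modelled with a third, junction table. That technique is not stated in the text above and is used here as an illustration. The SQLite sketch assumes hypothetical `customer`, `product` and `purchase` tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, title TEXT);
    -- Junction table: each row links one customer to one product.
    CREATE TABLE purchase (
        cust_id INTEGER REFERENCES customer (cust_id),
        prod_id INTEGER REFERENCES product (prod_id),
        PRIMARY KEY (cust_id, prod_id)
    );
    INSERT INTO customer VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO product  VALUES (10, 'Pen'), (20, 'Notebook');
    INSERT INTO purchase VALUES (1, 10), (1, 20), (2, 10);
""")

rows = conn.execute("""
    SELECT c.name, p.title
    FROM purchase AS pu
    JOIN customer AS c ON c.cust_id = pu.cust_id
    JOIN product  AS p ON p.prod_id = pu.prod_id
    ORDER BY c.name, p.title
""").fetchall()
print(rows)  # [('Asha', 'Notebook'), ('Asha', 'Pen'), ('Ravi', 'Pen')]
```

Each customer can appear with many products, and each product with many customers.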
Schema

A database schema is the skeleton structure that represents the view of the entire database. It
defines how the data is organized and how the relations among them are associated. It formulates all
the constraints that are to be applied on the data.
A database schema defines its entities and the relationships among them. It contains a detailed
description of the database.

• The design of a database at the physical level is called the physical schema. It describes how
the data is stored in blocks of storage.
• The design of a database at the logical level is called the logical schema. Programmers and
database administrators work at this level. Here, data is described as certain types of data
records stored in data structures, while internal details such as the implementation of those
data structures are hidden (they are available at the physical level).
• The design of a database at the view level is called the view schema. It generally describes
end-user interaction with database systems.
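Here is a sketch of a logical schema, assuming SQLite. The CREATE TABLE statements below describe structure only. SQLite records them in its built-in sqlite_master catalog even before any rows exist. The table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CREATE statements are the schema: tables, columns, types, constraints.
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
""")

# SQLite keeps each table's definition in its sqlite_master catalog.
schema = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
for name, sql in schema:
    print(name)
    print(sql)

# The structure exists, but no data has been stored yet.
row_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(row_count)  # 0
```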
Views

• Views in DBMS are a kind of virtual table. A view has rows and columns, just as a real
table in the database does. We can create a view by selecting fields from one
or more tables present in the database. A view can contain either all the rows of a table
or only specific rows that meet a certain condition.
Views can join and simplify multiple tables into a single virtual table. Views can
act as aggregated tables, where the database engine aggregates data (sum,
average, etc.) and presents the calculated results as part of the data. Views can
hide the complexity of data.
The difference between a view and a table is that views are definitions built on
top of other tables (or views) and do not hold data themselves. If data
changes in the underlying table, the same change is reflected in the view.
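The sketch below assumes SQLite and a hypothetical sales table. It defines an aggregating view and shows that a change to the underlying table appears in the view without any refresh, because the view stores only its query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, amount REAL);
    INSERT INTO sales VALUES ('North', 100), ('North', 50), ('South', 80);
    -- The view stores only this query definition, not any rows.
    CREATE VIEW region_totals AS
        SELECT region, SUM(amount) AS total
        FROM sales
        GROUP BY region;
""")

def totals():
    """Query the view as if it were an ordinary table."""
    return conn.execute("SELECT * FROM region_totals ORDER BY region").fetchall()

print(totals())  # [('North', 150.0), ('South', 80.0)]

# Change the underlying table; the view reflects it on the next query.
conn.execute("INSERT INTO sales VALUES ('South', 20)")
print(totals())  # [('North', 150.0), ('South', 100.0)]
```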

Metadata

• Metadata in DBMS is characterized as data about data. It provides context and a
description of the data, and it helps in understanding, finding, and organizing data.
Technical metadata gives information on the technical properties of a digital document or
the specific software and hardware conditions needed to process or render digital
information. Metadata can be stored in a variety of places. Where the metadata relates
to databases, it is often stored in tables and fields within the database itself.
Some examples of basic metadata are author, date created, date modified, and file size.
Metadata is also used for unstructured data such as images, video, web pages,
spreadsheets, etc.
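File metadata of the kind listed above, such as file size and date modified, can be read without opening the file's contents. This is a minimal sketch using Python's standard `os.stat`. The temporary CSV file is hypothetical. Fields such as author are not available from the file system this way.

```python
import os
import tempfile
from datetime import datetime

# Create a small file so there is something to describe.
# newline="" keeps "\n" as one byte on every platform.
with tempfile.NamedTemporaryFile("w", newline="", suffix=".csv", delete=False) as f:
    f.write("id,name\n1,Asha\n")
    path = f.name

# os.stat returns data about the file, not the file's contents.
info = os.stat(path)
metadata = {
    "file_name": os.path.basename(path),
    "file_size_bytes": info.st_size,
    "date_modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
}
print(metadata)
os.remove(path)  # clean up the temporary file
```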

Data Dictionary

• A data dictionary is a collection of names, definitions, and attributes about data
elements that are being used or captured in a database, information system, or part of a
research project. It describes the meanings and purposes of data elements within the
context of a project, and provides guidance on interpretation, accepted meanings and
representation. A data dictionary also provides metadata about data elements. The
metadata included in a data dictionary can help define the scope and
characteristics of data elements, as well as the rules for their usage and application.

• A data dictionary is a centralized repository of metadata, that is, data about data.

Some examples of what might be contained in an organization's data dictionary include
the data types (integer, real, character, and image) of all fields in the organization's databases.
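Here is a minimal sketch that generates a simple data dictionary from a database's own metadata. It assumes SQLite, whose `PRAGMA table_info` reports each column's name, declared type and constraints. The customer table is hypothetical. Its INTEGER, TEXT, REAL and BLOB columns stand in for the integer, character, real and image fields mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        balance REAL,
        photo   BLOB
    );
""")

def data_dictionary(conn):
    """Return one entry per column of every table, read from SQLite's catalog."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        # The table name comes from the catalog itself, not from user input.
        for _cid, column, col_type, notnull, _default, pk in conn.execute(
            f'PRAGMA table_info("{table}")'
        ):
            entries.append({
                "table": table,
                "column": column,
                "type": col_type,
                "not_null": bool(notnull),
                "primary_key": bool(pk),
            })
    return entries

dictionary = data_dictionary(conn)
for entry in dictionary:
    print(entry)
```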

SQL
• SQL stands for Structured Query Language. It lets you access and manipulate databases.
• SQL is a standardized programming language that is used to
manage relational databases and perform various operations on the data in them.

Advantages of SQL:
1. It is a universal language.
2. It is easy and simple to learn and use, with an English-like syntax.
3. It can manage thousands of rows and columns (and more) together.
4. It is available across all major platforms.

The uses of SQL include modifying database table and index structures; adding,
updating and deleting rows of data; and retrieving subsets of information from
within a database for transaction processing and analytics applications.
Queries and other SQL operations take the form of commands written as
statements. Commonly used SQL statements include SELECT, INSERT,
UPDATE, DELETE, CREATE, ALTER and TRUNCATE.
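Here is a small sketch that makes these uses concrete. It assumes SQLite via Python's sqlite3 module, and the orders table and its rows are hypothetical. The code creates a table and an index structure, adds rows, and then retrieves both a subset of rows and an analytical summary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, city TEXT, amount REAL)")
# An index is a separate structure that speeds up lookups by city.
conn.execute("CREATE INDEX idx_orders_city ON orders(city)")
conn.executemany(
    "INSERT INTO orders (city, amount) VALUES (?, ?)",
    [("Delhi", 250.0), ("Mumbai", 120.0), ("Delhi", 80.0)],
)

# A subset of rows, as transaction processing might need...
delhi = conn.execute(
    "SELECT id, amount FROM orders WHERE city = ? ORDER BY id", ("Delhi",)
).fetchall()
# ...and a summary, as an analytics application might need.
summary = conn.execute(
    "SELECT city, COUNT(*), SUM(amount) FROM orders GROUP BY city ORDER BY city"
).fetchall()
print(delhi)    # [(1, 250.0), (3, 80.0)]
print(summary)  # [('Delhi', 2, 330.0), ('Mumbai', 1, 120.0)]
```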

Data Definition Language (DDL):
DDL changes the structure of a table, for example by creating a table, deleting a table, or
altering a table.

• In many database systems (for example, Oracle and MySQL), DDL commands are auto-committed,
which means they permanently save all changes in the database.

Here are some commands that come under DDL:

a. CREATE: It is used to create a new table in the database.
b. ALTER: It is used to alter the structure of the database. This change could either
modify the characteristics of an existing attribute or add a new attribute.
c. DROP: It is used to delete both the structure of a table and the records stored in it.
d. TRUNCATE: It is used to delete all the rows from a table and free the space they
occupied, while keeping the table structure.
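Here is a sketch of CREATE, ALTER and DROP, assuming SQLite. SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it. Whether DDL is auto-committed also depends on the product: SQLite and PostgreSQL, for example, let DDL run inside a transaction. The product table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def columns(table):
    """Return a table's column names, read from SQLite's catalog."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# CREATE: define a new table.
conn.execute("CREATE TABLE product (prod_id INTEGER PRIMARY KEY, title TEXT)")
# ALTER: change the structure, here by adding a new attribute.
conn.execute("ALTER TABLE product ADD COLUMN price REAL")
print(columns("product"))  # ['prod_id', 'title', 'price']

conn.execute("INSERT INTO product (title, price) VALUES ('Pen', 10.0)")
# TRUNCATE-like step: remove every row but keep the structure.
conn.execute("DELETE FROM product")
print(columns("product"))  # ['prod_id', 'title', 'price']

# DROP: remove the table definition together with its data.
conn.execute("DROP TABLE product")
remaining = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(remaining)  # []
```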
Data Manipulation Language (DML):
DML commands are used to modify the database. DML is responsible for all forms of change to
the data in the database.

• DML commands are not auto-committed, which means their changes are not permanently saved
in the database until they are committed. Until then, they can be rolled back.

Here are some commands that come under DML:

a. SELECT: It corresponds to the projection operation of relational algebra, with the WHERE
clause acting as the selection condition. It is used to retrieve attributes from the rows that
satisfy the condition described by the WHERE clause.
b. INSERT: It is used to insert new rows of data into a table.
c. UPDATE: This command is used to update or modify the value of a column in the table.
d. DELETE: It is used to remove one or more rows from a table.
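The sketch below shows the four DML commands, and also shows that uncommitted DML can be rolled back. It assumes SQLite via Python's sqlite3 module, which opens a transaction implicitly before INSERT, UPDATE and DELETE. The account table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, holder TEXT, balance REAL)")
conn.commit()

# INSERT adds rows, UPDATE changes values, DELETE removes rows.
conn.execute("INSERT INTO account VALUES (1, 'Asha', 500.0)")
conn.execute("INSERT INTO account VALUES (2, 'Ravi', 300.0)")
conn.execute("UPDATE account SET balance = balance - 100 WHERE acc_no = 1")
conn.execute("DELETE FROM account WHERE acc_no = 2")

# SELECT: project two columns from the rows that satisfy WHERE.
before_rollback = conn.execute(
    "SELECT holder, balance FROM account WHERE balance > 0"
).fetchall()
print(before_rollback)  # [('Asha', 400.0)]

# None of the DML above was committed, so it can all be undone.
conn.rollback()
after_rollback = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(after_rollback)   # 0
```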

Transaction Control Language (TCL):
TCL commands can only be used with DML commands such as INSERT, DELETE and UPDATE.

• DDL operations are automatically committed in many databases, which is why TCL commands
cannot be used to undo creating or dropping tables.

Here are some commands that come under TCL:

a. COMMIT: The COMMIT command is used to save all the transactions to the database.
b. ROLLBACK: The ROLLBACK command is used to undo transactions that have not already
been saved to the database.
c. SAVEPOINT: It is used to roll the transaction back to a certain point without rolling
back the entire transaction.
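Here is a minimal sketch of COMMIT, ROLLBACK and SAVEPOINT, assuming SQLite. The connection is opened with `isolation_level=None` so that Python's sqlite3 module does not issue any transaction statements of its own. The ledger table and the savepoint name are hypothetical.

```python
import sqlite3

# isolation_level=None: the driver adds no implicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE ledger (entry TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO ledger VALUES ('deposit')")
conn.execute("SAVEPOINT before_fee")
conn.execute("INSERT INTO ledger VALUES ('fee')")
# ROLLBACK TO undoes only the work done after the savepoint.
conn.execute("ROLLBACK TO before_fee")
conn.execute("COMMIT")  # makes the remaining work permanent

entries = [row[0] for row in conn.execute("SELECT entry FROM ledger")]
print(entries)  # ['deposit']

conn.execute("BEGIN")
conn.execute("INSERT INTO ledger VALUES ('withdrawal')")
conn.execute("ROLLBACK")  # discards everything since BEGIN
print([row[0] for row in conn.execute("SELECT entry FROM ledger")])  # ['deposit']
```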

Data Control Language (DCL):


DCL commands are used to grant authority to, and take it back from, any
database user.

Here are some commands that come under DCL:

a. GRANT: It is used to give a user access privileges to a database.
b. REVOKE: It is used to take back permissions from a user.
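SQLite, which the other sketches here use, has no user accounts, so GRANT and REVOKE cannot run there. The following is only a toy, hypothetical model of how those statements change a privilege catalog. The class and method names are illustrative and are not any DBMS's API. The SQL statements in the comments show standard GRANT/REVOKE syntax.

```python
# Toy model of DCL semantics (hypothetical; not a real DBMS API).
# In SQL, the corresponding statements would be, for example:
#   GRANT SELECT, INSERT ON employee TO clerk;
#   REVOKE INSERT ON employee FROM clerk;

class PrivilegeCatalog:
    """Tracks which privileges each user holds on each table."""

    def __init__(self):
        self._grants = {}  # (user, table) -> set of privilege names

    def grant(self, privileges, table, user):
        self._grants.setdefault((user, table), set()).update(privileges)

    def revoke(self, privileges, table, user):
        self._grants.get((user, table), set()).difference_update(privileges)

    def allowed(self, user, privilege, table):
        return privilege in self._grants.get((user, table), set())

catalog = PrivilegeCatalog()
catalog.grant({"SELECT", "INSERT"}, "employee", "clerk")
catalog.revoke({"INSERT"}, "employee", "clerk")
print(catalog.allowed("clerk", "SELECT", "employee"))  # True
print(catalog.allowed("clerk", "INSERT", "employee"))  # False
```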

OLAP:

OLAP stands for On-Line Analytical Processing, a category of software tools that provide
analysis of data for business decisions. OLAP systems allow users to analyse database
information from multiple database systems at one time.
It is used for tasks such as sales analysis and forecasting, market research, and budgeting.
A data warehouse is an example of an OLAP system; any data warehouse system is an OLAP system.
Example of OLAP
A company might compare its mobile phone sales in September with sales in October, and then
compare those results with another location whose data may be stored in a separate database.
Amazon analyses its customers' purchases to build a personalized homepage showing
products that are likely to interest each customer.
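The month-over-month comparison above can be written as a single analytical query. This sketch assumes SQLite. It also assumes that the data from both locations has already been loaded into one hypothetical phone_sales table, as a data warehouse would do. The figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE phone_sales (location TEXT, month TEXT, units INTEGER);
    INSERT INTO phone_sales VALUES
        ('Mumbai', '2024-09', 120), ('Mumbai', '2024-10', 150),
        ('Pune',   '2024-09',  90), ('Pune',   '2024-10',  70);
""")

# An analytical query summarizes many rows instead of touching one record.
report = conn.execute("""
    SELECT location,
           SUM(CASE WHEN month = '2024-09' THEN units ELSE 0 END) AS sep,
           SUM(CASE WHEN month = '2024-10' THEN units ELSE 0 END) AS oct
    FROM phone_sales
    GROUP BY location
    ORDER BY location
""").fetchall()

for location, sep, oct_units in report:
    change = (oct_units - sep) / sep  # relative change from September to October
    print(location, sep, oct_units, f"{change:+.0%}")
# Mumbai 120 150 +25%
# Pune 90 70 -22%
```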


Benefits of using OLAP services:
I. OLAP creates a single platform for all types of business analytical needs, which include
planning, budgeting, forecasting, and analysis.
II. The main benefit of OLAP is the consistency of information and calculations.
III. Security restrictions can easily be applied to users and objects to comply with
regulations and protect sensitive data.

Drawbacks of OLAP services:
I. Implementation and maintenance depend on IT professionals because traditional
OLAP tools require a complicated modelling procedure.
II. OLAP tools need cooperation between people from various departments to be effective,
which might not always be possible.

OLTP:
• OLTP stands for On-Line Transaction Processing. It supports transaction-oriented
applications in a 3-tier architecture and administers the day-to-day transactions of an
organization.
• It is used to maintain online transactions and record integrity in multiple-access
environments. An OLTP system manages a very large number of short online transactions,
for example, at an ATM.

OLTP applications typically possess the following characteristics:
• Transactions that involve small amounts of data
• Indexed access to data
• A large number of users
• Frequent queries and updates
• Fast response times
Examples of OLTP systems

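An example of an OLTP system is an ATM centre. Assume that a couple has a joint account with
a bank. One day, both reach different ATM centres at precisely the same time, and each wants to
withdraw the total amount present in their bank account. The OLTP system must process the two
transactions so that only one of them can withdraw the full balance, which keeps the account consistent.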

• Online banking
• Purchasing a book online
• Booking an airline ticket
• Sending a text message
• Order entry
• Telemarketers entering telephone survey results
• Call center staff viewing and updating customers’ details
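The joint-account scenario can be sketched as a short OLTP transaction. This assumes SQLite and a hypothetical account table. The debit is a conditional UPDATE inside a transaction, so when the two withdrawal requests are serialized, only the first can take the full balance. The sketch simulates the two requests one after the other, in the order the DBMS would serialize them.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (42, 1000)")  # the joint account

def withdraw(acc_no, amount):
    """A short OLTP transaction: debit only if enough money is left."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    cur = conn.execute(
        "UPDATE account SET balance = balance - ? "
        "WHERE acc_no = ? AND balance >= ?",
        (amount, acc_no, amount),
    )
    if cur.rowcount == 1:  # the balance check passed and one row changed
        conn.execute("COMMIT")
        return True
    conn.execute("ROLLBACK")
    return False

print(withdraw(42, 1000))  # True: first holder gets the money
print(withdraw(42, 1000))  # False: nothing left for the second
print(conn.execute("SELECT balance FROM account WHERE acc_no = 42").fetchone()[0])  # 0
```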
Data Architecture:

What is Data Warehousing?

• Data Warehousing (DW) is a process for collecting and managing data from diverse sources
to provide meaningful insights into the business. A data warehouse is typically used to
connect and analyse heterogeneous sources of business data. The data warehouse is the
centrepiece of the BI (business intelligence) system built for data analysis and reporting.
• It is a mixture of technologies and components that helps an organization use its data
strategically. Rather than supporting transaction processing, it is the automated collection
of a vast amount of information by a company, organized for querying and review. It is a
process of transforming data into information and making it available to users in time to
make a difference.
Characteristics of data warehousing

1. Subject oriented
A data warehouse is subject-oriented, as it provides information on a topic rather than on the
ongoing operations of an organization. Such topics may be inventory, promotion, storage, etc.
A data warehouse does not concentrate on current operations. Instead, it emphasizes
modelling and analysing data for decision-making. It also provides a simple and succinct
description of the particular subject by excluding details that would not be useful in
the decision process.
2. Integrated
Integration in a data warehouse means establishing a standard unit of measurement for all
similar data coming from different databases. The data must also be stored in a simple and
universally acceptable manner within the data warehouse. A data warehouse is created by
combining data from various sources such as mainframes, relational databases, flat files, etc.
Consistency must be maintained in naming conventions, formats, measurements of
characteristics, and encoding specifications. Such integration supports robust data analysis.

3. Time-variant
Compared to operational systems, the time horizon of a data warehouse is quite
extensive. The data collected in a data warehouse is recorded over a given period
and provides historical information. It contains a temporal element, either explicitly or
implicitly.
One place where data warehouse data shows time variation is the record key: every primary
key in the DW should contain an element of time, either implicitly or explicitly, such as the
day, week, or month.
4. Non-volatile
The data warehouse is also non-volatile, meaning that prior data is not erased when
new data is entered into it. Data is read-only and is refreshed only at regular intervals. This
helps in analysing historical data and in understanding what happened and when. Transaction
processing, recovery, and concurrency control mechanisms are not required. In the data
warehouse environment, activities such as deleting, updating, and inserting, which are performed
in an operational application environment, are omitted.
Analytical Techniques:

• Roll up: It is the opposite of the drill-down operation. It performs aggregation on the
OLAP cube. It can be done by climbing up in the concept hierarchy or by reducing the
dimensions.
In the cube given in the overview section, the roll-up operation is performed by climbing
up in the concept hierarchy of the Location dimension (City -> Country).
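Here is a minimal sketch of roll-up on a tiny hypothetical data set. It uses plain Python dictionaries rather than an OLAP engine. The city-to-country mapping is an assumed concept hierarchy. The second function shows roll-up by reducing a dimension.

```python
from collections import defaultdict

# Hypothetical city-level facts: (city, quarter, item, units sold)
facts = [
    ("Delhi", "Q1", "Car", 10),
    ("Kolkata", "Q1", "Car", 6),
    ("Delhi", "Q2", "Bus", 4),
    ("Toronto", "Q1", "Car", 7),
]
# Assumed concept hierarchy for the Location dimension: City -> Country.
CITY_TO_COUNTRY = {"Delhi": "India", "Kolkata": "India", "Toronto": "Canada"}

def roll_up_location(rows):
    """Climb Location from City to Country, summing units."""
    totals = defaultdict(int)
    for city, quarter, item, units in rows:
        totals[(CITY_TO_COUNTRY[city], quarter, item)] += units
    return dict(totals)

def drop_location(rows):
    """Roll up by reducing dimensions: remove Location entirely."""
    totals = defaultdict(int)
    for _city, quarter, item, units in rows:
        totals[(quarter, item)] += units
    return dict(totals)

print(roll_up_location(facts))
# {('India', 'Q1', 'Car'): 16, ('India', 'Q2', 'Bus'): 4, ('Canada', 'Q1', 'Car'): 7}
print(drop_location(facts))
# {('Q1', 'Car'): 23, ('Q2', 'Bus'): 4}
```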


• Drill down: In the drill-down operation, less detailed data is converted into highly
detailed data. It can be done by moving down in the concept hierarchy or by adding a new
dimension.
In the cube given in the overview section, the drill-down operation is performed by moving
down in the concept hierarchy of the Time dimension (Quarter -> Month).
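Drill-down is only possible if the detailed data exists. This sketch uses hypothetical month-level facts and plain Python dictionaries. It first computes the quarterly view, then drills down to report the same facts at month level. The month-to-quarter mapping is an assumed concept hierarchy.

```python
from collections import defaultdict

# Hypothetical month-level facts: (city, month, item, units sold)
facts = [
    ("Delhi", "Jan", "Car", 3),
    ("Delhi", "Feb", "Car", 5),
    ("Delhi", "Apr", "Car", 2),
    ("Kolkata", "Mar", "Bus", 4),
]
# Assumed concept hierarchy for the Time dimension: Month -> Quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}

def view(rows, time_level):
    """Aggregate units by (city, time, item) at the chosen time level."""
    totals = defaultdict(int)
    for city, month, item, units in rows:
        time = MONTH_TO_QUARTER[month] if time_level == "quarter" else month
        totals[(city, time, item)] += units
    return dict(totals)

quarterly = view(facts, "quarter")  # the less detailed starting view
monthly = view(facts, "month")      # drill down: Quarter -> Month
print(quarterly)
# {('Delhi', 'Q1', 'Car'): 8, ('Delhi', 'Q2', 'Car'): 2, ('Kolkata', 'Q1', 'Bus'): 4}
print(monthly)
# {('Delhi', 'Jan', 'Car'): 3, ('Delhi', 'Feb', 'Car'): 5, ('Delhi', 'Apr', 'Car'): 2, ('Kolkata', 'Mar', 'Bus'): 4}
```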


• Slice: It selects a single value along one dimension of the OLAP cube, which results in the
creation of a new sub-cube. In the cube given in the overview section, slice is performed on
the dimension Time = “Q1”.
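Here is a sketch of slice that stores a small hypothetical cube as a Python dictionary keyed by (location, time, item). Fixing Time = “Q1” keeps the matching cells and removes the Time dimension from the result.

```python
# Hypothetical cube cells: (location, time, item) -> units sold
cube = {
    ("Delhi", "Q1", "Car"): 10,
    ("Delhi", "Q2", "Car"): 12,
    ("Kolkata", "Q1", "Bus"): 6,
    ("Mumbai", "Q1", "Car"): 9,
    ("Mumbai", "Q3", "Bus"): 5,
}
TIME = 1  # position of the Time dimension in each key

def slice_cube(cells, axis, value):
    """Keep cells where one dimension equals value, then drop that dimension."""
    return {
        key[:axis] + key[axis + 1:]: units
        for key, units in cells.items()
        if key[axis] == value
    }

q1 = slice_cube(cube, TIME, "Q1")
print(q1)  # {('Delhi', 'Car'): 10, ('Kolkata', 'Bus'): 6, ('Mumbai', 'Car'): 9}
```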


• Dice: It selects a sub-cube from the OLAP cube by selecting values on two or more
dimensions. In the cube given in the overview section, a sub-cube is selected using the
following dimensions and criteria:
• Location = “Delhi” or “Kolkata”
• Time = “Q1” or “Q2”
• Item = “Car” or “Bus”
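Dice filters on several dimensions at once. This sketch applies the three criteria above to a small hypothetical cube stored as a Python dictionary.

```python
# Hypothetical cube cells: (location, time, item) -> units sold
cube = {
    ("Delhi", "Q1", "Car"): 10,
    ("Delhi", "Q2", "Bus"): 3,
    ("Kolkata", "Q1", "Bus"): 6,
    ("Kolkata", "Q3", "Car"): 8,    # excluded: time not Q1 or Q2
    ("Mumbai", "Q1", "Car"): 9,     # excluded: location not selected
    ("Delhi", "Q2", "Truck"): 2,    # excluded: item not Car or Bus
}
LOCATION, TIME, ITEM = 0, 1, 2  # positions of the dimensions in each key

def dice(cells, criteria):
    """Keep cells whose value on every listed dimension is in the allowed set."""
    return {
        key: units
        for key, units in cells.items()
        if all(key[axis] in allowed for axis, allowed in criteria.items())
    }

sub_cube = dice(cube, {
    LOCATION: {"Delhi", "Kolkata"},
    TIME: {"Q1", "Q2"},
    ITEM: {"Car", "Bus"},
})
print(sub_cube)
# {('Delhi', 'Q1', 'Car'): 10, ('Delhi', 'Q2', 'Bus'): 3, ('Kolkata', 'Q1', 'Bus'): 6}
```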


• Pivot: It is also known as the rotation operation, as it rotates the current view to get a
new view of the representation. In the sub-cube obtained after the slice operation,
performing the pivot operation gives a new view of it.
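Pivot changes only the layout, not the numbers. In this sketch, a hypothetical Q1 slice (city by item) is first laid out as a grid. It is then rotated so that items become rows and cities become columns.

```python
# Hypothetical Q1 slice: (city, item) -> units sold
q1_slice = {
    ("Delhi", "Car"): 10, ("Delhi", "Bus"): 3,
    ("Kolkata", "Car"): 5, ("Kolkata", "Bus"): 6,
}

def as_table(cells):
    """Lay out {(row, col): value} as a header row plus one row per row label."""
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    table = [[""] + cols]
    for r in rows:
        table.append([r] + [cells.get((r, c), 0) for c in cols])
    return table

def pivot(cells):
    """Rotate the view: row labels become column labels and vice versa."""
    return {(c, r): v for (r, c), v in cells.items()}

for line in as_table(q1_slice):         # cities down, items across
    print(line)
for line in as_table(pivot(q1_slice)):  # items down, cities across
    print(line)
```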

Data Marts

• A data mart is also part of the storage component. It stores the information for a
particular function of an organization, which is handled by a single authority. An
organization can have as many data marts as it has functions. We can also say that a
data mart contains a subset of the data stored in the data warehouse.

Data Lake

A data lake is a centralized repository designed to store, process, and secure
large amounts of structured, semi-structured, and unstructured data. It can
store data in its native format and process any variety of it, ignoring size limits.
Some data lake use cases
Media and entertainment
A company offering streaming music, radio, and podcasts can increase revenue
by improving their recommendation system, so users consume their service
more, allowing the company to sell more ads.
Telecommunications
A multinational telecommunications company can save money by building
churn-propensity models that reduce customer churn.

What is Data Mining?

Data mining is the process of uncovering patterns and finding anomalies and relationships in
large datasets that can be used to make predictions about future trends. The main purpose of
data mining is extracting valuable information from available data.
Data mining is considered an interdisciplinary field that combines techniques from computer
science and statistics. Note that the term “data mining” is a misnomer: it is primarily
concerned with discovering patterns and anomalies within datasets, not with the
extraction of the data itself.

Applications:
• Data mining offers many applications in business. For example, establishing proper
data mining processes can help a company decrease its costs, increase revenues, or
derive insights from the behaviour and practices of its customers. It plays a vital
role in today's business decision-making process.

• Data mining is also actively used in finance. For instance, relevant techniques allow users
to determine and assess the factors that influence the price fluctuations of financial
securities.
• The field is rapidly evolving. New data emerges at enormously fast speeds, while
technological advancements allow for more efficient ways to solve existing problems. In
addition, developments in the areas of artificial intelligence and machine learning provide
new paths to precision and efficiency in the field.

Examples of Data Mining in Real Life:
• Mobile Service Providers.
• Retail Sector.
• Artificial Intelligence.
• Ecommerce.
• Science and Engineering.

Data Backup:
Data loss can occur from a variety of causes, including computer viruses, hardware
failure, file corruption, fire, flood, or theft, etc. Data loss may involve critical
financial, customer, and company data, so a solid data backup plan is critical for
every organization.

Advantages of Data Backup:


• Moving files to backup storage can free memory space, giving you room for more files to be stored.
• Your computer's performance may improve as a result of the freed space.
• You can recover your files whenever you need them, so data loss does not stop
your organization's operations or your work.

Disadvantages of Data Backup:


• A backup can take hours or even days to complete, depending on the size of the
files you are transferring. Such backups also require high-capacity storage
hardware.
• Cloud backup may jeopardize the integrity and confidentiality of your data. Also,
since cloud backup relies on the internet, it cannot be performed without an internet connection.
• Data can be duplicated or lost if backups are performed carelessly.
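Here is a minimal backup-and-restore sketch. It assumes SQLite via Python's sqlite3 module, whose `Connection.backup` method copies a whole database. The backup target is kept in memory only so the example is self-contained; a real backup would go to separate storage. The customer table is hypothetical.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
source.execute("INSERT INTO customer VALUES (1, 'Asha')")
source.commit()

# Copy the entire source database into a separate backup database.
backup = sqlite3.connect(":memory:")
source.backup(backup)

# Simulate data loss in the original, then recover the data from the copy.
source.execute("DELETE FROM customer")
source.commit()
recovered = backup.execute("SELECT name FROM customer").fetchall()
print(recovered)  # [('Asha',)]
```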