
UNIT V IMPLEMENTATION TECHNIQUES AND NON-RELATIONAL MODEL

Data on External Storage – RAID – File Organizations – Indexing and Hashing – Trees – B+ Tree
and B-Tree Index Files. Hashing: Static – Dynamic. Query Processing and Query Optimization –
Introduction to NoSQL & MongoDB: Advantages, Architecture, Data Models, MongoDB
Datatypes and CRUD Operations.

Data on external storage:


 Data in a DBMS is stored on storage devices such as disks and tapes.

 The disk space manager is responsible for keeping track of available disk space.

 The file manager, which provides the abstraction of a file of records to higher levels of
DBMS code, issues requests to the disk space manager to obtain and relinquish space on disk.

Storage Types in DBMS

The records in databases are stored in file formats. Physically, the data is stored in
electromagnetic format on a device. The electromagnetic devices used in database systems
for data storage are classified as follows:

 Primary Memory: The primary memory of a server is storage that the central processing
unit can access directly, without going through any other device. For primary memory to
function flawlessly, the electric power supply, the hardware backup system, the
supporting devices, and the coolant that moderates the system temperature must all
play their part. These devices are considerably smaller in size and are volatile. In terms of
performance and speed, primary memory devices are the fastest, but this speed comes
at the cost of limited capacity. They are also usually more expensive, owing to their
higher speed and performance.
 Secondary Memory: Secondary storage devices, as the name suggests, are devices that
store data which will be needed at a later point in time for various purposes or database
actions. These storage systems are therefore sometimes also called backup units.
Devices that are plugged in or connected externally fall under this category, unlike
primary memory, which is part of the CPU. These devices are noticeably larger in
capacity than primary devices and smaller than tertiary devices. Data can be held on
them as long as it is needed and deleted once the user is done with it. In terms of speed,
secondary storage devices are slower than primary storage devices but faster than
tertiary devices.

 It usually has a higher capacity than primary storage, though this keeps changing as
technology expands every day. Secondary storage nowadays consists of magnetic disks,
and in earlier times optical discs such as DVDs and CDs were used. The steady
development of technology has brought about modern devices that make it easier for
users to handle multiple devices. In addition to portable hard disks and reusable flash
drives, peripheral devices are equipped with USB ports so that they can be used as
secondary storage through plug-and-play.

 RAID is a network of redundant storage devices in which the shortcomings of one
device are covered by another device in the chain. It uses techniques such as structured
array data arrangement, mirroring, error correction codes, and striping data across
multiple disks to ensure that data flows smoothly. RAID levels range from RAID 0 and
RAID 1 up through RAID 6. These levels are planned and determined based on the
degree of redundancy in the stored data.

 Tertiary Memory: Tertiary memory refers to devices that can hold a large amount of
data without being constantly connected to the server or the peripherals. A device of
this type is connected externally to a server or to the device where the database is
stored. Because tertiary storage provides more space than the other types of device
memory but performs the slowest, it also costs less than primary and secondary
storage. This type of storage is commonly used for taking backup copies of servers and
databases. As with secondary devices, the contents of tertiary devices can be erased
and reused.

Memory Hierarchy:

A computer system has a hierarchy of memory. The CPU has direct access to its main memory
and inbuilt registers. Accessing main memory, however, takes more time than the CPU takes
to execute an instruction; cache memory is introduced to minimize this difference in speed.
Data that is most frequently accessed by the CPU resides in cache memory, which provides
the fastest access time. The fastest-accessing memory is also the most expensive. Although
large storage devices are slower and less expensive than CPU registers and cache memory,
they can store a greater amount of data.
Magnetic Disks: Present-day computer systems use hard disk drives as secondary storage
devices. Magnetic disks store information using the concept of magnetism. Hard disks are
made of metal disks coated with magnetizable material and mounted on a spindle. As the
read/write head moves over a disk, it magnetizes or de-magnetizes the spots under it; a spot
magnetized in one direction represents 1 (one), and in the other direction 0 (zero). Formatted
hard disks store data efficiently in a defined order: each disk platter is divided into many
concentric circles, called tracks, and each track contains a number of sectors. Data on a hard
disk is typically stored in sectors of 512 bytes.
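
As a quick illustration of this geometry, the raw capacity of a disk is simply the product of its
dimensions. The following Python sketch uses made-up geometry numbers; only the 512-byte
sector size comes from the text above:

# Back-of-the-envelope disk capacity from its geometry.
# All geometry numbers except bytes_per_sector are illustrative.
surfaces = 4                 # recording surfaces across all platters
tracks_per_surface = 50_000  # concentric circles per surface
sectors_per_track = 1_000    # sectors in each track
bytes_per_sector = 512       # as noted above

capacity = surfaces * tracks_per_surface * sectors_per_track * bytes_per_sector
print(f"{capacity / 10**9:.1f} GB")  # 102.4 GB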

Primary Memory :

Read Only Memory (ROM)

Memory is a very essential part of a computer system, used to store information for instant
use or permanently. Based on how it works, computer memory is divided into two types:
 Volatile Memory (RAM)
 Non-volatile Memory (ROM)
Before looking at ROM, we will first understand what exactly volatile and non-volatile
memory are. Non-volatile memory is a type of computer memory that retains stored
information even when power is removed. It is less expensive than volatile memory and has
a large storage capacity. ROM (read-only memory) and flash memory are examples of non-
volatile memory. Volatile memory, on the other hand, is temporary: data is stored only while
the system is powered, and once the power is turned off, the data in volatile memory is lost.
RAM is an example of volatile memory.

What is Read-Only Memory (ROM)?

ROM stands for Read-Only Memory. It is a non-volatile memory that is used to store
important information needed to operate the system. As the name suggests, we can only
read the programs and data stored on it. It is also a primary memory unit of the computer
system. It contains electronic fuses that can be programmed for a specific piece of
information. The information is stored in the ROM in binary format. It is also known as
permanent memory.
Features of ROM (Read-Only Memory):

 ROM is a non-volatile memory.


 Information stored in ROM is permanent.
 Information and programs stored on it can only be read.
 Information and programs are stored on ROM in binary format.
 It is used in the start-up process of the computer.
Types of Read-Only Memory (ROM):
1. MROM (Masked read-only memory)
2. PROM (Programmable read-only memory)
3. EPROM (Erasable programmable read-only memory)
4. EEPROM (Electrically erasable programmable read-only memory)
Now we will discuss the types of ROM one by one:
1. MROM (Masked read-only memory): ROM is as old as semiconductor technology itself.
MROM was the very first ROM, consisting of a grid of word lines and bit lines joined together
by transistor switches. In this type of ROM, the data is physically encoded in the circuit and
can only be programmed during fabrication. It was not very expensive.
2. PROM (Programmable read-only memory): PROM is a form of digital memory. In this type
of ROM, each bit is locked by a fuse or anti-fuse. The data stored in it is permanent and
cannot be changed or erased. It is used in low-level programs such as firmware or microcode.
3. EPROM (Erasable programmable read-only memory): EPROM, also called EROM, is a type
of PROM that can be reprogrammed. The data stored in EPROM can be erased with
ultraviolet light and then reprogrammed, though the number of reprogramming cycles is
limited. Before the era of EEPROM and flash memory, EPROM was used in microcontrollers.
4. EEPROM (Electrically erasable programmable read-only memory): As its name suggests, it
can be programmed and erased electrically. The data and program of this ROM can be erased
and reprogrammed about ten thousand times. Erasing and programming the EEPROM takes
about 4 ms to 10 ms. It is used in microcontrollers and remote keyless systems.
Advantages of ROM:
 It is cheaper than RAM and it is non-volatile memory.
 It is more reliable as compared to RAM.
 Its circuit is simple as compared to RAM.
 It doesn’t need refreshing time because it is static.
 It is easy to test.
Disadvantages of ROM:
 It is a read-only memory, so it cannot be modified.
 It is slower as compared to RAM.

Difference between PROM and EPROM:

 The data stored in PROM is permanent and cannot be changed or erased, whereas an
EPROM can be erased and reprogrammed multiple times.
 PROM is less expensive than EPROM; EPROM is more expensive than PROM.
 A bipolar transistor is used in PROM, while a MOS transistor is used in EPROM.
 PROM is less flexible than EPROM, since it can be programmed only once.
 PROM is used in low-level programs such as firmware or microcode; EPROM is used in
microcontrollers.

Random Access Memory (RAM)


A computer is an electronic device, but it is very similar to a human brain. The human brain
has memory, and memory plays a most essential role: it helps in remembering things, and
people remember their past because of the memory present in the brain. Similarly,
computers have memory too.

Memory

In order to save data and instructions, memory is required. Memory is divided into cells,
which occupy the storage space present in the computer. Every cell has its own unique
location/address. Memory is very essential for a computer, as this is what makes it
somewhat similar to a human brain.
In human brains, there are different ways of keeping a memory, like short-term memory,
long-term memory, implicit memory, etc. Likewise, in computers, there are different types of
memories or different ways of saving memories. They are Cache memory, Primary memory/
Main memory, Secondary memory.
Types of Memory

There are three types of memories. Cache memory helps in speeding up the CPU, as it is a
high-speed memory; it takes less access time but is very expensive. The next type is Main
memory or Primary memory, which is used to hold the data currently being worked on; it
consists of RAM and ROM, where RAM is volatile and ROM is non-volatile in nature. The
third type is Secondary memory, which is non-volatile in nature and is used to store data
permanently in a computer.
RAM (Random Access Memory)

It is one of the parts of the Main memory, also famously known as Read/Write Memory.
RAM is present on the motherboard, and the computer's data is temporarily stored in it. As
the name says, RAM supports both reading and writing. RAM is a volatile memory: its
contents exist only as long as the computer is in the ON state, and as soon as the computer
turns OFF, the memory is erased.
To better understand RAM, imagine the blackboard of a classroom: the students can both
read and write, the data can be erased after the class is over, and new data can then be
entered.
Features of RAM:
 RAM is volatile in nature, which means, the data is lost when the device is switched off.
 RAM is known as the Primary memory of the computer.
 RAM is known to be expensive, since the memory can be accessed directly.
 RAM is the fastest memory, and therefore serves as internal memory for the computer.
 The speed of a computer depends on its RAM: if the computer has less RAM, it takes
more time to load programs, and the computer slows down.

Types of RAM

RAM is further divided into two types: SRAM (Static Random Access Memory) and DRAM
(Dynamic Random Access Memory). Let's learn about both of these types in more detail.
SRAM (Static Random Access memory)
SRAM is used for Cache memory; it can hold data as long as power is available. It is made
with CMOS technology, contains 4 to 6 transistors per cell, and also uses clocks. Because the
data is held in transistors, it does not require a periodic refresh cycle. Although SRAM is
faster, it requires more power and is more expensive. Since SRAM draws more power, more
heat is dissipated as well. Another drawback of SRAM is that it cannot store as many bits per
chip: for the same amount of memory, SRAM would require one more chip than DRAM.
Function of SRAM: The function of SRAM is that it provides a direct interface with the Central
Processing Unit at higher speeds.
Characteristics of SRAM:

1. SRAM is used as the Cache memory inside the computer.


2. SRAM is known to be the fastest among all the memories.
3. SRAM is costlier
4. SRAM has a lower density (number of memory cells per unit area)
5. The power consumption of SRAM is less, but when it is operated at higher frequencies, its
power consumption is comparable with that of DRAM.
DRAM (Dynamic Random Access memory)
DRAM is used for the Main memory. It has a different construction than SRAM: it uses one
transistor and one capacitor per memory bit, and the capacitor needs to be recharged
(refreshed) every few milliseconds. Dynamic RAM was the first memory integrated circuit to
be sold. DRAM is the second most compact technology in production (the first is Flash
Memory).
Although DRAM is slower, it can store more bits per chip: for the same amount of memory,
DRAM requires one chip fewer than SRAM. DRAM requires less power and hence produces
less heat.
Function of DRAM: DRAM holds the program code and data that a computer processor
needs in order to function. It is used in our PCs (Personal Computers).
Characteristics of DRAM:
1. DRAM is used as the Main Memory inside the computer.
2. DRAM is known to be a fast memory but not as fast as SRAM.
3. DRAM is cheaper as compared to SRAM
4. DRAM has higher density (number of memory cells per unit area)
5. The power consumption by DRAM is more.

Secondary Memory

In a computer, memory refers to the physical devices that are used to store programs or data
on a temporary or permanent basis. It is a group of registers. Memory is of two types: (i)
primary memory and (ii) secondary memory. Primary memory is made up of semiconductors
and is further divided into two types, Read-Only Memory (ROM) and Random Access
Memory (RAM). Secondary memory is a physical device for the permanent storage of
programs and data (hard disk, compact disc, flash drive, etc.).

Primary Memory

Primary memory is made up of semiconductors and is the main memory of the computer
system. It is generally used to store the data or information on which the computer is
currently working, so we can say that it is used to store data temporarily. Data or
information is lost when the system is switched off. It is divided into two types:
(i) Read-Only Memory (ROM)
(ii) Random Access Memory (RAM)
1. Random Access Memory: Primary memory is also called internal memory. This is the main
area in a computer where data, instructions, and information are stored. Any storage location
in this memory can be directly accessed by the Central Processing Unit. As the CPU can
randomly access any storage location in this memory, it is also called Random Access Memory
or RAM. The CPU can access data from RAM as long as the computer is switched on. As soon
as the power to the computer is switched off, the stored data and instructions disappear from
RAM. Such type of memory is known as volatile memory. RAM is also called read/write
memory.
2. Read-Only Memory: Read-Only Memory (ROM) is a type of primary memory from which
information can only be read. ROM can be directly accessed by the Central Processing Unit.
However, the data and instructions stored in ROM are retained even when the computer is
switched off; in other words, it holds its data after the power is removed. Such memory is
known as non-volatile memory.

Secondary Memory

We have seen so far that primary memory is volatile and has limited capacity. So, it is
important to have another form of memory that has a larger storage capacity and from which
data and programs are not lost when the computer is turned off. Such a type of memory is
called secondary memory. In secondary memory, programs and data are stored. It is also
called auxiliary memory. It is different from primary memory as it is not directly accessible
through the CPU and is non-volatile. Secondary or external storage devices have a much
larger storage capacity and the cost of secondary memory is less as compared to primary
memory.

Use of Secondary memory

Secondary memory is used for different purposes but the main purposes of using secondary
memory are:
 Permanent storage: Primary memory stores data only while the power supply is on and
loses it when the power is off, so we need secondary memory to store data permanently
even when the power supply is off.
 Large storage: Secondary memory provides large storage space, so we can permanently
store large data such as videos, images, audio, and files.
 Portable: Some secondary devices are removable. So, we can easily store or transfer data
from one computer or device to another.

Types of Secondary memory

Secondary memory is of two types:


1. Fixed storage
In secondary memory, fixed storage is an internal media device that is used to store data in
a computer system. Fixed storage is generally known as fixed disk drives or hard drives.
Generally, the data of a computer system is stored on a built-in fixed storage device. "Fixed"
does not mean that the device can never be removed: you can remove a fixed storage
device for repair, upgrade, or maintenance, with the help of an expert or engineer.
Types of fixed storage:
Following are the types of fixed storage:
 Internal flash memory (rare)
 SSD (solid-state disk)
 Hard disk drives (HDD)
2. Removable storage
In secondary memory, removable storage is an external media device that is used to store
data in a computer system. Removable storage is generally known as disk drives or external
drives. It is a storage device that can be inserted into or removed from the computer
according to our requirements, even while the computer system is running. Removable
storage devices are portable, so we can easily transfer data from one computer to another.
Removable storage devices can also provide the fast data transfer rates associated with
storage area networks (SANs).
Types of Removable Storage:
 Optical discs (like CDs, DVDs, Blu-ray discs, etc.)
 Memory cards
 Floppy disks
 Magnetic tapes
 Disk packs
 Paper storage (like punched tapes, punched cards, etc.)

Secondary memory devices

The following are the commonly used secondary memory devices:


1. Floppy Disk: A floppy disk consists of a magnetic disc in a square plastic case. It is used to
store data and to transfer data from one device to another. Floppy disks are available in two
sizes: (a) 3.5 inches, with a storage capacity of 1.44 MB, and (b) 5.25 inches, with a storage
capacity of 1.2 MB. To use a floppy disk, a computer needs to have a floppy disk drive. This
storage device is now obsolete and has been replaced by CDs, DVDs, and flash drives.
2. Compact Disc: A Compact Disc (CD) is a commonly used secondary storage device. It
contains tracks and sectors on its surface. Its shape is circular, and it is made of
polycarbonate plastic. The storage capacity of a CD is up to 700 MB of data. A CD may also
be called a CD-ROM (Compact Disc Read-Only Memory): computers can read the data
present on a CD-ROM but cannot write new data onto it. To use a CD-ROM, we require a
CD-ROM drive. CDs are of two types:
 CD-R (compact disc recordable): Once data has been written onto it, it cannot be
erased; it can only be read.
 CD-RW (compact disc rewritable): A special type of CD on which data can be erased and
rewritten as many times as we want. It is also called an erasable CD.
3. Digital Versatile Disc: A Digital Versatile Disc (DVD) looks just like a CD, but its storage
capacity is greater: it stores up to 4.7 GB of data. A DVD-ROM drive is needed to use a DVD
on a computer. Video files, such as movies or video recordings, are generally stored on DVD,
and you can play a DVD using a DVD player. DVDs are of three types:
 DVD-ROM (Digital Versatile Disc Read-Only): The manufacturer writes the data and the
user can only read it, not write new data onto it. For example, on a movie DVD the
content is already written by the manufacturer; we can only watch the movie, not write
new data onto the disc.
 DVD-R (Digital Versatile Disc Recordable): You can write data onto it, but only once.
Once the data has been written, it cannot be erased; it can only be read.
 DVD-RW (Digital Versatile Disc Rewritable and Erasable): A special type of DVD on which
data can be erased and rewritten as many times as we want. It is also called an erasable
DVD.

4. Blu-ray Disc: A Blu-ray disc looks just like a CD or a DVD, but it can store up to 25 GB of
data. To use a Blu-ray disc, you need a Blu-ray reader. The name Blu-ray is derived from the
technology used to read the disc: 'Blu' from the blue-violet laser and 'ray' from optical ray.

5. Hard Disk: A hard disk is part of a unit called a hard disk drive. It is used to store a large
amount of data. Hard disks or hard disk drives come in different storage capacities (such as
256 GB, 500 GB, 1 TB, and 2 TB). A hard disk is built from a collection of discs known as
platters, placed one below the other and coated with magnetic material. Each platter
consists of a number of invisible concentric circles, all having the same centre, called tracks.
Hard disks are of two types: (i) internal hard disk and (ii) external hard disk.

6. Flash Drive: A flash drive or pen drive comes in various storage capacities, such as 1 GB, 2
GB, 4 GB, 8 GB, 16 GB, 32 GB, and 64 GB, up to 1 TB. A flash drive is used to transfer and
store data. To use a flash drive, we need to plug it into a USB port on a computer. As a flash
drive is easy to use and compact in size, it is very popular nowadays.

7. Solid-State Disk: Also known as an SSD, it is a non-volatile storage device that is used to
store and access data. It is faster, operates silently (because it does not contain any moving
parts, unlike a hard disk), and consumes less power. It is a great replacement for standard
hard drives in computers and laptops where the price is affordable, and it is also suitable for
tablets, notebooks, etc., because they do not require large storage.

8. SD Card: Known as a Secure Digital Card, it is generally used in portable devices like
mobile phones and cameras to store data. It is available in different sizes such as 1 GB, 2 GB,
4 GB, 8 GB, 16 GB, 32 GB, and 64 GB. To view the data stored on an SD card, you can remove
it from the device and insert it into a computer with the help of a card reader. The data on
an SD card is stored in memory chips (present in the SD card), and it does not contain any
moving parts like a hard disk.

RAID (Redundant Arrays of Independent Disks)


RAID is a technique that makes use of a combination of multiple disks instead of using a single
disk for increased performance, data redundancy, or both. The term was coined by David
Patterson, Garth A. Gibson, and Randy Katz at the University of California, Berkeley in 1987.
Why Data Redundancy?
Data redundancy, although it takes up extra space, adds to disk reliability. This means that in
case of disk failure, if the same data is also backed up onto another disk, we can retrieve the
data and continue the operation. On the other hand, if the data is just spread across multiple
disks without the RAID technique, the loss of a single disk can affect the entire data set.
Key Evaluation Points for a RAID System
 Reliability: How many disk faults can the system tolerate?
 Availability: What fraction of the total session time is a system in uptime mode, i.e. how
available is the system for actual use?
 Performance: How good is the response time? How high is the throughput (rate of
processing work)? Note that performance contains a lot of parameters and not just the
two.
 Capacity: Given a set of N disks each with B blocks, how much useful capacity is available to
the user?
RAID is transparent to the host system: to the host, the array appears as a single big disk
presenting itself as a linear array of blocks. This allows older technologies to be replaced by
RAID without making too many changes to the existing code.
Different RAID Levels
1. RAID-0 (Striping)
2. RAID-1 (Mirroring)
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
5. RAID-4 (Block-Level Striping with Dedicated Parity)
6. RAID-5 (Block-Level Striping with Distributed Parity)
7. RAID-6 (Block-Level Striping with Two Parity Bits)
Raid Controller

1. RAID-0 (Striping)
 Blocks are "striped" across disks.

RAID-0

 In the figure, blocks “0,1,2,3” form a stripe.


 Instead of placing just one block on a disk at a time, we can work with two (or more)
blocks placed on a disk before moving on to the next one.

Raid-0

Evaluation
 Reliability: 0
There is no duplication of data. Hence, a block once lost cannot be recovered.
 Capacity: N*B
The entire space is being used to store data. Since there is no duplication, N disks each
having B blocks are fully utilized.
Advantages
1. It is easy to implement.
2. It utilizes the storage capacity in a better way.
Disadvantages
1. A single drive loss can result in the complete failure of the system.
2. Not a good choice for a critical system.
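
To make the striping pattern concrete, the following minimal Python sketch maps a logical
block number to a (disk, offset) pair, assuming round-robin striping with a chunk size of one
block; the disk count of 4 is illustrative:

# A minimal sketch of RAID-0 block placement.
N = 4  # number of disks (illustrative)

def locate(logical_block: int) -> tuple[int, int]:
    """Return (disk number, block offset within that disk)."""
    return logical_block % N, logical_block // N

for b in range(8):
    disk, offset = locate(b)
    print(f"logical block {b} -> disk {disk}, offset {offset}")
# Blocks 0,1,2,3 land on disks 0,1,2,3 at offset 0: one stripe, as in the figure.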
2. RAID-1 (Mirroring)
 More than one copy of each block is stored in a separate disk. Thus, every block has two (or
more) copies, lying on different disks.
Raid-1

 The above figure shows a RAID-1 system with mirroring level 2.


 RAID 0 was unable to tolerate any disk failure. RAID 1, in contrast, provides reliability
through redundancy.
Evaluation
Assume a RAID system with mirroring level 2.
 Reliability: 1 to N/2
1 disk failure can be handled for certain because blocks of that disk would have duplicates
on some other disk. If we are lucky enough and disks 0 and 2 fail, then again this can be
handled as the blocks of these disks have duplicates on disks 1 and 3. So, in the best case,
N/2 disk failures can be handled.
 Capacity: N*B/2
Only half the space is being used to store data. The other half is just a mirror of the already
stored data.
Advantages
1. It covers complete redundancy.
2. It can increase data security and speed.
Disadvantages
1. It is highly expensive.
2. Storage capacity is less.
3. RAID-2 (Bit-Level Striping with Dedicated Parity)
 In RAID-2, errors in the data are checked at the bit level. The Hamming code parity
method is used to find errors in the data.
 It uses a designated drive to store parity.
 The structure of RAID-2 is very complex, as it uses two groups of disks: one group stores
the data bits of each word, and the other stores the error-correcting code.
 It is not commonly used.
Advantages
1. It uses Hamming code for error correction.
2. It uses a designated drive to store parity.
Disadvantages
1. It has a complex structure and high cost due to extra drive.
2. It requires an extra drive for error detection.
4. RAID-3 (Byte-Level Striping with Dedicated Parity)
 It consists of byte-level striping with dedicated parity.
 At this level, parity information is computed for each stripe and written to a dedicated
parity drive.
 Whenever a drive fails, the parity drive is accessed, and the lost data is reconstructed
from it.

Raid-3

 Here Disk 3 contains the parity bits for Disk 0, Disk 1, and Disk 2. If data loss occurs, we
can reconstruct it using Disk 3.
Advantages
1. Data can be transferred in bulk.
2. Data can be accessed in parallel.
Disadvantages
1. It requires an additional drive for parity.
2. In the case of small-size files, it performs slowly.
5. RAID-4 (Block-Level Striping with Dedicated Parity)
 Instead of duplicating data, this adopts a parity-based approach.

Raid-4

 In the figure, we can observe one column (disk) dedicated to parity.


 Parity is calculated using a simple XOR function. If the data bits are 0,0,0,1 the parity bit is
XOR(0,0,0,1) = 1. If the data bits are 0,1,1,0 the parity bit is XOR(0,1,1,0) = 0. A simple
approach is that an even number of ones results in parity 0, and an odd number of ones
results in parity 1.

Raid-4

 Assume that in the above figure, C3 is lost due to some disk failure. Then, we can
recompute the data bit stored in C3 by looking at the values of all the other columns and
the parity bit. This allows us to recover lost data.
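
The XOR-based recovery just described can be sketched in a few lines of Python; the bit
values and the choice of lost column are illustrative:

# A minimal sketch of RAID-4 style parity over single-bit columns C0..C3.
from functools import reduce

data = [0, 1, 1, 0]                        # bits C0..C3 (illustrative)
parity = reduce(lambda a, b: a ^ b, data)  # XOR of all data bits -> 0

# Suppose C3 (index 3) is lost: XOR the surviving bits with the parity
# bit to recompute it.
lost = 3
survivors = [bit for i, bit in enumerate(data) if i != lost]
recovered = reduce(lambda a, b: a ^ b, survivors + [parity])
assert recovered == data[lost]
print(recovered)  # 0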
Evaluation
 Reliability: 1
RAID-4 allows recovery of at most 1 disk failure (because of the way parity works). If more
than one disk fails, there is no way to recover the data.
 Capacity: (N-1)*B
One disk in the system is reserved for storing the parity. Hence, (N-1) disks are made
available for data storage, each disk having B blocks.
Advantages
1. It helps in reconstructing the data if at most one disk is lost.
Disadvantages
1. It cannot help in reconstructing the data when more than one disk is lost.
6. RAID-5 (Block-Level Striping with Distributed Parity)
 This is a slight modification of the RAID-4 system where the only difference is that the parity
rotates among the drives.

Raid-5

 In the figure, we can notice how the parity bit “rotates”.


 This was introduced to make the random write performance better.
Evaluation
 Reliability: 1
RAID-5 allows recovery of at most 1 disk failure (because of the way parity works). If more
than one disk fails, there is no way to recover the data. This is identical to RAID-4.
 Capacity: (N-1)*B
Overall, space equivalent to one disk is utilized in storing the parity. Hence, (N-1) disks are
made available for data storage, each disk having B blocks.
Advantages
1. Data can be reconstructed using parity bits.
2. It makes the performance better.
Disadvantages
1. Its technology is complex and extra space is required.
2. If both discs get damaged, data will be lost forever.
7. RAID-6 (Block-Level Striping with Two Parity Bits)
 RAID-6 helps when there is more than one disk failure. A pair of independent parities is
generated and stored on multiple disks at this level. Ideally, you need at least four disk
drives for this level.
 There are also hybrid RAIDs, which make use of more than one RAID level nested one after
the other, to fulfill specific requirements.

Raid-6

Advantages
1. Very high data Accessibility.
2. Fast read data transactions.
Disadvantages
1. Due to double parity, it has slow write data transactions.
2. Extra space is required.
Advantages of RAID
1. Increased data reliability: RAID provides redundancy, which means that if one disk fails, the
data can be recovered from the remaining disks in the array. This makes RAID a reliable
storage solution for critical data.
2. Improved performance: RAID can improve performance by spreading data across multiple
disks. This allows multiple read/write operations to occur in parallel, which can speed up
data access.
3. Scalability: RAID can be scaled by adding more disks to the array. This means that storage
capacity can be increased without having to replace the entire storage system.
4. Cost-effective: Some RAID configurations, such as RAID 0, can be implemented with low-
cost hardware. This makes RAID a cost-effective solution for small businesses or home
users.
Disadvantages of RAID
1. Cost: Some RAID configurations, such as RAID 5 or RAID 6, can be expensive to implement.
This is because they require additional hardware or software to provide redundancy.
2. Performance limitations: Some RAID configurations, such as RAID 1 or RAID 5, can have
performance limitations. For example, RAID 1 can only read data as fast as a single drive,
while RAID 5 can have slower write speeds due to the parity calculations required.
3. Complexity: RAID can be complex to set up and maintain. This is especially true for more
advanced configurations, such as RAID 5 or RAID 6.
4. Increased risk of data loss: While RAID provides redundancy, it is not a substitute for proper
backups. If multiple drives fail simultaneously, data loss can still occur.

File Organization

o The File is a collection of records. Using the primary key, we can access the records. The
type and frequency of access can be determined by the type of file organization which
was used for a given set of records.
o File organization is a logical relationship among various records. This method defines
how file records are mapped onto disk blocks.
o File organization is used to describe the way in which the records are stored in terms of
blocks, and the blocks are placed on the storage medium.
o The first approach to map the database to files is to use several files and store records
of only one fixed length in any given file. An alternative approach is to structure our
files so that they can accommodate records of multiple lengths.
o Files of fixed length records are easier to implement than the files of variable length
records.

Objective of file organization


o Optimal selection of records: records should be retrievable as fast as possible.
o Insert, delete, and update transactions on the records should be quick and easy.
o Duplicate records should not be induced as a result of insert, update, or delete
operations.
o Records should be stored efficiently, for a minimal cost of storage.
Types of file organization:

File organization comprises various methods. These particular methods have pros and cons
based on the mode of access or selection. The programmer decides the best-suited file
organization method according to the requirements at hand.

Types of file organization are as follows:

Sequential File Organization

This method is the easiest method for file organization. In this method, files are stored
sequentially. This method can be implemented in two ways:

1. Pile File Method:


o It is quite a simple method. In this method, we store the records in a sequence, i.e., one
after another. The records are kept in the order in which they are inserted into the
table.
o In case of updating or deleting a record, the record is first searched for in the memory
blocks. Once found, it is marked for deletion, and a new record can be inserted.
Insertion of the new record:

Suppose we have records R1, R3, and so on up to R9 and R8 in a sequence (a record is
nothing but a row in the table). If we want to insert a new record R2 into the sequence, it
will simply be placed at the end of the file.

2. Sorted File Method:


o In this method, the new record is always inserted at the file's end first, and the
sequence is then sorted in ascending or descending order. Sorting of records is based on
a primary key or some other key.
o In the case of modification of any record, the record is updated, the file is re-sorted,
and the updated record ends up in the right place.

Insertion of the new record:

Suppose there is a pre-existing sorted sequence of records R1, R3, and so on up to R6 and
R7. If a new record R2 has to be inserted into the sequence, it is first inserted at the end of
the file, and then the sequence is sorted, as sketched below.
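
The following minimal Python sketch mimics the sorted file method for integer primary keys;
the record keys are illustrative, and bisect.insort stands in for the append-then-sort step:

# A minimal sketch of the sorted file method.
import bisect

records = [1, 3, 4, 6, 7]   # keys of R1, R3, R4, R6, R7, kept sorted

def insert_sorted(keys: list[int], new_key: int) -> None:
    # Conceptually the record is appended at the file's end and the file
    # is re-sorted; bisect.insort places it directly in key order.
    bisect.insort(keys, new_key)

insert_sorted(records, 2)    # insert R2
print(records)               # [1, 2, 3, 4, 6, 7]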
Pros of sequential file organization
o It is a fast and efficient method for huge amounts of data.
o In this method, files can easily be stored on cheaper storage mechanisms like magnetic
tape.
o It is simple in design and requires little effort to store the data.
o This method is best when most of the records have to be accessed, as in grade
calculation for students, generating salary slips, etc.
o This method is also used for report generation and statistical calculations.

Cons of sequential file organization


o It wastes time: we cannot jump directly to a required record but have to move through
the records sequentially, which takes time.
o The sorted file method takes extra time and space for sorting the records.

Heap file organization


o It is the simplest and most basic type of organization. It works with data blocks. In heap
file organization, the records are inserted at the file's end. When the records are
inserted, it doesn't require the sorting and ordering of records.
o When the data block is full, the new record is stored in some other block. This new data
block need not to be the very next data block, but it can select any data block in the
memory to store new records. The heap file is also known as an unordered file.
o In the file, every record has a unique id, and every page in a file is of the same size. It is
the DBMS responsibility to store and manage the new records.

Insertion of a new record

Suppose we have five records R1, R3, R6, R4, and R5 in a heap, and suppose we want to
insert a new record R2 into the heap. If data block 3 is full, the record will be inserted into
any other data block selected by the DBMS, let's say data block 1.
If we want to search, update, or delete data in heap file organization, we need to traverse
the data from the start of the file until we reach the requested record.

If the database is very large, then searching for, updating, or deleting a record will be time-
consuming, because there is no sorting or ordering of records. In heap file organization, we
need to check all the data until we get the requested record.

Pros of Heap file organization


o It is a very good method of file organization for bulk insertion. If there is a large number
of data which needs to load into the database at a time, then this method is best suited.
o In the case of a small database, fetching and retrieving records is faster than with
sequential file organization.

Cons of Heap file organization


o This method is inefficient for large databases, because searching for or modifying a
record requires traversing the unordered records, which takes time.


Hash File Organization

Hash File Organization uses the computation of hash function on some fields of the records.
The hash function's output determines the location of disk block where the records are to be
placed.

When a record has to be retrieved using the hash key columns, the address is generated,
and the whole record is retrieved using that address. In the same way, when a new record
has to be inserted, the address is generated using the hash key, and the record is inserted
directly at that address. The same process applies to delete and update operations.

In this method, there is no need to search or sort the entire file. In this method, each record
is stored randomly in memory.
Indexed sequential access method (ISAM)

The ISAM method is an advanced form of sequential file organization. In this method,
records are stored in the file using the primary key. An index value is generated for each
primary key and mapped to the record. The index contains the address of the record in the
file.
If any record has to be retrieved based on its index value, the address of its data block is
fetched and the record is retrieved from memory.

Pros of ISAM:
o In this method, each record has the address of its data block, searching a record in a
huge database is quick and easy.
o This method supports range retrieval and partial retrieval of records. Since the index is
based on the primary key values, we can retrieve the data for the given range of value.
In the same way, the partial value can also be easily searched, i.e., the student name
starting with 'JA' can be easily searched.

Cons of ISAM
o This method requires extra space on the disk to store the index values.
o When new records are inserted, these files have to be reconstructed to maintain the
sequence.
o When a record is deleted, the space it used needs to be released; otherwise, the
performance of the database will slow down.
Redundant Array of Independent Disks
 RAID, or Redundant Array of Independent Disks, is a technology to
connect multiple secondary storage devices and use them as a single
storage medium.
 RAID consists of an array of disks in which multiple disks are
connected together to achieve different goals. RAID levels define the
use of disk arrays.
Indexing in DBMS

o Indexing is used to optimize the performance of a database by minimizing the number
of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in a
database table quickly.

Index structure:

Indexes can be created using some database columns.

o The first column of the index is the search key, which contains a copy of the primary key
or candidate key of the table. The values of the primary key are stored in sorted order
so that the corresponding data can be accessed easily.
o The second column of the index is the data reference. It contains a set of pointers
holding the address of the disk block where the value of the particular key can be found.
Indexing Methods

Ordered indices

The indices are usually sorted to make searching faster. The indices which are sorted are known
as ordered indices.

Example: Suppose we have an employee table with thousands of records, each of which is
10 bytes long. If the IDs start with 1, 2, 3, and so on, and we have to search for the employee
with ID 543:

o In the case of a database with no index, we have to scan the disk blocks from the start
until we reach 543. The DBMS will reach the record after reading 543*10 = 5430 bytes.
o In the case of an index, we search the index instead; with each index entry taking 2
bytes, the DBMS will reach the record after reading 542*2 = 1084 bytes, which is far
less than in the previous case.

Primary Index
o If the index is created on the basis of the primary key of the table, it is known as a
primary index. Primary keys are unique to each record, so there is a 1:1 relation
between index entries and records.
o As primary keys are stored in sorted order, the performance of the searching operation
is quite efficient.
o The primary index can be classified into two types: dense index and sparse index.

Dense index
o The dense index contains an index record for every search key value in the data file.
This makes searching faster.
o In this, the number of records in the index table is the same as the number of records in
the main table.
o It needs more space to store the index records themselves. Each index record holds the
search key and a pointer to the actual record on the disk.

Sparse index
o In the data file, an index record appears only for a few items. Each index entry points to
a block.
o Instead of pointing to each record in the main table, the index points to records in the
main table at gaps, as the sketch below illustrates.
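
The following minimal Python sketch contrasts the two lookups, assuming a data file sorted
on an integer search key with one sparse index entry per block of ten records; all values are
illustrative:

# A minimal sketch of dense vs. sparse index lookup.
import bisect

table = [(k, f"record-{k}") for k in range(1, 101)]         # sorted data file
dense_index = {k: pos for pos, (k, _) in enumerate(table)}  # one entry per key
sparse_index = [(k, pos) for pos, (k, _) in enumerate(table) if pos % 10 == 0]
# one entry per "block" of ten records

def sparse_lookup(key: int):
    # Find the largest indexed key <= search key, then scan forward.
    i = bisect.bisect_right([k for k, _ in sparse_index], key) - 1
    pos = sparse_index[i][1]
    while table[pos][0] < key:
        pos += 1
    return table[pos] if table[pos][0] == key else None

print(table[dense_index[43]])  # direct hit via the dense index
print(sparse_lookup(43))       # block anchor + short scan via the sparse index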

Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index is created
on non-primary key columns which may not be unique for each record.
o In this case, to identify the record faster, we will group two or more columns to get the
unique value and create index out of them. This method is called a clustering index.
o The records which have similar characteristics are grouped, and indexes are created for
these group.

Example: suppose a company contains several employees in each department. Suppose we use
a clustering index, where all employees which belong to the same Dept_ID are considered
within a single cluster, and index pointers point to the cluster as a whole. Here Dept_Id is a non-
unique key.

The previous scheme is a little confusing, because one disk block is shared by records that
belong to different clusters. Using separate disk blocks for separate clusters is the better
technique.
Secondary Index

In sparse indexing, as the size of the table grows, the size of the mapping also grows. These
mappings are usually kept in primary memory so that address fetches are fast; the
secondary memory is then searched for the actual data based on the address obtained from
the mapping. If the mapping size grows, fetching the address itself becomes slower, and the
sparse index is no longer efficient. To overcome this problem, secondary indexing is
introduced.

In secondary indexing, another level of indexing is introduced to reduce the size of the
mapping. In this method, huge ranges of the column values are selected initially so that the
mapping size of the first level stays small. Each range is then further divided into smaller
ranges. The mapping of the first level is stored in primary memory, so that address fetches
are fast. The mapping of the second level, and the actual data, are stored in secondary
memory (hard disk).
For example:

o If you want to find the record with roll number 111 in the diagram, the search starts at
the first-level index with the highest entry that is smaller than or equal to 111; it gets
100 at this level.
o Then, at the second index level, it again finds the largest entry <= 111 and gets 110.
Now, using address 110, it goes to the data block and scans each record until it finds
111.
o This is how a search is performed in this method (sketched below). Inserting, updating,
and deleting are done in the same manner.
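
The roll-number walk-through above can be sketched as a two-level lookup in Python; the
index contents and block layout below are illustrative stand-ins for the diagram:

# A minimal sketch of the two-level (secondary) index lookup.
import bisect

first_level = [1, 100, 200, 300]            # kept in primary memory
second_level = {100: [100, 110, 120, 130]}  # kept in secondary memory
data_blocks = {110: list(range(110, 120))}  # data block anchored at 110

def find(roll: int):
    # Highest first-level entry <= roll (here: 100).
    f = first_level[bisect.bisect_right(first_level, roll) - 1]
    # Highest second-level entry <= roll within that range (here: 110).
    keys = second_level[f]
    s = keys[bisect.bisect_right(keys, roll) - 1]
    # Linear scan inside the addressed data block.
    return roll if roll in data_blocks[s] else None

print(find(111))  # 111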

B+ Tree

o The B+ tree is a balanced multiway search tree (not a binary tree). It follows a multi-
level index format.
o In the B+ tree, leaf nodes hold the actual data pointers. The B+ tree ensures that all
leaf nodes remain at the same height.
o In the B+ tree, the leaf nodes are linked using a linked list. Therefore, a B+ tree can
support random access as well as sequential access.

Structure of B+ Tree
o In the B+ tree, every leaf node is at equal distance from the root node. The B+ tree is of
the order n where n is fixed for every B+ tree.
o It contains an internal node and leaf node.

Internal node
o An internal node of the B+ tree can contain at least n/2 pointers, except the root node.
o At most, an internal node of the tree contains n pointers.

Leaf node
o A leaf node of the B+ tree can contain at least n/2 record pointers and n/2 key values.
o At most, a leaf node contains n record pointers and n key values.
o Every leaf node of the B+ tree contains one block pointer P to point to the next leaf
node.

Searching a record in B+ Tree

Suppose we have to search for 55 in the B+ tree structure below. First, we fetch the
intermediary node, which will direct us to the leaf node that may contain the record for 55.

In the intermediary node, we find the branch between the 50 and 75 nodes. Following it, we
are redirected to the third leaf node at the end. Here the DBMS performs a sequential
search to find 55. The search path can be sketched as follows:
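
The following minimal Python sketch performs this search on a hand-built two-level tree
resembling the one described; the keys and node layout are illustrative:

# A minimal sketch of B+ tree search.
def search(node, key):
    if node["leaf"]:
        # Sequential search within the leaf, as described above.
        return key in node["keys"]
    # Follow the branch for the first separator greater than the key.
    for i, sep in enumerate(node["keys"]):
        if key < sep:
            return search(node["children"][i], key)
    return search(node["children"][-1], key)

leaves = [
    {"leaf": True, "keys": [10, 20]},
    {"leaf": True, "keys": [25, 30, 40]},
    {"leaf": True, "keys": [50, 55, 65, 70]},
    {"leaf": True, "keys": [75, 80, 85]},
]
root = {"leaf": False, "keys": [25, 50, 75], "children": leaves}
print(search(root, 55))  # True: the branch between 50 and 75 leads to the third leaf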
B+ Tree Insertion

Suppose we want to insert the record 60 in the structure below. It should go into the 3rd
leaf node, after 55. The tree is balanced, and that leaf node is already full, so we cannot
insert 60 there.

In this case, we have to split the leaf node so that the record can be inserted into the tree
without affecting the fill factor, balance, and order.

The 3rd leaf node would hold the values (50, 55, 60, 65, 70), and its branch in the
intermediate node is currently at 50. We split the leaf node in the middle so that the
balance of the tree is not altered: we can group (50, 55) and (60, 65, 70) into two leaf nodes.

If these two are to be leaf nodes, the intermediate node cannot branch from 50 alone. It
should have 60 added to it, and then we can have a pointer to the new leaf node.
This is how we insert an entry when overflow occurs. In a normal scenario, it is very easy to
find the node where the new key fits and place it in that leaf node.

B+ Tree Deletion

Suppose we want to delete 60 from the above example. In this case, we have to remove 60
from the intermediate node as well as from the 4th leaf node. If we simply remove it from
the intermediate node, the tree will no longer satisfy the rules of the B+ tree, so we need to
modify it to keep the tree balanced.

After deleting node 60 from the above B+ tree and re-arranging the nodes, it will appear as
follows:

Hashing
In a huge database structure, it is very inefficient to search through all the index values to
reach the desired data. The hashing technique is used to calculate the direct location of a
data record on the disk without using an index structure.

In this technique, data is stored in data blocks whose addresses are generated using a hash
function. The memory locations where these records are stored are known as data buckets
or data blocks.

The hash function can use any column value to generate the address; most of the time, it
uses the primary key to generate the address of the data block. A hash function can be any
mathematical function, from a simple one to a very complex one. We can even use the
primary key itself as the address of the data block, in which case each row is stored at the
data block whose address equals its primary key value.

The above diagram shows data block addresses that are the same as the primary key values.
The hash function can also be a simple mathematical function like mod, exponential, cos,
sin, etc. Suppose we have a mod(5) hash function to determine the address of the data
block. In this case, applying mod(5) to the primary keys generates 3, 3, 1, 4, and 2
respectively, and the records are stored at those data block addresses.
Types of Hashing:

Static Hashing

In static hashing, the resultant data bucket address is always the same. That means that if
we generate an address for EMP_ID = 103 using the hash function mod(5), the result is
always the same bucket address, 3. The bucket address never changes.

Hence, in static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in memory to store the data, as
the sketch below shows.
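
The following minimal Python sketch shows static hashing with the mod(5) function from the
text; the EMP_ID values are illustrative:

# A minimal sketch of static hashing with mod(5).
BUCKETS = 5
buckets = {i: [] for i in range(BUCKETS)}

def bucket_address(emp_id: int) -> int:
    return emp_id % BUCKETS     # always the same address for the same key

for emp_id in (103, 104, 105, 106, 107):
    buckets[bucket_address(emp_id)].append(emp_id)

print(bucket_address(103))  # 3, every time: the address never changes
print(buckets)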
Operations of Static Hashing
o Searching a record

When a record needs to be searched, then the same hash function retrieves the address of the
bucket where the data is stored.

o Insert a Record

When a new record is inserted into the table, an address for the new record is generated
based on the hash key, and the record is stored at that location.

o Delete a Record

To delete a record, we will first fetch the record which is supposed to be deleted. Then we will
delete the records for that address in memory.

o Update a Record

To update a record, we will first search it using a hash function, and then the data record is
updated.
Suppose we want to insert a new record into the file, but the address of the data bucket
generated by the hash function is not empty (data already exists at that address). This
situation in static hashing is known as bucket overflow, and it is a critical situation in this
method.

To overcome this situation, there are various methods. Some commonly used methods are as
follows:

1. Open Hashing

When the hash function generates an address at which data is already stored, the next
bucket is allocated to the new record. This mechanism is called Linear Probing.

For example: suppose R3 is a new record that needs to be inserted, and the hash function
generates address 112 for it. But the bucket at that address is already full, so the system
searches for the next available data bucket, 113, and assigns R3 to it, as sketched below.
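
A minimal Python sketch of linear probing, mirroring the 112/113 example (the bucket
contents are illustrative):

# A minimal sketch of open hashing (linear probing).
buckets = {112: "occupied", 113: None, 114: None}   # illustrative state

def insert_linear_probe(addr: int, record: str) -> int:
    # Walk forward from the generated address until a free bucket is found.
    while buckets.get(addr) is not None:
        addr += 1
    buckets[addr] = record
    return addr

print(insert_linear_probe(112, "R3"))  # 112 is full, so R3 lands in 113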

2. Close Hashing

When buckets are full, a new data bucket is allocated for the same hash result and linked
after the previous one. This mechanism is known as Overflow Chaining.

For example: suppose R3 is a new record that needs to be inserted into the table, and the
hash function generates address 110 for it. But this bucket is too full to store the new data.
In this case, a new bucket is inserted at the end of bucket 110 and linked to it, as sketched
below.
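
A minimal Python sketch of overflow chaining, mirroring the bucket-110 example (the bucket
capacity of two records is an illustrative assumption):

# A minimal sketch of close hashing (overflow chaining): each home
# bucket holds a chain of linked overflow buckets.
CAPACITY = 2
chains = {110: [["R1", "R2"]]}   # bucket 110 is already full

def insert_chained(addr: int, record: str) -> None:
    chain = chains.setdefault(addr, [[]])
    if len(chain[-1]) >= CAPACITY:
        chain.append([])         # link a fresh overflow bucket at the end
    chain[-1].append(record)

insert_chained(110, "R3")
print(chains[110])  # [['R1', 'R2'], ['R3']]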
Dynamic Hashing

o The dynamic hashing method is used to overcome the problems of static hashing, such
as bucket overflow.
o In this method, data buckets grow or shrink as the number of records increases or
decreases. This method is also known as the extendible hashing method.
o This method makes hashing dynamic, i.e., it allows insertion and deletion without
resulting in poor performance.

How to search a key


o First, calculate the hash address of the key.
o Check how many bits are currently used in the directory; call this number of bits i.
o Take the least significant i bits of the hash address. This gives an index into the
directory.
o Now, using that index, go to the directory and find the bucket address where the record
might be.

How to insert a new record


o Firstly, you have to follow the same procedure for retrieval, ending up in some bucket.
o If there is still space in that bucket, then place the record in it.
o If the bucket is full, then we will split the bucket and redistribute the records.
For example:

Consider the following grouping of keys into buckets, depending on the least significant bits
of their hash addresses:

The last two bits of the hash addresses of 2 and 4 are 00, so they go into bucket B0. The last
two bits for 5 and 6 are 01, so they go into bucket B1. The last two bits for 1 and 3 are 10, so
they go into bucket B2. The last two bits for 7 are 11, so it goes into B3.

Insert key 9 with hash address 10001 into the above structure:
o Since key 9 has hash address 10001, whose last two bits are 01, it must go into bucket
B1. But bucket B1 is full, so it will be split.
o The split separates 5 and 9 from 6: the last three bits of the hash addresses of 5 and 9
are 001, so they go into bucket B1, while the last three bits for 6 are 101, so it goes into
bucket B5.
o Keys 2 and 4 are still in B0. The records in B0 are pointed to by the 000 and 100
directory entries, because the last two bits of both entries are 00.
o Keys 1 and 3 are still in B2. The records in B2 are pointed to by the 010 and 110
directory entries, because the last two bits of both entries are 10.
o Key 7 is still in B3. The records in B3 are pointed to by the 111 and 011 directory
entries, because the last two bits of both entries are 11.
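
The following minimal Python sketch mirrors this walk-through: the directory is indexed by
the least significant i bits of the hash address, and the split from i = 2 to i = 3 separates 5 and
9 from 6. The five-bit hash addresses are chosen to match the example:

# A minimal sketch of extendible (dynamic) hashing lookup.
hash_address = {2: 0b00100, 4: 0b00000, 5: 0b00001, 6: 0b00101,
                1: 0b00010, 3: 0b00110, 7: 0b00111, 9: 0b10001}

i = 2                                    # directory currently uses 2 bits
directory = {0b00: "B0", 0b01: "B1", 0b10: "B2", 0b11: "B3"}

def bucket_for(key: int) -> str:
    low_bits = hash_address[key] & ((1 << i) - 1)   # least significant i bits
    return directory[low_bits]

print(bucket_for(9))   # B1 -- same 01 suffix as 5 and 6, so B1 overflows

# After the split, three bits distinguish 5 and 9 (suffix 001) from
# 6 (suffix 101), exactly as in the walk-through above.
i = 3
directory = {0b000: "B0", 0b100: "B0", 0b001: "B1", 0b101: "B5",
             0b010: "B2", 0b110: "B2", 0b011: "B3", 0b111: "B3"}
print(bucket_for(5), bucket_for(9), bucket_for(6))  # B1 B1 B5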

Advantages of dynamic hashing


o In this method, performance does not decrease as the data in the system grows; the
memory is simply expanded to accommodate the data.
o In this method, memory is well utilized, as it grows and shrinks with the data. There is
no unused memory lying around.
o This method is good for the dynamic database where data grows and shrinks frequently.

Disadvantages of dynamic hashing


o In this method, if the size of the data increases, the number of buckets also increases,
and the addresses of the data buckets are maintained in the bucket address table. This is
because the data addresses keep changing as buckets grow and shrink. If there is a huge
increase in data, maintaining the bucket address table becomes tedious.
o The bucket overflow situation can also occur here, but it takes longer to arise than in
static hashing.

Query Processing in DBMS


Query Processing is the activity of extracting data from the database. Query
processing involves several steps for fetching the data from the database. The steps involved
are:

1. Parsing and translation


2. Optimization
3. Evaluation

The query processing works in the following way:

Parsing and Translation

Query processing includes certain activities for data retrieval. The user writes the query in a
high-level database language such as SQL; it then gets translated into expressions that can be
used at the physical level of the file system. After this, the actual evaluation of the queries and
a variety of query-optimizing transformations take place. Thus, before processing a query, the
system needs to translate it from a human-readable language into an internal representation.
SQL, or Structured Query Language, is the best suitable choice for humans, but it is not suitable
as the internal representation of the query in the system; relational algebra is well suited for
that internal representation. The translation process in query processing is similar to parsing a
query. When a user executes any query, for generating its internal form, the parser in the
system checks the syntax of the query and verifies the names of the relations, tuples, and
attributes used in it. The parser creates a tree of the query, known as a 'parse tree', and then
translates it into the form of relational algebra. In this step, it also replaces all uses of views in
the query with their definitions.

Thus, we can understand the working of query processing from the below-described diagram:
Suppose a user executes a query. As we have learned, there are various methods of
extracting data from the database. In SQL, suppose a user wants to fetch the records of the
employees whose salary is greater than 10000. For doing this, the following query is
written:

select emp_name from Employee where salary>10000;

Thus, to make the system understand the user query, it needs to be translated in the form of
relational algebra. This query can be brought into the relational algebra form as:

o πemp_name (σsalary>10000 (Employee))

A query can generally be translated into several equivalent relational-algebra expressions, and
the system may choose among them.

After translating the given query, we can execute each relational algebra operation by using
one of several different algorithms. So, in this way, query processing begins its work.

Evaluation

In addition to the relational algebra translation, it is required to annotate the
translated relational algebra expression with instructions specifying how each operation is to
be evaluated. Thus, after translating the user query, the system executes a query evaluation
plan.

Query Evaluation Plan


o In order to fully evaluate a query, the system needs to construct a query evaluation
plan.
o The annotations in the evaluation plan may refer to the algorithms to be used for the
particular index or the specific operations.
o Such relational algebra with annotations is referred to as Evaluation Primitives. The
evaluation primitives carry the instructions needed for the evaluation of the operation.
o Thus, a query evaluation plan defines a sequence of primitive operations used for
evaluating a query. The query evaluation plan is also referred to as the query execution
plan.
o A query execution engine is responsible for generating the output of the given query. It
takes the query execution plan, executes it, and finally makes the output for the user
query.

Optimization
o The cost of query evaluation can vary for different types of queries. Although the
system is responsible for constructing the evaluation plan, the user does not need to
write the query efficiently.
o Usually, a database system generates an efficient query evaluation plan, which
minimizes its cost. This task, performed by the database system, is known as
Query Optimization.
o For optimizing a query, the query optimizer should have an estimated cost analysis of
each operation. This is because the overall operation cost depends on the memory
allocated to the various operations, execution costs, and so on.
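
To illustrate such a cost-reducing transformation, suppose (as an assumption for this example) that Employee is joined with a Department relation. The two expressions below are equivalent, but the second applies the selection before the join and therefore joins far fewer tuples, so the optimizer would prefer it:

σsalary>10000 (Employee ⋈ Department) ≡ (σsalary>10000 (Employee)) ⋈ Department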

Finally, after selecting an evaluation plan, the system evaluates the query and produces the
output of the query.
NoSQL
NoSQL Database is a non-relational data management system that does not require a fixed
schema. It avoids joins, and it is easy to scale. The major purpose of using a NoSQL database is for
distributed data stores with humongous data storage needs. NoSQL is used for big data and
real-time web apps. For example, companies like Twitter, Facebook and Google collect
terabytes of user data every single day.
NoSQL stands for “Not Only SQL” or “Not SQL.” Though a better term would be
“NoREL”, NoSQL caught on. Carlo Strozzi introduced the NoSQL concept in 1998.
Traditional RDBMS uses SQL syntax to store and retrieve data for further insights. A
NoSQL database system, in contrast, encompasses a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data. Let’s understand NoSQL
with a diagram in this NoSQL database tutorial:

Why NoSQL?
The concept of NoSQL databases became popular with Internet giants like Google, Facebook,
Amazon, etc. who deal with huge volumes of data. The system response time becomes slow
when you use RDBMS for massive volumes of data.
To resolve this problem, we could “scale up” our systems by upgrading our existing hardware.
This process is expensive.
The alternative for this issue is to distribute database load on multiple hosts whenever the load
increases. This method is known as “scaling out.”
NoSQL databases are non-relational, so they scale out better than relational databases, as they are
designed with web applications in mind.
Brief History of NoSQL Databases
 1998 - Carlo Strozzi used the term NoSQL for his lightweight, open-source relational
database
 2000 - Graph database Neo4j is launched
 2004 - Google BigTable is launched
 2005 - CouchDB is launched
 2007 - The research paper on Amazon Dynamo is released
 2008 - Facebook open-sources the Cassandra project
 2009 - The term NoSQL was reintroduced
Features of NoSQL
Non-relational
 NoSQL databases never follow the relational model
 Never provide tables with flat fixed-column records
 Work with self-contained aggregates or BLOBs
 Don’t require object-relational mapping and data normalization
 No complex features like query languages, query planners, referential-integrity joins, or
ACID
Schema-free
 NoSQL databases are either schema-free or have relaxed schemas
 Do not require any sort of definition of the schema of the data
 Offers heterogeneous structures of data in the same domain
Simple API
 Offers easy-to-use interfaces for storing and querying the data provided
 APIs allow low-level data manipulation and selection methods
 Text-based protocols, mostly HTTP REST with JSON
 Mostly no standards-based NoSQL query language
 Web-enabled databases running as internet-facing services
Distributed
 Multiple NoSQL databases can be executed in a distributed fashion
 Offers auto-scaling and fail-over capabilities
 Often the ACID concept is sacrificed for scalability and throughput
 Mostly no synchronous replication between distributed nodes; instead asynchronous
multi-master replication, peer-to-peer, or HDFS replication
 Often provides only eventual consistency
 Shared-nothing architecture, which enables less coordination and higher distribution

Types of NoSQL Databases


NoSQL databases are mainly categorized into four types: key-value pair, column-oriented,
graph-based and document-oriented. Every category has its unique attributes and limitations.
None of these databases is better at solving all problems, so users should select the database
based on their product needs.
Types of NoSQL Databases:
 Key-value pair based
 Column-oriented
 Graph-based
 Document-oriented
Key Value Pair Based
Data is stored in key/value pairs. It is designed in such a way to handle lots of data and heavy
load.
Key-value pair storage databases store data as a hash table where each key is unique, and the
value can be a JSON, BLOB(Binary Large Objects), string, etc.
For example, a key-value pair may contain a key like “Website” associated with a value like
“Guru99”.

It is one of the most basic NoSQL database examples. This kind of NoSQL database is used for
collections, dictionaries, associative arrays, etc. Key-value stores help the developer to store
schema-less data. They work best for content such as shopping cart data.
Redis, Dynamo and Riak are some NoSQL examples of key-value store databases. They are all
based on Amazon’s Dynamo paper.
Column-based
Column-oriented databases work on columns and are based on BigTable paper by Google.
Every column is treated separately. Values of single column databases are stored contiguously.
Column based NoSQL database
They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as the
data is readily available in a column.
Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM, library card catalogs, and so on.
HBase, Cassandra and Hypertable are NoSQL examples of column-based databases.
Document-Oriented
Document-oriented NoSQL databases store and retrieve data as a key-value pair, but the value part is
stored as a document. The document is stored in JSON or XML format. The value is understood
by the DB and can be queried.
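For illustration, a document in such a store might look like the following (the field names and values are invented for this example):

{
   "_id": "p-1001",
   "name": "Laptop",
   "price": 55000,
   "tags": ["electronics", "computers"],
   "stock": { "warehouse": 25, "store": 4 }
}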

Relational Vs. Document


In the diagram, on the left you can see rows and columns, and on the right a document
database, which has a structure similar to JSON. For the relational database, you have to know
what columns you have, and so on. For a document database, however, you have a data store
like a JSON object; you do not need to define the structure up front, which makes it flexible.
The document type is mostly used for CMS systems, blogging platforms, real-time analytics and
e-commerce applications. It should not be used for complex transactions that require multiple
operations, or for queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak and Lotus Notes are popular document-oriented
DBMS systems.
Graph-Based
A graph-type database stores entities as well as the relations among those entities. An entity is
stored as a node, with each relationship stored as an edge. An edge gives a relationship between
nodes. Every node and edge has a unique identifier.

Compared to a relational database, where tables are loosely connected, a graph database is
multi-relational in nature. Traversing relationships is fast, as they are already captured in the
DB and there is no need to calculate them.
Graph databases are mostly used for social networks, logistics and spatial data.
Neo4J, Infinite Graph, OrientDB and FlockDB are some popular graph-based databases.
Query Mechanism tools for NoSQL
The most common data retrieval mechanism is REST-based retrieval of a value based on its
key/ID with a GET resource.
Document store databases offer more sophisticated queries, as they understand the value in a
key-value pair. For example, CouchDB allows defining views with MapReduce.
What is the CAP Theorem?
The CAP theorem is also called Brewer’s theorem. It states that it is impossible for a distributed data
store to offer more than two of the following three guarantees:
1. Consistency
2. Availability
3. Partition Tolerance
Consistency:
The data should remain consistent even after the execution of an operation. This means once
data is written, any future read request should contain that data. For example, after updating
the order status, all the clients should be able to see the same data.
Availability:
The database should always be available and responsive. It should not have any downtime.
Partition Tolerance:
Partition Tolerance means that the system should continue to function even if the
communication among the servers is not stable. For example, the servers can be partitioned
into multiple groups which may not communicate with each other. Here, if part of the database
is unavailable, other parts are always unaffected.
Eventual Consistency
The term “eventual consistency” means keeping copies of data on multiple machines to get high
availability and scalability. Thus, changes made to any data item on one machine have to be
propagated to the other replicas.
Data replication may not be instantaneous: some copies will be updated immediately, while
others will be updated in due course of time. These copies may be mutually inconsistent for a
while, but in due course of time they become consistent. Hence the name eventual consistency.
BASE: Basically Available, Soft state, Eventual consistency
 Basically available means the DB is available all the time, as per the CAP theorem
 Soft state means that even without input, the system state may change
 Eventual consistency means that the system will become consistent over time

Advantages of NoSQL
 Can be used as Primary or Analytic Data Source
 Big Data Capability
 No Single Point of Failure
 Easy Replication
 No Need for Separate Caching Layer
 It provides fast performance and horizontal scalability.
 Can handle structured, semi-structured, and unstructured data with equal effect
 Object-oriented programming which is easy to use and flexible
 NoSQL databases don’t need a dedicated high-performance server
 Support Key Developer Languages and Platforms
 Simple to implement than using RDBMS
 It can serve as the primary data source for online applications.
 Handles big data which manages data velocity, variety, volume, and complexity
 Excels at distributed database and multi-data center operations
 Eliminates the need for a specific caching layer to store data
 Offers a flexible schema design which can easily be altered without downtime or service
disruption
Disadvantages of NoSQL
 No standardization rules
 Limited query capabilities
 RDBMS databases and tools are comparatively mature
 It does not offer any traditional database capabilities, like consistency when multiple
transactions are performed simultaneously.
 When the volume of data increases, it becomes difficult to maintain unique keys
 Doesn’t work as well with relational data
 The learning curve is steep for new developers
 Mostly open-source options, so still less popular with enterprises
Summary
 NoSQL is a non-relational DMS, that does not require a fixed schema, avoids joins, and is
easy to scale
 The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data
 In the year 1998, Carlo Strozzi used the term NoSQL for his lightweight, open-source
relational database
 NoSQL databases never follow the relational model it is either schema-free or has
relaxed schemas
 The four types of NoSQL databases are 1) key-value pair based, 2) column-oriented,
3) graph-based and 4) document-oriented
 NOSQL can handle structured, semi-structured, and unstructured data with equal effect
 CAP theorem consists of three words Consistency, Availability, and Partition Tolerance
 BASE stands for Basically Available, Soft state, Eventual consistency
 The term “eventual consistency” means to have copies of data on multiple machines to
get high availability and scalability
 NoSQL offers limited query capabilities

MongoDB
MongoDB is a document-oriented NoSQL database used for high-volume data storage. Instead
of using tables and rows as in traditional relational databases, MongoDB makes use of
collections and documents. Documents consist of key-value pairs, which are the basic unit of
data in MongoDB. Collections contain sets of documents and function as the equivalent of
relational database tables. MongoDB is a database that came into the light around the mid-2000s.
MongoDB Features
1. Each database contains collections which in turn contains documents. Each document
can be different with a varying number of fields. The size and content of each document
can be different from each other.
2. The document structure is more in line with how developers construct their classes and
objects in their respective programming languages. Developers will often say that their
classes are not rows and columns but have a clear structure with key-value pairs.
3. The rows (called documents in MongoDB) don’t need to have a schema defined
beforehand. Instead, the fields can be created on the fly.
4. The data model available within MongoDB allows you to represent hierarchical
relationships, to store arrays, and other more complex structures more easily.
5. Scalability – The MongoDB environments are very scalable. Companies across the world
have defined clusters, with some of them running 100+ nodes and millions of
documents in the database
MongoDB Example
The below example shows how a document can be modeled in MongoDB.
1. The _id field is added by MongoDB to uniquely identify the document in the collection.
2. Note that the order data (OrderID, Product, and Quantity), which in an
RDBMS would normally be stored in a separate table, is in MongoDB actually
stored as an embedded document in the collection itself. This is one of the key
differences in how data is modeled in MongoDB.
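As a hedged sketch of such a document (the order values are invented for illustration, reusing names from the customer table shown below):

{
   _id: ObjectId("563479cc8a8a4246bd27d784"),
   CustomerID: 11,
   CustomerName: "Guru99",
   Order: {
      OrderID: 111,
      Product: "Laptop",
      Quantity: 2
   }
}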

Key Components of MongoDB Architecture


Below are a few of the common terms used in MongoDB
1. _id – This is a field required in every MongoDB document. The _id field represents a
unique value in the MongoDB document. The _id field is like the document’s primary
key. If you create a new document without an _id field, MongoDB will automatically
create the field. So, for example, for the customer table shown below, MongoDB will
add a 12-byte (24 hexadecimal characters) unique identifier to each document in the collection.
_Id CustomerID CustomerName OrderID
563479cc8a8a4246bd27d784 11 Guru99 111
563479cc7a8a4246bd47d784 22 Trevor Smith 222
563479cc9a8a4246bd57d784 33 Nicole 333
2. Collection – This is a grouping of MongoDB documents. A collection is the equivalent of
a table created in any other RDBMS such as Oracle or MS SQL. A collection exists
within a single database. As seen from the introduction collections don’t enforce any
sort of structure.
3. Cursor – This is a pointer to the result set of a query. Clients can iterate through a cursor
to retrieve results.
4. Database – This is a container for collections, just as an RDBMS database is a container for
tables. Each database gets its own set of files on the file system. A MongoDB server can
store multiple databases.
5. Document – A record in a MongoDB collection is basically called a document. The
document, in turn, will consist of field name and values.
6. Field – A name-value pair in a document. A document has zero or more fields. Fields are
analogous to columns in relational databases. The following diagram shows an example
of fields with key-value pairs. So in the example below, CustomerID and 11 make up one
of the key-value pairs defined in the document.

7. JSON – This is known as JavaScript Object Notation. This is a human-readable, plain text
format for expressing structured data. JSON is currently supported in many
programming languages.
Just a quick note on the key difference between the _id field and a normal collection field: the
_id field is used to uniquely identify the documents in a collection and is automatically added
by MongoDB when a document is created.
Why Use MongoDB?
Below are a few of the reasons why one should start using MongoDB
1. Document-oriented – Since MongoDB is a NoSQL-type database, instead of having data
in a relational-type format, it stores the data in documents. This makes MongoDB very
flexible and adaptable to real business-world situations and requirements.
2. Ad hoc queries – MongoDB supports searching by field, range queries, and regular
expression searches. Queries can be made to return specific fields within documents.
3. Indexing – Indexes can be created to improve the performance of searches within
MongoDB. Any field in a MongoDB document can be indexed (see the sketch after this list).
4. Replication – MongoDB can provide high availability with replica sets. A replica set
consists of two or more MongoDB instances. Each replica set member may act in the
role of the primary or secondary replica at any time. The primary replica is the main
server which interacts with the client and performs all the read/write operations. The
Secondary replicas maintain a copy of the data of the primary using built-in replication.
When a primary replica fails, the replica set automatically switches over to the
secondary and then it becomes the primary server.
5. Load balancing – MongoDB uses the concept of sharding to scale horizontally by splitting
data across multiple MongoDB instances. MongoDB can run over multiple servers,
balancing the load and/or duplicating data to keep the system up and running in case of
hardware failure.
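A minimal mongo shell sketch of the indexing point above, assuming an employees collection with a salary field (both names are assumptions for this example):

// Create an ascending index on the salary field:
db.employees.createIndex({ salary: 1 })

// A range query that can now use the index:
db.employees.find({ salary: { $gte: 10000 } })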
Data Modelling in MongoDB
As we have seen from the Introduction section, the data in MongoDB has a flexible schema.
Unlike in SQL databases, where you must have a table’s schema declared before inserting data,
MongoDB’s collections do not enforce document structure. This sort of flexibility is what makes
MongoDB so powerful.
When modeling data in Mongo, keep the following things in mind
1. What are the needs of the application – Look at the business needs of the application
and see what data and the type of data needed for the application. Based on this,
ensure that the structure of the document is decided accordingly.
2. What are data retrieval patterns – If you foresee a heavy query usage then consider the
use of indexes in your data model to improve the efficiency of queries.
3. Are frequent inserts, updates and removals happening in the database? Reconsider the
use of indexes or incorporate sharding if required in your data modeling design to
improve the efficiency of your overall MongoDB environment.
Difference between MongoDB & RDBMS
Below are some of the key term differences between MongoDB and RDBMS
 Table vs. Collection – In RDBMS, the table contains the columns and rows which are
used to store the data, whereas in MongoDB this same structure is known as a
collection. The collection contains documents, which in turn contain fields, which in
turn are key-value pairs.
 Row vs. Document – In RDBMS, the row represents a single, implicitly structured data
item in a table. In MongoDB, the data is stored in documents.
 Column vs. Field – In RDBMS, the column denotes a set of data values. These in
MongoDB are known as fields.
 Joins vs. Embedded documents – In RDBMS, data is sometimes spread across various
tables, and in order to show a complete view of all data, a join is sometimes formed
across tables to get the data. In MongoDB, the data is normally stored in a single
collection, but separated by using embedded documents. So there is no concept of
joins in MongoDB.
Apart from the terms differences, a few other differences are shown below
1. Relational databases are known for enforcing data integrity. This is not an explicit
requirement in MongoDB.
2. RDBMS requires that data be normalized first so that it can prevent orphan records and
duplicates. Normalizing data then requires more tables, which results in more table
joins, thus requiring more keys and indexes. As databases start to grow, performance
can start becoming an issue. Again, this is not an explicit requirement in MongoDB.
MongoDB is flexible and does not need the data to be normalized first.
What is MongoDB?
MongoDB is an open source NoSQL database management program. NoSQL (Not only SQL) is
used as an alternative to traditional relational databases. NoSQL databases are quite useful for
working with large sets of distributed data. MongoDB is a tool that can manage document-
oriented information, store or retrieve information.
MongoDB is used for high-volume data storage, helping organizations store large amounts of
data while still performing rapidly. Organizations also use MongoDB for its ad-hoc queries,
indexing, load balancing, aggregation, server-side JavaScript execution and other features.
Structured Query Language (SQL) is a standardized programming language that is used to
manage relational databases. SQL normalizes data as schemas and tables, and every table has a
fixed structure.
Instead of using tables and rows as in relational databases, as a NoSQL database, the MongoDB
architecture is made up of collections and documents. Documents are made up of key-value
pairs -- MongoDB's basic unit of data. Collections, the equivalent of SQL tables, contain
document sets. MongoDB offers support for many programming languages, such as C, C++, C#,
Go, Java, Python, Ruby and Swift.
How does MongoDB work?
MongoDB environments provide users with a server to create databases with MongoDB.
MongoDB stores data as records that are made up of collections and documents.
Documents contain the data the user wants to store in the MongoDB database. Documents are
composed of field and value pairs. They are the basic unit of data in MongoDB. The documents
are similar to JavaScript Object Notation (JSON) but use a variant called Binary JSON (BSON).
The benefit of using BSON is that it accommodates more data types. The fields in these
documents are like the columns in a relational database. Values contained can be a variety of
data types, including other documents, arrays and arrays of documents, according to the
MongoDB user manual. Documents will also incorporate a primary key as a unique identifier. A
document's structure is changed by adding or deleting new or existing fields.
Sets of documents are called collections, which function as the equivalent of relational
database tables. Collections can contain any type of data, but the restriction is the data in a
collection cannot be spread across different databases. Users of MongoDB can create multiple
databases with multiple collections.
The mongo shell is a standard component of the open-source distributions of MongoDB. Once
MongoDB is installed, users connect the mongo shell to their running MongoDB instances. The
mongo shell acts as an interactive JavaScript interface to MongoDB, which allows users to query
or update data and conduct administrative operations.
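For example, a short interactive session in the mongo shell might look like this (the database and collection names are assumptions for this sketch):

use mydb
db.customers.insertOne({ CustomerID: 33, name: "Nicole" })
db.customers.find({ CustomerID: 33 })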
A binary representation of JSON-like documents is provided by the BSON document storage and
data interchange format. Automatic sharding is another key feature that enables data in a
MongoDB collection to be distributed across multiple systems for horizontal scalability, as
data volumes and throughput requirements increase.
The NoSQL DBMS uses a single master architecture for data consistency, with secondary
databases that maintain copies of the primary database. Operations are automatically
replicated to those secondary databases for automatic failover.

Why is MongoDB used?


An organization might want to use MongoDB for the following:
 Storage. MongoDB can store large structured and unstructured data volumes and is
scalable vertically and horizontally. Indexes are used to improve search performance.
Searches are also done by field, range and expression queries.
 Data integration. This integrates data for applications, including for hybrid and multi-cloud
applications.
 Complex data structures descriptions. Document databases enable the embedding of
documents to describe nested structures (a structure within a structure) and can tolerate
variations in data.
 Load balancing. MongoDB can be used to run over multiple servers.
Features of MongoDB
Features of MongoDB include the following:
 Replication. A replica set is two or more MongoDB instances used to provide high
availability. Replica sets are made of primary and secondary servers. The primary MongoDB
server performs all the read and write operations, while the secondary replica keeps a copy
of the data. If a primary replica fails, the secondary replica is then used.
 Scalability. MongoDB supports vertical and horizontal scaling. Vertical scaling works by
adding more power to an existing machine, while horizontal scaling works by adding more
machines to a user's resources.
 Load balancing. MongoDB handles load balancing without the need for a separate,
dedicated load balancer, through either vertical or horizontal scaling.
 Schema-less. MongoDB is a schema-less database, which means the database can manage
data without the need for a blueprint.
 Document. Data in MongoDB is stored in documents with key-value pairs instead of rows
and columns, which makes the data more flexible when compared to SQL databases.
Advantages of MongoDB
MongoDB offers several potential benefits:
 Schema-less. Like other NoSQL databases, MongoDB doesn't require predefined schemas. It
stores any type of data. This gives users the flexibility to create any number of fields in a
document, making it easier to scale MongoDB databases compared to relational databases.
 Document-oriented. One of the advantages of using documents is that these objects map
to native data types in several programming languages. Having embedded documents also
reduces the need for database joins, which can lower costs.
 Scalability. A core function of MongoDB is its horizontal scalability, which makes it a useful
database for companies running big data applications. In addition, sharding lets the
database distribute data across a cluster of machines. MongoDB also supports the creation
of zones of data based on a shard key.
 Third-party support. MongoDB supports several storage engines and provides pluggable
storage engine APIs that let third parties develop their own storage engines for MongoDB.
 Aggregation. The DBMS also has built-in aggregation capabilities, which let users
run MapReduce code directly on the database rather than running MapReduce on Hadoop
(a short aggregation sketch follows this list).
MongoDB also includes its own file system called GridFS, akin to the Hadoop Distributed File
System. The use of the file system is primarily for storing files larger than BSON's size limit
of 16 MB per document. These similarities let MongoDB be used instead of Hadoop, though
the database software does integrate with Hadoop, Spark and other data processing
frameworks.
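As a hedged sketch of the aggregation capability mentioned above, assuming an orders collection with status, product and qty fields (all names invented for the example):

// Total quantity sold per product for shipped orders:
db.orders.aggregate([
   { $match: { status: "shipped" } },
   { $group: { _id: "$product", total: { $sum: "$qty" } } }
])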
Disadvantages of MongoDB
Though there are some valuable benefits to MongoDB, there are some downsides to it as well.
 Continuity. With its automatic failover strategy, a user sets up just one master node in a
MongoDB cluster. If the master fails, another node will automatically convert to the new
master. This switch promises continuity, but it isn't instantaneous -- it can take up to a
minute. By comparison, the Cassandra NoSQL database supports multiple master nodes. If
one master goes down, another is standing by, creating a highly available database
infrastructure.
 Write limits. MongoDB's single master node also limits how fast data can be written to the
database. Data writes must be recorded on the master, and writing new information to the
database is limited by the capacity of that master node.
 Data consistency. MongoDB doesn't provide full referential integrity through the use of
foreign-key constraints, which could affect data consistency.
 Security. In addition, user authentication isn't enabled by default in MongoDB databases.
However, malicious hackers have targeted large numbers of unsecured MongoDB systems
in attacks, which led to the addition of a default setting that blocks networked connections
to databases if they haven't been configured by a database administrator.
MongoDB vs. RDBMS: What are the differences?
A relational database management system (RDBMS) is a collection of programs and capabilities
that let IT teams and others create, update, administer and otherwise interact with a relational
database. RDBMSes store data in the form of tables and rows. Although it is not necessary,
RDBMS most commonly uses SQL.
One of the main differences between MongoDB and RDBMS is that RDBMS is a relational
database while MongoDB is nonrelational. Likewise, while most RDBMS systems use SQL to
manage stored data, MongoDB uses BSON for data storage -- a type of NoSQL database.
While RDBMS uses tables and rows, MongoDB uses documents and collections. In RDBMS a
table -- the equivalent to a MongoDB collection -- stores data as columns and rows. Likewise, a
row in RDBMS is the equivalent of a MongoDB document but stores data as structured data
items in a table. A column denotes sets of data values, which is the equivalent to a field in
MongoDB.
MongoDB is also better suited for hierarchical storage.

MongoDB platforms
MongoDB is available in community and commercial versions through vendor MongoDB Inc.
MongoDB Community Edition is the open source release, while MongoDB Enterprise Server
brings added security features, an in-memory storage engine, administration and
authentication features, and monitoring capabilities through Ops Manager.
A graphical user interface (GUI) named MongoDB Compass gives users a way to work with
document structure, conduct queries, index data and more. The MongoDB Connector for BI lets
users connect the NoSQL database to their business intelligence tools to visualize data and
create reports using SQL queries.
Following in the footsteps of other NoSQL database providers, MongoDB Inc. launched a cloud
database as a service named MongoDB Atlas in 2016. Atlas runs on AWS, Microsoft Azure and
Google Cloud Platform. Later, MongoDB released a platform named Stitch for application
development on MongoDB Atlas, with plans to extend it to on-premises databases.

The company also added support for multi-document atomicity, consistency, isolation, and
durability (ACID) transactions as part of MongoDB 4.0 in 2018. Complying with the ACID
properties across multiple documents expands the types of transactional workloads that
MongoDB can handle with guaranteed accuracy and reliability.
MongoDB history
MongoDB was created by Dwight Merriman and Eliot Horowitz, who encountered development
and scalability issues with traditional relational database approaches while building web
applications at DoubleClick, an online advertising company that is now owned by Google Inc.
The name of the database was derived from the word humongous to represent the idea of
supporting large amounts of data.
Merriman and Horowitz helped form 10Gen Inc. in 2007 to commercialize MongoDB and
related software. The company was renamed MongoDB Inc. in 2013 and went public in October
2017 under the ticker symbol MDB.
The DBMS was released as open source software in 2009 and has been kept updated since.

Architecture
When designing a modern application, chances are that you will need a database to store data.
There are many ways to architect software solutions that use a database, depending on how
your application will use this data. In this article, we will cover the different types of database
architecture and describe in greater detail a three-tier application architecture, which is
extensively used in modern web applications.
What is database architecture?
Database architecture describes how a database management system (DBMS) will be
integrated with your application. When designing a database architecture, you must make
decisions that will change how your applications are created.
First, decide on the type of database you would like to use. The database could be centralized
or decentralized. Centralized databases are typically used for regular web applications and will
be the focus of this article. Decentralized databases, such as blockchain databases, might
require a different architecture.
Once you’ve decided the type of database you want to use, you can determine the type of
architecture you want to use. Typically, these are categorized into single-tier or multi-tier
applications.
What are the types of database architecture?
When we talk about database architectures, we refer to the number of tiers an application has.
1-tier architecture
In 1-tier architecture, the database and any application interfacing with the database are kept
on a single server or device. Because there are no network delays involved, this is generally a
fast way to access data.
On a single-tier application, the application and database reside on the same device.
An example of a 1-tier architecture would be a mobile application that uses Realm, the open-
source mobile database by MongoDB, as a local database. In that case, both the application and
the database are running on the user’s mobile device.
2-tier architecture
2-tier architectures consist of multiple clients connecting directly to the database. This
architecture is also known as client-server architecture.

In a 2-tier architecture, clients are connecting directly to a database.


This architecture used to be more common when a desktop application would connect to a
single database hosted on an on-premises database server—for example, an in-house customer
relationship management (CRM) system that connects to an Access database.
3-tier architecture
Most modern web applications use a 3-tier architecture. In this architecture, the clients connect
to a back end, which in turn connects to the database. Using this approach has many benefits:
 Security: Keeping the database connection open to a single back end reduces the risks of
being hacked.
 Scalability: Because each layer operates independently, it is easier to scale parts of the
application.
 Faster deployment: Having multiple tiers makes it easier to have a separation of
concerns and to follow cloud-native best practices, including better continuous delivery
processes.

In a 3-tier architecture, the information between the database and the clients is relayed by a
back-end server.
An example of this type of architecture would be a React application that connects to a Node.js
back end. The Node.js back end processes the requests and fetches the necessary information
from a database such as MongoDB Atlas, using the native driver. This architecture is described
in greater detail in the next section.
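A minimal sketch of the application tier under these assumptions (the connection string, database and collection names are placeholders, not a definitive implementation):

const express = require('express');
const { MongoClient } = require('mongodb');

const app = express();
// Placeholder Atlas connection string:
const client = new MongoClient('mongodb+srv://user:password@cluster0.example.mongodb.net');

app.get('/products', async (req, res) => {
   // The application tier queries the data tier...
   const products = await client.db('shop').collection('products').find({}).toArray();
   // ...and relays the result to the presentation tier as JSON.
   res.json(products);
});

client.connect().then(() => app.listen(3000));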
What are the three levels of database architecture in MongoDB Atlas?
The most common DBMS architecture used in modern application development is the 3-tier
model. Since it’s so popular, let’s look at what this architecture looks like with MongoDB Atlas.
As you can see in the diagram, a three-tier application comprises three levels: the data,
application, and presentation layers.
Data (database) layer
As the name suggests, the data layer is where the data resides. In the scenario above, the data
is stored in a MongoDB Atlas database hosted on any public cloud—or across multiple clouds, if
needed. The only responsibility of this layer is to keep the data accessible for the application
layer and run the queries efficiently.
Application (middle) layer
The application tier is in charge of communicating with the database. To ensure secure access
to the data, requests are initiated from this tier. In a modern web application, this would be
your API. A back-end application built with Node.js (or any other programming language with
a native driver) makes requests to the database and relays the information back to the clients.
Presentation (user) layer
The final layer is the presentation layer. This is usually the UI of the application with which the
users will interact. In the case of a MERN or MEAN stack application, this would be the
JavaScript front end built with React or Angular.
Summary
In this article, you’ve learned about the different types of database architecture. A 3-tier
architecture is your go-to solution for most modern web applications. However, there are other
topologies that you might want to explore. For example, the type of database you use could be
a dedicated or a serverless instance, depending on your predicted usage model. You could also
supplement your database with data lakes or even online archiving to make the best use of
your hardware resources. If you are ready to concretize your database architecture, why not
try MongoDB Atlas, the database-as-a-service solution from MongoDB? Using the realm-web
SDK, you can even host all three tiers of your web application on MongoDB Atlas.
Data Model Design
MongoDB provides two types of data models: the embedded data model and the normalized data
model. Based on the requirement, you can use either of the models while preparing your
document.
Embedded Data Model
In this model, you can have (embed) all the related data in a single document; this is also known
as the de-normalized data model.
For example, assume we are getting the details of employees in three different documents,
namely Personal_details, Contact and Address. You can embed all three documents in a
single one as shown below −
{
_id: <ObjectId101>,
Emp_ID: "10025AE336",
Personal_details:{
First_Name: "Radhika",
Last_Name: "Sharma",
Date_Of_Birth: "1995-09-26"
},
Contact: {
"e-mail": "[email protected]",
phone: "9848022338"
},
Address: {
city: "Hyderabad",
Area: "Madapur",
State: "Telangana"
}
}
Normalized Data Model
In this model, you refer to the sub-documents from the original document using references. For
example, you can re-write the above document in the normalized model as:
Employee:
{
_id: <ObjectId101>,
Emp_ID: "10025AE336"
}
Personal_details:
{
_id: <ObjectId102>,
empDocID: "ObjectId101",
First_Name: "Radhika",
Last_Name: "Sharma",
Date_Of_Birth: "1995-09-26"
}
Contact:
{
_id: <ObjectId103>,
empDocID: "ObjectId101",
"e-mail": "[email protected]",
phone: "9848022338"
}

Address:
{
_id: <ObjectId104>,
empDocID: "ObjectId101",
city: "Hyderabad",
Area: "Madapur",
State: "Telangana"
}
Considerations while designing a schema in MongoDB
 Design your schema according to user requirements.
 Combine objects into one document if you will use them together. Otherwise separate
them (but make sure there is no need for joins).
 Duplicate the data (to a limited extent), because disk space is cheap compared to compute
time.
 Do joins while writing, not while reading.
 Optimize your schema for the most frequent use cases.
 Do complex aggregation in the database.
Example
Suppose a client needs a database design for his blog/website; let's see the differences between
the RDBMS and MongoDB schema designs. The website has the following requirements.
 Every post has the unique title, description and url.
 Every post can have one or more tags.
 Every post has the name of its publisher and total number of likes.
 Every post has comments given by users, along with their name, message, date-time and
likes.
 On each post, there can be zero or more comments.
In RDBMS schema, design for above requirements will have minimum three tables.
While in MongoDB schema, design will have one collection post and the following structure −
{
_id: POST_ID,
title: TITLE_OF_POST,
description: POST_DESCRIPTION,
by: POST_BY,
url: URL_OF_POST,
tags: [TAG1, TAG2, TAG3],
likes: TOTAL_LIKES,
comments: [
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
},
{
user:'COMMENT_BY',
message: TEXT,
dateCreated: DATE_TIME,
like: LIKES
}
]
}
So, while showing the data, in RDBMS you need to join three tables, whereas in MongoDB the data
will be shown from one collection only.

MongoDB supports many datatypes. Some of them are −

 String − This is the most commonly used datatype to store the data. String in MongoDB
must be UTF-8 valid.
 Integer − This type is used to store a numerical value. Integer can be 32 bit or 64 bit
depending upon your server.
 Boolean − This type is used to store a boolean (true/ false) value.
 Double − This type is used to store floating point values.
 Min/ Max keys − This type is used to compare a value against the lowest and highest
BSON elements.
 Arrays − This type is used to store arrays or list or multiple values into one key.
 Timestamp − This type is used to store a timestamp. It can be handy for recording when
a document has been modified or added.
 Object − This datatype is used for embedded documents.
 Null − This type is used to store a Null value.
 Symbol − This datatype is used identically to a string; however, it's generally reserved
for languages that use a specific symbol type.
 Date − This datatype is used to store the current date or time in UNIX time format. You
can specify your own date time by creating object of Date and passing day, month, year
into it.
 Object ID − This datatype is used to store the document’s ID.
 Binary data − This datatype is used to store binary data.
 Code − This datatype is used to store JavaScript code into the document.
 Regular expression − This datatype is used to store regular expression.

CRUD Operations
MongoDB CRUD operations are the fundamental operations used to manage data within a
MongoDB database. CRUD stands for Create, Read, Update, and Delete, and these operations
are essential for working with documents in a MongoDB collection. Let's dive deeper into each
of these CRUD operations:

Create Operations:

Create or Insert Operations: These operations add new documents to a collection. If the
collection does not exist, insert operations will create it. MongoDB provides the following
methods for creating or inserting documents:

db.collection.insertOne(): This method is used to insert a single document into the collection.

db.collection.insertMany(): It allows you to insert multiple documents into the collection.

db.createCollection(): This is used to explicitly create an empty collection.

If an insert operation is successful, a new document is created, and the method returns an
object where "acknowledged" is "true" and "insertedId" is the newly created "ObjectId." A
minimal sketch of both insert methods is shown below.

Read Operations:

Read Operations: These operations retrieve documents from a collection, effectively querying
the collection for documents. MongoDB provides the following methods for reading
documents:

db.collection.find(): It is used to retrieve documents from the collection based on specific
criteria.

db.collection.findOne(): This method retrieves the first document that matches the query
criteria.
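
A minimal sketch of both read methods, reusing the RecordsDB collection from the other examples in this section:

// Return every document that matches the filter:
db.RecordsDB.find({ species: "Dog" })

// Return only the first document that matches the filter:
db.RecordsDB.findOne({ name: "Kevin" })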

Update Operations:

Update Operations: These operations modify existing documents within a collection. MongoDB
provides the following methods for updating documents:

db.collection.updateOne(): This method is used to update a single document that matches the
provided filter criteria.

db.collection.updateMany(): It allows you to update multiple documents that match the filter
criteria.

db.collection.replaceOne(): This method is used to replace a single document in the specified
collection. It replaces the entire document, and fields in the old document not contained in the
new document will be lost.

Here's an example of how the update operations work:

// Updating a single document

db.RecordsDB.updateOne({name: "Kevin"}, {$set: {name: "Maki"}});

// Updating multiple documents


db.RecordsDB.updateMany({species: "Dog"}, {$set: {age: "5"}});

Delete Operations:

These operations remove documents from a collection. MongoDB provides the following
methods for deleting documents:

db.collection.deleteOne(): It removes a single document that matches the provided filter
criteria.

db.collection.deleteMany(): This method is used to delete multiple documents that match the
filter criteria.

// Deleting a single document

db.RecordsDB.deleteOne({name: "Maki"});

// Deleting multiple documents

db.RecordsDB.deleteMany({species: "Dog"});

In summary, MongoDB CRUD operations are essential for managing data within a MongoDB
database. They allow you to create, read, update, and delete documents, providing the basic
functionality needed to interact with the database and manipulate data as needed for your
applications.
