
Lecture Notes Template

22XX405 - DATABASE MANAGEMENT SYSTEM

UNIT 5 & LP 1 - NOSQL VS NEWSQL, NOSQL DATABASES:


MONGODB AND CASSANDRA

1. NOSQL VS NEWSQL:

Feature | NoSQL | NewSQL
Schema | It is schema-free. | It is both schema-fixed and schema-free.
Base properties/theorem | It follows the CAP theorem. | It takes care of ACID properties.
Security | It is less secure. | It is moderately secure.
Databases | Distributed database. | Distributed database.
Query language | It does not support standard SQL but supports its own query languages (e.g., UQL). | It supports SQL with improved functions and features.
Scalability | It is horizontally scalable. | It is both vertically and horizontally scalable.
Type of database | Non-relational database. | Relational database, but not purely.
Online processing | Online analytical processing. | Online transaction processing with full functionality.
Query handling | Simple queries are handled well, but complex queries are harder than in SQL. | Highly efficient for complex queries.
Example | MongoDB | CockroachDB

2. NOSQL DATABASES:
NoSQL refers to nonrelational databases that store data differently from relational tables.
They can be queried using various languages, making them "not only SQL" databases.
Developers prefer them for agile development, as they easily adapt to changing requirements.
NoSQL databases store data in ways closer to how applications use it, minimizing
transformations. Many also leverage cloud deployment to minimize downtime.

2.1 SQL VERSUS NOSQL:


SQL databases are relational, while NoSQL databases are non-relational. The relational
database management system (RDBMS) is the basis for structured query language (SQL), which
lets users access and manipulate data in highly structured tables. This is a foundational model for
database systems such as MS SQL Server, IBM DB2, Oracle, and MySQL. But with NoSQL
databases, the data access syntax can be different from database to database.
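As an illustration of the syntax difference, the same lookup can be written as an SQL string or as a MongoDB-style filter document. The table, field names, and data below are hypothetical; the tiny in-memory evaluator only shows what the filter means:

```python
# Hypothetical example: one lookup, two syntaxes (names are illustrative).
sql_query = "SELECT name, email FROM users WHERE age > 30 ORDER BY name;"

# MongoDB expresses the same intent as structured documents, not strings:
mongo_filter = {"age": {"$gt": 30}}
mongo_projection = {"name": 1, "email": 1, "_id": 0}

# A tiny in-memory evaluator showing what the filter means:
users = [
    {"name": "Ana", "email": "ana@example.com", "age": 34},
    {"name": "Bob", "email": "bob@example.com", "age": 25},
]
matches = [u for u in users if u["age"] > 30]
print([u["name"] for u in matches])  # ['Ana']
```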

2.2 RELATIONAL DATABASE VERSUS NOSQL DATABASE:


Understanding NoSQL databases involves recognizing the differences from traditional
RDBMS (Relational Database Management Systems). RDBMS stores data in tables with defined
schemas, requiring upfront knowledge of columns and data types. Relationships between tables
are established through keys.
In contrast, NoSQL databases allow storing data without a predefined schema, enabling
quick iteration based on business requirements. While relational databases were widely used, the
increasing variety, velocity, and volume of data have led to the adoption of NoSQL databases,
known for scalability and adaptability to high traffic.

When to choose a NoSQL database?


With businesses and organizations needing to innovate rapidly, being able to stay agile
and continue operating at any scale is the name of the game. NoSQL databases offer flexible
schemas and also support a variety of data models that are ideal for building applications that
require large data volumes and low latency or response times—for example, online gaming and
ecommerce web applications.

When not to choose a NoSQL database?


NoSQL databases primarily use denormalized data, catering to applications with fewer
tables and embedded records instead of references for data relationships. Traditional back-office
applications with highly normalized data, like those in finance and enterprise resource planning,
may not be suitable for NoSQL databases.
NoSQL excels with simple queries against a single table but faces challenges with
complex queries, where relational databases are more effective. Some companies opt for a hybrid
approach, employing both relational and nonrelational models to enhance flexibility and
maintain consistency in handling diverse data types without compromising performance.

What does NoSQL offer that other databases don’t?


NoSQL databases differentiate themselves by employing unstructured storage,
emphasizing fast queries, scalability, and adaptability to frequent application changes. Developed
in the last two decades, NoSQL databases simplify programming for developers.
Horizontal scaling, achieved through a process called "sharding," sets NoSQL apart from
other databases that typically use vertical scaling. Sharding allows for the addition of more
machines to handle data across multiple servers, providing an efficient way to manage large and
growing datasets.
In contrast, vertical scaling involves enhancing power and memory on existing machines,
which may become unsustainable with increasing storage needs.
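The sharding idea above can be sketched in a few lines: a shard key is hashed and mapped onto one of N servers. The server names and shard count below are illustrative, not any particular product's layout:

```python
import hashlib

# Minimal hash-based sharding sketch (illustrative server names).
SHARDS = ["server-0", "server-1", "server-2"]

def shard_for(key: str) -> str:
    # Stable hash, so the same key always routes to the same shard.
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

# Every record with the same key lands on the same machine:
assert shard_for("user:42") == shard_for("user:42")
placement = {k: shard_for(k) for k in ["user:1", "user:2", "user:3"]}
print(placement)
```

Adding a machine means extending the shard list and rebalancing affected keys; real systems use consistent hashing or range splitting to limit how many keys move.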

2.3 BENEFITS OF A NOSQL DATABASE:


Flexibility
With SQL databases, data is stored in a much more rigid, predefined structure. But with
NoSQL, data can be stored in a more free-form fashion without those rigid schemas. This design
enables innovation and rapid application development. Developers can focus on creating systems
to better serve their customers without worrying about schemas. NoSQL databases can easily
handle any data format, such as structured, semi-structured, and unstructured data in a single data
store.

Scalability
Instead of scaling up by moving to bigger, more powerful servers, NoSQL databases can
scale out across commodity hardware. This supports increased traffic in order to meet demand
with zero downtime. By scaling out, NoSQL databases can become larger and more powerful,
which is why they have become the preferred option for evolving data sets.

High performance
The scale-out architecture of a NoSQL database is particularly beneficial for handling
increased data volume and traffic. This architecture, illustrated in the graphic, ensures fast and
predictable single-digit-millisecond response times. NoSQL databases excel at ingesting and
delivering data quickly and reliably, making them suitable for applications that collect terabytes of
data daily while maintaining a highly interactive user experience. The graphic depicts an
incoming rate of 300 reads per second (blue line) with a 95th-percentile latency in the 3-4 ms range,
and an incoming rate of 150 writes per second (green line) with a 95th-percentile latency in the
4-5 ms range.

Availability
NoSQL databases automatically replicate data across multiple servers, data centers, or
cloud resources. In turn, this minimizes latency for users, no matter where they’re located. This
feature also works to reduce the burden of database management, which frees up time to focus
on other priorities.

Highly Functional
NoSQL databases are designed for distributed data stores that have extremely large data
storage needs. This is what makes NoSQL the ideal choice for big data, real-time web apps,
customer 360, online shopping, online gaming, Internet of things, social networks, and online
advertising applications.

2.4 TYPES OF NOSQL DATABASES:


There are four main types of NoSQL databases:
Key value
This is the most flexible type of NoSQL database because the application has complete
control over what is stored in the value field without any restrictions.
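A minimal sketch of the key-value idea, using an in-memory Python dict as a stand-in for a real store (keys and values below are illustrative):

```python
# In-memory key-value store sketch: the application decides what goes
# in the value field: a string, a list, or a whole object.
store = {}

def put(key, value):
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

put("session:9f1", {"user": "ana", "cart": ["book", "pen"]})  # an object
put("page:/home", "<html>...</html>")                         # a blob
print(get("session:9f1")["cart"])  # ['book', 'pen']
```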

Document
Also referred to as document store or document-oriented databases, these databases are
used for storing, retrieving, and managing semi-structured data. There is no need to specify
which fields a document will contain.
Graph
This database organizes data as nodes and relationships, which show the connections
between nodes. This supports a richer and fuller representation of data. Graph databases are
applied in social networks, reservation systems, and fraud detection.
Wide column
These databases store and manage data in the form of tables, rows, and columns. They
are broadly deployed in applications that require a column format to capture schema-free data.

3. TYPES OF NOSQL DATABASES:


● Document-based databases
● Key-value stores
● Column-oriented databases
● Graph-based databases

Types of NoSQL Database

4. MONGODB FEATURES AND IMPLEMENTATION:


MongoDB is an open-source document-oriented database designed for efficient storage
and retrieval of large-scale data. It falls under the NoSQL category because it does not use tables
for data organization. Developed and managed by MongoDB Inc. under the SSPL (Server Side
Public License), MongoDB offers official driver support for popular programming languages,
enabling application development in C, C++, C#, .NET, Go, Java, Node.js, Perl, PHP, Python,
Ruby, Scala, and Swift, along with ODMs and async drivers such as Motor and Mongoid.
Companies like Facebook, Nokia, eBay, Adobe, and Google use MongoDB to store substantial
amounts of data. It was initially released in February 2009.

4.1 WORKING:
Now let us see what actually happens behind the scenes. MongoDB is a database
server, and the data is stored in databases hosted on that server. In other words, the MongoDB
environment gives you a server that you can start and then create multiple databases on using
MongoDB. Because it is a NoSQL database, the data is stored in collections and documents.
The database, collection, and documents are related to each other as shown below:

● A MongoDB database contains collections, just as a MySQL database contains tables.
You are allowed to create multiple databases and multiple collections.
● Inside a collection we have documents. These documents contain the data we want to
store in the MongoDB database. A single collection can contain multiple documents, and
because the store is schema-less, it is not necessary for one document to be similar to
another.
● Documents are created using fields. Fields are key-value pairs in the documents,
analogous to columns in a relational database. The value of a field can be of any BSON
data type, such as double, string, or boolean.
● The data stored in MongoDB is in the format of BSON documents. BSON stands for
Binary JSON: in the backend, the MongoDB server converts JSON data into a binary
form known as BSON, which can be stored and queried more efficiently.
● In MongoDB documents, you are allowed to store nested data. This nesting of data
allows you to model complex relationships and store them in the same document, which
makes working with and fetching data very efficient compared to SQL, where you need
to write joins to combine data from two tables. The maximum size of a BSON document
is 16 MB, and a single MongoDB server can host multiple databases.
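The document model described above can be sketched with a plain Python dict, which has the same shape a BSON document takes inside MongoDB (field names and values are illustrative):

```python
# An "order" document: related data is nested in one document instead
# of being split across joined tables.
order = {
    "_id": "order-1001",
    "customer": {"name": "Ana", "email": "ana@example.com"},  # nested document
    "items": [                                                # nested array
        {"sku": "A1", "qty": 2, "price": 9.5},
        {"sku": "B7", "qty": 1, "price": 24.0},
    ],
}

# One read returns the order and its line items: no join needed.
total = sum(i["qty"] * i["price"] for i in order["items"])
print(total)  # 43.0
```

With a live server, this exact dict could be passed to a driver call such as pymongo's `collection.insert_one(order)`.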

4.2 DIFFERENCE BETWEEN MONGODB AND RDBMS:

MongoDB | RDBMS
It is a non-relational, document-oriented database. | It is a relational database.
It is suitable for hierarchical data storage. | It is not suitable for hierarchical data storage.
It has a dynamic schema. | It has a predefined schema.
It centers around the CAP theorem (Consistency, Availability, and Partition tolerance). | It centers around ACID properties (Atomicity, Consistency, Isolation, and Durability).
In terms of performance, it is much faster than RDBMS. | In terms of performance, it is slower than MongoDB.

4.3 FEATURES OF MONGODB:


Schema-less Database: This is a key feature of MongoDB. A schema-less database
means that one collection can hold different types of documents. In other words, in a MongoDB
database, a single collection can hold multiple documents, and these documents may have
different numbers of fields, different content, and different sizes. It is not necessary for one
document to be similar to another, as it is in relational databases. This feature gives MongoDB
great flexibility.

Document Oriented: In MongoDB, all data is stored in documents instead of tables as
in an RDBMS. In these documents, the data is stored in fields (key-value pairs) instead of rows
and columns, which makes the data much more flexible in comparison to an RDBMS. Each
document contains its own unique object id.

Indexing: In a MongoDB database, fields in documents can be indexed with primary
and secondary indexes, which makes it easier and faster to search data within a large pool of
documents. If the data is not indexed, the database must scan every document against the
specified query, which takes much more time and is inefficient.
Scalability: MongoDB provides horizontal scalability with the help of sharding. Sharding
means distributing data across multiple servers: a large amount of data is partitioned into chunks
using the shard key, and these chunks are evenly distributed across shards that reside on many
physical servers. New machines can also be added to a running database.

Replication: MongoDB provides high availability and redundancy with the help of replication, it
creates multiple copies of the data and sends these copies to a different server so that if one
server fails, then the data is retrieved from another server.

Aggregation: This allows operations on grouped data to obtain a single or computed
result, similar to the SQL GROUP BY clause. MongoDB provides three ways to aggregate:
the aggregation pipeline, the map-reduce function, and single-purpose aggregation
methods.
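As a rough sketch of how an aggregation pipeline behaves, the pure-Python code below mimics a $match stage followed by a $group stage over an illustrative data set. This is only a model of the pipeline's semantics, not MongoDB's implementation:

```python
from collections import defaultdict

# Illustrative data set.
sales = [
    {"city": "Pune", "amount": 100, "year": 2023},
    {"city": "Pune", "amount": 50, "year": 2024},
    {"city": "Agra", "amount": 70, "year": 2024},
]

# Stage 1: $match equivalent, keep only 2024 sales.
matched = [d for d in sales if d["year"] == 2024]

# Stage 2: $group equivalent, sum amounts per city (like SQL GROUP BY).
totals = defaultdict(int)
for d in matched:
    totals[d["city"]] += d["amount"]
print(dict(totals))  # {'Pune': 50, 'Agra': 70}
```

In MongoDB itself the same pipeline would be written as `[{"$match": {"year": 2024}}, {"$group": {"_id": "$city", "total": {"$sum": "$amount"}}}]` and evaluated server-side.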

High Performance: MongoDB offers very high performance and data persistence
compared to other databases, thanks to features such as scalability, indexing, and replication.

4.4 ADVANTAGES OF MONGODB :


● It is a schema-less NoSQL database; you need not design the schema of the database
before working with MongoDB.
● It generally avoids join operations by embedding related data in documents (an
aggregation $lookup stage is available when a join is needed).
● It provides great flexibility for the fields in documents.
● It can hold heterogeneous data.
● It provides high performance, availability, and scalability.
● It supports geospatial data and queries efficiently.
● It is a document-oriented database, and the data is stored in BSON documents.
● It supports multi-document ACID transactions (starting from MongoDB 4.0).
● It is not vulnerable to SQL injection, since queries are structured documents rather than
query strings.
● It is easily integrated with big data tools such as Hadoop.

4.5 DISADVANTAGES OF MONGODB :


● It uses a relatively large amount of memory for data storage.
● You are not allowed to store more than 16 MB of data in a single document.
● The nesting of data in BSON is also limited: you cannot nest documents more than
100 levels deep.
5. CASSANDRA FEATURES AND IMPLEMENTATION:
5.1 FEATURES:
Apache Cassandra is an open-source, freely available, distributed NoSQL DBMS designed
to handle large amounts of data across many servers. It provides no single point of failure.
Cassandra offers strong support for clusters spanning multiple data centers. Some of its major
features are described below:
Distributed: Each node in the cluster has the same role. The data set is distributed across
the cluster, and because there is no master node, every node can service requests, so no single
node's failure brings the system down.

Supports replication & multi-data-center replication: The replication factor is easily
configurable in Cassandra. Cassandra is designed as a distributed system for the deployment of
large numbers of nodes across multiple data centers.

Scalability: It is designed so that read and write throughput both increase linearly as new
machines are added, without interrupting other applications.

Fault tolerance: Data is automatically stored and replicated for fault tolerance. If a node
fails, it is replaced in no time.

MapReduce support: It supports Hadoop integration with MapReduce support; Apache
Hive and Apache Pig are also supported.

Query language: Cassandra introduced CQL (Cassandra Query Language), a simple,
SQL-like interface for accessing Cassandra.

5.2 WORKING:
Cassandra is an open-source NoSQL distributed database managed by the Apache
non-profit organization, emphasizing high availability and reliability through a distributed
architecture. This section highlights Cassandra's features and its key enterprise use cases.
Given the modern abundance of data, effective storage systems like databases are essential for
processing and referencing information. Database Management Systems (DBMS), crucial for
managing databases, interact with databases and other software for dataset analysis.
Cassandra, as an open-source NoSQL distributed database, excels in managing large data
volumes across commodity servers, providing high availability without a single point of failure.
It is designed for decentralized, scalable storage and operates across multiple cloud data centers.
Understanding Cassandra involves exploring its architecture components, partitioning system,
and replicability.

1. Architecture of Cassandra
Cassandra's primary architecture consists of a peer-to-peer system with a cluster of equal
nodes, resembling DynamoDB and Google Bigtable. Each node stores specific data, and related
nodes form a data center, while a cluster comprises multiple data centers. Cassandra's
architecture allows easy expansion by adding more nodes, doubling data capacity, and
dynamically scaling both ways.
This scalability contrasts with the complexity of increasing data capacity in traditional
SQL databases. Additionally, Cassandra's architecture enhances data security and safeguards
against data loss.

2. The partitioning system


In Cassandra, data storage and retrieval rely on a partitioning system. A partitioner
determines the primary location for storing a data set, working with the tokens assigned to
nodes. Each node owns a set of tokens, and the partition key determines the data storage
location.
Upon entering the cluster, the data's partition key is run through a hash function. The
coordinator node, the one a client connects to for a request, is responsible for sending the data to
the node whose token matches the hashed partition key.
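The token-based placement described above can be sketched as a simplified hash ring. The node names, token values, and tiny 8-bit token space below are illustrative, not Cassandra's actual partitioner:

```python
import bisect
import hashlib

# Each node owns a token; a partition key hashes to a token and is
# stored on the first node whose token is >= the key's token,
# wrapping around the ring.
ring = sorted([(0, "node-A"), (85, "node-B"), (170, "node-C")])
tokens = [t for t, _ in ring]

def node_for(partition_key: str) -> str:
    token = int(hashlib.md5(partition_key.encode()).hexdigest(), 16) % 256
    i = bisect.bisect_left(tokens, token) % len(ring)  # wrap past the end
    return ring[i][1]

print(node_for("user:42"), node_for("user:43"))
```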

3. Cassandra’s replicability
Cassandra ensures data reliability through replication across nodes, with secondary nodes
known as replica nodes. The number of replica nodes is determined by the replication factor
(RF), where a factor of 3, for example, means three nodes store the same data.
This redundancy contributes to Cassandra's reliability, as even if one node fails
temporarily or permanently, other nodes retain the same data, minimizing the risk of data loss.
When a temporarily disrupted node recovers, it receives updates on missed data actions and
catches up to resume normal functioning.
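Replica placement can be sketched the same way: with replication factor RF, the data lands on the primary node plus the next RF-1 nodes around the ring. Node names are illustrative, and real Cassandra placement strategies also take racks and data centers into account:

```python
# Simplified replica placement on a ring of four nodes.
ring = ["node-A", "node-B", "node-C", "node-D"]

def replicas(primary_index: int, rf: int = 3):
    # The primary plus the next rf-1 nodes clockwise, wrapping around.
    return [ring[(primary_index + i) % len(ring)] for i in range(rf)]

print(replicas(2))  # ['node-C', 'node-D', 'node-A']
```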
Lecture Notes Template

22XX405 - DATABASE MANAGEMENT SYSTEMS

UNIT 5 & LP 2 - NewSQL Databases: Redis and NuoDB - Selection of NoSQL or NewSQL over RDBMS

5.1 NEWSQL DATABASES


NewSQL is a modern relational database system that bridges the gap between
SQL and NoSQL. NewSQL databases aim to scale and stay consistent.

NoSQL databases scale, while standard SQL databases are consistent. NewSQL
attempts to provide both properties and find a middle ground. As a result, this database type
addresses scaling and consistency problems in big data fields.

5.1.1 What is NewSQL?


NewSQL is a unique database system that combines ACID compliance with
horizontal scaling. The database system strives to keep the best of both worlds.
OLTP-based transactions and the high performance of NoSQL combine in a single
solution.
Enterprises expect high-quality data integrity on large data volumes. When either
becomes a problem, an enterprise chooses to:
● Improve hardware, or
● Create custom software for distributed databases

5.2 NEWSQL DATABASE FEATURES


The main features of NewSQL databases are:

● In-memory storage and data processing supply fast query results.


● Partitioning scales the database into units. Queries execute on many shards and
combine into a single result.
● ACID properties preserve the features of RDBMS.
● Secondary indexing results in faster query processing and information retrieval.
● High availability due to the database replication mechanism.
● A built-in crash recovery mechanism delivers fault tolerance and minimizes
downtime.

5.3 BEST NEWSQL DATABASES


Below is a run-through of the best NewSQL databases currently on the market.
The list is not extensive, so research further if you plan to use one of the databases.

5.3.1 VoltDB
VoltDB works well with high-speed transactional applications. The database
performs in-memory processing on a distributed architecture. The software is available as
both open source and proprietary.
Key features:
○ Real-time decision-making.
○ Support for Kafka import and export.
○ Disaster recovery through database replication.
○ Hadoop and OLAP export integration.
5.3.2 CockroachDB
CockroachDB is a scalable and robust database. The database offers strong data
consistency and works well with low-latency resources.
Key features:
● Robust disaster recovery system.
● Historical data view, record, and storage options.
● Built-in cleaning processes for disks and storage devices.
● CockroachDB works in unfavorable conditions.
5.3.3 NuoDB
NuoDB is a geo-distributed database with flexible scaling for various geographic
locations. The database maps data across various points while staying ACID compliant.
Key features:
● High-quality data transformations.
● Always available with online schema evolutions and rolling upgrades.
● Tailored features for data storage and control.
● Full ACID transaction support.
5.3.4 ClustrixDB
ClustrixDB is a self-managing NewSQL database. The software automates
scaling operations and supports high availability.
Key features:
● Efficient data categorization.
● SQL code migration options.
● Built-in health metrics in a browser interface.
● DevOps assistance and query caching.
5.3.5 Altibase
Altibase is an in-memory database with a hybrid architecture. The database
reduces hardware and software costs by combining in-memory data processing with an
on-disk DBMS with a single license. Altibase comes in both community and proprietary
versions.
Key Features
● Memory-optimized engine for increased speeds.
● Custom persistence and performance balance levels.
● Flexible deployment options.
● Real-time access to vital data.

5.4 REDIS DATABASE FEATURES


5.4.1 What is Redis?

Redis is a NoSQL database which follows the principle of a key-value store. A
key-value store provides the ability to store some data, called a value, inside a key.
You can retrieve this data later only if you know the exact key used to store it.
Redis is a flexible, open-source (BSD licensed), in-memory data structure store,
used as a database, cache, and message broker. Because Redis is a NoSQL database, it
lets users store huge amounts of data without the limits of a relational
database.
Redis supports various types of data structures such as strings, hashes, lists, sets,
sorted sets, bitmaps, hyperloglogs, and geospatial indexes with radius queries.

5.4.2 Redis Architecture


There are two main processes in the Redis architecture:
○ Redis Client
○ Redis Server
The client and server can be on the same computer or on two different computers.
The Redis server is used to store data in memory; it handles all types of management
and forms the main part of the architecture. To talk to the server, you can use the
console client (redis-cli) that ships with the Redis installation, or any Redis client
library.
5.4.3 Features of Redis

Speed: Redis stores the whole dataset in primary memory, which is why it is
extremely fast. It can perform around 110,000 SETs per second and 81,000 GETs per
second on an entry-level Linux box. Redis also supports pipelining of commands
and lets you use multiple values in a single command to speed up
communication with the client libraries.
Persistence: While all the data lives in memory, changes are asynchronously
saved on disk using flexible policies based on elapsed time and/or number of
updates since last save. Redis supports an append-only file persistence mode.
Check more on Persistence, or read the AppendOnlyFileHowto for more
information.
Data Structures: Redis supports various types of data structures such as strings,
hashes, sets, lists, sorted sets with range queries, bitmaps, hyperloglogs and
geospatial indexes with radius queries.
Atomic Operations: Redis operations working on the different Data Types are
atomic, so it is safe to set or increase a key, add and remove elements from a set,
increase a counter etc.
Supported Languages: Redis supports a lot of languages such as ActionScript,
C, C++, C#, Clojure, Common Lisp, D, Dart, Erlang, Go, Haskell, Haxe, Io, Java,
JavaScript (Node.js), Julia, Lua, Objective-C, Perl, PHP, Pure Data, Python, R,
Racket, Ruby, Rust, Scala, Smalltalk and Tcl.
Master/Slave Replication: Redis supports very simple and fast master/slave
replication. It takes only one line in the configuration file to set up, and about 21
seconds for a slave to complete the initial sync of a 10 MM key set on an
Amazon EC2 instance.
Sharding: Redis supports sharding. It is very easy to distribute the dataset across
multiple Redis instances, as with other key-value stores.
Portable: Redis is written in ANSI C and works in most POSIX systems like
Linux, BSD, Mac OS X, Solaris, and so on. Redis is reported to compile and work
under WIN32 if compiled with Cygwin, but there is no official support for
Windows currently.
5.4.4 Redis Implementation
Implementing Redis involves setting up and configuring a Redis server,
connecting to it, and using the Redis commands to perform various operations.
Redis is an in-memory data structure store that can be used as a database, cache,
and message broker.
Here is a basic guide to implementing Redis:
1. Install Redis:
You can download and install Redis from the official Redis website.
Follow the installation instructions for your specific operating system.
2. Start the Redis Server:
Once installed, you can start the Redis server by running the following command:
redis-server
This will start the Redis server on the default port (6379).
3. Connect to Redis from a Client:
You can interact with Redis using various programming languages and libraries.
Some popular libraries include redis-py for Python, ioredis for Node.js, and jedis
for Java.
Below is an example using Python's redis-py:
import redis

# Connect to the Redis server (redis.Redis is the modern client class;
# StrictRedis is a legacy alias for it)
r = redis.Redis(host='localhost', port=6379, decode_responses=True)
# Set a key-value pair
r.set('example_key', 'example_value')
# Retrieve the value
value = r.get('example_key')
print(value)
4. Basic Redis Operations:
Redis supports various data types and operations. Here are some basic examples:
# Strings
r.set('name', 'John')
print(r.get('name'))
# Lists
r.rpush('mylist', 'item1')
r.rpush('mylist', 'item2')
print(r.lrange('mylist', 0, -1))
# Sets
r.sadd('myset', 'item1')
r.sadd('myset', 'item2')
print(r.smembers('myset'))
# Hashes
r.hset('myhash', 'field1', 'value1')
r.hset('myhash', 'field2', 'value2')
print(r.hgetall('myhash'))
5. Closing the Connection:
It's a good practice to close the connection to Redis when you're done:
r.close()

5.4.5 Popular Redis Use Cases

5.4.5.1 Caching
● Redis is a great choice for implementing a highly available in-memory cache to
decrease data access latency, increase throughput, and ease the load off your
relational or NoSQL database and application.
● Redis can serve frequently requested items at sub-millisecond response times, and
enables you to easily scale for higher loads without growing the costlier backend.
● Database query results caching, persistent session caching, web page caching, and
caching of frequently used objects such as images, files, and metadata are all
popular examples of caching with Redis.
5.4.5.2 Chat, messaging, and queues
● Redis supports Pub/Sub with pattern matching and a variety of data structures such as
lists, sorted sets, and hashes.
● This allows Redis to support high performance chat rooms, real-time comment
streams, social media feeds and server intercommunication.
● The Redis List data structure makes it easy to implement a lightweight queue.
● Lists offer atomic operations as well as blocking capabilities, making them suitable
for a variety of applications that require a reliable message broker or a circular list.
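The queue pattern above can be modeled in pure Python, with a deque standing in for the Redis List (with redis-py the calls would be r.rpush on the producer side and r.lpop or r.blpop on the consumer side):

```python
from collections import deque

# Pure-Python sketch of a lightweight queue built on a Redis List:
# producers push onto the tail, consumers pop from the head.
queue = deque()

def rpush(item):      # producer side (RPUSH)
    queue.append(item)

def lpop():           # consumer side (LPOP); None when empty
    return queue.popleft() if queue else None

rpush("job-1")
rpush("job-2")
print(lpop(), lpop(), lpop())  # job-1 job-2 None
```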

5.4.5.3 Gaming leaderboards


● Redis is a popular choice among game developers looking to build real-time
leaderboards.
● Simply use the Redis Sorted Set data structure, which provides uniqueness of
elements while maintaining the list sorted by users' scores.
● Creating a real-time ranked list is as easy as updating a user's score each time it
changes. You can also use Sorted Sets to handle time series data by using timestamps
as the score.
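The leaderboard pattern can also be modeled in pure Python, with a dict standing in for the Sorted Set (with redis-py this maps to r.zadd and r.zrevrange; member names and scores are illustrative):

```python
# Sorted Set leaderboard sketch: one score per unique member, with the
# ranked list derived from the scores.
scores = {}  # member -> score (member uniqueness, like ZADD)

def zadd(member, score):
    scores[member] = score

def top(n):
    # Like ZREVRANGE 0 n-1: highest scores first.
    return sorted(scores, key=scores.get, reverse=True)[:n]

zadd("ana", 120)
zadd("bob", 90)
zadd("ana", 150)   # updating a score re-ranks automatically
print(top(2))      # ['ana', 'bob']
```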
5.4.5.4 Session store

● Redis as an in-memory data store with high availability and persistence is a popular
choice among application developers to store and manage session data for internet-scale
applications.
● Redis provides the sub-millisecond latency, scale, and resiliency required to manage
session data such as user profiles, credentials, session state, and user-specific
personalization.

5.4.5.5 Rich media streaming

● Redis offers a fast, in-memory data store to power live streaming use cases.
● Redis can be used to store metadata about users' profiles and viewing histories,
authentication information/tokens for millions of users, and manifest files to enable CDNs
to stream videos to millions of mobile and desktop users at a time.

5.5 DIFFERENCE BETWEEN NOSQL AND NEWSQL

S.No | NoSQL | NewSQL
1. | NoSQL is a schema-free database. | NewSQL is a schema-fixed as well as schema-free database.
2. | It is horizontally scalable. | It is horizontally scalable.
3. | It automatically possesses high availability. | It possesses built-in high availability.
4. | It supports cloud, on-disk, and cache storage. | It fully supports cloud, on-disk, and cache storage.
5. | It promotes CAP properties. | It promotes ACID properties.
6. | Online transaction processing is not supported. | Online transaction processing is fully supported.
7. | There are low security concerns. | There are moderate security concerns.
8. | Use cases: big data, social network applications, and IoT. | Use cases: e-commerce, telecom industry, and gaming.
9. | Examples: DynamoDB, MongoDB, RavenDB, etc. | Examples: VoltDB, CockroachDB, NuoDB, etc.

5.6 SELECTION OF NOSQL, NEWSQL OVER TRADITIONAL RDBMS
22CS405 - DATABASE MANAGEMENT SYSTEMS
UNIT 5 & LP 3 - CAP Theorem and BASE Properties, HeidiSQL Features and Use Case
CAP Theorem and Applicability in DBMS
Let’s take a detailed look at the three database management system characteristics to which the
CAP theorem refers.

Consistency

Consistency means that all clients see the same data at the same time, no matter which node they
connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
‘successful.’

Availability

Availability means that any client making a request for data gets a response, even if one or more
nodes are down. Another way to state this—all working nodes in the database management
system return a valid response for any request, without exception.

Partition tolerance

A partition is a communications break within a database management system—a lost or


temporarily delayed connection between two nodes. Partition tolerance means that the cluster
must continue to work despite any number of communication breakdowns between nodes in the
system.
● CA (Consistency and Availability): The system provides consistency and availability, but only while no network partition occurs. Since real distributed systems cannot rule out partitions, CA behavior is in practice associated with single-node or non-distributed deployments.

Example databases: traditional relational systems such as MySQL and PostgreSQL (single-node configurations).

● AP (Availability and Partition Tolerance): The system prioritizes availability over consistency and can respond with possibly stale data. The system can be distributed across multiple nodes and is designed to keep operating even in the face of network partitions.

Example databases: Cassandra, CouchDB, Riak, Voldemort, Amazon DynamoDB.

● CP (Consistency and Partition Tolerance): The system prioritizes consistency over availability and responds with the latest updated data; during a partition it may refuse requests that it cannot answer consistently. The system can be distributed across multiple nodes and is designed to operate reliably even in the face of network partitions.

Example databases: Apache HBase, MongoDB, Redis, Google Cloud Spanner.

It’s important to note that these database systems may have different configurations and
settings that can change their behavior with respect to consistency, availability, and partition
tolerance. Therefore, the exact behavior of a database system may depend on its
configuration and usage.
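The consistency/availability trade-off during a partition can be illustrated with a toy sketch (purely illustrative, not a real database; all class and variable names are invented): a CP-style store refuses requests it cannot serve consistently, while an AP-style store keeps answering with possibly stale data.

```python
# Toy sketch: two replicas of a key-value store, showing how CP and AP
# systems behave differently when a network partition separates them.

class Replica:
    def __init__(self):
        self.data = {}

class CPStore:
    """CP: rejects requests it cannot serve consistently during a partition."""
    def __init__(self):
        self.primary, self.secondary = Replica(), Replica()
        self.partitioned = False

    def write(self, key, value):
        if self.partitioned:
            raise RuntimeError("unavailable: cannot replicate during partition")
        self.primary.data[key] = value
        self.secondary.data[key] = value   # synchronous replication

class APStore:
    """AP: always answers, possibly with stale data, during a partition."""
    def __init__(self):
        self.primary, self.secondary = Replica(), Replica()
        self.partitioned = False

    def write(self, key, value):
        self.primary.data[key] = value
        if not self.partitioned:           # replication skipped when cut off
            self.secondary.data[key] = value

    def read_from_secondary(self, key):
        return self.secondary.data.get(key)   # may be stale

cp, ap = CPStore(), APStore()
cp.write("x", 1); ap.write("x", 1)
cp.partitioned = ap.partitioned = True
ap.write("x", 2)                       # AP accepts the write...
stale = ap.read_from_secondary("x")    # ...so the secondary still returns 1
try:
    cp.write("x", 2)                   # CP refuses rather than diverge
    cp_refused = False
except RuntimeError:
    cp_refused = True
```

The same request thus either fails fast (CP) or succeeds with a stale view (AP), which is exactly the choice the theorem forces during a partition.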

BASE PROPERTIES
The BASE properties of a database management system are a set of principles that guide the
design and operation of modern databases.
The acronym BASE stands for Basically Available, Soft State, and Eventual Consistency.
Basically Available
This property refers to the fact that the database system should always be available to respond to
user requests, even if it cannot guarantee immediate access to all data. The database may
experience brief periods of unavailability, but it should be designed to minimize downtime and
provide quick recovery from failures.
Soft State
This property refers to the fact that the state of the database can change over time, even without
any explicit user intervention. This can happen due to the effects of background processes,
updates to data, and other factors. The database should be designed to handle this change
gracefully, and ensure that it does not lead to data corruption or loss.
Eventual Consistency
This property refers to the eventual consistency of data in the database, despite changes over
time. In other words, the database should eventually converge to a consistent state, even if it
takes some time for all updates to propagate and be reflected in the data. This is in contrast to the
immediate consistency required by traditional ACID-compliant databases.
Uses of BASE Databases
BASE databases are used in modern, highly-available, and scalable systems that handle large
amounts of data. Examples of such systems include online shopping websites, social media
platforms, and cloud-based services.
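Eventual consistency can be sketched in a few lines (illustrative only; the replica contents and propagation queue are invented for the example): a write is acknowledged by one replica immediately and reaches the others asynchronously, so reads can be stale until propagation completes.

```python
# Toy sketch of eventual consistency: writes hit one replica first and
# propagate to the others asynchronously, so a read can be stale for a
# while, but all replicas converge once propagation completes.

replicas = [{"balance": 100}, {"balance": 100}, {"balance": 100}]
pending = []   # queued replication messages not yet delivered

def write(key, value):
    replicas[0][key] = value                       # acknowledged immediately
    pending.extend((i, key, value) for i in (1, 2))

def propagate():
    while pending:                                 # deliver queued updates
        i, key, value = pending.pop(0)
        replicas[i][key] = value

write("balance", 80)
stale_read = replicas[2]["balance"]   # still 100: update not yet applied
propagate()
consistent = all(r["balance"] == 80 for r in replicas)
```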
Difference between Base Properties and ACID Properties

ACID | BASE
ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee the integrity and consistency of data in a traditional database. | BASE is a more relaxed version of ACID that trades off some consistency guarantees for greater scalability and availability.
ACID requires immediate consistency. | BASE only requires eventual consistency.
ACID is better suited to traditional transactional databases. | BASE is more suitable for use in large-scale, highly available systems.

HeidiSQL
HeidiSQL is free software that enables you to browse and edit data, create and edit tables, views,
procedures, triggers, and scheduled events from computers running one of the database systems
MariaDB, MySQL, Microsoft SQL, PostgreSQL, and SQLite.
HeidiSQL is a popular open-source database management tool for MySQL, MariaDB, Microsoft
SQL Server, and PostgreSQL. It provides a graphical interface for users to interact with their
databases. Here are some key properties and features of HeidiSQL:

​ 1. Cross-Platform Compatibility:
● HeidiSQL is a cross-platform tool, compatible with Windows, Linux, and macOS.
​ 2. Multi-Database Support:
● It supports various database systems, including MySQL, MariaDB, Microsoft
SQL Server, and PostgreSQL.
​ 3. Intuitive User Interface:
● HeidiSQL offers a user-friendly and intuitive interface, making it easy for both
beginners and experienced database administrators to manage and manipulate
databases.
​ 4. Query and Script Execution:
● Users can execute SQL queries and scripts directly within the application, with
syntax highlighting and code completion features.
​ 5. Database Connection Management:
● HeidiSQL allows users to manage multiple database connections simultaneously,
providing easy navigation between different databases and servers.
​ 6. Table and Data Management:
● Users can create, modify, and delete database tables and manage table data using
the graphical interface.
​ 7. Import and Export Data:
● The tool supports importing and exporting data in various formats, making it
convenient for transferring data between different databases or applications.
​ 8. SSH Tunneling:
● HeidiSQL supports secure connections through SSH tunneling, enhancing the
security of database connections.
​ 9. Transaction Management:
● Users can work with database transactions, ensuring data consistency and
integrity.
​ 10. Stored Procedure and Function Support:
● HeidiSQL allows users to create, modify, and execute stored procedures and
functions within the application.
​ 11. Visual Query Builder:
● It provides a visual query builder, helping users to construct complex SQL queries
without writing the code manually.
​ 12. Database Design Tools:
● HeidiSQL includes tools for designing and managing database structures, such as
creating and altering tables, indexes, and relationships.
​ 13. Customization Options:
● Users can customize the appearance and behavior of HeidiSQL according to their
preferences, including themes and color schemes.
​ 14. Open Source and Free:
● HeidiSQL is an open-source software, and users can use it for free without any
licensing fees.
​ Heidisql use cases:

​ 1.Database Connection and Management:
● Connect to different database servers using various protocols (TCP/IP, SSH
tunneling).
● Manage multiple database connections simultaneously.
● Browse databases, tables, and views.
​ 2.Querying and Editing Data:
● Execute SQL queries and view results.
● Edit and modify table data directly within the application.
● Import and export data in various formats.
​ 3.Database Structure Modification:
● Create and modify database tables, indexes, and views.
● Manage triggers, stored procedures, and functions.
​ 4.User and Privilege Management:
● Manage user accounts and their privileges.
● Grant and revoke permissions for database objects.
​ 5.Data Backup and Restore:
● Perform database backups and restores.
● Schedule automatic backups.
​ 6.SSH Tunneling:
● Connect to databases securely using SSH tunneling.
​ 7.Version Control Integration:
● HeidiSQL supports version control systems like Git and integrates with them.
​ 8.Export and Import:
● Export database structure and data to SQL files.
● Import SQL files to create or update database structures.
​ 9.Stored Procedure and Function Editing:
● Create, edit, and execute stored procedures and functions.
​ 10.Customization and Themes:
● Customize the look and feel of the interface.
● Choose from different themes to personalize your experience.
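Behind its graphical interface, HeidiSQL issues ordinary SQL. As a rough illustration, the create/insert/query/alter tasks listed above can be expressed directly in SQL; here they run against SQLite (one of HeidiSQL's supported systems) using Python's built-in sqlite3 module. The table and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway SQLite database
cur = conn.cursor()

# Create a table (HeidiSQL: table designer)
cur.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)"
)

# Insert rows (HeidiSQL: data grid editing)
cur.executemany(
    "INSERT INTO employee (name, salary) VALUES (?, ?)",
    [("Asha", 52000), ("Ravi", 61000)],
)

# Run a query (HeidiSQL: query tab with syntax highlighting)
cur.execute("SELECT name FROM employee WHERE salary > 55000")
rows = [r[0] for r in cur.fetchall()]

# Alter the structure (HeidiSQL: table designer)
cur.execute("ALTER TABLE employee ADD COLUMN dept TEXT")
conn.commit()
```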
22CS405 - DATABASE MANAGEMENT SYSTEMS
UNIT 5 & LP4 - IN-MEMORY DATABASES AND CACHING, DATABASE SECURITY AND ENCRYPTION, DATABASE PERFORMANCE TUNING
DATABASE PERFORMANCE TUNING
1.WHAT IS AN IN-MEMORY DATABASE?
An in-memory database is data storage software that holds all of its data in the memory of the host. The main difference between a traditional database and an in-memory database is where the data is stored. Even compared with solid-state drives (SSDs), random access memory (RAM) is orders of magnitude faster than disk access. Because an in-memory database uses RAM for storage, access to the data is much faster than with a traditional database that relies on disk operations.

In-memory databases provide quick access to their content; on the downside, they are at high
risk of losing data in case of a server failure, since the data is not persisted anywhere. If a
server failure or shutdown should occur, everything currently in the memory of that computer
would be lost due to the volatile nature of RAM. It is also worth noting that the cost of
memory is much higher than the cost of hard disks. This is why there is typically much more
hard disk space than memory on modern computers. This factor makes in-memory databases
much more expensive. They are also more at risk of running out of space to store data.

Your decision to use an in-memory database would depend on your use case. In-memory
databases are great for high-volume data access where a data loss would be acceptable. Think
of a large e-commerce website. The information about the products is crucial and should be
kept on a persisted storage, but the information in the shopping cart could potentially be kept
in an in-memory database for quicker access.
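The shopping-cart idea can be sketched with SQLite, whose ":memory:" mode keeps the whole database in RAM (standing in here for a dedicated in-memory database). The cart contents vanish when the connection closes, which is exactly the volatility trade-off described above; the product catalog would live in persistent storage instead. Table and value names are illustrative.

```python
import sqlite3

cart_db = sqlite3.connect(":memory:")  # database lives entirely in RAM
cart = cart_db.cursor()
cart.execute("CREATE TABLE cart (user_id INTEGER, product TEXT, qty INTEGER)")
cart.execute("INSERT INTO cart VALUES (1, 'keyboard', 2)")
cart.execute("INSERT INTO cart VALUES (1, 'mouse', 1)")

cart.execute("SELECT SUM(qty) FROM cart WHERE user_id = 1")
total_items = cart.fetchone()[0]       # fast read, no disk I/O

cart_db.close()                        # all cart data is gone: RAM is volatile
```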

1.1 How does an in-memory database work?


An in-memory database works in a very similar way as any other database, but the data is
kept in RAM rather than on a traditional disk. Replacing the disk access with memory
operations highly reduces the latency required to access data.

Using RAM as a storage medium comes with a price. If a server failure occurs, all data will
be lost. As a way to prevent this, replica sets can be created in modern databases such
as MongoDB with a mix of in-memory engines and traditional on-disk storage. This replica
set ensures that some of the members of the cluster are persisting data.
A replica set with both in-memory and traditional storage.

In this scenario, possible with MongoDB Enterprise Advanced, the primary node of the
replica set uses an in-memory storage engine. It has two other nodes, one of which uses an
in-memory storage, the other one using the WiredTiger engine. The secondary node using the
disk storage is configured as a hidden member.

In case of a failure, the secondary in-memory server would become the primary and still
provide quick access to the data. Once the failing server comes back up, it would sync with
the server using the WiredTiger engine, and no data would be lost.

Many in-memory database offerings nowadays offer in-memory performance with


persistence. They typically use a configuration similar to this one.

Similar setups can be done with sharded clusters when using MongoDB Enterprise
Advanced.

1.2 What are the advantages and disadvantages of in-memory databases?


The most obvious advantage of using an in-memory database is the speed to retrieve data
from the database. Without the need of performing disk operations, the latency is reduced
greatly and is more consistent. Because there are no more reasons to limit the number of
reading operations on a disk, different algorithms can be used to search data, increasing the
overall performance of an in-memory database.

It might seem like a great idea to use exclusively in-memory databases to benefit from the
speed gains, but there are some drawbacks. First, the cost of RAM is about 80 times higher
than the price of traditional disk space. This would significantly increase the operating costs
of your infrastructure, whether the servers are hosted in the cloud or on-premises.

Secondly, the lack of data persistence in case of a failure can be an issue in some cases. While
there are ways to mitigate the risks associated with these data losses, those risks might not be
acceptable for your business case.
Finally, because servers typically have much less RAM than disk space, it is not uncommon
to run out of space. In this case, write operations would fail, losing any new data stored.

Using an on-disk database with an NVMe SSD can prove to be a solid alternative to
in-memory databases. These disks offer a data bandwidth similar to RAM, although the
latency is slightly higher.

1.3 Why use an in-memory database?


The main use case for in-memory databases is when real-time data is needed. With its very
low latency, RAM can provide near-instantaneous access to the needed data. Because of the
potential data losses, in-memory databases without a persistence mechanism should not be
used for mission-critical applications.

Any application where the need for speed is more important than the need for durability
could benefit from an in-memory database over a traditional database.

In many cases, the in-memory database can be used only by a small portion of the total
application, while the more critical data is stored in an on-disk database such as MongoDB
Atlas.

1.4 In-memory database examples


In-memory databases can find their place in many different scenarios. Some of the typical use
cases could include:

● IoT data: IoT sensors can provide large amounts of data. An in-memory database
could be used for storing and computing data to later be stored in a traditional
database.
● E-commerce: Some parts of e-commerce applications, such as the shopping cart, can
be stored in an in-memory database for faster retrieval on each page view, while the
product catalog could be stored in a traditional database.
● Gaming: Leaderboards require quick updates and fast reads when millions of players
are accessing a game at the same time. In-memory databases can help to sort the
results more quickly than traditional databases.
● Session management: In stateful web applications, a session is created to keep track of
a user identity and recent actions. Storing this information in an in-memory database
avoids a round trip to the central database with each web request.

2. What is an In-Memory Cache?


An in-memory cache is a data storage layer that sits between applications and databases to
deliver responses with high speeds by storing data from earlier requests or copied directly
from databases. An in-memory cache removes the performance delays when an application
built on a disk-based database must retrieve data from a disk before processing.
Reading data from memory is faster than from the disk. In-memory caching avoids latency and
improves online application performance. Background jobs that took hours or minutes can now
execute in minutes or seconds.

Since cache data is kept separate from the application, the in-memory cache requires data
movement from the cache system to the application and back to the cache for processing. This
often occurs across networks.
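The read path described above is usually implemented as a cache-aside pattern: check the cache first, and only on a miss fall back to the database and populate the cache. A minimal sketch, with a plain dict standing in for an in-memory cache such as Redis or Memcached, and a delayed function standing in for the disk-based database:

```python
import time

def slow_db_lookup(user_id):
    """Stand-in for a disk-based database query."""
    time.sleep(0.01)            # simulated disk/network latency
    return {"id": user_id, "name": f"user{user_id}"}

cache = {}                      # stand-in for Redis/Memcached
hits = misses = 0

def get_user(user_id):
    global hits, misses
    if user_id in cache:        # cache hit: served from memory
        hits += 1
        return cache[user_id]
    misses += 1                 # cache miss: go to the database...
    record = slow_db_lookup(user_id)
    cache[user_id] = record     # ...and populate the cache for next time
    return record

first = get_user(42)            # miss: pays the database latency
second = get_user(42)           # hit: no database round trip
```

With a real cache the data movement noted above (application to cache system and back) happens over the network, but the control flow is the same.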

3. TOOLS FOR SUPPORTING IN-MEMORY DATABASES AND CACHES

In-memory databases and caches play a crucial role in improving the performance of applications
by storing and retrieving data in the system's main memory rather than relying on slower
disk-based storage. Several tools and technologies support the implementation and management
of in-memory databases and caches. Here are some commonly used tools:

Redis:

Description: Redis is an open-source, in-memory data structure store. It supports various data
structures such as strings, hashes, lists, sets, and more.

Use Cases: Caching, real-time analytics, messaging, leaderboards, and other scenarios where
low-latency data access is critical.

Key Features: Persistence options, built-in replication, support for transactions, and a variety
of data types.
Memcached:

Description: Memcached is a distributed, in-memory caching system that is simple yet powerful. It stores key-value pairs in memory and is often used to speed up dynamic web applications.

Use Cases: Caching frequently accessed data, session storage, and database query result
caching.

Key Features: Distributed architecture, simple API, and support for multiple programming
languages.

Hazelcast:

Description: Hazelcast is an open-source, in-memory data grid and computing platform. It provides distributed data structures and supports caching and computation across a cluster.

Use Cases: Distributed caching, distributed computing, and real-time analytics.

Key Features: Distributed data structures (maps, queues, etc.), distributed computing
capabilities, and support for multiple programming languages.

Apache Ignite:

Description: Apache Ignite is an open-source, distributed in-memory computing platform. It provides an in-memory data grid, caching, and computation capabilities.

Use Cases: Real-time analytics, distributed caching, and high-performance computing.

Key Features: Distributed in-memory storage, SQL queries, distributed computing, and
integration with various data sources.

Microsoft Azure Redis Cache:

Description: Azure Redis Cache is a fully managed, in-memory caching service provided by
Microsoft Azure. It is built on the open-source Redis.

Use Cases: Caching, session storage, and improving the performance of Azure applications.

Key Features: Fully managed service, support for Redis commands, and integration with
Azure services.

Oracle TimesTen:

Description: Oracle TimesTen is an in-memory relational database management system (RDBMS). It is designed for high-performance transaction processing.

Use Cases: Real-time data processing, low-latency applications, and caching.

Key Features: Full ACID compliance, in-memory storage, and seamless integration with
Oracle Database.

SAP HANA:
Description: SAP HANA is an in-memory, column-oriented, relational database management
system. It is designed for both analytical and transactional processing.

Use Cases: Real-time analytics, data warehousing, and high-performance transactional applications.

Key Features: In-memory storage, advanced analytics, and integration with SAP applications.

When choosing a tool for in-memory databases or caches, consider factors such as the
specific use case, scalability requirements, ease of integration, and the programming
languages and frameworks supported. Each tool has its strengths and may be better suited for
particular scenarios.

4. WHAT IS DATABASE SECURITY?


Database security refers to the range of tools, controls, and measures designed to establish
and preserve database confidentiality, integrity, and availability. This article will focus
primarily on confidentiality since it’s the element that’s compromised in most data breaches.

Database security must address and protect the following:


● The data in the database

● The database management system (DBMS)

● Any associated applications

● The physical database server and/or the virtual database server and the underlying
hardware

● The computing and/or network infrastructure used to access the database


Database security is a complex and challenging endeavor that involves all aspects of
information security technologies and practices. It’s also naturally at odds with database
usability. The more accessible and usable the database, the more vulnerable it is to security
threats; the more invulnerable the database is to threats, the more difficult it is to access and
use. (This paradox is sometimes referred to as Anderson’s Rule.)
Why is it important?

By definition, a data breach is a failure to maintain the confidentiality of data in a database.


How much harm a data breach inflicts on your enterprise depends on a number of
consequences or factors:

● Compromised intellectual property: Your intellectual property—trade secrets, inventions, proprietary practices—may be critical to your ability to maintain a competitive advantage in your market. If that intellectual property is stolen or exposed, your competitive advantage may be difficult or impossible to maintain or recover.
● Damage to brand reputation: Customers or partners may be unwilling to buy your
products or services (or do business with your company) if they don’t feel they can
trust you to protect your data or theirs.

● Business continuity (or lack thereof): Some business cannot continue to operate
until a breach is resolved.

● Fines or penalties for non-compliance: The financial impact for failing to comply
with global regulations such as the Sarbanes-Oxley Act (SOX) or Payment Card
Industry Data Security Standard (PCI DSS), industry-specific data privacy regulations
such as HIPAA, or regional data privacy regulations, such as Europe’s General Data
Protection Regulation (GDPR) can be devastating, with fines in the worst cases
exceeding several million dollars per violation.

● Costs of repairing breaches and notifying customers: In addition to the cost of communicating a breach to customers, a breached organization must pay for forensic and investigative activities, crisis management, triage, repair of the affected systems, and more.

4.1 Common Threats and Challenges


Many software misconfigurations, vulnerabilities, or patterns of carelessness or misuse can
result in breaches. The following are among the most common types or causes of database
security attacks and their causes.
Insider threats
An insider threat is a security threat from any one of three sources with privileged access to
the database:
● A malicious insider who intends to do harm

● A negligent insider who makes errors that make the database vulnerable to attack

● An infiltrator—an outsider who somehow obtains credentials via a scheme such as phishing or by gaining access to the credential database itself
Insider threats are among the most common causes of database security breaches and are
often the result of allowing too many employees to hold privileged user access credentials.
Human error

Accidents, weak passwords, password sharing, and other unwise or uninformed user
behaviors continue to be the cause of nearly half (49%) of all reported data breaches.

Exploitation of database software vulnerabilities

Hackers make their living by finding and targeting vulnerabilities in all kinds of software,
including database management software. All major commercial database software vendors
and open source database management platforms issue regular security patches to address
these vulnerabilities, but failure to apply these patches in a timely fashion can increase your
exposure.

SQL/NoSQL injection attacks

A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack
strings into database queries served by web applications or HTTP headers. Organizations that
don’t follow secure web application coding practices and perform regular vulnerability
testing are open to these attacks.
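A minimal demonstration of the attack and its standard defense, using SQLite via Python's sqlite3 (the table and payload are invented for the example): a string-concatenated query lets attacker-supplied input alter the SQL's logic, whereas a parameterized query treats the same input strictly as data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE users (name TEXT, password TEXT)")
cur.execute("INSERT INTO users VALUES ('alice', 'secret')")

malicious = "' OR '1'='1"   # classic injection payload

# Vulnerable: the input is pasted into the SQL string, changing its logic
cur.execute(f"SELECT COUNT(*) FROM users WHERE name = '{malicious}'")
vulnerable_matches = cur.fetchone()[0]      # condition is true for every row

# Safe: the ? placeholder keeps the payload as a literal value
cur.execute("SELECT COUNT(*) FROM users WHERE name = ?", (malicious,))
safe_matches = cur.fetchone()[0]            # no user has that literal name
```

This is why secure coding guidance insists on parameterized queries (or an ORM that uses them) rather than string concatenation.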

Buffer overflow exploitations

Buffer overflow occurs when a process attempts to write more data to a fixed-length block of
memory than it is allowed to hold. Attackers may use the excess data, stored in adjacent
memory addresses, as a foundation from which to launch attacks.

Malware

Malware is software written specifically to exploit vulnerabilities or otherwise cause damage to the database. Malware may arrive via any endpoint device connecting to the database’s network.

Attacks on backups
Organizations that fail to protect backup data with the same stringent controls used to protect
the database itself can be vulnerable to attacks on backups.
These threats are exacerbated by the following:
● Growing data volumes: Data capture, storage, and processing continues to grow
exponentially across nearly all organizations. Any data security tools or practices need
to be highly scalable to meet near and distant future needs.

● Infrastructure sprawl: Network environments are becoming increasingly complex,


particularly as businesses move workloads to multicloud or hybrid
cloud architectures, making the choice, deployment, and management of security
solutions ever more challenging.

● Increasingly stringent regulatory requirements: The worldwide regulatory


compliance landscape continues to grow in complexity, making adhering to all
mandates more difficult.

● Cybersecurity skills shortage: Experts predict there may be as many as 8 million


unfilled cybersecurity positions by 2022.
Denial of service (DoS/DDoS) attacks

In a denial of service (DoS) attack, the attacker deluges the target server—in this case the
database server—with so many requests that the server can no longer fulfill legitimate
requests from actual users, and, in many cases, the server becomes unstable or crashes.

In a distributed denial of service attack (DDoS), the deluge comes from multiple servers, making it more difficult to stop the attack.

Best practices

Because databases are nearly always network-accessible, any security threat to any
component within or portion of the network infrastructure is also a threat to the database, and
any attack impacting a user’s device or workstation can threaten the database. Thus, database
security must extend far beyond the confines of the database alone.

When evaluating database security in your environment to decide on your team’s top
priorities, consider each of the following areas:

● Physical security: Whether your database server is on-premise or in a cloud data


center, it must be located within a secure, climate-controlled environment. (If your
database server is in a cloud data center, your cloud provider will take care of this for
you.)

● Administrative and network access controls: The practical minimum number of


users should have access to the database, and their permissions should be restricted to
the minimum levels necessary for them to do their jobs. Likewise, network access
should be limited to the minimum level of permissions necessary.

● End user account/device security: Always be aware of who is accessing the


database and when and how the data is being used. Data monitoring solutions can
alert you if data activities are unusual or appear risky. All user devices connecting to
the network housing the database should be physically secure (in the hands of the
right user only) and subject to security controls at all times.

● Encryption: ALL data, including data in the database and credential data, should be protected with best-in-class encryption while at rest and in transit. All encryption keys should be handled in accordance with best-practice guidelines.
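One concrete piece of protecting credential data: stored passwords should never be kept in plain text. A salted, deliberately slow key-derivation function such as PBKDF2 (available in Python's standard hashlib) is a common baseline; a sketch, with the iteration count and helper names chosen for illustration:

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Derive a salted hash suitable for storage instead of the password."""
    salt = salt or os.urandom(16)    # unique random salt per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored):
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored)   # constant-time comparison

salt, stored = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored)
bad = verify_password("wrong guess", salt, stored)
```

Even if the credential table leaks, the attacker obtains only salted, slow-to-crack digests rather than usable passwords.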

● Database software security: Always use the latest version of your database
management software, and apply all patches as soon as they are issued.
● Application/web server security: Any application or web server that interacts with
the database can be a channel for attack and should be subject to ongoing security
testing and best practice management.

● Backup security: All backups, copies, or images of the database must be subject to
the same (or equally stringent) security controls as the database itself.

● Auditing: Record all logins to the database server and operating system, and log all
operations performed on sensitive data as well. Database security standard audits
should be performed regularly.

Controls and policies

In addition to implementing layered security controls across your entire network


environment, database security requires you to establish the correct controls and policies for
access to the database itself. These include:

● Administrative controls to govern installation, change, and configuration


management for the database.

● Preventative controls to govern access, encryption, tokenization, and masking.

● Detective controls to monitor database activity and flag potential threats, using database activity monitoring and data loss prevention tools. These solutions make it possible to identify and alert on anomalous or suspicious activities.

Database security policies should be integrated with and support your overall business goals,
such as protection of critical intellectual property and your cybersecurity policies and cloud
security policies. Ensure you have designated responsibility for maintaining and auditing
security controls within your organization and that your policies complement those of your
cloud provider in shared responsibility agreements. Security controls, security awareness
training and education programs, and penetration testing and vulnerability assessment
strategies should all be established in support of your formal security policies.
Data protection tools and platforms

Today, a wide array of vendors offer data protection tools and platforms. A full-scale solution
should include all of the following capabilities:

● Discovery: Look for a tool that can scan for and classify vulnerabilities across all
your databases—whether they’re hosted in the cloud or on-premise—and offer
recommendations for remediating any vulnerabilities identified. Discovery
capabilities are often required to conform to regulatory compliance mandates.

● Data activity monitoring: The solution should be able to monitor and audit all data
activities across all databases, regardless of whether your deployment is on-premise,
in the cloud, or in a container. It should alert you to suspicious activities in real-time
so that you can respond to threats more quickly. You’ll also want a solution that can
enforce rules, policies, and separation of duties and that offers visibility into the status
of your data through a comprehensive and unified user interface. Make sure that any
solution you choose can generate the reports you’ll need to meet compliance
requirements.

● Encryption and tokenization capabilities: In case of a breach, encryption offers a


final line of defense against compromise. Any tool you choose should include flexible
encryption capabilities that can safeguard data in on-premise, cloud, hybrid, or
multicloud environments. Look for a tool with file, volume, and application
encryption capabilities that conform to your industry’s compliance requirements,
which may demand tokenization (data masking) or advanced security key
management capabilities.

● Data security optimization and risk analysis: A tool that can generate contextual
insights by combining data security information with advanced analytics will enable
you to accomplish optimization, risk analysis, and reporting with ease. Choose a
solution that can retain and synthesize large quantities of historical and recent data
about the status and security of your databases, and look for one that offers data
exploration, auditing, and reporting capabilities through a comprehensive but
user-friendly self-service dashboard.

5.WHAT IS DATABASE PERFORMANCE TUNING?


Database Performance Tuning refers to the process of optimizing database systems for
improved performance and efficiency. It involves making adjustments in the database
configuration, query design, indexing, and other system parameters to achieve maximum
performance. By fine-tuning databases, businesses can ensure a faster response time for data
retrieval and analytics, contributing to better decision making and enhanced user experiences.

5.1 Functionality and Features


Database Performance Tuning involves various techniques and methodologies aimed at
improving database performance, such as:

● Query optimization: Rewriting SQL queries for better execution plans


● Index management: Creating, modifying, and deleting indexes to optimize data
access
● Resource allocation: Assigning memory, CPU, and disk resources for optimal
performance
● Database design: Designing database schemas that allow efficient data
processing
● Data partitioning: Dividing large tables into smaller, more manageable pieces for
improved query performance
● Caching: Storing frequently accessed data in memory for faster retrieval
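Index management, one of the techniques listed above, can be observed directly in SQLite: EXPLAIN QUERY PLAN reports a full-table SCAN before the index exists and an index-based lookup afterwards. A sketch using Python's sqlite3 (table and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
cur.executemany("INSERT INTO orders (customer) VALUES (?)",
                [(f"cust{i}",) for i in range(1000)])

query = "SELECT id FROM orders WHERE customer = 'cust500'"

# Without an index the planner must scan the whole table
before = " ".join(str(row) for row in cur.execute("EXPLAIN QUERY PLAN " + query))

cur.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# With the index the planner switches to an index lookup
after = " ".join(str(row) for row in cur.execute("EXPLAIN QUERY PLAN " + query))
```

Inspecting the plan before and after a change is the basic workflow behind most query-optimization work, whatever the database engine.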

5.2 Benefits and Use Cases


Database Performance Tuning offers several advantages to businesses, including:

● Increased query efficiency: Faster data retrieval allows users to run complex
queries without experiencing delays
● Reduced resource consumption: Optimized databases require fewer computing
resources, leading to cost savings
● Improved scalability: Tuned databases can handle more concurrent users and
larger data volumes
● Better user experience: Prompt data access enhances user satisfaction and
productivity
● Higher return on investment: Efficient databases make the best use of hardware
and software investments

5.3 Challenges and Limitations


Despite the benefits, Database Performance Tuning does have limitations:

● Time-consuming: Rigorous tuning efforts can require significant time and


expertise
● Diminishing returns: Ongoing tuning might not yield substantial performance
improvements
● Complexity: Adjusting multiple configurations can create complications in
maintaining the database system

5.4 Database Performance Tuning Techniques:

1. Make minor adjustments to queries

There are plenty of ways a query can be tuned, but one of the most important is to
take the process incrementally. Start with basic adjustments and work your way up:
tweak the relevant indexes, or restructure the query itself.

You can also run database commands to look for high-selectivity queries, then
improve those queries by adding indexes or changing the query plan. These simple
changes can have a massive effect on performance.
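One small adjustment of this kind can be observed directly. The sketch below (using Python's built-in `sqlite3` as a stand-in engine; table and column names are illustrative) inspects a query's plan with `EXPLAIN QUERY PLAN`, adds a single index, and inspects it again:

```python
import sqlite3

# In-memory sample database; "orders" and its columns are illustrative names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders (customer, total) VALUES (?, ?)",
                 [(f"cust{i % 100}", i * 1.5) for i in range(1000)])

def plan(sql):
    # EXPLAIN QUERY PLAN reports whether SQLite scans the table or uses an index;
    # the detail text is the fourth column of each returned row.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT * FROM orders WHERE customer = 'cust42'"
plan_before = plan(query)  # full table SCAN before tuning

# One small adjustment: add an index on the filtered column.
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer)")
plan_after = plan(query)   # now a SEARCH using idx_orders_customer

print(plan_before)
print(plan_after)
```

The same before/after check works on any query you suspect is expensive, which keeps each change small and verifiable.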

2. Statistics should be up to date

● Up-to-date statistics let the optimizer generate the right execution plans, which is
central to performance tuning.
● There are plenty of great performance-tuning tools, but if the statistics they rely on
are outdated, the resulting plans will not reflect the current state of the data.
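In SQLite (used here as a stand-in engine; table names are illustrative), the `ANALYZE` command gathers these statistics into the `sqlite_stat1` table, which the planner consults when choosing between a scan and an index lookup:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT)")
conn.execute("CREATE INDEX idx_events_kind ON events(kind)")
conn.executemany("INSERT INTO events (kind) VALUES (?)",
                 [("click" if i % 10 else "purchase",) for i in range(5000)])

# ANALYZE records row counts and index selectivity in sqlite_stat1.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

Rerunning `ANALYZE` after bulk loads or large deletes keeps these statistics current; other engines expose equivalent commands (e.g., `ANALYZE` in PostgreSQL, `UPDATE STATISTICS` in SQL Server).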

3. Don’t use the leading wildcards

● A leading wildcard (e.g., LIKE '%term') forces a full table scan even when the
filtered field is indexed.
● If the database engine must scan every row in a table to find a specific entry,
delivery speed drops significantly, and parallel queries also suffer because the
whole table must be pulled through memory.
● The result is excessive CPU utilization, which prevents other queries from running
efficiently.
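The difference is visible in the query plan. In the sketch below (SQLite as a stand-in engine; names are illustrative), `case_sensitive_like` is enabled so that SQLite's LIKE optimization can use an ordinary index: the trailing-wildcard pattern becomes an index SEARCH on the prefix, while the leading-wildcard pattern stays a full SCAN:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# With case_sensitive_like ON, a LIKE pattern that does not start with a
# wildcard can be rewritten by SQLite as an index range search.
conn.execute("PRAGMA case_sensitive_like = ON")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE INDEX idx_users_email ON users(email)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"user{i}@example.com",) for i in range(1000)])

def plan(sql):
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

leading  = plan("SELECT * FROM users WHERE email LIKE '%@example.com'")
trailing = plan("SELECT * FROM users WHERE email LIKE 'user42%'")
print(leading)   # leading wildcard: full table SCAN
print(trailing)  # trailing wildcard: index SEARCH on the prefix
```

When a leading-wildcard search is genuinely required, consider a full-text index (e.g., SQLite FTS5) instead of LIKE.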

4. Use Constraints

● Constraints are one of the most effective ways to speed up queries, since they allow
the SQL optimizer to develop a better execution plan. However, the enhanced
performance comes at a cost: constraints consume additional storage and memory.
● Depending on your objectives, the enhanced query speed may be well worth it, but
being aware of the price is vital.
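One concrete case: in SQLite (a stand-in engine here; names are illustrative), a UNIQUE constraint builds an internal index as a side effect, which the planner then uses for lookups — extra storage traded for faster queries, exactly the trade-off described above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# UNIQUE creates an automatic index; CHECK guards data validity.
conn.execute("""CREATE TABLE accounts (
                  id INTEGER PRIMARY KEY,
                  username TEXT UNIQUE NOT NULL,
                  balance REAL CHECK (balance >= 0))""")
conn.executemany("INSERT INTO accounts (username, balance) VALUES (?, ?)",
                 [(f"user{i}", 100.0) for i in range(1000)])

detail = " ".join(row[3] for row in conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM accounts WHERE username = 'user7'"))
print(detail)  # SEARCH using the index created by the UNIQUE constraint
```

The constraint also rejects invalid writes (a negative balance raises an integrity error), so the same declaration serves both correctness and performance.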

5. Increase memory

● In data from Statista, 32% of B2B businesses expect to boost their spending on
database management.
● One way DBAs handle SQL performance tuning issues is to increase the memory
allocated to their existing databases. When SQL databases have plenty of memory,
efficiency improves and better performance often follows.

6. Overhaul CPU

● If you’re regularly experiencing strain on database operations, you might want to
invest in a more robust CPU.
● Businesses that run on obsolete hardware and systems often experience database
performance issues that can affect their ROI.

7. Improve indexes

● Aside from queries, another essential element of a database is the index. Done right,
indexing enhances database performance and shortens query execution time.
● An index is a data structure that organizes your data and makes information easy to
find. Because data is much easier to locate, indexing boosts the efficiency of data
retrieval and speeds up the whole process, saving you and the system a great deal
of time and effort.
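The speed-up can be measured directly. This sketch (SQLite as a stand-in engine; names are illustrative) times the same lookup before and after adding an index; the indexed version reads only the matching rows instead of all 50,000:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, host TEXT, msg TEXT)")
conn.executemany("INSERT INTO logs (host, msg) VALUES (?, ?)",
                 [(f"host{i % 500}", "event") for i in range(50000)])

def timed(sql):
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

query = "SELECT * FROM logs WHERE host = 'host123'"
rows_scan, t_scan = timed(query)           # full scan: every row examined
conn.execute("CREATE INDEX idx_logs_host ON logs(host)")
rows_idx, t_idx = timed(query)             # index lookup: only matching rows read
print(f"scan {t_scan:.4f}s vs indexed {t_idx:.4f}s")
```

Both versions return identical rows; only the access path changes. Remember that each index also slows writes slightly, so index the columns your queries actually filter and join on.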

8. Defragment data
● One of the best approaches to improving database performance is data
defragmentation. Over time, as data is constantly written and deleted, the
database file becomes fragmented.
● Fragmentation slows down data retrieval because it interferes with a query’s
ability to locate the information it is searching for.
● When you defragment, related data is grouped back together and index page
issues are resolved. As a result, I/O-related operations run much faster.
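In SQLite (a stand-in engine here; file and table names are illustrative), defragmentation is done with `VACUUM`, which rebuilds the database file, reclaims the holes left by deleted rows, and stores each table's pages contiguously:

```python
import os
import sqlite3
import tempfile

# A file-backed database, so the reclaimed space is visible as file size.
path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY, payload BLOB)")
conn.executemany("INSERT INTO blobs (payload) VALUES (?)",
                 [(b"x" * 1024,) for _ in range(2000)])
conn.commit()

conn.execute("DELETE FROM blobs WHERE id % 2 = 0")  # leave holes in the file
conn.commit()
size_before = os.path.getsize(path)

conn.execute("VACUUM")  # rebuild the file; freed pages are returned to the OS
size_after = os.path.getsize(path)
print(size_before, "->", size_after)
```

Other engines have analogous maintenance commands (e.g., `VACUUM` in PostgreSQL, index rebuilds in SQL Server); all are best scheduled during low-traffic windows, since rebuilding can be I/O-heavy.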

9. Review the actual execution plan, not just the estimated plan

● The estimated execution plan is helpful while writing queries, since it previews how
the plan is expected to run. However, it can misjudge parameter data types and
cardinalities, so its predictions may be wrong.
● For the best results in performance tuning, review the actual execution plan,
because it reflects accurate runtime statistics.
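SQLite only exposes an estimated plan (`EXPLAIN QUERY PLAN`); engines such as PostgreSQL (`EXPLAIN ANALYZE`) report actual row counts and timings. As a rough stand-in for that comparison, the sketch below (names are illustrative) shows the estimated plan text alongside what actually happened when the query ran:

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("south" if i % 100 == 0 else "north", i * 1.0)
                  for i in range(10000)])
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")

query = "SELECT COUNT(*) FROM sales WHERE region = ?"

# Estimated view: the plan text, produced without touching the data.
estimated = " ".join(r[3] for r in conn.execute(
    "EXPLAIN QUERY PLAN " + query, ("south",)))

# "Actual" view: execute the query and measure what really happened.
start = time.perf_counter()
actual_rows = conn.execute(query, ("south",)).fetchone()[0]
elapsed = time.perf_counter() - start
print(estimated)
print(actual_rows, f"matching rows counted in {elapsed:.4f}s")
```

When the actual row count diverges sharply from what the plan implied, that is usually a sign of stale statistics or a skewed parameter value.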

10. Adjust queries, make one small change at a time

● Making too many changes at once can be detrimental in the long run. A more
efficient approach is to start with the most expensive operations and work from
there, one change at a time.

11. Review access

● Once you know that your database hardware works well, the next thing that you need
to do is to review your database access. It includes which applications are accessing
your database.
● Suppose one of your services or applications suffers from poor database
performance. In that case, don’t immediately jump to conclusions about which
service or application is responsible.
● A single client might be experiencing poor performance, but it is also possible that
the entire database is having issues. You therefore need to dig into who has access
to the database and whether the problem is limited to a single service.
● That’s where the right database management support comes in. You can then drill
down into its metrics so that you know the root cause.
