1. NOSQL VS NEWSQL:

| FEATURES | NoSQL | NewSQL |
|---|---|---|
| Schema | It is schema-free. | It is both schema-fixed and schema-free. |
| Base properties / theorem | It follows the CAP theorem. | It takes care of ACID properties. |
| Security | It is less secure. | It is moderately secure. |
| Databases | Distributed database. | Distributed database. |
| Query language | It does not support old SQL but supports UQL. | It supports SQL with improved functions and features. |
| Scalability | It is horizontally scalable. | It is both vertically and horizontally scalable. |
| Types of database | Non-relational database. | Relational database, but not purely. |
| Online processing | Online analytical processing. | Online transaction processing with full functionality. |
| Query handling | Complex queries can be handled better than in SQL. | Highly efficient for complex queries. |
| Example | MongoDB | CockroachDB |
2. NOSQL DATABASES:
NoSQL refers to nonrelational databases that store data differently from relational tables.
They can be queried using various languages, making them "not only SQL" databases.
Developers prefer them for agile development, as they easily adapt to changing requirements.
NoSQL databases store data in ways closer to how applications use it, minimizing
transformations. They leverage the cloud for zero downtime.
Scalability
Instead of scaling up by moving to a bigger server, NoSQL databases can scale out across
commodity hardware. This supports increased traffic to meet demand with zero downtime.
By scaling out, NoSQL databases can become larger and more powerful, which is why they
have become the preferred option for evolving data sets.
High performance
The scale-out architecture of a NoSQL database is particularly beneficial for handling
increased data volume and traffic, and it delivers fast, predictable single-digit-millisecond
response times. NoSQL databases excel at ingesting and delivering data quickly and reliably,
which makes them suitable for applications that collect terabytes of data daily while
maintaining a highly interactive user experience. As a representative workload, such a system
can sustain an incoming rate of 300 reads per second at a 95th-percentile latency in the
3-4 ms range, and an incoming rate of 150 writes per second at a 95th-percentile latency
in the 4-5 ms range.
Availability
NoSQL databases automatically replicate data across multiple servers, data centers, or
cloud resources. In turn, this minimizes latency for users, no matter where they’re located. This
feature also works to reduce the burden of database management, which frees up time to focus
on other priorities.
Highly Functional
NoSQL databases are designed for distributed data stores that have extremely large data
storage needs. This is what makes NoSQL the ideal choice for big data, real-time web apps,
customer 360, online shopping, online gaming, Internet of Things, social networks, and online
advertising applications.
Document
Also referred to as document store or document-oriented databases, these databases are
used for storing, retrieving, and managing semi-structured data. There is no need to specify
which fields a document will contain.
Graph
This database organizes data as nodes and relationships, which show the connections
between nodes. This supports a richer and fuller representation of data. Graph databases are
applied in social networks, reservation systems, and fraud detection.
Wide column
These databases store and manage data in the form of tables, rows, and columns. They
are broadly deployed in applications that require a column format to capture schema-free data.
4.1 WORKING OF MONGODB:
Now let us see what actually happens behind the scenes. MongoDB is a database server,
and data is stored in databases on that server. In other words, the MongoDB environment
gives you a server that you can start and then create multiple databases on using MongoDB.
Because it is a NoSQL database, data is stored in collections and documents rather than
tables. The database, collections, and documents are related to each other as shown below:
● A MongoDB database contains collections, just as a MySQL database contains tables.
You are allowed to create multiple databases and multiple collections.
● Inside a collection we have documents. Documents contain the data we want to store in
the MongoDB database; a single collection can contain multiple documents, and collections
are schema-less, meaning one document need not have the same structure as another.
● Documents are created using fields. Fields are key-value pairs in a document, analogous
to columns in a relational database. The value of a field can be of any BSON data type,
such as double, string, or boolean.
● The data stored in MongoDB is in the format of BSON documents, where BSON stands for
a binary representation of JSON documents. In other words, in the backend the MongoDB
server converts JSON data into a binary form known as BSON, which can be stored and
queried more efficiently.
● In MongoDB documents, you are allowed to store nested data. Nesting lets you model
complex relationships and store them in the same document, which can make fetching data
more efficient than in SQL, where you would need to write complex joins to combine data
from table 1 and table 2. The maximum size of a BSON document is 16 MB. A single
MongoDB server can run multiple databases, as the sketch below illustrates.
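A minimal sketch of these ideas using the pymongo driver (assumptions: pymongo is
installed and a MongoDB server is running locally on the default port 27017; the database,
collection, and field names are made up for illustration):

from pymongo import MongoClient

# Connect to a local MongoDB server (assumed to be running on the default port).
client = MongoClient("localhost", 27017)

# Databases and collections are created lazily on first write.
db = client["school"]        # a database
students = db["students"]    # a collection inside it

# Collections are schema-less: these documents have different fields,
# and the second one nests an address sub-document.
students.insert_one({"name": "Asha", "marks": 91})
students.insert_one({
    "name": "Ravi",
    "marks": 78,
    "address": {"city": "Chennai", "pincode": "600001"},  # nested data
})

# Fields are key-value pairs; query them much like columns.
for doc in students.find({"marks": {"$gt": 80}}):
    print(doc)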
| MongoDB | RDBMS |
|---|---|
| It is suitable for hierarchical data storage. | It is not suitable for hierarchical data storage. |
Document Oriented: In MongoDB, all data is stored in documents instead of tables as in an
RDBMS. In these documents, data is stored in fields (key-value pairs) instead of rows and
columns, which makes the data much more flexible than in an RDBMS. Each document
carries its own unique object ID.
Indexing: In a MongoDB database, any field in a document can be indexed with primary and
secondary indices, which makes it easier and faster to find data in a large pool of documents.
If the data is not indexed, the database must scan every document against the specified
query, which takes a lot of time and is not efficient.
Scalability: MongoDB provides horizontal scalability with the help of sharding. Sharding
means distributing data across multiple servers: a large amount of data is partitioned into
chunks using the shard key, and these chunks are evenly distributed across shards that
reside on many physical servers. New machines can also be added to a running database.
Replication: MongoDB provides high availability and redundancy with the help of replication:
it creates multiple copies of the data and stores them on different servers, so that if one
server fails, the data can be retrieved from another.
Aggregation: Aggregation allows you to perform operations on grouped data and obtain a
single or computed result, similar to the SQL GROUP BY clause. MongoDB provides three
kinds of aggregation, i.e., the aggregation pipeline, the map-reduce function, and
single-purpose aggregation methods, as in the sketch below.
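A short aggregation-pipeline sketch with pymongo, reusing the hypothetical students
collection from the earlier example:

# Average marks per city, computed with the aggregation pipeline.
pipeline = [
    {"$match": {"marks": {"$gte": 40}}},          # filter stage
    {"$group": {"_id": "$address.city",           # group stage (like GROUP BY)
                "avg_marks": {"$avg": "$marks"},
                "count": {"$sum": 1}}},
    {"$sort": {"avg_marks": -1}},                 # sort stage
]
for row in students.aggregate(pipeline):
    print(row)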
High Performance: MongoDB offers high performance and data persistence compared to
other databases, thanks to features such as scalability, indexing, and replication.
The notes now turn to Cassandra, whose key features include the following.
Supports replication & multi-data-center replication: The replication factor comes with
first-class configuration in Cassandra, which is designed as a distributed system for the
deployment of large numbers of nodes across multiple data centers, among other key features.
Scalability: Cassandra is designed so that read/write throughput increases gradually as new
machines are added, without interrupting other applications.
Fault tolerance: Data is automatically stored and replicated for fault tolerance. If a node
fails, it is replaced in no time.
MapReduce support: Cassandra supports Hadoop integration with MapReduce; Apache Hive
and Apache Pig are also supported.
Query language: Cassandra introduced CQL (Cassandra Query Language), a simple
interface for accessing Cassandra; a short sketch follows.
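A minimal CQL sketch using the DataStax Python driver (assumptions: the cassandra-driver
package is installed and a single node is reachable on localhost; the keyspace and table
names are illustrative):

from cassandra.cluster import Cluster

# Connect to a single-node cluster on localhost (assumed to be running).
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# A real deployment would use a replication factor of 3 across nodes;
# a single local node can only support a factor of 1.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.users (
        user_id int PRIMARY KEY,
        name text
    )
""")

# CQL looks much like SQL; %s placeholders are bound by the driver.
session.execute("INSERT INTO demo.users (user_id, name) VALUES (%s, %s)", (1, "Asha"))
for row in session.execute("SELECT user_id, name FROM demo.users"):
    print(row.user_id, row.name)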
6.2 WORKING OF CASSANDRA:
Cassandra is an open-source NoSQL distributed database managed by the Apache
non-profit organization, emphasizing high availability and reliability through a distributed
architecture. These notes highlight Cassandra's features and identify key applications in
enterprise use cases. Given the modern abundance of data, effective storage systems such
as databases are essential for processing and referencing information. Database management
systems (DBMS), crucial for managing databases, interact with databases and other software
for dataset analysis.
Cassandra, as an open-source NoSQL distributed database, excels in managing large data
volumes across commodity servers, providing high availability without a single point of failure.
It is designed for decentralized, scalable storage and operates in multiple cloud data centers.
Understanding Cassandra involves exploring its architecture components, partitioning system,
and replicability.
1. Architecture of Cassandra
Cassandra's primary architecture consists of a peer-to-peer system with a cluster of equal
nodes, resembling DynamoDB and Google Bigtable. Each node stores specific data, and related
nodes form a data center, while a cluster comprises multiple data centers. Cassandra's
architecture allows easy expansion by adding more nodes, doubling data capacity, and
dynamically scaling both ways.
This scalability contrasts with the complexity of increasing data capacity in traditional
SQL databases. Additionally, Cassandra's architecture enhances data security and safeguards
against data loss.
3. Cassandra’s replicability
Cassandra ensures data reliability through replication across nodes, with secondary nodes
known as replica nodes. The number of replica nodes is determined by the replication factor
(RF), where a factor of 3, for example, means three nodes store the same data.
This redundancy contributes to Cassandra's reliability, as even if one node fails
temporarily or permanently, other nodes retain the same data, minimizing the risk of data loss.
When a temporarily disrupted node recovers, it receives updates on missed data actions and
catches up to resume normal functioning.
NoSQL databases scale well, while standard SQL databases offer strong consistency.
NewSQL attempts to deliver both and find a middle ground; as a result, this database type
addresses problems in big data fields.
5.3.1 VoltDB
VoltDB works well with high-speed transactional applications. The database
performs in-memory processing on a distributed architecture. The software is available as
both open source and proprietary.
Key features:
● Real-time decision-making.
● Support for Kafka import and export.
● Disaster recovery through database replication.
● Hadoop and OLAP export integration.
5.3.2 CockroachDB
CockroachDB is a scalable and robust database. The database offers strong data
consistency and works well with low-latency resources.
Key features:
● Robust disaster recovery system.
● Historical data view, record, and storage options.
● Built-in cleaning processes for disks and storage devices.
● Continues operating under unfavorable conditions.
5.3.3 NuoDB
NuoDB is a geo-distributed database with flexible scaling for various geographic
locations. The database maps data across various points while staying ACID compliant.
Key features:
● High-quality data transformations.
● Always available with online schema evolutions and rolling upgrades.
● Tailored features for data storage and control.
● Full ACID transaction support.
5.3.4 ClustrixDB
ClustrixDB is a self-managing NewSQL database. The software automates
scaling operations and supports high availability.
Key features:
● Efficient data categorization.
● SQL code migration options.
● Built-in health metrics in a browser interface.
● DevOps assistance and query caching.
5.3.5 Altibase
Altibase is an in-memory database with a hybrid architecture. The database
reduces hardware and software costs by combining in-memory data processing with an
on-disk DBMS with a single license. Altibase comes in both community and proprietary
versions.
Key Features
● Memory-optimized engine for increased speeds.
● Custom persistence and performance balance levels.
● Flexible deployment options.
● Real-time access to vital data.
Redis is a NoSQL database that follows the principle of a key-value store. A key-value
store provides the ability to store some data, called a value, under a key; you can
retrieve that data later only if you know the exact key used to store it.
Redis is a flexible, open-source (BSD-licensed), in-memory data structure store used as
a database, cache, and message broker. As a NoSQL database, Redis lets users store
huge amounts of data without the limits of a relational database.
Redis supports various types of data structures like strings, hashes, lists, sets,
sorted sets, bitmaps, hyperloglogs and geospatial indexes with radius queries.
Speed: Redis stores the whole dataset in primary memory, which is why it is extremely
fast: it can serve up to 110,000 SETs per second and 81,000 GETs per second on an
entry-level Linux box. Redis supports pipelining of commands and lets you use multiple
values in a single command to speed up communication with the client libraries, as in
the sketch below.
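A small redis-py sketch of both ideas: a multi-value command (MSET) and pipelining
several commands into one round trip (assuming a local Redis server on the default port):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# One command, multiple values: MSET sets several keys in a single call.
r.mset({"k1": "v1", "k2": "v2", "k3": "v3"})

# Pipelining: queue commands client-side, then send them in one round trip.
pipe = r.pipeline()
for i in range(5):
    pipe.set(f"user:{i}", f"name-{i}")
pipe.get("user:3")
results = pipe.execute()   # one reply per queued command
print(results)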
Persistence: While all the data lives in memory, changes are asynchronously saved to
disk using flexible policies based on elapsed time and/or the number of updates since
the last save. Redis also supports an append-only file (AOF) persistence mode.
Data Structures: Redis supports various types of data structures such as strings,
hashes, sets, lists, sorted sets with range queries, bitmaps, hyperloglogs and
geospatial indexes with radius queries.
Atomic Operations: Redis operations on the different data types are atomic, so it is
safe to set or increment a key, add and remove elements from a set, increase a counter,
and so on, as shown below.
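For instance, a counter sketched with redis-py (reusing the connection r from the
previous sketch) relies on INCR being atomic on the server, so concurrent clients never
lose updates:

# INCR is executed atomically by the server.
r.set("page:views", 0)
r.incr("page:views")        # 1
r.incr("page:views", 10)    # 11
print(r.get("page:views"))  # "11"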
Supported Languages: Redis supports a lot of languages such as ActionScript,
C, C++, C#, Clojure, Common Lisp, D, Dart, Erlang, Go, Haskell, Haxe, Io, Java,
JavaScript (Node.js), Julia, Lua, Objective-C, Perl, PHP, Pure Data, Python, R,
Racket, Ruby, Rust, Scala, Smalltalk and Tcl.
Master/Slave Replication: Redis supports a very simple and fast master/slave
replication. It takes only one line in the configuration file to set up, and about
21 seconds for a slave to complete the initial sync of a 10 MM key set on an
Amazon EC2 instance.
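That one line, placed in the replica's configuration file, looks like the following
(the master address is a placeholder; in Redis 5 and later the directive slaveof was
renamed replicaof):

replicaof 192.168.1.100 6379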
Sharding: Redis supports sharding. It is very easy to distribute the dataset across
multiple Redis instances, as with other key-value stores.
Portable: Redis is written in ANSI C and works in most POSIX systems like
Linux, BSD, Mac OS X, Solaris, and so on. Redis is reported to compile and work
under WIN32 if compiled with Cygwin, but there is no official support for
Windows currently.
5.4.4 Redis Implementation
Implementing Redis involves setting up and configuring a Redis server,
connecting to it, and using the Redis commands to perform various operations.
Redis is an in-memory data structure store that can be used as a database, cache,
and message broker.
Here is a basic guide to implementing Redis:
1. Install Redis:
You can download and install Redis from the official Redis website.
Follow the installation instructions for your specific operating system.
2. Start the Redis Server:
Once installed, you can start the Redis server by running the following command:
redis-server
This will start the Redis server on the default port (6379).
3. Connect to Redis from a Client:
You can interact with Redis using various programming languages and libraries.
Some popular libraries include redis-py for Python, ioredis for Node.js, and jedis
for Java.
Below is an example using Python's redis-py:
import redis
# Connect to the Redis server
r = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)
# Set a key-value pair
r.set('example_key', 'example_value')
# Retrieve the value
value = r.get('example_key')
print(value)
4. Basic Redis Operations:
Redis supports various data types and operations. Here are some basic examples:
# Strings
r.set('name', 'John')
print(r.get('name'))
# Lists
r.rpush('mylist', 'item1')
r.rpush('mylist', 'item2')
print(r.lrange('mylist', 0, -1))
# Sets
r.sadd('myset', 'item1')
r.sadd('myset', 'item2')
print(r.smembers('myset'))
# Hashes
r.hset('myhash', 'field1', 'value1')
r.hset('myhash', 'field2', 'value2')
print(r.hgetall('myhash'))
5. Closing the Connection:
It's a good practice to close the connection to Redis when you're done:
r.close()
5.4.3.1 Caching
● Redis is a great choice for implementing a highly available in-memory cache to
decrease data access latency, increase throughput, and ease the load off your
relational or NoSQL database and application.
● Redis can serve frequently requested items at sub-millisecond response times, and
enables you to easily scale for higher loads without growing the costlier backend.
● Database query results caching, persistent session caching, web page caching, and
caching of frequently used objects such as images, files, and metadata are all
popular examples of caching with Redis; a cache-aside sketch follows.
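A minimal cache-aside sketch in redis-py (the load_user_from_db function, key format,
and 300-second TTL are illustrative assumptions):

import json
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def load_user_from_db(user_id):
    # Placeholder for a slow relational or NoSQL lookup.
    return {"id": user_id, "name": "Asha"}

def get_user(user_id):
    key = f"user:{user_id}"
    cached = r.get(key)                    # 1. try the cache first
    if cached is not None:
        return json.loads(cached)          # cache hit: sub-millisecond path
    user = load_user_from_db(user_id)      # 2. cache miss: go to the database
    r.setex(key, 300, json.dumps(user))    # 3. store with a 300 s TTL
    return user

print(get_user(42))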
5.4.3.2 Chat, messaging, and queues
● Redis supports Pub/Sub with pattern matching and a variety of data structures such as
lists, sorted sets, and hashes.
● This allows Redis to support high performance chat rooms, real-time comment
streams, social media feeds and server intercommunication.
● The Redis List data structure makes it easy to implement a lightweight queue.
● Lists offer atomic operations as well as blocking capabilities, making them suitable
for a variety of applications that require a reliable message broker or a circular
list; a queue sketch follows.
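A lightweight queue sketched with a Redis list: a producer RPUSHes jobs onto the tail,
and a worker blocks on BLPOP until one is available (queue and job names are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Producer: atomically append jobs to the tail of the list.
r.rpush("jobs", "send-email:42", "resize-image:7")

# Worker: block until a job arrives, popping from the head.
# Returns a (queue_name, item) tuple, or None after the 5 s timeout.
job = r.blpop("jobs", timeout=5)
if job:
    queue, item = job
    print("processing", item)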
5.4.3.3 Session store
● Redis, as an in-memory data store with high availability and persistence, is a popular
choice among application developers for storing and managing session data for
internet-scale applications.
● Redis provides the sub-millisecond latency, scale, and resiliency required to manage
session data such as user profiles, credentials, session state, and user-specific
personalization.
5.4.3.4 Media streaming
● Redis offers a fast, in-memory data store to power live-streaming use cases.
● Redis can be used to store metadata about users' profiles and viewing histories,
authentication information/tokens for millions of users, and manifest files to enable CDNs
to stream videos to millions of mobile and desktop users at a time.
CAP THEOREM
Consistency
Consistency means that all clients see the same data at the same time, no matter which node they
connect to. For this to happen, whenever data is written to one node, it must be instantly
forwarded or replicated to all the other nodes in the system before the write is deemed
‘successful.’
Availability
Availability means that any client making a request for data gets a response, even if one or more
nodes are down. Another way to state this—all working nodes in the database management
system return a valid response for any request, without exception.
Partition tolerance
Partition tolerance means that the cluster must continue to work despite communication
breakdowns (partitions) between nodes in the system.
It's important to note that these database systems may have different configurations and
settings that can change their behavior with respect to consistency, availability, and
partition tolerance. Therefore, the exact behavior of a database system may depend on its
configuration and usage.
BASE PROPERTIES
The BASE properties of a database management system are a set of principles that guide the
design and operation of modern databases.
The acronym BASE stands for Basically Available, Soft State, and Eventual Consistency.
Basically Available
This property refers to the fact that the database system should always be available to respond to
user requests, even if it cannot guarantee immediate access to all data. The database may
experience brief periods of unavailability, but it should be designed to minimize downtime and
provide quick recovery from failures.
Soft State
This property refers to the fact that the state of the database can change over time, even without
any explicit user intervention. This can happen due to the effects of background processes,
updates to data, and other factors. The database should be designed to handle this change
gracefully, and ensure that it does not lead to data corruption or loss.
Eventual Consistency
This property refers to the eventual consistency of data in the database, despite changes over
time. In other words, the database should eventually converge to a consistent state, even if it
takes some time for all updates to propagate and be reflected in the data. This is in contrast to the
immediate consistency required by traditional ACID-compliant databases.
Uses of BASE Databases
BASE databases are used in modern, highly-available, and scalable systems that handle large
amounts of data. Examples of such systems include online shopping websites, social media
platforms, and cloud-based services.
Difference between BASE Properties and ACID Properties

| ACID | BASE |
|---|---|
| ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee the integrity and consistency of data in a traditional database. | The BASE properties are a more relaxed version of ACID that trade off some consistency guarantees for greater scalability and availability. |
| ACID requires immediate consistency. | BASE only requires eventual consistency. |
| ACID is better suited to traditional transactional databases. | BASE is more suitable for large-scale, highly available systems. |
HeidiSQL
HeidiSQL is a popular free, open-source database management tool that enables you to
browse and edit data and to create and edit tables, views, procedures, triggers, and
scheduled events on servers running MariaDB, MySQL, Microsoft SQL Server, PostgreSQL, or
SQLite. It provides a graphical interface for users to interact with their databases.
Here are some key properties and features of HeidiSQL:
1. Cross-Platform Compatibility:
● HeidiSQL runs on Windows and can also be used on Linux and macOS (via Wine).
2. Multi-Database Support:
● It supports various database systems, including MySQL, MariaDB, Microsoft
SQL Server, and PostgreSQL.
3. Intuitive User Interface:
● HeidiSQL offers a user-friendly and intuitive interface, making it easy for both
beginners and experienced database administrators to manage and manipulate
databases.
4. Query and Script Execution:
● Users can execute SQL queries and scripts directly within the application, with
syntax highlighting and code completion features.
5. Database Connection Management:
● HeidiSQL allows users to manage multiple database connections simultaneously,
providing easy navigation between different databases and servers.
6. Table and Data Management:
● Users can create, modify, and delete database tables and manage table data using
the graphical interface.
7. Import and Export Data:
● The tool supports importing and exporting data in various formats, making it
convenient for transferring data between different databases or applications.
8. SSH Tunneling:
● HeidiSQL supports secure connections through SSH tunneling, enhancing the
security of database connections.
9. Transaction Management:
● Users can work with database transactions, ensuring data consistency and
integrity.
10. Stored Procedure and Function Support:
● HeidiSQL allows users to create, modify, and execute stored procedures and
functions within the application.
11. Visual Query Builder:
● It provides a visual query builder, helping users to construct complex SQL queries
without writing the code manually.
12. Database Design Tools:
● HeidiSQL includes tools for designing and managing database structures, such as
creating and altering tables, indexes, and relationships.
13. Customization Options:
● Users can customize the appearance and behavior of HeidiSQL according to their
preferences, including themes and color schemes.
14. Open Source and Free:
● HeidiSQL is an open-source software, and users can use it for free without any
licensing fees.
HeidiSQL use cases:
1.Database Connection and Management:
● Connect to different database servers using various protocols (TCP/IP, SSH
tunneling).
● Manage multiple database connections simultaneously.
● Browse databases, tables, and views.
2.Querying and Editing Data:
● Execute SQL queries and view results.
● Edit and modify table data directly within the application.
● Import and export data in various formats.
3.Database Structure Modification:
● Create and modify database tables, indexes, and views.
● Manage triggers, stored procedures, and functions.
4.User and Privilege Management:
● Manage user accounts and their privileges.
● Grant and revoke permissions for database objects.
5.Data Backup and Restore:
● Perform database backups and restores.
● Schedule automatic backups.
6.SSH Tunneling:
● Connect to databases securely using SSH tunneling.
7.Version Control Integration:
● HeidiSQL supports version control systems like Git and integrates with them.
8.Export and Import:
● Export database structure and data to SQL files.
● Import SQL files to create or update database structures.
9.Stored Procedure and Function Editing:
● Create, edit, and execute stored procedures and functions.
10.Customization and Themes:
● Customize the look and feel of the interface.
● Choose from different themes to personalize your experience.
22CD405 - DATABASE MANAGEMENT SYSTEMS
Unit 5 & LP4 - IN-MEMORY DATABASES AND CACHING, DATABASE SECURITY AND ENCRYPTION,
DATABASE PERFORMANCE TUNING
1.WHAT IS AN IN-MEMORY DATABASE?
An in-memory database is data storage software that holds all of its data in the memory of
the host. The main difference between a traditional database and an in-memory database
lies in where the data is stored. Even when compared with solid-state drives (SSDs),
random access memory (RAM) is orders of magnitude faster than disk access. Because an
in-memory database uses RAM for storage, access to the data is much faster than with a
traditional database using disk operations.
In-memory databases provide quick access to their content; on the downside, they are at high
risk of losing data in case of a server failure, since the data is not persisted anywhere. If a
server failure or shutdown should occur, everything currently in the memory of that computer
would be lost due to the volatile nature of RAM. It is also worth noting that the cost of
memory is much higher than the cost of hard disks. This is why there is typically much more
hard disk space than memory on modern computers. This factor makes in-memory databases
much more expensive. They are also more at risk of running out of space to store data.
Your decision to use an in-memory database would depend on your use case. In-memory
databases are great for high-volume data access where a data loss would be acceptable. Think
of a large e-commerce website. The information about the products is crucial and should be
kept on a persisted storage, but the information in the shopping cart could potentially be kept
in an in-memory database for quicker access.
Using RAM as a storage medium comes with a price. If a server failure occurs, all data will
be lost. As a way to prevent this, replica sets can be created in modern databases such
as MongoDB with a mix of in-memory engines and traditional on-disk storage. This replica
set ensures that some of the members of the cluster are persisting data.
(Figure: a replica set with both in-memory and traditional storage.)
In this scenario, possible with MongoDB Enterprise Advanced, the primary node of the
replica set uses an in-memory storage engine. It has two other nodes, one of which uses an
in-memory storage, the other one using the WiredTiger engine. The secondary node using the
disk storage is configured as a hidden member.
In case of a failure, the secondary in-memory server would become the primary and still
provide quick access to the data. Once the failing server comes back up, it would sync with
the server using the WiredTiger engine, and no data would be lost.
Similar setups can be done with sharded clusters when using MongoDB Enterprise
Advanced.
It might seem like a great idea to use exclusively in-memory databases to benefit from the
speed gains, but there are some drawbacks. First, the cost of RAM is about 80 times higher
than the price of traditional disk space. This would significantly increase the operating costs
of your infrastructure, whether the servers are hosted in the cloud or on-premises.
Secondly, the lack of data persistence in case of a failure can be an issue in some cases. While
there are ways to mitigate the risks associated with these data losses, those risks might not be
acceptable for your business case.
Finally, because servers typically have much less RAM than disk space, it is not uncommon
to run out of space. In this case, write operations would fail, losing any new data stored.
Using an on-disk database with an NVMe SSD can prove to be a solid alternative to
in-memory databases. These disks offer a data bandwidth similar to RAM, although the
latency is slightly higher.
Any application where the need for speed is more important than the need for durability
could benefit from an in-memory database over a traditional database.
In many cases, the in-memory database can be used only by a small portion of the total
application, while the more critical data is stored in an on-disk database such as MongoDB
Atlas.
● IoT data: IoT sensors can provide large amounts of data. An in-memory database
could be used for storing and computing data to later be stored in a traditional
database.
● E-commerce: Some parts of e-commerce applications, such as the shopping cart, can
be stored in an in-memory database for faster retrieval on each page view, while the
product catalog could be stored in a traditional database.
● Gaming: Leaderboards require quick updates and fast reads when millions of players
are accessing a game at the same time. In-memory databases can help to sort the
results more quickly than traditional databases.
● Session management: In stateful web applications, a session is created to keep track of
a user identity and recent actions. Storing this information in an in-memory database
avoids a round trip to the central database with each web request.
Since cache data is kept separate from the application, the in-memory cache requires data
movement from the cache system to the application and back to the cache for processing. This
often occurs across networks.
In-memory databases and caches play a crucial role in improving the performance of applications
by storing and retrieving data in the system's main memory rather than relying on slower
disk-based storage. Several tools and technologies support the implementation and management
of in-memory databases and caches. Here are some commonly used tools:
Redis:
Description: Redis is an open-source, in-memory data structure store. It supports various data
structures such as strings, hashes, lists, sets, and more.
Use Cases: Caching, real-time analytics, messaging, leaderboards, and other scenarios where
low-latency data access is critical.
Key Features: Persistence options, built-in replication, support for transactions, and a variety
of data types.
Memcached:
Description: Memcached is a free, open-source, high-performance, distributed in-memory
object caching system.
Use Cases: Caching frequently accessed data, session storage, and database query result
caching.
Key Features: Distributed architecture, simple API, and support for multiple programming
languages.
Hazelcast:
Description: Hazelcast is an open-source in-memory data grid and computing platform.
Key Features: Distributed data structures (maps, queues, etc.), distributed computing
capabilities, and support for multiple programming languages.
Apache Ignite:
Description: Apache Ignite is an open-source distributed database and in-memory
computing platform.
Key Features: Distributed in-memory storage, SQL queries, distributed computing, and
integration with various data sources.
Azure Redis Cache:
Description: Azure Redis Cache is a fully managed, in-memory caching service provided by
Microsoft Azure. It is built on open-source Redis.
Use Cases: Caching, session storage, and improving the performance of Azure applications.
Key Features: Fully managed service, support for Redis commands, and integration with
Azure services.
Oracle TimesTen:
Description: Oracle TimesTen is an in-memory relational database from Oracle.
Key Features: Full ACID compliance, in-memory storage, and seamless integration with
Oracle Database.
SAP HANA:
Description: SAP HANA is an in-memory, column-oriented, relational database management
system. It is designed for both analytical and transactional processing.
Key Features: In-memory storage, advanced analytics, and integration with SAP applications.
When choosing a tool for in-memory databases or caches, consider factors such as the
specific use case, scalability requirements, ease of integration, and the programming
languages and frameworks supported. Each tool has its strengths and may be better suited for
particular scenarios.
DATABASE SECURITY
Database security must protect, among other assets:
● The physical database server and/or the virtual database server and the underlying
hardware
A data breach can bring, among other consequences:
● Business continuity (or lack thereof): Some businesses cannot continue to operate
until a breach is resolved.
● Fines or penalties for non-compliance: The financial impact of failing to comply
with global regulations such as the Sarbanes-Oxley Act (SOX) or the Payment Card
Industry Data Security Standard (PCI DSS), industry-specific data privacy regulations
such as HIPAA, or regional data privacy regulations such as Europe's General Data
Protection Regulation (GDPR) can be devastating, with fines in the worst cases
exceeding several million dollars per violation.
Common threats and challenges include the following.
Insider threats
● A negligent insider who makes errors that leave the database vulnerable to attack.
Accidents, weak passwords, password sharing, and other unwise or uninformed user
behaviors continue to be the cause of nearly half (49%) of all reported data breaches.
Exploitation of database software vulnerabilities
Hackers make their living by finding and targeting vulnerabilities in all kinds of software,
including database management software. All major commercial database software vendors
and open-source database management platforms issue regular security patches to address
these vulnerabilities, but failure to apply those patches in a timely fashion can increase
your exposure.
SQL/NoSQL injection attacks
A database-specific threat, these involve the insertion of arbitrary SQL or non-SQL attack
strings into database queries served by web applications or HTTP headers. Organizations
that don't follow secure web application coding practices and perform regular vulnerability
testing are open to these attacks; the sketch below illustrates the coding practice in
question.
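As a small illustration of the secure-coding point, a parameterized query (here with
Python's built-in sqlite3 module, purely for demonstration) keeps attacker-controlled
input out of the SQL text:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('asha', 'admin')")

user_input = "asha' OR '1'='1"   # a classic injection payload

# Vulnerable pattern: string concatenation splices the payload into the SQL.
# query = "SELECT * FROM users WHERE name = '" + user_input + "'"

# Safe pattern: a bound parameter is treated as data, never as SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (user_input,)).fetchall()
print(rows)   # [] -- the payload matches no real user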
Buffer overflow attacks
A buffer overflow occurs when a process attempts to write more data to a fixed-length
block of memory than it is allowed to hold. Attackers may use the excess data, stored in
adjacent memory addresses, as a foundation from which to launch attacks.
Malware
Malware is software written specifically to exploit vulnerabilities or otherwise cause
damage to the database; it may arrive via any endpoint device connecting to the
database's network.
Attacks on backups
Organizations that fail to protect backup data with the same stringent controls used to protect
the database itself can be vulnerable to attacks on backups.
These threats are exacerbated by the following:
● Growing data volumes: Data capture, storage, and processing continues to grow
exponentially across nearly all organizations. Any data security tools or practices need
to be highly scalable to meet near and distant future needs.
Denial of service (DoS/DDoS) attacks
In a denial of service (DoS) attack, the attacker deluges the target server (in this case
the database server) with so many requests that the server can no longer fulfill
legitimate requests from actual users, and, in many cases, the server becomes unstable
or crashes.
In a distributed denial of service (DDoS) attack, the deluge comes from multiple servers,
making it more difficult to stop the attack.
Best practices
Because databases are nearly always network-accessible, any security threat to any
component within or portion of the network infrastructure is also a threat to the database, and
any attack impacting a user’s device or workstation can threaten the database. Thus, database
security must extend far beyond the confines of the database alone.
When evaluating database security in your environment to decide on your team’s top
priorities, consider each of the following areas:
● Database software security: Always use the latest version of your database
management software, and apply all patches as soon as they are issued.
● Application/web server security: Any application or web server that interacts with
the database can be a channel for attack and should be subject to ongoing security
testing and best practice management.
● Backup security: All backups, copies, or images of the database must be subject to
the same (or equally stringent) security controls as the database itself.
● Auditing: Record all logins to the database server and operating system, and log all
operations performed on sensitive data as well. Database security standard audits
should be performed regularly.
● Detective controls: Monitor database activity with database activity monitoring and
data loss prevention tools. These solutions make it possible to identify and alert on
anomalous or suspicious activities.
Database security policies should be integrated with and support your overall business goals,
such as protection of critical intellectual property and your cybersecurity policies and cloud
security policies. Ensure you have designated responsibility for maintaining and auditing
security controls within your organization and that your policies complement those of your
cloud provider in shared responsibility agreements. Security controls, security awareness
training and education programs, and penetration testing and vulnerability assessment
strategies should all be established in support of your formal security policies.
Data protection tools and platforms
Today, a wide array of vendors offer data protection tools and platforms. A full-scale solution
should include all of the following capabilities:
● Discovery: Look for a tool that can scan for and classify vulnerabilities across all
your databases—whether they’re hosted in the cloud or on-premise—and offer
recommendations for remediating any vulnerabilities identified. Discovery
capabilities are often required to conform to regulatory compliance mandates.
● Data activity monitoring: The solution should be able to monitor and audit all data
activities across all databases, regardless of whether your deployment is on-premise,
in the cloud, or in a container. It should alert you to suspicious activities in real-time
so that you can respond to threats more quickly. You’ll also want a solution that can
enforce rules, policies, and separation of duties and that offers visibility into the status
of your data through a comprehensive and unified user interface. Make sure that any
solution you choose can generate the reports you’ll need to meet compliance
requirements.
● Data security optimization and risk analysis: A tool that can generate contextual
insights by combining data security information with advanced analytics will enable
you to accomplish optimization, risk analysis, and reporting with ease. Choose a
solution that can retain and synthesize large quantities of historical and recent data
about the status and security of your databases, and look for one that offers data
exploration, auditing, and reporting capabilities through a comprehensive but
user-friendly self-service dashboard.
DATABASE PERFORMANCE TUNING
Benefits of database performance tuning include:
● Increased query efficiency: Faster data retrieval allows users to run complex
queries without experiencing delays
● Reduced resource consumption: Optimized databases require fewer computing
resources, leading to cost savings
● Improved scalability: Tuned databases can handle more concurrent users and
larger data volumes
● Better user experience: Prompt data access enhances user satisfaction and
productivity
● Higher return on investment: Efficient databases make the best use of hardware
and software investments
1. Tune queries
There are plenty of ways a query can be tuned, but one of the most vital is to take
your time with the process. Start with basic adjustments and then work your way up:
adjust indexes, or place a simple skip list around the query.
You can also run a database command to look for high-selectivity queries, then change
those queries by adding indexes and revising the query plan. These simple changes can
have a massive effect on performance.
2. Keep statistics up to date
● Statistics are used to generate efficient execution plans; this is central to
performance tuning.
● There are plenty of great performance-tuning tools, but if you're using outdated
statistics, the resulting plans cannot be optimized for present conditions.
3. Avoid leading wildcards
● Wildcard parameters can force a full table scan even when the fields involved are
indexed.
● If the database engine scans every row in a table to look for a specific entry, the
delivery speed can decrease significantly. Parallel queries may also suffer from this
scanning of the whole dataset in memory.
● This can push CPU utilization to its maximum and prevent other queries from running.
4. Use Constraints
● Constraints are one of the most effective ways to speed up queries, since they allow
the SQL optimizer to develop a better execution plan. The enhanced performance,
however, comes at a cost, since constraints require more memory.
● Depending on your objectives, the enhanced query speed may be well worth it, but
being aware of the price is vital.
5. Increase memory
● In data from Statista, 32% of B2B businesses expect to boost their spending on
database management.
● One way DBAs might handle SQL performance-tuning issues is to increase the memory
allocation of the current databases. When SQL databases have plenty of memory,
efficiency improves and performance often follows.
6. Overhaul CPU
● If you’re regularly experiencing database and operation strain, you might want to
invest in a more robust CPU.
● Businesses that often use obsolete hardware and other systems experience database
performance issues that can affect their ROI.
7. Improve indexes
● Aside from queries, another essential element of a database is the index. Done right,
indexing enhances database performance and shortens query execution time.
● Indexing builds a data structure that keeps your data organized and makes information
easier to find. Because data is easier to locate, indexing boosts the efficiency of
data retrieval and speeds up the whole process, saving you and the system a lot of
time and effort. The sketch below shows the effect.
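A tiny sketch of the effect using Python's built-in sqlite3 module (table and column
names are illustrative): EXPLAIN QUERY PLAN shows a full-table SCAN turning into an
index SEARCH once the index exists.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, f"cust{i % 100}", i * 1.5) for i in range(1000)])

def plan(sql):
    # Ask SQLite how it intends to execute the query.
    return conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()

q = "SELECT total FROM orders WHERE customer = 'cust7'"
print(plan(q))   # reports a SCAN of orders: full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
print(plan(q))   # reports a SEARCH using idx_orders_customer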
8. Defragment data
● One of the best approaches to improving database performance is data
defragmentation. Over time, so much data is constantly being written and deleted in
the database.
● As a result, data can become defragmented. It slows down the data retrieval process
since it affects the query’s ability to find the information it’s searching for.
● When you defragment data, you allow that relevant data to be grouped together, and
then you erase index page issues. As a result, I/O-related operations will run so much
quicker.
9. Review the actual execution plan, not just the estimated plan
● The estimated execution plan can be helpful when you're writing queries, since it
previews how the plan will run. However, it can be blind to parameter data types,
which may make it inaccurate.
● To get the best results in performance tuning, review the actual execution plan
first, because the actual plan uses accurate statistics.
● Making too many changes at once can be detrimental in the long run. A more efficient
approach is to start with the most expensive operations and work from there.
● Once you know that your database hardware works well, the next thing that you need
to do is to review your database access. It includes which applications are accessing
your database.
● Suppose one of the services or applications suffers from poor database performance.
In that case, you mustn't immediately jump to conclusions about which service or
application is responsible for it.
● It’s also possible that a single client might be experiencing poor performance.
However, there’s also a possibility that the entire database is having issues. Thus, you
need to dig into who has access to the database and whether or not it’s just a single
service having a problem.
● That's where the right database management support comes in: you can drill down
into its metrics to find the root cause.