Notes NoSQL Module 2 Lesson 5


Chapter 5. Consistency

Relational databases aim to exhibit strong consistency by avoiding the various kinds of inconsistency discussed in this chapter.

5.1. Update Consistency

We’ll begin by considering updating a telephone number. Coincidentally, Martin and Pramod are looking
at the company website and notice that the phone number is out of date. They both have update
access, so they both go in at the same time to update the number. We’ll assume they update it using a
slightly different format. This issue is called a write-write conflict: two people updating the same data
item at the same time.
When the writes reach the server, the server will serialize them—decide to apply one, then the other. Let’s assume it uses alphabetical order and picks Martin’s update first, then Pramod’s. Without any concurrency control, Martin’s update would be applied and immediately overwritten by Pramod’s. In this case Martin’s is a lost update. We see this as a failure of consistency because Pramod’s update was based on the state before Martin’s update, yet was applied after it.
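To make the lost update concrete, here is a toy illustration in Python, using a plain dict as a stand-in for the data store (the phone number formats are made up):

```python
store = {"phone": "510-555-0123"}  # hypothetical shared record

# Both Martin and Pramod read the stale value before either writes.
martin_sees = store["phone"]
pramod_sees = store["phone"]

store["phone"] = "(510) 555-0123"  # Martin's update is applied first...
store["phone"] = "510.555.0123"    # ...then Pramod's overwrites it unseen

print(store["phone"])  # Martin's update is lost
```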

Approaches for maintaining consistency in the face of concurrency: pessimistic or optimistic


A pessimistic approach works by preventing conflicts from occurring; an optimistic approach lets conflicts occur, but detects them and takes action to sort them out.
For update conflicts, the most common pessimistic approach is to have write locks, so that in order to change a value you need to acquire a lock, and the system ensures that only one client can get a lock at a time.
E.g., Martin and Pramod would both attempt to acquire the write lock, but only Martin (the first one) would succeed. Pramod would then see the result of Martin’s writing before deciding whether to make his own update.
A common optimistic approach is a conditional update, where any client that does an update tests the value just before updating it to see if it has changed since his last read. In this case, Martin’s update would succeed but Pramod’s would fail. The error would let Pramod know that he should look at the value again and decide whether to attempt a further update. (Both approaches are sketched in the code below.)
Both approaches work in a single-server environment. In a multiple-server environment, such as peer-to-peer replication, two nodes might apply the updates in a different order, resulting in a different value for the telephone number on each peer.
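Here is a minimal sketch of both approaches in Python. It uses a hypothetical in-memory VersionedStore rather than any real database’s API: the pessimistic path serializes writers with a lock, and the optimistic path is a conditional update (a compare-and-set keyed on a version number).

```python
import threading

# A minimal sketch of both approaches against a hypothetical in-memory
# store (not any real database's API). The version counter is what the
# optimistic, conditional update checks against.

class VersionedStore:
    def __init__(self, value):
        self.value = value
        self.version = 0                  # bumped on every successful write
        self._lock = threading.Lock()     # the server's internal serializer

    # Pessimistic: a client must hold the write lock for the whole update,
    # so only one writer can proceed at a time.
    def update_with_lock(self, new_value):
        with self._lock:
            self.value = new_value
            self.version += 1

    # Optimistic: a conditional update (compare-and-set). It succeeds only
    # if nobody has written since the version the client read.
    def conditional_update(self, expected_version, new_value):
        with self._lock:                  # the check-and-write is still atomic
            if self.version != expected_version:
                return False              # conflict detected; caller re-reads
            self.value = new_value
            self.version += 1
            return True

store = VersionedStore("510-555-0123")
seen = store.version                      # Martin and Pramod both read v0
print(store.conditional_update(seen, "(510) 555-0123"))  # Martin: True
print(store.conditional_update(seen, "510.555.0123"))    # Pramod: False
```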

Another approach often used for concurrency in distributed systems is sequential consistency: ensuring that all nodes apply operations in the same order.

This approach is familiar to many programmers from version control systems, particularly distributed version control systems that by their nature will often have conflicting commits.
The next step again follows from version control: You have to merge the two updates somehow.
Users may update the merged information, or the computer may be able to perform the merge itself; if it was a phone formatting issue, it may be able to realize that and apply the new number with the standard format.
Any automated merge of write-write conflicts is highly domain-specific and needs to be programmed for each particular case.

Often, when people first encounter these issues, their reaction is to prefer pessimistic concurrency because they are determined to avoid conflicts. While in some cases this is the right answer, there is always a tradeoff.
Concurrent programming involves a fundamental tradeoff between safety (avoiding errors such as update conflicts) and liveness (responding quickly to clients).
Disadvantages of the pessimistic approach
Pessimistic approaches often severely degrade the responsiveness of a system to the degree that it becomes unfit for its purpose. This problem is made worse by the danger of errors—pessimistic concurrency often leads to deadlocks, which are hard to prevent and debug.

Replication makes it much more likely to run into write-write conflicts. If different nodes have different
copies of some data which can be independently updated, then you’ll get conflicts unless you take
specific measures to avoid them. Using a single node as the target for all writes for some data makes it
much easier to maintain update consistency. Of the distribution models we discussed earlier, all but
peer-to-peer replication do this.

5.2. Read Consistency

Having a data store that maintains update consistency is one thing, but it doesn’t guarantee that readers of that data store will always get consistent responses to their requests.
Let’s imagine we have an order with line items and a shipping charge. The shipping charge is calculated based on the line items in the order. If we add a line item, we thus also need to recalculate and update the shipping charge. In a relational database, the shipping charge and line items will be in separate tables.
The danger of inconsistency is that Martin adds a line item to his order, Pramod then reads the line items and shipping charge, and then Martin updates the shipping charge. This is an inconsistent read or read-write conflict: Pramod has done a read in the middle of Martin’s write.

We refer to this type of consistency as logical consistency: ensuring that different data items make sense together.
To avoid a logically inconsistent read-write conflict, relational databases support the notion of transactions.
Providing Martin wraps his two writes in a transaction, the system guarantees that Pramod will either read both data items before the update or both after the update.
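As an illustration, here is a sketch using SQLite from Python; the schema and values are invented stand-ins for the order’s line items and shipping charge.

```python
import sqlite3

# A sketch of the transactional fix, using SQLite. The tables and values
# are made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE line_items (order_id INT, description TEXT, price REAL)")
conn.execute("CREATE TABLE shipping (order_id INT, charge REAL)")
conn.execute("INSERT INTO shipping VALUES (1, 5.00)")

# Martin's two writes commit atomically: a reader sees both or neither.
with conn:  # commits on success, rolls back on an exception
    conn.execute("INSERT INTO line_items VALUES (1, 'book', 20.00)")
    conn.execute("UPDATE shipping SET charge = 7.50 WHERE order_id = 1")

print(conn.execute("SELECT charge FROM shipping WHERE order_id = 1").fetchone())
```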
A common claim we hear is that NoSQL databases don’t support transactions and thus can’t be consistent. Such a claim is mostly wrong because:

Our first clarification is that any statement about lack of transactions usually only applies to some NoSQL databases, in particular the aggregate-oriented ones. In contrast, graph databases tend to support ACID transactions just the same as relational databases.
Secondly, aggregate-oriented databases do support atomic updates, but only within a single aggregate. This means that you will have logical consistency within an aggregate but not between aggregates.
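For example, in a document database such as MongoDB the order can be modeled as one aggregate, so both changes land in a single atomic update. A sketch using pymongo, assuming a server on the default local port and a hypothetical orders collection:

```python
from pymongo import MongoClient

# A sketch assuming MongoDB via pymongo; the "shop" database and "orders"
# collection are hypothetical. Each order document is one aggregate, with
# line items and shipping charge embedded together.
orders = MongoClient()["shop"]["orders"]

# One update to one document is applied atomically, so a reader never
# sees the new line item without the recalculated shipping charge.
orders.update_one(
    {"_id": 1},
    {
        "$push": {"line_items": {"description": "book", "price": 20.00}},
        "$set": {"shipping_charge": 7.50},
    },
)
```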
Of course not all data can be put in the same aggregate, so any update that affects multiple aggregates leaves open a
time when clients could perform an inconsistent read. The length of time an inconsistency is present is called the
inconsistency window.

Relaxing Consistency

Consistency is a Good Thing—but it comes with sacrifices.

It is always possible to design a system to avoid inconsistencies, but often impossible to do so without making unbearable sacrifices in other characteristics of the system.

As a result, we often have to trade off consistency for something else.

While some architects see this as a disaster, we see it as part of the inevitable tradeoffs involved in system design.

Furthermore, different domains have different tolerances for inconsistency, and we need to take this tolerance into account as we make our decisions.

Trading off consistency is a familiar concept even in single-server relational database systems. Here, our principal tool to enforce consistency is the transaction, and transactions can provide strong consistency guarantees.

However, transaction systems usually come with the ability to relax isolation levels, allowing queries to read data that hasn’t been committed yet.

In practice we see most applications relax consistency down from the highest isolation level (serializable) in order to get effective performance.

We most commonly see people using the read-committed transaction level, which eliminates some read-write conflicts but allows others.
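As a sketch of what this looks like in practice, assuming PostgreSQL via psycopg2 with made-up connection details and table:

```python
import psycopg2

# A sketch assuming PostgreSQL and psycopg2; the connection string and
# table are hypothetical. We explicitly run this session at read committed
# rather than serializable, trading some consistency for performance.
conn = psycopg2.connect("dbname=shop user=app")
conn.set_session(isolation_level="READ COMMITTED")

with conn, conn.cursor() as cur:  # the connection context manages the transaction
    cur.execute("SELECT charge FROM shipping WHERE order_id = %s", (1,))
    print(cur.fetchone())
```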

Many systems forgo transactions entirely because the performance impact of transactions is too high.

On a small scale, we saw the popularity of MySQL during the days when it didn’t support transactions. Many websites liked the high speed of MySQL and were prepared to live without transactions.
At the other end of the scale, some very large websites, such as eBay, have had to forgo transactions in order to perform acceptably—this is particularly true when you need to introduce sharding.

Even without these constraints, many application builders need to interact with remote systems that are outside a transaction boundary, so updating outside of transactions is a quite common occurrence for enterprise applications.

The CAP Theorem

In the NoSQL world it’s common to refer to the CAP theorem as the reason why you may need to relax consistency. It was originally proposed by Eric Brewer in 2000 [Brewer] and given a formal proof by Seth Gilbert and Nancy Lynch [Lynch and Gilbert] a couple of years later.

The basic statement of the CAP theorem is that, given the three properties of
Consistency, Availability, and Partition tolerance, you can only get two.

Obviously, this depends very much on how you define these three properties, and differing
opinions have led to several debates on what the real consequences of the CAP theorem are.

Consistency in database systems refers to the requirement that any given database transaction must
change affected data only in allowed ways. For a database to be consistent, data written to the database
must be valid according to all defined rules.

Consistency does not guarantee correctness of the transaction in all ways an application programmer
might expect (that is the responsibility of application-level code). Instead, consistency merely
guarantees that programming errors cannot result in the violation of any defined database constraints.

Availability has a particular meaning in the context of CAP—it means that if you can talk to a node in the cluster, it can read and write data.

Partition tolerance means that the cluster can survive communication breakages in the cluster that separate the cluster into multiple partitions unable to communicate with each other.

A single-server system is the obvious example of a CA system—a system that has Consistency and Availability but not
Partition tolerance. A single machine can’t partition, so it does not have to worry about partition tolerance.

It is theoretically possible to have a CA cluster. However, this would mean that if a partition ever occurs in the cluster, all
the nodes in the cluster would go down so that no client can talk to a node. By the usual definition of “available,” this
would mean a lack of availability, but this is where CAP’s special usage of “availability” gets confusing. CAP defines
“availability” to mean “every request received by a nonfailing node in the system must result in a response”.

An example should help illustrate this. Martin and Pramod are both trying to book the last hotel room on a system that
uses peer-to-peer distribution with two nodes (London for Martin and Mumbai for Pramod). If we want to ensure
consistency, then when Martin tries to book his room on the London node, that node must communicate with the
Mumbai node before confirming the booking. Essentially, both nodes must agree on the serialization of their requests.
This gives us consistency—but should the network link break, then neither system can book any hotel room, sacrificing
availability.

One way to improve availability is to designate one node as the master for a particular hotel and ensure all bookings are
processed by that master. Should that master be Mumbai, then Mumbai can still process hotel bookings for that hotel
and Pramod will get the last room. If we use master-slave replication, London users can see the inconsistent room
information but cannot make a booking and thus cause an update inconsistency. However, users expect that it could
happen in this situation—so, again, the compromise works for this particular use case.

Relaxing Durability

So far we’ve talked about consistency, which is most of what people mean when they talk about the ACID properties of
database transactions. The key to Consistency is serializing requests by forming Atomic, Isolated work units. But most
people would scoff at relaxing durability—after all, what is the point of a data store if it can lose updates?

As it turns out, there are cases where you may want to trade off some durability for higher performance. If a database
can run mostly in memory, apply updates to its in-memory representation, and periodically flush changes to disk, then it
may be able to provide substantially higher responsiveness to requests. The cost is that, should the server crash, any
updates since the last flush will be lost.
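A toy sketch of this tradeoff in Python (not a real database): writes hit an in-memory dict immediately, and a background timer flushes to disk every few seconds, so a crash loses at most the writes since the last flush. The file name and interval are arbitrary.

```python
import json
import threading

# A toy sketch of trading durability for responsiveness. Writes are
# memory-only until the periodic flush; a crash loses anything written
# since the last flush.
class MostlyInMemoryStore:
    def __init__(self, path="store.json", flush_interval=5.0):
        self.path = path
        self.flush_interval = flush_interval
        self.data = {}
        self._lock = threading.Lock()
        self._schedule_flush()

    def put(self, key, value):
        with self._lock:
            self.data[key] = value        # fast: memory only, not yet durable

    def _schedule_flush(self):
        timer = threading.Timer(self.flush_interval, self._flush)
        timer.daemon = True               # don't keep the process alive
        timer.start()

    def _flush(self):
        with self._lock:
            with open(self.path, "w") as f:
                json.dump(self.data, f)   # durable up to this point
        self._schedule_flush()

store = MostlyInMemoryStore()
store.put("session:u42", {"cart": ["book"]})
```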

One example of where this tradeoff may be worthwhile is storing user-session state. A big website may have many users
and keep temporary information about what each user is doing in some kind of session state. There’s a lot of activity on
this state, creating lots of demand, which affects the responsiveness of the website. The vital point is that losing the
session data isn’t too much of a tragedy—it will create some annoyance, but maybe less than a slower website would
cause. This makes it a good candidate for nondurable writes. Often, you can specify the durability needs on a call-by-call
basis, so that more important updates can force a flush to disk. Another example of relaxing durability is capturing
telemetric data from physical devices. It may be that you’d rather capture data at a faster rate, at the cost of missing the
last updates should the server go down.
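As an illustration of call-by-call durability, here is a sketch assuming MongoDB via pymongo, where the journaling flag of the write concern is chosen per collection (the database and collections are hypothetical):

```python
from pymongo import MongoClient, WriteConcern

# A sketch of per-call durability, assuming MongoDB via pymongo; the
# "site" database and its collections are made up.
db = MongoClient()["site"]

# Session-state writes: acknowledged without waiting for the on-disk
# journal (faster, but lost if the server crashes at the wrong moment).
sessions = db.get_collection("sessions",
                             write_concern=WriteConcern(w=1, j=False))
sessions.update_one({"_id": "u42"}, {"$set": {"cart": ["book"]}}, upsert=True)

# Important updates: require the write to reach the journal before the
# acknowledgment, forcing a flush to disk.
orders = db.get_collection("orders",
                           write_concern=WriteConcern(w=1, j=True))
orders.insert_one({"order_id": 1, "total": 27.50})
```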

Another class of durability tradeoffs comes up with replicated data. A failure of replication durability occurs when a
node processes an update but fails before that update is replicated to the other nodes. A simple case of this may happen
if you have a master-slave distribution model where the slaves appoint a new master automatically should the existing
master fail. If a master does fail, any writes not passed on to the replicas will effectively become lost. Should the master
come back online, those updates will conflict with updates that have happened since. We think of this as a durability
problem because you think your update has succeeded since the master acknowledged it, but a master node failure
caused it to be lost.

Quorums

When you’re trading off consistency or durability, it’s not an all-or-nothing proposition. The more nodes you involve in a request, the higher the chance of avoiding an inconsistency. This naturally leads to the question: How many nodes need to be involved to get strong consistency?

Imagine some data replicated over three nodes. You don’t need all nodes to acknowledge a write to ensure strong consistency; all you need is two of them—a majority. If you have conflicting writes, only one can get a majority. This is referred to as a write quorum and expressed in a slightly pretentious inequality of W > N/2, meaning the number of nodes participating in the write (W) must be more than half the number of nodes involved in replication (N). The number of replicas is often called the replication factor.

Similarly to the write quorum, there is the notion of read quorum: How many nodes you need to contact to be sure you
have the most up-to-date change. The read quorum is a bit more complicated because it depends on how many nodes
need to confirm a write.

Let’s consider a replication factor of 3. If all writes need two nodes to confirm (W = 2) then we need to contact at least
two nodes to be sure we’ll get the latest data. If, however, writes are only confirmed by a single node (W = 1) we need
to talk to all three nodes to be sure we have the latest updates. In this case, since we don’t have a write quorum, we
may have an update conflict, but by contacting enough readers we can be sure to detect it. Thus we can get strongly
consistent reads even if we don’t have strong consistency on our writes.
This relationship between the number of nodes you need to contact for a read (R), those confirming a write (W), and the
replication factor (N) can be captured in an inequality: You can have a strongly consistent read if R + W > N.
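The two inequalities are easy to capture in code. Small Python helpers with illustrative node counts:

```python
# Helpers capturing the two inequalities from the text: a write quorum
# needs W > N/2, and a read is strongly consistent when R + W > N.

def has_write_quorum(w: int, n: int) -> bool:
    return w > n / 2

def read_is_strongly_consistent(r: int, w: int, n: int) -> bool:
    return r + w > n

N = 3  # replication factor
print(has_write_quorum(w=2, n=N))                  # True: 2 > 1.5
print(read_is_strongly_consistent(r=2, w=2, n=N))  # True: 2 + 2 > 3
print(read_is_strongly_consistent(r=3, w=1, n=N))  # True: W = 1 forces R = N
print(read_is_strongly_consistent(r=1, w=2, n=N))  # False: may miss the latest write
```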

The number of nodes participating in an operation can vary with the operation. When writing, we might require quorum
for some types of updates but not others, depending on how much we value consistency and availability. Similarly, a
read that needs speed but can tolerate staleness should contact fewer nodes. Often you may need to take both into
account. If you need fast, strongly consistent reads, you could require writes to be acknowledged by all the nodes, thus
allowing reads to contact only one (N = 3, W = 3, R = 1). That would mean that your writes are slow, since they have to
contact all three nodes, and you would not be able to tolerate losing a node. But in some circumstances that may be the
tradeoff to make.
