
Web Architecture

This document provides an overview of a course on web application and software architecture. It outlines what will be covered in the course, including different architectural components, styles, and concepts. The course is meant for anyone looking to build skills in software architecture fundamentals and web application design. It will cover topics like architectural styles, layers, scalability, and how to choose the right architecture and technology stack for different use cases. The goal is to provide a comprehensive understanding of web application architecture.


About This Course

WE'LL COVER THE FOLLOWING

• Who Is This Course For?


• Why Take This Course? What To Expect From It?
• Course Author

Who Is This Course For? #


This course has no prerequisites. It is meant for anyone looking to build a solid understanding of web application & software architecture & for anyone who wants to strengthen their fundamentals in the domain.

If you are a beginner just starting your career in software development, this
course will help you a lot. Designing software is like fitting Lego blocks
together. With this course, you’ll develop an insight into how to fit them
together & build cool stuff.

It will also help you prepare for software engineering interviews, especially
for full-stack developer positions.

This course does not contain any code & focuses on a thorough discussion of the
architectural concepts. It also contains a lot of illustrations to help you
grasp the concepts better.

Why Take This Course? What To Expect From It? #

This course is a 101 on Web Application & Software Architecture. It walks you
step by step through the different components & concepts involved in designing
the architecture of a web application. We’ll learn about various architectural
styles such as client-server, the peer-to-peer decentralized architecture &
microservices, the fundamentals of data flow in a web application, the
different layers involved, concepts like scalability, high availability & much
more.

In this course, I also go through the techniques of picking the right
architecture and the technology stack to implement our use case. I walk you
through different use cases, which will help you gain an insight into what
technology & architecture fit best for a certain use case when writing a web
application. You’ll come to understand the technology trade-offs involved.

By the end of the course, you’ll have a comprehensive insight into the web
application architecture. If you have a startup idea & you are asking yourself,
how do I implement my app? What technologies do I use? Where do I start?
This course will help you kickstart your entrepreneurial journey.

Also, this course will be continually updated & new content will be
added from time to time.

Course Author #
I am Shivang. I’ve been writing code professionally for the past 8 years & personally for 14. In my career, I’ve had the opportunity to work on large-scale internet services for some of the industry giants in several different domains such as E-commerce, Fintech, Telecom and others.

I’ve written applications from the bare bones, right from the idea to
production. I’ve maintained code, as well as worked in the production support
for systems receiving millions of hits every single day.

My last job was at Hewlett Packard Enterprise as a Full-Stack developer in their Technical Solutions – R&D team.

Via this course, I’ve tried my best to share the knowledge, insights and
experience gained in my years of software development with all of you guys!!

Here is my LinkedIn profile, in case you want to say Hi!!


Cheers!!
Significance Of Software Architecture

In this lesson, we'll cover the significance of web application & software architecture & the reasoning behind learning it.

WE'LL COVER THE FOLLOWING

• Importance Of Getting The Application Architecture Right


• An Overview Of The Software Development Process
• Proof Of Concept
• Course Design

Importance Of Getting The Application Architecture Right #

The key element in successfully creating anything is getting the base right,
whether it is constructing a building or making a pizza. If we don’t get the
base right, we have to start over. Yeah!! I know, but there is no other way
around it.

Building a web application is no different. The architecture is its base & has
to be thought through carefully to avoid major design changes & code
refactoring at a later point in time.

Speaking from experience, you don’t want to delve into re-designing stuff. It
eats up your time like a black hole. It has the potential to push your shipping
date further down the calendar by months, if not longer. And I won’t even
bring up the waste of engineering & financial resources this causes. No, I
won’t!!

It also depends on at what stage of the development process we hit an impasse
due to hasty decisions taken during the initial design phases. So, before we
even touch the code & get our hands dirty, we have to get the underlying
architecture right.
A look at the architecture of our app should bring a smile to everyone’s face.
A look at the architecture of our app should bring a smile to everyone’s face.

Though software development is an iterative and evolutionary process, we
don’t always get things perfect on the first go. Still, this can’t be an excuse
for not doing our homework.

An Overview Of The Software Development Process #

In the industry, architects, developers and product owners spend a lot of time
studying & discussing business requirements. In software engineering jargon,
this is known as Requirement Gathering & Analysis.

Once we are done with the business requirements, we sit down & brainstorm
the use cases which we have to implement. This involves figuring out the
corner cases as early as possible & fitting the Lego blocks together.

If you’re a fan of documentation, you might also want to write a high-level


design document.

Now that we have an understanding of the business requirements, use cases,
corner cases and all, it’s time to start the research on picking the right
technology stack to implement the use cases.

Proof Of Concept #
After we pick the fitting tech stack, we start writing a POC (Proof of Concept).

Why a POC?

A POC helps us get a closer, more hands-on view of the technology & the basic
use case implementation. We get an insight into the pros and cons of the tech,
performance or other technical limitations if any.

It helps with the learning curve if we’re working with completely new tech,
also the non-technical people like the product owners, stakeholders have
something concrete to play with & to base their further decisions on.

Now, this is only for an industry scale product. If you are a solo indie
developer or a small group, you can always skip the POC part and start with
the main code.
So, we showcase the POC to the stakeholders & if everyone is satisfied, we
finally get down to creating the main repo & our first dev branch on GitHub,
or any other similar code hosting service which the business prefers.

Phew!!

So, by now you would have realized how important it is to get the architecture
right the first time & how valuable the knowledge of web architecture is to
developers.

Course Design #
Hmmm… Alright, with that said, let’s talk about this course. It is divided
into two parts. In the first, we will discuss the concepts & the architectural
components involved in designing web applications.

We will get insights into the different tiers of software applications, monolithic
repos, microservices, peer-to-peer architecture & a lot more.

In the second part, we will go through some of the use cases of designing the
architecture for applications which we use in our day to day lives & are well
familiar with.

We will also understand how applications are designed from the bare bones:
what the thought process is for picking the right technology stack for our use
case, & so forth.

So, without further ado, let’s get started.


Introduction

This lesson gives an overview of the different topics we will cover in this chapter. Also, we will learn what a Tier
is & what its components are.

WE'LL COVER THE FOLLOWING

• What Is A Tier?

I’ll begin the course by discussing different tiers involved in the software
architectural landscape. This is like a bird’s eye view of the realm of software
architecture & is important to be understood well.

This chapter will help us understand:

What is a Tier?
Why do software applications have different tiers? What is the need for
them?
How do I decide how many tiers my application should have?

What Is A Tier? #
Think of a tier as a logical separation of components in an application or a
service. And when I say separation, I mean physical separation at the
component level, not the code level.

What do I mean by components?

Database
Backend application server
User interface
Messaging
Caching
These are the different components that make up a web service.

Now let’s have a look at the different types of tiers & their real-life examples.
Single Tier Applications

In this lesson, we will learn about the Single Tier applications.

WE'LL COVER THE FOLLOWING

• Single Tier Applications


• Advantages Of Single Tier Applications
• Disadvantages Of Single Tier Applications

Single Tier Applications #

A single-tier application is an application where the user interface,
backend business logic & the database all reside on the same machine.

Typical examples of single-tier applications are desktop applications like MS
Office, PC Games or image editing software like Gimp.
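Though this course intentionally stays away from code, a tiny sketch can make the idea concrete. Everything below is hypothetical: a note-keeping app whose user interface, business logic & storage (Python’s in-process sqlite3) all live in a single process on one machine — no network involved anywhere.

```python
import sqlite3

# Single tier: UI, business logic & storage all run in one process
# on the user's machine -- no network calls anywhere.

class NotesApp:
    def __init__(self):
        # The "database" is an in-process SQLite store,
        # not a server reached over a network.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE notes (body TEXT)")

    def add_note(self, body):          # business logic
        self.db.execute("INSERT INTO notes (body) VALUES (?)", (body,))

    def render(self):                  # the "user interface"
        rows = self.db.execute("SELECT body FROM notes").fetchall()
        return "\n".join(f"- {body}" for (body,) in rows)

app = NotesApp()
app.add_note("buy milk")
app.add_note("ship the build")
print(app.render())
```

Since every call above stays inside one process, there is no network latency at all — which is exactly the advantage discussed next.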

Advantages Of Single Tier Applications #

The main upside of single-tier applications is that they have no network latency,
since every component is located on the same machine. This adds to the
performance of the software.

There are no data requests to a backend server every now and then, which
would make the user experience slow. In single-tier apps, the data is easily &
quickly available since it is located on the same machine.

Though the real performance of a single-tier app largely depends on how
powerful the machine is & on the hardware requirements of the software.

Also, the user’s data stays on their machine & doesn’t need to be transmitted
over a network. This ensures data safety at the highest level.

Disadvantages Of Single Tier Applications #

One big downside of single-tier apps is that the business has no control over the
application. Once the software is shipped, no code or feature changes can
be made until the customer manually updates it by connecting to a
remote server or by downloading & installing a patch.

Due to this, in the 90s, if a game shipped with buggy code, there was
nothing the studios could do. They would eventually have to face quite some
heat due to the buggy nature of the software. The testing of the product had to
be thorough; there was no room for any mistakes.

The code in single-tier applications is also vulnerable to being tweaked &
reverse-engineered. The security, for the business, is minimal.

Also, the application’s performance & look and feel can get inconsistent, as
they largely depend on the configuration of the user’s machine.
Two Tier Applications

In this lesson, we will learn about the Two Tier applications.

WE'LL COVER THE FOLLOWING

• Two Tier Application


• The Need For Two Tier Application

Two Tier Application #

A two-tier application involves a client and a server. The client contains
the user interface & the business logic on one machine, while the
backend server is the database running on a different machine.
The database server is hosted by the business, which retains control over it.

Why the need for two-tier applications? Why not host the business logic on a
different machine & have control over it?

Also, again isn’t the application code vulnerable to being accessed by a third
person?

The Need For Two Tier Application #

Well, yes!! But there are use cases where two-tier applications come in handy,
for instance, a to-do list app or a similar planner or productivity app.

In these scenarios, it won’t cause the business significant harm even if the
code is accessed by a third person. On the contrary, the upside is that since the
code & the user interface reside on the same machine, there are fewer
network calls to the backend server, which keeps the latency of the application
low.

The application makes a call to the database server only when the user has
finished creating their to-do list & wants to persist the changes.
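To make the to-do list example concrete, here is a minimal sketch. The client class & the dict standing in for the remote database server are my own invented stand-ins, not anything from the course — the point is simply that edits stay local & the database tier is hit only on save.

```python
# Two tier: the client holds the UI & the business logic; only the
# database lives on a remote server. Here the database server is
# simulated by a plain dict so the sketch stays self-contained.

remote_db = {}          # stands in for the database server machine
network_calls = 0       # counts round trips to the "server"

class TodoClient:
    def __init__(self, user):
        self.user = user
        self.items = []             # edits stay local -- no latency

    def add(self, item):            # business logic runs on the client
        self.items.append(item)

    def save(self):                 # the only call to the database tier
        global network_calls
        network_calls += 1
        remote_db[self.user] = list(self.items)

client = TodoClient("alice")
client.add("water plants")
client.add("write blog post")
client.save()                       # one network call for two edits
```

Two local edits cost a single trip to the database tier, which is exactly why latency (and the server bill) stays low in this style.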

Another good example of this is online browser & app-based games. The
game files are pretty heavy & get downloaded on the client just once, when
the user uses the application for the first time. After that, they make
network calls only to keep the game state persistent.

Also, fewer server calls mean less money to be spent on the servers which is
naturally economical.

Though it largely depends on our business requirements & the use case whether
we want to pick this type of tier when writing our service.

We can either keep the user interface and the business logic on the client or
move the business logic to a dedicated backend server, which would make it a
three-tier application. That is what I am going to discuss next.
Three Tier Applications

In this lesson, we will learn about the Three Tier applications.

Three-tier applications are pretty popular & largely used in the industry.
Almost all of the simple websites like blogs, news websites etc. are part of this
category.

In a three-tier application, the user interface, application logic & the
database all lie on different machines & thus have different tiers. They
are physically separated.

So, if we take the example of a simple blog, the user interface would be
written using Html, JavaScript, CSS, the backend application logic would run
on a server like Apache & the database would be MySQL. A three-tier
architecture works best for simple use cases.
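As a rough, purely illustrative sketch of the blog example, the three tiers can be modeled as three separate roles. In a real deployment each role runs on its own machine (the browser, a backend behind Apache, a MySQL server); here they are plain Python functions with made-up data so the request flow stays visible.

```python
# Three tiers sketched as three roles. Each function below would run on
# its own machine in a real deployment; in this sketch they are plain
# functions so the chain UI -> application -> database is explicit.

def database_tier(post_id):                   # e.g. MySQL
    posts = {1: "Hello, world", 2: "Tiers explained"}
    return posts.get(post_id)

def application_tier(post_id):                # e.g. backend behind Apache
    post = database_tier(post_id)
    return {"id": post_id, "title": post} if post else {"error": "not found"}

def ui_tier(post_id):                         # Html/JavaScript/CSS client
    data = application_tier(post_id)
    return f"<h1>{data['title']}</h1>" if "title" in data else "<p>404</p>"

html = ui_tier(1)
```

Note how the UI never touches the database directly — every request flows through the application tier, which is the physical separation the lesson describes.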

Alright! Now, let’s move on to learn about the N-tier applications.


N Tier Applications

In this lesson, we will go over the N Tier applications and their components.

WE'LL COVER THE FOLLOWING

• N-Tier Application
• Why The Need For So Many Tiers?
• Single Responsibility Principle
• Separation Of Concerns
• Difference Between Layers & Tiers

N-Tier Application #

An N-tier application is an application which has more than three
components involved.

What are those components?

Cache
Message queues for asynchronous behaviour
Load balancers
Search servers for searching through massive amounts of data
Components involved in processing massive amounts of data
Components running heterogeneous tech, commonly known as web services,
etc.

All the social applications like Instagram and Facebook, large-scale industry
services like Uber and Airbnb, massively multiplayer online games like Pokemon Go,
& applications with fancy features are n-tier applications.
Note: There is another name for n-tier apps: the “distributed
applications”. But I think it’s not safe to use the word “distributed” yet,
as the term brings along a lot of complex stuff with it. It
would rather confuse us than help. Though I will discuss the distributed
architecture in this course, for now we will just stick with the term N-
tier applications.

So, why the need for so many tiers?

Why The Need For So Many Tiers? #

Two software design principles that are key to explaining this are the Single
Responsibility Principle & the Separation of Concerns.

Single Responsibility Principle #

The Single Responsibility Principle simply means giving one, just one,
responsibility to a component & letting it execute it with perfection. Be it
saving data, running the application logic or ensuring the delivery of
messages throughout the system.

This approach gives us a lot of flexibility & makes management easier.

For instance, when upgrading a database server, such as installing a new
OS or a patch, the other components of the running service wouldn’t be
impacted. Even if something goes amiss during the OS installation, just the
database component would go down. The application as a whole would still be
up & only the features requiring the database would be impacted.

We can also have dedicated teams & code repositories for every component,
thus keeping things cleaner.

The single responsibility principle is a reason why I was never a fan of stored
procedures.

Stored procedures enable us to add business logic to the database, which is a
big no for me. What if in the future we want to plug in a different database?
Where do we take the business logic? To the new database? Or do we try to
refactor the application code & squeeze the stored-procedure logic in
somewhere?

A database should not hold business logic, it should only take care of
persisting the data. This is what the single responsibility principle is. And this
is why we have separate tiers for separate components.
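To illustrate (with invented class names, not anything prescribed by the course), here is what keeping persistence & business logic in separate components might look like. Swapping the database then means replacing one component, instead of digging logic out of stored procedures.

```python
# Single Responsibility Principle: the repository only persists data,
# the business rules live in their own component. Replacing the
# database means swapping the repository implementation only.

class InMemoryOrderRepository:        # could later be MySQL, Mongo, ...
    def __init__(self):
        self._orders = []

    def save(self, order):            # persistence, nothing else
        self._orders.append(order)

    def all(self):
        return list(self._orders)

class OrderService:                   # business logic lives here only
    def __init__(self, repository):
        self.repository = repository

    def place_order(self, amount):
        if amount <= 0:               # a business rule, NOT a stored procedure
            raise ValueError("order amount must be positive")
        self.repository.save({"amount": amount})

service = OrderService(InMemoryOrderRepository())
service.place_order(42)
```

Because `OrderService` only sees the repository’s interface, a different storage backend can be plugged in without touching the business rules.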

Separation Of Concerns #
Separation of concerns kind of means the same thing: be concerned about
your work only & stop worrying about the rest of the stuff.

These principles act at all the levels of the service, be it at the tier level or the
code level.

Keeping the components separate makes them reusable. Different services
can use the same database, messaging server or any other component, as long as
they are not tightly coupled with each other.

Having loosely coupled components is the way to go. This approach makes
scaling the service easy in the future, when things grow beyond a certain level.

Difference Between Layers & Tiers #

Note: Don’t confuse tiers with the layers of the application. Some prefer
to use them interchangeably. But in the industry layers of an application
typically means the user interface layer, business layer, service layer, or
the data access layer.
The layers mentioned in the illustration are at the code level. The difference
between layers and tiers is that layers represent the organization of the
code, breaking it into components, whereas tiers involve the physical
separation of components.

All these layers together can be used in any tiered application, be it single,
two, three or N-tier. I’ll discuss these layers in detail ahead in the course.

Alright, now we have an understanding of tiers. Let’s zoom in one notch &
focus on web architecture.
Different Tiers In Software Architecture Quiz

This lesson contains a quiz to test your understanding of tiers in software architecture.

Let’s Test Your Understanding Of Different Tiers In Software Architecture

1 What is a tier?

What Is Web Architecture?

In this lesson, we will have a brief introduction to web architecture.

Web architecture involves multiple components like a database, message queue,
cache & user interface, all running in conjunction with each other to form an
online service.

In the introduction of this course, I already talked about why the knowledge
of web architecture is important to software engineers. Now we will explore it
further.

This is a typical architecture of a web application, used in the majority of the
applications running online.

If we have an understanding of the components involved in this diagram,
then we can always build upon this architecture for more complex
requirements.

I’ll go step by step through every component, starting with the client-server
architecture.
Client Server Architecture

This lesson is an introduction to the Client-Server Architecture.

We’ve already learned a bit about the client-server architecture when
discussing the two-tier, three-tier & N-tier architectures. Now we’ll look at it
in detail.

Client-Server architecture is the fundamental building block of the web.

The architecture works on a request-response model. The client sends a
request to the server for information & the server responds with it.

Every website you browse, be it a Wordpress blog or a web application like
Facebook or Twitter, or your banking app, is built on the client-server
architecture.

A very small percentage of business websites and applications use the peer-to-peer
architecture, which is different from client-server. I will discuss that ahead
in the course. For now, let’s focus on the client.
Client

In this lesson, we will explore the Client component of the Client-Server Architecture.

WE'LL COVER THE FOLLOWING

• Client
• Technologies Used To Implement Clients In Web Applications

Client #
The client holds our user interface. The user interface is the presentation part
of the application. It’s written in Html, JavaScript, CSS and is responsible for
the look & feel of the application.

The user interface runs on the client. The client can be a mobile app, a
desktop or a tablet like an iPad. It can also be a web-based console, running
commands to interact with the backend server.
Technologies Used To Implement Clients In Web Applications #
In very simple terms, a client is the window to our application. In the industry,
the open-source technologies popular for writing web-based user interfaces
are ReactJS, AngularJS, VueJS, jQuery etc. All these libraries use JavaScript.

There are a plethora of other technologies for writing the front-end too, I have
just listed the popular ones for now.

Different platforms require different frameworks & libraries to write the
front-end. For instance, mobile phones running Android need a different set
of tools than those running Apple or Windows OS.

If you are intrigued about the technologies popular in the industry, have a look
at the developer survey run by StackOverflow for this year.
Types of Client

In this lesson, we will learn about the two types of client: the Thin Client and the Thick Client (sometimes also
called the Fat Client).

WE'LL COVER THE FOLLOWING

• Thin Client
• Thick Client

There are primarily two types of clients:

1. Thin Client
2. Thick Client (sometimes also called the Fat client)

Thin Client #
Thin Client is the client which holds just the user interface of the application.
It has no business logic of any sort. For every action, the client sends a request
to the backend server. Just like in a three-tier application.
Thick Client #
On the contrary, the thick client holds all or some part of the business logic.
These are the two-tier applications. We’ve already gone through this if you
remember.

The typical examples of Fat clients are utility apps, online games etc.
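A quick, invented sketch of the difference: the thin client forwards every action to the backend, while the thick client runs the business logic itself. The 20% tax calculation below is just a stand-in for “business logic”, not anything from the course.

```python
# Thin vs thick client. The thin client makes a server round trip for
# every action; the thick client computes the same result locally.

def backend_compute_total(price):          # runs on the server
    return round(price * 1.2, 2)           # price plus 20% tax

class ThinClient:
    """Holds only the UI; every action is a request to the backend."""
    def total(self, price):
        return backend_compute_total(price)

class ThickClient:
    """Holds the business logic itself -- a two-tier style client."""
    def total(self, price):
        return round(price * 1.2, 2)        # logic lives on the client
```

Both clients produce the same answer; the difference is where the computation happens & how many server calls it costs.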
Server

In this lesson, we will explore the Server component of the Client-Server Architecture.

WE'LL COVER THE FOLLOWING

• What is A Web Server?


• Server-Side Rendering

What is A Web Server? #

The primary task of a web server is to receive the requests from the
client & provide the response after executing the business logic based on
the request parameters received from the client.

Every service running online needs a server. Servers running web
applications are commonly known as application servers.
Besides application servers, there are other kinds of servers too, with
specific tasks assigned to them, such as:

Proxy server
Mail server
File server
Virtual server

The server configuration & the type can differ depending on the use case.

For instance, if we run a backend application written in Java, we
would pick Apache Tomcat or Jetty.

For simple use cases, such as hosting websites, we would pick the Apache
HTTP Server.

In this lesson, we will stick to the application server.

All the components of a web application need a server to run, be it a database,
a message queue, a cache or any other component. In modern application
development, even the user interface is hosted separately on a dedicated
server.

Server-Side Rendering #
Often the developers use a server to render the user interface on the backend
& then send the rendered data to the client. The technique is known as server-
side rendering. I will discuss the pros & cons of client-side vs server-side
rendering further down the course.
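A minimal sketch of the idea, using Python’s standard-library string.Template as a stand-in for a real templating engine (the page & the data are made up): the server fills the template & ships finished Html, so the client only has to display it.

```python
from string import Template

# Server-side rendering: the template is filled on the backend, and the
# client receives ready-to-display Html instead of raw data.

PAGE = Template("<html><body><h1>Hi, $name!</h1></body></html>")

def render_on_server(name):
    # Runs on the server; the client never sees the template itself.
    return PAGE.substitute(name=name)

response_body = render_on_server("Shivang")
```

With client-side rendering, the server would instead return just the data (say, `{"name": "Shivang"}`) & the browser would build the Html.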

Now we have a fundamental understanding of both the client & the server.
Let’s delve into some of the concepts involved in the communication between
them.
Communication Between the Client & the Server

In this lesson, we will learn how communication takes place between the Client and the Server.

WE'LL COVER THE FOLLOWING

• Request-Response Model
• HTTP Protocol
• REST API & API Endpoints
• Real World Example Of Using A REST API

Request-Response Model #
The client & the server have a request-response model. The client sends the
request & the server responds with the data.

If there is no request, there is no response. Pretty simple right?

HTTP Protocol #
The entire communication happens over the HTTP protocol, the protocol for
data exchange over the World Wide Web. HTTP is a request-response protocol
that defines how information is transmitted across the web.

It’s a stateless protocol, every process over HTTP is executed independently &
has no knowledge of previous processes.

If you want to read more about the protocol, this is a good resource on it.

Alright, moving on…

REST API & API Endpoints #

Speaking in the context of modern N-tier web applications, every client has
to hit a REST endpoint to fetch data from the backend.
Note: If you aren’t aware of the REST API & the API Endpoints, I have
discussed it in the next lesson in detail. I’ve brought up the terms in this
lesson, just to give you a heads up on how modern distributed web
applications communicate.

The backend application code has a REST API implemented, which acts as an
interface to requests from the outside world. Every request, be it from clients
written by the business or from third-party developers who consume our data,
has to hit the REST endpoints to fetch the data.

Real World Example Of Using A REST API #

For instance, let’s say we want to write an application which would keep track
of the birthdays of all our Facebook friends & send us a reminder a couple of
days before the event date.

To implement this, the first step would be to get the data on the birthdays of
all our Facebook friends.

We would write a client which would hit the Facebook Social Graph API, which
is a REST API, to get the data & then run our business logic on it.

Implementing a REST-based API has several advantages. Let’s delve into it in
detail to gain a deeper understanding.
Web Architecture Quiz - Part 1

This lesson contains a quiz to test your understanding of the client, the server & the communication between
them.

Let’s Test Your Understanding Of the Client Server Communication

1 Where does the user interface component of a web application run?

What Is A REST API?

In this lesson, we will gain an insight into the REST API.

WE'LL COVER THE FOLLOWING

• WHAT IS REST?
• REST API
• REST Endpoint
• Decoupling Clients & the Backend Service
• Application Development Before the REST API
• API Gateway

WHAT IS REST? #

REST stands for Representational State Transfer. It’s a software
architectural style for implementing web services. Web services
implemented using the REST architectural style are known as
RESTful web services.

REST API #
A REST API is an API implementation that adheres to the REST architectural
constraints. It acts as an interface. The communication between the client &
the server happens over HTTP. A REST API takes advantage of the HTTP
methodologies to establish communication between the client and the server.
REST also enables servers to cache responses, which improves the
performance of the application.
The communication between the client and the server is a stateless process.
And by that, I mean every communication between the client and the server is
like a new one.

There is no information or memory carried over from previous
communications. So, every time a client interacts with the backend, it has to
send the authentication information as well. This enables the backend to
figure out whether the client is authorized to access the data.
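Here is a toy sketch of what that means in practice; the token set & the handler are invented for illustration. Because the server remembers nothing between requests, every request must carry its own credentials & is judged entirely on its own.

```python
# Statelessness in practice: the server keeps no session memory, so
# every request must carry its own authentication information.

VALID_TOKENS = {"secret-token-123"}     # hypothetical issued tokens

def handle_request(path, headers):
    # No lookup of "previous" requests -- each one stands alone.
    token = headers.get("Authorization", "")
    if token not in VALID_TOKENS:
        return 401, "Unauthorized"
    return 200, f"data for {path}"

status, body = handle_request("/profile", {"Authorization": "secret-token-123"})
```

Send the same request without the token & the backend rejects it, even if the previous request from the same client succeeded a moment earlier.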

With the implementation of a REST API, the client gets the backend endpoints to
communicate with. This entirely decouples the backend & the client code.

Let’s understand what this means.

REST Endpoint #
An API/REST/Backend endpoint means the url of a service. For example,
https://myservice.com/getuserdetails/{username} is a backend endpoint for
fetching the user details of a particular user from the service.

The REST-based service exposes this url to all its clients so they can fetch the
user details using it.
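Purely as an illustration (the service, port & user data are all made up, and this is not the course’s own material), here is a minimal endpoint in the spirit of the url above, built on Python’s standard http.server & queried with urllib:

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# A toy user store; a real service would query its database tier.
USERS = {"alice": {"username": "alice", "plan": "pro"}}

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route requests shaped like /getuserdetails/{username}
        parts = self.path.strip("/").split("/")
        if len(parts) == 2 and parts[0] == "getuserdetails" and parts[1] in USERS:
            body = json.dumps(USERS[parts[1]]).encode()
            self.send_response(200)
        else:
            body = b'{"error": "not found"}'
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):      # keep the sketch quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/getuserdetails/alice"
details = json.loads(urlopen(url).read())
server.shutdown()
```

Any client that knows the url & is authorized can hit it — which is exactly the decoupling discussed next.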
Decoupling Clients & the Backend Service #
With the availability of the endpoints, the backend service does not have to
worry about the client implementation. It just calls out to its multiple clients &
says “Hey everyone, here is the url address of the resource/information you
need. Hit it when you need it. Any client with the required authorization to
access a resource can access it”.

Developers can have different implementations with separate codebases for
different clients, be it a mobile browser, a desktop browser, a tablet or an API
testing tool. Introducing new types of clients or modifying the client code has
no effect on the functionality of the backend service.

This means the clients and the backend service are decoupled.

Application Development Before the REST API #

Before REST-based API interfaces went mainstream in the industry, we often
tightly coupled the backend code with the client. JSP (Java Server Pages) is one
example of this.

We would often put business logic in the JSP tags, which made code
refactoring & adding new features really difficult, as the logic got spread
across different layers.

Also, in the same codebase, we had to write separate code/classes for handling
requests from different types of clients: a different servlet for a mobile client
and a different one for a web-based client.

After the REST APIs became widely used, there was no need to worry about
the type of the client. Just provide the endpoints & the response would contain
data generally in the JSON or any other standard data transport format. And
the client would handle the data in whatever way they would want.

This cut down a lot of unnecessary work for us. Also, adding new clients
became a lot easier. We could introduce multiple types of new clients without
considering the backend implementation.

In today’s industry landscape, there is hardly any online service without a
REST API. Want to access the public data of any social network? Use their REST
API.

API Gateway #

The REST API acts as a gateway, a single entry point into the system. It
encapsulates the business logic & handles all the client requests, taking care of
authorization, authentication, sanitization of the input data & other necessary
tasks before providing access to the application resources.

So, now we are aware of the client-server architecture, we know what a REST
API is. It acts as the interface & the communication between the client and the
server happens over HTTP.

Let’s look into the HTTP Pull & Push-based communication mechanism.
HTTP Push & Pull - Introduction

In this lesson, we will have an introduction to the HTTP Push & Pull mechanism.

WE'LL COVER THE FOLLOWING

• HTTP PULL
• HTTP PUSH

In this lesson, we will get an insight into the HTTP Push & Pull mechanism. We
know that the majority of the communication on the web happens over HTTP,
especially wherever the client-server architecture is involved.

There are two modes of data transfer between the client and the server. HTTP
PUSH & HTTP PULL. Let’s find out what they are & what they do.

HTTP PULL #
As I stated earlier, for every response, there has to be a request first. The
client sends the request & the server responds with the data. This is the
default mode of HTTP communication, called the HTTP PULL mechanism.

The client pulls the data from the server whenever it requires. And it keeps
doing it over and over to fetch the updated data.

An important thing to note here is that every request to the server and the
response to it consumes bandwidth. Every hit on the server costs the business
money & adds more load on the server.

What if there is no updated data available on the server, every time the client
sends a request?

The client doesn’t know that, so naturally, it would keep sending the requests
to the server over and over. This is not ideal & a waste of resources. Excessive
pulls by the clients have the potential to bring down the server.
HTTP PUSH #
To tackle this, we have the HTTP PUSH based mechanism. In this mechanism,
the client sends the request for particular information to the server, just for
the first time, & after that the server keeps pushing the new updates to the
client whenever they are available.

The client doesn’t have to worry about sending requests to the server, for
data, every now & then. This saves a lot of network bandwidth & cuts down
the load on the server by notches.

This is also known as a callback: the client phones the server for information, and the server responds, "Hey! I don't have the information right now, but I'll call you back whenever it is available."

A very common example of this is user notifications. We have them in almost every web application today; we get notified whenever an event happens on the backend.

Clients use AJAX (Asynchronous JavaScript & XML) to send requests to the
server in the HTTP Pull based mechanism.

There are multiple technologies involved in the HTTP Push based mechanism
such as:

Ajax Long polling
Web Sockets
HTML5 Event Source
Message Queues
Streaming over HTTP

We'll go over all of them in detail up next.


HTTP Pull - Polling with Ajax

In this lesson, we will understand HTTP Pull, AJAX and how polling is done using AJAX.

WE'LL COVER THE FOLLOWING

• AJAX – Asynchronous JavaScript & XML

There are two ways of pulling/fetching data from the server.

The first is sending an HTTP GET request to the server manually by triggering
an event, like by clicking a button or any other element on the web page.

The other is fetching data dynamically at regular intervals by using AJAX without any human intervention.

AJAX – Asynchronous JavaScript & XML #

AJAX stands for asynchronous JavaScript & XML. The name says it all, it
is used for adding asynchronous behaviour to the web page.
As we can see in the illustration above, instead of requesting the data manually every time with the click of a button, AJAX enables us to fetch updated data from the server by automatically sending requests over and over at stipulated intervals.

Upon receiving the updates, a particular section of the web page is updated
dynamically by the callback method. We see this behaviour all the time on
news & sports websites, where the updated event information is dynamically
displayed on the page without the need to reload it.

AJAX uses the XMLHttpRequest object, which is built into the browser, for sending requests to the server, and uses JavaScript to update the HTML DOM.

AJAX is commonly used with the jQuery framework to implement the asynchronous behaviour on the UI.

This dynamic technique of requesting information from the server at regular intervals is known as Polling.
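The polling loop described above can be sketched in a few lines. This is an illustrative sketch, not a real HTTP client: `fetch` is a hypothetical callable standing in for an HTTP GET (in the browser, AJAX's `XMLHttpRequest` plays this role):

```python
import time

def poll(fetch, interval_secs, max_polls):
    """HTTP PULL sketch: the client repeatedly pulls from the server at
    a fixed interval, whether or not new data actually exists."""
    results = []
    for _ in range(max_polls):
        results.append(fetch())       # one request-response cycle
        time.sleep(interval_secs)     # wait before the next pull
    return results

# Fake server: an update is only available on the third pull
responses = iter([None, None, {"score": "2-1"}])
updates = poll(lambda: next(responses), interval_secs=0, max_polls=3)
print(updates)  # the first two pulls were wasted round trips
```

Notice how the first two pulls return nothing: this is exactly the wasted bandwidth the push-based techniques in the next lessons try to eliminate.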
HTTP Push

In this lesson, we will learn about the HTTP Push mechanism.

WE'LL COVER THE FOLLOWING

• Time To Live (TTL)


• Persistent Connection
• Heartbeat Interceptors
• Resource Intensive

Time To Live (TTL) #


In regular client-server communication, which is HTTP PULL, there is a Time to Live (TTL) for every request. It could be 30 to 60 seconds; it varies from browser to browser.

If the client doesn’t receive a response from the server within the TTL, the
browser kills the connection & the client has to re-send the request hoping it
would receive the data from the server before the TTL ends this time.

Open connections consume resources & there is a limit to the number of open
connections a server can handle at one point in time. If the connections don’t
close & new ones are being introduced, over time, the server will run out of
memory. Hence, the TTL is used in client-server communication.

But what if we are certain that the response will take more time than the TTL
set by the browser?

Persistent Connection #
In this case, we need a Persistent Connection between the client and the
server.
A persistent connection is a network connection between the client & the
server that remains open for further requests & the responses, as
opposed to being closed after a single communication.

It facilitates HTTP Push-based communication between the client and the server.

Heartbeat Interceptors #
Now you might be wondering how a persistent connection is possible if the browser kills open connections to the server every X seconds.

The connection between the client and the server stays open with the help of
Heartbeat Interceptors.

These are just blank request responses between the client and the server
to prevent the browser from killing the connection.
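A heartbeat decision can be sketched as a simple rule. The threshold below is purely illustrative (an assumed 80% of the TTL), not a standard value:

```python
def needs_heartbeat(idle_secs, ttl_secs, safety_factor=0.8):
    """Sketch of a keep-alive rule: send a blank heartbeat request once
    the connection has been idle for a chosen fraction of the browser's
    TTL, before the browser kills the idle connection."""
    return idle_secs >= ttl_secs * safety_factor

# With a 30-second TTL, a heartbeat fires after 24 idle seconds
print(needs_heartbeat(idle_secs=25, ttl_secs=30))  # True
print(needs_heartbeat(idle_secs=10, ttl_secs=30))  # False
```

Firing slightly before the deadline keeps the connection alive at the cost of a small amount of blank request-response traffic.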

Isn’t this resource-intensive?


Resource Intensive #

Yes, it is. Persistent connections consume a lot of resources in comparison to the HTTP Pull behaviour. But there are use cases where establishing a persistent connection is vital to the functionality of an application.

For instance, a browser-based multiplayer game has a pretty large amount of request-response activity within a certain time in comparison to a regular web application.

It would be apt to establish a persistent connection between the client and the
server from a user experience standpoint.

Long-open connections can be implemented by multiple techniques such as Ajax Long Polling, Web Sockets, Server-Sent Events etc.

Let’s have a look into each of them.


HTTP Push-Based Technologies

In this lesson, we will discuss some HTTP Push based technologies.

WE'LL COVER THE FOLLOWING

• Web Sockets
• AJAX – Long Polling
• HTML5 Event Source API & Server Sent Events
• Streaming Over HTTP
• Summary

Web Sockets #
A Web Socket connection is ideally preferred when we need a persistent, bi-directional, low-latency data flow from the client to the server & back.

Typical use cases of these are messaging, chat applications, real-time social streams & browser-based massive multiplayer games, which have quite a number of read-writes in comparison to a regular web app.

With Web Sockets, we can keep the client-server connection open as long as
we want.

Have bi-directional data? Go ahead with Web Sockets. One more thing: Web Sockets don't run over plain HTTP. The connection starts with an HTTP handshake & is then upgraded to run directly over TCP. Both the server & the client should support web sockets, or else it won't work.

The WebSocket API & Introducing WebSockets – Bringing Sockets to the Web
are good resources for further reading on web sockets

AJAX – Long Polling #


Long Polling lies somewhere between Ajax & Web Sockets. In this technique, instead of immediately returning the response, the server holds the response until it finds an update to be sent to the client.

The connection in long polling stays open a bit longer in comparison to polling. The server doesn't return an empty response. If the connection breaks, the client has to re-establish the connection to the server.

The upside of using this technique is that a considerably smaller number of requests are sent from the client to the server, in comparison to the regular polling mechanism. This cuts down a lot of network bandwidth consumption.

Long polling can be used in simple asynchronous data fetch use cases when
you do not want to poll the server every now & then.
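The server side of long polling can be sketched as follows. This is an illustrative sketch under assumed names: `get_update` is a hypothetical helper that checks a data store for new data:

```python
import time

def long_poll(get_update, timeout_secs, check_interval=0.01):
    """Server-side long-polling sketch: hold the request open until an
    update exists or the timeout expires, instead of answering
    immediately with an empty response."""
    deadline = time.monotonic() + timeout_secs
    while time.monotonic() < deadline:
        update = get_update()
        if update is not None:
            return update             # respond as soon as data is ready
        time.sleep(check_interval)    # keep holding the connection
    return None                       # timed out: the client reconnects

# Fake data store: an update appears on the third check
calls = {"n": 0}
def get_update():
    calls["n"] += 1
    return "new-message" if calls["n"] >= 3 else None

result = long_poll(get_update, timeout_secs=2)
print(result)  # new-message
```

One held connection replaces what would have been several empty pull cycles in regular polling.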

HTML5 Event Source API & Server Sent Events #


The Server-Sent Events implementation takes a bit of a different approach.
Instead of the client polling for data, the server automatically pushes the data
to the client whenever the updates are available. The incoming messages from
the server are treated as Events.

Via this approach, the servers can initiate data transmission towards the
client once the client has established the connection with an initial request.

This helps get rid of a huge number of blank request-response cycles, cutting down the bandwidth consumption by notches.

To implement server-sent events, the backend language should support the technology, & on the UI, the HTML5 Event Source API is used to receive the data coming in from the backend.

An important thing to note here is that once the client establishes a connection with the server, the data flow is in one direction only: from the server to the client.

SSE is ideal for scenarios such as a real-time feed like that of Twitter,
displaying stock quotes on the UI, real-time notifications etc.

This is a good resource for further reading on SSE
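On the wire, each server-sent event is just plain text in the `text/event-stream` format: an optional `event:` field, one or more `data:` lines, and a blank-line terminator. A small formatter (the stock-quote payload is a made-up example) looks like this:

```python
def sse_event(data, event=None):
    """Format one Server-Sent Event per the text/event-stream wire
    format: optional `event:` name, `data:` lines, blank-line end."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    for chunk in str(data).split("\n"):
        lines.append(f"data: {chunk}")   # multi-line data needs one
    return "\n".join(lines) + "\n\n"     # blank line ends the event

msg = sse_event("AAPL 182.52", event="stock-quote")
print(msg)
```

The browser's `EventSource` API parses this stream and fires a JavaScript event per message, which is why incoming messages are "treated as events."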

Streaming Over HTTP #


Streaming Over HTTP is ideal for cases where we need to stream large data over HTTP by breaking it into smaller chunks. This is possible with HTML5 & the JavaScript Streams API.

The technique is primarily used for streaming multimedia content, like large
images, videos etc, over HTTP.

Due to this, we can watch a partially downloaded video as it continues to download, by playing the downloaded chunks on the client.

To stream data, both the client & the server agree to conform to some streaming settings. This helps them figure out when the stream begins & ends over an HTTP request-response model.

You can go through this resource for further reading on the Streams API.
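The core chunking idea can be sketched independently of any HTTP library (the tiny byte string is a stand-in for a large media file):

```python
def iter_chunks(blob, chunk_size):
    """Break a large payload into fixed-size chunks, the way chunked
    HTTP streaming lets a client start playing a video while the rest
    is still downloading."""
    for start in range(0, len(blob), chunk_size):
        yield blob[start:start + chunk_size]

video = bytes(range(10))              # stand-in for a large media file
chunks = list(iter_chunks(video, chunk_size=4))
print(len(chunks))                    # 3 chunks: 4 + 4 + 2 bytes
assert b"".join(chunks) == video      # reassembles losslessly
```

The client can consume `chunks[0]` while `chunks[2]` is still in transit, which is what makes progressive playback possible.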

Summary #
So, now we have an understanding of what HTTP Pull & Push is. We went
through different technologies which help us establish a persistent connection
between the client and the server.

Every tech has a specific use case. Ajax is used to dynamically update the web page by polling the server at regular intervals.

Long polling keeps a connection open slightly longer than the regular polling mechanism does.
Web Sockets have bi-directional data flow, whereas Server sent events
facilitate data flow from the server to the client.

Streaming over HTTP facilitates the streaming of large objects like multimedia files.

Which tech would fit best for our use case depends on the kind of application we intend to build.

Alright, let's quickly gain an insight into the pros & cons of client-side and server-side rendering.
Client-Side Vs Server-Side Rendering

In this lesson, we will learn about client-side and server-side rendering & the use cases for both approaches.

WE'LL COVER THE FOLLOWING

• Client-Side Rendering - How Does A Browser Render A Web Page?
• Server-Side Rendering
• Use Cases For Server-Side & Client-Side Rendering

Client-Side Rendering - How Does A Browser Render A Web Page? #
When a user requests a web page from the server & the browser receives the response, it has to render the response on the window in the form of an HTML page.

For this, the browser has several components, such as the:

Browser engine
Rendering engine
JavaScript interpreter
Networking & the UI backend
Data storage etc.

I won't go into much detail, but the browser has to do a lot of work to convert the response from the server into an HTML page.

The rendering engine constructs the DOM tree, then renders & paints it. Naturally, all this activity needs a bit of time.

Server-Side Rendering #
To avoid all this rendering time on the client, developers often render the UI on the server, generate the HTML there & directly send the HTML page to the client. This technique is known as server-side rendering. It ensures faster rendering of the UI, averting the UI loading time in the browser window, since the page is already created & the browser doesn't have to do much assembling & rendering work.

Use Cases For Server-Side & Client-Side Rendering #
The server-side rendering approach is perfect for delivering static content,
such as WordPress blogs. It’s also good for SEO as the crawlers can easily read
the generated content.

However, modern websites are highly dependent on Ajax. In such websites, content for a particular module or a section of a page has to be fetched & rendered on the fly.

Therefore, server-side rendering doesn't help much. For every Ajax request, instead of sending just the required content to the client, the approach generates the entire page on the server. This process consumes unnecessary bandwidth & also fails to provide a smooth user experience.

A big downside to this is once the number of concurrent users on the website
rises, it puts an unnecessary load on the server.

Client-side rendering works best for modern dynamic Ajax-based websites.

Though, we can leverage a hybrid approach to get the most out of both techniques: use server-side rendering for the home page & the other static content on our website, & client-side rendering for the dynamic pages.

Alright, before moving down to the database, message queue & caching components, it's important for us to understand a few concepts such as:

Monolithic architecture
Micro-services
Scalability
High availability
Distributed systems

What are nodes in distributed systems? Why are they important to software design?

The clarity on these concepts will help us understand the rest of the web
components better. Let’s have a look one by one.
Web Architecture Quiz - Part 2

This lesson contains a quiz to test your understanding of the REST API & the HTTP mechanisms.

Let’s Test Your Understanding Of the REST API & the HTTP mechanisms

1 Why should we implement a REST API in our application? Which of the following option(s) are correct?

What Is Scalability?

This lesson is an introduction to scalability.

WE'LL COVER THE FOLLOWING

• What is Scalability?
• What Is Latency?
• Measuring Latency
• Network Latency
• Application Latency
• Why Is Low Latency So Important For Online Services?

I am pretty sure, being in the software development universe, you've come across this word many times: scalability. What is it? Why is it so important? Why is everyone talking about it? What are your plans or contingencies to scale when your app or platform experiences significant traffic growth?

This chapter is a deep dive into scalability. It covers the frequently asked questions about it, such as: what does scalability mean in the context of web applications, distributed systems or cloud computing?

So, without further ado, let's get started.

What is Scalability? #
Scalability means the ability of the application to handle & withstand an increased workload without sacrificing latency.

For instance, if your app takes x seconds to respond to a user request, it should take the same x seconds to respond to each of a million concurrent user requests on your app.
The backend infrastructure of the app should not crumble under a load of a
million concurrent requests. It should scale well when subjected to a heavy
traffic load & should maintain the latency of the system.

What Is Latency? #
Latency is the amount of time a system takes to respond to a user request.
Let’s say you send a request to an app to fetch an image & the system takes 2
seconds to respond to your request. The latency of the system is 2 seconds.

Minimum latency is what efficient software systems strive for. No matter how
much the traffic load on a system builds up, the latency should not go up. This
is what scalability is.

If the latency remains the same, we can say yeah, the application scaled well
with the increased load & is highly scalable.

Let's think of scalability in terms of Big-O notation. Ideally, the complexity of a system or an algorithm should be O(1), which is constant time, like in a key-value database.

A program with the complexity of O(n^2) where n is the size of the data set is
not scalable. As the size of the data set increases the system will need more
computational power to process the tasks.
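To make the Big-O point concrete, here is a small illustrative contrast (the data & the duplicate-counting function are made up for the example): a dictionary lookup behaves like a key-value store's O(1) read, while a pairwise comparison is O(n^2) & demands ever more compute as the data set grows:

```python
# O(1): a key-value lookup costs the same no matter how large the store is
kv_store = {f"user:{i}": i for i in range(100_000)}
value = kv_store["user:99999"]        # one hash lookup, not a scan

# O(n^2): comparing every pair grows quadratically with the data set,
# so this approach will not scale as the input grows
def count_duplicate_pairs(items):
    count = 0
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                count += 1
    return count

pairs = count_duplicate_pairs([1, 2, 1, 2, 1])  # 4 duplicate pairs
```

Doubling the input roughly quadruples the work in the second function, while the dictionary lookup stays flat.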
So, how do we measure latency?

Measuring Latency #
Latency is measured as the time difference between the action the user takes on the website (an event like the click of a button) & the system's response in reaction to that event.

This latency is generally divided into two parts:

1. Network Latency
2. Application Latency

Network Latency #
Network Latency is the amount of time that the network takes for sending a
data packet from point A to point B. The network should be efficient enough
to handle the increased traffic load on the website. To cut down the network
latency, businesses use CDN & try to deploy their servers across the globe as
close to the end-user as possible.

Application Latency #
Application Latency is the amount of time the application takes to process a
user request. There are more than a few ways to cut down the application
latency. The first step is to run stress & load tests on the application & scan for
the bottlenecks that slow down the system as a whole. I’ve talked more about
it in the upcoming lesson.
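Measuring application latency can be sketched with a monotonic clock around a request handler (the handler here is a hypothetical stand-in):

```python
import time

def measure_latency(handler):
    """Measure application latency: the gap between receiving a request
    and producing the response. `handler` is a stand-in for a real
    request handler."""
    start = time.monotonic()          # monotonic clock won't jump
    response = handler()
    return response, time.monotonic() - start

resp, latency_secs = measure_latency(lambda: "image-bytes")
print(f"served in {latency_secs * 1000:.3f} ms")
```

Load-testing tools apply the same idea at scale, aggregating these per-request timings into percentiles under heavy traffic.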

Why Is Low Latency So Important For Online Services? #
Latency plays a major role in determining whether an online business wins or loses a customer. Nobody likes to wait for a response on a website. There is a well-known saying: if you want to test a person's patience, give them a slow internet connection 😊

If the visitor gets the response within a stipulated time, great; otherwise, they bounce off to another website.

Numerous market research studies show that high latency in applications is a big factor in customers bouncing off a website. If there is money involved, zero latency is what businesses want, if only that were possible.

Think of massively multiplayer online (MMO) games: a slight lag in an in-game event ruins the whole experience. A gamer with a high-latency internet connection will have a slow response time despite having the best reaction time of all the players in an arena.

Algorithmic trading services need to process events within milliseconds. Fintech companies have dedicated networks to run low-latency trading; the regular network just won't cut it.

We can realize the importance of low latency from the fact that in 2011, Huawei & Hibernia Atlantic started laying a fibre-optic cable across the Atlantic Ocean between London & New York, at an estimated cost of approximately $300M, just to save traders 6 milliseconds of latency.
Types Of Scalability

In this lesson, we will explore the two types of scaling: Vertical and Horizontal Scaling.

WE'LL COVER THE FOLLOWING

• What is Vertical Scaling?
• What is Horizontal Scaling?
• Cloud Elasticity

An application to scale well needs solid computing power. The servers should
be powerful enough to handle increased traffic loads.

There are two ways to scale an application:

1. Vertical Scaling
2. Horizontal Scaling

What is Vertical Scaling? #


Vertical scaling means adding more power to your server. Let's say your app is hosted on a server with 16 gigs of RAM; to handle increased load, you increase the RAM to 32 gigs. You have vertically scaled the server.

Ideally, when the traffic starts to build up on your app, the first step should be to scale vertically. Vertical scaling is also called scaling up.

In this type of scaling, we increase the power of the hardware running the app. This is the simplest way to scale since it doesn't require any code refactoring or complex configuration. I'll discuss further down the lesson why code refactoring is required when we horizontally scale the app.

But there is only so much we can do when scaling vertically. There is a limit to
the capacity we can augment for a single server.

A good analogy would be to think of a multi-storey building: we can keep adding floors to it, but only up to a certain point. What if the number of people in need of a flat keeps rising? We can't scale up the building to the moon, for obvious reasons.

Now is the time to build more buildings. This is where Horizontal Scalability
comes in.

When the traffic is just too much to be handled by a single machine, we bring in more servers to work together.
What is Horizontal Scaling? #
Horizontal scaling, also known as scaling out, means adding more hardware
to the existing hardware resource pool. This increases the computational
power of the system as a whole.

Now the increased traffic influx can be easily dealt with by the increased computational capacity, & there is literally no limit to how much we can scale horizontally, assuming we have infinite resources. We can keep adding servers after servers, setting up data centres after data centres.

Horizontal scaling also provides us with the ability to dynamically scale in real-time as the traffic on our website increases & decreases over a period of time, as opposed to vertical scaling, which requires pre-planning & a stipulated time to be pulled off.

Cloud Elasticity #
The biggest reason why cloud computing got so popular in the industry is the
ability to scale up & down dynamically. The ability to use & pay only for the
resources required by the website became a trend for obvious reasons.

If the site has a heavy traffic influx, more server nodes get added; when it doesn't, the dynamically added nodes are removed.

This approach saves businesses bags of money every single day. The approach
is also known as cloud elasticity. It indicates the stretching & returning to the
original infrastructural computational capacity.
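The elasticity rule described above can be sketched as a toy autoscaling policy. The thresholds below are illustrative assumptions, not values from any real cloud provider:

```python
def autoscale(nodes, cpu_load, scale_up_at=0.75, scale_down_at=0.25,
              min_nodes=1):
    """Toy cloud-elasticity rule: add a node under heavy load, remove
    one when the fleet is mostly idle, never dropping below a minimum."""
    if cpu_load > scale_up_at:
        return nodes + 1              # traffic influx: scale out
    if cpu_load < scale_down_at and nodes > min_nodes:
        return nodes - 1              # idle: return capacity, save money
    return nodes

print(autoscale(nodes=4, cpu_load=0.90))  # 5
print(autoscale(nodes=4, cpu_load=0.10))  # 3
print(autoscale(nodes=1, cpu_load=0.10))  # 1 (never below the minimum)
```

Real autoscalers add cooldown periods and averaging windows so a momentary spike doesn't cause the fleet to flap between sizes.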

Having multiple server nodes on the backend also helps with the website
staying alive online all the time even if a few server nodes crash. This is
known as High Availability. We’ll get to that in the upcoming lessons.
Which Scalability Approach Is Right For Your App?

In this lesson, we will learn about which type of scaling is better for a given scenario.

WE'LL COVER THE FOLLOWING

• Pros & Cons of Vertical & Horizontal Scaling
• What about the code? Why does the code need to change when it has to run on multiple machines?
• Which Scalability Approach Is Right for Your App?

Pros & Cons of Vertical & Horizontal Scaling #


This is the part where I talk about the pluses & minuses of both approaches.

Vertical scaling, for obvious reasons, is simpler in comparison to scaling horizontally, as we do not have to touch the code or make any complex distributed-system configurations. It takes much less administrative, monitoring & management effort than managing a distributed environment.

A major downside of vertical scaling is availability risk. The servers are powerful but few in number; there is always a risk of them going down & the entire website going offline, which doesn't happen when the system is scaled horizontally. It becomes more highly available.

What about the code? Why does the code need to change when it has to run on multiple machines? #
If you need to run the code in a distributed environment, it needs to be
stateless. There should be no state in the code. What do I mean by that?

No static instances in the class. Static instances hold application data, & if a particular server goes down, all the static data/state is lost; the app is left in an inconsistent state.

Rather, use persistent memory like a key-value store to hold the data & remove all the state/static variables from the class. This is why functional programming got so popular with distributed systems: the functions don't retain any state.
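The contrast can be sketched as follows. The in-memory dict below is a hypothetical stand-in for an external key-value store like Redis:

```python
# Stateful (won't scale out): a class-level counter lives inside one
# server's memory and is lost forever if that node dies.
class StatefulCounter:
    count = 0                          # state trapped in this process
    def hit(self):
        StatefulCounter.count += 1
        return StatefulCounter.count

# Stateless: every node reads/writes the same external key-value store,
# so any node can serve any request and nodes can die safely.
def hit(store, key="page:hits"):
    store[key] = store.get(key, 0) + 1
    return store[key]

shared_store = {}                      # stand-in for Redis etc.
hit(shared_store)
hit(shared_store)
print(shared_store)  # {'page:hits': 2}
```

Because `hit` keeps nothing in the process, a load balancer can route each call to a different server node without losing the count.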

Always have a ballpark estimate in mind when designing your app: how much traffic will it have to deal with?

Development teams today are adopting a distributed microservices architecture right from the start, & the workloads are meant to be deployed on the cloud. So, inherently, the workloads are horizontally scaled out on the fly.

The upsides of horizontal scaling include no limit to augmenting the hardware capacity. Data is replicated across different geographical regions as nodes & data centres are set up across the globe.

I’ll discuss cloud, serverless and microservices in the upcoming lessons. So,
stay tuned.

Which Scalability Approach Is Right for Your App? #
If your app is a utility or a tool expected to receive minimal, consistent traffic, it may not be mission-critical; for instance, an internal tool of an organization or something similar.

Why bother hosting it in a distributed environment? A single server is enough to manage the traffic; go ahead with vertical scaling when you know the traffic load will not increase significantly.

If your app is a public-facing social app, like a social network, a fitness app or something similar, then the traffic is expected to spike exponentially in the near future. Both high availability & horizontal scalability are important to you.

Build to deploy it on the cloud & always have horizontal scalability in mind
right from the start.
Primary Bottlenecks that Hurt the Scalability Of Our
Application

WE'LL COVER THE FOLLOWING

• Database
• Application Architecture
• Not Using Caching In the Application Wisely
• Inefficient Configuration & Setup of Load Balancers
• Adding Business Logic to the Database
• Not Picking the Right Database
• At the Code Level

There are several points in a web application which can become a bottleneck
& can hurt the scalability of our application. Let’s have a look at them.

Database #
Consider that, we have an application that appears to be well architected.
Everything looks good. The workload runs on multiple nodes & has the ability
to horizontally scale.

But the database is a poor single monolith: just one server given the onus of handling the data requests from all the server nodes of the workload.

This scenario is a bottleneck. The server nodes work well & handle millions of requests at a time efficiently; still, the response time of these requests & the latency of the application are very high due to the presence of a single database. There is only so much it can handle.

Just like the workload, the database needs to scale well. Make wise use of database partitioning & sharding, and use multiple database servers to make the module efficient.

Application Architecture #
A poorly designed application’s architecture can become a major bottleneck as
a whole.

A common architectural mistake is not using asynchronous processes & modules wherever required; instead, all the processes are scheduled sequentially.

For instance, if a user uploads a document on the portal, tasks such as sending
a confirmation email to the user, sending a notification to all of the
subscribers/listeners to the upload event should be done asynchronously.

These tasks should be forwarded to a messaging server as opposed to doing it all sequentially & making the user wait for everything.
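The upload flow described above can be sketched with the standard-library queue standing in for a real messaging server (task names & the document are illustrative):

```python
from queue import Queue

def handle_upload(doc, task_queue):
    """Respond to the user immediately; push the slow side tasks
    (emails, subscriber notifications) onto a queue so async workers
    can process them later."""
    task_queue.put(("send_confirmation_email", doc))
    task_queue.put(("notify_subscribers", doc))
    return "upload accepted"       # the user never waits on side tasks

q = Queue()
status = handle_upload("report.pdf", q)
print(status, "| pending tasks:", q.qsize())
```

In production, the queue would be an external broker (RabbitMQ, Kafka, etc.) so the pending tasks survive a crash of the web server.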

Not Using Caching In the Application Wisely #


Caching can be deployed at several layers of the application & it speeds up the
response time by notches. It intercepts all the requests going to the database,
reducing the overall load on it.

Use caching exhaustively throughout the application to speed things up significantly.

Inefficient Configuration & Setup of Load Balancers #
Load balancers are the gateway to our application. Using too many or too few
of them impacts the latency of our application.

Adding Business Logic to the Database #


No matter what justification anyone provides, I’ve never been a fan of adding
business logic to the database.
The database is just not the place to put business logic. Not only does it make the whole application tightly coupled, it also puts unnecessary load on the database.

Imagine, when migrating to a different database, how much code refactoring it would require.

Not Picking the Right Database #


Picking the right database technology is vital for businesses. Need transactions & strong consistency? Pick a relational database. If you can do without strong consistency but need horizontal scalability on the fly, pick a NoSQL database.

Trying to pull things off with an unsuitable tech always has a profound negative impact on the latency of the entire application.

At the Code Level #


This shouldn't come as a surprise, but inefficient & badly written code has the potential to take down the entire service in production. This includes:

Using unnecessary loops, nested loops.
Writing tightly coupled code.
Not paying attention to the Big-O complexity while writing the code. Be ready to do a lot of firefighting in production.

If a few of the things in this lesson are not clear to you, such as strong consistency, how a message queue provides asynchronous behaviour, or how to pick the right database, don't worry. I'll discuss all that in the upcoming lessons; stay tuned.
How To Improve & Test the Scalability Of Our
Application?

In this lesson, we will learn how we can improve & test the scalability of our application.

WE'LL COVER THE FOLLOWING

• Tuning The Performance Of The Application – Enabling It To Scale Better
• Profiling
• Caching
• CDN (Content Delivery Network)
• Data Compression
• Avoid Unnecessary Client-Server Requests
• Testing the Scalability Of Our Application

Here are some of the common & best strategies to fine-tune the performance of our web application. A performance-optimized application can withstand more traffic load with less resource consumption than an application that is not optimized for performance.

Now you might be thinking: why am I talking about performance when I should be talking about scalability?

Well, the application's performance is directly proportional to its scalability. If an application is not performant, it will certainly not scale well. These best practices can be implemented even before the real pre-production testing is done on the application.

So, here we go.

Tuning The Performance Of The Application – Enabling It To Scale Better #
Pro ling #

Profile the hell out of your application. Run an application profiler & a code profiler. See which processes are taking too long or eating up too many resources. Find the bottlenecks & get rid of them.

Profiling is the dynamic analysis of our code. It helps us measure the space and time complexity of our code & enables us to figure out issues like concurrency errors, memory errors & the robustness & safety of the program. This Wikipedia resource contains a good list of performance analysis tools used in the industry.
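As a small taste of profiling in practice, Python's built-in `cProfile` module can report where time is spent (the `slow` function is a made-up workload for the example):

```python
import cProfile
import io
import pstats

def slow():
    """Made-up workload standing in for a real bottleneck."""
    return sum(i * i for i in range(10_000))

buf = io.StringIO()
profiler = cProfile.Profile()
profiler.enable()
slow()
profiler.disable()

# Print the 3 most expensive calls, sorted by cumulative time
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(3)
report = buf.getvalue()
print(report)
```

Reading such a report top-down is usually the quickest way to find the handful of functions responsible for most of the latency.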

Caching #
Cache wisely. Cache everywhere. Cache all the static content. Hit the database
only when it is really required. Try to serve all the read requests from the
cache. Use a write-through cache.
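A write-through cache can be sketched in a few lines. The plain dict `db` below is a hypothetical stand-in for the real database:

```python
class WriteThroughCache:
    """Minimal write-through cache sketch: every write goes to both the
    cache and the database, so the two never diverge; reads are served
    from the cache whenever possible."""
    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value                  # cache stays fresh
        self.db[key] = value                     # DB stays durable

    def read(self, key):
        if key in self.cache:
            return self.cache[key]               # cache hit: no DB trip
        value = self.cache[key] = self.db[key]   # miss: fill from DB
        return value

db = {"user:2": "Bob"}                           # stand-in database
c = WriteThroughCache(db)
c.write("user:1", "Alice")
print(c.read("user:1"), c.read("user:2"))
```

The trade-off of write-through is a slightly slower write (two stores are updated) in exchange for reads that almost never touch the database.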

CDN (Content Delivery Network) #


Use a CDN. Using a CDN further reduces the latency of the application due to
the proximity of the data from the requesting user.

Data Compression #
Compress data. Use appropriate compression algorithms to compress data &
store data in the compressed form. Since compressed data consumes less
bandwidth, the client can download it faster.
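A quick sketch of this using Python’s built-in gzip module; the payload here is made-up repetitive data standing in for a typical API response.

```python
import gzip

# Repetitive payload, e.g. a response carrying many similar records.
payload = b'{"items": [' + b'{"id": 1, "name": "widget"},' * 200 + b'{}]}'
compressed = gzip.compress(payload)

# Compressed data consumes less bandwidth; the client decompresses
# it after download, recovering the original bytes exactly.
restored = gzip.decompress(compressed)
```

Repetitive, text-heavy data like JSON or HTML typically compresses to a small fraction of its original size, which is why web servers gzip responses by default.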

Avoid Unnecessary Client Server Requests #

Avoid unnecessary round trips between the client & server. Try to club
multiple requests into one.

These are a few of the things we should keep in mind regarding the
performance of the application.

Testing the Scalability Of Our Application #


Once we are done with the basic performance testing of the application, it is
time for capacity planning, provisioning the right amount of hardware &
computing power.
The right approach for testing the application for scalability largely depends
on the design of our system. There is no definite formula for that. Testing can
be performed at both the hardware and the software level. Different services
& components need to be tested both individually and collectively.

During the scalability testing, different system parameters are taken into
account such as the CPU usage, network bandwidth consumption, throughput,
the number of requests processed within a stipulated time, latency, memory
usage of the program, end-user experience when the system is under heavy load
etc.

In this testing phase, simulated traffic is routed to the system, to study how the
system behaves under the heavy load, how the application scales under the
heavy load. Contingencies are planned for unforeseen situations.

As per the anticipated traffic, appropriate hardware & computational power
is provisioned to handle the traffic smoothly, with some buffer.

Several load & stress tests are run on the application. Tools like JMeter are
pretty popular for running concurrent user tests on the application if you are
working in the Java ecosystem. There are a lot of cloud-based testing tools
available that help us simulate test scenarios with just a few mouse clicks.
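For a rough idea of what such a test does, here is a minimal concurrent-load sketch in Python. `fake_request` is a stand-in for a real call to the system under test; a real load test would hit an actual endpoint and collect far more metrics.

```python
import concurrent.futures
import time

def fake_request(i):
    # Stand-in for one call to the system under test.
    start = time.perf_counter()
    time.sleep(0.01)  # simulated service latency
    return time.perf_counter() - start

# Simulate 50 concurrent users and collect per-request latencies.
with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
    latencies = list(pool.map(fake_request, range(50)))

# A rough 95th-percentile latency, a common scalability metric.
p95 = sorted(latencies)[int(len(latencies) * 0.95)]
```

Tools like JMeter automate exactly this pattern at much larger scale, adding ramp-up schedules, assertions and reporting on top.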

Businesses test for scalability all the time to get their systems ready to handle
a traffic surge. A sports website would prepare itself for the day of a big
sports event; an e-commerce website would ready itself for the festival
season.

Read how production engineers support global events on Facebook.

Also, read how Hotstar, a video streaming service, scaled with over 10 million
concurrent users.

In the industry, tools like cAdvisor, Prometheus and Grafana are pretty
popular for tracking systems via web-based dashboards. I’ve written an
article on it in case you want to read more about pre-production monitoring.
Scalability Quiz

This lesson contains a quiz to test your understanding of scalability.

Let’s Test Your Understanding Of Scalability

1 Which of the following statements is true in the context of latency &
scalability?

What Is High Availability?

In this lesson, we will learn about high availability and its importance in online services.

WE'LL COVER THE FOLLOWING

• What Is High Availability?


• How Important Is High Availability To Online Services?

Highly available computing infrastructure is the norm in the computing
industry today. This is especially true of cloud platforms, where high
availability is the key feature that enables the workloads running on them to
stay up.

This lesson is an insight into high availability. It covers all the frequently asked
questions about it such as:

What is it?
Why is it so important to businesses?
What is a highly available cluster?
How do cloud platforms ensure high availability of the services running
on them?
What is fault tolerance & redundancy? How are they related to high
availability?

So, without any further ado, let’s get on with it.

What Is High Availability? #

High availability, also known as HA, is the ability of a system to stay
online despite failures at the infrastructure level, in real time.

High availability ensures that the service stays up far beyond its normal
uptime. It improves the reliability of the system & ensures minimum
downtime.
The sole mission of highly available systems is to stay online & stay connected.
A very basic example of this is having back-up generators to ensure
continuous power supply in case of any power outages.

In the industry, HA is often expressed as a percentage. For instance, when a
system is 99.99999% highly available, it simply means the service will be up
for 99.99999% of the total hosting time. You will often see this in the SLAs
(Service Level Agreements) of cloud platforms.
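The availability percentage translates directly into a downtime budget. A quick calculation of the minutes of downtime an SLA allows per year:

```python
def max_downtime_minutes_per_year(availability_percent):
    """Downtime budget implied by an availability SLA."""
    minutes_per_year = 365 * 24 * 60  # 525,600 minutes
    return minutes_per_year * (1 - availability_percent / 100)

# "Four nines" (99.99%) allows roughly 52.6 minutes of downtime a year.
four_nines = max_downtime_minutes_per_year(99.99)
```

Each extra nine shrinks the budget tenfold, which is why very high availability targets get expensive quickly.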

How Important Is High Availability To Online Services? #
It might not impact businesses much if a social application goes down for a
bit & then bounces back. However, there are mission-critical systems like
aircraft systems, spacecraft, mining machines, hospital servers & finance
stock market systems that just cannot afford to go down at any time. After all,
lives depend on them.

The smooth functioning of mission-critical systems relies on continual
connectivity with their networks/servers. These are the instances where we
just cannot do without super highly available infrastructure.

Besides, no service likes to go down, critical or not.

To meet the high availability requirements, systems are designed to be
fault-tolerant & their components are made redundant.

What are fault tolerance & redundancy in system design? I’ll discuss them
next.


Reasons For System Failures

In this lesson, we will discuss the common reasons for system failure.

WE'LL COVER THE FOLLOWING

• Software Crashes
• Hardware Failures
• Human Errors
• Planned Downtime

Before delving into the HA system design, fault-tolerance and redundancy. I’ll
first talk about the common reasons why systems fail.

Software Crashes #
I am sure you are pretty familiar with software crashes. Applications crash all
the time, be it on a mobile phone or a desktop.

Corrupt software files. Remember the BSOD (blue screen of death) in
Windows? OS crashes, memory-hogging unresponsive processes. Likewise,
software running on cloud nodes crashes unpredictably & takes down the
entire node with it.

Hardware Failures #
Another reason for system failure is hardware crashes. Overloaded CPU, RAM,
hard disk failures, nodes going down. Network outages.

Human Errors #
This is the biggest reason for system failures. Flawed configurations & stuff.

Google made a tiny network configuration error & it took down almost half of
the internet in Japan. This is an interesting read.

Planned Downtime #
Besides the unplanned crashes, there are planned down times which involve
routine maintenance operations, patching of software, hardware upgrades
etc.

These are the primary reasons for system failures, now let’s talk about how
HA systems are designed to overcome these scenarios of system downtime.
Achieving High Availability - Fault Tolerance

In this lesson, we will learn about fault tolerance & designing a HA fault tolerant service.

WE'LL COVER THE FOLLOWING

• What is Fault Tolerance?


• Designing A Highly Available Fault-Tolerant Service – Architecture

There are several approaches to achieve HA. The most important of them is to
make the system fault-tolerant.

What is Fault Tolerance? #

Fault tolerance is the ability of the system to stay up despite taking hits.

A fault-tolerant system is equipped to handle faults. Being fault-tolerant is an
essential element in designing life-critical systems.

A few of the many instances/nodes running the service go offline & bounce
back all the time. In the case of these internal failures, the system may work
at a reduced level, but it will not go down entirely.

A very basic example of a fault-tolerant system is a social networking
application. In the case of backend node failures, a few services of the app,
such as image upload, post likes etc., may stop working. But the application as
a whole will still be up. This approach is also technically known as Fail Soft.

Designing A Highly Available Fault-Tolerant Service – Architecture #
To achieve high availability at the application level, the entire massive service
is architecturally broken down into smaller, loosely coupled services called
microservices.

There are many upsides to splitting a big monolith into several microservices,
as it provides:

Easier management
Easier development
Ease of adding new features
Ease of maintenance
High availability

Every microservice takes the onus of running different features of an
application, such as image upload, comments, instant messaging etc.

So, even if a few services go down, the application as a whole is still up.
Redundancy

In this lesson, we will learn about Redundancy as a High Availability mechanism.

WE'LL COVER THE FOLLOWING

• Redundancy – Active-Passive HA Mode


• Getting Rid Of Single Points Of Failure
• Monitoring & Automation

Redundancy – Active-Passive HA Mode #

Redundancy is duplicating the components or instances & keeping them
on standby to take over in case the active instances go down. It’s the
fail-safe, backup mechanism.
In the above diagram, you can see the instances active & on standby. The
standby instances take over in case any of the active instances goes down.

This approach is also known as the Active-Passive HA mode. An initial set of
nodes is active & a set of redundant nodes is passive, on standby. Active
nodes get replaced by passive nodes in case of failures.

There are systems like GPS, aircraft & communication satellites which have
zero downtime. The availability of these systems is ensured by making their
components redundant.
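The active-passive idea can be sketched in a few lines of Python. The node names and health flags below are made up for illustration; real failover relies on health checks over a heartbeat network.

```python
class FailoverPair:
    """Active-passive sketch: the standby takes over when the
    active instance is detected as down."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def handle(self, request):
        if not self.active["healthy"]:
            # Fail over: promote the standby node to active.
            self.active, self.standby = self.standby, self.active
        return f'{self.active["name"]} handled {request}'

pair = FailoverPair({"name": "node-a", "healthy": True},
                    {"name": "node-b", "healthy": True})
pair.active["healthy"] = False  # simulate the active node crashing
result = pair.handle("req-1")
```

After the simulated crash, the request is served by the promoted standby, so the service stays up.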

Getting Rid Of Single Points Of Failure #

Distributed systems became popular largely because, with them, we could get
rid of the single points of failure present in a monolithic architecture.

A large number of distributed nodes work in conjunction with each other to
achieve a single synchronous application state.

When so many redundant nodes are deployed, there are no single points of
failure in the system. In case a node goes down redundant nodes take its
place. Thus, the system as a whole remains unimpacted.

Single points of failure at the application level mean bottlenecks. We should
detect bottlenecks in performance testing & get rid of them as soon as we can.

Monitoring & Automation #

Systems should be well monitored in real-time to detect any bottlenecks or
single points of failure. Automation enables the instances to self-recover
without any human intervention; it gives the instances the power of
self-healing.

Also, the systems become intelligent enough to add or remove instances on
the fly as per the requirements.

Since the most common cause of failures is human error, automation helps
cut down failures to a large extent.
Replication

In this lesson, we will learn about Replication as a High Availability mechanism.

WE'LL COVER THE FOLLOWING

• Replication – Active-Active HA Mode


• Geographical Distribution of Workload

Replication – Active-Active HA Mode #

Replication means having a number of similar nodes running the workload
together. There are no standby or passive instances. When a single node or a
few nodes go down, the remaining nodes bear the load of the service. Think of
this as load balancing.
This approach is also known as the Active-Active High Availability mode. In
this approach, all the components of the system are active at any point in
time.
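A minimal sketch of the active-active idea, where every healthy replica serves traffic and requests simply skip unhealthy nodes. The replica names are hypothetical.

```python
import itertools

class ActiveActivePool:
    """All replicas serve traffic; requests skip unhealthy nodes."""

    def __init__(self, nodes):
        self.nodes = nodes
        self._cycle = itertools.cycle(nodes)  # round-robin over replicas

    def handle(self, request):
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node["healthy"]:
                return f'{node["name"]} handled {request}'
        raise RuntimeError("no healthy replicas left")

pool = ActiveActivePool([{"name": "r1", "healthy": True},
                         {"name": "r2", "healthy": False},
                         {"name": "r3", "healthy": True}])
served = [pool.handle(f"req-{i}") for i in range(3)]
```

With `r2` down, the remaining replicas transparently absorb its share of the load, which is exactly the load-balancing behaviour described above.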

Geographical Distribution of Workload #

As a contingency for natural disasters, regional data centre power outages &
other large-scale failures, workloads are spread across different data centres
around the world, in different geographical zones.

This eliminates the data centre as a single point of failure. Also, latency is
reduced by quite an extent due to the proximity of data to the user.

All the highly available, fault-tolerant design decisions depend on how
critical the system is, the odds that its components will fail, etc.

Businesses often use multi-cloud platforms to deploy their workloads, which
ensures further availability. If things go south with one cloud provider, they
have another to fail over to.
High Availability Clustering

In this lesson, we will learn about High Availability Clustering.

Now, that we have a clear understanding of high availability, let’s talk a bit
about the high availability cluster.

A High Availability cluster, also known as a Fail-Over cluster, contains a set of
nodes running in conjunction with each other that ensures high availability
of the service.

The nodes in the cluster are connected by a private network called the
Heartbeat network that continuously monitors the health and the status of
each node in the cluster.

A single state across all the nodes in a cluster is achieved with the help of a
shared distributed memory & a distributed co-ordination service like the
Zookeeper.
To ensure availability, HA clusters use several techniques such as disk
mirroring/RAID (Redundant Array Of Independent Disks), redundant network
connections, redundant electrical power etc. The network connections are
made redundant so that if the primary network goes down, the backup
network takes over.

Multiple HA clusters run together in one geographical zone, ensuring
minimum downtime & continual service.

Alright, so now we have a pretty good understanding of scalability and high
availability. These two concepts are crucial to software system design.

Moving on to the next chapter, where we discuss monolithic & microservices
architecture.
High Availability Quiz

This lesson contains a quiz to test your understanding of high availability.

Let’s Test Your Understanding Of High Availability

1 Which of the following statements is true in the context of scalability & high
availability?

What Is A Monolithic Architecture?

In this lesson, we will discuss the Monolithic Architecture.

WE'LL COVER THE FOLLOWING

• What Is A Monolithic Architecture?

What Is A Monolithic Architecture? #

An application has a monolithic architecture if it contains the entire
application code in a single codebase.

A monolithic application is a self-contained, single-tiered software
application, unlike the microservices architecture, where different modules
are responsible for running respective tasks and features of an app.

The diagram below represents a monolithic architecture:


In a monolithic web app, all the different layers of the app (UI, business, data
access etc.) are in the same codebase.

We have the Controller, then the Service Layer interface & the class
implementations of the interface; the business logic goes in the Object
Domain model, and a bit into the Service, Business and Repository/DAO
(Data Access Object) classes.

Monolithic apps are simple to build, test & deploy in comparison to a
microservices architecture.

There are times during the initial stages of a business when teams choose to
move forward with a monolithic architecture & intend to branch out into a
distributed, microservices architecture later.

Well, this decision has several trade-offs. And there is no standard solution to
this.

In the present computing landscape, the applications are being built &
deployed on the cloud. A wise decision would be to pick the loosely coupled
stateless microservices architecture right from the start if you expect things to
grow at quite a pace in the future.
Because re-writing code has its costs. Stripping things down & re-writing
them in a tightly coupled architecture demands a lot of resources & time.

On the flip side, if your requirements are simple why bother writing a
microservices architecture? Running different modules in conjunction with
each other isn’t a walk in the park.

Let’s go through some of the pros and cons of monolithic architecture.


When Should You Pick a Monolithic Architecture?

In this lesson, we will learn about the pros and cons of a Monolithic Architecture & when to choose it for our
project.

WE'LL COVER THE FOLLOWING

• Pros Of Monolithic Architecture


• Simplicity
• Cons Of Monolithic Architecture
• Continuous Deployment
• Regression Testing
• Single Points Of Failure
• Scalability Issues
• Cannot Leverage Heterogeneous Technologies
• Not Cloud-Ready, Hold State
• When Should You Pick A Monolithic Architecture?

Pros Of Monolithic Architecture #


Simplicity #
Monolithic applications are simple to develop, test, deploy, monitor and
manage since everything resides in one repository.

There is no complexity of handling different components, making them work
in conjunction with each other, monitoring several different components &
so on. Things are simple.

Cons Of Monolithic Architecture #


Continuous Deployment #
Continuous deployment is a pain in case of monolithic applications as even a
minor code change in a layer needs a re-deployment of the entire application.

Regression Testing #
We need thorough regression testing of the entire application after a
deployment is done, as the layers are tightly coupled with each other. A
change in one layer impacts other layers significantly.

Single Points Of Failure #


Monolithic applications have a single point of failure. In case any of the layers
has a bug, it has the potential to take down the entire application.

Scalability Issues #
Flexibility and scalability are a challenge in monolith apps as a change in one
layer often needs a change and testing in all the layers. As the code size
increases, things might get a bit tricky to manage.

Cannot Leverage Heterogeneous Technologies #


Building complex applications with a monolithic architecture is tricky, as
using heterogeneous technologies together in a single codebase is difficult
due to compatibility issues.

It’s tricky to use Java & NodeJS together in a single codebase, & when I say
tricky, I am being generous. I am not sure if it’s even possible to do that.

Not Cloud-Ready, Hold State #


Generally, monolithic applications are not cloud-ready as they hold state in
static variables. To be cloud-native, to work smoothly & to be consistent on
the cloud, an application has to be distributed and stateless.

When Should You Pick A Monolithic Architecture? #
Monolithic applications fit best for use cases where the requirements are
pretty simple & the app is expected to handle a limited amount of traffic. One
example of this is an internal tax calculation app of an organization, or a
similar open public tool.
These are the use cases where the business is certain that there won’t be an
exponential growth in the user base and the traffic over time.

There are also instances where the dev teams decide to start with a monolithic
architecture and later scale out to a distributed microservices architecture.

This helps them deal with the complexity of the application step by step as
and when required. This is exactly what LinkedIn did.

In the next lesson, we will learn about the Microservice architecture.


What Is A Microservice Architecture?

In this lesson, we will learn about the Microservice Architecture.

WE'LL COVER THE FOLLOWING

• What Is A Microservices Architecture?

What Is A Microservices Architecture? #

In a microservices architecture, different features/tasks are split into
separate respective modules/codebases which work in conjunction with
each other, forming a large service as a whole.

Remember the Single Responsibility & the Separation of Concerns principles?
Both principles are applied in a microservices architecture.

This particular architecture facilitates easier & cleaner app maintenance,
feature development, testing & deployment in comparison to a monolithic
architecture.

Imagine accommodating every feature in a single repository. How complex
would things be? It would be a maintenance nightmare.

Also, since the project is large, it is expected to be managed by several
different teams. When modules are separate, they can be assigned to the
respective teams with minimum fuss, smoothing out the development
process.

And did I bring up scalability? To scale, we need to split things up. We need
to scale out when we can’t scale up further. Microservices architecture is
inherently designed to scale.
The diagram below represents a microservices architecture:

Every service ideally has a separate database; there are no single points of
failure & no system bottlenecks.

Let’s go through some of the pros and cons of using a microservices
architecture.
When Should You Pick A Microservices Architecture?

In this lesson, we will learn about the pros and cons of the Microservice Architecture & when should we pick it for
our project.

WE'LL COVER THE FOLLOWING

• Pros of Microservice Architecture


• No Single Points Of Failure
• Leverage the Heterogeneous Technologies
• Independent & Continuous Deployments
• Cons Of Microservices Architecture
• Complexities In Management
• No Strong Consistency
• When Should You Pick A Microservices Architecture?

Pros of Microservice Architecture #


No Single Points Of Failure #
Since microservices is a loosely coupled architecture, there is no single point
of failure. Even if a few of the services go down, the application as a whole is
still up.

Leverage the Heterogeneous Technologies #


The components interact with each other via a REST API Gateway interface.
They can leverage a polyglot persistence architecture & heterogeneous
technologies like Java, Python, Ruby, NodeJS etc. together.

Polyglot persistence means using multiple database types, like SQL & NoSQL,
together in an architecture. I’ll discuss it in detail in the database lesson.

Independent & Continuous Deployments #


The deployments can be independent and continuous. We can have dedicated
teams for every microservice, and each can be scaled independently without
impacting other services.

Cons Of Microservices Architecture #


Complexities In Management #
Microservices is a distributed environment, where there are so many nodes
running together. Managing & monitoring them gets complex.

We need to set up additional components to manage microservices, such as a
node manager like Apache Zookeeper, a distributed tracing service for
monitoring the nodes etc.

We need more skilled resources, maybe a dedicated team, to manage these
services.

No Strong Consistency #
Strong consistency is hard to guarantee in a distributed environment. Things
are eventually consistent across the nodes. And this limitation is due to the
distributed design.

I’ll discuss both Strong and eventual consistency in the database chapter.

When Should You Pick A Microservices Architecture? #
The microservices architecture fits best for complex use cases and for apps
which expect traffic to increase exponentially in the future, like a fancy social
network application.

A typical social networking application has various components such as
messaging, real-time chat, LIVE video streaming, image uploads, the Like &
Share features etc.

In this scenario, I would suggest developing each component separately,
keeping the Single Responsibility and the Separation of Concerns principles
in mind.

Writing every feature in a single codebase would quickly become a mess.

So, by now, in the context of monolithic and microservices, we have gone
through three approaches:

1. Picking a monolithic architecture
2. Picking a microservice architecture
3. Starting with a monolithic architecture and later scaling out into a
microservice architecture

Picking a monolithic or a microservice architecture largely depends on our
use case.

I suggest keeping things simple & having a thorough understanding of the
requirements. Get the lay of the land, build something only when you need it
& keep evolving the code iteratively. This is the right way to go.
Monolith & Microservices Quiz

This lesson contains a quiz to test your understanding of monoliths and microservices.

Let’s Test Your Understanding Of Monolithic & Microservices architecture

1 When should we use a monolithic architecture for our project? Which of
the following option(s) are correct?

Introduction & Types of Data

In this lesson, we will have an introduction to databases and the different types of data.

WE'LL COVER THE FOLLOWING

• What Is A Database?
• Structured Data
• Unstructured Data
• Semi-structured Data
• User state

What Is A Database? #
A database is a component required to persist data. Data can be of many
forms: structured, unstructured, semi-structured and user state data.

Let’s quickly have an insight into the classification of data before delving into
the databases.
Structured Data #
Structured data is the type of data which conforms to a certain structure,
typically stored in a database in a normalized fashion.

There is no need to run any sort of data preparation logic on structured data
before processing it. Direct interaction can be done with this kind of data.

An example of structured data would be the personal details of a customer
stored in a database row. The customer id would be of integer type, the name
would be of String type with a certain character limit etc.

So, with structured data, we know what we are dealing with. Since the
customer name is of String type, without much worry of errors or exceptions,
we can run String operations on it.

Structured data is generally managed by a query language such as SQL
(Structured Query Language).

Unstructured Data #
Unstructured data has no definite structure. It is generally the heterogeneous
type of data, comprising text, image files, video, multimedia files, pdfs, Blob
objects, word documents, machine-generated data etc.

This kind of data is often encountered in data analytics. Here the data streams
in from multiple sources such as IoT devices, social networks, web portals,
industry sensors etc. into the analytics systems.

We cannot just directly process unstructured data. The initial data is pretty
raw, we have to make it flow through a data preparation stage which
segregates it based on some business logic & then runs the analytics
algorithms on it.

Semi-structured Data #
Semi-structured data is a mix of structured & unstructured data. Semi-
structured data is often stored in data transport formats such as XML or JSON
and is handled as per the business requirements.
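Since semi-structured data is handled per business rules rather than a fixed schema, here is a short Python sketch of parsing a JSON record with optional fields. The record itself is made up for illustration.

```python
import json

# A semi-structured record: self-describing fields, no enforced schema.
raw = '{"id": 42, "name": "Alice", "tags": ["vip"], "address": null}'
record = json.loads(raw)

# Business logic decides how to handle optional or missing fields.
tags = record.get("tags", [])
city = (record.get("address") or {}).get("city", "unknown")
```

Unlike a relational row, nothing guarantees which fields are present; the application code itself supplies defaults and validation.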
User state #

The data containing the user state is the information of activity which the user
performs on the website.

For instance, when browsing through an e-commerce website, the user would
browse through several product categories, change the preferences, add a few
products to the reminder list for the price drops.

All this activity is the user state. So next time the user logs in, they can
continue from where they left off. It will not feel like they are starting fresh &
that all the previous activity is lost.

Storing user state improves the browsing experience & the conversion rate for
the business.
So, now we are clear on the different types of data. Let’s have a look into
different types of databases.

There are multiple different types of databases with specific use cases. We’ll
quickly go through each of them in order to have a bird’s eye view of the
database realm.
Relational Database

In this lesson, we will discuss the relational databases.

WE'LL COVER THE FOLLOWING

• What Is A Relational Database?


• What Are Relationships?
• Data Consistency
• ACID Transactions

What Is A Relational Database? #


This is the most common & widely used type of database in the industry. A
relational database saves data containing relationships. One to One, One to
Many, Many to Many, Many to One etc. It has a relational data model. SQL is
the primary data query language used to interact with relational databases.

MySQL is the most popular example of a relational database. Alright, I get it,
but what are relationships?

What Are Relationships? #


Let’s say you, as a customer, buy five different books from an online book
store. When you created an account on the book store, you would have been
assigned a customer id, say C1. Now C1 [you] is linked to five different books:
B1, B2, B3, B4, B5.

This is a one-to-many relationship. In the simplest of forms, one table will
contain the details of all the customers & another table will contain all the
products in the inventory.

One row in the customer table will correspond to multiple rows in the product
inventory table.
On pulling the user object with id C1 from the database we can easily find
what books C1 purchased via the relationship model.
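The customer-books relationship above can be sketched with an in-memory SQLite database; the table and column names here are illustrative, not a prescribed schema.

```python
import sqlite3

# In-memory DB sketching the one-to-many customer -> books relationship.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("""CREATE TABLE book (
                  id TEXT PRIMARY KEY,
                  title TEXT,
                  customer_id TEXT REFERENCES customer(id))""")
conn.execute("INSERT INTO customer VALUES ('C1', 'Alice')")
conn.executemany("INSERT INTO book VALUES (?, ?, 'C1')",
                 [(f"B{i}", f"Book {i}") for i in range(1, 6)])

# One row in customer corresponds to many rows in book.
rows = conn.execute("""SELECT b.id FROM book b
                       JOIN customer c ON b.customer_id = c.id
                       WHERE c.id = 'C1' ORDER BY b.id""").fetchall()
```

The JOIN pulls all five books linked to customer C1 via the foreign key, which is the relationship model in action.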

Data Consistency #
Besides, the relationships, relational databases also ensure saving data in a
normalized fashion. In very simple terms, normalized data means a unique
entity occurs in only one place/table, in its simplest and atomic form and is
not spread throughout the database.

This helps in maintaining the consistency of the data. In the future, if we
want to update the data, we update it in just that one place, and every fetch
operation gets the updated data.

Had the data been spread throughout the database in different tables, we
would have to update the new value of an entity everywhere. This is
troublesome, and things can get inconsistent.

ACID Transactions #
Besides normalization & consistency, relational databases also ensure ACID
transactions.

ACID – Atomicity, Consistency, Isolation, Durability.

An ACID transaction means that if a transaction occurs in the system, say a
financial transaction, either it will be executed to completion without
affecting any other processes or transactions, leaving the system in a new
state that is durable & consistent; or, if anything amiss happens during the
transaction, say a minor system failure, the entire operation is rolled back.

When a transaction happens, there is an initial state of the system State A &
then there is a final state of the system State B after the transaction. Both the
states are consistent and durable.

A relational database ensures that the system is in either State A or State B at
all times. There is no middle state. If anything fails, the system goes back to
State A.

If the transaction is executed smoothly, the system transitions from State A to
State B.
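The State A / State B behaviour can be demonstrated with SQLite, where a failure mid-transaction rolls the system back. The account table and amounts are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 0)")
conn.commit()  # State A: account A has 100, account B has 0

def transfer(conn, amount, fail_midway=False):
    # `with conn` wraps the statements in one atomic transaction:
    # commit on success, rollback on any exception.
    with conn:
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE id = 'A'",
            (amount,))
        if fail_midway:
            raise RuntimeError("simulated crash mid-transfer")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = 'B'",
            (amount,))

try:
    transfer(conn, 50, fail_midway=True)
except RuntimeError:
    pass  # the partial debit was rolled back

# The system is back in State A: no middle state survived.
balance_a = conn.execute(
    "SELECT balance FROM account WHERE id = 'A'").fetchone()[0]
```

Because the debit and the credit live in one atomic transaction, the simulated crash never leaves money missing from both accounts.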
When Should You Pick A Relational Database?

In this lesson, we will discuss when to choose a relational database for our project.

WE'LL COVER THE FOLLOWING

• Transactions & Data Consistency


• Large Community
• Storing Relationships
• Popular Relational Databases

If you are writing a stock trading, banking or finance-based app, or you need
to store a lot of relationships, for instance, when writing a social network like
Facebook, then you should pick a relational database. Here is why –

Transactions & Data Consistency #


If you are writing software that has anything to do with money or numbers,
transactions, ACID & data consistency become super important to you.

Relational DBs shine when it comes to transactions & data consistency. They
comply with the ACID rule, have been around for ages & are battle-tested.

Large Community #
Also, they have a larger community. Seasoned engineers on the tech are easily
available, you don’t have to go too far looking for them.

Storing Relationships #
If your data has a lot of relationships, like which friends of yours live in a
particular city, or which of your friends have already eaten at the restaurant
you plan to visit today, there is nothing better than a relational database for
storing this kind of data.

Relational databases are built to store relationships. They have been tried &
tested & are used by big guns in the industry, like Facebook, as the main
user-facing database.

Popular Relational Databases #


Some of the popular relational databases used in the industry are: MySQL, an
open-source relational database written in C, C++ that has been around since
1995.

Others are Microsoft SQL Server, a proprietary RDBMS written by Microsoft
in C, C++; PostgreSQL, an open-source RDBMS written in C; MariaDB; Amazon
Aurora; Google Cloud SQL etc.

Well, that’s all on the relational databases. Moving on to non-relational databases.
NoSQL Databases - Introduction

In this lesson, we will get an insight into NoSQL databases and how they are different from Relational databases.

WE'LL COVER THE FOLLOWING

• What Is A NoSQL Database?
• How Is A NoSQL Database Different From A Relational Database?
• Scalability
• Clustering

What Is A NoSQL Database? #

As the name implies, NoSQL databases have no SQL; they are more like JSON-based databases built for Web 2.0.

They are built for high-frequency reads & writes, typically required in social applications like Twitter, LIVE real-time sports apps, online massive multi-player games etc.

How Is A NoSQL Database Different From A Relational Database? #

Now, one obvious question that would pop up in our minds is:

Why the need for NoSQL databases when relational databases were doing
fine, were battle-tested, well adopted by the industry & had no major
persistence issues?

Scalability #
Well, one big limitation with SQL-based relational databases is scalability. Scaling relational databases is not trivial. They have to be Sharded or Replicated to make them run smoothly on a cluster. In short, this requires careful thought and human intervention.

On the contrary, NoSQL databases have the ability to add new server nodes on
the fly & continue the work, without any human intervention, just with a snap
of the fingers.

Today’s websites need fast reads and writes. There are millions, if not billions, of users connected with each other on social networks.

A massive amount of data is generated every micro-second, and we needed an infrastructure designed to manage this exponential growth.

Clustering #
NoSQL databases are designed to run intelligently on clusters. And when I say
intelligently, I mean with minimal human intervention.

Today, the server nodes even have self-healing capabilities. That’s pretty
smooth. The infrastructure is intelligent enough to self-recover from faults.

Though all this innovation does not mean old school relational databases
aren’t good enough & we don’t need them anymore.

Relational databases still work like a charm & are still in demand. They have a
specific use-case. We have already gone through this in When to pick a
relational database lesson. Remember? 😊

Also, NoSQL databases had to sacrifice Strong consistency, ACID Transactions & much more to scale horizontally over a cluster & across the data centres.

The data with NoSQL databases is more Eventually Consistent as opposed to being Strongly Consistent.

So, this obviously means NoSQL databases aren’t the silver bullet. And it’s completely alright, we don’t need silver bullets. We aren’t hunting werewolves, we are up to a much harder task: connecting the world online.

I’ll talk about the underlying design of NoSQL databases in much detail and
why they have to sacrifice Strong consistency and Transactions in the
upcoming lessons.

For now, let’s focus on some of the features of NoSQL databases.


Features Of NoSQL Databases

In this lesson, we will discuss the features of NoSQL databases.

WE'LL COVER THE FOLLOWING

• Pros Of NoSQL Databases
• Gentle Learning Curve
• Schemaless
• Cons Of NoSQL Databases
• Inconsistency
• No Support For ACID Transactions
• Conclusion
• Popular NoSQL Databases

In the introduction, we learned that NoSQL databases are built to run on clusters in a distributed environment, powering Web 2.0 websites.

Now, let’s go over some features of NoSQL databases.

Pros Of NoSQL Databases #


Besides the design part, NoSQL databases are also developer-friendly. What do
I mean by that?

Gentle Learning Curve #


First, the learning curve is less than that of relational databases. When
working with relational databases a big chunk of our time goes into learning
how to design well-normalized tables, setting up relationships, trying to
minimize joins and stuff.

Schemaless #
One needs to be pretty focused when designing the schema of a relational
database to avoid running into any issues in the future.

Think of relational databases as a strict headmaster. Everything has to be in place, neat and tidy, things need to be consistent. But NoSQL databases are a bit chilled out & relaxed.

There are no strict enforced schemas, work with the data as you want. You
can always change stuff, spread things around. Entities have no relationships.
Thus, things are flexible & you can do stuff your way.

Wonderful, right?

Not always!! This flexibility is good and bad at the same time. Being so
flexible, developer-friendly, having no joins and relationships etc. makes it
good.

Cons Of NoSQL Databases #


Inconsistency #
But it introduces a risk of entities becoming inconsistent. Since an entity is spread throughout the database, one has to update the new values of the entity at all places.

Failing to do so makes the entity inconsistent. This is not a problem with relational databases since they keep the data normalized. An entity resides at one place only.
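A toy sketch of that risk, with an author's name denormalized into two documents (all names hypothetical):

```python
# The same user's name is copied into two documents. Updating it in only
# one place leaves the entity inconsistent -- the application, not the
# database, must keep every copy in sync.
posts = [
    {"_id": 1, "author": {"id": 7, "name": "asha"}},
    {"_id": 2, "author": {"id": 7, "name": "asha"}},
]

posts[0]["author"]["name"] = "asha k."  # forgot to update the second copy

distinct_names = sorted({p["author"]["name"] for p in posts})
print(distinct_names)  # ['asha', 'asha k.'] -- one entity, two values
```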

No Support For ACID Transactions #


Also, NoSQL distributed databases don’t provide ACID transactions. A few that claim to, don’t support them globally. They are just limited to a certain entity hierarchy or a small region where they can lock down nodes to update them.

Note: Transactions in distributed systems come with terms and conditions applied.

Conclusion #
My first experience with a NoSQL datastore was with the Google Cloud
Datastore.

An upside I felt was that we don’t have to be a pro in database design to write
an application. Things were comparatively simpler, as there was no stress of
managing joins, relationships, n+1 query issues etc.

Just fetch the data with its Key. You can also call it the id of the entity. This is a constant-time O(1) operation, which makes NoSQL DBs really fast.

I have designed a lot of MySQL DB schemas in the past with complex relationships. And I would say working with a NoSQL database is a lot easier than working with relationships.

It’s alright if we need to make a few extra calls to the backend to fetch data in separate requests; that doesn’t make much of a difference. We can always cache the frequently accessed data to overcome that.

Popular NoSQL Databases #


Some of the popular NoSQL databases used in the industry are MongoDB,
Redis, Neo4J, Cassandra.

So, I guess, by now we have a pretty good idea of what NoSQL databases are.
Let’s have a look at some of the use cases which fit best with them.
When To Pick A NoSQL Database?

In this lesson, we will discover when to choose a NoSQL Database over any other kind of database.

WE'LL COVER THE FOLLOWING

• Handling A Large Number Of Read Write Operations
• Flexibility With Data Modeling
• Eventual Consistency Over Strong Consistency
• Running Data Analytics

Handling A Large Number Of Read Write Operations #

Look towards NoSQL databases when you need to scale fast. And when do you generally need to scale fast?

When there are a large number of read-write operations on your website & when dealing with a large amount of data, NoSQL databases fit best in these scenarios. Since they have the ability to add nodes on the fly, they can handle more concurrent traffic & a big amount of data with minimal latency.

Flexibility With Data Modeling #

The second cue is during the initial phases of development, when you are not sure about the data model or the database design, and things are expected to change at a rapid pace. NoSQL databases offer us more flexibility.

Eventual Consistency Over Strong Consistency #


It’s preferable to pick NoSQL databases when it’s OK for us to give up on
Strong consistency and when we do not require transactions.

A good example of this is a social networking website like Twitter. When a tweet of a celebrity blows up and everyone is liking and re-tweeting it from around the world, does it matter if the count of likes goes up or down a bit for a short while?

The celebrity would definitely not care if, instead of the actual 5,000,500 likes, the system shows the like count as 5,000,250 for a short while.

When a large application is deployed on hundreds of servers spread across the globe, the geographically distributed nodes take some time to reach a global consensus.

Until they reach a consensus, the value of the entity is inconsistent. The value
of the entity eventually gets consistent after a short while. This is what
Eventual Consistency is.

Though the inconsistency does not mean that there is any sort of data loss. It
just means that the data takes a short while to travel across the globe via the
internet cables under the ocean to reach a global consensus and become
consistent.

We experience this behaviour all the time. Especially on YouTube. Often you
would see a video with 10 views and 15 likes. How is this even possible?

It’s not. The actual views are already more than the likes. It’s just that the count of views is inconsistent and takes a short while to get updated. I will discuss Eventual consistency in more detail further down the course.

Running Data Analytics #


NoSQL databases also fit best for data analytics use cases, where we have to
deal with an influx of massive amounts of data.

There are dedicated databases for use cases like this such as Time-Series
databases, Wide-Column, Document Oriented etc. I’ll talk about each of them
further down the course.

Right now, let’s have an insight into the performance comparison of SQL and
NoSQL tech.
Is NoSQL More Performant than SQL?

In this lesson, we will learn if the NoSQL database is more performant than the SQL databases.

WE'LL COVER THE FOLLOWING

• Why Do Popular Tech Stacks Always Pick NoSQL Databases?
• Real World Case Studies
• Using Both SQL & NoSQL Database In An Application

Is NoSQL more performant than SQL? This question is asked all the time. And I
have a one-word answer for this.

No!!

From a technology benchmarking standpoint, both relational and non-relational databases are equally performant.

More than the technology, it’s how we design our systems using the
technology that affects the performance.

Both SQL & NoSQL tech have their use cases. We have already gone through
them in the lessons When to pick a relational database? & When to pick a
NoSQL database?

So, don’t get confused with all the hype. Understand your use case and then
pick the technology accordingly.

Why Do Popular Tech Stacks Always Pick NoSQL Databases? #

But why do the popular tech stacks always pick NoSQL databases? For instance, the MEAN (MongoDB, ExpressJS, AngularJS/ReactJS, NodeJS) stack.

Well, most of the applications online have common use cases. And these tech
stacks have them covered. There are also commercial reasons behind this.

Now, there are a plethora of tutorials available online & a mass promotion of
popular tech stacks. With these resources, it’s easy for beginners to pick them
up and write their applications as opposed to running solo research on other
technologies.

Though, we don’t always need to stick with the popular stacks. We should pick
what fits best with our use case. There are no ground rules, pick what works
for you.

We have a separate lesson on how to pick the right tech stack for our app
further down the course. We will continue this discussion there.

Coming back to the performance, it entirely depends on the application & the database design. If we use more joins with SQL, the response will inevitably take more time.

If we remove all the relationships and joins, SQL becomes just like NoSQL.

Real World Case Studies #


Facebook uses MySQL for storing the social graph of its millions of users. Though it did have to change the DB engine and make some tweaks, MySQL fits best for its use case.

Quora uses MySQL pretty efficiently by partitioning the data at the application
level. This is an interesting read on it.

Note: A well-designed SQL data store will always be more performant than a not-so-well-designed NoSQL store.

Hmmm…. Ok!! Alright

Using Both SQL & NoSQL Databases In An Application #

Can’t I use both in my application? Both a SQL & a NoSQL datastore. What if I have a requirement fitting both?

You can!! As a matter of fact, all the large-scale online services use a mix of both to implement their systems and achieve the desired behaviour.

The term for leveraging multiple databases together is Polyglot Persistence. Let’s learn more about it in the next lesson.
Database Quiz - Part 1

This lesson contains a quiz to test your understanding of databases.

Let’s Test Your Understanding Of Databases

1 What is the use of a database in web applications?

Polyglot Persistence

In this lesson, we will understand what is meant by Polyglot Persistence.

WE'LL COVER THE FOLLOWING

• What Is Polyglot Persistence?
• Real World Use Case
• Relational Database
• Key Value Store
• Wide Column Database
• ACID Transactions & Strong Consistency
• Graph Database
• Document Oriented Store
• Downside Of This Approach

What Is Polyglot Persistence? #

Polyglot persistence means using several different persistence technologies to fulfil different persistence requirements in an application.

We will understand this concept with the help of an example.

Real World Use Case #


Let’s say we are writing a social network like Facebook.

Relational Database #
To store relationships like persisting friends of a user, friends of friends, what rock band they like, what food preferences they have in common etc., we would pick a relational database like MySQL.

Key Value Store #


For low latency access of all the frequently accessed data, we will implement a
cache using a Key-value store like Redis or Memcache.

We can use the same Key-value data store to store user sessions.

Now our app is already a big hit, it has got pretty popular and we have
millions of active users.

Wide Column Database #


To understand user behaviour, we need to set up an analytics system to run
analytics on the data generated by the users. We can do this using a wide-
column database like Cassandra or HBase.

ACID Transactions & Strong Consistency #


The popularity of our application just doesn’t seem to stop, it’s soaring. Now
businesses want to run ads on our portal. For this, we need to set up a
payments system.

Again, we would pick a relational database to implement ACID transactions & ensure Strong consistency.

Graph Database #
Now to enhance the user experience of our application we have to start
recommending content to the users to keep them engaged. A Graph database
would fit best to implement a recommendation system.

Alright, by now, our application has multiple features & everyone loves it. How cool would it be if a user could run a search for other users, business pages and stuff on our portal & connect with them?

Document Oriented Store #


To implement this, we can use an open-source document-oriented datastore
like ElasticSearch. The product is pretty popular in the industry for
implementing a scalable search feature on websites. We can persist all the
search-related data in the elastic store.
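The whole walkthrough above can be sketched as a single facade routing each concern to a different store. Plain dicts stand in for the real technologies here, and the class and method names are hypothetical:

```python
# A sketch of polyglot persistence behind one facade. Each dict stands in
# for the technology named in the comment; the routing is the point, not
# the storage engines.
class Persistence:
    def __init__(self):
        self.relational = {}  # e.g. MySQL: users, payments (source of truth)
        self.cache = {}       # e.g. Redis: sessions, frequently accessed data
        self.search = {}      # e.g. ElasticSearch: fields indexed for search

    def save_user(self, user_id, user):
        self.relational[user_id] = user        # durable, consistent record
        self.cache[f"user:{user_id}"] = user   # low-latency reads
        self.search[user_id] = user["name"]    # searchable field

db = Persistence()
db.save_user(1, {"name": "asha"})
print(db.cache["user:1"])  # {'name': 'asha'}
```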

Downside Of This Approach #


So, this is how we use multiple databases to fulfil different persistence
requirements. Though, one downside of this approach is increased complexity
in making all these different technologies work together.

A lot of effort goes into building, managing and monitoring polyglot persistence systems. What if there was something simpler? That would save us the pain of putting together everything ourselves. Well, there is.

What?

Let’s find out in the next lesson.


Multi-Model Databases

In this lesson, we will talk about the multi-model databases.

WE'LL COVER THE FOLLOWING

• What Are Multi-Model Databases?
• Popular Multi-Model Databases

What Are Multi-Model Databases? #


Until now the databases supported only one data model, it could either be a
relational database, a graph database or any other database with a certain
specific data model.

But with the advent of multi-model databases, we have the ability to use
different data models in a single database system.

Multi-model databases support multiple data models like Graph, Document-Oriented, Relational etc. as opposed to supporting only one data model.

They also avert the need for managing multiple persistence technologies in a single service. They reduce the operational complexity by notches. With multi-model databases, we can leverage different data models via a single API.

Popular Multi-Model Databases #

Some of the popular multi-model databases are ArangoDB, Cosmos DB, OrientDB, Couchbase etc.

So, by now we are clear on what NoSQL databases are & when to pick them
and stuff. Now let’s understand concepts like Eventual Consistency, Strong
Consistency which are key to understanding distributed systems.
Eventual Consistency

In this lesson, we will discuss Eventual Consistency.

WE'LL COVER THE FOLLOWING

• What Is Eventual Consistency?
• Real World Use Case

What Is Eventual Consistency? #


Eventual consistency is a consistency model which enables the data store to be
highly available. It is also known as optimistic replication & is key to
distributed systems.

So, how exactly does it work?

We’ll understand this with the help of a use case.

Real World Use Case #


Think of a popular microblogging site deployed across the world in different
geographical regions like Asia, America, Europe. Moreover, each geographical
region has multiple data centre zones: North, East, West, South. Furthermore,
each of the zones has multiple clusters which have multiple server nodes
running.

So, we have many datastore nodes spread across the world which the micro-blogging site uses for persisting data.

Since there are so many nodes running, there is no single point of failure. The data store service is highly available. Even if a few nodes go down, the persistence service as a whole is still up.

Alright, now let’s say a celebrity makes a post on the website which everybody
starts liking around the world.

At a point in time, a user in Japan likes the post which increases the “Like”
count of the post from say 100 to 101. At the same point in time, a user in
America, in a different geographical zone clicks on the post and he sees the
“Like” count as 100, not 101.

Why did this happen?

Simply, because the new updated value of the Post “Like” counter needs some
time to move from Japan to America and update the server nodes running
there.

Though the value of the counter at that point in time was 101, the user in
America sees the old inconsistent value.

But when he refreshes his web page after a few seconds the “Like” counter
value shows as 101. So, the data was initially inconsistent but eventually got
consistent across the server nodes deployed around the world. This is what
eventual consistency is.

Let’s take it one step further: what if, at the same point in time, both the users in Japan and America Like the post, and a user in another geographic zone, say Europe, accesses the post?

All the nodes in different geographic zones have different post values. And
they will take some time to reach a consensus.

The upside of eventual consistency is that the system can add new nodes on
the fly without the need to block any of them, the nodes are available to the
end-users to make an update at all times.

Millions of users across the world can update the values at the same time
without having to wait for the system to reach a common final value across all
nodes before they make an update. This feature enables the system to be
highly available.

Eventual consistency is suitable for use cases where the accuracy of values
doesn’t matter much like in the above-discussed use case.

Other use cases of eventual consistency include keeping the count of users watching a LIVE video stream online, or dealing with massive amounts of analytics data, where a couple of counts up and down won’t matter much.

But there are use cases where the data has to be laser accurate, like in banking or stock markets. We just cannot have our systems be Eventually Consistent there; we need Strong Consistency.

Let’s discuss it in the next lesson.


Strong Consistency

In this lesson, we will discuss Strong Consistency.

WE'LL COVER THE FOLLOWING

• What Is Strong Consistency?
• Real World Use Case
• ACID Transaction Support

What Is Strong Consistency? #


Strong Consistency simply means the data has to be strongly consistent at all
times. All the server nodes across the world should contain the same value of
an entity at any point in time. And the only way to implement this behaviour
is by locking down the nodes when being updated.

Real World Use Case #


Let’s continue the same Eventual Consistency example from the previous
lesson. To ensure Strong Consistency in the system, when the user in Japan
likes the post, all the nodes across different geographical zones have to be
locked down to prevent any concurrent updates.

This means at one point in time, only one user can update the post “Like” counter value.

So, once the user in Japan updates the “Like” counter from 100 to 101, the value gets replicated globally across all nodes. Once all the nodes reach a consensus, the locks get lifted.

Now, other users can Like the post. If the nodes take a while to reach a
consensus, they have to wait until then.

Well, this is surely not desired in case of social applications. But think of a
stock market application where the users are seeing different prices of the
same stock at one point in time and updating it concurrently. This would
create chaos.

Therefore, to avoid this confusion, we need our systems to be Strongly Consistent. The nodes have to be locked down for updates.
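The locking idea can be sketched with a simple shared counter. A single lock-guarded value stands in for the globally locked-down nodes; the counter values are hypothetical:

```python
import threading

# A sketch of locking for strong consistency: only one writer updates
# the shared "Like" counter at a time, so no update is lost and every
# read after a completed write sees the same value.
lock = threading.Lock()
likes = 100

def like():
    global likes
    with lock:  # the "node" is locked down for the duration of the update
        current = likes
        likes = current + 1

threads = [threading.Thread(target=like) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(likes)  # 150 -- fifty concurrent likes, none lost
```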

Queuing all the requests is one good way of making a system Strongly
Consistent. Well, the implementation is beyond the scope of this course.
Though we will discuss a theorem called the CAP theorem which is key to
implementing these consistency models.

So, by now I am sure you would have realized that picking the Strong Consistency model hits the capability of the system to be Highly Available. The system, while being updated by one user, does not allow other users to perform concurrent updates. This is how strongly consistent ACID transactions are implemented.

ACID Transaction Support #


Distributed systems like NoSQL databases which scale horizontally on the fly
don’t support ACID transactions globally & this is due to their design. The
whole reason for the development of NoSQL tech is the ability to be Highly
Available and Scalable. If we have to lock down the nodes every time, it
becomes just like SQL.

So, NoSQL databases don’t support ACID transactions and those that claim to,
have terms and conditions applied to them.

Generally, the transaction support is limited to a geographic zone or an entity hierarchy. Developers of the tech make sure that all the strongly consistent entity nodes reside in the same geographic zone to make ACID transactions possible.

Well, this is pretty much it about Strong Consistency. Now let’s take a look at the CAP theorem.
CAP Theorem

In this lesson, we will learn about the CAP theorem.

WE'LL COVER THE FOLLOWING

• What Is CAP Theorem?

What Is CAP Theorem? #


CAP stands for Consistency, Availability, Partition Tolerance. We’ve gone
through consistency and availability in great detail. Partition Tolerance means
Fault Tolerance. The system is tolerant of failures or partitions. It keeps
working even if a few nodes go down.

There are many definitions of the theorem you’ll find online, which state that amongst the three (Consistency, Availability & Partition Tolerance) we have to pick two. I find that a teeny-tiny bit confusing. I will try to give a simpler explanation of the theorem.

CAP theorem simply states that in case of a network failure, when a few of the nodes of the system are down, we have to make a choice between Availability & Consistency.

If we pick Availability that means when a few nodes go down, the other nodes
are available to the users for making updates. In this situation, the system is
inconsistent as the nodes which are down don’t get updated with the new
data. At the point in time when they come back online, if a user fetches the
data from them, they’ll return the old values they had when they went down.

If we pick Consistency, in that scenario, we have to lock down all the nodes for
further writes until the nodes which have gone down come back online. This
would ensure the Strong consistency of the system as all the nodes will have
the same entity values.

Picking between Availability and Consistency largely depends on our use case
and the business requirements. We have been through this in great detail.
Also, the limitation of picking one out of the two is due to the design of the
distributed systems. We can’t have both Availability and Consistency at the
same time.
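The trade-off can be sketched as a toy node that behaves differently during a partition depending on which property it favours. The class, mode names and values are all hypothetical:

```python
# A toy sketch of the CAP trade-off. During a network partition, a node
# configured for availability keeps accepting writes (risking divergence),
# while a node configured for consistency refuses them until the
# partition heals.
class Node:
    def __init__(self, mode):
        self.mode = mode          # "availability" or "consistency"
        self.value = 100
        self.partitioned = False

    def write(self, value):
        if self.partitioned and self.mode == "consistency":
            raise RuntimeError("unavailable until the partition heals")
        self.value = value        # accepted; replicas may now diverge

ap_node = Node("availability")
cp_node = Node("consistency")
ap_node.partitioned = cp_node.partitioned = True

ap_node.write(101)                # succeeds: available but inconsistent
try:
    cp_node.write(101)            # refused: consistent but unavailable
except RuntimeError:
    pass
print(ap_node.value, cp_node.value)  # 101 100
```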

Nodes spread around the globe will take some time to reach a consensus. It’s impossible to have zero latency unless we transmit data faster than or at the speed of light.
Database Quiz - Part 2

This lesson contains a quiz to test your understanding of different models of databases, eventual, strong
consistency & CAP theorem.

Let’s Test Your Understanding Of A Few Database Concepts

1 What does polyglot persistence mean?

Types of Databases

In this lesson, we will briefly recall the different types of databases.

WE'LL COVER THE FOLLOWING

• Different Types Of Databases

Different Types Of Databases #


There are quite a number of different types of databases available to the
application developers, catering to specific use cases.

Such as the:

Document-Oriented database
Key-value datastore
Wide-column database
Relational database
Graph database
Time-Series database
Databases dedicated to mobile apps

In the polyglot persistence lesson, we went through the need for different types of databases. We have also covered relational databases in-depth, including when to pick one, in the previous lessons.

Now we will have insights into the other remaining types of databases and the
use cases which fit them.

So, without any further ado. Let’s get on with it.


Document Oriented Database

In this lesson, we will get to know about the Document Oriented database and when to choose it for our projects.

WE'LL COVER THE FOLLOWING

• What Is A Document Oriented Database?
• Popular Document Oriented Databases
• When Do I Pick A Document Oriented Data Store for My Project?
• Real Life Implementations

What Is A Document Oriented Database? #

Document Oriented databases are among the main types of NoSQL databases. They store data in a document-oriented model, in independent documents. The data is generally semi-structured & stored in a JSON-like format.

The data model is similar to the data model of our application code, so it’s
easier to store and query data for developers.

Document oriented stores are suitable for the Agile software development methodology, as it’s easier to change things with evolving demands when working with them.

Popular Document Oriented Databases #

Some of the popular document-oriented stores used in the industry are MongoDB, CouchDB, OrientDB, Google Cloud Datastore and Amazon DocumentDB.

When Do I Pick A Document Oriented Data Store For My Project? #

If you are working with semi-structured data and need a flexible schema that will change often, you ain’t sure about the database schema when you start writing the app, there is a possibility that things might change over time, and you need something flexible that you can change over time with minimum fuss, then pick a Document-Oriented data store.

Typical use cases of Document oriented databases are the following:

Real-time feeds
Live sports apps
Writing product catalogues
Inventory management
Storing user comments
Web-based multiplayer games

Being in the family of NoSQL databases, these provide horizontal scalability & performant read-writes as they cater to CRUD (Create, Read, Update, Delete) use cases, where there isn’t much relational logic involved & all we need is quick persistence & retrieval of data.

Real Life Implementations #

Here are some good real-life implementations of the tech:

SEGA uses MongoDB to improve the experience for millions of mobile gamers.

Coinbase scaled from 15k requests per minute to 1.2 million requests per minute with MongoDB.
Graph Database

In this lesson, we will get to know about the Graph database and when to choose it for our projects

WE'LL COVER THE FOLLOWING

• What Is A Graph Database?
• Features Of A Graph Database
• When Do I Pick A Graph Database?
• Real Life Implementations

What Is A Graph Database? #


Graph databases are also a part of the NoSQL database family. They store data
in nodes/vertices and edges in the form of relationships.

Each Node in a graph database represents an entity. It can be a person, a place, a business etc. And the Edge represents the relationship between the entities.

But, why use a graph database to store relationships when we already have SQL based relational databases available?

Features Of A Graph Database #


Hmmm… primarily, two reasons. The first is visualization. Think of that
pinned board in the thriller detective movies where the pins are pinned on a
board over several images connected via threads. It does help in visualizing
how the entities are related & how things fit together. Right?

The second reason is the low latency. In graph databases, the relationships are
stored a bit differently from how the relational databases store relationships.

Graph databases are faster as the relationships in them are not calculated at
the query time, as it happens with the help of joins in the relational databases.
Rather the relationships here are persisted in the data store in the form of
edges and we just have to fetch them. No need to run any sort of computation
at the query time.

A good real-life example of an application which would fit a graph database is Google Maps. Nodes represent the cities and the Edges represent the connections between them.

Now, if I have to look for roads between different cities, I don’t need joins to
figure out the relationship between the cities when I run the query. I just need
to fetch the edges which are already stored in the database.
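That pre-materialized lookup can be sketched with a plain adjacency map; the cities and distances below are hypothetical:

```python
# A sketch of why graph lookups are fast: relationships are persisted as
# edges, so "roads from Delhi" is a direct fetch, not a join computed at
# query time.
edges = {
    "Delhi": [("Jaipur", 280), ("Agra", 230)],
    "Jaipur": [("Delhi", 280), ("Agra", 240)],
}

# No computation needed -- the adjacency list is already materialized.
roads_from_delhi = edges["Delhi"]
print(roads_from_delhi)  # [('Jaipur', 280), ('Agra', 230)]
```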

When Do I Pick A Graph Database? #

Ideal use cases of graph databases are building social, knowledge and network graphs; writing AI-based apps, recommendation engines and fraud analysis apps; storing genetic data etc.

Graph databases help us visualize our data with minimum latency. A popular
graph database used in the industry is Neo4J.

Real Life Implementations #

Here are some real-life implementations of the tech:

Walmart shows product recommendations to its customers in real-time using the Neo4J graph database.

NASA uses Neo4J to store “lessons learned” data from their previous missions to educate its scientists & engineers.
Key Value Database

In this lesson, we will get to know about the Key-Value database and when to choose it for our projects.

WE'LL COVER THE FOLLOWING

• What Is A Key Value Database?
• Features Of A Key Value Database
• Popular Key Value Databases
• When Do I Pick A Key Value Database?
• Real Life Implementations

What Is A Key Value Database? #

Key-value databases are also a part of the NoSQL family. These databases use a simple key-value method to store and quickly fetch the data with minimum latency.

Features Of A Key Value Database #


A primary use case of a Key-value database is to implement caching in
applications due to the minimum latency they ensure.

The key serves as a unique identifier and has a value associated with it. The
value can be as simple as a block of text or as complex as an object graph.

The data in a key-value database can be fetched in constant time O(1), and no
query language is required to fetch it. It’s just a simple, no-brainer fetch
operation, which ensures minimum latency.
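A minimal in-memory sketch makes the model concrete. This is a plain dictionary standing in for a key-value store such as Redis (it is not a real client library), but it shows the essential contract: a unique key maps to a value, and reads and writes are average O(1) hash-table operations with no query language.

```python
# Toy key-value store: unique keys map to arbitrary values.
store = {}

def put(key, value):
    store[key] = value            # O(1) average-case write

def get(key, default=None):
    return store.get(key, default)  # O(1) average-case read
```

Real key-value stores layer networking, persistence options and eviction policies on top of this same get/put shape.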

Popular Key Value Databases #


Some of the popular key-value data stores used in the industry are Redis,
Hazelcast, Riak, Voldemort & Memcache.

When Do I Pick A Key Value Database? #


If you have a use case where you need to fetch data really fast with minimum
fuss & backend processing, then you should pick a key-value data store.

Key-value stores are pretty efficient in pulling off scenarios where super-fast
data fetch is the order of the day.

Typical use cases of a key value database are the following:

Caching
Persisting user state
Persisting user sessions
Managing real-time data
Implementing queues
Creating leaderboards in online games & web apps
Implementing a pub-sub system

Real Life Implementations #


Some of the real-life implementations of the tech are:

Inovonics uses Redis to drive real-time analytics on millions of sensor
data points.

Microsoft uses Redis to handle traffic spikes on its platforms.

Google Cloud uses Memcache to implement caching on their cloud
platform.
Time Series Database

In this lesson, we will get to know the Time Series database and when to choose it for our projects.

WE'LL COVER THE FOLLOWING

• What Is A Time Series Database?


• What Is Time Series Data?
• Why Store Time Series Data?
• Popular Time Series Databases
• When To Pick A Time Series Database?
• Real Life Implementations

What Is A Time Series Database? #


Time-Series databases are optimized for tracking & persisting time series data.

What Is Time Series Data? #


Time-series data consists of data points associated with the occurrence of
events over time. These data points are tracked, monitored and then finally
aggregated based on certain business logic.
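The track-then-aggregate idea can be sketched in plain Python. This is a toy illustration of the kind of rollup a time-series database performs, with made-up sensor readings; the points are `(epoch-seconds, value)` pairs bucketed into per-minute averages.

```python
from collections import defaultdict

def minute_averages(points):
    """Aggregate (timestamp_seconds, value) points into per-minute means."""
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts // 60].append(value)   # group readings by minute
    return {minute: sum(vals) / len(vals) for minute, vals in buckets.items()}
```

Production time-series databases do this kind of downsampling continuously and at scale, with retention policies deciding how long raw points are kept.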

Time-series data is generally ingested from IoT devices, self-driving vehicles,
industry sensors, social networks, stock market financial data etc.

Okay!! But, what is the need for storing such a massive amount of time-series
data?

Why Store Time Series Data? #


Studying the data streaming in from applications helps us track the behaviour
of the system. It helps us study user patterns, anomalies & how things change
over time.

Time-series data is primarily used for running analytics, deducing conclusions
and making future business decisions based on the results of those analytics.
Running analytics also helps the product evolve continually.

General databases are not built to handle time-series data. With the advent of
IoT, these databases are getting pretty popular and are being adopted by the
big guns in the industry.

Popular Time Series Databases #


Some of the popular time-series databases used in the industry are Influx DB,
Timescale DB, Prometheus etc.

When To Pick A Time Series Database? #


If you have a use case where you need to manage data in real-time &
continually over a long period of time, then a time-series database is what you
need.

As we know, time-series databases are built to deal with data streaming in
real-time. Typical use cases include fetching data from IoT devices, managing
data for analytics & monitoring, and writing an autonomous trading platform
that deals with changing stock prices in real-time.

Real Life Implementations #


Here are some real-life implementations of the tech:

IBM uses Influx DB to run analytics for real-time cognitive fraud
detection.

Spiio uses Influx DB to remotely monitor vertical lining green walls &
plant installations.
Wide-Column Database

In this lesson, we will get to know the Wide-Column database & when to choose it for our projects.

WE'LL COVER THE FOLLOWING

• What Is A Wide Column Database?


• Popular Wide Column Databases
• When To Pick a Wide Column Database?
• Real-Life Implementations

What Is A Wide Column Database? #


Wide-column databases belong to the NoSQL family of databases, primarily
used to handle massive amounts of data, technically called Big Data.

Wide-column databases are perfect for analytical use cases. They have a high
performance and a scalable architecture.

Also known as column-oriented databases, wide-column databases store data
in records with a dynamic number of columns. A record can hold billions of
columns.

Popular Wide Column Databases #


Some of the popular wide column databases are Cassandra, HBase, Google
BigTable, Scylla DB etc.

When To Pick a Wide Column Database? #


If you have a use case where you need to grapple with Big data, to ingest it or
to run analytics on it, then a wide-column database is a good fit for this
scenario.
Wide-column databases are built to manage big data ensuring scalability,
performance & high availability at the same time.

Real-Life Implementations #
Some of the real-life implementations of the tech are -

Netflix uses Cassandra as the backend database for the streaming service

Adobe uses HBase for processing large amounts of data


Database Quiz - Part 3

This lesson contains a quiz to test your understanding of different types of databases.

Let’s Test Your Understanding Of Different Types Of Databases

1 What are the use cases for a document-oriented database? Which of the
following option(s) are correct?

Introduction

In this lesson, we will get introduced to the concept of caching and why it is important for performance.

WE'LL COVER THE FOLLOWING

• What Is Caching?
• Caching Dynamic Data
• Caching Static Data

Hmmm… before beginning with this lesson, I want to ask you a question.
When you visit a website and request certain data from the server, how long
do you wait for the response?

5 seconds, 10 seconds, 15 seconds, 30 seconds? I know, I know, I am pushing
it… 45? What? Still no response…

And then you finally bounce off & visit another website for your answer. We
are impatient creatures; we need our answers quick. This makes caching vital
to applications to prevent users from bouncing off to other websites, all the
time.

What Is Caching? #

Caching is key to the performance of any kind of application. It ensures


low latency and high throughput. An application with caching will
certainly do better than an application without caching, simply because
it returns the response in less time as opposed to the application without
a cache implemented.

Implementing caching in a web application simply means copying frequently
accessed data from the database, which is disk-based hardware, and storing it
in RAM (Random Access Memory) hardware.

RAM-based hardware provides faster access than disk-based hardware. As I
said earlier, it ensures low latency and high throughput. Throughput means
the number of network calls, i.e. request-response cycles between the client
and the server, within a stipulated time.

RAM-based hardware is capable of handling more requests than the disk-
based hardware on which databases run.

Caching Dynamic Data #


With caching, we can cache both static and dynamic data. Dynamic data is
data which changes more often; it has an expiry time, or TTL (“Time To Live”).
When the TTL ends, the data is purged from the cache and the newly updated
data is stored in its place. This process is known as cache invalidation.
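TTL-based invalidation can be sketched in a few lines. This is a minimal, single-threaded illustration (not any particular cache library): each entry records when it was stored, and an entry older than `ttl` seconds is purged on the next read.

```python
import time

class TTLCache:
    """Minimal sketch of TTL-based cache invalidation."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.data = {}  # key -> (value, stored_at)

    def set(self, key, value):
        self.data[key] = (value, time.monotonic())

    def get(self, key):
        entry = self.data.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if time.monotonic() - stored_at > self.ttl:
            del self.data[key]  # TTL expired: purge the stale entry
            return None
        return value
```

Real caches such as Redis expire keys server-side with the same basic idea, using background sweeps in addition to lazy expiry on read.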

Caching Static Data #


Static data consists of images, font files, CSS & other similar files. This is the
kind of data which doesn’t change often & can easily be cached on the client-
side, in the browser or its local memory, and also on CDNs, the Content
Delivery Networks.
Caching also helps applications maintain their expected behaviour during
network interruptions.

In the next lesson, let’s understand how to figure out if we really need a cache
in our applications.
How Do I Figure If I Need A Cache In My Application?

In this lesson, we will discuss how to tell if we need caching in our application.

WE'LL COVER THE FOLLOWING

• Different Components In the Application Architecture Where the Cache Can
Be Used

First up, it’s always a good idea to use a cache as opposed to not using it. It
doesn’t do any harm. It can be used at any layer of the application & there are
no ground rules as to where it can and cannot be applied.

The most common usage of caching is database caching. Caching helps
alleviate the stress on the database by intercepting the requests being routed
to the database for data.

The cache then returns all the frequently accessed data, thus cutting down the
load on the database by notches.

Different Components In the Application Architecture Where the Cache Can
Be Used #
Across the architecture of our application, we can use caching at multiple
places. Caching is used in the client browser to cache static data. It is used
with the database to intercept all the data requests, in the REST API
implementation etc.

Besides these places, I would suggest you look for patterns. We can always
cache the frequently accessed content on our website, be it from any
component. There is no need to compute stuff over and over when it can be
cached.

Think of joins in relational databases. They are notorious for making the
response slow; more joins means more latency. A cache can avert the need to
run joins every time just by storing the data in demand. Now imagine how
much this mechanism would speed up our application.

Also, even if the database goes down for a while, users won’t notice it, as the
cache will continue to serve the data requests.

Caching is also the core of the HTTP protocol. This is a good resource to read
more about it.

We can store user sessions in a cache. Caching can be implemented at any
layer of an application, be it at the OS level, at the network level, on a CDN or
at the database. You might remember, we talked about key-value data stores
in the database lesson. They are primarily used to implement caching in web
applications.

They can also be used for cross-module communication in a microservices
architecture by saving the shared data which is commonly accessed by all the
services. The cache acts as a backbone for microservice communication.

Key-value data stores via caching are also widely used in in-memory data
stream processing and running analytics.
Reducing the Application Deployment Costs Via
Caching

In this lesson, we will discuss a real-world example of how the deployment cost of an application can be reduced
by using a cache.

WE'LL COVER THE FOLLOWING

• Real Life Use Case


• Conclusion

Real Life Use Case #


In this lesson, I am going to share an insight from a stock market-based
gaming app that I developed and deployed on the cloud.

The game had several stocks of companies listed on the stock market and the
algorithm would trigger the price movement of the stocks every second, if not
before that.

Initially, I persisted the updated prices of the stocks in the database as soon as
the prices changed, to create a stock price movement timeline at the end of
the day. But so many database writes cost me a fortune. The number of writes
every hour was just crazy.

Eventually, I decided not to persist the updated price in the database every
second; instead, I used Memcache to persist the stock prices and ran a batch
operation at regular intervals to update the database.

Memcache was comparatively a lot cheaper than disk-based database access.
The cache served all the stock price requests, & the database did not have the
updated values until the batch operation ran.
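The tweak can be sketched as follows. This is a toy model, not the app’s actual code: two dictionaries stand in for Memcache and the database, the symbol and prices are made up, and the point is simply that per-second writes stay in memory while the database sees only one bulk write per batch interval.

```python
cache = {}      # stands in for Memcache: cheap in-memory writes
database = {}   # stands in for the disk-based database: expensive writes

def on_price_update(symbol, price):
    cache[symbol] = price    # every second: in-memory only

def run_batch_job():
    database.update(cache)   # at regular intervals: one bulk persist
```

Reads are served from `cache`, so between batch runs the database is simply behind, which is exactly the trade-off described above.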

Conclusion #
This tweak may not be ideal for a real-life Fintech app but it helped me save a
truck-load of money & I was able to run the game for a longer period of time.

So, Guys!! This is one instance where you can leverage the caching mechanism
to cut down costs. You might not want to persist each and every piece of
information in the database; rather, use the cache to store information that is
not so mission-critical.

Now let’s look into some of the caching strategies we can leverage to further
enhance the performance of our apps.
Caching Strategies

In this lesson, we will discuss some of the commonly used caching strategies.

WE'LL COVER THE FOLLOWING

• Cache Aside
• Read-Through
• Write-Through
• Write-Back

There are different kinds of caching strategies which serve specific use cases:
Cache Aside, Read-through cache, Write-through cache & Write-back cache.

Let’s find out what they are & why we need different strategies when
implementing caching.

Cache Aside #
This is the most common caching strategy. In this approach, the cache works
along with the database trying to reduce the hits on it as much as possible.

The data is lazy-loaded into the cache. When the user sends a request for
particular data, the system first looks for it in the cache. If present, it is simply
returned. If not, the data is fetched from the database, the cache is updated,
and the data is returned to the user.

This kind of strategy works best with read-heavy workloads, i.e. data which is
not frequently updated, for instance, user profile data in a portal: the user’s
name, account number etc.

The data in this strategy is written directly to the database. This means that
the data in the cache and the database could become inconsistent. To avoid
this, the data in the cache has a TTL (“Time to Live”). After that stipulated
period, the data is invalidated from the cache.
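The cache-aside read path can be sketched like this. It is a minimal illustration with made-up record names, two dictionaries standing in for the cache and the database: check the cache first, and on a miss lazy-load from the database and populate the cache before returning.

```python
cache = {}
database = {"user:1": {"name": "Alice"}}   # made-up record for illustration

def get_record(key):
    if key in cache:
        return cache[key]            # cache hit: no database work
    value = database.get(key)        # cache miss: go to the database
    if value is not None:
        cache[key] = value           # lazy-load for subsequent reads
    return value
```

Note that writes in this strategy go straight to `database`, which is exactly why the TTL described above is needed to bound staleness.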

Read-Through #
This strategy is pretty similar to the Cache Aside strategy. A subtle difference
from the Cache Aside strategy is that in the Read-through strategy, the cache
always stays consistent with the database.

The cache library or the framework takes on the onus of maintaining
consistency with the backend. The information in this strategy, too, is lazy-
loaded into the cache, only when the user requests it.

So, for the first time when information is requested, it results in a cache miss.
Then the backend has to update the cache while returning the response to the
user.

However, the developers can always pre-load the cache with the information
which is expected to be requested most by the users.

Write-Through #
In this strategy, every piece of information written to the database goes
through the cache. Before the data is written to the DB, the cache is updated
with it.

This maintains high consistency between the cache and the database though it
adds a little latency during the write operations as data is to be updated in the
cache additionally. This works well for write-heavy workloads like online
massive multiplayer games.

This strategy is generally used with other caching strategies to achieve
optimized performance.
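The write path is easy to sketch. This is a toy illustration (dictionaries standing in for cache and database): every write updates the cache first and then persists to the database in the same code path, which keeps the two consistent at the cost of a little extra write latency.

```python
cache = {}
database = {}

def write_through(key, value):
    cache[key] = value      # update the cache first
    database[key] = value   # then persist to the database, same path
```

Because both stores see every write, a subsequent read from the cache can never be stale relative to the database.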

Write-Back #
This strategy helps optimize costs significantly. In the Write-back caching
strategy the data is directly written to the cache instead of the database. And
the cache after some delay as per the business logic writes data to the
database.

If there is quite a heavy number of writes in the application, developers can
reduce the frequency of database writes to cut down the load & the associated
costs.

This is the strategy which I talked about in the previous lesson.

A risk in this approach is if the cache fails before the DB is updated, the data
might get lost. Again, this strategy is used with other caching strategies to
make the most out of these.

Guys!! With this, we are done with the caching mechanism of web
applications. Now let’s move on to the world of message queues.
Caching Quiz

This lesson contains a quiz to test your understanding of the caching mechanism.

Let’s Test Your Understanding Of Caching In Web Architecture

1 Why is caching so important for the performance of an application?
Which of the following option(s) are correct?

Introduction to Message Queues

In this lesson, we will learn about the message queues and their functionalities.

WE'LL COVER THE FOLLOWING

• What Is A Message Queue?


• Features Of A Message Queue
• Real World Example Of A Message Queue
• Message Queue In Running Batch Jobs

What Is A Message Queue? #

A message queue, as the name says, is a queue which routes messages from
the source to the destination, or we can say from the sender to the
receiver.

Since it is a queue, it follows the FIFO (First In First Out) policy: the message
that is sent first is delivered first. Messages can also have a priority attached
to them, which makes the queue a priority queue, but for now let’s keep
things simple.
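The FIFO contract can be shown in a few lines. This is a toy single-process sketch using a double-ended queue, not a real broker: the producer appends at the tail, the consumer takes from the head, so messages come out in the order they went in.

```python
from collections import deque

queue = deque()

def send(message):
    queue.append(message)       # producer enqueues at the tail

def receive():
    # consumer dequeues at the head; None when the queue is empty
    return queue.popleft() if queue else None
```

Real message queues add durability, acknowledgements and networking on top, but preserve this first-in, first-out delivery order.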
Features Of A Message Queue #
Message queues facilitate asynchronous behaviour. We have already learned
what asynchronous behaviour is in the AJAX lesson. Asynchronous behaviour
allows the modules to communicate with each other in the background
without hindering their primary tasks.

We will understand the behaviour of message queues with the help of an
example in a short while; for now, let’s have a quick look at their features.

Message queues facilitate cross-module communication, which is key in
service-oriented or microservices architecture. They allow communication in
a heterogeneous environment and also provide temporary storage, holding
messages until they are processed & consumed by the consumer.

Real World Example Of A Message Queue #


Think of email as an example, both the sender and receiver of the email don’t
have to be online at the same moment to communicate with each other. The
sender sends an email, the message is temporarily stored on the message
server until the recipient comes online and reads the message.
Message queues enable us to run background processes, tasks and batch jobs.
Speaking of background processes, let’s understand this with the help of a use
case.

Think of a user signing up on a portal. After he signs up, he is immediately
allowed to navigate to the home page of the application, but the sign-up
process isn’t complete yet. The system has to send a confirmation email to the
registered email id of the user. Then the user has to click on the confirmation
email for the confirmation of the sign-up event.

But the website cannot keep the user waiting until it sends the email to the
user. Either he is allowed to navigate to the home page or he bounces off. So,
this task is assigned as an asynchronous background process to a message
queue. It sends an email to the user for confirmation while the user continues
to browse the website.
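The sign-up flow above can be sketched like this. It is a toy single-process model, with made-up function names and no real email sending: the request handler enqueues the email task and returns immediately, and a background worker drains the queue later.

```python
from collections import deque

email_queue = deque()
sent_emails = []   # stands in for actually delivered emails

def sign_up(user_email):
    # ... persist the new user here ...
    email_queue.append(user_email)   # defer the slow email work
    return "redirect:/home"          # user proceeds without waiting

def email_worker():
    # runs in the background, independent of the request path
    while email_queue:
        sent_emails.append(f"confirmation -> {email_queue.popleft()}")
```

The key property is that `sign_up` returns before any email work happens; the queue decouples the user-facing request from the slow task.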

This is how a message queue can be used to add asynchronous behaviour to a
web application. Message queues are also used to implement notification
systems just like Facebook notifications. I’ll discuss that in the upcoming
lessons.

Message Queue In Running Batch Jobs #


Now coming to the batch jobs. Do you remember the scenario from the
previous caching lesson where I discussed how I used the cache to cut down
the application deployment costs?

The batch job which updated the stock prices at regular intervals in the
database was run by a message queue.

So, by now I am sure we have an idea of what a message queue is and why we
use it in applications.

So, we now have a basic understanding that there is a queue, there is a
message sender, also called the producer, and there is a message receiver,
also called the consumer.

Both the producer and the consumer don’t have to reside on the same
machine to communicate; that is pretty obvious.
In this routing of messages through the queue, we can define several rules
based on our business requirements. Adding priority to the messages is one I
pointed out. Other important features of queuing include message
acknowledgements, retrying failed messages etc.

Speaking of the size of the queue, there is no definite size; it can be an infinite
buffer, depending on the infrastructure the business has.

We’ll now look into the messaging models widely used in the industry,
beginning with the publish-subscribe message routing model, which is pretty
popular in today’s online universe. It is also how we consume information at
large.
Publish Subscribe Model

In this lesson, we will learn about the Publish-Subscribe model, when it is used & what are Exchanges in
messaging?

WE'LL COVER THE FOLLOWING

• What Is A Publish Subscribe Model?


• Exchanges

What Is A Publish Subscribe Model? #

A publish-subscribe model is the model where multiple consumers
receive the same message sent from a single or multiple producers.

A real-world newspaper service is a good analogy for the publish-subscribe
pattern: consumers subscribe to a newspaper service, and the service delivers
the news to all of its consumers, every single day.

In the online world, we often subscribe to various topics in applications to be
continually notified of new updates on any particular segment, be it sports,
politics, economics etc.

Exchanges #
To implement the pub-sub pattern, message queues have exchanges which
further push the messages to the queues based on the exchange type and the
rules which are set. Exchanges are just like telephone exchanges which route
messages from sender to the receiver through the infrastructure based on a
certain logic.

There are different types of exchanges available in message queues, some of
which are direct, topic, headers & fanout. To have more insight into how
these different exchange types work, this RabbitMQ article is a good read.

There is no certainty that every message queue tech will have the same
exchange type. These are just general scenarios I am discussing here. Things
can change with technologies. Besides, technology is not important, all you
need right now is just an idea of how things work.

So, we would pick a fanout exchange type to broadcast the messages from the
queue. The exchange will push the message to the queue and the consumers
will receive the message. The relationship between exchange and the queue is
known as Binding.
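A fanout exchange can be sketched in a few lines. This is a toy in-process model, not RabbitMQ’s actual API: each bound queue stands in for one subscriber, and publishing copies the message to every binding, which is the broadcast behaviour described above.

```python
from collections import deque

# One queue per subscriber; each (exchange, queue) pair is a "binding".
bound_queues = [deque(), deque(), deque()]

def publish(message):
    # Fanout ignores routing keys: push a copy to every bound queue.
    for q in bound_queues:
        q.append(message)
```

A direct or topic exchange would instead filter on a routing key before deciding which of the bound queues receives the message.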

This is how we get updates on new content generated in real-time on social
apps by businesses or individuals followed by a lot of people.

In the upcoming lessons, I will discuss in detail how real-time feeds and
notification systems work in social networks, powered by message queues.

Let’s move on to the point to point messaging model.


Point to Point Model

In this lesson, we will learn about the point to point messaging model, its applications, popular message queue
protocols & the technology used to implement them.

WE'LL COVER THE FOLLOWING

• What Is Point to Point Model?


• Messaging Protocols
• Technology Used To Implement the Messaging Protocols

What Is Point to Point Model? #

Point to point communication is a pretty simple use case where the
message from the producer is consumed by only one consumer.

It’s like a one-to-one relationship, whereas a publish-subscribe model is a
one-to-many relationship.
Based on the business requirements, we can set up multiple combinations in
this messaging model, like adding multiple producers & consumers to a
queue. But at the end of the day, a message sent by the producer will be
consumed by only one consumer. This is why it’s called a point to point
queuing model. It’s not a broadcast of messages, rather an entity to entity
communication.

Messaging Protocols #
Speaking of the messaging protocols, there are two popular protocols when
working with message queues: AMQP (Advanced Message Queuing Protocol)
& STOMP (Simple/Streaming Text Oriented Messaging Protocol).

Technology Used To Implement the Messaging Protocols #
Speaking of the queuing tech widely used in the industry: RabbitMQ,
ActiveMQ, Apache Kafka etc.

So, Guys!! This is pretty much it on the queuing models. Now, let’s have an
insight into how notification systems work with message queues.
Notification Systems & Real-time Feeds with Message
Queues

In this lesson, we will discuss how notification systems and real-time feeds are implemented using message
queues.

WE'LL COVER THE FOLLOWING

• Real-World Use Case


• Pull-Based Approach
• Push-Based Approach

This is the part where we get an insight into how notification systems and
real-time feeds are designed with the help of message queues. However, these
modules are really complex in today’s modern Web 2.0 applications. They
involve machine learning, understanding the user behaviour, recommending
new relevant information & integration of other modules with them etc. We
won’t get into that level of complexity, simply because that’s not required.

I present a very simple use case so that we can wrap our heads around it.

Also, as we discuss this use case, I would like you to think from your own
perspective. Imagine, how you would implement such a notification system
from the bare bones. This will help you understand the concept better.

Real-World Use Case #


Alright!! So, imagine we are writing a social network like Facebook using a
relational database and we would use a message queue to add the
asynchronous behaviour to our application.

In the application, a user will have many friends and followers. This is a
many-to-many relationship, like a social graph: one user has many friends,
and he would be a friend of many users, just as we discussed in the graph
database lesson. Remember?

So, when a user creates a post on the website, we would persist it in the
database. There will be one User table and another Post table. Since one user
will create many posts, it will be a one to many relationship between the user
and his posts.

Also, at the same time, as we persist the post in the database, we have to show
the post created by the user on the home page of his friends and followers,
even send the notification if needed.

How would you implement this? Pause & think… before you read further.

Pull-Based Approach #
Alright!!

One simple way to implement this without a message queue would be, for
every user on the website, to poll the database at regular short intervals to
check if any of his connections have a new update.

For this, first, we will find all the connections of the user and then run a check
for every connection one by one if there is a new post created by them.

If there are, the query will pull up all the new posts created by the user’s
connections and display them on his home page. We can also send
notifications to the user about the same, tracking the count of those with the
help of a notification counter column in the User table & adding an extra AJAX
poll query from the client for new notifications.

What do you think of this approach? Pretty simple & straightforward, right?

There are two major downsides to this approach.

First, we are polling the database so often, this is expensive. It will consume a
lot of bandwidth & will also put a lot of unnecessary load on the database.

The second downside is that the display of the user’s post on the home page of
his connections will not be in real-time. The posts won’t display until the
database is polled. We may call this real-time, but it is not really real-time.

Push-Based Approach #
Let’s make our system more performant. Instead of polling the database every
now and then, we will take the help of a message queue.

So, this time when a user creates a new post, it will have a distributed
transaction. One transaction will update the database, and the other
transaction will send the post payload to the message queue. Payload means
the content of the message posted by the user.
Notification systems and real-time feeds establish a persistent connection
between the client and the server to facilitate real-time streaming of data. We
have already been through this.

The message queue, on receiving the message, will asynchronously and
immediately push the post to the connections of the user who are online.
There is no need for them to poll the database at regular intervals to check if
the user has created a post.

We can also use the message queue’s temporary storage with a TTL to wait
for the connections of the user to come online & then push the updates to
them. We can also have a separate key-value database to store the details of
the user required to push notifications to his connections, like the ids of his
connections and such. This would avert the need to even poll the database to
get the connections of the user.

So, did you see how we transitioned from a pull-based mechanism to a push-
based mechanism with the help of message queues? This would certainly
spike the performance of the application and cut down a lot of resource
consumption, thus saving truckloads of our hard-earned money.
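The push-based flow can be sketched end to end. This is a toy single-process model with made-up user names and data, not a real distributed transaction: creating a post persists it and then fans the payload out to the feeds of online connections, so no client ever polls.

```python
from collections import defaultdict

database = []                                 # persisted posts
feeds = defaultdict(list)                     # per-user home feeds
connections = {"alice": ["bob", "carol"]}     # made-up social graph
online = {"bob"}                              # carol is offline

def create_post(author, payload):
    database.append((author, payload))          # step 1: persist the post
    for friend in connections.get(author, []):  # step 2: fan out via the queue
        if friend in online:
            feeds[friend].append(payload)       # pushed in real-time
```

Offline connections like `carol` would be handled by the TTL-backed queue storage described above, or by a fallback poll when they return.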
Regarding the distributed transactions, it entirely depends on how we want to
deal with it. Though the transactions are distributed, they can still work as a
single unit.

If the database persistence fails, naturally, we will roll back the entire
transaction. There won’t be any message push to the message queue either.

What if the message queue push fails? Do you want to roll back the
transaction? Or do you want to proceed? The decision entirely depends on
you and how you want your system to behave.

Even if the message queue push fails, the message isn’t lost. It can still be
persisted in the database.

When the user refreshes his home page you can write a query where he can
poll the database for new updates. Take the polling approach, we discussed
initially, as a backup.

Or you can totally roll back the transaction, even if the database persistence
succeeds but the message queue push fails; the post still hasn’t been
committed to the database, as it is generally a two-phase commit. We can
always write custom distributed transaction code or leverage the distributed
transaction managers which the frameworks offer.

I can go on and on about it, till the minutest of the details. But it would just
make the lesson unnecessarily lengthy. For now, I have just touched the
surface, to help you understand how notification systems & real-time feeds
work.

Just like the post creation process, the same process or the flow repeats itself
when the users trigger events like visiting a public place, eating at a
restaurant etc. And the message queues push the events to the connections of
the user.

When designing scalable systems, I want to assert the fact that there is no
perfect or best solution. The solution should serve us well and fulfil our
business requirements. Optimization is an evolutionary process, so don’t
sweat about it in the initial development cycles.

First, get the skeleton in place and then start optimizing notch by notch.

Recommended read - How Does LinkedIn Identify Its Users Online?


Handling Concurrent Requests With Message Queues

In this lesson, we will have an insight into how concurrent requests are handled with a message queue.

WE'LL COVER THE FOLLOWING

• Using A Message Queue To Handle the Traffic Surge
• How Facebook Handles Concurrent Requests On Its Live Video Streaming
Service With a Message Queue?

Using A Message Queue To Handle the Traffic Surge #
In the distributed NoSQL databases lesson, we learned about Eventual
Consistency & Strong Consistency. We discussed how both the consistency
models come into effect when incrementing the value of a “Like” counter.

Here is a quick insight into a way where we can use a message queue to
manage a high number of concurrent requests to update an entity.

When millions of users around the world update an entity concurrently, we
can queue all the update requests in a high-throughput message queue. Then
we can process them sequentially, one by one, in a FIFO (First In First Out)
approach.

This would enable the system to be highly available, open to updation & still
being consistent at the same time.
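The serialization trick can be sketched with a "Like" counter. This is a toy single-process illustration, nothing like a real distributed implementation: concurrent requests are accepted into a FIFO buffer immediately, and a single consumer applies them one by one, so the counter stays consistent under heavy writes.

```python
from collections import deque

updates = deque()
like_counter = {"post:1": 0}   # made-up post id for illustration

def enqueue_like(post_id):
    updates.append(post_id)    # accept the request right away

def process_updates():
    # Single consumer applies updates in strict arrival order.
    while updates:
        like_counter[updates.popleft()] += 1
```

Because only the queue is written concurrently and only one consumer touches the counter, there is no lost-update race, which is the consistency property the paragraph above describes.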

Though the implementation of this approach is not as simple as it sounds
(implementing anything in a distributed real-time environment is not so
trivial), I thought I would bring this approach up so that you can meditate
upon it.

How Facebook Handles Concurrent Requests On Its Live Video Streaming Service With a Message Queue? #
Facebook’s approach of handling concurrent user requests on its LIVE video
streaming service is another good example of how queues can be used to
efficiently handle the traffic surge.

On the platform, when a popular person goes LIVE, there is a surge of user
requests on the LIVE streaming server. To absorb the incoming load on the
server, Facebook uses a cache to intercept the traffic.

But since the data is streamed LIVE, the cache is often not populated with
real-time data before the requests arrive. This would naturally result in
cache misses, and the requests would move on to hit the streaming server.

To avert this, Facebook queues all the user requests asking for the same data.
It fetches the data from the streaming server once, populates the cache, and
then serves the queued requests from the cache.
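Here is a simplified, single-threaded Python sketch of that request-coalescing idea. The key, data and fetch function are invented; in Facebook’s real system the queued requests would block concurrently until the cache fill completes.

```python
cache = {}            # edge cache in front of the streaming server
waiting = {}          # key -> user requests queued while a fetch is in flight
origin_fetches = 0    # how many times the streaming server was actually hit

def fetch_from_streaming_server(key):
    # Stand-in for the expensive call to the LIVE streaming origin server.
    global origin_fetches
    origin_fetches += 1
    return f"segment-for-{key}"

def handle_request(user, key):
    if key in cache:                           # cache hit: serve directly
        return cache[key]
    waiting.setdefault(key, []).append(user)   # queue this request
    if len(waiting[key]) == 1:                 # first miss triggers the fetch
        cache[key] = fetch_from_streaming_server(key)
        waiting.pop(key)                       # queued requests now hit cache
    # In a real concurrent server, non-first requests would block here until
    # the cache is populated, then be served from it.
    return cache[key]

responses = [handle_request(u, "live-123") for u in ("u1", "u2", "u3")]
print(origin_fetches)   # only one request reached the origin
```

The point of the pattern is visible in the counter: however many viewers ask for the same segment, the origin is hit once.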

This is a recommended read on Facebook’s Live Streaming architecture.

Alright, moving on!! In the next chapter we will take a deep dive into stream
processing.
Message Queue Quiz

This lesson contains a quiz to test your understanding of the message queues.

Let’s Test Your Understanding Of the Message Queues

1 Which message delivery policy does a message queue follow?

Introduction

In this lesson, we will have an insight into Stream Processing & its use cases.

WE'LL COVER THE FOLLOWING

• Rise Of Data-Driven Systems


• Use Cases For Data Stream Processing

Rise Of Data-Driven Systems #


Our world today is largely data-driven and is progressing towards becoming
completely data-driven. With the advent of IoT (Internet of Things), entities
have gained self-awareness to a certain degree; they are generating and
transmitting data online at an unprecedented rate, and they are capable of
communicating with each other and making decisions without any sort of human
intervention.

Use Cases For Data Stream Processing #


The primary large-scale uses of IoT devices are industrial sensors, smart
cities, electronic devices, wearable healthcare body sensors etc.

To manage this massive amount of streaming data, we need sophisticated backend
systems in place to gather meaningful information from it and archive/purge
the not-so-meaningful data.

The more data we have, the better our systems evolve. Businesses today rely
on data. They need customer data to make future plans & projections. They
need to understand the user’s needs & their behaviour. All these things enable
businesses to create better products, make smarter decisions, run effective ad
campaigns, recommend new products to their customers, gain better insights
into the market etc.
All this study of data eventually results in more customer-centric products &
increased customer loyalty.

Another use case of processing streaming data is tracking service efficiency,
for instance, receiving an “Everything Is Okay” signal from the IoT devices
used by millions of customers.

All these use cases make stream processing key to businesses and modern
software applications. Time-series databases are one technology we discussed
that persists and runs queries on real-time data ingested from IoT devices.

In the next lesson let’s have an insight into the components involved in data
stream processing. We will also look at some of the key architectures in the
domain of data processing.
Data Ingestion

In this lesson, we will have an insight into the process of data ingestion.

WE'LL COVER THE FOLLOWING

• What Is Data Ingestion?


• Layers Of Data Processing Setup
• Data Standardization
• Data Processing
• Data Analysis
• Data Visualization
• Data Storage & Security

What Is Data Ingestion? #

Data Ingestion is a collective term for the process of collecting data


streaming-in from several different sources and making it ready to be
processed by the system.

In a data processing system, the data is ingested from the IoT devices & other
sources into the system to be analysed. It is routed through the data
pipelines to different components/layers, algorithms are run on it, and it is
eventually archived.

Layers Of Data Processing Setup #


There are several stages/layers to this whole data processing setup, such as:

Data collection layer


Data query layer
Data processing layer
Data visualization layer
Data storage layer
Data security layer

As you can see in the diagram all the data processing layers are pretty self-
explanatory.

Data Standardization #
The data which streams in from several different sources is not in a
homogeneous structured format. We have already gone through different
types of data, structured, unstructured, semi-structured in the database
lesson. So, you have an idea of what unstructured heterogeneous data is.

Data streams into the system at different speeds & sizes from web-based
services, social networks, IoT devices, industrial machines & whatnot. Every
stream of data has different semantics.

So, in order to make the data uniform and fit for processing, it has to be first
collected and converted into a standardized format to avoid any future
processing issues. This process of data standardization occurs in the Data
collection and preparation layer.
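To make the standardization step concrete, here is a toy Python sketch that normalizes two invented raw events, one CSV-like sensor reading and one nested JSON social event, into a single standard shape. All field names and values here are illustrative.

```python
import json
from datetime import datetime, timezone

# Two invented raw events from different sources: a CSV-like sensor reading
# and a nested JSON payload from a social feed.
sensor_raw = "device-7,2023-05-01T10:00:00+00:00,21.5"
social_raw = {"user": {"id": 42}, "action": "like", "ts": 1682935200}

def standardize_sensor(line):
    device, ts, value = line.split(",")
    return {"source": "sensor", "entity": device, "timestamp": ts,
            "payload": {"temperature": float(value)}}

def standardize_social(event):
    ts = datetime.fromtimestamp(event["ts"], tz=timezone.utc).isoformat()
    return {"source": "social", "entity": str(event["user"]["id"]),
            "timestamp": ts, "payload": {"action": event["action"]}}

records = [standardize_sensor(sensor_raw), standardize_social(social_raw)]
# Every record now shares one schema and can flow down the same pipeline.
print(json.dumps(records, indent=2))
```

Once every source maps into the same schema, the downstream processing, analytics and storage layers only ever have to deal with one shape of record.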
Data Processing #

Once the data is transformed into a standard format it is routed to the Data
processing layer where it is further processed based on the business
requirements. It is generally classified into different flows, routed to different
destinations.

Data Analysis #
After being routed, analytics is run on the data which includes execution of
different analytics models such as predictive modelling, statistical analytics,
text analytics etc. All the analytical events occur in the Data Analytics layer.

Data Visualization #
Once the analytics are run and we have valuable intel from them, all the
information is routed to the Data visualization layer to be presented before
the stakeholders, generally in a web-based dashboard.

Kibana is one good example of a data visualization tool, pretty popular in the
industry.

Data Storage & Security #


Moving data is highly vulnerable to security breaches. The Data security layer
ensures the secure movement of data all along. The Data storage layer, as the
name implies, is instrumental in persisting the data.

So, this is a gist of how massive amounts of data are processed and analyzed
for business use cases. This is just a bird’s-eye view of things. The field of
data analytics is pretty deep; an in-depth, microscopic view of each layer
demands a dedicated data analytics course of its own.

Alright, now let’s have a look at the different ways in which the data can be
ingested.
Different Ways Of Ingesting Data & the Challenges Involved

In this lesson, we will discuss the different ways in which we can ingest the data. Also, we will cover the
challenges involved in this process.

WE'LL COVER THE FOLLOWING

• Different Ways To Ingest Data


• Challenges with Data Ingestion
• Slow Process
• Complex & Expensive
• Moving Data Around Is Risky

Different Ways To Ingest Data #


There are two primary ways to ingest data: in real time, and in batches that
run at regular intervals. Which of the two to pick entirely depends on the
business requirements.

Data ingestion in real time is typically preferred in systems reading medical
data, like heartbeat or blood pressure via wearable IoT sensors, where time is
of critical importance, and in systems handling financial data like stock
market events. These are a few instances where time, lives & money are closely
linked & we need information as soon as we can get it.

On the contrary, in systems that read trends over time, we can always ingest
data in batches. For instance, when estimating the popularity of a sport in a
region over a period of time.

Let’s talk about some of the challenges which developers have to face when
ingesting massive amounts of data. I have added this lesson just to give you a
deeper insight into the entire process. In the upcoming lesson, I also talk
about the general use-cases of data streaming in the application development
domain.

Challenges with Data Ingestion #


Slow Process #
Data ingestion is a slow process. Why? I’ve brought this up before. When the
data is streamed from several different sources into the system, data coming
from each source has a different format, a different syntax and different
attached metadata. The data as a whole is heterogeneous. It has to be
transformed into a common format like JSON to be understood well by the
analytics system.

The conversion of data is a tedious process. It takes a lot of computing
resources & time. Flowing data has to be staged at several points in the
pipeline, processed & then moved ahead.

Also, at each & every stage, data has to be authenticated & verified to meet
the organization’s security standards. With the traditional data cleansing
processes, it takes weeks, if not months, to get useful information on hand.
Traditional data ingestion systems like ETL aren’t that effective anymore.

Okay!! But you just said data can be ingested in real time, right? So, how
is it slow?

There are two things I would like to bring up here. First, modern data
processing tech & frameworks are continually evolving to beat the limitations
of the legacy, traditional data processing systems. Real-time data ingestion
wasn’t even possible with the traditional systems.

Second, analytics information obtained from real-time processing is not that
accurate & holistic, since the analytics continually run on a limited set of
data as it streams, as opposed to the batch processing approach, which takes
into account the entire data set. So, basically, the more time we spend
studying the data, the more accurate the results we get.

You’ll learn more about this when we go through the Lambda and the Kappa
architectures of data processing.

Complex & Expensive #


The entire data flow process is resource-intensive. A lot of heavy lifting has
to be done to prepare the data before it is ingested into the system. Also, it
isn’t a side process; a dedicated team is required to pull off something like
that.

Engineering teams often come across scenarios where the tools & frameworks
available in the market fail to serve their needs & they have no option other
than to write a custom solution from the bare bones.

Gobblin is a data ingestion tool by LinkedIn. At one point in time, LinkedIn


had 15 data ingestion pipelines running which created several data
management challenges. To tackle this problem, LinkedIn wrote Gobblin in-
house.

It is a part of the Apache Software Foundation now. This is a good read on it.

The semantics of the data coming from external sources sometimes change, as
those sources are not always under our control, which then requires a change
in the backend data processing code. Today, the IoT machines in the industry
are continually evolving at a rapid pace.

These are the factors we have to keep in mind when setting up a data
processing & analytics system.

Moving Data Around Is Risky #


When data is moved around, it opens up the possibility of a breach; moving
data is vulnerable. It goes through several different staging areas, & the
engineering teams have to put in additional effort and resources to ensure
their system meets the security standards at all times.

These are some of the challenges which developers face when working with
streaming data.
Data Ingestion Use Cases

In this lesson, we will discuss some common data ingestion use cases in the industry.

WE'LL COVER THE FOLLOWING

• Moving Big Data Into Hadoop


• Streaming Data from Databases to Elasticsearch Server
• Log Processing
• Stream Processing Engines for Real-Time Events

This is the part where I talk about some of the data streaming use cases
commonly required in the industry.

Moving Big Data Into Hadoop #


This is the most popular use case of data ingestion. As discussed before, Big
Data from IoT devices, social apps & other sources streams through data
pipelines into Hadoop, the most popular distributed data processing framework,
for analysis & so on.

Streaming Data from Databases to Elasticsearch Server #


Elasticsearch is an open-source framework for implementing search in web
applications. It is the de facto search framework used in the industry, simply
because of its advanced features & the fact that it is open source. These
features enable businesses to write their own custom solutions when they need
them.

In the past, with a few of my friends, I wrote a product-search software as a
service using Java, Spring Boot & Elasticsearch. Speaking of its design, we
would stream & index quite a large amount of product data from the legacy
storage solutions to the Elasticsearch server in order to make the products
come up in the search results.
All the data intended to show up in the search was replicated from the main
storage to the Elasticsearch storage. Also, as new data was persisted in the
main storage, it was asynchronously rivered to the Elasticsearch server in
real time for indexing.

Log Processing #
If your project isn’t a hobby project, chances are it’s running on a cluster.
When we talk about running a large-scale service, monolithic systems are a
thing of the past. With so many microservices running concurrently, a massive
number of logs is generated over a period of time. And logs are the only way
to move back in time, track errors & study the behaviour of the system.

So, to study the behaviour of the system holistically, we have to stream all
the logs to a central place and ingest them into a central server to run
analytics on them, with the help of solutions like the ELK (Elasticsearch,
Logstash, Kibana) stack.

Stream Processing Engines for Real-Time Events #


Real-time streaming & data processing is the core component in systems
handling LIVE information such as sports. It’s imperative that the
architectural setup in place is efficient enough to ingest data, analyse it, figure
out the behaviour in real-time & quickly push the updated information to the
fans. After all, the whole business depends on it.

Message queues like Kafka and stream-computation frameworks like Apache Storm,
Apache NiFi, Apache Spark, Samza, Kinesis etc. are used to implement the
real-time, large-scale data processing features in online applications.

This is a good read on the topic:

An Insight into Netflix’s real-time streaming platform

Alright!! Time to have a look at data pipelines in the lesson up next.
Data Pipelines

In this lesson, we will learn about Data Pipelines.

WE'LL COVER THE FOLLOWING

• What Are Data Pipelines?


• Features Of Data Pipelines
• What Is ETL?

What Are Data Pipelines? #

Data pipelines are the core component of a data processing
infrastructure. They facilitate the efficient flow of data from one point
to another & also enable developers to apply filters on the data
streaming in, in real time.

Today’s enterprise is data-driven. That makes data pipelines key in


implementing scalable analytics systems.

Features Of Data Pipelines #


Speaking of some more features of data pipelines:

They ensure the smooth flow of data.
They enable the business to apply filters and business logic on streaming
data.
They avert bottlenecks & redundancy in the data flow.
They facilitate parallel processing of data.
They prevent data from being corrupted.

These pipelines work on a set of rules predefined by the engineering teams, &
the data is routed accordingly without any manual intervention. The entire
flow of data (extraction, transformation, combination, validation, converging
of multiple streams into one etc.) is completely automated.

Data pipelines also facilitate parallel processing of data via managing multiple
streams. I’ll talk more about the distributed data processing in the upcoming
lesson.

Traditionally, we used ETL systems to manage all the movement of data, but one
major limitation with them is that they don’t really support the handling of
real-time streaming data, which is possible with the new, evolved data
processing infrastructure powered by data pipelines.

What Is ETL? #
If you haven’t heard of ETL before, it means Extract Transform Load.

Extract means fetching data from single or multiple data sources.

Transform means transforming the extracted heterogeneous data into a
standardized format based on the rules set by the business.

Load means moving the transformed data to a data warehouse or another data
storage location for further processing of data.

The ETL flow is the same as the data ingestion flow. It’s just that the entire
movement of data is done in batches, as opposed to streaming it through the
data pipelines in real time.
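The three ETL steps above can be sketched in a few lines of Python. The source rows and the in-memory "warehouse" list are stand-ins for real systems (a CSV export, a legacy API, a warehouse table):

```python
# Extract: pull rows from hypothetical sources (a CSV export, a legacy API).
def extract():
    yield {"name": "Alice", "amount": "10.50"}
    yield {"name": "BOB", "amount": "3"}

# Transform: enforce one standard format set by the business rules
# (lower-case names, numeric amounts).
def transform(rows):
    for row in rows:
        yield {"name": row["name"].lower(), "amount": float(row["amount"])}

# Load: move the cleaned batch into the destination store; the list stands
# in for a data warehouse table.
warehouse = []
def load(rows):
    warehouse.extend(rows)

load(transform(extract()))
print(warehouse)
```

Using generators keeps the flow streaming-friendly: the same `transform` could sit behind a real-time pipeline just as easily as behind a nightly batch job.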

Also, at the same time, it doesn’t mean the batch processing approach is
obsolete. Both real-time & batch data processing techniques are leveraged
based on the project requirements.

You’ll gain more insight into it when we will go through the Lambda & Kappa
architectures of distributed data processing in the upcoming lessons.

In the previous lesson, I brought up a few of the popular data processing
tools such as Apache Flink, Storm, Spark, Kafka etc. All these tools have one
thing in common: they facilitate the processing of data in a cluster, in a
distributed environment, via data pipelines.
This Netflix case study is a good read on how they migrated from batch ETL to
stream processing using Kafka & Flink.

What is distributed data processing? How does it work? We are gonna look
into it in the next lesson. Stay tuned.
Distributed Data Processing

In this lesson, we will discuss distributed data processing and the technologies used for it.

WE'LL COVER THE FOLLOWING

• What Is Distributed Data Processing?


• Distributed Data Processing Technologies
• MapReduce – Apache Hadoop
• Apache Spark
• Apache Storm
• Apache Kafka

Alright!! Fellas, this lesson is all about distributed data processing. I’ll
talk about what it is, how it differs from a centralized data processing
system, what architectures are involved in it, and other similar topics.

So, let’s get on with it.

What Is Distributed Data Processing? #

Distributed data processing means diverting large amounts of data to
several different nodes, running in a cluster, for parallel processing.

All the nodes execute their allotted tasks in parallel, working in conjunction
with each other, coordinated by a node coordinator. Apache ZooKeeper is a
pretty popular, de facto node coordinator used in the industry.

Since the nodes are distributed and the tasks are executed in parallel, this
makes the entire set-up pretty scalable & highly available. The workload can
be scaled both horizontally & vertically. Data is made redundant & replicated
across the cluster to avoid any sort of data loss.

Processing data in a distributed environment helps accomplish the task in
significantly less time than running it on a centralized data processing
system.

In a distributed system, the tasks are shared by several nodes; in a
centralized system, on the contrary, the tasks are queued to be processed one
by one.

Distributed Data Processing Technologies #


Here are some of the popular technologies used in the industry for
large-scale data processing.

MapReduce – Apache Hadoop #


MapReduce is a programming model written for managing distributed data
processing across several different machines in a cluster, distributing tasks to
several machines, running work in parallel, managing all the communication
and data transfer within different parts of the system.

The Map part of the programming model involves sorting the data based on a
parameter and the Reduce part involves summarizing the sorted data.
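The model is easy to sketch in plain Python with the classic word-count example. The in-process "shuffle" step here mimics the grouping that a real framework like Hadoop performs across nodes:

```python
from collections import defaultdict

documents = ["the quick brown fox", "the lazy dog", "the fox"]

# Map: emit (word, 1) pairs from each input split.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key, as the framework would do across nodes.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: summarize each group into a final count.
word_counts = {word: sum(counts) for word, counts in groups.items()}
print(word_counts["the"], word_counts["fox"])  # 3 2
```

The power of the model is that the map and reduce functions stay this simple even when the framework fans the work out over thousands of machines.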
The most popular open-source implementation of the MapReduce programming model
is Apache Hadoop. The framework is used by all the big guns in the industry to
manage massive amounts of data in their systems. It is used by Twitter for
running analytics and by Facebook for storing big data.

Apache Spark #
Apache Spark is an open-source cluster computing framework. It provides
high performance for both batch & real-time in-stream processing. It can
work with diverse data sources & facilitates parallel execution of work in a
cluster.

Spark has a cluster manager and distributed data storage. The cluster
manager facilitates communication between different nodes running together
in a cluster whereas the distributed storage facilitates storage of big data.
Spark seamlessly integrates with distributed data stores like Cassandra, HDFS,
MapR File System (MapR-FS), Amazon S3 etc.

Apache Storm #
Apache Storm is a distributed stream processing framework. In the industry, it
is primarily used for processing massive amounts of streaming data. It has
several different use cases such as real-time analytics, machine learning,
distributed remote procedure calls etc.

Apache Kafka #
Apache Kafka is an open-source distributed stream processing & messaging
platform. It’s written using Java & Scala & was developed by LinkedIn.

The storage layer of Kafka involves a distributed scalable pub/sub message


queue. It helps read & write streams of data like a messaging system.

Kafka is used in the industry to develop real-time features such as notification


platforms, managing streams of massive amounts of data, monitoring website
activity & metrics, messaging, log aggregation.
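As a loose illustration of the pub/sub log idea (not Kafka's actual API), here is a minimal in-memory Python sketch of an append-only topic where each consumer tracks its own read offset:

```python
class Topic:
    """Minimal in-memory sketch of an append-only topic log: producers
    append records, and each consumer tracks its own read offset."""
    def __init__(self):
        self.log = []

    def produce(self, record):
        self.log.append(record)
        return len(self.log) - 1           # offset assigned to the record

    def consume(self, offset):
        # Return every record from `offset` onward plus the next offset,
        # so independent consumers can replay the stream at their own pace.
        return self.log[offset:], len(self.log)

activity = Topic()
activity.produce("user-signed-up")
activity.produce("page-viewed")

batch, next_offset = activity.consume(0)   # a consumer reads from the start
activity.produce("user-logged-out")
late, _ = activity.consume(next_offset)    # later, resume where it left off
```

Because the log is append-only and consumers own their offsets, many independent readers (analytics, notifications, monitoring) can consume the same stream without interfering with each other, which is the essence of Kafka's design.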

Hadoop is preferred for batch processing of data whereas Spark, Kafka &
Storm are preferred for processing real-time streaming data.

So, by now, I am sure you have a good idea of what data processing is, its use
cases in modern application development, the technologies involved etc.

Let’s have a look at a couple of architectures involved in the process: Lambda
& Kappa.
Lambda Architecture

In this lesson, we will learn about Lambda Architecture of data processing.

WE'LL COVER THE FOLLOWING

• What Is Lambda Architecture?


• Layers Of the Lambda Architecture

What Is Lambda Architecture? #

Lambda is a distributed data processing architecture that leverages both
the batch & the real-time streaming data processing approaches to tackle
the latency issues arising out of the batch processing approach. It joins
the results from both approaches before presenting them to the end user.
Batch processing does take time, considering the massive amount of data
businesses have today, but the accuracy of the approach is high & the results
are comprehensive.

On the contrary, real-time streaming data processing provides quick access to
insights. In this scenario, the analytics are run over a small portion of
data, so the results are not as accurate & comprehensive as those of the batch
approach.

Lambda architecture makes the most of the two approaches.

Layers Of the Lambda Architecture #


The architecture has typically three layers:

Batch Layer
Speed Layer
Serving layer

The Batch Layer deals with the results acquired via batch processing the data.
The Speed layer gets data from the real-time streaming data processing & the
Serving layer combines the results obtained from both the Batch & the Speed
layers.
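A tiny Python sketch of how a serving layer might merge the two views; the metric name and counts here are invented for illustration:

```python
# Batch layer view: accurate but stale, recomputed over the full dataset.
batch_view = {"page_views": 10_000}     # complete up to the last batch run

# Speed layer view: approximate counts for events since that run.
realtime_view = {"page_views": 42}

# Serving layer: merge both views before answering a query.
def query(metric):
    return batch_view.get(metric, 0) + realtime_view.get(metric, 0)

print(query("page_views"))  # 10042
```

When the next batch run completes, its view absorbs the events the speed layer had counted, and the speed layer's view is reset; queries stay fresh throughout.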
Kappa Architecture

In this lesson, we will discuss the Kappa Architecture of data processing

WE'LL COVER THE FOLLOWING

• What Is Kappa Architecture?


• Layers Of Kappa Architecture

What Is Kappa Architecture? #

In Kappa architecture, all the data flows through a single data streaming
pipeline as opposed to the Lambda architecture which has different data
streaming layers that converge into one.

The architecture flows the data of both real-time & batch processing through a
single streaming pipeline, reducing complexity by not having to manage
separate layers for processing data.
Layers Of Kappa Architecture #
Kappa contains only two layers: Speed, which is the stream processing layer, &
Serving, which is the final layer.

Kappa is not an alternative for Lambda. Both the architectures have their use
cases.

Kappa is preferred if the batch and the streaming analytics results are fairly
identical in a system. Lambda is preferred if they are not.

Well, this concludes the stream processing chapter. Though the entire
distributed data processing approach appears pretty smooth and efficient, it’s
important that we do not forget that setting up and managing a distributed
data processing system is not trivial. It requires years of work to perfect
the system. Also, a distributed system does not promise Strong Consistency of
data.

With this being said, let’s move on to the next chapter where I talk about
different kinds of architectures involved in the domain of software
development.
Stream Processing Quiz

This lesson contains a quiz to test your understanding of stream processing.

Let’s Test Your Understanding Of Stream Processing

1 What is the need for processing data streams in web applications? Which
of the following option(s) are correct?

Event Driven Architecture - Part 1

In this lesson, which is the part one of the event driven architecture, we will understand concepts like Blocking &
Non-blocking.

WE'LL COVER THE FOLLOWING

• What Is Blocking?
• What Is Non-Blocking?

When writing modern Web 2.0 applications, chances are you have come across
terms like Reactive programming and Event-driven architecture, and concepts
like Blocking & Non-blocking.

What are they? Should I be aware of them?

You might have also noticed that tech like NodeJS, Play, Tornado, Akka.io are
gaining more popularity in the developer circles for modern application
development in comparison to the traditional tech.

What is the reason for that? Is it just that we are bored of working on the
traditional tech like Java, PHP etc. & are attracted towards the shiny new stuff
or are there any technical reasons behind this?

In this lesson, we will go through each and every concept step by step &
realize the demands of modern software application development.

So, without any further ado, let’s get on with it.

Alright, at this point in the course, we know what persistent connections are,
what asynchronous behaviour is & why we need it. We can’t really write
real-time apps without implementing them.

Starting with Blocking. What is it?


What Is Blocking? #

In web applications, blocking means the flow of execution is blocked, waiting
for a process to complete. Until the process completes, it cannot move on.
Let’s say we have a block of code of 10 lines within a function, and every
line triggers another external function executing a specific task.

Naturally, when the flow of execution enters the main function it will start
executing the code from the top, from the first line. It will run the first line of
code and will call the external function.

At this point in time, until the external function returns the response, the
flow is blocked. The flow won’t move further; it just waits for the response,
unless we add asynchronous behaviour by annotating the call and moving the
task to a separate thread. But that’s not what happens in the regular
scenario, like in regular CRUD-based apps. Right?

So, this behaviour is known as Blocking. The flow of execution is blocked.

What Is Non-Blocking? #
Now coming to the Non-blocking approach. In this approach, the flow doesn’t
wait for the first function that is called to return a response. It just moves
on to execute the next lines of code. This approach is a little less
consistent than the blocking approach, since a function might return nothing
or throw an error, yet the code next in the sequence is still executed.

The non-blocking approach facilitates IO (Input-Output) intensive operations.
Besides disk & other hardware-based operations, communication and network
operations also come under IO operations.
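A minimal Python sketch of the difference, using `asyncio` as the event loop; the 0.1-second sleeps stand in for IO calls such as network requests:

```python
import asyncio
import time

async def io_task(name, seconds):
    # Stand-in for an IO-bound call (network request, disk read, ...).
    await asyncio.sleep(seconds)
    return name

def blocking_style():
    # Blocking: each wait must finish before the next line runs.
    start = time.monotonic()
    time.sleep(0.1)
    time.sleep(0.1)
    return time.monotonic() - start        # waits add up: ~0.2s

async def non_blocking_style():
    # Non-blocking: both waits are in flight at once; the event loop moves
    # on instead of idling on each call.
    start = time.monotonic()
    await asyncio.gather(io_task("a", 0.1), io_task("b", 0.1))
    return time.monotonic() - start        # waits overlap: ~0.1s

blocked = blocking_style()
overlapped = asyncio.run(non_blocking_style())
print(blocked > overlapped)
```

The gap widens with the number of concurrent IO calls, which is exactly why event-loop frameworks shine for high-IO web workloads.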

We will continue this discussion in part two of the event-driven architecture
lesson, where we will have an insight into what events are, what event-driven
architecture is, and the technologies used to implement it.
Event Driven Architecture - Part 2

This lesson contains the second part of the discussion on the event-driven architecture. We will be continuing
where we left off in the previous lesson.

WE'LL COVER THE FOLLOWING

• What Are Events?


• Event-Driven Architecture
• Technologies For Implementing the Event Driven Architecture

What Are Events? #


There are generally two kinds of processes in applications: CPU-intensive &
IO-intensive. IO in the context of web applications means events. A large
number of IO operations means a lot of events occurring over a period of time,
and an event can be anything from a tweet to the click of a button, an HTTP
request, an ingested message, a change in the value of a variable etc.

We know that Web 2.0 real-time applications have a lot of events. For
instance, there is a lot of request-response between the client and the server,
typically in an online game, messaging app etc. A continual sequence of events
is called a stream of events. In the previous chapter, we have already discussed
how stream processing works.

Event-Driven Architecture #
Non-blocking architecture is also known as the Reactive or the Event-driven
architecture. Event-driven architectures are pretty popular in the modern web
application development.

Technologies like NodeJS, frameworks in the Java ecosystem like Play, Akka.io
are non-blocking in nature and are built for modern high IO scalable
applications.
They are capable of handling a large number of concurrent connections with
minimal resource consumption. Modern applications need a fully
asynchronous model to scale. These modern web frameworks provide more
reliable behaviour in a distributed environment. They are built to run on a
cluster, handle large scale concurrent scenarios, tackle problems which
generally occur in a clustered environment. They enable us to write code
without worrying about handling multi-threads, thread lock, out of memory
issues due to high IO etc.

Coming back to the event-driven reactive architecture. It simply means
reacting to the events occurring regularly. The code is written to react to
events as opposed to sequentially moving through the lines of code.

I’ve already brought this up: the sequence of events occurring over a
period of time is called a stream of events. In order to react to the events,
the system has to continually monitor the stream. Event-driven architecture is
all about processing asynchronous data streams. The application becomes
inherently asynchronous.

Technologies For Implementing the Event Driven Architecture #
With the advent of Web 2.0, people in the tech industry felt the need to evolve
the technologies to be powerful enough to implement the modern web

application use cases. The Spring framework added the Spring Reactor module to
its ecosystem. Developers wrote NodeJS, Akka.io, Play etc.

So, you would have already figured out that reactive event-driven applications
are difficult to implement with thread-based frameworks, as dealing with
threads, shared mutable state & locks makes things a lot more complex. In an
event-driven system, everything is treated as a stream. The level of
abstraction is good; developers don’t have to worry about managing the
low-level memory stuff.

And I am sure that you are well aware of the data streaming use cases that
apply here, like handling a large number of transaction events, handling
changing stock market prices, user events on an online shopping application
etc.

NodeJS is a single-threaded, non-blocking framework written to handle
IO-intensive tasks. It has an event-loop architecture. This is a good read on
it.

LinkedIn uses Play framework for identifying the online status of its users.

At the same time, I want to assert this fact that the emergence of non-blocking
tech does not mean that the traditional tech is obsolete. Every tech has its
use cases.

NodeJS is not fit for CPU-intensive tasks. CPU-intensive operations are operations that require a good amount of computational power, such as graphics rendering, running ML algorithms, handling data in enterprise systems etc. It would be a mistake to pick NodeJS for these purposes.

In the upcoming lessons, I will discuss the general guidelines to keep in mind when picking a server-side technology. That will give you more insight into how to pick the right backend technology.
Web Hooks

In this lesson, we’ll understand the need for web hooks & how they work.

WE'LL COVER THE FOLLOWING

• What Are Web Hooks?


• How Do Web Hooks Work?

What Are Web Hooks? #


Imagine you’ve written an API which provides information on the latest exclusive Baseball events. Your API is consumed by a lot of third-party services that fetch the information from it, add their own flavour & present it to their users.

But so many API requests, made every now and then just to check if a particular event has occurred, are crushing your server. The server can hardly keep up. There is no way for consumers to know that new information isn’t available on the server yet, or that an event hasn’t occurred yet; they just keep polling the API. This piles unwanted load on the server and could eventually bring it down.

What do we do? Is there a way we can cut down the load on our servers?

Yes!! WebHooks.

WebHooks are more like callbacks. It’s like saying: I will call you when new information is available; you carry on with your work.

WebHooks enable communication between two services without a middleware. They have an event-based mechanism.

So, how do they work?


How Do Web Hooks Work? #

To use WebHooks, consumers register an HTTP endpoint with the service, along with a unique API key. It’s like leaving a phone number: call me on this number when an event occurs, and I won’t keep calling you to ask.

Whenever new information is available on the backend, the server fires an HTTP event to all the registered endpoints of the consumers, notifying them of the new update.
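A minimal sketch of that flow, with hypothetical endpoint URLs and API keys. It only builds one callback per registered consumer; a real service would POST each payload over HTTP with an HTTP client.

```python
import json

# Hypothetical in-memory registry mapping API keys to consumer
# endpoints; a real service would persist these registrations.
registered_endpoints = {}

def register(api_key, endpoint_url):
    # A consumer registers its HTTP endpoint with a unique API key.
    registered_endpoints[api_key] = endpoint_url

def notify_all(event):
    # When an event occurs, build one HTTP callback per registered
    # consumer instead of waiting to be polled. A real server would
    # POST each payload, e.g. with urllib.request or an HTTP client.
    payload = json.dumps(event)
    return [(url, payload) for url in registered_endpoints.values()]

register("key-123", "https://consumer-a.example/hooks/baseball")
register("key-456", "https://consumer-b.example/hooks/baseball")
calls = notify_all({"event": "home_run"})
print(len(calls))
```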

Browser notifications are a good example of WebHooks. Instead of us visiting the websites every now and then for new info, the websites notify us when they publish new content.
Shared Nothing Architecture

In this lesson, we will briefly discuss the Shared Nothing Architecture.

When working with distributed systems, you’ll often hear the term Shared Nothing Architecture. I thought I’d quickly discuss it with you, though there is nothing really unique about it; I’ve already talked about it in the course.

When several modules work in conjunction with each other, they often share RAM, also known as shared memory. They may share the disk, that is, share the database. Or they may share nothing at all. An architecture in which the modules or services share nothing is called a Shared Nothing Architecture.

Shared Nothing Architecture means eliminating all single points of failure. Every module has its own memory & its own disk. So even if several modules in the system go down, the other modules stay online and unaffected. It also helps with scalability and performance.

Moving on!! let’s discuss the Hexagonal Architecture in the next lesson.
Hexagonal Architecture

In this lesson, we will have an insight into the Hexagonal architecture.

WE'LL COVER THE FOLLOWING

• What Is A Hexagonal Architecture?


• Real World Code Implementation

What Is A Hexagonal Architecture? #


The architecture consists of three components:

Ports
Adapters
Domain

The focus of this architecture is to make the different components of the application independent, loosely coupled & easy to test.
The application should be designed in such a way that it can be tested by humans or by automated tests, with mock databases, mock middleware, with & without a UI, without making any changes or adjustments to the code.

The architectural pattern holds the domain, that is, the business logic, at its core. The outer layer has Ports & Adapters. Ports act like an API, as an interface; all the input to the app goes through the interface.

So, the external entities don’t have any direct interaction with the domain, the
business logic. The Adapter is the implementation of the interface. Adapters
convert the data obtained from the Ports, to be processed by the business
logic. The business logic lies at the centre isolated & all the input and output is
at the edges of the structure.

The hexagonal shape of the structure doesn’t have anything to do with the pattern; it’s just a visual representation of the architecture. Initially, the architecture was called the Ports and Adapters pattern; later the name Hexagonal stuck.

The Ports & the Adapter analogy comes from the computer ports, as they act as
the input interface to the external devices & the Adapter converts the signals
obtained from the ports to be processed by the chips inside.
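A minimal sketch of the pattern in Python, with hypothetical names: the port is an interface the domain defines, the adapter implements it, and the business logic at the centre never touches the outside world directly.

```python
from abc import ABC, abstractmethod

class OrderRepository(ABC):
    """Port: an interface defined by the domain."""
    @abstractmethod
    def save(self, order): ...

class InMemoryOrderRepository(OrderRepository):
    """Adapter: one implementation of the port. A real database
    adapter could be swapped in without touching the domain code."""
    def __init__(self):
        self.orders = []
    def save(self, order):
        self.orders.append(order)

def place_order(order, repository: OrderRepository):
    # Domain: business logic, isolated at the centre, depending
    # only on the port, never on a concrete adapter.
    if order.get("quantity", 0) <= 0:
        raise ValueError("quantity must be positive")
    repository.save(order)

repo = InMemoryOrderRepository()
place_order({"item": "book", "quantity": 2}, repo)
print(repo.orders)
```

This is exactly what makes the application testable with mock databases or without a UI: the test simply plugs a different adapter into the same port.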

Real World Code Implementation #


Coming down to the real-world code implementation, isn’t that what we already do with the Layered Architecture approach? We have different layers in our applications: the Controller, then the Service Layer interface, class implementations of the interface; the business logic goes in the Domain model, and a bit in the Service, Business and Repository classes.
Well, yeah. That’s right. First up, I would say that the Hexagonal approach is an evolution of the layered architecture; it’s not entirely different. As long as the business logic stays in one place, things should be fine. The issue with the layered approach is that large repos often end up with too many layers besides the regular service, repository & business ones.

The business logic gets scattered across the layers, making testing, refactoring & the pluggability of new entities difficult. Remember the stored procedures in the databases & the business logic coupled with the UI in JSP (Java Server Pages)?

When working with JSPs and Stored procedures, we still have the layered
architecture, the UI layer is separate, the persistence layer is separate but the
business logic is tightly coupled with these layers.

On the contrary, the Hexagonal pattern has its stance pretty clear, there is an
inside component which holds the business logic & then the outside layer, the
Ports & the Adapters which involve the databases, message queues, APIs &
stuff.
More On Architecture Quiz – Part 1

This lesson contains a quiz to test your understanding of event-driven, shared nothing & hexagonal architectures and web hooks.

Let’s Test Your Understanding Of A Few Concepts In Software Architecture

1 Which of the following statements is true in context to reactive


programming and non-blocking?

Peer to Peer Architecture – Part 1

In this lesson, which is part one of the discussion on Peer to Peer Architecture, we will take a deep dive into the
architecture & discuss it in detail.

WE'LL COVER THE FOLLOWING

• What Is A Peer to Peer (P2P) Network?


• What Does A Central Server Mean?
• Downsides Of Centralized Systems
• What Is A Decentralized Architecture?
• Advantages Of A Peer to Peer Network

P2P architecture is the base of blockchain tech. We’ve all used it at some point
in our lives to download files via torrent. So, I guess you have a little idea of
what it is. You are probably aware of the terms like Seeding, Leeching etc.
Even if you aren’t, you’ll learn everything in this lesson.

Let’s begin the lesson with an understanding of what a P2P network is.

What Is A Peer to Peer (P2P) Network? #


A P2P network is a network in which computers, also known as nodes, can communicate with each other without the need for a central server. The absence of a central server rules out the possibility of a single point of failure. All the computers in the network have equal rights. A node acts as a seeder and a leecher at the same time. So, even if some of the nodes go down, the network & the communication are still up.

A Seeder is a node which hosts the data on its system and provides bandwidth to upload the data to the network; a Leecher is a node which downloads the data from the network.
What Does A Central Server Mean? #
I want you to think of a messaging app. When two users communicate, the first user sends a message from their device, the message moves to the server of the organization hosting the messaging service, and from there it is routed to its destination, that is, the device of the user receiving the message.

The server of the organization is the central server. These systems are also
known as Centralized systems.

Okay, so what’s the issue when communicating with my friend via a central
server? I have never faced any issues.

Downsides Of Centralized Systems #


In this scenario, there are a few important things to consider -

First, the central server has access to all your messages. It can read them, share them with its associates, laugh about them and so on, so the communication is not really secure. Even though the businesses say that the entire message pipeline is encrypted, data breaches still happen, governments get access to our data & data is sold to third parties for fat profits. Do you think these messaging apps are really secure? Should national security & enterprise officials sitting at the top of the food chain use these central-server messaging apps for communication?

Second, in case of events like a natural disaster such as an earthquake, a zombie attack on the data centre, a massive infrastructural failure or the organization going out of business, we are stranded; there is no way to communicate with our friends across the globe. Think about it.

Third, let’s say you start creating content on social media. You have a pretty solid following, you spend 100+ hours a week putting out the best content ever and you have worked for years to reach this point of success. Then one fine day, out of the blue, the organization pokes you and says: Hey!! Good job, but, for some reason which we can’t talk about, we have to let your data go. We just don’t like your content. Shift + Del and whoosh… all your data disappears like a Genie. What are you gonna do next? If you are already a content creator or are active on social media, you know this happens all the time.

Fortunately, P2P networks are resilient to all these scenarios, due to their
design. They have a Decentralized architecture.
What Is A Decentralized Architecture? #
Nobody has control over your data, nobody has the power to delete your data
as all the participating nodes in a P2P network have equal rights. During a
zombie apocalypse when the huge corporation servers would be dead or on
fire, we can still communicate with each other via a peer to peer connection.

Though I’ve nothing against any of the corporations :) They’ve made our lives
really easy. It’s just I am making you aware of all the possible scenarios out
there.

Advantages Of A Peer to Peer Network #


Here is another use case where a peer to peer network rocks!!

Imagine this: you have finally returned home from a trekking tour, having visited all seven continents around the world. It couldn’t have been more beautiful & emotionally satisfying.

You have documented the entire expedition with state-of-the-art cameras & equipment in super ultra HD 4K quality, which has filled up your computer’s hard drive. You are super excited to share all the videos & photos of the tour with your friends.

But how do you really plan to share the data, which is in several gigabytes, with
your friends?

Facebook messenger, Whatsapp?

Messengers have file size limits, so they aren’t even an option. Well, you could upload all the stuff to the cloud & share the link with your folks, but hold on, uploading that much data needs some serious storage space & that would mean some serious money. Would you be in the mood to spend any more after such a long trip?

No problemo, we can write all the files to physical media like DVDs or portable hard drives & share them with our friends, right?

Well yes, we can, but physical media has its costs & writing files to it for every friend is time-consuming, expensive & resource-intensive. And I get it, you are tired already. Oh!! And by the way, we have to courier the disks to friends located across the globe, so do add the additional courier expense & of course the time it will take to reach them.

We’ve got this, don’t you worry!! We’ll surely find some way. Okay… alright. So, what options do we have remaining? Think about it.

Hey!! Why don’t we use peer to peer file sharing? That would be awesome.

With peer-to-peer file sharing, we can easily share all the content with our friends with minimal, almost no, cost & fuss.

Beautiful!!

We can use a P2P protocol like BitTorrent. It’s the most commonly used peer-to-peer protocol to distribute data and large electronic files over the internet, with approximately 25 million concurrent users at any point in time.

So, we will create a torrent file of our data, share it with all our folks. They
have to just put the torrent in their BitTorrent client & start downloading the
files to their systems while hosting/seeding the files simultaneously for others
to download.

Okay!! So, these are a few of the use cases where P2P networks rock. In the next lesson, which is part 2 of the P2P architecture, we will take a deep dive into its architecture.
Peer to Peer Architecture – Part 2

This lesson contains the second part of the discussion on the Peer to Peer Architecture. We will be continuing
where we left off in the previous lesson.

WE'LL COVER THE FOLLOWING

• What Is A Peer to Peer Architecture? How Does It Work?


• Types Of P2P Networks
• Unstructured Network
• Structured Network
• Hybrid Model

What Is A Peer to Peer Architecture? How Does


It Work? #
A peer-to-peer architecture is designed around several nodes in the network taking part equally, acting as both the client & the server. The data is exchanged over TCP/IP, just as it happens over the HTTP protocol in a client-server model. The P2P design has an overlay network over TCP/IP which enables the users to connect directly. It takes care of all the complexities and the heavy lifting. Nodes/peers are indexed & discoverable in this overlay network.

A large file is transferred between the nodes by being divided into chunks of
equal size in a non-sequential order.

Say a system hosts a large file of 75 Gigabytes. Other nodes in the network, in
need of the file, locate the system containing the file. Then they download the
file in chunks, re-hosting the downloaded chunk simultaneously, making it
more available to the other users. This approach is known as Segmented P2P
file transfer.
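A minimal sketch of the chunking behind segmented transfer, with a made-up payload and chunk size; real protocols also hash each chunk so peers can verify what they receive.

```python
def split_into_chunks(data: bytes, chunk_size: int):
    # Divide the file into equal-sized chunks (the last one may be
    # shorter); peers can fetch these non-sequentially from
    # different hosts and re-host each chunk as soon as it arrives.
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def reassemble(chunks):
    # Chunks arriving out of order are put back by their index.
    return b"".join(chunks)

file_data = b"a large file shared over a p2p network"
chunks = split_into_chunks(file_data, 8)
print(len(chunks), reassemble(chunks) == file_data)
```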

Based on how these peers are linked with each other in the network, the
networks are classified into a Structured, Unstructured or a Hybrid model.

Types Of P2P Networks #


Unstructured Network #
In an unstructured network nodes/peers keep connecting with each other
randomly. So, there is no structure, no rule. Just simply connect & grow the
network.

In this architectural design, there is no indexing of the nodes. To search for data, we have to scan through each & every node of the network. This is O(n) in complexity, where n is the number of nodes in the network, which is very resource-intensive.

Think of it in this way. There are a billion systems connected in the network.
And then there is a file stored in just one system in the network. In an
unstructured network, we have to run a search through each system in the
network to find the file.

So, if the search for a file on one system takes, say, 1 second, a search through the entire network would require 1 billion seconds.
Some of the protocols of the unstructured network are Gossip, Kazaa &
Gnutella.
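A tiny sketch of that O(n) scan, with hypothetical node data; the hop count shows how the cost grows linearly with the size of the network.

```python
def search_unstructured(nodes, wanted_file):
    # No index: visit node after node until the file turns up,
    # O(n) in the number of nodes in the network.
    hops = 0
    for node in nodes:
        hops += 1
        if wanted_file in node["files"]:
            return node["id"], hops
    return None, hops

nodes = [
    {"id": "node-a", "files": {"song.mp3"}},
    {"id": "node-b", "files": set()},
    {"id": "node-c", "files": {"trip-video.mp4"}},
]
print(search_unstructured(nodes, "trip-video.mp4"))
```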

Structured Network #
In contrast to an unstructured network, a structured peer-to-peer network holds a proper index of the nodes, or topology, which makes it easier to search for specific data in it.

This kind of network implements a distributed hash table to index the nodes. This index is just like the index of a book, which we check to find a particular piece of information rather than searching through every page.

BitTorrent is an example of this type of network.
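A toy sketch of the indexing idea, not a real DHT: hashing a key deterministically maps it to a node, so the publisher and the searcher agree on where the data lives without scanning the network. Real DHTs such as Kademlia add routing tables and replication on top.

```python
import hashlib

# Hypothetical member list of the network.
NODES = ["node-a", "node-b", "node-c", "node-d"]

def responsible_node(key: str) -> str:
    # Hash the key and map it onto one of the nodes. Every peer
    # computes the same mapping, so a lookup goes straight to the
    # right node instead of scanning all of them.
    digest = int(hashlib.sha1(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Publisher and searcher independently arrive at the same node.
print(responsible_node("trip-video.mp4"))
```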

Hybrid Model #
The majority of the blockchain startups have a hybrid model. A hybrid model
means cherry-picking the good stuff from all the models like P2P, client-server
etc. It is a network, involving both a peer to peer & a client-server model.

As we know in a p2p network one single entity doesn’t have all the control. So
to establish control we need to set up our own server. For that, we need a
client-server model.

A P2P network offers more availability; to take down a blockchain network you have to literally take down all the nodes of the network across the globe. A P2P application can scale to the moon without putting the load on a single entity or node. In an ideal environment, all the nodes in the network equally share the bandwidth & the storage space; the system scales automatically as new users use the app.

Nodes get added, as more & more people interact with your data. There are
zero data storage and bandwidth costs, you don’t have to shell out money to
buy third-party servers to store your data. No third-party intervention, data is
secure. Share stuff only with friends you intend to share with.
The cult of the decentralized web is gaining ground in the present times. I can’t deny that this is a disruptive tech with immense potential. Blockchain & cryptocurrency are examples of this; they have taken the financial sector in particular by storm.

There are numerous P2P applications available on the web, for instance –

1. Tradepal.

2. Peer to Peer digital cryptocurrencies like Bitcoin, Peercoin.

3. GitTorrent (a decentralized GitHub which uses BitTorrent and Bitcoin).

4. Twister (a decentralized microblogging service, which uses WebTorrent for media attachments).

5. Diaspora (a decentralized social network implementing the federated architecture).

Federated architecture is an extension of the decentralized architecture, used in decentralized social networks, which I am going to discuss next.
Decentralized Social Networks

This lesson is an insight into the decentralized social networks and their benefits.

WE'LL COVER THE FOLLOWING

• What Is A Decentralized Social Network?


• What Are the Features Of Decentralized Social Networks?
• Bring Your Own Data
• Ensuring the Safety of Our Data
• Economic Compensation to the parties involved in the network
• Infrastructure Ease

Before delving right into the Federated architecture & its use in decentralized social networks, let’s have a quick heads-up on what decentralized social networks are and why you should care about them. How different is a decentralized social network in comparison to a centralized one?

Let’s get on with it.

What Is A Decentralized Social Network? #


Simply put, decentralized social networks have servers, spread out across the
globe, hosted by individuals like you & me. Nobody has autonomous control
over the network, everybody has an equal say.

Decentralized networks do not have to face any scalability issues; the scalability of a decentralized network is directly proportional to the number of users joining & active on the network.

We host our data from our own systems instead of sending it to a third-party server. Nobody eavesdrops on our conversations or holds the rights to modify our data at their whim.

You might have heard of the term BYOD - Bring Your Own Device.
Decentralized social networks ask you to Bring Your Own Data.

What does it really mean?

In these networks, the user data layer is separate & they run on standardized
protocols, specifically designed for the decentralized web. The data formats &
protocols are consistent across the networks & apps.

So, if you want to get out of a particular social network, you don’t lose your data; your data doesn’t just die. You can carry your data with you & feed it into the next app you sign up for.

Cool isn’t it?

There are decentralized social networks active on the web such as Minds,
Mastodon, Diaspora, Friendica, Sola etc.

Let’s talk about some of the cool features decentralization offers.

What Are the Features Of Decentralized Social Networks? #
Bring Your Own Data #
As I brought up earlier, you can carry your data with you across a myriad of applications. This is a really unique feature which the blockchain economy leverages, especially in video games.

The in-game currency or content bought by the players, such as swords, powers etc., can be carried forward & used in other games based on the decentralized protocol. Even if the game studios take a game offline, the in-game items bought still hold value; the purchased stuff, in a real sense, stays with you.

Ensuring the Safety of Our Data #


No more eavesdropping by private organizations on our data. We decide whom we want to share our data with. The data is encrypted for everyone, including the technical team of the network. No selling of our data for personal profits.

Economic Compensation to the Parties Involved in the Network #
Networks like Diaspora, Sola, Friendica have come out with features that
would financially compensate all the parties involved in the network.

Users get compensated for the awesome stuff they share online. People
sharing their computing power to host the network get their compensation in
the form of tokens or equity or whatever as per the economic policy of the
network.

The teams involved in moderating the network & the developers writing new features get compensated via content-relevant ads on the network or through the token-based economy of the platform.

It’s a win-win for all.

Infrastructure Ease #
A single entity does not have to bear the entire cost of the infrastructure since
it is decentralized. The possibility of the network going down is almost zero.

An individual developer can build cool stuff without worrying about the
server costs. The data just like a blockchain ledger is replicated across the
nodes. So, even if a few nodes go down our data is not lost.

These social networks are written on protocols & software which are open
source so that the community can keep improving on the code & keep building
awesome features.

ActivityPub is one example of this; it’s an open, decentralized social networking protocol. It provides an API for modifying & accessing content on the network, and for communicating with other pods in the federation.

I’ve added this lesson to give you an insight into decentralized web applications: what they are & how they work. In the near future, these are going to grab a big chunk of the market share.

Decentralization in the Fintech industry is becoming the norm. It’s always good to stay ahead of the curve.

Now let’s have a look into the Federated architecture.


Federated Architecture

In this lesson, we will have an insight into the Federated Architecture.

WE'LL COVER THE FOLLOWING

• What Is A Federated Architecture?


• How Is Federated Architecture Implemented In Decentralized Social
Networks?
• What Is the Need For Pods?

What Is A Federated Architecture? #


Federated architecture is an extension to the decentralized architecture. It
powers social networks like Mastodon, Minds, Diaspora etc.

The term federated, in a general sense, means a group of semi-autonomous entities exchanging information with each other. A real-world example would be the different states of a country, which are managed by the state governments. They are partially self-governing & exercise power to keep things running smoothly. Those state governments then share information with each other & with a central government, forming a completely autonomous government.

This is just an example. From a technical standpoint, the federated model is under continual research, development & evolution. There are no standard rules; developers & architects can have their own designs in place. After all, it’s all decentralized, not under the control of any single entity.

How Is Federated Architecture Implemented In Decentralized Social Networks? #
As shown in the diagram below, a federated network has entities called servers or pods. A large number of nodes subscribe to the pods. There are several pods in the network that are linked to each other & share information with each other.

The pods can be hosted by individuals, as is ideal in a decentralized network. As new pods are hosted & introduced to the network, the network keeps growing.

If the link between a few pods breaks temporarily, the network is still up. Nodes can still communicate with each other via the pods they are subscribed to.

What Is the Need For Pods? #


What is the need for Pods? Can’t just the nodes be linked to each other like in a
regular peer to peer network?

Pods facilitate node discovery. In a peer-to-peer network, there is no way of discovering other nodes & we would just sit in the dark if it weren’t for a centralized node registry or something similar.

The other way is to run a scan through the network & try to discover other nodes. That’s a really time-consuming & tedious task. Why not just have a pod instead?
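A minimal sketch of how pods aid discovery, with illustrative names: a node asks its pod for reachable peers instead of scanning the whole network.

```python
class Pod:
    # A pod keeps a member list and links to other pods, so node
    # discovery is a couple of lookups rather than a network scan.
    def __init__(self, name):
        self.name = name
        self.members = set()
        self.linked_pods = set()

    def subscribe(self, node_id):
        self.members.add(node_id)

    def link(self, other):
        self.linked_pods.add(other)
        other.linked_pods.add(self)

    def discoverable_nodes(self):
        # Own members plus the members of every federated pod.
        nodes = set(self.members)
        for pod in self.linked_pods:
            nodes |= pod.members
        return nodes

pod_a, pod_b = Pod("pod-a"), Pod("pod-b")
pod_a.subscribe("alice")
pod_a.subscribe("bob")
pod_b.subscribe("carol")
pod_a.link(pod_b)
print(sorted(pod_a.discoverable_nodes()))
```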

Okay!! So, Guys, I think I have given you a pretty good insight into the
decentralized web.

Let’s move on to the next lesson where we talk about picking the right server-
side technology.
More On Architecture Quiz – Part 2

This lesson contains a quiz to test your understanding of peer to peer architecture & the decentralized web.

Let’s Test Your Understanding Of Peer to Peer Architecture & the Decentralized
Web

1 What is the need for a peer to peer network? Which of the following
option(s) are correct?

How to Pick the Right Server-Side Technology?

In this lesson, we’ll learn how to pick the right server-side technology for our projects.

WE'LL COVER THE FOLLOWING

• Real-time Data Interaction


• Peer to Peer Web Application
• CRUD-based Regular Application
• Simple, Small Scale Applications
• CPU & Memory Intensive Applications

Before commencing the lesson, I would like to say that there is no rule of
thumb that for a use case X you should always pick a technology Y.

Everything depends on our business requirements. Every use case has its unique needs. There is no perfect tech; everything has its pros & cons. You can be as creative as you want; there is no rule that holds us back.

Alright, that being said, I have listed some of the general scenarios, or common use cases, in the world of application development, along with the fitting backend technology for each, based on my development experience.

Real-time Data Interaction #


Say you are building an app that needs to interact with the backend server in real time, streaming data to & fro; for instance, a messaging application, a real-time browser-based massive multiplayer game, a real-time collaborative text editor or an audio-video streaming app like Spotify, Netflix etc.

You need a persistent connection between the client and the server, and you need a non-blocking technology on the backend. We’ve already talked about both concepts in detail.
Some of the popular technologies which enable us to write these apps are NodeJS and Tornado, a Python framework. If you are working in the Java ecosystem, you can look into Spring Reactor, Play & Akka.io.

Once you start researching these technologies, go through the architecture & concepts given in the developer docs. You’ll gain further insight into how things work and what other tech & concepts you can leverage to implement your use case. One thing leads to the other.

Uber used NodeJS to write their core trip execution engine. Using it they could
easily manage a large number of concurrent connections.

Peer to Peer Web Application #


If you intend to build a peer to peer web app, for instance, a P2P distributed
search engine or a P2P Live TV radio service, something similar to LiveStation
by Microsoft.

Look into JavaScript & protocols like DAT and IPFS. Check out FreedomJS, a framework for building P2P web apps that work in modern web browsers.

This is a good read on Netflix researching peer-to-peer technology for streaming data.

CRUD-based Regular Application #


If you have simple use cases, such as a regular CRUD-based app like an online movie booking portal, a tax filing app etc.

CRUD (Create Read Update Delete) is the most common form of web app being built today by businesses. Be it an online booking portal, an app collecting user data or a social site, they all have an MVC (Model View Controller) architecture on the backend.

Though the view part is tweaked a little with the rise of UI frameworks like React, Angular, Vue etc., the Model View Controller pattern stays.

Some of the popular technologies which help us implement these use cases
are Spring MVC, Python Django, Ruby on Rails, PHP Laravel, ASP .NET MVC.
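To recap the pattern these frameworks implement, here is a minimal sketch of the MVC split with hypothetical booking names; each framework maps requests to a controller, domain logic to a model, and rendering to a view.

```python
class BookingModel:
    """Model: holds the data & persistence logic."""
    def __init__(self):
        self.bookings = []

    def create(self, movie):
        self.bookings.append(movie)
        return movie

def booking_view(movie):
    """View: presentation of the result, e.g. rendered HTML or JSON."""
    return f"Booked: {movie}"

class BookingController:
    """Controller: receives the request, invokes the model,
    and hands the result to the view."""
    def __init__(self, model):
        self.model = model

    def create_booking(self, movie):
        created = self.model.create(movie)
        return booking_view(created)

controller = BookingController(BookingModel())
print(controller.create_booking("Example Movie"))
```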

Simple, Small Scale Applications #


If you intend to write an app which doesn’t involve much complexity, like a blog, a simple online form, or simple apps which integrate with social media and run within the IFrame of the portal (this includes web browser-based strategy, airline & football manager kinds of games), you can pick PHP.

PHP is ideally used in these kinds of scenarios. We can also consider other web frameworks like Spring Boot & Ruby on Rails, which cut down on verbosity, configuration & development time and facilitate rapid development. But PHP hosting will cost much less in comparison to hosting other technologies. It is ideal for very simple use cases.

CPU & Memory Intensive Applications #


Do you need to run CPU-intensive, memory-intensive, heavy computational tasks on the backend, such as big data processing, parallel processing, or running monitoring & analytics on a large amount of data?

Performance is critical in systems running tasks which are CPU & Memory
intensive. Handling massive amounts of data has its costs. A system with high
latency & memory consumption can blow up the economy of a tech company.

Also, regular web frameworks & scripting languages are not meant for
number crunching.

Technologies commonly used in the industry to write performant, scalable, distributed systems are –

C++ has features that facilitate low-level memory manipulation & provides developers more control over memory when writing distributed systems. The majority of cryptocurrencies are written in this language.

Rust is a programming language similar to C++. It is built for high performance and safe concurrency, and it has been gaining a lot of popularity lately in developer circles.

Java, Scala & Erlang are also good picks. Most large-scale enterprise systems are written in Java.

Elasticsearch, an open-source real-time search and analytics engine, is written in Java.
Erlang is a functional programming language with built-in support for concurrency, fault tolerance & distribution. It facilitates the development of massively scalable systems. This is a good read on Erlang.

Go is a programming language by Google for writing apps for multi-core machines & handling large amounts of data.

Julia is a dynamically typed language built for high performance, running computations & numerical analytics.

Well, this is pretty much it. In the next lesson, let’s talk about a few key things
which we should bear in mind when researching on picking a fitting
technology stack for our project.
Key Things To Remember When Picking the Tech Stack

In this lesson, I’ll share a few key things which we should bear in mind when researching a technology stack for our application.

WE'LL COVER THE FOLLOWING

• Be Thorough with the Requirements
• See If What We Already Know Fits the Requirements
• Does the Tech We Have Picked Have An Active Community? How Is the Documentation & the Support?
• Is the Tech Being Used by Big Guns in Production?
• Check the License. Is It Open Source?
• Availability Of Skilled Resources on the Tech

I’ve written numerous projects from the ground up & spent countless hours scavenging the web, browsing through technologies & frameworks, to pick the right tech that would align with my requirements.

I have wasted days, if not months, trying to implement things with an ill-fitting technology. I have implemented things with tech that had no support or community & have later cried for days into a pillow.

Picking the right tech stack is crucial for the success of our business. There is no way around it; I think we all understand this fact pretty well. Once we pick the tech & get down to coding, there should be no looking back. We naturally can’t afford a rewrite.

The guidelines listed below are the gist of my experience; they are the factors which hold the most importance in the process of picking the right technology stack.

So, without any further ado, let’s get started.


Be Thorough with the Requirements #

We should be crystal clear on what we are going to build. Things should not
be hazy. We cannot pick the right tech if we are unclear on the requirements.
Once we go hunting, we should be clear on what we are looking for.

For instance, when looking for a database, we should be clear on whether we are going to store relational data, or whether it will be document-oriented, semi-structured or with no structure at all.

Are we handling a large amount of data which is expected to grow exponentially? Or is the data expected to grow at a manageable pace, up to a certain limit?

Will a monolithic architecture serve our requirements well, or do we need to split our app into several different modules?

Splitting the app into several modules, using heterogeneous tech in services,
helps us bail out on a particular technology in case things don’t work out.

See If What We Already Know Fits the Requirements #

It’s easier to build new applications with tech we already know. We don’t have to go through the steep learning curve that comes along with new tech.

Also, things are comparatively clearer when using tech we are well familiar with. Being aware of the nitty-gritty, being familiar with the errors & exceptions, and knowing how to fix them helps us release features at a quick pace.

Avoid running for shiny new toys until you really need them. Do not fall for
the hype.

Imagine an exception thrown by a new tech which you haven’t ever seen before in your life & for which you cannot find a solution online. You are stranded. There is no one to help you; all you hear is crickets.

I’ve been there, done that. It’s frustrating, clicking through all the search result pages of Google & finding nothing.
Does the Tech We Have Picked Have An Active Community?
How Is the Documentation & the Support? #
The technology we pick ought to have an active community. Check the
involvement of the community on GitHub, StackOverflow etc. The
documentation should be smooth, easy to comprehend.

The larger the community, the better. Having an active community means updated tools, libraries, frameworks & so on.

See if there is official support available for the tech. There should be some rescue available if we get stranded down the road. Right?

Is the Tech Being Used by Big Guns in Production? #

If the tech we are picking is being used by the big guns in the industry, that confirms it is battle-tested. It can be used in production without any worries.

We can be certain that down the line we won’t face any inherent scalability, security or other design-related issues with the technology, since the codebase is continually patched with new updates, bug fixes & design fixes.

We can go through the engineering blogs of these companies to get more information on how they have implemented the tech.

Check the License. Is It Open Source? #

Picking an open-source technology lets us write our own custom features in case the original solution does not have them. We do not have to rely on the creator of the tech for new features & such.

Also, in terms of money, we don’t have to pay anyone any sort of fee to use the product. Open-source tech also tends to have a larger community since the code is open to all; anyone can fork it, start writing new features or fix the existing known bugs.

Availability Of Skilled Resources on the Tech #

Once our business starts gaining traction, we will need a hand to move at a quick pace & roll out new features within a stipulated time. It’s important that there are enough skilled resources available in the industry on the technology we pick.

For instance, it’s always easier to find a MySQL administrator or a Java developer than someone skilled in a comparatively newer technology.

Well, this concludes the lesson. Moving on to the next.


Conclusion

In this lesson, we will conclude our discussion by having an overview of all the things that we have covered in the
course.

Okay!! In this course, up to this point, we’ve gone through all the layers of a web application: right from the UI, moving through the backend, caching, the database, messaging, data processing, pipelines etc.

We learned about the different tiers involved in software architecture, the fundamentals of communication over the web, scalability, high availability, monoliths & microservices. We’ve learned how notification systems work & how webhooks work.

We covered the most frequently asked questions online, like: which is the best database? Which are the best backend & front-end programming languages? We’ve understood that there is no perfect or best technology; every technology has a use case.

So, in the context of designing web applications, we’ve comprehensively covered all the concepts in depth. This pretty much concludes the concepts part of this course.

I am sure by now you have a solid understanding of web architecture & the concepts involved. When designing an application from the bare bones, you won’t be sitting in the dark anymore: you have an understanding of the different tiers in web architecture, you know the difference between a monolith & a microservice, & when to pick which.

You know what technology to pick based on the use case. You know how to do
your research.

In the upcoming lessons, we’ll go through a few case studies. I believe you already know everything; the case studies are just about applying the learning, that’s about it.

Let’s jump right into it.
A Web-based Mapping Service Like Google Maps

In this lesson, we will discuss a case study of a web-based mapping service like Google Maps.

WE'LL COVER THE FOLLOWING

• A Little Background On Google Maps
• Read-Heavy Application
• Data Type: Spatial
• Database
• Architecture
• Backend Technology
• Monolith Vs Microservice
• APIs
• Server-Side Rendering Of Map Tiles
• User Interface
• Real-time Features

Before I begin talking about the architecture of the service, I would like to state that this is not a system design lesson, as it doesn’t contain any database design, traffic estimation or code of any sort.

I will just discuss the basic architectural aspects of the service and how the concepts we’ve learned in the course apply here.

Let’s get on with it.

A Little Background On Google Maps #

Google Maps is a web-based mapping service by Google. It offers satellite imagery, route planning features, real-time traffic conditions, an API for writing map-based games like Pokemon Go & several other features.
First up, these massively successful services are a result of years of evolution and iterative development. Online services are built feature by feature and take years to perfect. Google Maps started as desktop-based software written in C++ & evolved over the years to become what it is today: a beautiful mapping service used by over a billion users.

Read-Heavy Application #

Let’s get down to the technicalities of it. An application like this is read-heavy, not write-heavy, as the end-users aren’t generating new content in the application over time. Users do perform some write operations, though they are negligible in comparison to a write-heavy application like Twitter or Instagram. This means the data can be largely cached and there will be significantly less load on the database.
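To make the caching point concrete, here is a minimal Python sketch. The `lookup_place` function & the in-memory `DB` dict are hypothetical stand-ins, not part of any real service; the idea is simply that in a read-heavy app, memoizing reads keeps repeat lookups off the database:

```python
from functools import lru_cache

# Hypothetical stand-in for the real datastore of place coordinates
DB = {"golden gate bridge": (37.8199, -122.4783)}
db_hits = 0  # counts how many lookups actually reach the "database"

@lru_cache(maxsize=1024)
def lookup_place(name):
    """Return the coordinates of a place; only cache misses hit the DB."""
    global db_hits
    db_hits += 1
    return DB.get(name)

lookup_place("golden gate bridge")   # miss: goes to the database
lookup_place("golden gate bridge")   # hit: served from the cache
print(db_hits)  # 1
```

In a real deployment the cache would be a shared store like Redis or Memcached rather than an in-process decorator, but the read path is the same: check the cache first, fall through to the database only on a miss.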

Data Type: Spatial #

Speaking of the data, a mapping application like this deals with spatial data: data with objects representing geometric information like points, lines & polygons. The data also contains alphanumeric elements like Geohashes, latitudes, longitudes, GIS (Geographical Information System) data etc.

There are dedicated spatial databases available for persisting this kind of data. Popular databases like MySQL, MongoDB, CouchDB, Neo4j, Redis & Google BigQuery GIS also support the persistence of spatial data, via additional plugins built for it.

This is a good read if you want to learn more about spatial databases.
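As a small, simplified illustration of computation over spatial data (not the service’s actual code), here is the haversine great-circle distance between two latitude/longitude points, the kind of primitive a spatial database builds its “places near me” queries on:

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    # Haversine formula
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# San Francisco to Los Angeles, roughly 559 km as the crow flies
print(round(haversine_km(37.7749, -122.4194, 34.0522, -118.2437)))
```

Real spatial databases avoid computing this against every row by indexing the data (geohashes, R-trees etc.) so only nearby candidates are examined.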

Database #
The coordinates of places are persisted in the database, and when the user runs a search for a specific location, the coordinates are fetched from the database & the numbers are converted into a map image.

We can expect surges in traffic on the service during peak office hours, festivals or major events in the city. We need dynamic horizontal scalability to manage the traffic spikes; the app needs to be elastic, scaling up and down on the fly.

As I brought up earlier, we have the option of picking from multiple databases, as both relational and non-relational databases support the persistence of spatial data. I would be more inclined to pick a non-relational NoSQL one, as the map data doesn’t contain many relationships; it’s a direct fetch of the coordinates & processing of them based on the user request. Also, a NoSQL database is inherently horizontally scalable.

Though we could scale well with a relational database too, with caching, since the application is read-heavy, real-time use cases with a lot of updates would be a bit of a challenge.

Real-time features like live traffic patterns, information on congested routes & suggestions of alternative routes as we drive are pretty popular with the users of Google Maps.

Architecture #
Naturally, to set up a service like this we will pick a client-server architecture, as we need control over the service. Otherwise, we could have considered a P2P architecture, but P2P won’t do us any good here.

Backend Technology #
Speaking of the server-side language, we can pick Java, Scala, Python or Go; any mature backend technology stack will do. My personal pick would be Java, since it is performant & heavily used for writing scalable distributed systems as well as for enterprise development.

Monolith Vs Microservice #
Speaking of monolithic vs microservices architecture, which one do you think we should pick for writing the app?

Let’s figure this out by going through the features of the service. The core
feature is the map search. The service also enables us to plan our routes based
on different modes of travel, by car, walking, cycling etc.

Once our trip starts, the map offers alternative route locations in real-time.
The service adjusts the map based on the user’s real-time location & the
destination.
APIs #
For third-party developers, Google offers different APIs such as the Directions API, Distance Matrix, Geocoding, Places, Roads, Elevation, Time Zone & Custom Search APIs.

The Distance Matrix API tells us how much time it will take to reach a destination depending on the mode of travel: walking, flying or driving. Real-time alternative routes are displayed with the help of predictive modelling based on machine learning algorithms. The Geocoding API converts addresses into geographic coordinates & vice versa.

Google Maps also has a Gaming API for building map-based games.

Though we may not have to implement everything in the first release, this gives us a clue that a monolithic architecture is totally out of the picture.

We need microservices to implement so many different functionalities: write a separate service for every feature. This is a cleaner approach that helps the service scale and stay highly available. If a few services like real-time traffic or the elevation API go down, the core search remains unaffected.
Server-Side Rendering Of Map Tiles #
Speaking of the core location search service: when the user searches for a specific location, the service has to match the search text with the name of the location in the database and pull up the coordinates of the place.

Once the service has the coordinates, how do we convert them into an image? Also, should we render the image on the client or the server?

Server-side rendering is preferable in this scenario, as we can cache the rendered image for future requests; the image is essentially static content & will be the same for all users.

Also, as opposed to generating a single image of the map for the full web page, the entire map is broken down into tiles, which enables the system to generate only the part of the map the user engages with.

Smaller tiles help with zoom-in & zoom-out operations. You might have noticed this when using Google Maps: instead of the entire web page being refreshed, the map is refreshed in sections, or tiles. Rendering the entire map instead of tiles every time would be very resource-intensive.

We can create the map in advance by rendering it on the server & caching the tiles. We also need a dedicated map server to render the tiles on the backend.

User Interface #
Speaking of the UI, we can write it using JavaScript & HTML5. Plain JavaScript & jQuery serve simple requirements well, but if you want to leverage a framework, you can look into React, Angular etc.

JavaScript events in the UI enable the user to interact with the map: pin locations, search for places, draw markers and other vectors on the map etc.

OpenLayers is a popular open-source UI library for making maps work with web browsers. You can leverage it if you do not want to write everything from the ground up.

Okay!! So, the user runs a search for a location. On the backend, the request is routed to the tile cache, which holds all the pre-generated tiles and sits between the UI and the map server. If the requested tile is present in the cache, it is sent to the UI. If not, the map server hits the database, fetches the coordinates and related data & generates the tile.

Real-time Features #
Coming to the real-time features: to implement them, we have to establish a persistent connection with the server. We’ve been through persistent connections in detail in the course.

Though real-time features are cool, they are very resource-intensive. There is a limit to the number of concurrent connections servers can handle. So, I’d advise implementing real-time features only when they are really required.

This is a good read on the topic: how Hotstar, a video streaming service, scaled with over 10 million concurrent users.

Well, this is pretty much it on a web-based mapping service. We’ve covered the backend, database, caching and the UI, & gained a fundamental understanding of how a service like Google Maps works.

I’ll see you in the next lesson, where we will discuss a baseball game online
ticket booking service.
A Baseball Game Ticket Booking Web Portal

In this lesson, we’ll discuss the case study of a baseball game online ticket booking application.

WE'LL COVER THE FOLLOWING

• Database
• Handling Concurrency
• Message Queue
• Database Locks
• Caching
• Backend Tech
• User Interface

In this lesson, we’ll have an understanding of the architecture & the key
points to consider when designing an application like a baseball game online
ticket booking portal.

Let’s get started.

Database #
Starting with the database: one key thing in this particular use case is the sale of tickets online. We need to set up a foolproof payment system for fans to buy tickets to their most awaited baseball game.

For setting up payments, what database do you think we should pick, & why?

Implementing an online payment system makes transactions & strong consistency vital. The database needs to be ACID compliant, which makes a relational database like MySQL an obvious pick for us.
Handling Concurrency #

Another important thing to note is that the application should be designed to handle a high number of concurrent connections. There will be a surge of fans on the portal to buy tickets for the baseball game as soon as they are made available.

Also, the number of requests will be a lot more than the number of tickets
available.

At one point in time, there will be n requests to buy a single ticket. We need to make sure the system handles this concurrent scenario well.

How would you implement this scenario? Think about it.

Message Queue #
One way is to queue all the ticket buy requests using a message queue, applying the FIFO (First In, First Out) principle. We’ve talked about this in the message queue lesson: handling concurrent requests with the help of a message queue.

Database Locks #
Another approach is to use database locks, with the right transaction isolation level.

A transaction isolation level ensures consistency in a database transaction; the strictest levels ensure that at one point in time only one transaction has access to a resource in the database.

This is a good read on it. Also, read snapshot isolation

Transaction isolation levels can be implemented only with a transactional, ACID-compliant database like MySQL.

Generally, on e-commerce sites or when booking travel tickets, the number of tickets shown on the website is not accurate; they are cached values. When a user moves on to buy a particular ticket and checks out the cart, the system polls the database for the accurate value & locks the resource for the transaction.

Caching #
Speaking of caching, pick any of the popular caches like Redis, Memcached or Hazelcast to implement it. There are a large number of user events on the portal where users just browse the website to look at the current price of the tickets without buying them. Caching averts the load on the database in this scenario.

Backend Tech #
Speaking of the backend technology, we can take a pick from Java, Scala,
Python, Go etc.

To send notifications to the users we can pick a message queue like RabbitMQ
or Kafka.

Let’s move on to the UI.

User Interface #
We don’t really need to establish a persistent connection with the server, as the application is a CRUD (Create Read Update Delete) based app. Simple Ajax queries will work well.

It’s a good idea to make the UI responsive, as fans will access it via devices
having different screen sizes. The UI should be smart enough to adjust itself
based on the screen size.

We can either design the responsive behaviour from the ground up using CSS3 or leverage a popular open-source responsive framework like Bootstrap.

If you are fond of JavaScript frameworks you can use a framework like React,
Angular, Vue etc. These frameworks are pretty popular in the industry &
businesses prefer to use them to standardize the behaviour & the
implementation of their applications.

Well, this pretty much sums up the case study on a baseball ticket booking web portal.
Introduction

WE'LL COVER THE FOLLOWING

• The Arrival Of Handheld Devices
• The Transition From Desktop To Mobile

The Arrival Of Handheld Devices #

It shouldn’t come as a surprise if I say that today mobile devices have the maximum market penetration globally. Approximately 60 to 70% of users in this day and age access online services through their mobile devices as opposed to via their laptops or desktops.

And this technology shift is for obvious reasons: the accessibility & ease of use of mobile devices. We can carry our mobile phones anywhere with us, be it when hanging out with our friends or when at our office cubicles. Also, we don’t have to be tech-savvy to know how to operate a handheld device. My mother, who has never operated a computer, runs Google searches through voice commands on her Android device without requiring any sort of assistance. This is how simple and intuitive the user interface of handheld devices is.

Engineers have done an impeccable job making user interfaces as intuitive as possible, making the onboarding of new users smooth. The mass adoption of handheld devices has totally changed the technology landscape. It has provided a way for non-tech-savvy users to enter the online universe. It’s a totally different ball game: businesses are going from web-first to mobile-first. There was a time when just nerds played computer games; today everyone is playing Candy Crush, PUBG or Fortnite on their mobile devices.
The Transition From Desktop To Mobile #

I’ve always been an avid gamer & love to play games on all the platforms, be it web, console or desktop. Back in the day, around 2010, I got introduced to casual gaming via Facebook social games like FarmVille & many more. These games have a freemium business model: free to play with in-game purchases. They were popularized by companies like Zynga in conjunction with social networks like Facebook. FarmVille is one popular Zynga product that had a major contribution in putting the company on NASDAQ.

I got pretty hooked; these casual games became my thing. I often switched
between my work tab & the game tab throughout the day on my laptop. This
kind of kept me going for a longer period of time without getting bored or
burnt out.

Online multiplayer social gaming was a new thing then & it became a major source of earnings for Facebook. Every now and then there was a new exciting game on the Facebook app store. Zynga alone contributed 15 to 20% of Facebook’s total earnings.

Gradually, over time, smartphones started getting popular. They had significant improvements in hardware: they were loaded with more features, cool cameras, better memory and whatnot. As their popularity rose & they became a household thing, the whole online gaming landscape changed. Almost all of the games transitioned to mobile devices, as the gaming companies observed better user retention and engagement rates via the mobile versions of their games. There were more MAU (Monthly Active Users) & DAU (Daily Active Users) after the gaming studios introduced mobile clients for their games.

As mobile engagement was increasing, web engagement was decreasing. Many businesses decided to focus just on mobile; Clash of Clans is a good example of this.

Today, there are hardly any games to be found on Facebook besides the instant messenger games. Several have gone mobile-only. The Facebook game store feels like a deserted place, & the social network is focusing more on ads, business pages & Facebook groups to make profits. And this technology shift is not just in the gaming segment; it is happening in arguably every business niche.

Mobile devices today bring in the majority of the traffic for businesses. There are mobile-only businesses with billions of users, like Instagram, TikTok & Pokemon Go. The Google Play Store has over 2 billion monthly active users, over 3.5 million apps & 82 billion app downloads as of today.

In case you are wondering what terms like mobile-only, mobile-first & mobile-friendly really mean, and how important they are in the application development universe: we’ll find out in the next lesson.
Before You Design Your Mobile App

WE'LL COVER THE FOLLOWING

• Mobile Only
• Mobile First
• Mobile Friendly – Responsive
• What Approach Should You Choose For Your Business?
• Real Life Examples

If you are in the initial planning & design phase of your service, or already have a web-based service up and running & are wondering whether you should have a mobile app for it: it’s always a good idea to do thorough market research in your niche before you get down to writing a mobile app for your service.

I know I just said that mobile devices have the maximum market penetration & bring in the majority of the traffic, but there is no one-size-fits-all strategy for businesses approaching the market. One quick example: I own a technology website & almost 80% of my visitors are from the desktop. My business does not have a mobile app & it’s doing just fine.

So, there are several things to consider before we are in a position to decide if we really need an app for our business. Yeah, it may be good to have, but is it a necessity? Writing a mobile app has significant costs associated with it, to the point where you may have to set up a dedicated mobile team for your business.

If you are feeling courageous & thinking of writing a mobile app all by yourself, let me tell you: it has a steep learning curve. And speaking from experience, you will start with a feeling like “well, I know backend development. How tricky would writing a mobile app be?” & along the way, you would be like “Oh man!! I never thought getting my business on mobile would involve so much work. Will this ever end?”.

Handheld devices are battery-powered; the resources (CPU, storage, RAM, data) & the user’s patience are limited. Writing mobile apps requires a completely different approach than writing web applications that run on the cloud. Big guns like Facebook, Instagram & others do a lot of R&D & strategizing when developing their apps.

We’ll talk all about that, but first let’s be clear on the terms mobile-only, mobile-first & mobile-friendly. What do they really mean?

Mobile Only #
Mobile-only means the business operates just via mobile. It may have a mobile website, an app on the app stores, or both. Generally, it’s the apps that businesses move forward with when going mobile-only, & those drive the majority of the traffic.

Mobile websites run in mobile browsers and are primarily built for engaging traffic coming in from Google search; from there, users are routed to the respective apps. TikTok & Pokemon Go are examples of mobile-only businesses.

Mobile First #
Mobile-first means the user interface of the service is designed with the idea that the majority of the business’s users will access its services via its mobile client. The desktop-based website interface is secondary; the mobile interface is the one a business wants to show its customers first, as opposed to its desktop interface.

A mobile-first website will contain everything that a user would need to fully
experience a service and to interact with all of its features. In case of a mobile-
first approach, it’s possible for the desktop interface to contain fewer features.

When the designers start to design the interface, they first design the mobile interface, and then, based on that, build the interfaces for other platforms like desktop, tablet etc. In a mobile-first approach, a business typically goes to market with an app or a browser-based mobile website.

Myntra.com, India’s leading online fashion retail business, is a good example of this. It started with both a desktop website and a mobile app. The majority of the revenue, over 70%, was being generated from the mobile app, along with over 90% of the traffic. So the business started focusing more on the mobile app and eventually killed its desktop website.

But over time, because of customer demand that the business be on both platforms, Myntra re-opened its desktop website, though it is still a mobile-first business.

Mobile Friendly – Responsive #

Mobile-friendly websites, as the name implies, are friendly to mobile but are originally built to render on desktop browsers. They are popularly known as responsive websites. They have a grid-based design & adapt themselves to the screen size of the device. We can also call these websites web-first or desktop-first.

Generally, a responsive web page is divided into rows and columns containing grids, and as the screen size gets smaller, those grids re-arrange themselves based on the screen size.

So, with this approach, we don’t have to do anything extra for mobile. Just develop a desktop-based responsive website and it will automatically render on all screen sizes.

This may sound convenient, but there is a little hitch: we may not have 100% control over how the responsive website renders on different devices. Some elements of the page may get hidden or may not render the way we would like. To tackle this, we can write CSS media queries, but still, it won’t be as good as a website built mobile-first. This is why businesses prefer to go ahead with the mobile-first strategy if the majority of their traffic comes in from mobile.

What Approach Should You Choose For Your Business? #

When picking the right strategy to approach the market, we need to inform ourselves well on things like:

How are the users of existing businesses in the same niche, if there are any, accessing their websites?

Do these businesses have an app on the popular app stores or are they
operating just via their websites? If they have an app, how many
downloads do they have? What are their primary traffic & social
discovery (how users find the service) sources?

What is the revenue generation, platform-wise? Sometimes it’s hard to get that kind of info if the business doesn’t declare it publicly. However, we can look at the traffic on their app and website and assume (though it’s not always true) that the platform getting the maximum traffic generates most of the revenue. For all this business analytics information, there is a plethora of freemium tools with browser extensions and services available online. Just Google them.

Besides these key points, the type of service being offered to the users plays a decisive role in creating a strategy for approaching the market. It makes it really easy to figure out whether we need to move ahead with or without a mobile app. For instance, say we intend to bootstrap a service in the health niche: a service that would enable users to track their eating habits & also suggest the healthy alternatives available. The service would also have some social features for user engagement & retention.

What do you think would be the best strategy to approach the market in this use case? Do we need a mobile app, or would just a web-based website suffice?

To track meals throughout the day, it would be inconvenient for the user to open their laptop or rush to their desktop every time they want to input or check the calories of the food they just ate. On the other hand, if we offer the end user all the services via an app installed on their mobile, they can easily track all their meals anywhere they eat, be it at home, the office cafeteria or when stuck in traffic. It’s an obvious fact that a mobile-first approach will work best for this use case.

We can also have a web-based interface; it would be good to have, though not a necessity in the initial stages of starting up. Also, from a technical standpoint, there are technology offerings like Firebase that enable us to write an app without investing too much time in setting up the backend infrastructure. More on that in the upcoming lessons.

Real Life Examples #

Myntra.com was making 30% of its revenue from its desktop site, yet it took the decision to ditch the web version of the business for the mobile version, the reason being that dedicated platform teams have costs. Focusing on a single platform is both peaceful & economical at the same time.

Speaking of my technology website, most of the visitors, almost 80%, come
from the desktop. Here is the Google Analytics image for the traffic on my
website.
So, naturally, a mobile-first approach isn’t for me. A mobile-friendly approach
is ideal for my use case, hence my website has a responsive user interface.
However, if I were selling something on my website & most of the sales
happened via the mobile client, regardless of the fact that the maximum
traffic was coming in from the desktop, I might have to think about writing a
mobile app for my business, for one simple reason: money. It would then make
perfect sense for me to have a strong presence on both platforms.

Here is one more example with regards to this.

Pixel Federation, a browser & app-based gaming company based out of
Slovakia, launched a game called Seaport in 2015. The team started with the
desktop browser version of the game integrated with Facebook; that’s how I
discovered the game. In 2017, they launched an app that got over 8.1 million
downloads.

The app launch helped the game earn over 7.6 million €. The game has
approx. 305K daily active users & only 50K of them are desktop users, but
those 50K users bring in almost one-third of the game’s revenue.

So, I think you get the point. Researching your niche is important before you
start writing any sort of code.

Moving ahead, in the next lesson, let’s talk about the responsive user
interface.
Responsive Interfaces

In this lesson, we will talk about the responsive user interfaces.

WE'LL COVER THE FOLLOWING

• Designing Responsive Websites

In the previous lesson, I talked a bit about mobile-friendly responsive websites.
These websites run in the mobile browser and are one way of having a mobile
client for our service. In this lesson, we will get a quick insight into how we
can develop responsive websites for our service.

There are two approaches to designing responsive websites: mobile-first &
web-first. We have already discussed these approaches. In the mobile-first
approach, we design the website for the small screen & then let it adapt to
the bigger desktop screen; if we follow the second approach, we design the
website for the bigger screen and then let it adapt to the smaller screens.

In this day and age, when most of the world is online, there is a plethora of
smart devices available in the market with different & unique screen sizes:
smart TVs, Kindles, Android-powered devices, IoT devices, BlackBerry and
Windows handheld phones, Apple products like the iPhone and iPad, and the list
doesn’t end here.

We can now even check our Facebook notifications & emails on our super tiny
smartwatch screens. Well, if you ask me, unless I am James Bond, I will always
prefer to check my messages on my phone. I know that was a bit out of
context. Anyway, it’s not possible for developers to create & maintain
dedicated user interfaces for every screen size. This makes writing a
responsive user interface the obvious go-to approach for us.

A popular saying in responsive web design is that the content should be like
water: it should take the shape of the vessel it is poured into.

Okay!! Now, let’s talk about the popular technologies developers use for
writing responsive user interfaces.

Designing Responsive Websites #


I Am Not A Designer, How Do I Develop A Responsive Website?

If you are not a designer and cannot hire one, if you are more of a backend
developer & want to design a professional-looking responsive website all by
yourself, pick BootstrapJS. Period.

BootstrapJS is a production-grade, open-source CSS framework for designing
responsive user interfaces. It contains CSS, common JavaScript features,
animations, typography, form elements, buttons & much more that websites
commonly need. So, we don’t have to write anything from the ground up;
everything is pre-loaded. Just plug the elements together and build your
website.

Also, since the framework is open source, there are a lot of ready-made
templates & plugins, both free and premium, if you need more than what the
core framework offers. Still, if you cannot find the additional features
online, you can always write them yourself, as the code is open to all.

I have personally used the Bootstrap framework for most of my websites. It
has never let me down. I am not a designer; I am more of a backend
developer. The learning curve is not that steep. If you have some idea of
front-end development, it should take less than a week to get the hang of the
framework. It’s intuitive & easy to understand. Just use the grid-based
approach of rows and columns wisely to build the web page.
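To give an idea of that grid, here is a minimal sketch using Bootstrap’s documented row/column classes; the content and the column split are illustrative:

```html
<div class="container">
  <div class="row">
    <!-- Each column spans the full 12-column row on phones,
         and half the row (6 of 12 columns) on medium screens and up. -->
    <div class="col-12 col-md-6">Main content</div>
    <div class="col-12 col-md-6">Sidebar</div>
  </div>
</div>
```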
The framework was originally written at Twitter to ensure consistency across
the user interfaces of all the internal tools. Before Bootstrap, the front-end
developers at Twitter struggled with maintenance and consistency across
multiple user interfaces. At a later point in time, Twitter released the project
as open source. You can check out some of the projects built using the
Bootstrap framework here.

Another popular framework for writing responsive websites is jQuery Mobile.
The project is developed and maintained by the jQuery project team. I am a
bit biased towards using Bootstrap as I really like the default CSS provided by
the framework.

Besides these two popular frameworks, if you wish to browse through some
other solutions for designing responsive websites, there are many, like
Skeleton, HTML5 Boilerplate, Less Framework etc.; you can do your own research.

I think that’s about it regarding designing responsive websites. In the next
lesson, let’s talk about the types of mobile apps.
Types Of Mobile Apps – Part 1

In this lesson, I’ll talk about the two different types of mobile apps that are the native apps and the hybrid apps.

WE'LL COVER THE FOLLOWING

• Native App
• Technologies For Writing Native Apps
• Hybrid App
• Technologies For Writing Hybrid Apps
• React Native
• Apache Cordova - PhoneGap
• Ionic Framework
• Flutter

Just for clarity, when I say mobile apps, I mean the apps that we download
from the app stores like the Google Play Store & install on our mobile.

There are two types of mobile apps – Native & Hybrid. In this lesson, we’ll find
out what they are & which technologies, popular in developer circles, are
required to build those apps. In the subsequent lessons, we’ll also discuss
things like:

Why is it so important for developers to pick the right type of app to
implement their use case?

Why do we need different types of mobile apps? What are the pain points
these app types are trying to solve?

Which app type, hybrid or native, will suit my use case best? We’ll
discuss this with some real-life examples.

So, without further ado, let’s get on with it.


Native App #
Native apps are apps built exclusively for a particular operating system, be it
Android, iOS or a Windows-based handheld device. These apps function only on
the OS they are built for. For instance, an app built for Android will not
work on iOS.

Native apps interact directly with the operating system and the device
hardware, as opposed to communicating with it via a wrapper, an adapter or a
middle layer. For this reason, they have full access to device hardware like
the camera, sensors & so on.

These apps provide high performance, have a consistent user interface &
provide the look and feel of the native OS.

Native apps don’t face any lag issues when rendering UI animations like slider
movements or the hiding and displaying of UI elements. With these apps, the
UI is pretty responsive; when the user taps on the UI, he immediately sees the
change, as opposed to seeing it after a bit of a lag.

Native apps are developed using the APIs and the SDKs provided by the native
OS. Some of the examples of native apps are the android apps of LinkedIn,
Tinder & Hike.
Technologies For Writing Native Apps #
Every mobile OS supports a certain set of technologies for writing an app that
runs on that particular OS. For instance, if you want to build an app that
runs on Android OS, you can use Java, Kotlin or C++. The official
Android developer documentation is a good place to start for Android
application development.

Likewise, for writing native apps for iOS, you can use Swift or Objective-C
along with the Cocoa Touch framework. To write iOS apps, the Apple developer
documentation is a good place to start.

Just like this, every mobile OS supports a different set of technologies to
enable developers to build apps for its platform.

Hybrid App #
Hybrid apps, as the name implies, are a hybrid of native and web-based
technologies. Just like native apps, they can be installed from the app stores
on the mobile and can access the hardware & communicate with the OS of the
device.

Hybrid apps are primarily built using open web-based technologies such as
HTML5, CSS and JavaScript. They run in a native container and communicate with
the native OS via a wrapper or a middle layer. This middle layer enables the
open web technologies to talk to the native OS. Because of this additional
middle layer, which native apps don’t have, hybrid apps are a bit slower than
native apps when it comes to performance & the rendering of the UI.
There are several popular frameworks available for writing hybrid apps, such
as React Native, Ionic, Cordova etc. Let’s discuss them next.

Technologies For Writing Hybrid Apps #


Below are a few popular technologies available to us for developing hybrid
mobile apps.

React Native #
React Native is an open-source mobile application development framework,
written in JavaScript, developed by Facebook. With it, we can develop
applications for multiple platforms like Android, iOS, Windows etc.

Before releasing the framework, Facebook was already using it for its ad
manager, analytics & group apps. React Native is a pretty popular
framework for writing hybrid apps. In 2018, it had the highest number of
contributors of any repository on GitHub.

Some of the companies using React Native for their mobile apps are
Bloomberg, Walmart, Uber Eats and Discord.

Apache Cordova - PhoneGap #


Apache Cordova is an open-source hybrid mobile application development
framework, originally created by Nitobi, which Adobe acquired and later
donated to the Apache Software Foundation. The framework enables developers to
build mobile apps for Android, Windows and iOS using HTML, JavaScript and CSS.

There are several ecosystems and frameworks built on top of Cordova like
Ionic Framework, PhoneGap etc. PhoneGap is Adobe’s commercial version of
Cordova. Besides the open Cordova framework, Adobe provides several tools,
an entire ecosystem, to facilitate mobile development via PhoneGap.

Here is a list of apps developed using PhoneGap.

Ionic Framework #
Ionic is an open-source SDK for writing hybrid mobile apps, built on top of
Apache Cordova and AngularJS. Here are some of the companies developing
their apps using the Ionic Framework.

Flutter #
Flutter is an open-source hybrid mobile application SDK by Google. It can be
leveraged to develop applications for platforms like Android, iOS, Windows,
Mac, Linux, Google Fuchsia & the web. Some of the apps developed using
Flutter are Square and Google Assistant.

For a full list of apps developed using Flutter, here you go.

This is a good Wikipedia resource that lists out the various mobile app
development tools, SDKs & platforms for writing mobile apps.

So, these are some of the popular technologies used by the industry to write
hybrid apps. Let’s continue this discussion on hybrid and native apps in the
next lesson.
Types Of Mobile Apps – Part 2

In this lesson, we will continue the discussion, from the previous lesson, on different types of mobile apps.

WE'LL COVER THE FOLLOWING

• Cross-Platform App Development Pain Points - The Need For Hybrid Apps
• Issues With Hybrid Apps
• Real Life Examples
• Airbnb Ditched React-Native For Native Tech
• Udacity Abandoned React Native
• Facebook Admitted Counting Big On Html5 For Their Mobile App Was A
Mistake

Okay, so, up to this point, we have learned about the two different types of
mobile apps & the popular technologies that are leveraged to build them.
Now, when we speak of hybrid apps, the first question that pops up in our
mind is: what is the need for this type of app when we already have native apps,
which are performant & have a consistent UI? Why would any business want to
compromise on user experience by offering its service via a hybrid app?

In the mobile app development universe, there are a few pain points that
come along with native app development, & businesses have to turn
towards hybrid apps to overcome those pain points. Let’s find out what they
are.

Cross-Platform App Development Pain Points - The Need For Hybrid Apps #
We’ve discussed earlier that when writing native apps, we have to develop
dedicated apps for every platform, be it Android, iOS, Windows, Blackberry or
any other OS. Developing & maintaining a dedicated mobile app for every OS is
the biggest pain point of cross-platform app development. Every OS supports a
specific set of technologies for building apps for it. There is no common
ground, no common technology that is supported by all the platforms. Due to
the need of having a presence on multiple platforms, developers have to first
educate themselves on various technologies before they get down to the
implementation of any sort.

Businesses have to set up dedicated teams for every platform. A team building
an Android app has to be proficient in Java, Kotlin or C++, & a team building
an app for iOS has to be proficient in Swift.

Even if the reluctant developers do go through the steep learning curve and
build and launch their apps on these platforms, what’s the guarantee that in
the future a different OS won’t pop up that supports yet another set of
technologies for building apps for its platform?

Naturally, when starting up, we do not have enough resources (developers +
money) to set up dedicated teams and codebases for every platform. We need
a common codebase, something portable, something that we could build once
& run everywhere.

This led to the emergence of hybrid apps. Since these apps are developed
using open web-based technologies like HTML5 and JavaScript, developers
working in the modern web development space already have this skillset; they
do not have to go through a steep, daunting learning curve to start building
these apps. With hybrid apps, businesses do not need dedicated teams for
different platforms. The same codebase can be deployed on multiple platforms
with minor modifications. These apps are easy to build due to the familiarity
with the tech. This saves time and money.

So, building hybrid apps is the way to go, right? I am just starting up, my team
is small, I have limited resources; why would I want to write a dedicated app
for every platform? I should pick the hybrid app approach, right?

Well, I wish the answer was that straightforward & I could always say yes!! As
I’ve said over and over throughout the course, there is no silver bullet, no
one size fits all. Every tech has a use case and comes along with its pros and
cons; hybrid apps are no different.

Issues With Hybrid Apps #


Hybrid apps are not as performant and smooth as native apps, as they run
inside a native container and talk to the underlying OS via a middle layer.
This slows down their performance a bit and introduces lag.

Though a few frameworks and ecosystems claim to be as performant as native
apps, sometimes even better, marketing is one thing; running an app in
production with the same performance as native apps is another.

A few businesses in the past have tried to adopt the hybrid app single-codebase
strategy to deploy their apps across platforms but have eventually
reverted to the native app approach to achieve the desired user experience.

Here are a few examples:

Real Life Examples #


Airbnb Ditched React-Native For Native Tech #
Airbnb engineering, in a series of blog posts, shared their experience of
developing their mobile app with React Native.

They built their desktop website using React JS and hence considered React
Native an opportunity to speed up the app development process by having a
single codebase as opposed to multiple codebases for different platforms.

They spent a couple of years working on it & eventually abandoned React
Native for native technology. They faced performance issues, specifically
during app initialization, with the initial render time and the app launch
screen, and when navigating between different screens, & also experienced
dropped frames.

They had to write several patches for React Native to get the desired native
functionality. They found some trivial stuff, easily done with native tech,
quite difficult to pull off with React Native.

The lack of type safety in JavaScript made it difficult to scale, & the
development process turned out to be difficult for engineers used to writing
code in languages with default type-safety checks. It also made code
refactoring extremely difficult. For a full account of their experience, read
React Native at Airbnb.

Udacity Abandoned React Native #


Here is another instance: the Udacity mobile engineering team abandoned
React Native due to the increased number of Android-specific features
requested by their users. Their Android team was reluctant to go ahead with
the hybrid app approach, and the long-term maintenance costs of the React
Native codebase were high. They also faced UX consistency issues across the
platforms. For a full account of their experience, here you go.

Facebook Admitted Counting Big On Html5 For Their Mobile App Was A Mistake #
This was back in 2012. I know it’s been a while and technologies have matured
a lot, but I still felt I should add this instance.

Facebook admitted that they made a big mistake investing too much time and
resources in writing their mobile app with HTML5 instead of using native
tech. Their mobile strategy relied too much on open web technologies. Here is
a full account on VentureBeat.

With this, we have reached the end of the lesson. Here are a few interesting
reads:

Who Will Steal Android From Google?

The Story Of Firefox OS

In the next lesson, I’ll talk about how to choose the right mobile app type for
our use case. Hybrid or Native?
Choosing Between A Native & A Hybrid App

In this lesson, we will have an insight into how to choose the right type of mobile app for our use case.

WE'LL COVER THE FOLLOWING

• When Should We Pick A Native App For Our Use Case?


• When Should We Pick A Hybrid App For Our Use Case?

When Should We Pick A Native App For Our Use Case? #

Here are the scenarios, listed below in bullet points, where we should go
ahead with a native app:

When we have heavy graphics and hardware requirements, like for a
mobile game or a game-streaming app. This is the scenario where we need
top-notch performance from the app & even a tad bit of lag is
unacceptable.

When we intend to write an app that has heavy UI animations, for
instance, a fancy social app containing a lot of animations or a finance
app containing a lot of real-time charts and graphs. In this scenario, it’s
just not okay to have any sort of lag in the application. The application
needs to be as responsive & reliable as it can be.

When the app is pretty complex & reliant on hardware access, like the
camera, sensors, GPS etc., to function. In this use case, we generally have
to write a lot of platform-specific code. A GPS- and sensor-based health
and step-tracking app is a good example of this.

When the look & feel of the app and the user experience should be just
like the native OS. When the UI needs to be not just functional but
flawless.
When you have other businesses in the same niche competing with you
with native apps. It would be a blunder to offer your service via a hybrid
app. Users today aren’t installing as many apps as they used to. Don’t
expect them to show mercy on you when you don’t have a product that is
better than your competition’s.

When the app always needs to support new mobile OS features as soon as
they are released.

If you are a business that can afford dedicated teams for Android & iOS,
you should go ahead with native apps. Don’t even think about the hybrid
app approach.

When Should We Pick A Hybrid App For Our Use Case? #
When the app requirements are simple and there is nothing complex, and
the future addition of any complex features isn’t expected. A news app is
a good example of this. Developing a news app as a hybrid app will also
provide the same look and feel across all the platforms.

When you just cannot afford dedicated codebases for each platform but
still have to hit the market. There are two approaches to this: either
launch with a native app on one platform or write a hybrid app. This
entirely depends on how you want to go ahead.
Yes, native apps provide top-notch performance, but we cannot entirely
discard hybrid tech on the grounds of performance and the availability of
other native features. There can be instances where you won’t even need
dedicated apps; a hybrid app could fulfil all your requirements pretty
well. It all depends on your requirements.

When we just need to test the waters with a pre-alpha release or an MVP
(Minimum Viable Product). In this scenario, it won’t make sense to learn
native tech first and then write the app. You can quickly launch the MVP
via a hybrid app written with the open web technologies.

When you have a team that is not fluent with the native technologies and
it would take a lot of time to learn that tech. This scenario is a trade-off
between costs and performance. Also, developer sentiment is another
aspect of this.

So, these are some general rules that we can follow when taking our pick of
the two types of apps. Another good approach is to find businesses in the
same niche and research what technologies they have used to write their
apps.

With this being said, let’s move on to the next lesson where I’ll be discussing
progressive web apps.
Progressive Web Apps

In this lesson, we will learn about progressive web apps & why you should build them for your service.

WE'LL COVER THE FOLLOWING

• What Are Progressive Web Apps?


• The Need For PWAs
• Will PWAs Replace Native Apps?
• Examples Of Progressive Web Apps
• BookMyShow PWA
• Flipkart PWA
• Twitter PWA

What Are Progressive Web Apps? #


Progressive Web Apps or PWAs are apps, with the look and feel of native apps,
that can run in the browser of both mobile and desktop devices & can also be
installed from the browser on the user’s device. When installed on the
device, progressive web apps run in their own window, without an address
bar or a browser tab, just like native apps. When you open a PWA in a
browser tab, you’ll see an install option with a plus sign in the address bar.
Clicking on it will install the app on your device with a shortcut on the home
screen.

But don’t we already have responsive mobile websites for the browsers? Why do
we need progressive web apps? What good is that?

The Need For PWAs #


Businesses today are kind of inclined towards writing progressive web apps,
as opposed to responsive websites, as they have the same look and feel as
native apps. The general trend is that businesses entertain the search engine
traffic via their responsive mobile websites and then try to direct that traffic
to their native mobile apps. I’ve talked about this before.

So, now instead of directing the users to their native apps, businesses can
offer the same native app user experience to the users directly in the browser.
Also, if the user wishes, he can install the app from the browser on his device.
Progressive web apps function just like native apps, having access to
the underlying OS and the device hardware.

Also, since progressive web apps are developed using open web technologies
like HTML, CSS and JavaScript, with the help of frameworks like Angular,
React, Ionic and Google Polymer, there is no native tech learning curve. Just
write the code once and run it everywhere.

PWAs run in both mobile and desktop browsers & can be installed even on
desktops. These apps can work offline and have push notifications, just like
native apps. They can be indexed by search engines, users can share links to
the apps with their friends, and you don’t need to update them every now &
then as we generally do with native apps. So, every time you open an
installed PWA on your device, you will see its latest version.
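The installability described above comes from a web app manifest that the browser reads (offline support and push notifications come from a service worker). A minimal manifest sketch, with all values illustrative, might look like this:

```json
{
  "name": "My Meal Tracker",
  "short_name": "Meals",
  "start_url": "/",
  "display": "standalone",
  "background_color": "#ffffff",
  "theme_color": "#2e7d32",
  "icons": [
    { "src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png" }
  ]
}
```

Here, `"display": "standalone"` is what makes the installed app open in its own window without the browser chrome.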

Okay!! So, that means there is a possibility of native apps going obsolete. Right?

Will PWAs Replace Native Apps? #


No!! PWAs are not a replacement for native apps. Native apps still hold good
for the use cases I discussed in the previous lesson. We definitely don’t want
to write a CPU-intensive online mobile game as a PWA. A native app will
easily beat a PWA in terms of performance and user experience.

PWAs are more in competition with responsive mobile websites. I mean, why
write a responsive website when you can develop something that provides an
app-like experience? Imagine browsing an e-commerce website via a
responsive mobile site and via a progressive web app. One would prefer the
app-like experience any day.

Let’s have a look at some of the examples of progressive web apps.

Examples Of Progressive Web Apps #


BookMyShow PWA #
Bookmyshow.com is India’s leading online ticket booking platform for events
& movies, with over 50 million monthly visitors. They were experiencing a
high bounce rate on their mobile website; to provide a better user experience
to the visitors, they replaced their mobile website with a progressive web app.

After the launch of the PWA, they observed an exponential increase, up to 80%,
in conversion rates. In terms of app size, it’s 54 times lighter than the
Android app and 180 times smaller than their iOS app.

Flipkart PWA #
Flipkart.com, India’s leading retail e-commerce website, shut down its mobile
website and moved forward with an app-only strategy. It was hard for the
development team to provide an app-like immersive experience on their
responsive mobile website.

But with the launch of their progressive web app, the engagement rate
increased 3x, the conversion rate went up by 70%, and there was a 3x
reduction in data usage. For a full account of this, here you go.

Twitter PWA #
Twitter has approx. 328 million monthly active users. It launched its
progressive web app in 2017 and made it the default mobile web experience for
its users. This increased pages per session by 65%, decreased the bounce rate
by 20% and increased the number of Tweets sent by 75%.

This resource contains a list of businesses that have launched PWAs for their
services.

Check out this Google developers’ resource to begin writing your first PWA.

This Mozilla documentation is a good resource to gain more knowledge on
progressive web apps.
Mobile Backend as a Service

In this lesson, we will learn about mobile backend as a service and when to use it.

WE'LL COVER THE FOLLOWING

• What Is Mobile Backend as a Service?


• When Should You Use A Mobile Backend as a Service?

What Is Mobile Backend as a Service? #


Mobile Backend as a Service or MBaaS is a cloud-based service model that
takes care of the backend infrastructure of our mobile app and enables us to
focus on the business logic and the user interface.

So, what are the things an MBaaS takes care of? What features does it
bring along?

Besides the business logic and the user interface, an online service contains
several other key features that collectively make it functional and top-notch,
a service worthy of the users’ attention. These features are user
authentication, integration with social networks, push notifications, a
real-time database, caching, data storage, messaging, chat integration,
integration of third-party tools, analytics, crash reporting and so on.
A mobile backend as a service takes care of all these features, making a
developer’s life a whole lot easier during the bootstrapping phase. Imagine
writing and maintaining all these features yourself from the bare bones; it’s
not even possible unless you have a team.

With these freemium cloud-based services, you don’t have to worry much
about the app hosting costs during the initial days, as these services offer a
generous free tier. So, if you are a solo developer, with these services you can
always bring your idea to reality & show it to the world.

Deploy your app to the cloud. Show it to the community. Get some initial
customers. Get feedback. Pitch it to potential investors without paying a
dime for hosting & infrastructure. Well, what more can I say?

This is the whole reason the cloud service model blew up. It provided a way
for solo, indie developers to bootstrap their businesses and get a foothold in
the market by just focusing on implementing the idea and letting the cloud
service take care of the rest.

In case you aren’t much aware of the cloud, I have written a blog post about
why to use the cloud and how the cloud is different from traditional
computing. It will give you an insight into it.
An MBaaS typically offers an API for every feature. There will be an API for
user authentication, an API for the real-time database, an API for messaging
and so on. Our code can directly interact with the respective API and
exchange information.
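As a sketch of that per-feature API idea, client code typically builds and calls one endpoint per feature. Everything below, the base URL, the feature names and the helper itself, is hypothetical and only meant to illustrate the shape of such calls:

```javascript
// Hypothetical MBaaS client: one API per feature (auth, database, messaging...).
// The base URL and endpoint layout are made up for illustration.
const BASE = 'https://api.example-mbaas.com/v1';

function endpointFor(feature, resource) {
  // e.g. ('auth', 'login') -> https://api.example-mbaas.com/v1/auth/login
  return `${BASE}/${feature}/${resource}`;
}

// In a real app, each feature call would be an HTTP request, e.g.:
// fetch(endpointFor('auth', 'login'), { method: 'POST', body: credentials });
// fetch(endpointFor('database', 'meals'));   // read the real-time database
// fetch(endpointFor('messaging', 'send'), { method: 'POST', body: message });
```

Real providers wrap these endpoints in an SDK, so you would normally call SDK methods rather than build URLs yourself.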

Also, since we do not have to manage the infrastructure, a mobile backend as
a service cuts down the time it takes to develop an app by notches. A few
popular examples of MBaaS are Google Firebase, AWS Amplify and Parse.

Parse was the early leader in this space but was shut down by Facebook.

When Should You Use A Mobile Backend as a Service? #
MBaaS is great for mobile-only services, for use cases where you do not need,
or don’t already have, a custom backend up and running for your service. In
the case of an MBaaS, all the business logic resides on the client, which is
the mobile app. So, the app is a fat client.

MBaaS is best for apps like mobile games, messaging apps and to-do-list kind
of apps. When using an MBaaS, there are a few things that I would want you to
keep in mind. Since we don’t have much control over the backend, we always
have to keep the business logic on the client. If we ever need to add a new
feature that requires business logic on the server, we will have to design a
custom backend from the bare bones.

On the flip side, if we start with a custom backend and then write a mobile
client, which is the conventional way, we can always customize the design of
our service and introduce new clients and such, just by introducing dedicated
APIs for the respective clients.
We can also use an MBaaS & a custom backend setup in the same app, in
scenarios where we are required to integrate a legacy enterprise system with
our mobile app, or where we need to leverage some additional features that the
custom backend server hosts. Think of a banking app built using an MBaaS that
needs to interact with the legacy enterprise backend to cross-verify the data
entered by the user every time.
Also, not having much control over the backend makes this kind of a vendor
lock-in situation. Just like with parse.com, what if the service provider
decides to close up shop? Or stops upgrading his service, which may result in
severe security flaws? Or stops adding new features to it? Or what if, in the
future, you disapprove of his updated billing rules? What are you gonna do
next? Keep that in mind.
Epilogue

Guys!! This pretty much concludes the course. You can send me your thoughts
at [email protected]. I would appreciate any sort of feedback, suggestions
or anything you want to say.

This is my LinkedIn profile

You can also follow my blog 8bitmen.com, where I keep writing about the
decentralized web, software architecture & the architectures of large-scale
internet services.

Wish you luck with your career…

Cheers!!
