Big Data Analytics - CCS334 - Notes - Unit 1 - Understanding Big Data
processed, accessed, and analyzed at the same rate to have any
meaningful impact.
Variety
Data is heterogeneous, meaning it can come from many different
sources and can be structured, unstructured, or semi-structured.
More traditional structured data (such as data in spreadsheets or
relational databases) is now supplemented by unstructured text,
images, audio, video files, or semi-structured formats like sensor
data that can’t be organized in a fixed data schema.
These characteristics are known as the “three Vs of big data” and
were first defined by Gartner in 2001.
In addition to these three original Vs, three others are often mentioned
in relation to harnessing the power of big data: veracity, variability, and
value.
Veracity:
Big data can be messy, noisy, and error-prone, which makes it
difficult to control the quality and accuracy of the data. Large
datasets can be unwieldy and confusing, while smaller datasets could
present an incomplete picture. The higher the veracity of the data,
the more trustworthy it is.
Variability:
The meaning of collected data is constantly changing, which can lead
to inconsistency over time. These shifts include not only changes in
context and interpretation but also data collection methods based on
the information that companies want to capture and analyze.
Value:
It’s essential to determine the business value of the data you collect.
Big data must contain the right data and then be effectively analyzed
in order to yield insights that can help drive decision-making.
Sources of Big Data
These data come from many sources like
o Telecom companies: Telecom giants like Airtel and Vodafone study
user trends and design their plans accordingly; to do this, they store
the data of millions of users.
o Share market: Stock exchanges across the world generate huge
amounts of data through their daily transactions.
Technologies such as business intelligence (BI) tools and systems help
organisations take unstructured and structured data from multiple sources.
Users (typically employees) input queries into these tools to understand
business operations and performance. Big data analytics uses the four data
analysis methods to uncover meaningful insights and derive solutions.
Benefits of big data analytics
Incorporating big data analytics into a business or organisation has several
advantages. These include:
Cost reduction: Big data can reduce costs in storing all business data
in one place. Tracking analytics also helps companies find ways to
work more efficiently to cut costs wherever possible.
Product development: Developing and marketing new products,
services, or brands is much easier when based on data collected from
customers’ needs and wants. Big data analytics also helps businesses
understand product viability and to keep up with trends.
Strategic business decisions: The ability to constantly analyse data
helps businesses make better and faster decisions, such as cost and
supply chain optimisation.
Customer experience: Data-driven algorithms help marketing
efforts (targeted ads, for example) and increase customer satisfaction
by delivering an enhanced customer experience.
Risk management: Businesses can identify risks by analysing data
patterns and developing solutions for managing those risks.
UNSTRUCTURED DATA
Types of Big Data
Not all data can be stored in the same way. The methods for data storage
can be accurately evaluated after the type of data has been identified.
1.Structured data
Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository, typically a
database. It covers all data that can be stored in a database table with
rows and columns. Such data has relational keys and can easily be mapped
into pre-designed fields. Today, structured data is the most widely
processed kind and the simplest to manage. Example: Relational data.
2.Semi-Structured data
Semi-structured data is information that does not reside in a
relational database but has some organizational properties that make
it easier to analyze. With some processing, it can be stored in a
relational database (though this can be very hard for some kinds of
semi-structured data), but the semi-structured form exists to save
space. Example: XML data.
3.Unstructured data
Unstructured data is data that is not organized in a predefined
manner or does not have a predefined data model, so it is not a good
fit for a mainstream relational database. Alternative platforms exist
for storing and managing unstructured data. It is increasingly prevalent
in IT systems and is used by organizations in a variety of business
intelligence and analytics applications. Example: Word, PDF, text, media logs.
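The three types of data described above can be illustrated with a small sketch. The sample records, field names, and values are invented for illustration; only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Structured: fixed columns, every row has the same fields.
structured_csv = "id,name,city\n1,Asha,Chennai\n2,Ravi,Madurai\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: tags describe the data, but records may differ in shape
# (the second customer has no <phone> element).
semi_structured_xml = """
<customers>
  <customer id="1"><name>Asha</name><phone>555-0101</phone></customer>
  <customer id="2"><name>Ravi</name></customer>
</customers>
"""
root = ET.fromstring(semi_structured_xml)
customers = [
    {"id": c.get("id"), "name": c.findtext("name"), "phone": c.findtext("phone")}
    for c in root.findall("customer")
]

# Unstructured: free text with no fields; only generic operations
# (such as counting words) apply without further processing.
unstructured_text = "Asha called on Monday to say the delivery arrived late."
word_count = len(unstructured_text.split())

print(rows)
print(customers)
print(word_count)
```

The structured rows can be addressed by column name straight away, the XML records need a missing-field check, and the free text offers nothing to address until some structure is extracted from it.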
Unstructured data, in contrast, doesn’t fit into these sorts of
pre-defined data models. It can’t easily be stored as rows and columns
in an RDBMS. And because it comes in so many formats, it’s a real
challenge for conventional software
to ingest, process, and analyze. Simple content searches can be
undertaken across textual unstructured data with the right tools.
Beyond that, the lack of consistent internal structure doesn’t conform
to what typical data mining systems can work with. As a result, companies
have largely been unable to tap into value-laden data like customer
interactions, rich media, and social network conversations. Robust tools for
doing so are only now being developed and commercialized.
What are some examples of unstructured data?
Unstructured data can be created by people or generated by
machines. Here are some examples of the human-generated
variety:
Email: Email message fields are unstructured and cannot be parsed
by traditional analytics tools. That said, email metadata affords it
some structure, and explains why email is sometimes considered
semi-structured data.
Text files: This category includes word processing documents,
spreadsheets, presentations, email, and log files.
Social media and websites: data from social networks like Twitter,
LinkedIn, and Facebook, and websites such as Instagram, photo-
sharing sites, and YouTube.
Mobile and communications data: For this category, look no further
than text messages, phone recordings, collaboration software, chat,
and instant messaging.
Media: This data includes digital photos, audio, and video files.
Here are some examples of unstructured data generated by machines:
Scientific data: This includes oil and gas surveys, space
exploration, seismic imagery, and atmospheric data.
Digital surveillance: This category features data like reconnaissance
photos and videos.
Satellite imagery: This data includes weather data, land forms, and
military movements.
le business intelligence.
Characteristics of Unstructured Data:
Data neither conforms to a data model nor has any structure.
Data cannot be stored in the form of rows and columns as in Databases
Data does not follow any semantics or rules
Data lacks any particular format or sequence
Data has no easily identifiable structure
Due to the lack of identifiable structure, it cannot be used by computer
programs easily
Sources of Unstructured Data:
Web pages
Images (JPEG, GIF, PNG, etc.)
Videos
Memos
Reports
Word documents and PowerPoint presentations
Surveys
Advantages of Unstructured Data:
It supports data that lacks a proper format or sequence
The data is not constrained by a fixed schema
Very flexible due to the absence of a schema.
Data is portable
It is very scalable
It can deal easily with the heterogeneity of sources.
These types of data have a variety of business intelligence and
analytics applications.
Disadvantages of Unstructured data:
It is difficult to store and manage unstructured data due to the lack
of schema and structure.
Indexing the data is difficult and error-prone due to its unclear
structure and lack of pre-defined attributes, so search results
are not very accurate.
Ensuring the security of the data is a difficult task.
Problems faced in storing unstructured data:
It requires a lot of storage space to store unstructured data.
It is difficult to store videos, images, audio, etc.
Due to the unclear structure, operations like update, delete, and
search are very difficult.
Storage cost is high as compared to structured data
Indexing the unstructured data is difficult
Possible solution for storing Unstructured data:
Unstructured data can be converted to easily manageable formats.
A content-addressable storage (CAS) system can be used to store
unstructured data. It stores data based on its metadata, and a
unique name is assigned to every object stored in it. An object is
retrieved based on its content, not its location.
Unstructured data can be stored in XML format.
Unstructured data can be stored in an RDBMS that supports BLOBs.
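The idea of content-addressable storage can be sketched in a few lines. This is a minimal in-memory illustration, not how any particular CAS product is implemented: the class name ContentAddressableStore and its methods are hypothetical, and using a SHA-256 digest as the unique object name is an assumption.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store where an object's name is derived from its content."""

    def __init__(self):
        self._objects = {}   # digest -> raw bytes
        self._metadata = {}  # digest -> descriptive metadata

    def put(self, data: bytes, **metadata) -> str:
        # The unique name is the SHA-256 digest of the content itself,
        # so identical content always maps to the same name.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        self._metadata[key] = metadata
        return key

    def get(self, key: str) -> bytes:
        # Retrieval uses the content-derived name, not a file path or location.
        return self._objects[key]

    def describe(self, key: str) -> dict:
        return dict(self._metadata[key])


store = ContentAddressableStore()
key = store.put(b"scanned invoice bytes", kind="pdf", owner="accounts")
duplicate_key = store.put(b"scanned invoice bytes", kind="pdf", owner="accounts")

print(key == duplicate_key)  # identical content gets the same name
print(store.describe(key))
```

Because the name depends only on the content, storing the same object twice does not create a second copy, which is one reason this approach suits large volumes of files that rarely change.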
Extracting information from unstructured Data:
Unstructured data does not have any structure, so it cannot easily be
interpreted by conventional algorithms. It is also difficult to tag and
index unstructured data, so extracting information from it is a tough
job. Here are possible solutions:
Taxonomies or classification of data help organise data in a
hierarchical structure, which makes the search process easier.
Data can be stored in a virtual repository and be automatically
tagged, for example with Documentum.
Use of application platforms like XOLAP. XOLAP helps in
extracting information from e-mails and XML-based documents.
Use of various data mining tools
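Tagging documents against a taxonomy, as suggested in the list above, can be sketched as follows. The categories, keywords, and sample e-mails are invented for illustration, and real tools use far richer classification than keyword matching.

```python
import re

# Hypothetical two-category taxonomy: category -> keywords that signal it.
TAXONOMY = {
    "billing": {"invoice", "refund", "payment"},
    "delivery": {"shipping", "courier", "late"},
}


def tag_document(text):
    """Return the sorted list of categories whose keywords appear in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, keywords in TAXONOMY.items() if words & keywords)


def build_index(documents):
    """Map each category to the ids of the documents tagged with it."""
    index = {}
    for doc_id, text in documents.items():
        for tag in tag_document(text):
            index.setdefault(tag, []).append(doc_id)
    return index


documents = {
    "mail-1": "My refund has not arrived and the invoice is wrong.",
    "mail-2": "The courier was late again.",
    "mail-3": "Thanks for the quick reply!",
}
index = build_index(documents)
print(index)
```

Once the tags exist, a search for "billing" becomes a dictionary lookup rather than a scan over every document.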
Healthcare - With the help of a patient’s medical history, Big Data
analytics is used to predict how likely they are to have health issues
Media and entertainment - Used to understand the demand of
shows, movies, songs, and more to deliver a personalized
recommendation list to its users
Banking - Customer income and spending patterns help to predict the
likelihood of choosing various banking offers, like loans and credit cards
Telecommunications - Used to forecast network capacity and improve
customer experience
Government - Big Data analytics helps governments in law
enforcement, among other things
In today’s world, there is a lot of data. Big companies use this data
for business growth. By analyzing this data, useful decisions can be
made in various cases, as discussed below:
1.Tracking Customer Spending Habit, Shopping Behavior:
In big retail stores (like Amazon, Walmart, Big Bazar, etc.), the
management team has to keep data on customers’ spending habits (which
products customers spend on, which brands they prefer, how frequently
they spend), their shopping behavior, and their most-liked products
(so that those products can be kept in stock). The production or
stocking rate of a product is then set based on which products are
searched for or sold the most.
The banking sector uses data on its customers’ spending behavior to
offer a particular customer a discount or cashback on a product they
like when they pay with the bank’s credit or debit card. In this way,
banks can send the right offer to the right person at the right time.
2.Recommendation:
By tracking customers’ spending habits and shopping behavior, big
retail stores provide recommendations to customers. E-commerce sites
like Amazon, Walmart, and Flipkart recommend products: they track what
a customer searches for and, based on that data, recommend that type of
product to the customer.
As an example, suppose a customer searches for a bed cover on
Amazon. Amazon now has data indicating that the customer may be
interested in buying a bed cover. The next time that customer visits a
Google page, advertisements for various bed covers will be shown. Thus,
advertisements for the right product can be sent to the right customer.
YouTube also recommends videos based on the types of video a user has
previously liked and watched. Relevant advertisements are shown while
a video is playing, based on its content. For example, if someone is
watching a tutorial video on big data, an advertisement for another
big data course will be shown during that video.
3.Smart Traffic System:
Data about traffic conditions on different roads is collected through
cameras placed beside the roads and at the entry and exit points of the
city, and through GPS devices placed in vehicles (Ola, Uber cabs, etc.).
All such data is analyzed, and jam-free or less congested, less
time-consuming routes are recommended. In this way, a smart traffic
system can be built in a city using big data analysis. A further benefit
is that fuel consumption can be reduced.
4.Secure Air Traffic System:
Sensors are present at various places on an aircraft (such as the
propeller). These sensors capture data like the speed of the flight,
moisture, temperature, and other environmental conditions. Based on
analysis of such data, environmental parameters within the aircraft are
set and adjusted.
By analyzing an aircraft’s machine-generated data, it can be estimated
how long a machine can operate flawlessly and when it needs to be
replaced or repaired.
5.Auto Driving Car:
Big data analysis helps drive a car without human intervention.
Cameras and sensors placed at various spots on the car gather data such
as the size of surrounding cars, obstacles, and the distance from them.
This data is analyzed, and then various calculations are carried out,
such as how many degrees to turn, what the speed should be, and when to
stop. These calculations help the car take action automatically.
6.Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools (like Siri on
Apple devices, Cortana on Windows, and Google Assistant on Android)
answer the various questions asked by users. The tool tracks the user’s
location, local time, season, and other data related to the question
asked. By analyzing all such data, it provides an answer.
As an example, suppose a user asks “Do I need to take an
umbrella?” The tool collects data such as the user’s location and the
season and weather conditions at that location, then analyzes this data
to conclude whether there is a chance of rain, and provides the answer.
7.IoT:
Manufacturing companies install IoT sensors in machines to collect
operational data. By analyzing such data, it can be predicted how long a
machine will work without any problem and when it will require repair,
so that the company can take action before the machine develops many
issues or breaks down completely. Thus, the cost of replacing the whole
machine can be saved.
In the healthcare field, big data is making a significant contribution.
Using big data tools, data regarding patient experience is collected and
used by doctors to give better treatment. IoT devices can sense the
symptoms of a probable oncoming disease in the human body and help
prevent it through early treatment. IoT sensors placed near a patient or
a new-born baby constantly keep track of various health conditions like
heart rate and blood pressure. Whenever any parameter crosses the safe
limit, an alarm is sent to a doctor, so that they can act remotely very
quickly.
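The alarm rule described for patient monitoring (raise an alarm when a parameter crosses its safe limit) can be sketched as a simple range check. The parameter names and limits below are placeholders chosen for illustration, not clinical guidance.

```python
# Hypothetical safe ranges; real limits depend on the patient and clinician.
SAFE_LIMITS = {
    "heart_rate_bpm": (60, 100),
    "systolic_bp_mmhg": (90, 140),
}


def check_reading(reading):
    """Return an alarm message for every parameter outside its safe range."""
    alarms = []
    for name, (low, high) in SAFE_LIMITS.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            alarms.append(f"{name}={value} outside safe range {low}-{high}")
    return alarms


normal = check_reading({"heart_rate_bpm": 72, "systolic_bp_mmhg": 118})
alert = check_reading({"heart_rate_bpm": 128, "systolic_bp_mmhg": 118})

print(normal)
print(alert)
```

In a real deployment the check would run on a stream of sensor readings and the alarm would be delivered to the doctor rather than printed.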
8.Education Sector:
Organizations that run online courses use big data to find candidates
interested in those courses. If someone searches for a YouTube tutorial
video on a subject, then online or offline course providers for that
subject send that person online ads about their courses.
9.Energy Sector:
Smart electric meters read the power consumed every 15 minutes and
send this data to a server, where it is analyzed to estimate the times
of day when the power load is lowest throughout the city. With this
system, manufacturing units and households are advised to run their
heavy machines at night, when the power load is low, to benefit from a
lower electricity bill.
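The smart-meter analysis above amounts to aggregating 15-minute readings and finding the hours with the lowest total load. A minimal sketch follows, assuming each reading is a hypothetical (hour, minute, kWh) tuple; the sample values are invented.

```python
from collections import defaultdict


def hourly_load(readings):
    """Sum 15-minute readings of the form (hour, minute, kwh) into kWh per hour."""
    totals = defaultdict(float)
    for hour, _minute, kwh in readings:
        totals[hour] += kwh
    return dict(totals)


def lowest_load_hours(readings, count=2):
    """Return the `count` hours with the smallest total load, lowest first."""
    totals = hourly_load(readings)
    return sorted(totals, key=lambda hour: (totals[hour], hour))[:count]


readings = [
    (2, 0, 0.5), (2, 15, 0.5), (2, 30, 0.25), (2, 45, 0.25),
    (3, 0, 0.25), (3, 15, 0.25), (3, 30, 0.25), (3, 45, 0.25),
    (19, 0, 1.0), (19, 15, 1.0), (19, 30, 1.0), (19, 45, 1.0),
]

print(hourly_load(readings))
print(lowest_load_hours(readings))
```

With these sample readings the early-morning hours come out as the cheapest time to run heavy equipment, which matches the recommendation in the text.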
10. Media and Entertainment Sector:
Media and entertainment service providers like Netflix, Amazon Prime,
and Spotify analyze data collected from their users. Data such as what
types of video and music users watch and listen to most and how long
users spend on the site is collected and analyzed to set the next
business strategy.
BIG DATA TECHNOLOGIES
Big data technologies can be categorized into four main types: data
storage, data mining, data analytics, and data visualization [2]. Each of
these is associated with certain tools, and you’ll want to choose the right
tool for your business needs depending on the type of big data technology
required.
1.Data storage
Big data technology that deals with data storage has the capability to fetch,
store, and manage big data. It is made up of infrastructure that allows users
to store the data so that it is convenient to access. Most data storage
platforms are compatible with other programs. Two commonly used tools
are Apache Hadoop and MongoDB.
Apache Hadoop: Apache Hadoop is one of the most widely used big data
tools. It is an open-source software platform that stores and processes
big data in a distributed computing environment across hardware
clusters. This distribution allows for faster data processing. The
framework is designed to tolerate faults, be scalable, and process all
data formats.
MongoDB: MongoDB is a NoSQL database that can be used to store
large volumes of data. It stores data as documents made of key-value
pairs (a basic unit of data) and groups documents into collections.
It is written primarily in C++ and is one of the most popular big
data databases because it can manage and store unstructured data
with ease.
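The document model mentioned above (documents made of key-value pairs, grouped into collections) can be pictured with plain Python dictionaries. This sketch does not use MongoDB or its driver: the find helper is a hypothetical stand-in written for illustration, and the product records are invented.

```python
import json

# A "collection" modelled as a plain list of documents. Each document is a
# set of key-value pairs, and documents need not share the same keys.
products = [
    {"_id": 1, "name": "bed cover", "price": 799, "tags": ["home", "bedroom"]},
    {"_id": 2, "name": "umbrella", "price": 349},
    {"_id": 3, "name": "lamp", "price": 1299, "specs": {"watts": 9}},
]


def find(collection, **criteria):
    """Hypothetical helper: return documents whose fields equal all criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


umbrellas = find(products, name="umbrella")
cheap = [doc["name"] for doc in products if doc["price"] < 1000]

# Documents serialize naturally to JSON, including nested values.
serialized = json.dumps(products[2])

print(umbrellas)
print(cheap)
print(serialized)
```

The point of the sketch is the absence of a fixed schema: the three documents carry different keys, yet they live in the same collection and can be queried together.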
2.Data mining
Data mining extracts the useful patterns and trends from the raw data. Big
data technologies such as Rapidminer and Presto can turn unstructured
and structured data into usable information.
Rapidminer: Rapidminer is a data mining tool that can be used to
build predictive models. Its two main strengths are processing and
preparing data, and building machine learning and deep learning
models. The end-to-end model allows both functions to drive impact
across the organization [3].
Presto: Presto is an open-source query engine that was originally
developed by Facebook to run analytic queries against their large
datasets. Now, it is available widely. One query on Presto can
combine data from multiple sources within an organization and
perform analytics on them in a matter of minutes.
3.Data analytics
In big data analytics, technologies are used to clean and transform data
into information that can be used to drive business decisions. This next
step (after data mining) is where users run algorithms, models, and
predictive analytics using tools such as Apache Spark and Splunk.
Apache Spark: Spark is a popular big data tool for data analysis
because it is fast and efficient at running applications. It is often
faster than Hadoop MapReduce because it keeps intermediate data in
random access memory (RAM) instead of writing it to disk between
batch stages. Spark supports a wide variety of data analytics tasks
and queries.
Splunk: Splunk is another popular big data analytics tool for deriving
insights from large datasets. It has the ability to generate graphs,
charts, reports, and dashboards. Splunk also enables users to
incorporate artificial intelligence (AI) into data outcomes.
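The MapReduce pattern referred to in the Spark entry can be illustrated with a word count run in a single Python process. This is only a sketch of the map, shuffle, and reduce stages; it uses neither Hadoop nor Spark, and the function names are chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}


lines = ["big data needs big tools", "data tools"]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(mapped))

print(counts)
```

In a cluster, each stage runs on many machines at once. Classic Hadoop MapReduce writes the output of a stage to disk before the next stage reads it, whereas Spark can keep such intermediate results in memory, which is the difference the text attributes Spark's speed to.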
4.Data visualization
Finally, big data technologies can be used to create stunning visualizations
from the data. In data-oriented roles, data visualization is a skill that is
beneficial for presenting recommendations to stakeholders for business
profitability and operations—to tell an impactful story with a simple graph.
Tableau: Tableau is a very popular tool in data visualization because
its drag-and-drop interface makes it easy to create pie charts,
bar charts, box plots, Gantt charts, and more. It is a secure platform
that allows users to share visualizations and dashboards in real time.
Looker: Looker is a business intelligence (BI) tool used to make
sense of big data analytics and then share those insights with other
teams. Charts, graphs, and dashboards can be configured with a
query, such as monitoring weekly brand engagement through social
media analytics.
Features of Apache Hadoop:
Free to use and offers an efficient storage solution for businesses.
Offers quick access via HDFS (Hadoop Distributed File System).
Highly flexible and can be easily integrated with MySQL and JSON.
Highly scalable, as it can distribute a large amount of data in small segments.
It works on small commodity hardware like JBOD (just a bunch of disks).
2.Cassandra
Apache Cassandra is an open-source NoSQL distributed database
used to store and retrieve large amounts of data. It’s one of the most
popular tools for data analytics and has been praised by many tech
companies for its high scalability and availability without
compromising speed and performance. It is capable of delivering
thousands of operations every second and can handle petabytes of data
with almost zero downtime. It was created at Facebook and released as
open source in 2008.
Features of APACHE Cassandra:
Data Storage Flexibility: It supports all forms of data, i.e.
structured, unstructured, and semi-structured, and allows users to
change them as per their needs.
Data Distribution System: Data is easy to distribute by
replicating it across multiple data centers.
Fast Processing: Cassandra has been designed to run on efficient
commodity hardware and also offers fast storage and data
processing.
Fault-tolerance: If any node fails, it can be replaced
without any delay.
3.Qubole
It’s an open-source big data tool that helps in fetching data along a
value chain using ad-hoc analysis and machine learning. Qubole is a data
lake platform that offers end-to-end service and reduces the time and
effort required to move data pipelines. It can be configured across
multiple cloud services such as AWS, Azure, and Google Cloud. It is also
claimed to lower the cost of cloud computing by 50%.
Features of Qubole:
Supports ETL process: It allows companies to migrate data from
multiple sources into one place.
Real-time Insight: It monitors users’ systems and allows them to
view real-time insights.
Predictive Analysis: Qubole offers predictive analysis so that
companies can act accordingly to target more acquisitions.
Advanced Security System: To protect users’ data in the cloud,
Qubole uses an advanced security system that also helps guard
against future breaches. It also allows cloud data to be encrypted
against potential threats.
4.Xplenty
It is a data analytics tool for building data pipelines with minimal
code. It offers a wide range of solutions for sales, marketing, and
support. Through its interactive graphical interface, it provides
solutions for ETL, ELT, etc. The best part of using Xplenty is its low
investment in hardware and software, and it offers support via email,
chat, telephone, and virtual meetings. Xplenty is a platform for
processing data for analytics over the cloud, and it brings all the
data together.
Features of Xplenty:
Rest API: A user can do almost anything by using the REST API.
Flexibility: Data can be sent to and pulled from databases,
warehouses, and Salesforce.
Data Security: It offers SSL/TLS encryption, and the platform is
capable of verifying algorithms and certificates regularly.
Deployment: It offers integration apps for both cloud and in-house
use and supports deployment to integrate apps over the cloud.
5.Spark
Apache Spark is another framework used to process data and
perform numerous tasks on a large scale. It also processes data across
multiple computers with the help of distribution tools. It is widely
used among data analysts because it offers easy-to-use APIs with simple
methods for pulling data, and it can handle multiple petabytes of data
as well. In 2014, Spark set a record by sorting 100 terabytes of data in
just 23 minutes, beating the previous record of 72 minutes held by
Hadoop MapReduce. This is one reason why big tech companies are moving
towards Spark, and it is highly suitable for ML and AI today.
Features of APACHE Spark:
Ease of use: It allows users to work in their preferred language
(Java, Python, etc.).
Real-time Processing: Spark can handle real-time streaming
via Spark Streaming.
Flexible: It can run on Mesos, Kubernetes, or the cloud.
6.Mongo DB
MongoDB came into the limelight in 2010. It is a free, open-source,
document-oriented (NoSQL) database used to store high volumes of
data. It uses collections and documents for storage, and each document
consists of key-value pairs, which are considered the basic unit of
MongoDB. It is popular among developers because it is available for
many programming languages, such as Python, JavaScript, and Ruby.
Features of Mongo DB:
Written in C++: It’s a schema-less database and can hold a variety of
documents.
Simplifies Stack: With MongoDB, a user can easily store
files without any disturbance in the stack.
Master-Slave Replication: Data is written to and read from the
master, and replicas can be used for backup.
7.Apache Storm
Apache Storm is a robust, user-friendly tool used for data analytics,
especially in small companies. The best part about Storm is that it has
no programming-language barrier and can be used with any language. It
was designed to handle large pools of data in a fault-tolerant and
horizontally scalable way. When it comes to real-time data processing,
Storm leads the chart because of its distributed real-time big data
processing system, which is why many tech giants use Apache Storm in
their systems today. Some of the most notable names are Twitter,
Zendesk, NaviSite, etc.
Features of Storm:
Data Processing: Storm processes the data even if a node gets disconnected.
Highly Scalable: It maintains its performance even as
the load increases.
Fast: Apache Storm is very fast and can process up to 1 million
messages of 100 bytes per second on a single node.
8.SAS
Today it is one of the best tools for statistical modeling used by data
analysts. Using SAS, a data scientist can mine, manage, extract, or
update data in different variants from different sources. SAS
(Statistical Analysis System) allows a user to access data in any format
(SAS tables or Excel worksheets). It also offers a cloud platform for
business analytics called SAS Viya, and it has introduced new tools and
products to strengthen its AI and ML capabilities.
Features of SAS:
Flexible Programming Language: It offers easy-to-learn syntax
and vast libraries, which make it suitable for non-programmers.
Vast Data Format: It supports many programming languages,
including SQL, and can read data in any format.
Encryption: It provides end-to-end security with a feature called
SAS/SECURE.
9.Data Pine
Datapine is an analytics tool used for BI, founded in 2012 in Berlin,
Germany. In a short period of time, it has gained popularity in a number
of countries, and it’s mainly used for data extraction (by small and
medium companies fetching data for close monitoring). With its enhanced
UI design, anyone can view and check the data as per their requirements.
It is offered in 4 different price brackets, starting from $249 per
month. It also offers dashboards by function, industry, and platform.
Features of Datapine:
Automation: To cut down on manual work, datapine offers a wide
array of AI assistant and BI tools.
Predictive Tool: datapine provides forecasting and predictive
analytics; using historical and current data, it derives future
outcomes.
Add on: It also offers intuitive widgets, visual analytics &
discovery, ad hoc reporting, etc.
10. Rapid Miner
It’s a fully automated visual workflow design tool used for data
analytics. It’s a no-code platform, so users aren’t required to write
code to segregate data. Today, it is heavily used in many industries
such as ed-tech, training, research, etc. Though it’s an open-source
platform, it has a limit of 10,000 data rows and a single logical
processor. With Rapid Miner, one can easily deploy ML models to the
web or mobile (only when the user interface is ready to collect
real-time figures).
Features of Rapid Miner:
Accessibility: It allows users to access 40+ types of files (SAS,
ARFF, etc.) via URL.
Storage: Users can access cloud storage facilities such as AWS and Dropbox.
Data validation: Rapid Miner enables the visual display of multiple
results in history for better evaluation.
Examples of the sources where big data is generated include social media
data, e-commerce data, weather station data, IoT sensor data, etc.
Characteristics of Big Data :
Variety of Big data – Structured, unstructured, and semi structured data
Velocity of Big data – Speed of data generation
Volume of Big data – Huge volumes of data that are being generated
Value of Big data – Extracting useful information and making it valuable
Variability of Big data – Inconsistency that the data can show at
times.
Advantages of Big Data :
Cost Savings
Better decision-making
Better Sales insights
Increased Productivity
Improved customer service.
Disadvantages of Big Data :
Incompatible tools
Security and Privacy Concerns
Need for cultural change
Rapid change in technology
Specific hardware needs.
2. Cloud Computing :
Cloud computing refers to the on-demand availability of computing
resources over the Internet. These resources include servers, storage,
databases, software, analytics, networking, and intelligence, and all of
them can be used as per the customer’s requirements. In cloud computing,
customers pay as per use. It is very flexible, and resources can be
scaled easily depending on the requirement. Instead of buying IT
resources physically, all resources can be obtained from cloud vendors
as required. Cloud computing has three service models, i.e.
Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and
Software as a Service (SaaS). Examples of vendors who provide cloud
computing services are Amazon Web Services (AWS), Microsoft Azure,
Google Cloud Platform, IBM Cloud Services, etc.
Characteristics of Cloud Computing :
On-Demand availability
Accessible through a network
Elastic Scalability
Pay as you go model
Multi-tenancy and resource pooling.
Advantages of Cloud Computing :
Back-up and restore data
Improved collaboration
Excellent accessibility
Low maintenance cost
On-Demand Self-service.
Disadvantages of Cloud Computing:
Vendor lock-in
Limited Control
Security concerns
Downtime due to various reasons
Requires good Internet connectivity.
No. | Big Data | Cloud Computing
01. | Big data refers to the data which is huge in size and also increasing rapidly with respect to time. | Cloud computing refers to the on demand availability of computing resources over internet.
    | ... Value of data are considered as the 5 most important characteristics of Big data. | ... elasticity and measured service are considered as the main characteristics of cloud computing.
WEB ANALYTICS
Assess web content problems so that they can be rectified
Have a clear perspective of website trends
Monitor web traffic and user flow
Demonstrate goals acquisition
Figure out potential keywords
Identify segments for improvement
Find out referring sources
Web Analytics Process
The primary objective of carrying out Web Analytics is to optimize the
website in order to provide better user experience. It provides a data-driven
report to measure visitors’ flow throughout the website.
Take a look at the following illustration. It depicts the process of web analytics.
Set the business goals.
To track goal achievement, set the Key Performance Indicators (KPIs).
Collect correct and suitable data.
To extract insights, analyze the data.
Based on assumptions learned from the data analysis, test alternatives.
Based on either data analysis or website testing, implement insights.
Types of Web Analytics
There are two types of web analytics −
On-site − It measures users’ behaviour once they are on the website. For
example, measurement of your website performance.
Off-site − It is measurement and analysis that can be carried out regardless of
whether you own or maintain a website. For example, measurement
of visibility, comments, potential audience, etc.
Metrics of Web Analytics
There are three basic metrics of web analytics −
Count
It is the most basic metric of measurement. It is represented as a whole
number or a fraction. For example,
Number of visitors = 12999, Number of likes = 3060, etc.
Total sales of merchandise = $54,396.18.
Ratio
It is typically a count divided by some other count. For example, Page
views per visit.
Key Performance Indicator (KPI)
It depends upon the business type and strategy. KPI varies from one
business to another.
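The three metric types above can be illustrated with a minimal sketch. The visit log and the KPI target below are made-up values used only for illustration; a real KPI would depend on the business.

```python
# Hypothetical log: number of pages viewed in each visit (made-up data).
page_views_per_visit_log = [3, 1, 5, 2, 1]

# Count: a simple total.
visits = len(page_views_per_visit_log)
page_views = sum(page_views_per_visit_log)

# Ratio: one count divided by another count (page views per visit).
pages_per_visit = page_views / visits

# KPI: a business-specific target checked against a metric.
# The target value is an assumption chosen for this example.
TARGET_PAGES_PER_VISIT = 2.0
kpi_met = pages_per_visit >= TARGET_PAGES_PER_VISIT

print(visits, page_views, pages_per_visit, kpi_met)
```

The same raw data yields a count, a ratio, and a KPI check; only the KPI requires a business decision about what value counts as success.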
Micro and macro Level Data Insights
Google Analytics gives you more accurate insight into your data. You can
understand the data at two levels: micro level and macro level.
Micro Level Analysis
It pertains to an individual or a small group of individuals. For example,
the number of times a job application was submitted, the number of times
"Print this page" was clicked, etc.
Macro Level Analysis
It is concerned with the primary business objectives and with large groups of
people such as communities, nations, etc. For example, the number of
conversions in a particular demographic.
Web Analysis - What to Measure?
These are a few of the measurements conducted in web analytics −
Engagement Rate
It shows how long a person stays on your web page and which pages they
visit. To make your web pages more engaging, include informative
content, visuals, fonts and bullets.
Bounce Rate
If a person leaves your website within a span of 30 seconds, it is considered
a bounce. (Analytics tools commonly define a bounce instead as a visit in
which only a single page is viewed.) The proportion of visits that end in a
bounce is called the bounce rate. To minimize the bounce rate, include
related posts, a clear call-to-action and backlinks in your web pages.
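As a rough sketch, the bounce rate under the 30-second rule used in these notes can be computed as follows. The function name and the sample durations are hypothetical, and treating "within 30 seconds" as strictly less than 30 is an assumption.

```python
BOUNCE_THRESHOLD_SECONDS = 30  # threshold taken from these notes


def bounce_rate(visit_durations):
    """Return the fraction of visits shorter than the bounce threshold."""
    if not visit_durations:
        return 0.0
    bounces = sum(
        1 for seconds in visit_durations if seconds < BOUNCE_THRESHOLD_SECONDS
    )
    return bounces / len(visit_durations)


# Made-up visit durations in seconds.
durations = [5, 12, 45, 120, 29, 300, 8, 60]
rate = bounce_rate(durations)
print(f"Bounce rate: {rate:.0%}")
```

A tool that defines a bounce as a single-page visit would replace the duration check with a check on the number of pages viewed.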
Dashboards
A dashboard is a single-page view of the information important to a user. You
can create your own dashboards with your requirements in mind,
and you may keep only frequently viewed data on a
dashboard.
Event Tracking
Event tracking allows you to track other activities on your website. For
example, you can track downloads and sign-ups through event
tracking.
Traffic Source
You can get an overview of your traffic sources and filter it further.
Identifying the key sources can help you learn about areas for
improvement.
Annotations
Annotations let you attach notes to a traffic report for a past period. You
can click on the graph and type in a note to save it for future study.
Visitor Flow
It gives you a clear picture of the pages visited and the sequence in which
they were visited. Understanding users’ paths may help you restructure
navigation in order to give customers a hassle-free experience.
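One simple way to summarise visitor flow is to count how often visitors move directly from one page to another. The sketch below assumes each session is already available as an ordered list of page paths; the function name and the sample sessions are hypothetical.

```python
from collections import Counter


def page_transitions(sessions):
    """Count how often visitors move from one page directly to the next."""
    transitions = Counter()
    for pages in sessions:
        # Pair each page with the page visited immediately after it.
        for current_page, next_page in zip(pages, pages[1:]):
            transitions[(current_page, next_page)] += 1
    return transitions


# Made-up sessions: each list is the ordered pages of one visit.
sessions = [
    ["/home", "/products", "/cart"],
    ["/home", "/products"],
    ["/home", "/blog", "/products", "/cart"],
]
flow = page_transitions(sessions)
for (source, destination), count in flow.most_common(2):
    print(source, "->", destination, count)
```

The most frequent transitions indicate the paths that navigation should make easiest to follow.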
Content
It gives you insight into the website’s content section. You can see how
each page is doing, website loading speed, etc.
Conversions
Analytics lets you track goals and the paths used to achieve these goals.
You can get details regarding product performance, purchase
amount, and mode of billing. Web Analytics offers more than this;
all you need is to analyze things minutely and be patient.
Page Load Time
The longer the load time, the higher the bounce rate tends to be, so tracking
page load time is equally important.
Behavior
Behavior lets you know page views and time spent on the website. You
can find out how customers behave once they are on your website.
The ability to access analytics and data on mobile devices or tablets rather
than desktop computers is referred to as mobile business intelligence. The
business metric dashboard and key performance indicators (KPIs) are more
clearly displayed.
As the use of mobile devices has risen, so has the use of the technology that
we all rely on to make our daily lives easier, including in business. Many
businesses have benefited from mobile business intelligence. Essentially,
this section is a guide for business owners and others to educate them on the
benefits and pitfalls of Mobile BI.
Need for mobile BI?
Mobile phones' data storage capacity has grown in tandem with their use.
You are expected to make decisions and act quickly in this fast-paced
environment, and the number of businesses that rely on mobile BI in such a
situation is growing by the day.
Mobile BI can help you expand your business or boost its productivity, and
it works for both small and large businesses. Mobile BI can help
you whether you are a salesperson or a CEO. There is a high demand for
mobile BI in order to reduce the time spent obtaining information and use
that time for quick decision making.
As a result, timely decision-making can boost customer satisfaction and
improve an enterprise's reputation among its customers. It also aids in
making quick decisions in the face of emerging risks.
Data analytics and visualisation techniques are essential skills for any team
that wants to organise work, develop new project proposals, or wow clients
with impressive presentations.
Advantages of mobile BI
1. Simple access
Mobile BI is not restricted to a single mobile device or a certain place. You
can view your data at any time and from any location. Having real-time
visibility into a firm improves production and the daily efficiency of the
business. Obtaining a company's perspective with a single click simplifies
the process.
2. Competitive advantage
Many firms are seeking better and more responsive methods to do
business in order to stay ahead of the competition. Easy access to real-
time data improves company opportunities and raises sales and capital.
This also aids in making the necessary decisions as market conditions
change.
3. Simple decision-making
As previously stated, mobile BI provides access to real-time data at any
time and from any location. Mobile BI delivers information on demand,
which helps users obtain what they require at the
time they require it. As a result, decisions are made quickly.
4. Increased productivity
By extending BI to mobile, the organization's teams can access critical
company data when they need it. Obtaining all of the corporate data with
a single click frees up a significant amount of time to focus on the smooth
and efficient operation of the firm. Increased productivity results in a
smooth and quick-running firm.
Disadvantages of mobile BI
1. Stack of data
The primary function of mobile BI is to store data in a systematic
manner and then present it to the user as required. As a result, Mobile BI
stores all of the information and ends up with heaps of older data.
The corporation needs only a small portion of the previous data, but it
has to store all of it, which ends up in the stack.
2. Expensive
Mobile BI can be quite costly at times. Large corporations can continue to
pay for these expensive services, but small businesses cannot. The cost
of the mobile BI service itself is not the whole expense; we must
additionally consider the rates of IT
workers for the smooth operation of BI, as well as the hardware costs
involved. Moreover, larger corporations do not settle for just one Mobile
BI provider for their organisations; they require multiple. Even for
basic commercial transactions, mobile BI is costly.
3. Time consuming
Businesses prefer Mobile BI because it delivers information quickly. Companies
are not patient enough to wait for data before acting on it, and in today's
fast-paced environment, anything that can produce results quickly is
valuable. However, because the system is built on data from the warehouse,
the implementation of BI in an enterprise can take more than 18
months.
4. Data breach
The biggest concern for users when providing data to Mobile BI is data
leakage. If you handle sensitive data through Mobile BI, a single error can
destroy your data as well as make it public, which can be detrimental to
your business.
Many Mobile BI providers are working to make it 100 percent secure to
protect their users' data. Security is not only something that mobile BI
providers must consider; it is also something that we, as users, must
consider when granting data access authorization.
5. Poor quality data
Because we work online in every aspect, we have a lot of data stored in
Mobile BI, which might be a significant problem. This means that a large
portion of the data
analysed by Mobile BI is irrelevant or completely useless. This can slow
down the entire procedure, so you need to select the data that is
important and may be required in the future.
Best Mobile BI tools
1. Sisense
Sisense is a flexible business intelligence (BI) solution that includes
powerful analytics, visualisations, and reporting capabilities for
managing and supporting corporate data. Businesses can use the
solution to evaluate large, diverse databases and generate relevant
business insights. You may easily view enormous volumes of complex
data with Sisense's code-first, low-code, and even no-code technologies.
Sisense was established in 2004 and has its headquarters in New York.
In its early years, the team proceeded cautiously with its
research. Once the company had received $4 million in funding from
investors, it began to accelerate that research.
2. SAP Roambi Analytics
Roambi Analytics is a BI tool that allows you to
fundamentally rethink your data analysis, making it easier and faster
while also increasing your interaction with data.
You can consolidate all of your company's data in a single tool using SAP
Roambi Analytics, which integrates all ongoing systems and data. Using
SAP Roambi Analytics is a simple three-step process. First, upload your HTML
or spreadsheet files. The information is then transformed
into informative data or graphs that can be visualised.
Finally, once the data has been processed, you can easily share it to your
preferred device. Roambi Analytics was founded in 2008 by a team based in
California.
3. Microsoft Power BI Pro
Microsoft Power BI is an easy-to-use tool for non-technical
business owners who are unfamiliar with BI tools but wish to aggregate,
analyse, visualise, and share data. You only need a basic understanding
of Excel and other Microsoft tools; if you are familiar with these,
Power BI can be used as a self-service tool. Microsoft Power BI
has a feature that allows users to create subsets of data and
then automatically apply analytics to that information.
4. IBM Cognos Analytics
Cognos Analytics is IBM's web-based business intelligence
tool. Cognos Analytics is now being integrated with Watson, and the benefits
for users are extremely exciting. Cognos Analytics with Watson will assist in
connecting and cleaning users' data, resulting in properly visualised
data.
That way, the business owner will know where they stand in comparison
to their competitors and where they can grow in the future. It combines
reporting, modelling, analysis, dashboards to help you understand your
organization's data and make sound business decisions.
5. Amazon QuickSight
Amazon QuickSight assists in the creation and distribution of
interactive BI dashboards to users, as well as the retrieval of
answers to natural language queries in seconds. QuickSight can be
accessed through any device and embedded in any website, portal, or app.
Amazon QuickSight allows you to quickly and easily create interactive
dashboards and reports for your users. Anyone in your organisation
can securely access those dashboards via browsers or mobile devices.
QuickSight's eye-catching feature is its pay-per-session model, which
allows users to use a dashboard created by someone else without
paying much. The user pays per session, with
prices ranging from $0.30 for a 30-minute session to $5 for unlimited
use per month per user.
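The pay-per-session idea can be sketched as a per-session charge that stops growing once a monthly cap is reached. The figures below are the ones quoted in these notes and may not match current vendor pricing; the function name is hypothetical, and modelling the $5 figure as a cap on session charges is an assumption.

```python
SESSION_PRICE_CENTS = 30   # $0.30 per 30-minute session (figure from these notes)
MONTHLY_CAP_CENTS = 500    # $5 per user per month (figure from these notes)


def monthly_reader_cost_cents(sessions):
    """Return one user's monthly charge in cents under a capped per-session model."""
    return min(sessions * SESSION_PRICE_CENTS, MONTHLY_CAP_CENTS)


for sessions in (0, 4, 20):
    cost = monthly_reader_cost_cents(sessions)
    print(f"{sessions} sessions -> ${cost / 100:.2f}")
```

Working in whole cents avoids floating-point rounding when the charges are added up across many users.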
3. Marketing
4. Education
5. Finance
6. Science and Health
How to Crowdsource?
1. For scientific problem solving, a broadcast search is used where an
organization mobilizes a crowd to come up with a solution to a
problem.
2. For information management problems, knowledge discovery
and management is used to find and assemble information.
3. For processing large datasets, distributed human intelligence is
used. The organization mobilizes a crowd to process and analyze the
information.
Examples of Crowdsourcing
1. Doritos: It is one of the companies that has taken advantage of
crowdsourcing for a long time as an advertising initiative. They use
consumer-created ads for one of their 30-second Super Bowl
spots (the Super Bowl is the championship game of American football).
2. Starbucks: Another big venture which used crowdsourcing as a
medium for idea generation. Their white cup contest is a famous
contest in which customers need to decorate their Starbucks cup with
an original design and then take a photo and submit it on social
media.
3. Lay's: The "Do Us a Flavor" contest by Lay's used crowdsourcing as an
idea-generating medium. They asked customers to submit their
ideas for the next chip flavor they want.
4. Airbnb: A very famous travel website that lets people rent out their
houses or apartments by listing them on the website. All the listings
are crowdsourced from people.
Crowdsourced Marketing
As discussed already, crowdsourcing helps businesses grow a lot. Be
it a business idea or just a logo design, crowdsourcing engages
people directly and, in turn, saves money and energy. In the upcoming
years, crowdsourced marketing is likely to get a boost as the world
adopts technology faster.
Main Types of Crowdsourcing
Crowdsourcing involves obtaining information or resources from a wide
swath of people. In general, we can break this up into four main
categories:
Wisdom - Wisdom of crowds is the idea that large groups of people
are collectively smarter than individual experts when it comes to
problem-solving or estimating values (like the weight of a cow or
number of jelly beans in a jar).
Creation - Crowd creation is a collaborative effort to design or build
something. Wikipedia and other wikis are examples of this. Open-source
software is another good example.
Voting - Crowd voting uses the democratic principle to choose a
particular policy or course of action by "polling the audience."
Funding - Crowdfunding involves raising money for various purposes
by soliciting relatively small amounts from a large number of funders.
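The wisdom and voting categories above can be illustrated with a small sketch: many individual guesses are combined into one estimate, and many ballots into one decision. Using the median as the combined estimate is one common choice rather than the only one; the function names and the sample data are hypothetical.

```python
from collections import Counter
from statistics import median


def crowd_estimate(guesses):
    """Combine individual numeric guesses into one estimate using the median."""
    return median(guesses)


def crowd_vote(ballots):
    """Return the option that received the most votes."""
    winner, _count = Counter(ballots).most_common(1)[0]
    return winner


# Made-up guesses for the number of jelly beans in a jar.
guesses = [480, 520, 610, 390, 505, 700, 450]
# Made-up ballots for a logo design poll.
votes = ["logo A", "logo B", "logo A", "logo C", "logo A"]

print(crowd_estimate(guesses))
print(crowd_vote(votes))
```

The median is less sensitive than the mean to a few wildly wrong guesses, which is why it is used in this sketch.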
Crowdsourcing Sites
Here is a list of some well-known crowdsourcing and crowdfunding sites.
1. Kickstarter
2. GoFundMe
3. Patreon
4. RocketHub
Advantages of Crowdsourcing
1. Evolving Innovation: Innovation is required everywhere, and in this
advancing world it has a big role to play. Crowdsourcing
helps in getting innovative ideas from people belonging to different
fields, thus helping businesses grow in every field.
2. Save costs: It eliminates the time wasted on meeting
people and convincing them. You only need to propose the business idea
on the internet, and you will be flooded with suggestions from the
crowd.
3. Increased Efficiency: Crowdsourcing has increased the efficiency of
business models, as ideas from several experts are gathered and also funded.
Disadvantages of Crowdsourcing
1. Lack of confidentiality: Asking for suggestions from a large group of
people can bring the threat of idea stealing by other organizations.
INTER AND TRANS FIREWALL ANALYTICS
Inter-firewall analytics
Focus: Analyzes traffic flows between different firewalls within a network.
Methodology: Utilizes data collected from multiple firewalls to
identify anomalies and potential breaches.
Benefits: Provides a comprehensive view of network traffic flow
and helps identify lateral movement across different security zones.
Limitations: Requires deployment of multiple firewalls within the
network and efficient data exchange mechanisms between them.
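As a rough illustration of correlating data from multiple firewalls, the sketch below flags source hosts that appear in the logs of more than one firewall, which is one simple signal of possible lateral movement. The log format (firewall name, source IP, destination IP), the function name, and the sample entries are all assumptions made for this example; real firewall logs and detection logic are considerably richer.

```python
from collections import defaultdict


def hosts_seen_on_multiple_firewalls(log_entries, min_firewalls=2):
    """Map each source host to the firewalls that logged it.

    Only hosts logged by at least min_firewalls distinct firewalls are kept.
    """
    seen = defaultdict(set)
    for firewall, source_ip, _destination_ip in log_entries:
        seen[source_ip].add(firewall)
    return {
        ip: sorted(firewalls)
        for ip, firewalls in seen.items()
        if len(firewalls) >= min_firewalls
    }


# Made-up entries: (firewall name, source IP, destination IP).
logs = [
    ("fw-dmz", "10.0.1.5", "10.0.2.9"),
    ("fw-internal", "10.0.1.5", "10.0.3.4"),
    ("fw-dmz", "10.0.1.7", "10.0.2.9"),
    ("fw-db", "10.0.1.5", "10.0.4.2"),
]
suspects = hosts_seen_on_multiple_firewalls(logs)
print(suspects)
```

A host that crosses several security zones is not necessarily malicious, so a result like this would be a starting point for investigation rather than a verdict.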
Trans-firewall analytics
Focus: Analyzes encrypted traffic that traverses firewalls, which
traditional security solutions may not be able to decrypt and inspect.
Methodology: Uses deep packet inspection (DPI) and other
advanced techniques to analyze the content of encrypted traffic
without compromising its security.
Benefits: Provides insight into previously hidden threats within
encrypted traffic and helps detect sophisticated attacks.
Limitations: Requires specialized hardware and software solutions
for DPI, and raises concerns regarding potential data privacy
violations.
Choosing the right approach
The choice between inter-firewall and trans-firewall analytics depends on
several factors, including:
Network size and complexity: Larger and more complex networks benefit
more from inter-firewall analytics for comprehensive monitoring.
Security needs and threats: Trans-firewall analytics is crucial for networks
handling sensitive data and facing advanced threats.
Budget and resources: Implementing trans-firewall analytics requires additional
investment in specialized hardware and software.