Big Data Analytics - CCS334 - Notes - Unit 1 - Understanding Big Data

Uploaded by

bamaraji3

CCS334 BIG DATA ANALYTICS

UNIT I UNDERSTANDING BIG DATA

Introduction to big data – convergence of key trends – unstructured data –
industry examples of big data – web analytics – big data applications – big
data technologies – introduction to Hadoop – open source technologies –
cloud and big data – mobile business intelligence – crowdsourcing
analytics – inter and trans firewall analytics.

INTRODUCTION TO BIG DATA

What is Big Data

Big data refers to extremely large and diverse collections of structured,
unstructured, and semi-structured data that continue to grow
exponentially over time. These datasets are so large and complex in
volume, velocity, and variety that traditional data management systems
cannot store, process, and analyze them.
The amount and availability of data are growing rapidly, spurred on by
advances in digital technology such as connectivity, mobility, the Internet of
Things (IoT), and artificial intelligence (AI). As data continues to expand and
proliferate, new big data tools are emerging to help companies collect,
process, and analyze data at the speed needed to gain the most value from
it.
Big data is used in machine learning, predictive modeling, and other
advanced analytics to solve business problems and make informed decisions.
The Vs of big data
Big data definitions may vary slightly, but it will always be described in
terms of volume, velocity, and variety. These big data characteristics are
often referred to as the “3 Vs of big data” and were first defined by
Gartner in 2001.
• Volume
As its name suggests, the most common characteristic associated
with big data is its high volume. This describes the enormous amount
of data that is available for collection and produced from a variety of
sources and devices on a continuous basis.
• Velocity
Big data velocity refers to the speed at which data is generated.
Today, data is often produced in real time or near real time, and
therefore, it must also be processed, accessed, and analyzed at the
same rate to have any meaningful impact.
• Variety
Data is heterogeneous, meaning it can come from many different
sources and can be structured, unstructured, or semi-structured.
More traditional structured data (such as data in spreadsheets or
relational databases) is now supplemented by unstructured text,
images, audio, video files, or semi-structured formats like sensor
data that can’t be organized in a fixed data schema.

In addition to these three original Vs, three others are often mentioned
in relation to harnessing the power of big data: veracity, variability, and
value.
• Veracity:
Big data can be messy, noisy, and error-prone, which makes it
difficult to control the quality and accuracy of the data. Large
datasets can be unwieldy and confusing, while smaller datasets could
present an incomplete picture. The higher the veracity of the data,
the more trustworthy it is.
• Variability:
The meaning of collected data is constantly changing, which can lead
to inconsistency over time. These shifts include not only changes in
context and interpretation but also changes in data collection methods,
driven by the information that companies want to capture and analyze.
• Value:
It’s essential to determine the business value of the data you collect.
Big data must contain the right data and then be effectively analyzed
in order to yield insights that can help drive decision-making.
Sources of Big Data
Big data comes from many sources, such as:

o Social networking sites: Facebook, Google, LinkedIn, and similar sites
generate huge amounts of data every day, as they have
billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate
huge volumes of logs from which users’ buying trends can be traced.
o Weather stations: Weather stations and satellites produce very
large volumes of data, which are stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study
user trends and publish their plans accordingly; for this they store
the data of their millions of users.
o Share market: Stock exchanges across the world generate huge
amounts of data through their daily transactions.

How does big data work?

The central concept of big data is that the more visibility you have
into anything, the more effectively you can gain insights to make better
decisions, uncover growth opportunities, and improve your business model.
Making big data work requires three main actions:
1. Integration:
Big data systems collect terabytes, and sometimes even petabytes, of raw data
from many sources. This data must be received, processed, and transformed
into the format that business users and analysts need before they can start
analyzing it.
2. Management:
Big data needs big storage, whether in the cloud, on-premises, or both.
Data must be stored in whatever form is required, and it also needs to be
processed and made available in real time. Increasingly, companies are
turning to cloud solutions to take advantage of their elastic compute
capacity and scalability.
3. Analysis:
The final step is analyzing and acting on big data; otherwise, the
investment won’t be worth it. Beyond exploring the data itself, it’s also
critical to communicate and share insights across the business in a way
that everyone can understand. This includes using tools to create data
visualizations like charts, graphs, and dashboards.
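
The three actions can be illustrated with a minimal Python sketch. It is only a toy, not a real big data pipeline: the two input sources, the field names `customer` and `amount`, and the function names are assumptions made for this example, and an in-memory dictionary stands in for real storage.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical raw inputs from two sources that use different formats.
CSV_SOURCE = "customer,amount\nasha,120.50\nravi,80.00\n"
JSON_SOURCE = '[{"customer": "asha", "amount": 30.0}, {"customer": "meena", "amount": 55.25}]'


def integrate(csv_text, json_text):
    """Integration: read both sources and convert them to one common record format."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({"customer": row["customer"], "amount": float(row["amount"])})
    for item in json.loads(json_text):
        records.append({"customer": item["customer"], "amount": float(item["amount"])})
    return records


def manage(records):
    """Management: store the records; a dict keyed by customer stands in for real storage."""
    store = defaultdict(list)
    for record in records:
        store[record["customer"]].append(record["amount"])
    return dict(store)


def analyze(store):
    """Analysis: total spend per customer, sorted from highest to lowest."""
    totals = {customer: sum(amounts) for customer, amounts in store.items()}
    return sorted(totals.items(), key=lambda pair: pair[1], reverse=True)


report = analyze(manage(integrate(CSV_SOURCE, JSON_SOURCE)))
for customer, total in report:
    print(f"{customer}: {total:.2f}")
```

In a real deployment each stage would be handled by dedicated tools (for example an ingestion service, a distributed file system, and a query engine), but the order of the stages is the same.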

What is big data analytics?

Big data analytics is the process of collecting, examining, and analysing large
amounts of data to discover market trends, insights, and patterns that can
help companies make better business decisions. This information is
available quickly and efficiently so that companies can be agile in crafting
plans to maintain their competitive advantage.
Big data analytics is important because it helps companies leverage their data
to identify opportunities for improvement and optimisation. Across different
business segments, increasing efficiency leads to overall more intelligent
operations, higher profits, and satisfied customers. Big data analytics helps
companies reduce costs and develop better, customer-centric products and
services.

Technologies such as business intelligence (BI) tools and systems help
organisations bring together unstructured and structured data from multiple sources.
Users (typically employees) enter queries into these tools to understand
business operations and performance. Big data analytics uses four data
analysis methods to uncover meaningful insights and derive solutions.

Types of big data analytics

Four main types of big data analytics support and inform different business
decisions.
1.Descriptive analytics
Descriptive analytics summarises historical data so that it can be easily
read and interpreted. It helps create reports and visualise information
such as company profits and sales.
Example: During the pandemic, a leading pharmaceutical company
conducted data analysis on its offices and research labs. Descriptive
analytics helped it identify underutilised spaces and departments that
could be consolidated, saving the company millions of pounds.
2.Diagnostic analytics
Diagnostic analytics helps companies understand why a problem
occurred. Big data technologies and tools allow users to mine and recover
data that helps dissect an issue and prevent it from happening in the future.
Example: An online retailer’s sales have decreased even though
customers continue to add items to their shopping carts. Diagnostic
analytics helped the retailer understand that the payment page had not been
working correctly for a few weeks.
3.Predictive analytics
Predictive analytics looks at past and present data to make predictions.
With artificial intelligence (AI), machine learning, and data mining, users
can analyse the data to predict market trends.
Example: In the manufacturing sector, companies can use algorithms
based on historical data to predict if or when a piece of equipment will
malfunction or break down.
4.Prescriptive analytics
Prescriptive analytics recommends a course of action for a problem, relying
on AI and machine learning to gather and use data, for example in risk
management.
Example: Within the energy sector, utility companies, gas producers, and
pipeline owners identify factors that affect the price of oil and gas to hedge
risks.
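
The difference between the four types can be shown on one small dataset. The following Python sketch is illustrative only: the monthly sales figures, the stock level, and the reorder rule are invented for the example, and a hand-written least-squares line stands in for the far richer models used in practice.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold) for months 1..6.
months = [1, 2, 3, 4, 5, 6]
sales = [100, 110, 120, 90, 140, 150]

# Descriptive: what happened? Summarise the history.
average_sales = mean(sales)

# Diagnostic: where did something go wrong? Find months that fell below the previous month.
drops = [months[i] for i in range(1, len(sales)) if sales[i] < sales[i - 1]]

# Predictive: what is likely next? Fit a least-squares line through the data.
mean_x, mean_y = mean(months), mean(sales)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales)) / sum(
    (x - mean_x) ** 2 for x in months
)
intercept = mean_y - slope * mean_x
forecast_month_7 = slope * 7 + intercept

# Prescriptive: what should we do? A simple illustrative rule based on the forecast.
STOCK_ON_HAND = 130  # assumed inventory level
action = "reorder stock" if forecast_month_7 > STOCK_ON_HAND else "hold"

print(f"average={average_sales:.1f}, drops={drops}, "
      f"forecast={forecast_month_7:.1f}, action={action}")
```

Each step builds on the previous one: the summary describes the data, the drop detection points at a month worth investigating, the fitted line projects the trend, and the rule turns the projection into a decision.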

Benefits of big data analytics
Incorporating big data analytics into a business or organisation has several
advantages. These include:
• Cost reduction: Big data can reduce costs by storing all business data
in one place. Tracking analytics also helps companies find ways to
work more efficiently and cut costs wherever possible.
• Product development: Developing and marketing new products,
services, or brands is much easier when based on data collected about
customers’ needs and wants. Big data analytics also helps businesses
understand product viability and keep up with trends.
• Strategic business decisions: The ability to constantly analyse data
helps businesses make better and faster decisions, such as cost and
supply chain optimisation.
• Customer experience: Data-driven algorithms help marketing
efforts (targeted ads, for example) and increase customer satisfaction
by delivering an enhanced customer experience.
• Risk management: Businesses can identify risks by analysing data
patterns and developing solutions for managing those risks.

UNSTRUCTURED DATA
Types of Big Data
Not all data can be stored in the same way. The methods for data storage
can be accurately evaluated only after the type of data has been identified.

1.Structured data
Structured data is data whose elements are addressable for effective
analysis. It has been organized into a formatted repository that is typically a
database. It covers all data that can be stored in a database table with rows
and columns. Such data has relational keys and can easily be mapped into
pre-designed fields. Today, structured data is the most commonly processed
kind of data and the simplest to manage. Example: Relational data.

2.Semi-Structured data
Semi-structured data is information that does not reside in a
relational database but that has some organizational properties that make
it easier to analyze. With some processing, it can be stored in a
relational database (though this can be very hard for some kinds of
semi-structured data), but semi-structured formats exist to save space.
Example: XML data.

3.Unstructured data
Unstructured data is data that is not organized in a predefined
manner or does not have a predefined data model, so it is not a good
fit for a mainstream relational database. There are alternative
platforms for storing and managing unstructured data. It is increasingly
prevalent in IT systems and is used by organizations in a variety of
business intelligence and analytics applications. Example: Word, PDF, text,
media logs.
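
The three types can be contrasted with a short Python sketch. The table layout, the XML record, and the sample sentence are made up for illustration; the point is only how much work is needed to read one field (a city) from each kind of data.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

# Structured: fixed columns in a relational table, queried directly by name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'Chennai')")
structured_city = conn.execute("SELECT city FROM customers WHERE id = 1").fetchone()[0]
conn.close()

# Semi-structured: self-describing tags, but fields may vary from record to record.
record = ET.fromstring('<customer id="2"><name>Ravi</name><tag>prime</tag></customer>')
semi_name = record.findtext("name")
semi_city = record.findtext("city", default="unknown")  # this record has no <city>

# Unstructured: free text with no schema; information must be extracted, here with a regex.
message = "Hi, I am Meena from Madurai and my order has not arrived yet."
match = re.search(r"from (\w+)", message)
unstructured_city = match.group(1) if match else "unknown"

print(structured_city, semi_city, unstructured_city)
```

The regular expression in the last step is deliberately fragile: it works only for sentences phrased this way, which is exactly why unstructured data is hard to analyze at scale.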

Unstructured data does not conform to a data
model and has no easily identifiable structure, so it cannot be used
by a computer program easily.
From 80% to 90% of data generated and collected by organizations is
unstructured, and its volumes are growing rapidly — many times faster
than the rate of growth for structured databases.
Unstructured data stores contain a wealth of information that can be
used to guide business decisions. However, unstructured data has
historically been very difficult to analyze. With the help of AI and machine
learning, new software tools are emerging that can search through vast
quantities of it to uncover beneficial and actionable business intelligence.
Unstructured data vs. structured data
Let’s take structured data first: it’s usually stored in a relational
database or RDBMS, and is sometimes referred to as relational data. It can
be easily mapped into designated fields — for example, fields for zip codes,
phone numbers, and credit cards. Data that conforms to RDBMS structure is
easy to search, both with human-defined queries and with software.

Unstructured data, in contrast, doesn’t fit into these sorts of
predefined data models. It can’t easily be stored in the rows and columns
of an RDBMS. And because it comes in so many formats, it’s a real challenge
for conventional software to ingest, process, and analyze. Simple content
searches can be undertaken across textual unstructured data with the right
tools.
Beyond that, the lack of consistent internal structure means it doesn’t
conform to what typical data mining systems can work with. As a result,
companies have largely been unable to tap into value-laden data like customer
interactions, rich media, and social network conversations. Robust tools for
doing so are only now being developed and commercialized.
What are some examples of unstructured data?
Unstructured data can be created by people or generated by
machines. Here are some examples of the human-generated
variety:
• Email: Email message bodies are unstructured and cannot be parsed
by traditional analytics tools. That said, email metadata gives it
some structure, which explains why email is sometimes considered
semi-structured data.
• Text files: This category includes word processing documents,
spreadsheets, presentations, email, and log files.
• Social media and websites: This includes data from social networks like
Twitter, LinkedIn, and Facebook, and websites such as Instagram,
photo-sharing sites, and YouTube.
• Mobile and communications data: For this category, look no further
than text messages, phone recordings, collaboration software, chat,
and instant messaging.
• Media: This data includes digital photos, audio, and video files.
Here are some examples of unstructured data generated by machines:
• Scientific data: This includes oil and gas surveys, space
exploration, seismic imagery, and atmospheric data.
• Digital surveillance: This category features data like reconnaissance
photos and videos.
• Satellite imagery: This data includes weather data, land forms, and
military movements.
Characteristics of Unstructured Data:
• Data neither conforms to a data model nor has any structure.
• Data cannot be stored in the form of rows and columns as in databases.
• Data does not follow any semantics or rules.
• Data lacks any particular format or sequence.
• Data has no easily identifiable structure.
• Due to the lack of identifiable structure, it cannot be used by computer
programs easily.
Sources of Unstructured Data:
• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys
Advantages of Unstructured Data:
• It supports data that lacks a proper format or sequence.
• The data is not constrained by a fixed schema.
• It is very flexible due to the absence of a schema.
• Data is portable.
• It is very scalable.
• It can deal easily with the heterogeneity of sources.
• This type of data has a variety of business intelligence and
analytics applications.
Disadvantages of Unstructured Data:
• It is difficult to store and manage unstructured data due to the lack of
schema and structure.
• Indexing the data is difficult and error-prone because the structure is
unclear and there are no predefined attributes. As a result,
search results are not very accurate.
• Ensuring the security of the data is a difficult task.
Problems faced in storing unstructured data:
• It requires a lot of storage space to store unstructured data.
• It is difficult to store videos, images, audio, etc.
• Due to the unclear structure, operations like update, delete, and search
are very difficult.
• Storage cost is high compared to structured data.
• Indexing unstructured data is difficult.
Possible solutions for storing unstructured data:
• Unstructured data can be converted to more easily manageable formats.
• A content-addressable storage (CAS) system can be used to store
unstructured data. It stores data together with its metadata and
assigns a unique name to every object stored in it. The object is
retrieved based on its content, not its location.
• Unstructured data can be stored in XML format.
• Unstructured data can be stored in an RDBMS that supports BLOBs
(binary large objects).
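
The idea behind content-addressable storage can be sketched in a few lines of Python. This is a toy in-memory model, not how any particular CAS product is implemented; the class name `ContentAddressableStore` and the choice of SHA-256 as the hash are assumptions for the example.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory store: objects are keyed by the SHA-256 digest of their bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store the bytes and return their address, which is derived from the content."""
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data  # identical content maps to the same address
        return address

    def get(self, address: str) -> bytes:
        """Retrieve an object by its content-derived address."""
        return self._objects[address]

    def __len__(self):
        return len(self._objects)


store = ContentAddressableStore()
addr1 = store.put(b"scanned invoice #1")
addr2 = store.put(b"scanned invoice #1")  # duplicate content
addr3 = store.put(b"holiday photo")

print(addr1 == addr2, len(store))
```

Because the address depends only on the content, storing the same object twice does not use extra space, and any change to the content produces a different address.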
Extracting information from unstructured data:
Unstructured data has no predefined structure, so it cannot easily be
interpreted by conventional algorithms. It is also difficult to tag and index
unstructured data, so extracting information from it is a tough job. Here are
possible solutions:
• Taxonomies, or classification of data, help organise data in a
hierarchical structure, which makes the search process easier.
• Data can be stored in a virtual repository and be automatically
tagged, for example with Documentum.
• Application platforms like XOLAP can be used. XOLAP helps in extracting
information from e-mails and XML-based documents.
• Various data mining tools can be used.
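
As a rough illustration of taxonomy-based tagging, the Python sketch below assigns documents to categories by keyword matching. The taxonomy, its keywords, and the sample documents are hypothetical; real systems use much richer classification techniques than simple keyword lookup.

```python
# Hypothetical two-level taxonomy: category path -> keywords that indicate it.
TAXONOMY = {
    "finance/invoice": {"invoice", "payment", "due"},
    "support/complaint": {"refund", "broken", "late"},
    "hr/leave": {"leave", "vacation"},
}


def tag_document(text):
    """Return the sorted taxonomy categories whose keywords appear in the text."""
    words = {word.strip(".,!?").lower() for word in text.split()}
    return sorted(category for category, keywords in TAXONOMY.items() if words & keywords)


documents = [
    "Payment for invoice 1042 is due on Friday.",
    "My parcel arrived late and the screen is broken, I want a refund!",
    "Lunch menu for the week.",
]
tags = [tag_document(doc) for doc in documents]

for doc, doc_tags in zip(documents, tags):
    print(doc_tags, "<-", doc)
```

Once documents carry tags like these, a search can be narrowed to one branch of the hierarchy instead of scanning every document.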

BIG DATA INDUSTRY APPLICATIONS

Here are some of the sectors where Big Data is actively used:
E-commerce - Predicting customer trends and optimizing prices are a few
of the ways e-commerce uses Big Data analytics
Marketing - Big Data analytics helps to drive high-ROI marketing
operations, which result in improved sales
Education - Used to develop new courses and improve existing ones based on
market requirements
Healthcare - With the help of a patient’s medical history, Big Data analytics
is used to predict how likely they are to have health issues
Media and entertainment - Used to understand the demand for
shows, movies, songs, and more, and to deliver a personalized
recommendation list to users
Banking - Customer income and spending patterns help to predict the
likelihood of customers choosing various banking offers, like loans and
credit cards
Telecommunications - Used to forecast network capacity and improve
customer experience
Government - Big Data analytics helps governments in law
enforcement, among other things

APPLICATIONS OF BIG DATA

In today’s world, there is a lot of data. Big companies use this data
for their business growth. By analyzing this data, useful decisions can be
made in various cases, as discussed below:
1.Tracking Customer Spending Habit, Shopping Behavior:
In big retail stores (like Amazon, Walmart, Big Bazaar, etc.), the
management team has to keep data on customers’ spending habits (on
which products customers spend, which brands they prefer, how
frequently they spend), shopping behavior, and most-liked products
(so that they can keep those products in the store). The production or
stocking rate of a product is fixed based on data about which products are
searched for or sold most.
The banking sector uses data on customers’ spending behavior
so that it can offer a particular customer a discount or cashback on a
product they like when they pay with the bank’s credit or debit card. In this
way, banks can send the right offer to the right person at the right time.
2.Recommendation:
By tracking customer spending habits and shopping behavior, big retail
stores provide recommendations to customers. E-commerce sites like
Amazon, Walmart, and Flipkart make product recommendations. They track what
products a customer is searching for and, based on that data, recommend
that type of product to that customer.
As an example, suppose a customer searched for bed covers on
Amazon. Amazon now has data indicating that the customer may be interested in
buying a bed cover. The next time that customer visits a Google page,
advertisements for various bed covers will be shown. Thus, advertisements for
the right product can be sent to the right customer.

YouTube also shows recommended videos based on the types of video a user
has previously liked and watched. Based on the content of the video the user is
watching, relevant advertisements are shown while the video plays. As an
example, suppose someone is watching a tutorial video on big data; then
advertisements for other big data courses will be shown during that
video.
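
A very simple way to produce such recommendations is to count which items appear together in other users’ histories. The Python sketch below shows this co-occurrence idea; the user histories and the function name `recommend` are invented for the example, and real recommendation engines use far more sophisticated models.

```python
from collections import Counter

# Hypothetical purchase/search histories: user -> set of items.
HISTORY = {
    "u1": {"bed cover", "pillow", "blanket"},
    "u2": {"bed cover", "pillow"},
    "u3": {"bed cover", "lamp"},
    "u4": {"phone", "charger"},
}


def recommend(item, history, top_n=2):
    """Recommend the items that most often appear alongside `item` in users' histories."""
    counts = Counter()
    for items in history.values():
        if item in items:
            counts.update(items - {item})
    # Sort by count (descending), then by name, so ties are resolved deterministically.
    ranked = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [name for name, _ in ranked[:top_n]]


suggestions = recommend("bed cover", HISTORY)
print(suggestions)
```

A customer who searches for a bed cover would be shown the items that other bed-cover buyers picked most often.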
3.Smart Traffic System:
Data about traffic conditions on different roads is collected
through cameras placed beside the roads and at the entry and exit points of
the city, and through GPS devices placed in vehicles (Ola, Uber cabs, etc.).
All such data is analyzed, and jam-free or less congested routes that take
less time are recommended. In this way, a smart traffic system can be built
in the city through big data analysis. A further benefit is that fuel
consumption can be reduced.
4.Secure Air Traffic System:
Sensors are present at various places on an aircraft (such as the
propeller). These sensors capture data like flight speed, moisture,
temperature, and other environmental conditions. Based on analysis of such
data, environmental parameters within the aircraft are set and adjusted.
By analyzing an aircraft’s machine-generated data, it can be estimated
how long a machine can operate flawlessly and when it should be
replaced or repaired.
5.Auto Driving Car:
Big data analysis helps drive a car without human intervention.
Cameras and sensors placed at various spots on the car gather data such as
the size of surrounding cars, obstacles, and the distance from them. This
data is analyzed, and then various calculations are carried out, such as how
many degrees to turn, what the speed should be, and when to stop. These
calculations help the car take action automatically.
6.Virtual Personal Assistant Tool:
Big data analysis helps virtual personal assistant tools (like Siri on
Apple devices, Cortana on Windows, and Google Assistant on Android) answer
the various questions asked by users. The tool tracks the
location of the user, their local time, the season, and other data related to
the question asked. By analyzing all such data, it provides an answer.
As an example, suppose a user asks “Do I need to take an
umbrella?”. The tool collects data like the location of the user and the
season and weather conditions at that location, analyzes this data to
conclude whether there is a chance of rain, and then provides the answer.

7.IoT:
Manufacturing companies install IoT sensors in machines to collect operational
data. By analyzing such data, it can be predicted how long a machine will work
without any problem and when it will require repair, so that the company can
take action before the machine develops many issues or breaks
down completely. Thus, the cost of replacing the whole machine can be saved.
In the healthcare field, big data is making a significant contribution.
Using big data tools, data regarding patient experience is collected and
used by doctors to give better treatment. IoT devices can sense symptoms
of a probable oncoming disease in the human body and help prevent it by
enabling early treatment. IoT sensors placed near a patient or a new-born baby
constantly keep track of various health parameters like heart rate and blood
pressure. Whenever any parameter crosses the safe limit, an alarm is sent
to a doctor, so that they can act remotely very quickly.
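
The alarm logic described above amounts to comparing each sensor reading against a safe range. The Python sketch below shows that check; the parameter names and the limits in `SAFE_LIMITS` are placeholder values chosen for illustration, not clinical guidance.

```python
# Assumed safe ranges for illustration only; real limits must come from clinicians.
SAFE_LIMITS = {"heart_rate": (50, 120), "systolic_bp": (90, 140)}


def check_reading(reading):
    """Return a list of alert messages for values outside their safe range."""
    alerts = []
    for parameter, (low, high) in SAFE_LIMITS.items():
        value = reading.get(parameter)
        if value is not None and not (low <= value <= high):
            alerts.append(f"{parameter}={value} outside {low}-{high}")
    return alerts


# Hypothetical stream of readings from one patient's sensor.
stream = [
    {"patient": "P1", "heart_rate": 72, "systolic_bp": 118},
    {"patient": "P1", "heart_rate": 135, "systolic_bp": 121},
    {"patient": "P1", "heart_rate": 80, "systolic_bp": 150},
]
alerts_per_reading = [check_reading(reading) for reading in stream]

for reading, alerts in zip(stream, alerts_per_reading):
    if alerts:
        print(f"ALERT for {reading['patient']}: {', '.join(alerts)}")
```

In a real system the check would run continuously on incoming sensor data, and the alert would be pushed to the doctor rather than printed.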
8.Education Sector:
Organizations that run online educational courses use big data to
find candidates interested in those courses. If someone searches
YouTube for a tutorial video on a subject, then online or offline course
providers for that subject send online ads to that person about their
courses.
9.Energy Sector:
Smart electric meters read the power consumed every 15 minutes and
send this data to a server, where it is analyzed to estimate
the times of day when the power load across the city is lowest.
Using this system, manufacturing units or householders can be advised
to run their heavy machines at times, such as at night,
when the power load is low, so that they pay a lower electricity bill.
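
The off-peak estimate can be sketched as an average of meter readings grouped by hour. In the Python example below the readings are invented numbers for three hours of the day; a real analysis would cover every hour and many meters.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical 15-minute readings: (hour of day, kilowatts drawn across the city).
readings = [
    (2, 40.0), (2, 42.0), (2, 38.0), (2, 40.0),
    (9, 95.0), (9, 100.0), (9, 105.0), (9, 100.0),
    (19, 130.0), (19, 128.0), (19, 132.0), (19, 130.0),
]

# Group the four readings of each hour together.
load_by_hour = defaultdict(list)
for hour, kilowatts in readings:
    load_by_hour[hour].append(kilowatts)

# Average load per hour, then pick the quietest and busiest hours.
average_load = {hour: mean(values) for hour, values in load_by_hour.items()}
off_peak_hour = min(average_load, key=average_load.get)
peak_hour = max(average_load, key=average_load.get)

print(f"off-peak hour: {off_peak_hour}:00, peak hour: {peak_hour}:00")
```

Heavy machines would then be scheduled around the off-peak hour found this way.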
10. Media and Entertainment Sector:
Media and entertainment service providers like Netflix,
Amazon Prime, and Spotify analyze data collected from their users. Data
such as what types of video and music users watch and listen to most and how
long users spend on the site is collected and analyzed to set the next
business strategy.

BIG DATA TECHNOLOGIES

Big data technologies can be categorized into four main types: data
storage, data mining, data analytics, and data visualization [2]. Each of
these is associated with certain tools, and you’ll want to choose the right
tool for your business needs depending on the type of big data technology
required.
1.Data storage
Big data technology that deals with data storage has the capability to fetch,
store, and manage big data. It is made up of infrastructure that allows users
to store the data so that it is convenient to access. Most data storage
platforms are compatible with other programs. Two commonly used tools
are Apache Hadoop and MongoDB.
• Apache Hadoop: Apache Hadoop is one of the most widely used big data
tools. It is an open-source software platform that stores and processes big
data in a distributed computing environment across hardware clusters.
This distribution allows for faster data processing. The framework is
designed to be fault-tolerant and scalable, and to process all data
formats.
• MongoDB: MongoDB is a NoSQL database that can be used to store
large volumes of data. It stores data as documents made up of key-value
pairs (a basic unit of data) and groups documents into collections. It is
written mainly in C++ and is one of the most popular big data
databases because it can manage and store unstructured data with
ease.
2.Data mining
Data mining extracts useful patterns and trends from raw data. Big
data technologies such as RapidMiner and Presto can turn unstructured
and structured data into usable information.
• RapidMiner: RapidMiner is a data mining tool that can be used to
build predictive models. Its two main strengths are processing and
preparing data and building machine learning and deep
learning models. The end-to-end model allows both functions to
drive impact across the organization [3].
• Presto: Presto is an open-source query engine that was originally
developed by Facebook to run analytic queries against its large
datasets. Now, it is widely available. One query on Presto can
combine data from multiple sources within an organization and
perform analytics on them in a matter of minutes.
3.Data analytics
In big data analytics, technologies are used to clean and transform data
into information that can be used to drive business decisions. This next
step (after data mining) is where users run algorithms, models, and predictive
analytics using tools such as Apache Spark and Splunk.
• Apache Spark: Spark is a popular big data tool for data analysis
because it is fast and efficient at running applications. It is often faster
than Hadoop MapReduce because it keeps intermediate data in random access
memory (RAM) instead of writing it to disk between batch processing steps.
Spark supports a wide variety of data analytics tasks and queries.
• Splunk: Splunk is another popular big data analytics tool for deriving
insights from large datasets. It has the ability to generate graphs,
charts, reports, and dashboards. Splunk also enables users to
incorporate artificial intelligence (AI) into data outcomes.
4.Data visualization
Finally, big data technologies can be used to create stunning visualizations
from the data. In data-oriented roles, data visualization is a skill that is
beneficial for presenting recommendations to stakeholders on business
profitability and operations, telling an impactful story with a simple graph.
• Tableau: Tableau is a very popular tool in data visualization because
its drag-and-drop interface makes it easy to create pie charts,
bar charts, box plots, Gantt charts, and more. It is a secure platform
that allows users to share visualizations and dashboards in real time.
• Looker: Looker is a business intelligence (BI) tool used to make
sense of big data analytics and then share those insights with other
teams. Charts, graphs, and dashboards can be configured with a
query, such as monitoring weekly brand engagement through social
media analytics.

OPEN SOURCE TECHNOLOGIES / BIG DATA ANALYTICS TOOLS

There are hundreds of data analytics tools on the market today, but
the selection of the right tool will depend on your business needs, goals,
and data variety. Now, let’s check out the
top 10 analytics tools in big data.
1.APACHE Hadoop
It’s a Java-based open-source platform that is used to store and
process big data. It is built on a cluster system that allows
data to be processed efficiently and in parallel. It can process both
structured and unstructured data and can scale from one server to many
computers. Hadoop also offers cross-platform support for its users. Today,
it is one of the most widely used big data analytics tools and is popular
with many tech giants such as Amazon, Microsoft, IBM, etc.

Features of Apache Hadoop:
• Free to use and offers an efficient storage solution for businesses.
• Offers quick access via HDFS (Hadoop Distributed File System).
• Highly flexible and can easily be used with data sources such as MySQL and JSON.
• Highly scalable, as it can distribute a large amount of data in small segments.
• It works on commodity hardware such as JBOD (just a bunch of disks).
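
Hadoop processes data in parallel using the MapReduce model: data is split into blocks, a map step runs on each block, and a reduce step combines the results. The Python sketch below imitates the three phases on a word-count task. It runs in a single process and does not use the Hadoop API; the function names and the two input blocks are made up for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, already split into blocks as a distributed file system would do.
blocks = ["big data is big", "data is everywhere"]


def map_phase(block):
    """Map: emit a (word, 1) pair for every word in one block."""
    return [(word, 1) for word in block.split()]


def shuffle(pairs):
    """Shuffle: group all values by key so each reducer sees one word."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, sum(values)


# On a cluster, each block would be mapped on a different machine.
mapped = chain.from_iterable(map_phase(block) for block in blocks)
word_counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())

print(word_counts)
```

Because each block is mapped independently, adding more machines lets more blocks be processed at the same time, which is where the scalability comes from.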
2.Cassandra
APACHE Cassandra is an open-source NoSQL distributed database
that is used to store and retrieve large amounts of data. It’s one of the
most popular tools for data analytics and has been praised by many tech
companies for its high scalability and availability without compromising
speed and performance. It is capable of delivering thousands of operations
every second and can handle petabytes of data with almost zero downtime.
It was created at Facebook and released as open source in 2008.
Features of APACHE Cassandra:
• Data Storage Flexibility: It supports all forms of data, i.e.
structured, unstructured, and semi-structured, and allows users to
change them as per their needs.
• Data Distribution System: Data is easy to distribute by
replicating it across multiple data centers.
• Fast Processing: Cassandra has been designed to run on efficient
commodity hardware and also offers fast storage and data
processing.
• Fault-tolerance: If any node fails, it can be replaced
without downtime.
3.Qubole
It’s a cloud-based big data tool, built on open-source engines, that helps
in extracting value from data along the value chain using ad-hoc analysis
and machine learning. Qubole is a data lake
platform that offers end-to-end service with reduced time and effort
required for moving data pipelines. It can be configured to work with
multiple cloud services such as AWS, Azure, and Google Cloud. It is also
claimed to help lower the cost of cloud computing by up to 50%.
Features of Qubole:
• Supports ETL process: It allows companies to migrate data from
multiple sources into one place.
• Real-time Insight: It monitors users’ systems and allows them to
view real-time insights.
• Predictive Analysis: Qubole offers predictive analysis so that
companies can take action accordingly to target more
acquisitions.
• Advanced Security System: To protect users’ data in the cloud,
Qubole uses an advanced security system and aims to guard against
future breaches. It also allows cloud data to be encrypted
against potential threats.

Page: 16 / 39
4. Xplenty
It is a data analytics tool for building data pipelines with minimal
code. It offers a wide range of solutions for sales, marketing, and
support. With the help of its interactive graphical interface, it provides
solutions for ETL, ELT, etc. The best part of using Xplenty is its low
investment in hardware and software, and it offers support via email, chat,
telephone, and virtual meetings. Xplenty is a cloud platform that processes
data for analytics and brings all the data together.
Features of Xplenty:
 REST API: A user can do almost anything through the REST API.
 Flexibility: Data can be sent to and pulled from databases,
data warehouses, and Salesforce.
 Data Security: It offers SSL/TLS encryption, and the platform
verifies algorithms and certificates regularly.
 Deployment: It offers integration apps for both cloud & in-house and
supports deployment to integrate apps over the cloud.
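Both Qubole and Xplenty are described above as ETL tools, so it may help to see what extract, transform, and load mean in the smallest possible case. The sketch below uses only the Python standard library and is not the API of either product: the CSV content, the `orders` table, and the three helper functions are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a sales system (the "source").
RAW_CSV = """order_id,region,amount
1,north,120.50
2,south,
3,north,80.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount and convert types."""
    return [
        (int(r["order_id"]), r["region"].upper(), float(r["amount"]))
        for r in rows
        if r["amount"]
    ]


def load(rows, conn):
    """Load: write the cleaned rows into one central table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)
```

In ELT, the same `load` step would run before `transform`, with the cleaning done inside the warehouse instead.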
5. Spark
Apache Spark is another framework that is used to process data and
perform numerous tasks on a large scale. It processes data across
multiple computers with the help of distributed computing tools. It is widely used
among data analysts as it offers easy-to-use APIs that provide easy data
pulling methods, and it is capable of handling multiple petabytes of data as
well. In 2014, Spark set a record by sorting 100 terabytes of data in
just 23 minutes, breaking the previous record held by Hadoop MapReduce (72
minutes). This is one reason why big tech companies are moving towards Spark,
which is also highly suitable for ML and AI workloads.
Features of Apache Spark:
 Ease of use: It allows users to work in their preferred language (Java,
Python, etc.).
 Real-time Processing: Spark can handle real-time streaming
via Spark Streaming.
 Flexible: It can run on Hadoop, Mesos, Kubernetes, or in the cloud.
6. MongoDB
MongoDB, which came into the limelight in 2010, is a free, open-source,
document-oriented (NoSQL) database that is used to store a high volume of
data. It uses collections and documents for storage, and each document
consists of key-value pairs, which are considered the basic unit of MongoDB. It
is popular among developers because it is available for multiple
programming languages such as Python, JavaScript, and Ruby.

Features of MongoDB:
 Written in C++: It’s a schema-less DB and can hold a variety of
documents.
 Simplifies Stack: With the help of MongoDB, a user can easily store
files without any disturbance in the stack.
 Master-Slave Replication: Data is written to and read from the master,
and the replicated copies can be called on for backup.
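To make the idea of collections, documents, and key-value pairs concrete, here is a minimal sketch that models them with plain Python dictionaries. It does not use MongoDB or its drivers; the `students` collection, its field names, and the `find` helper are hypothetical and only mimic the shape of a document store.

```python
import json

# A "collection" modelled as a list of documents; each document is a set of
# key-value pairs. Field names and values here are made up for illustration.
students = [
    {"_id": 1, "name": "Asha", "languages": ["Python", "Ruby"]},
    {"_id": 2, "name": "Ravi", "city": "Chennai"},  # different fields: no fixed schema
]


def find(collection, **criteria):
    """Return documents whose fields equal every given key-value pair."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]


print(json.dumps(find(students, name="Ravi"), indent=2))
```

Note that the two documents carry different fields, which is what "schema-less" means in the feature list above.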
7. Apache Storm
Storm is a robust, user-friendly tool used for data analytics,
especially in small companies. The best part about Storm is that it has
no programming-language barrier and can be used with any language. It was
designed to handle large pools of data in a fault-tolerant and horizontally
scalable way. When we talk about real-time data processing, Storm
leads the chart because of its distributed real-time big data processing
system, which is why many tech giants use Apache Storm in
their systems today. Some of the most notable names are Twitter, Zendesk,
NaviSite, etc.
Features of Storm:
 Data Processing: Storm processes the data even if a node gets disconnected.
 Highly Scalable: It keeps up its performance even as
the load increases.
 Fast: Apache Storm is very fast and can process up
to 1 million messages of 100 bytes per second on a single node.
8. SAS
Today it is one of the best tools for statistical modeling used by data
analysts. By using SAS, a data scientist can mine, manage, extract, or
update data in different variants from different sources. Statistical
Analysis System, or SAS, allows a user to access data in any format
(SAS tables or Excel worksheets). It also offers a cloud platform
for business analytics called SAS Viya, and SAS has introduced new tools
and products to strengthen its position in AI and ML.
Features of SAS:
 Flexible Programming Language: It offers easy-to-learn syntax
and vast libraries, which make it suitable for non-programmers.
 Vast Data Format: It provides support for many programming
languages, including SQL, and can read data
in any format.
 Encryption: It provides end-to-end security with a feature called
SAS/SECURE.
9. Datapine
Datapine is an analytics tool used for BI and was founded back in 2012
(Berlin, Germany). In a short period of time, it has gained much popularity
in a number of countries, and it’s mainly used for data extraction (for
small and medium companies fetching data for close monitoring). With the help of its
enhanced UI design, anyone can visit and check the data as per their
requirement. It is offered in 4 different price brackets, starting from $249 per
month, and provides dashboards by function, industry, and platform.
Features of Datapine:
 Automation: To cut down on manual work, datapine offers a wide
array of AI assistant and BI tools.
 Predictive Tool: datapine provides forecasting/predictive
analytics: using historical and current data, it derives the future
outcome.
 Add on: It also offers intuitive widgets, visual analytics &
discovery, ad hoc reporting, etc.
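Deriving a future outcome from historical data, as the Predictive Tool feature describes, can be sketched with a simple straight-line trend. The code below is not datapine's method, which the notes do not specify; it is an ordinary least-squares fit written with the standard library, and the `monthly_sales` figures are hypothetical.

```python
def linear_forecast(history, steps_ahead=1):
    """Fit y = a + b*t by least squares and extrapolate steps_ahead periods."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    slope = (
        sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
        / sum((t - t_mean) ** 2 for t in range(n))
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


monthly_sales = [100, 110, 120, 130]  # hypothetical historical data
print(linear_forecast(monthly_sales))
```

Real forecasting tools usually add seasonality and confidence intervals on top of a trend like this one.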
10. Rapid Miner
It’s a fully automated visual workflow design tool used for data
analytics. It’s a no-code platform, so users aren’t required to write code to
segregate data. Today, it is heavily used in many industries such
as ed-tech, training, research, etc. Though it’s an open-source platform, the
free edition is limited to 10,000 data rows and a single logical processor.
With the help of Rapid Miner, one can easily deploy ML models to the
web or mobile (only when the user interface is ready to collect real-time
figures).
Features of Rapid Miner:
 Accessibility: It allows users to access 40+ types of files (SAS,
ARFF, etc.) via URL
 Storage: Users can access cloud storage facilities such as AWS and Dropbox.
 Data validation: Rapid Miner enables the visual display of multiple
results in history for better evaluation.

CLOUD AND BIG DATA

1. Big Data:
Big data refers to data that is huge in size and is also increasing
rapidly over time. Big data includes structured data,
unstructured data, and semi-structured data. Big data cannot be
stored and processed with traditional data management tools; it needs
specialized big data management tools. It refers to complex and large
data sets characterized by the 5 V’s: volume, velocity, veracity, value, and
variety. It includes data storage, data analysis, data
mining, and data visualization.

Examples of sources where big data is generated include social media
data, e-commerce data, weather station data, IoT sensor data, etc.
Characteristics of Big Data :
 Variety of Big data – Structured, unstructured, and semi-structured data
 Velocity of Big data – Speed of data generation
 Volume of Big data – Huge volumes of data that are being generated
 Value of Big data – Extracting useful information and making it valuable
 Variability of Big data – Inconsistency which can be shown by the data at
times.
Advantages of Big Data :
 Cost Savings
 Better decision-making
 Better Sales insights
 Increased Productivity
 Improved customer service.
Disadvantages of Big Data :
 Incompatible tools
 Security and Privacy Concerns
 Need for cultural change
 Rapid change in technology
 Specific hardware needs.
2. Cloud Computing :
Cloud computing refers to the on-demand availability of computing
resources over the internet. These resources include servers, storage,
databases, software, analytics, networking, and intelligence,
and all of them can be used as per the requirement of the
customer. In cloud computing, customers pay as per use. It is very
flexible, and resources can be scaled easily depending upon the
requirement. Instead of buying IT resources physically, all resources
can be obtained from cloud vendors as required.
Cloud computing has three service models, i.e. Infrastructure as a Service
(IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
Examples of cloud computing vendors who provide cloud computing
services are Amazon Web Services (AWS), Microsoft Azure, Google Cloud
Platform, IBM Cloud Services, etc.

Characteristics of Cloud Computing :
 On-Demand availability
 Accessible through a network
 Elastic Scalability
 Pay as you go model
 Multi-tenancy and resource pooling.
Advantages of Cloud Computing :
 Back-up and restore data
 Improved collaboration
 Excellent accessibility
 Low maintenance cost
 On-Demand Self-service.
Disadvantages of Cloud Computing:
 Vendor lock-in
 Limited Control
 Security Concern
 Downtime due to various reasons
 Requires good Internet connectivity.

Difference between Big Data and Cloud Computing:

S.No | BIG DATA | CLOUD COMPUTING
01. | Big data refers to data that is huge in size and also increasing rapidly over time. | Cloud computing refers to the on-demand availability of computing resources over the internet.
02. | Big data includes structured data, unstructured data, and semi-structured data. | Cloud computing services include Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
03. | Volume of data, Velocity of data, Variety of data, Veracity of data, and Value of data are considered the 5 most important characteristics of big data. | On-demand availability of IT resources, broad network access, resource pooling, elasticity, and measured service are considered the main characteristics of cloud computing.
04. | The purpose of big data is to organize a large volume of data, extract useful information from it, and use that information for the improvement of business. | The purpose of cloud computing is to store and process data in the cloud or to use remote IT services without physically installing any IT resources.
05. | Distributed computing is used for analyzing the data and extracting the useful information. | The internet is used to get cloud-based services from different cloud vendors.
06. | Big data management allows a centralized platform, provision for backup and recovery, and low maintenance cost. | Cloud computing services are cost-effective, scalable, and robust.
07. | Some of the challenges of big data are variety of data, data storage and integration, data processing, and resource management. | Some of the challenges of cloud computing are availability, transformation, security concerns, and the charging model.
08. | Big data refers to a huge volume of data, its management, and the extraction of useful information. | Cloud computing refers to remote IT resources and different internet service models.
09. | Big data is used to describe a huge volume of data and information. | Cloud computing is used to store data and information on remote servers and to process the data using remote infrastructure.
10. | Some of the sources where big data is generated include social media data, e-commerce data, weather station data, IoT sensor data, etc. | Some of the vendors who provide cloud computing services are Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, IBM Cloud Services, etc.

WEB ANALYTICS

Web Analytics or Online Analytics refers to the analysis of quantifiable and
measurable data of your website with the aim of understanding and
optimizing the web usage.
Web Analytics is the methodological study of online/offline patterns and
trends. It is a technique that you can employ to collect, measure, report,
and analyze your website data. It is normally carried out to analyze the
performance of a website and optimize its web usage.
Web analytics is used to track key metrics and analyze visitors’ activity and
traffic flow. It is a tactical approach to collecting data and generating reports. It
is an ongoing process that helps in attracting more traffic to a site and
thereby increasing the Return on Investment.

Web analytics focuses on various issues. For example,

 Detailed comparison of visitor data, and Affiliate or referral data.
 Website navigation patterns.
 The amount of traffic your website received over a specified period of time.
 Search engine data.
Web analytics improves online experience for your customers and elevates
your business prospects. There are various Web Analytics tools available in
the market. For example, Google Analytics, Kissmetrics, Optimizely, etc.
Importance of Web Analytics
Web Analytics is needed to assess the success rate of a website and its
associated business. Using Web Analytics, we can −
 Assess web content problems so that they can be rectified
 Have a clear perspective of website trends
 Monitor web traffic and user flow
 Demonstrate goals acquisition
 Figure out potential keywords
 Identify segments for improvement
 Find out referring sources
Web Analytics Process
The primary objective of carrying out Web Analytics is to optimize the
website in order to provide better user experience. It provides a data-driven
report to measure visitors’ flow throughout the website.
The process of web analytics consists of the following steps.
 Set the business goals.
 To track goal achievement, set the Key Performance Indicators (KPIs).
 Collect correct and suitable data.
 To extract insights, analyze the data.
 Based on assumptions learned from the data analysis, test alternatives.
 Based on either data analysis or website testing, implement
insights.
Types of Web Analytics
There are two types of web analytics −
 On-site − It measures users’ behaviour once they are on the website.
For example, measurement of your website performance.
 Off-site − It is the measurement and analysis of the web irrespective of
whether you own or maintain a website. For example, measurement
of visibility, comments, potential audience, etc.
Metrics of Web Analytics
There are three basic metrics of web analytics −
Count
It is the most basic metric of measurement. It is represented as a whole
number or a fraction. For example,
 Number of visitors = 12999, Number of likes = 3060, etc.
 Total sales of merchandise = $54,396.18.
Ratio
It is typically a count divided by some other count. For example, Page
views per visit.
Key Performance Indicator (KPI)
It depends upon the business type and strategy. KPI varies from one
business to another.
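The difference between a count and a ratio can be shown with a few lines of arithmetic. This is a minimal sketch: the visitor and like counts come from the example above, while `page_views` and `visits` are hypothetical numbers added so that a ratio can be computed.

```python
visitors = 12999          # count, taken from the example above
likes = 3060              # count, taken from the example above
page_views = 45500        # hypothetical count
visits = 13000            # hypothetical count

pages_per_visit = page_views / visits   # ratio: one count divided by another
like_rate = likes / visitors            # another ratio built from two counts

print(f"Page views per visit: {pages_per_visit:.2f}")
print(f"Likes per visitor: {like_rate:.3f}")
```

A KPI is then whichever of these counts or ratios the business decides to track against its goals.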
Micro and macro Level Data Insights
Google Analytics gives you more accurate insight into your data. You can
understand the data at two levels: micro level and macro level.
Micro Level Analysis
It pertains to an individual or a small group of individuals. For example,
the number of times a job application was submitted, the number of times
"print this page" was clicked, etc.
Macro Level Analysis
It is concerned with the primary business objectives and with huge groups of
people such as communities, nations, etc. For example, the number of
conversions in a particular demographic.
Web Analysis - What to Measure?
These are a few of the measurements conducted in web analytics −
 Engagement Rate
It shows how long a person stays on your web page and which pages they
visit. To make your web pages more engaging, include informative
content, visuals, fonts and bullets.
 Bounce Rate
If a person leaves your website within a span of 30 seconds, it is considered
a bounce. The rate at which users turn back is called the bounce
rate. To minimize the bounce rate, include related posts, clear calls to
action, and backlinks in your webpages.
 Dashboards
A dashboard is a single-page view of the information important to a user. You
can create your own dashboards keeping in mind your
requirements. You may keep only frequently viewed data on the
dashboard.
 Event Tracking
Event tracking allows you to track other activities on your website. For
example, you can track downloads and sign-ups through event
tracking.
 Traffic Source
You can get an overview of traffic sources and filter it further.
Figuring out the key sources can help you learn about areas for
improvement.
 Annotations
It allows you to annotate a traffic report for a past period. You can click on
the graph and type in a note to save it for future study.
 Visitor Flow
It gives you a clear picture of the pages visited and the sequence in which
they were visited. Understanding users’ paths may help you restructure navigation in
order to give customers a hassle-free experience.
 Content
It gives you insight into the website’s content section. You can see how
each page is doing, website loading speed, etc.
 Conversions
Analytics lets you track goals and the paths used to achieve these goals.
You can get details regarding product performance, purchase
amount, and mode of billing. Web Analytics offers you more than this;
all you need is to analyze things minutely and be patient.
 Page Load Time
The longer the load time, the higher the bounce rate, so tracking page load
time is equally important.
 Behavior
Behavior lets you know page views and time spent on the website. You
can find out how customers behave once they are on your website.
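Several of the measurements listed above (bounce rate, traffic source, visitor flow) can be computed from a simple session log. The sketch below is an illustration only, not how any particular analytics product works: the `sessions` data and the three helper functions are hypothetical, and the 30-second bounce threshold is the one given in the text.

```python
from collections import Counter

BOUNCE_SECONDS = 30  # threshold taken from the text above

# Hypothetical session log: (traffic source, seconds on site, pages viewed in order)
sessions = [
    ("search", 12, ["/home"]),
    ("search", 240, ["/home", "/pricing", "/signup"]),
    ("referral", 25, ["/blog"]),
    ("direct", 95, ["/home", "/pricing"]),
]


def bounce_rate(rows):
    """Share of sessions that ended within BOUNCE_SECONDS."""
    bounces = sum(1 for _, seconds, _ in rows if seconds < BOUNCE_SECONDS)
    return bounces / len(rows)


def traffic_sources(rows):
    """Number of sessions per traffic source."""
    return Counter(source for source, _, _ in rows)


def common_paths(rows):
    """Visitor flow: how often each page-to-page step occurs."""
    steps = Counter()
    for _, _, pages in rows:
        steps.update(zip(pages, pages[1:]))
    return steps


print(bounce_rate(sessions))
print(traffic_sources(sessions).most_common())
print(common_paths(sessions).most_common(1))
```

The same log could feed a dashboard: each function returns one number or table that is worth watching over time.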

MOBILE BUSINESS INTELLIGENCE

Business Intelligence
“Business Intelligence is not just about turning data into information, rather
organizations need that data to impact how their business operates and
responds to the changing marketplace.”
So, it is not all about transforming data into information, though Business
Intelligence significantly involves this process. Business Intelligence is
transforming data into meaningful, actionable insights that enable
organizations to make informed business strategies and tactical decisions.
Mobile Business Intelligence
Business Intelligence delivers relevant and trustworthy information to the
right person at the right time. Mobile business intelligence is the transfer
of business intelligence from the desktop to mobile devices such as the
BlackBerry, iPad, and iPhone.

The ability to access analytics and data on mobile devices or tablets rather
than desktop computers is referred to as mobile business intelligence. The
business metric dashboard and key performance indicators (KPIs) are more
clearly displayed.
As the use of mobile devices has risen, so has the use of the technology that we all
rely on to make our daily lives easier, including in business. Many
businesses have benefited from mobile business intelligence. Essentially,
this section is a guide for business owners and others to educate them on the
benefits and pitfalls of Mobile BI.
Need for mobile BI?
Mobile phones' data storage capacity has grown in tandem with their use.
You are expected to make decisions and act quickly in this fast-paced
environment. The number of businesses turning to mobile BI for help in such a
situation is growing by the day.
To expand your business or boost your business productivity, mobile BI can
help, and it works with both small and large businesses. Mobile BI can help
you whether you are a salesperson or a CEO. There is a high demand for
mobile BI in order to reduce the time spent obtaining information and use
that time for quick decision making.
As a result, timely decision-making can boost customer satisfaction and
improve an enterprise's reputation among its customers. It also aids in
making quick decisions in the face of emerging risks.
Data analytics and visualisation techniques are essential skills for any team
that wants to organise work, develop new project proposals, or wow clients
with impressive presentations.
Advantages of mobile BI
1. Simple access
Mobile BI is not restricted to a single mobile device or a certain place. You
can view your data at any time and from any location. Having real-time
visibility into a firm improves production and the daily efficiency of the
business. Obtaining a company's perspective with a single click simplifies
the process.
2. Competitive advantage
Many firms are seeking better and more responsive methods to do
business in order to stay ahead of the competition. Easy access to real-
time data improves company opportunities and raises sales and capital.
This also aids in making the necessary decisions as market conditions
change.
3. Simple decision-making
As previously stated, mobile BI provides access to real-time data at any
time and from any location. Mobile BI delivers information
on demand, which helps users obtain what they require at the
time they need it. As a result, decisions are made quickly.

4. Increased productivity
By extending BI to mobile, the organization's teams can access critical
company data when they need it. Obtaining all of the corporate data with
a single click frees up a significant amount of time to focus on the smooth
and efficient operation of the firm. Increased productivity results in a
smooth and quick-running firm.
Disadvantages of mobile BI
1. Stack of data
The primary function of mobile BI is to store data in a systematic
manner and then present it to the user as required. As a result, Mobile BI
stores all of the information and ends up with heaps of earlier data.
The corporation only needs a small portion of the previous data, but it
has to store all of it, which ends up in the stack.
2. Expensive
Mobile BI can be quite costly at times. Large corporations can continue to
pay for these expensive services, but small businesses cannot. The cost
of mobile BI itself is not the whole expense: we must additionally consider the rates of IT
workers for the smooth operation of BI, as well as the hardware costs
involved. Moreover, larger corporations do not settle for just one Mobile
BI provider for their organisations; they require several. Even for
basic commercial transactions, mobile BI is costly.
3. Time consuming
Businesses prefer Mobile BI because it delivers results quickly. Companies are
not patient enough to wait for data before acting on it. In today's
fast-paced environment, anything that can produce results quickly is
valuable. However, because the system is built on data from the warehouse,
implementing BI in an enterprise can take more than 18
months.
4. Data breach
The biggest concern for users when providing data to Mobile BI is data
leakage. If you handle sensitive data through Mobile BI, a single error can
destroy your data as well as make it public, which can be detrimental to
your business.
Many Mobile BI providers are working to make it 100 percent secure to
protect their potential users' data. Security is not only something that mobile BI
providers must consider; it is also something that we, as users, must
consider when granting data access authorization.
5. Poor quality data
Because we work online in every aspect, we have a lot of data stored in
Mobile BI, which might be a significant problem. This means that a large
portion of the data analysed by Mobile BI is irrelevant or completely useless. This can slow
down the entire procedure. You therefore need to select the data that is
important and may be required in the future.
Best Mobile BI tools
1. Sisense
Sisense is a flexible business intelligence (BI) solution that includes
powerful analytics, visualisations, and reporting capabilities for
managing and supporting corporate data. Businesses can use the
solution to evaluate large, diverse databases and generate relevant
business insights. You may easily view enormous volumes of complex
data with Sisense's code-first, low-code, and even no-code technologies.
Sisense was established in 2004 with its headquarters in New York.
In its early years, the team proceeded cautiously with its
research. Once the company had received $4 million in funding from
investors, it picked up the pace of its research.
2. SAP Roambi Analytics
Roambi Analytics is a BI tool that allows you to
fundamentally rethink your data analysis, making it easier and faster
while also increasing your data interaction.
You can consolidate all of your company's data in a single tool using SAP
Roambi Analytics, which integrates all ongoing systems and data. Using
SAP Roambi Analytics is a simple three-step process. First, upload your HTML
or spreadsheet files. The information is then transformed
into informative charts and graphs that can be visualised.
Once the data has been processed, you can easily share it with your preferred
device. Roambi Analytics was founded in 2008 by a team based in
California.
3. Microsoft Power BI Pro
Microsoft Power BI is an easy-to-use tool for non-technical
business owners who are unfamiliar with BI tools but wish to aggregate,
analyse, visualise, and share data. You only need a basic understanding
of Excel and other Microsoft tools, and if you are familiar with these,
Power BI can be used as a self-service tool. Microsoft Power BI
has a unique feature that allows users to create subsets of data and
then automatically apply analytics to that information.
4. IBM Cognos Analytics
Cognos Analytics is IBM's web-based business intelligence
tool. Cognos Analytics is now being integrated with Watson, and the benefits
for users are extremely exciting. Cognos Analytics with Watson assists in
connecting and cleaning the users' data, resulting in properly visualised
data.
That way, the business owner will know where they stand in comparison
to their competitors and where they can grow in the future. It combines
reporting, modelling, analysis, and dashboards to help you understand your
organization's data and make sound business decisions.
5. Amazon QuickSight
Amazon QuickSight assists in the creation and distribution of
interactive BI dashboards to users, as well as the retrieval of
answers to natural language queries in seconds. QuickSight can be
accessed from any device and embedded in any website, portal, or app.
Amazon QuickSight allows you to quickly and easily create interactive
dashboards and reports for your users. Anyone in your organisation
can securely access those dashboards via browsers or mobile devices.
QuickSight's eye-catching feature is its pay-per-session model, which
allows users to use a dashboard created by someone else without
paying much. The user pays per session, with
prices ranging from $0.30 for a 30-minute session to $5 for unlimited
use per month per user.
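The pay-per-session figures quoted above can be turned into a small cost calculation. This sketch assumes that the $5 figure works as a monthly cap per user, which is one reading of the text; the prices are the ones quoted in these notes and may no longer be current, and `monthly_reader_cost` is a hypothetical helper, not an AWS API.

```python
SESSION_PRICE = 0.30   # dollars per 30-minute session, as quoted above
MONTHLY_CAP = 5.00     # dollars per user per month, as quoted above


def monthly_reader_cost(sessions):
    """Pay per session, but never more than the monthly cap."""
    return min(sessions * SESSION_PRICE, MONTHLY_CAP)


for n in (4, 16, 17, 60):
    print(n, f"${monthly_reader_cost(n):.2f}")
```

Under these assumptions an occasional viewer pays only for the sessions used, while a heavy user's bill stops growing once it reaches the cap.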

CROWD SOURCING ANALYTICS

Crowdsourcing is a sourcing model in which an individual or an
organization gets support from a large, open-minded, and rapidly evolving
group of people in the form of ideas, micro-tasks, finances, etc.
Crowdsourcing typically involves the use of the internet to attract a large
group of people to divide tasks or to achieve a target. The term was coined
in 2005 by Jeff Howe and Mark Robinson. Crowdsourcing can help different
types of organizations get new ideas and solutions, deeper consumer
engagement, optimization of tasks, and several other things.
Let us understand this term more deeply with the help of an example.
GeeksforGeeks gives young minds an opportunity to share their
knowledge with the world by contributing articles and videos in their respective
domains. Here GeeksforGeeks is using the crowd as a source not only to
expand its community but also to include the ideas of many young minds,
improving the quality of the content.
Where Can We Use Crowdsourcing?
Crowdsourcing is touching almost all sectors from education to health. It
is not only accelerating innovation but democratizing problem-solving
methods. Some fields where crowdsourcing can be used.
1. Enterprise
2. IT

3. Marketing
4. Education
5. Finance
6. Science and Health
How to Crowdsource?
1. For scientific problem solving, a broadcast search is used where an
organization mobilizes a crowd to come up with a solution to a
problem.
2. For information management problems, knowledge discovery
and management is used to find and assemble information.
3. For processing large datasets, distributed human intelligence is
used. The organization mobilizes a crowd to process and analyze the
information.
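The third approach, distributed human intelligence, usually means giving the same small task to several people and combining their answers. The following is a minimal sketch of that idea using majority voting; the `crowd_labels` data, the `aggregate` helper, and the agreement threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical micro-task results: three workers label each image.
crowd_labels = {
    "img-001": ["cat", "cat", "dog"],
    "img-002": ["dog", "dog", "dog"],
    "img-003": ["cat", "dog", "bird"],
}


def aggregate(labels, min_agreement=2):
    """Keep the majority label; flag items where the crowd does not agree."""
    results = {}
    for item, votes in labels.items():
        label, count = Counter(votes).most_common(1)[0]
        results[item] = label if count >= min_agreement else None
    return results


print(aggregate(crowd_labels))
```

Items that come back as `None` have no clear majority and would typically be sent to more workers or to an expert.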
Examples of Crowdsourcing
1. Doritos: It is one of the companies that has long taken advantage of
crowdsourcing for an advertising initiative. They use
consumer-created ads for one of their 30-second Super Bowl
spots (the championship game of American football).
2. Starbucks: Another big venture which used crowdsourcing as a
medium for idea generation. Their White Cup Contest is a famous
contest in which customers decorate their Starbucks cup with
an original design and then take a photo and submit it on social
media.
3. Lays: The "Do Us a Flavor" contest of Lays used crowdsourcing as an
idea-generating medium. They asked customers to submit their
ideas for the next chip flavor they want.
4. Airbnb: A very famous travel website that lets people rent out their
houses or apartments by listing them on the website. All the listings
are crowdsourced from people.
Crowdsourced Marketing
As already discussed, crowdsourcing helps businesses grow a lot. Whether
it is a business idea or just a logo design, crowdsourcing engages
people directly and, in turn, saves money and energy. In the upcoming
years, crowdsourced marketing will likely get a boost as the world
adopts technology faster.
Main Types of Crowdsourcing
Crowdsourcing involves obtaining information or resources from a wide
swath of people. In general, we can break this up into four main
categories:

 Wisdom - Wisdom of crowds is the idea that large groups of people
are collectively smarter than individual experts when it comes to
problem-solving or identifying values (like the weight of a cow or
number of jelly beans in a jar).
 Creation - Crowd creation is a collaborative effort to design or build
something. Wikipedia and other wikis are examples of this. Open-
source software is another good example.
 Voting - Crowd voting uses the democratic principle to choose a
particular policy or course of action by "polling the audience."
 Funding - Crowdfunding involves raising money for various purposes
by soliciting relatively small amounts from a large number of funders.
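The wisdom-of-crowds idea in the first category can be illustrated with the cow-weight example. The numbers below are made up, so the result shows how the comparison is done and does not prove the principle; `true_weight` and `guesses` are hypothetical, and the median is used as one common way to combine estimates.

```python
import statistics

true_weight = 1198  # hypothetical actual weight of the cow, in pounds
guesses = [950, 1020, 1100, 1150, 1210, 1260, 1300, 1400, 1500]  # made-up crowd guesses

crowd_estimate = statistics.median(guesses)
crowd_error = abs(crowd_estimate - true_weight)
individual_errors = [abs(g - true_weight) for g in guesses]
beaten = sum(1 for e in individual_errors if e > crowd_error)

print(f"Crowd estimate: {crowd_estimate}, error: {crowd_error}")
print(f"The crowd beats {beaten} of {len(guesses)} individual guessers")
```

The combined estimate works well here because the individual errors point in different directions and partly cancel out.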
Crowdsourcing Sites
Here is the list of some famous crowdsourcing and crowdfunding sites.
1. Kickstarter
2. GoFundMe
3. Patreon
4. RocketHub
Advantages of Crowdsourcing
1. Evolving Innovation: Innovation is required everywhere, and in this
advancing world it has a big role to play. Crowdsourcing
helps in getting innovative ideas from people belonging to different
fields, thus helping businesses grow in every field.
2. Save costs: It eliminates the time wasted on meeting
people and convincing them. The business idea only has to be proposed
on the internet, and you will be flooded with suggestions from the
crowd.
3. Increased Efficiency: Crowdsourcing has increased the efficiency of
business models, as ideas from several experts are also brought in.
Disadvantages of Crowdsourcing
1. Lack of confidentiality: Asking for suggestions from a large group of
people can bring the threat of idea stealing by other organizations.
2. Repeated ideas: Contestants in crowdsourcing competitions often
submit repeated or plagiarized ideas, which wastes time, as
reviewing the same ideas again is not worthwhile.

INTER AND TRANS FIREWALL ANALYTICS

Inter-firewall analytics
 Focus: Analyzes traffic flows between different firewalls within a network.
 Methodology: Utilizes data collected from multiple firewalls to
identify anomalies and potential breaches.
 Benefits: Provides a comprehensive view of network traffic flow
and helps identify lateral movement across different security zones.
 Limitations: Requires deployment of multiple firewalls within the
network and efficient data exchange mechanisms between them.
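The methodology above, combining data from several firewalls to spot lateral movement, can be sketched as a simple log correlation. This is an illustration under stated assumptions, not a real firewall log format or detection product: the `records` tuples, the zone names, the threshold, and both helper functions are hypothetical.

```python
from collections import defaultdict

# Hypothetical, simplified log records exported by three firewalls:
# (firewall, source host, destination zone, allowed?)
records = [
    ("fw-dmz", "10.0.1.5", "dmz", True),
    ("fw-app", "10.0.1.5", "app", True),
    ("fw-db", "10.0.1.5", "db", True),
    ("fw-dmz", "10.0.1.9", "dmz", True),
    ("fw-app", "10.0.1.9", "app", False),
]

ZONE_THRESHOLD = 3  # assumed: reaching this many zones is worth a look


def zones_reached(rows):
    """Merge logs from all firewalls into one view per source host."""
    zones = defaultdict(set)
    for _firewall, source, zone, allowed in rows:
        if allowed:
            zones[source].add(zone)
    return zones


def lateral_movement_suspects(rows, threshold=ZONE_THRESHOLD):
    """Hosts whose allowed traffic spans at least `threshold` security zones."""
    return sorted(
        host for host, zones in zones_reached(rows).items() if len(zones) >= threshold
    )


print(lateral_movement_suspects(records))
```

No single firewall in this sketch sees anything unusual; the pattern only appears once the three logs are merged, which is the point of the inter-firewall view.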

Trans-firewall analytics
 Focus: Analyzes encrypted traffic that traverses firewalls, which
traditional security solutions may not be able to decrypt and inspect.
 Methodology: Uses deep packet inspection (DPI) and other
advanced techniques to analyze the content of encrypted traffic
without compromising its security.
 Benefits: Provides insight into previously hidden threats within
encrypted traffic and helps detect sophisticated attacks.
 Limitations: Requires specialized hardware and software solutions
for DPI, and raises concerns regarding potential data privacy
violations.
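
As a rough illustration of analyzing encrypted traffic without reading its content, the sketch below looks only at flow metadata. This is not deep packet inspection; it is a simpler stand-in technique. The flow records, the `bytes_sent` field, and the z-score threshold of 2.0 are assumptions made for the example. Flows whose size deviates strongly from the average are flagged for further review.

```python
from statistics import mean, pstdev

# Hypothetical summaries of encrypted sessions: (flow_id, bytes_sent).
# Only metadata is used; the payload is never decrypted or inspected.
FLOWS = [
    ("f1", 1200), ("f2", 1100), ("f3", 1300), ("f4", 1250),
    ("f5", 1150), ("f6", 1220), ("f7", 1180), ("f8", 98000),
]


def flag_unusual_flows(flows, threshold=2.0):
    """Return IDs of flows whose size is more than `threshold` standard deviations from the mean."""
    sizes = [size for _, size in flows]
    mu = mean(sizes)
    sigma = pstdev(sizes)
    if sigma == 0:
        # All flows have the same size, so nothing stands out.
        return []
    return [flow_id for flow_id, size in flows if abs(size - mu) / sigma > threshold]


if __name__ == "__main__":
    print(flag_unusual_flows(FLOWS))
```

A flagged flow is only a candidate for closer inspection, since large transfers are often legitimate.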
Choosing the right approach
The choice between inter-firewall and trans-firewall analytics depends on
several factors, including:
 Network size and complexity: Larger and more complex networks
benefit more from inter-firewall analytics for comprehensive monitoring.
 Security needs and threats: Trans-firewall analytics is crucial for
networks handling sensitive data and facing advanced threats.
 Budget and resources: Implementing trans-firewall analytics requires
additional investment in specialized hardware and software.
