
UNIT I

UNDERSTANDING BIG DATA


Introduction to big data – convergence of key trends –
unstructured data – industry examples of big data –
web analytics – big data applications– big data
technologies – introduction to Hadoop - open source
technologies – cloud and big data – mobile business
intelligence – Crowd sourcing analytics – inter and trans
firewall analytics.
Big Data
 The huge amount of complex, variously formatted
data, generated at high speed, that cannot be stored,
handled, or processed by traditional systems.
Source of Big Data
 Social Media : Facebook, WhatsApp, Twitter,
YouTube, Instagram, etc
 Sensors placed in various locations
 Customer Satisfaction Feedback
 IoT Appliances : Smart TV, smart washing machine,
smart coffee machine, smart AC, etc
 E-commerce & Transactional Data
 Machine Data : Satellites, desktop computers,
mobile phones, industrial machines, smart sensors,
SIEM logs, medical and wearable devices, road
cameras, IoT devices, and more
convergence of key trends
 More data and less expensive, faster hardware are
driving this transformation. Raw speed is now available
at an affordable price, and that cost/benefit shift has
been a real game changer.
 Cloud Computing: Cloud infrastructure allows
companies to store vast amounts of data and access
computing power on-demand. This scalability is
essential for processing big data.
 Artificial Intelligence (AI) and Machine Learning
(ML): These technologies are increasingly being
applied to big data for predictive analytics, pattern
recognition, and automation. AI/ML models can
analyze massive datasets quickly and uncover
insights that would be impractical for a human analyst
to uncover manually.
 Internet of Things (IoT): IoT devices generate real-
time data from sensors, connected devices, and other
sources. This constant stream of data feeds into big
data analytics, providing businesses with insights into
everything from consumer behavior to operational
efficiency.
 5G Networks: With the rise of 5G, the speed and
connectivity of networks have increased dramatically.
This enables faster data transmission and more
seamless real-time processing of big data from
connected devices.
 Edge Computing: With the rise of IoT, much of the
data is generated at the "edge" (in devices and
sensors). Edge computing enables data to be
processed closer to its source, reducing latency and
bandwidth usage, which is especially useful in real-
time applications like autonomous vehicles or
industrial automation.
 Blockchain: Blockchain technology has the potential
to add an extra layer of security and transparency to
big data. For example, it can ensure the integrity of
data being shared across multiple parties in sectors
like healthcare, supply chains, and finance, preventing
tampering and providing an immutable record of
transactions.
3 V’s
 Volume
 Variety
 Velocity
3 V’s
 Volume
 Volume refers to the sheer amount of data.
 In 2016, global mobile traffic was estimated at 6.2
exabytes (6.2 billion GB) per month, and the total
worldwide data volume was projected to reach roughly
40,000 exabytes by 2020.
 The units used to measure data volume are typically
"bytes," with larger units including kilobytes (KB),
megabytes (MB), gigabytes (GB), terabytes (TB),
petabytes (PB), and exabytes (EB), where each unit
represents a thousand times the previous one; for
example, 1 gigabyte is equal to 1,000 megabytes.
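The decimal scaling described above (each unit 1,000x the previous) can be sketched in a few lines of Python; the function name and the second example value are illustrative, while the 6.2 EB figure is the 2016 monthly mobile traffic number from this slide.

```python
def human_readable(num_bytes, factor=1000.0):
    """Format a byte count using decimal units, each 1,000x the previous."""
    for unit in ["bytes", "KB", "MB", "GB", "TB", "PB"]:
        if num_bytes < factor:
            return f"{num_bytes:g} {unit}"
        num_bytes /= factor
    return f"{num_bytes:g} EB"

print(human_readable(6.2e18))  # the 2016 monthly mobile traffic: "6.2 EB"
print(human_readable(1500))    # "1.5 KB"
```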
3 V’s
 Variety
 Variety refers to the nature of the data: structured,
semi-structured, and unstructured.
 It also refers to heterogeneous sources.
 Structured data: data organized in a fixed schema of
rows and columns. Ex. relational database tables
 Semi-structured data: partially organized data that
carries its own tags or markers. Ex. log files
 Unstructured data: It generally refers to data that
doesn’t fit neatly into the traditional row and column
structure of the relational database. Texts, pictures,
videos etc. are the examples of unstructured data which
can’t be stored in the form of rows and columns.
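The three shapes can be contrasted in a short Python sketch; the record fields, the log format, and the review text are all made up for illustration.

```python
import json
import re

# Structured: fixed schema, fits the rows and columns of a relational table
row = {"id": 1, "name": "Asha", "amount": 250.0}

# Semi-structured: self-describing but flexible (e.g. a log line)
log_line = "2024-05-01 12:00:03 INFO user=42 action=login"
m = re.match(r"(\S+ \S+) (\w+) user=(\d+) action=(\w+)", log_line)
event = {"time": m.group(1), "level": m.group(2),
         "user": int(m.group(3)), "action": m.group(4)}

# Unstructured: free text -- no row/column structure at all
review = "The delivery was late but the product quality is excellent."

print(json.dumps(event))
```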
Variety of Data
 ■ Internet data (e.g., clickstream, social media, social networking
links)
 ■ Primary research (e.g., surveys, experiments, observations)
 ■ Secondary research (e.g., competitive and marketplace data,
industry reports, consumer data, business data)
 ■ Location data (e.g., mobile device data, geospatial data)
 ■ Image data (e.g., video, satellite images, surveillance)
 ■ Supply chain data (e.g., EDI, vendor catalogs and pricing,
quality information)
 ■ Device data (e.g., sensors, PLCs, RF devices, LIMs, telemetry)
3 V’s
 Velocity
 Velocity refers to the high speed at which data
accumulates.
 In big data, data flows in continuously from sources
such as machines, networks, social media, and mobile
phones.
 This massive, continuous flow determines how fast data
is generated and must be processed to meet demand.
 Example: more than 3.5 billion searches are made on
Google every day, and Facebook's user base grows by
roughly 22% year over year.
6 V’s
 Volume
 Variety
 Velocity
 Veracity: the inconsistencies and uncertainty in data. Available
data can be messy, and its quality and accuracy are difficult to
control. Example: data in bulk can create confusion, whereas too
little data may convey only half or incomplete information.
 Variability: how often the structure, meaning, or shape of your
data changes.
 Value: data in bulk is of no use to a company unless it is turned
into something useful.
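Veracity in practice means cleaning: the hypothetical sensor records below contain a duplicate, a missing value, and an implausible reading, all of which are filtered out before analysis.

```python
raw = [
    {"id": 1, "temp_c": 21.5},
    {"id": 1, "temp_c": 21.5},   # exact duplicate record
    {"id": 2, "temp_c": None},   # missing reading
    {"id": 3, "temp_c": 999.0},  # sensor glitch: implausible value
    {"id": 4, "temp_c": 19.8},
]

seen, clean = set(), []
for rec in raw:
    if rec["id"] in seen:        # drop duplicates
        continue
    seen.add(rec["id"])
    t = rec["temp_c"]
    if t is None or not (-50.0 <= t <= 60.0):
        continue                 # drop missing or implausible readings
    clean.append(rec)

print(clean)  # only ids 1 and 4 survive
```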
Industry examples of big data
 Healthcare
 Education
 E-Commerce
 Media & Entertainment
 Finance
 Travel Industry
 Telecom
 Automobile
Industry examples of big data
 Healthcare:
 Predictive Analytics: Big data enables predictive healthcare,
where patterns in patient data (such as age, genetic information,
lifestyle, etc.) can predict the likelihood of developing certain
diseases, allowing for earlier intervention and better-targeted
treatments.
 Personalized Medicine: With genetic sequencing and patient
history, big data helps create personalized treatment plans. AI
and machine learning algorithms can analyze vast amounts of
data to predict how patients will respond to particular drugs or
therapies.
 Remote Monitoring: IoT devices, like wearables and sensors,
provide real-time health data, which can be processed and
analyzed to monitor patients remotely, reducing hospital visits
and improving care.
Industry examples of big data
 Finance:
 Fraud Detection: Banks and financial institutions use big data
to analyze transaction patterns and identify suspicious activity in
real-time. Machine learning algorithms help detect fraudulent
activities more accurately and swiftly.
 Risk Management: Financial institutions rely on big data
analytics to assess risks related to investments, loans, and
market conditions, creating more robust risk models.
 Customer Insights: Financial organizations use big data to
segment customers based on behaviors, preferences, and
financial needs, enabling better-targeted offerings and enhancing
customer satisfaction.
Industry examples of big data
 Retail & E-Commerce:
 Customer Behavior Analysis: Retailers use big data to track
and analyze customer interactions across different platforms.
Understanding patterns like browsing history, purchase
behaviors, and social media activity helps create personalized
marketing campaigns.
 Supply Chain Optimization: Big data aids in demand
forecasting, inventory management, and logistics, ensuring
products are delivered on time while minimizing excess stock
and wastage.
 Price Optimization: Retailers use big data to adjust prices
dynamically based on demand, competitor pricing, and other
external factors, maximizing revenue while keeping customers
satisfied.
Industry examples of big data
 Manufacturing:
 Predictive Maintenance: IoT sensors embedded in machinery
generate data that can be used to predict when a piece of
equipment will fail, reducing downtime and maintenance costs.
 Smart Factories: Big data helps create "smart factories" where
everything from inventory management to production schedules
can be optimized based on real-time data.
 Quality Control: Data from production lines can be analyzed to
detect defects, identify patterns, and improve overall product
quality.
Industry examples of big data
 Telecommunications:
 Network Optimization: Telecom companies use big data
analytics to monitor network traffic in real-time and predict traffic
spikes, enabling them to optimize bandwidth allocation and
improve service quality.
 Customer Churn Prediction: Big data helps predict when a
customer may leave for a competitor, allowing telecom
companies to take preemptive actions (like offering personalized
discounts or better plans).
 Fraud Detection: Similar to financial institutions, telecom
companies use big data to identify unusual patterns of usage,
which may indicate fraudulent activity.
Big Data Analytics
 Big Data Analytics: crunching massive amounts of
information to uncover hidden trends, patterns, and
relationships.
 It involves collecting, cleaning, and analyzing data.
 Types of Big Data Analytics:
 Descriptive Analytics: helps us understand past
events.
 Diagnostic Analytics: delves deeper to uncover the
reasons behind past events.
 Predictive Analytics: forecasts future events based on
past data. Example: weather forecasting.
Big Data Analytics
 Types of Big Data Analytics:
 Prescriptive Analytics: not only predicts results but also
recommends actions to achieve the best outcome. In e-commerce,
it may suggest the best price for a product to maximize profit.
 Real-time Analytics: processes data as it arrives. For example,
it allows traders to make decisions based on real-time market
events.
 Spatial Analytics: analyzes location data. In urban management,
it optimizes traffic flow using data from sensors and cameras to
minimize traffic jams.
 Text Analytics: delves into unstructured text data. A hotel
business can use guest reviews to improve services and guest
satisfaction.
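The descriptive, diagnostic, and predictive types can be contrasted on a toy dataset; the monthly sales figures and the naive forecast rule below are invented purely for illustration.

```python
# Hypothetical monthly sales figures
sales = [100, 110, 125, 130, 145]

# Descriptive: summarize what happened
average = sum(sales) / len(sales)                   # 122.0

# Diagnostic: dig into why -- e.g. month-over-month changes
deltas = [b - a for a, b in zip(sales, sales[1:])]  # [10, 15, 5, 15]

# Predictive: naive forecast -- extend the average change one step ahead
forecast = sales[-1] + sum(deltas) / len(deltas)    # 145 + 11.25 = 156.25

print(average, deltas, forecast)
```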
Web Analytics
 Web Analytics is the methodological study of
online/offline patterns and trends. It is a technique
employed to collect, measure, report, and analyze
website data, normally carried out to analyze the
performance of a website and optimize its usage.
 Web Analytics is used to assess the success of a
website and its associated business.
 Ref:
 https://www.tutorialspoint.com/web_analytics/
web_analytics_quick_guide.htm
 https://guidelines.india.gov.in/activity/unleashing-the-
potential-of-web-analytics-gaining-insights-enhancing-
performance-and-driving-success/
Web Analytics
 Four essential steps:
 1. Data collection: This initial phase involves aggregating fundamental
data, such as website traffic and page views, to provide a
foundational understanding of visitor engagement.
 2. Data interpretation: The collected data is then transformed into
meaningful metrics and comparisons, shedding light on user behavior,
engagement patterns, and conversion rates.
 3. KPI formulation: Key Performance Indicators (KPIs) are derived from
the acquired data and aligned with the organization’s overarching goals.
These metrics serve as benchmarks for measuring success and tracking
progress.
 4. Online strategy development: In this crucial stage, organizations
define their online objectives and standards based on insights derived
from web analytics. These strategies focus on generating revenue,
reducing costs, and driving growth.
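Steps 2 and 3 above amount to turning raw counts into metrics and benchmarking them against an organizational goal; every number below is hypothetical.

```python
# Hypothetical funnel counts for one reporting period
visits, signups, purchases = 10_000, 450, 90

# Data interpretation: raw counts become comparable metrics
signup_rate = signups / visits        # 0.045 -> 4.5%
conversion_rate = purchases / visits  # 0.009 -> 0.9%

# KPI formulation: benchmark the metric against an organizational goal
kpi_target = 0.01                     # goal: 1% purchase conversion
on_track = conversion_rate >= kpi_target

print(f"conversion {conversion_rate:.1%}, on track: {on_track}")
```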
Web Analytics
 Web Analytics Categories:
 Off-site web analytics: Off-site analysis involves measuring and
analyzing web data irrespective of website ownership. It encompasses
monitoring potential audience reach, tracking online conversations, and
identifying marketing trends. Off-site analysis aids in understanding
customer preferences, creating relevant content, and refining online
marketing strategies.
 On-site web analytics: On-site web analytics focuses on analyzing
visitor behavior within a specific website. It monitors user interactions,
conversion rates, and website performance metrics, enabling
organizations to optimize user experience and measure the
effectiveness of their digital initiatives. Prominent web analytics tools
such as Google Analytics and Adobe Analytics offer valuable insights
into visitor responses and facilitate data-driven decision-making.
Web Analytics
Key Metrics in On-site Web Analytics:
 Bounce rate: The percentage of visitors who exit the website after viewing only one page.
 Click Path: The sequence of pages viewed by a visitor during their session.
 Hits: The number of requests made to the web server, often inflated as each file is counted as
a hit.
 Pageviews: The count of pages requested or viewed; a single visit
typically generates several pageviews.
 Visitors/Unique visitors: The count of distinct individuals who visit
the site within a specified time period, each counted once.
 Visit/Session: A collection of pages or requests made by a particular client during a single
session.
 Time on Page/Session Duration: The average duration visitors spend interacting with
website content
 Average Page Depth/Average Pageviews per Session: The average number of pages
viewed per session, reflecting the overall visit depth.
 Average Visit Time: The average duration visitors spend on individual pages within a
website.
 Click: A user’s action of navigating from one page to another through a web link.
 Events: Actions or categories of actions performed on a website, such as page views, clicks,
or form submissions.
 Exit Rate/Exit Percentage: The percentage of visits where a particular page is the last
page viewed before leaving the site.
 First Visit: The initial visit of a unique customer who has not visited the site before.
 Frequency/Visit Interval: The number of times a visitor accesses the website within a
specified time period, measured by dividing the total number of sessions by the total
number of unique visitors.
 Impressions: The number of times a page element is displayed or viewed.
 New Visitors: Visitors who are accessing the site for the first time.
 Repeat Visitor: A visitor who has visited the site at least once before.
 Returning Visitors: Unique visitors who had previously visited the site and returned
during the reporting period.
 Duration of Visit: The average time spent by visitors on the site during a single visit.
 Single Page Visit: A visit consisting of only one page, not considered as a bounce.
 Site Overlay: An advertising technique that displays statistics or interactive elements
overlaid on a webpage snapshot.
 Clickthrough Rate: The ratio of users who clicked on a link to the total number of users
who viewed a page, email, or advertisement. It measures the effectiveness of online
advertising or email marketing campaigns.
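Several of the metrics above fall out of a simple session log; the four toy sessions and the click/impression counts below are invented for illustration.

```python
# Toy session log: each entry lists the pages viewed in one session
sessions = [["home"], ["home", "product", "cart"], ["blog"], ["home", "about"]]

total_sessions = len(sessions)
bounces = sum(1 for s in sessions if len(s) == 1)
bounce_rate = bounces / total_sessions          # 2 of 4 sessions -> 0.5

pageviews = sum(len(s) for s in sessions)       # 7 pageviews in total
avg_page_depth = pageviews / total_sessions     # 1.75 pages per session

# Clickthrough rate: clicks on a link divided by its impressions
clicks, impressions = 30, 1200
ctr = clicks / impressions                      # 0.025 -> 2.5%

print(bounce_rate, avg_page_depth, ctr)
```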
Web Analytics
key strategies for leveraging web analytics to achieve success
 Understanding User Behavior: Web analytics offers insights into user
interactions, preferences, and navigation patterns. By comprehending
user behavior, organizations can identify areas for improvement,
optimize user experience, and enhance website usability.
 Improving Conversion Rate: Analyzing conversion metrics helps
identify bottlenecks in the conversion process and enables data-driven
improvements. By monitoring and optimizing key metrics such as click-
through rates, bounce rates, and abandonment rates, organizations can
optimize landing pages, calls-to-action, and checkout processes to drive
higher conversion rates.
 Personalization and Targeting: Web analytics aids in understanding
the target audience by enabling segmentation based on demographics,
behavior, or interests. This segmentation allows organizations to deliver
personalized content and offers, thereby creating a more tailored and
engaging experience for users.
 Content Optimization: Data analysis in web analytics guides content
creation and optimization strategies. By examining popular pages, search
terms, and engagement metrics, organizations can create high-quality content
that resonates with their audience. Additionally, analyzing user engagement
metrics such as time on page and depth can provide insights into content
performance and areas for improvement.
 A/B Testing and Experimentation: Web analytics facilitates A/B testing and
experimentation by providing insights into user behavior and performance
metrics. By testing different elements of web pages, calls-to-action, or
campaigns, organizations can make data-driven decisions and optimize for
better results. Regular testing allows for continuous improvement of strategies
and enhances overall website performance.
 Monitoring Website Performance: Web analytics provides valuable
information on website speed, error rates, and performance issues. By
monitoring these metrics, organizations can identify and resolve issues that
may impact user experience and search engine rankings. Optimizing website
performance enhances user satisfaction, increases conversions, and improves
search engine visibility.
big data technologies
 Apache Cassandra: a No-SQL database that is highly scalable and
highly available, with support for replicating data across multiple
data centers. Fault tolerance is one of its big strengths: failed
nodes can be replaced without any downtime.
 Apache Hadoop: one of the most widely used big data technologies.
It handles large-scale data and large file systems through the Hadoop
Distributed File System (HDFS) and provides parallel processing
through the MapReduce framework. Hadoop scales out to handle very
large capacities and workloads. For example, NextBio uses Hadoop
MapReduce and HBase to process multi-terabyte data sets of the
human genome.
big data technologies
 Apache Hive: used for data summarization and ad hoc querying, i.e., for
querying and analyzing big data easily. It is built on top of Hadoop and
provides data summarization, ad hoc queries, and analysis of large datasets
using an SQL-like language called HiveQL. It is not a relational database and
is not meant for real-time queries. Features: designed for OLAP; SQL-type
language (HiveQL); fast, scalable, and extensible.
 Apache Flume: It is a distributed and reliable system that is used to collect,
aggregate, and move large amounts of log data from many data sources toward
a centralized data store.
 Apache Spark: its main objective is to speed up Hadoop-style
computations, largely by processing data in memory.
 Apache Kafka: It is a distributed publish-subscribe messaging system
 MongoDB: a cross-platform, document-oriented database built on the
concepts of collections and documents; data is stored in JSON-like form.
Features include high availability, replication, rich queries,
auto-sharding, and fast in-place updates.
big data technologies
 ElasticSearch: a real-time distributed, open-source full-text search
and analytics engine. It is highly scalable and can handle structured
and unstructured data up to petabytes. It serves as an enterprise
search engine for big organizations such as Wikipedia and GitHub.
Introduction to Hadoop
 Hadoop handles a variety of workloads, including search, log processing,
recommendation systems, data warehousing, and video/image analysis.
 Apache Hadoop is an open-source project administered by the Apache
Software Foundation. The software was originally developed by the
world’s largest Internet companies to capture and analyze the data
that they generate.
 Unlike traditional, structured platforms, Hadoop is able to store any kind of
data in its native format and to perform a wide variety of analyses and
transformations on that data.
 Hadoop stores terabytes, and even petabytes, of data inexpensively. It is
robust and reliable and handles hardware and system failures
automatically, without losing data or interrupting data analyses.
 Hadoop runs on clusters of commodity servers and each of those
servers has local CPUs and disk storage that can be leveraged by the
system.
Introduction to Hadoop
 Components of Hadoop:
 The Hadoop Distributed File System (HDFS). HDFS is the storage
system for a Hadoop cluster. When data lands in the cluster, HDFS
breaks it into pieces and distributes those pieces among the different
servers participating in the cluster. Each server stores just a small
fragment of the complete data set, and each piece of data is replicated
on more than one server.
 MapReduce. Because Hadoop stores the entire dataset in small pieces
across a collection of servers, analytical jobs can be distributed, in
parallel, to each of the servers storing part of the data. Each server
evaluates the question against its local fragment simultaneously and
reports its results back for collation into a comprehensive answer.
MapReduce is the agent that distributes the work and collects the
results.
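The map/distribute/collate flow described above can be mimicked with the classic word-count example; the two text fragments stand in for data fragments stored on two different servers, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain

def map_fragment(text):
    # Map phase: each server emits (word, 1) pairs for its local fragment
    return [(word, 1) for word in text.lower().split()]

def reduce_pairs(pairs):
    # Shuffle + reduce: group pairs by word and sum -- the collation step
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

fragments = ["big data big insight", "data data everywhere"]
mapped = chain.from_iterable(map_fragment(f) for f in fragments)
print(reduce_pairs(mapped))
# {'big': 2, 'data': 3, 'insight': 1, 'everywhere': 1}
```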
Introduction to Hadoop
 Fault Tolerance and Performance of HDFS & MapReduce
 Both HDFS and MapReduce are designed to continue to work in the face of system
failures.
 HDFS continually monitors the data stored on the cluster. If a server becomes
unavailable, a disk drive fails, or data is damaged, whether due to
hardware or software problems, HDFS automatically restores the data from one of
the known good replicas stored elsewhere on the cluster.
 Likewise, when an analysis job is running, MapReduce monitors progress of each
of the servers participating in the job. If one of them is slow in returning an
answer or fails before completing its work, MapReduce automatically starts
another instance of that task on another server that has a copy of the data.
Because of the way that HDFS and MapReduce work, Hadoop provides scalable,
reliable, and fault-tolerant services for data storage and analysis at very low
cost.
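The block-splitting and replication idea behind HDFS can be modeled in a few lines. This is a toy round-robin placement, not the real HDFS policy (which is rack-aware); the server names and block size are invented.

```python
def place_blocks(data, block_size, servers, replicas=3):
    """Split data into fixed-size blocks and assign each block to
    `replicas` distinct servers, round-robin -- a toy stand-in for HDFS."""
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [servers[(b + r) % len(servers)] for r in range(replicas)]
    return blocks, placement

servers = ["node1", "node2", "node3", "node4"]
blocks, placement = place_blocks("abcdefghij", block_size=4, servers=servers)
print(blocks)     # ['abcd', 'efgh', 'ij']
print(placement)  # each block lives on 3 different nodes
```

If one node fails, every block it held still has two other replicas, which is what lets HDFS restore data from a known good copy.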
 Features of Hadoop :
 https://www.geeksforgeeks.org/hadoop-features-of-hadoop-which-makes-it-popular/
Open Source Technologies
 Open-source software is computer software whose source code is
available under an open-source license that permits users to study,
change, and improve it, and often also to redistribute it.
 The open-source name came out of a 1998 meeting in Palo Alto in
reaction to Netscape’s announcement of a source code release for
Navigator (as Mozilla).
 Advantages of open source: flexibility, extensibility, and lower cost.
 Most proprietary vendors have been designing their solutions to plug
and play with technology such as Hadoop. For example, Teradata Aster
designed SQL-H, which is a seamless way to execute SQL and SQL-
MapReduce on Apache Hadoop data.
Cloud and Big Data
 Cloud :
Organizations can store, efficiently manage, and analyze their big
data by leveraging the scalability of on-demand cloud resources such
as storage capacity.
 Benefits of Cloud
 Scalability : obtain large amounts of storage when needed, without
having to buy any hardware infrastructure in advance; easy to scale up
and down.
 Cost Effectiveness : saves costs since organizations pay only for what
they use, unlike maintaining on-site infrastructure.
 Performance : high-performance computing resources, such as servers with
advanced networking and in-memory capabilities, enable faster data
processing and real-time analytics.
 Accessibility : cloud-based solutions are accessible from anywhere with
an internet connection, so geographical location never prevents a business
from getting value out of its information stores. This also encourages
teamwork among members who are far apart geographically.
 Security : cloud providers invest heavily in security measures such as
encryption, access control, and data-residency options for compliance, so
sensitive data is well guarded against unauthorized access, modification,
or loss.
 Cloud Services for Big Data Analytics
 Data Ingestion
 Managed data pipelines: These services automate the collection,
transformation, and loading of data from different sources into cloud
storage (e.g., Apache Airflow or AWS Glue).
 Streaming ingestion: Real-time ingestion can be achieved using services
like Apache Kafka, which allow integration with sources such as social
media feeds.
 Data Storage
 Object storage: The best option for storing vast quantities of unstructured and semi-structured
data are highly scalable and cost-effective object storage options such as Amazon S3,
Azure Blob Storage, Google Cloud Storage among others.
 Data lakes: A cloud data lake serves as a centralized storage system
that saves all data in its original format, giving users the opportunity
to examine and analyze it later. Its flexibility saves time.
 Data warehouses: Large datasets need structured schemas for storage and
analysis; this is exactly what a cloud data warehouse provides, making
querying and reporting easier and faster.
 Data Processing and Transformation
 Managed Hadoop and Spark environments: Complex infrastructure setup can be
avoided by using pre-configured managed Hadoop clusters or Spark clusters provided
by various cloud services.
 Serverless data processing: With serverless compute services like
AWS Lambda or Azure Functions, you can run data processing tasks without
managing servers, which simplifies development and scaling.
 Data anonymization and masking: Cloud platforms provide tools and services to
comply with privacy regulations by anonymizing or masking confidential datasets.
 Data Analytics and Visualization:
 Business intelligence (BI) tools: Some cloud-based BI applications like Tableau,
Power BI, Looker etc. provide interactive dashboards and reports for visual big data
analysis.
 Managed machine learning (ML) platforms such as Google Cloud AI Platform,
Amazon SageMaker, Azure Machine Learning etc., allow ML models development,
testing, and deployment on massive datasets.
 Predictive analytics and data mining: Cloud platforms are equipped with built-in
facilities both for predictive analytics and data mining that can help you find patterns
or trends in your data to assist you in future forecasting or better decision making.
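The anonymization/masking service mentioned above can be approximated with one-way hashing. This is only a sketch: the function name and salt are invented, and a real deployment would use a managed masking service and per-record salts.

```python
import hashlib

def mask_email(email, salt="demo-salt"):
    # One-way pseudonymization sketch: hash the local part, keep the domain
    local, domain = email.split("@", 1)
    digest = hashlib.sha256((salt + local).encode()).hexdigest()[:10]
    return f"{digest}@{domain}"

masked = mask_email("alice@example.com")
print(masked)  # same input + salt always maps to the same pseudonym
```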
Cloud and Big Data
 Security Considerations for Cloud-Based Big Data Analytics
 Data encryption: Ensure all stored data is encrypted; this helps
safeguard against unauthorized access, especially during transmission
over unsecured networks.
 Access control: Make sure that only authorized personnel are granted
access rights, individually or by role, to a particular dataset.
 Compliance regulations: Confirm that the cloud provider fully complies
with the industry standards and data-protection regulations relevant to
your data, especially in sectors such as healthcare, where information
must remain confidential throughout its lifecycle.
 Regular security audits: Conduct comprehensive security audits of the
cloud environment regularly to identify potential vulnerabilities and
address them before malicious actors can exploit them, avoiding
reputational or financial damage.
 Data Backup and Restoration: Keep a comprehensive backup and
restoration plan so that data can be retrieved if a security breach
occurs.
Cloud and Big Data
 Applications
 Retail Industry: The Power of Personalization: Retailers use these tools to
process immense volumes of customer information, such as purchase history,
browsing habits and social media sentiment.
 Customize marketing campaigns: Higher conversion rates and increased customer
satisfaction are achieved through targeted email blasts and social media ads that
cater for individual preferences.
 Optimize product recommendations: Recommender systems driven by big data
analytics propose products customers are likely to find interesting thereby increasing
sales and reducing cart abandonment rates.
 Enhance inventory management: Retailers can optimize their inventory levels by
scrutinizing sales trends alongside customer demand patterns which eliminates
stockouts while minimizing clearance sales.
 Healthcare: From Diagnosis to Personalized Care
 Improved diagnosis
 Individual treatment plans based on response to certain drugs or therapies
 Predictive, preventive care that identifies people at high risk of
particular illnesses before those illnesses occur, leading to better
outcomes for patients and lower healthcare expenses.
Mobile BI
 Mobile BI refers to the access and use of business information via mobile devices.
 With the increasing use of mobile devices for business – not only in
management positions – mobile BI is able to bring business intelligence
and analytics closer to the user when done properly.
 Benefits of mobile BI
 access information in their mobile BI system at any time and from any
location
 improves their daily operations
 react more quickly to a wider range of events.
 speeds up the decision-making process by extending information and
reducing the time spent searching for relevant information
 with real-time access to data, operational efficiency is improved
and organizational collaboration is strengthened.
 greater availability of information, faster reaction times, and more
efficient working, as well as improved internal
communication and shorter workflows.
Mobile BI
 various ways to implement content on mobile devices
 Provision of PDF reports to a mobile device
 HTML5 site
 Connection of a native application with HTML5 (hybrid application)
 Native application
 Best mobile BI tools
 Microsoft Power BI Pro
 IBM Cognos Analytics
 SAP Roambi Analytics
 Sisense
 Amazon QuickSight
Mobile BI
 Disadvantages of mobile BI
 Stack of data: the corporation needs only a small portion of the historical
data, but must store all of it, which piles up over time
 Expensive: large corporations can afford the costly services, but small
businesses often cannot
 Time consuming: implementing BI in an enterprise takes time, and companies
are often not patient enough to wait before acting on the data
 Data breach: the biggest user concern when providing data to mobile BI is
data leakage. If you handle sensitive data through mobile BI, a single error
can expose or destroy your data, which can be detrimental to your business
 Poor quality data analysis: a large portion of the data analysed by mobile BI
is irrelevant or completely useless, which can slow down the entire procedure.
You therefore need to select the data that is important and may be required in
the future.
Mobile BI
 Examples
 A retail store manager uses mobile BI to check current stock levels, analyze
historical sales trends, and make data-driven decisions on reordering
products, all while walking the store floor.
 A sales manager can use a mobile BI app to monitor sales performance, track
individual team member goals, and compare regional sales against targets.
This helps in quickly identifying trends and areas needing attention during
client meetings.
 A customer service manager uses a mobile BI dashboard to view live
customer service metrics, including response times, resolution rates, and
customer feedback, allowing them to address issues in real time while visiting
clients or attending events.
 A healthcare administrator uses mobile BI to monitor patient wait times,
treatment success rates, and hospital resource utilization.
 A personal finance app user uses mobile BI to view all their bank accounts,
credit card balances, and expenses in one place.
 A fitness app user uses mobile BI to monitor daily steps, calories burned,
heart rate, and exercise routines.
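The customer-service dashboard example above boils down to computing a few KPIs from raw records. A minimal sketch — the ticket fields and values are invented for illustration:

```python
from statistics import mean

def service_metrics(tickets):
    """Compute dashboard KPIs (avg response time, resolution rate) from tickets."""
    # Ignore tickets not yet responded to when averaging response time
    response_times = [t["responded_min"] for t in tickets
                      if t["responded_min"] is not None]
    resolved = sum(1 for t in tickets if t["resolved"])
    return {
        "avg_response_min": round(mean(response_times), 1),
        "resolution_rate": round(resolved / len(tickets), 2),
    }

# Invented ticket records
tickets = [
    {"responded_min": 5, "resolved": True},
    {"responded_min": 12, "resolved": False},
    {"responded_min": 7, "resolved": True},
    {"responded_min": None, "resolved": False},  # still awaiting first response
]
print(service_metrics(tickets))
```

A real mobile BI tool streams such metrics continuously to the dashboard; the computation behind each tile is typically this simple.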
Crowd sourcing Analytics
 Crowdsourcing is a sourcing model in which an individual or an
organization obtains support from a large, relatively open, and rapidly
evolving group of people in the form of ideas, micro-tasks, finances, etc.
Crowdsourcing typically uses the internet to attract a large group of
people to divide tasks or to achieve a target.
 Crowdsourcing is touching almost all sectors:
 Enterprise
 IT
 Marketing
 Education
 Finance
 Science and Health
 Example: For scientific problem solving, a broadcast search is used
where an organization mobilizes a crowd to come up with a solution to a
problem.
Crowd sourcing Analytics
 Advantages Of Crowdsourcing
 Evolving Innovation
 Save costs
 Increased Efficiency
 Disadvantages Of Crowdsourcing
 Lack of confidentiality
 Repeated ideas
Crowd sourcing Analytics
 Crowdsourcing analytics refers to the process of using collective
intelligence or the collective efforts of a large group of individuals, typically
from various backgrounds and expertise, to gather, analyze, and
interpret data. This approach harnesses the power of the crowd to solve
problems, generate insights, and provide solutions to complex data-
related tasks. It has become an important tool in various fields such as
business intelligence, research, and social sciences.
 Steps Involved:
 Data Collection
 Data Labeling/Annotation: labeling images, categorizing text, or identifying
objects in videos.
 Crowdsourced Data Validation: verifying its accuracy or checking for
inconsistencies, ensuring that the dataset is clean and reliable for further
analysis
 Data Analysis: analyze data, generate insights, or uncover patterns that may not
be immediately obvious to automated algorithms or small teams
 Crowdsourced Decision-Making: crowd can provide feedback or make decisions
based on data analysis results. For example, product design or customer
preferences may be evaluated through crowdsourcing analytics.
Crowd sourcing Analytics
 General process of crowdsourcing analytics
 Problem Definition: A specific problem or task is defined that requires data
analysis.
 Task Breakdown: The task is broken down into smaller, manageable units that
can be assigned to individuals in the crowd. These tasks are designed in a way
that non-experts can contribute effectively, but there might also be more
advanced tasks for experts within the crowd.
 Crowd Engagement: Crowdsourcing platforms (such as Amazon Mechanical
Turk, Zooniverse, or Kaggle) are used to distribute the tasks to a large pool of
contributors. Workers perform the tasks by providing input or analyzing the data.
Incentives (usually monetary) are often used to encourage participation.
 Data Aggregation: Once the tasks are completed, the results are collected and
aggregated. If multiple individuals worked on similar tasks, there might be a
process for reconciling conflicting or diverse inputs.
 Analysis and Interpretation: The aggregated data is analyzed to extract
insights, trends, or patterns. Depending on the complexity of the data, machine
learning algorithms or human analysts might be used to refine the results.
 Insight Generation: Based on the analysis, insights are drawn that can inform
business decisions, product development, research, or other relevant outcomes.
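The Data Aggregation step above — reconciling conflicting inputs when multiple contributors work on the same task — is often done with a simple majority vote. A minimal sketch, with invented image-labeling responses:

```python
from collections import Counter

def aggregate_labels(responses):
    """Reconcile conflicting crowd labels per item by majority vote,
    and record the level of agreement as a quality signal."""
    result = {}
    for item, labels in responses.items():
        winner, votes = Counter(labels).most_common(1)[0]
        result[item] = {
            "label": winner,
            "agreement": round(votes / len(labels), 2),
        }
    return result

# Invented crowd responses: three workers labeled each image
responses = {
    "img_001": ["cat", "cat", "dog"],
    "img_002": ["dog", "dog", "dog"],
}
print(aggregate_labels(responses))
```

Low agreement scores flag items worth routing to more workers or to an expert — a common quality-control tactic on platforms like Amazon Mechanical Turk.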
Crowd sourcing Analytics
 Key Benefits of Crowdsourcing Analytics
 Scalability: Crowdsourcing allows organizations to scale their data collection
or analysis efforts quickly and efficiently. By leveraging a large crowd, the
process becomes faster compared to using a small team.
 Cost-Effective: It can be more cost-effective compared to hiring a dedicated
in-house team of data scientists or analysts, especially for one-off or less
complex tasks.
 Diverse Perspectives: By involving people from different backgrounds and
regions, crowdsourcing analytics can bring in diverse perspectives, which can
lead to more innovative solutions and insights.
 Flexibility: Tasks can be adjusted to fit the needs of the organization,
whether it's about gathering data, validating existing information, or
analyzing a large dataset.
 Access to Expertise: Crowdsourcing can provide access to individuals with
specialized expertise who might not be easily available within the
organization.
Crowd sourcing Analytics
 Applications of Crowdsourcing Analytics
 Market Research: Companies use crowdsourcing analytics to gain insights
into customer behavior, preferences, and opinions. Through platforms like
surveys or focus groups involving the crowd, businesses can gather large
volumes of data.
 Social Media Analysis: Social media platforms can use crowdsourcing to
analyze sentiment, track trends, or categorize content.
 Scientific Research: Crowdsourcing analytics is increasingly used in
research fields like astronomy, biology, and environmental science, where
crowds can contribute to analyzing data, running simulations, or classifying
images from telescopes and cameras.
 Image and Video Recognition: In areas like machine learning, humans can
help classify and tag images or video clips, which are then used to train
algorithms.
 Predictive Analytics: Crowdsourcing can be used for predictive tasks, such
as forecasting trends, product sales, or political outcomes, where a large
number of people contribute to predictions based on data.
Crowd sourcing Analytics
 Challenges of Crowdsourcing Analytics
 Quality Control
 Crowd Reliability
 Bias and Diversity Issues
 Data Privacy and Security
 Task Complexity
 Examples
 Google Maps: Google uses crowdsourcing to improve its map data, allowing
users to contribute information about traffic, road closures, and other map
updates.
 Kaggle Competitions: Kaggle hosts data science competitions where
individuals from around the world participate to solve complex problems using
crowdsourced analysis.
Crowd sourcing Analytics
 There are various types of crowdsourcing, such as
 crowd voting
 crowd purchasing
 wisdom of crowds
 crowd funding and contests.
 Take for example:
 99designs.com/ , which does crowdsourcing of graphic design
 agentanything.com/ , which posts “missions” where agents vie to run
errands
 33needs.com/ , which allows people to contribute to charitable programs that
make a social impact
Inter and trans firewall analytics
 Firewall analytics: analyzing network traffic and identifying threats
 A part of data analytics, used in network security
Inter and trans firewall analytics
 Inter-Firewall Analytics
 Inter-firewall analytics refers to the analysis of data traffic, security
logs, and communication flows within multiple firewalls inside an
organization. It focuses on monitoring, optimizing, and securing
internal network activity.
 Operates within controlled environments (e.g., private networks, VPNs,
data centers).
 Focuses on internal security by monitoring data movement between
multiple firewalls inside the organization.
 Ensures compliance and governance within the internal network.
 Uses AI-driven threat detection for internal security breaches.
 Example:
 Detecting Insider Threats: Identifying suspicious activity from employees or
compromised internal devices.
 Network Performance Optimization: Analyzing traffic between data centers to
optimize network speed.
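Insider-threat detection like the example above can be sketched as a simple rule over internal firewall logs: flag hosts moving unusually large data volumes after hours. The hosts, hours, and volume threshold here are invented for illustration; real systems use AI-driven baselines per host:

```python
def flag_insider_risks(log_entries, offhours=(22, 6), mb_threshold=500):
    """Flag internal source hosts whose after-hours outbound volume
    exceeds a fixed threshold (in MB)."""
    totals = {}
    for entry in log_entries:
        hour = entry["hour"]
        # After hours = between 22:00 and 06:00 in this sketch
        if hour >= offhours[0] or hour < offhours[1]:
            totals[entry["src"]] = totals.get(entry["src"], 0) + entry["mb"]
    return sorted(h for h, mb in totals.items() if mb > mb_threshold)

# Invented firewall log entries (source host, hour of day, MB transferred)
logs = [
    {"src": "10.0.0.5", "hour": 23, "mb": 400},
    {"src": "10.0.0.5", "hour": 2,  "mb": 300},
    {"src": "10.0.0.9", "hour": 14, "mb": 900},  # business hours: not counted
]
print(flag_insider_risks(logs))
```

In practice the threshold would be a learned per-host baseline rather than a fixed constant, but the log-scan-and-flag structure is the same.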
Inter and trans firewall analytics
 Trans-Firewall Analytics
 Trans-firewall analytics refers to the analysis of data flows and security
risks as information moves across external firewalls (e.g., between an
organization’s internal network and the internet or third-party cloud services).
 Operates beyond the organization's firewall (e.g., internet, external
APIs, cloud services).
 Focuses on securing external interactions, ensuring data privacy,
cybersecurity, and compliance.
 Deals with unstructured, noisy, and high-risk data sources.
 Uses advanced analytics (AI, machine learning, behavioral analysis) to
detect potential cyber threats.
 Example:
 Preventing External Cyber Attacks: Identifying and blocking potential threats
such as DDoS attacks, malware, or phishing attempts.
 Real-Time Data Traffic Monitoring: Tracking incoming and outgoing traffic for
suspicious patterns.
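Rate-based flood detection, as in the DDoS example above, can be sketched by counting requests per external source IP per time window and flagging those that exceed a limit. The IPs, window size, and threshold are invented for illustration:

```python
from collections import Counter

def detect_flooding(requests, window_s=10, max_per_window=100):
    """Flag source IPs whose request count in any time window exceeds the limit.
    `requests` is a list of (ip, timestamp_seconds) pairs."""
    counts = Counter()
    for ip, ts in requests:
        counts[(ip, ts // window_s)] += 1  # bucket by time window
    return sorted({ip for (ip, _), n in counts.items() if n > max_per_window})

# Synthetic traffic: one external IP sends 150 requests inside a single
# 10-second window, alongside a normal client
requests = [("203.0.113.7", 3)] * 150 + [("198.51.100.2", t) for t in range(20)]
print(detect_flooding(requests))
```

Production trans-firewall analytics layer behavioral models and reputation feeds on top of such rate rules, since attackers distribute traffic across many source IPs.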
Inter and trans firewall analytics
 Technologies Used in Firewall Analytics
 SIEM (Security Information & Event Management) → Splunk, IBM
QRadar, ArcSight
 Next-Gen Firewalls (NGFW) → Palo Alto Networks, Fortinet, Cisco
Firepower
 Cloud Security Analytics → AWS Security Hub, Azure Sentinel, Google
Chronicle
 Behavioral Analysis → AI-based anomaly detection to track unusual
patterns
References
 https://www.geeksforgeeks.org
 https://barc.com/mobile-bi/
 https://www.analyticssteps.com/blogs/what-mobile-business-intelligence