
Cloud Computing and Big Data Analytics

Unit 3
Important Questions

1.What is Big Data? Explain the characteristics of Big Data


Big Data is a collection of data that is huge in volume and keeps growing exponentially with time. It is data of such large size and complexity that no traditional data management tool can store or process it efficiently. In short, Big Data is simply data, but of enormous size.

Characteristics of Big Data
Big Data can be described by the following characteristics:
• Volume
• Variety
• Velocity
• Variability
(i) Volume – The name Big Data itself refers to a size that is enormous. The size of data plays a very crucial role in determining the value that can be derived from it, and whether a particular dataset can actually be considered Big Data depends on its volume. Hence, 'Volume' is one characteristic that must be considered when dealing with Big Data solutions.
(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.
(iii) Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
2.Explain Big Data framework?
The Big Data Framework
The core objective of the Big Data Framework is to provide a structure for enterprise organizations that aim to benefit from the potential of Big Data.

The Big Data Framework was developed because – although the benefits and business cases of Big Data are apparent – many organizations struggle to embed a successful Big Data practice in their organization. The Framework provides an approach that takes into account all the organizational capabilities of a successful Big Data practice.
The main benefits of applying a Big Data framework include:
⦁ The Big Data Framework provides a structure for organizations that want to start with
Big Data or aim to develop their Big Data capabilities further.

⦁ The Big Data Framework includes all organizational aspects that should be taken into
account in a Big Data organization.
⦁ The Big Data Framework is vendor independent. It can be applied to any organization
regardless of choice of technology, specialization or tools.
⦁ The Big Data Framework provides a common reference model that can be used across
departmental functions or country boundaries.
⦁ The Big Data Framework identifies core and measurable capabilities in each of its six
domains so that the organization can develop over time.
3.Explain the challenges and trends in big data?

Challenges of Big Data


Lack of proper understanding of Big Data
Big Data pipelines start with tools that provide some kind of input data to the system, such as social networks and web mining algorithms, and must then connect that data acquisition with data pre- and post-processing (analysis) and storage, in both the historical and real-time layers. Many organizations struggle because they do not properly understand how these pieces fit together.
Data growth issues
Data is growing at an enormous rate, so data cleaning usually takes several steps, such as boilerplate removal (e.g., removing HTML headers in web-mining acquisition), language detection and named-entity recognition (for textual resources), and adding extra metadata such as timestamps and provenance information (yet another overlap with data curation).
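As a rough illustration of these cleaning steps, the sketch below strips HTML boilerplate from a scraped record and attaches timestamp and provenance metadata. The sample record and field names are hypothetical, and a real pipeline would use a proper HTML parser and language detector rather than regular expressions.

```python
import re
from datetime import datetime, timezone

# Hypothetical raw record from a web-mining acquisition step.
raw_html = (
    "<html><head><title>Ignore me</title></head>"
    "<body>Acme launches new product line.</body></html>"
)

# Boilerplate removal: keep only the body text and drop remaining tags (crude, for illustration).
body = re.search(r"<body>(.*?)</body>", raw_html, re.S)
text = re.sub(r"<[^>]+>", "", body.group(1)).strip() if body else ""

# Add extra metadata: timestamp and provenance information.
cleaned_record = {
    "text": text,
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "provenance": "web-mining crawler v1 (hypothetical source)",
}
print(cleaned_record)
```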
Lack of data professionals
The acquisition of media (pictures, video) is a significant challenge, and it is an even bigger challenge to analyze and store video and images; organizations therefore need skilled data professionals to analyze such data.

Integrating data from a variety of sources

Data variety requires processing the semantics in the data in order to correctly and effectively merge data from different sources during processing. Work on semantic event processing, such as semantic approximation (Hasan and Curry 2014a), thematic event processing (Hasan and Curry 2014b), and thingsonomy tagging (Hasan and Curry 2015), represents emerging approaches in this area.

Confusion while Big Data tool selection

The main goal when defining a correct data acquisition strategy is to understand the needs of the system in terms of data volume, variety, and velocity, and to decide which tool best ensures the required acquisition and throughput.

Securing data

Acquired data must be analyzed and organized, and securing that data throughout this process is a main concern in big data.

Latest Trends in Big Data Analytics


1. Data as a Service
Traditionally, data has been stored in data stores developed to be accessed by particular applications. When SaaS (Software as a Service) became popular, DaaS (Data as a Service) was only just beginning. Like Software-as-a-Service applications, Data as a Service uses cloud technology to give users and applications on-demand access to information, regardless of where the users or applications are located.
Data as a Service is one of the current trends in big data analytics; it makes it simpler for analysts to obtain data for business review tasks and easier for areas throughout a business or industry to share data.
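As a minimal sketch of how an application might consume such a Data-as-a-Service offering over HTTP, assuming the `requests` library is available; the endpoint URL, API key, and query parameters below are hypothetical placeholders, not a real provider's API.

```python
import requests

# Hypothetical DaaS endpoint and credentials.
DAAS_URL = "https://daas.example.com/v1/datasets/sales"
API_KEY = "replace-with-your-key"

# On-demand access to the data, regardless of where the caller or the data lives.
response = requests.get(
    DAAS_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    params={"region": "EU", "period": "2024-Q1"},  # hypothetical filters
    timeout=30,
)
response.raise_for_status()

for record in response.json():  # assume the service returns a JSON list of records
    print(record)
```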
2. Accessible Artificial Intelligence
Machine learning, one of the emerging trends in Big Data analytics, applies algorithms that parse data, learn from it, and then make predictions, often using neural networks. AI is used to expose patterns in the data that the technology can understand and act on. Artificial intelligence has now become accessible enough to help both large and small organizations enhance their business methods.
3. Predictive Analytics
Big data analytics has always been a fundamental approach for companies to gain a competitive edge and accomplish their aims, which makes predictive analytics one of the recent trends in big data analytics. Companies apply basic analytics tools to prepare big data and discover the causes of why specific issues arise. Predictive methods are then applied to examine current data and historical events in order to understand customers and recognize possible hazards and events for a corporation. Predictive analysis in big data can predict what may occur in the future.
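A minimal sketch of the idea: fit a simple model to historical figures and forecast the next period. scikit-learn and NumPy are assumed dependencies, and the monthly sales numbers are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up historical data: month index vs. monthly sales (in thousands).
months = np.array([[1], [2], [3], [4], [5], [6]])
sales = np.array([110, 118, 131, 140, 152, 165])

# Fit a simple model on past events...
model = LinearRegression().fit(months, sales)

# ...and predict what may occur in the next period.
next_month = np.array([[7]])
print(f"Forecast for month 7: {model.predict(next_month)[0]:.1f}k")
```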
4. Quantum Computing
With current technology, processing a huge amount of data can take a lot of time. Quantum computers, the latest technology in big data analytics, calculate the probability of an object's state or of an event before it is measured, which means they can process far more data than classical computers. If billions of data items could be crunched at once in only a few minutes, processing time would be reduced immensely, giving organizations the possibility to make timely decisions and attain the desired outcomes.
5. Edge Computing

Edge computing is about moving some processing to a local system, such as a user's device, an IoT device, or a nearby server. It brings computation to the edge of the network and reduces the amount of long-distance communication that has to happen between a client and a server, which makes it one of the latest trends in big data analytics. Edge computing also gives a boost to data streaming, including real-time streaming and processing with minimal latency.
6. Natural Language Processing
Natural Language Processing (NLP) lies within artificial intelligence and works to enable communication between computers and humans.
The objective of NLP is to read and decode the meaning of human language. Natural language processing is mostly based on machine learning and is used to build word-processing applications or translation software. NLP techniques need algorithms to recognize and extract the required data from each sentence by applying grammar rules.
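A toy sketch of rule-based extraction using only the Python standard library. The "grammar rules" here (capitalized word pairs as candidate names, digits as quantities) are deliberately crude assumptions for illustration, not how production NLP libraries work.

```python
import re

sentence = "Alice Johnson ordered 3 laptops from Acme Corp on Monday."

# Crude rules: capitalized word pairs as candidate names, digits as quantities.
names = re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", sentence)
quantities = re.findall(r"\b\d+\b", sentence)

print("Candidate entities:", names)   # ['Alice Johnson', 'Acme Corp']
print("Quantities:", quantities)      # ['3']
```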

7. Hybrid Clouds
A hybrid cloud is a cloud computing environment that uses an on-premises private cloud and a third-party public cloud, with orchestration between the two platforms. A hybrid cloud provides excellent flexibility and more data deployment options by moving workloads between private and public clouds. To gain this adaptability alongside the desired public cloud, an organization must have a private cloud. For that, it has to build a data center, including servers, storage, LAN, and a load balancer, deploy a virtualization layer/hypervisor to support the VMs and containers, and install a private cloud software layer.

8. Dark Data
Dark data is data that a company does not use in any analytical system. It is gathered from several network operations but is not used to derive insights or make predictions. Organizations might think it is not useful data because they get no outcome from it, yet it may prove to be highly valuable. As data grows day by day, the industry should also understand that any unexplored data can be a security risk for the organization.
4.Explain Big Data Strategies?
Big Data Strategy

A well-defined enterprise Big Data strategy should be actionable for the organizations. In order
to achieve this, organizations can follow the following 5-step approach to formulate their Big
Data strategy:
⦁ Define business objectives

⦁ Execute a current state assessment


⦁ Identify and prioritize Use Cases
⦁ Formulate a Big Data Roadmap
⦁ Embed through Change Management
Each of the steps to formulate a Big Data strategy is explained in further detail below:
Step 1: Define business objectives
In order to leverage Big Data in any organization, it is first necessary to fully understand the
corporate business objectives of the enterprise. What makes an organization successful?
Revenues and profits are often the result of meeting or exceeding business Key Performance
Indicators (KPIs). Start with understanding how an organization is successful, before exploring
how Big Data technologies and solutions might enhance future performance.
Step 2: Execute a current state assessment
In this step, the primary focus is to assess the current business processes, data sources, data assets, technology assets, capabilities, and policies of the enterprise. The purpose of this exercise is to support a gap analysis between the existing state and the desired future state.

In this stage, it is also important to identify and nurture some data evangelists. These people
truly believe in the power of data in making decisions and may already be using the data and
analytics in a powerful way. By involving these people, asking for their input, it becomes easier
to formulate the roadmap in a later stage.

Step 3: Identify and prioritize Use Cases


In step 3, envision how predictive analytics, prescriptive analytics and ultimately cognitive
analytics (further discussed in chapter 8) can help the organization to accelerate, optimize and
continuously learn, by developing Use Cases that align with the business objectives from step 1.

Step 4: Formulate a Big Data Roadmap


The next step is probably the most intense and contentious phase and will, without a doubt, account for the majority of the time spent formulating the data strategy. Based on the current state assessment (step 2) and the identified and prioritized Big Data Use Cases (step 3), the roadmap can be developed. The Big Data Roadmap outlines which projects (or Use Cases) will be executed first and which capabilities (knowledge, tools and data) will be built up over the next 3-5 years.
Step 5: Embed through Change Management
Although technically not a part of the Big Data Strategy formulation, Change Management
(involving the hearts and minds of people) will have a profound impact on the success or failure
of a Big Data strategy. Change management should encompass organizational change, cultural
change, technological change, and changes in business processes.
5.Explain the types of data sources in big data?

Data are key ingredients for any analytical exercise. Hence, it is important to thoroughly
consider and list all data sources that are of potential interest before starting the analysis. The
rule here is the more data, the better. However, real life data can be dirty because of
inconsistencies, incompleteness, duplication, and merging problems.

The three primary sources of Big Data


Social data comes from the Likes, Tweets & Retweets, Comments, Video Uploads, and general media
that are uploaded and shared via the world’s favorite social media platforms. This kind of data provides
invaluable insights into consumer behavior and sentiment and can be enormously influential in
marketing analytics. The public web is another good source of social data, and tools like Google Trends
can be used to good effect to increase the volume of big data.
Machine data is defined as information which is generated by industrial equipment, sensors that are
installed in machinery, and even web logs which track user behavior. This type of data is expected to
grow exponentially as the internet of things grows ever more pervasive and expands around the world.
Sensors such as medical devices, smart meters, road cameras, satellites, games and the rapidly growing
Internet Of Things will deliver high velocity, value, volume and variety of data in the very near future.

Transactional data is generated from all the daily transactions that take place both online and offline. Invoices, payment orders, storage records, delivery receipts – all are characterized as transactional data. Yet data alone is almost meaningless, and most organizations struggle to make sense of the data they are generating and how it can be put to good use.

6.Differentiate structured and unstructured data?

7.Differentiate ETL vs ELT?


8.Explain different collections methods in big data?
Big Data Collection Methods

Transactional Data
Transactional data includes multiple variables, such as what, how much, how and when
customers purchased as well as what promotions or coupons they used.
It’s essential to utilize a good Point of Sale (POS) software because then a business is able to
automatically store this information in a CRM (Customer Relationship Management) software.
Online Marketing Analytics
Every time a user browses a website, information is collected:
⦁ Google Analytics has the ability to provide a lot of demographic insight on each visitor. This information is useful in building marketing campaigns, as well as in website performance analysis.
⦁ Heatmaps provide information on which sections of each website page generate the
most ‘action’ (mouse clicks or interactions).
⦁ Social media analytics allows for customer demographic as well as behavioural analysis.
And powerful Facebook marketing tools can help you market to audiences that mirror your
current following.
Social Media
In today's day and age, most of humanity uses social media in one form or another, and nearly every aspect of our lives is affected. Social media is used frequently and in many ways: networking, procrastinating, gossiping, sharing, educating, gaming, etc.
Loyalty Cards
The loyalty cards system is great as it rewards repeat customers and encourages more
shopping.
There are so many businesses willing to give customers a discount simply in exchange for their
personal information. Loyalty programs have the power to double overall sales by encouraging
repeat shopping.

Maps
This one is a compelling source of satellite big data and it is used on a mass scale thanks to the
rise of Google Maps and Google Earth. This information has the potential to provide businesses
with customer location demographics and a detailed picture of the kind of people who live and
work in certain areas.
9.Explain Transmission methods?
Data transmission is the process of sending digital or analog data over a communication
medium to one or more computing, network, communication or electronic devices. It enables
the transfer of data and communication between devices in point-to-point, point-to-multipoint and multipoint-to-multipoint environments.
There are two methods used to transmit data between digital devices: serial transmission and
parallel transmission. Serial data transmission sends data bits one after another over a single
channel. Parallel data transmission sends multiple data bits at the same time over multiple
channels.
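A toy sketch contrasting the two methods: one byte sent bit by bit over a single "channel" versus all eight bits at once over eight "channels". The channel lists are just in-memory stand-ins for real wires, used only to make the difference concrete.

```python
def to_bits(byte):
    """Return the 8 bits of a byte, most significant bit first."""
    return [(byte >> i) & 1 for i in range(7, -1, -1)]

data = ord("A")  # 0b01000001

# Serial transmission: bits are sent one after another over a single channel.
serial_channel = []
for bit in to_bits(data):
    serial_channel.append(bit)  # one clock tick per bit -> 8 ticks for one byte

# Parallel transmission: all 8 bits are sent at the same time over 8 channels.
parallel_channels = [[bit] for bit in to_bits(data)]  # one tick, eight wires

print("Serial channel received:", serial_channel)
print("Parallel channels received:", parallel_channels)
```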
10.Explain the issues of transmission methods?
Transmission methods – Issues
⦁ Theoretical research – fundamental problems of big data, standardization of big data, evolution of big data computing modes
⦁ Technology development – format conversion of big data, big data transfer, real-time performance of big data, processing of big data
⦁ Practical implications – big data management; searching, mining and analysis of big data; integration and provenance of big data; big data applications
⦁ Data security – privacy, data quality
11.Explain business intelligence concepts and applications?
Business intelligence (BI) is a technology-driven process for analyzing data and delivering
actionable information that helps executives, managers and workers make informed business
decisions.
Business Understanding − This initial phase focuses on understanding the project objectives
and requirements from a business perspective, and then converting this knowledge into a data
mining problem definition. A preliminary plan is designed to achieve the objectives. A decision
model, especially one built using the Decision Model and Notation standard, can be used.

Data Understanding − The data understanding phase starts with an initial data collection and
proceeds with activities in order to get familiar with the data, to identify data quality problems,
to discover first insights into the data, or to detect interesting subsets to form hypotheses for
hidden information.

Mining − This phase covers retrieving all the data and all activities needed to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for modeling tools.
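A small sketch of typical data preparation tasks (record selection, attribute selection, cleaning), using pandas as an assumed dependency and a made-up raw dataset:

```python
import pandas as pd

# Made-up raw data with common problems: duplicates, missing values, unneeded columns.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age": [34, None, None, 51, 29],
    "country": ["IN", "US", "US", "IN", "DE"],
    "internal_note": ["a", "b", "b", "c", "d"],  # attribute not needed for modeling
})

prepared = (
    raw.drop_duplicates(subset="customer_id")    # record selection: drop duplicate customers
       .drop(columns=["internal_note"])          # attribute selection
       .assign(age=lambda df: df["age"].fillna(df["age"].median()))  # cleaning: impute missing ages
)

print(prepared)
```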

Integration − In this phase, a model (or models) that appears to have high quality from a data analysis perspective has been built. Creating the model is generally not the end of the project: even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that is useful to the customer.

Decision types
There are two main kinds of decisions: strategic decisions and operational decisions. Strategic
decisions are those that impact the direction of the company.
e.g.: The decision to reach out to a new customer set would be a strategic decision.

In strategic decision-making, the goal itself may or may not be clear.


Operational decisions are more routine and tactical decisions, focused on developing greater
efficiency. Operational decisions can be made more efficient using an analysis of past data.

BI Tools
⦁ BI includes a variety of software tools and techniques to provide the managers with the
information and insights needed to run the business.

⦁ BI tools include data warehousing, online analytical processing, social media analytics,
reporting, dashboards, querying, and data mining.
⦁ A spreadsheet tool, such as Microsoft Excel, can act as an easy but effective BI tool by
itself.
⦁ A dashboarding system, such as IBM Cognos, can offer a sophisticated set of tools for
gathering, analyzing, and presenting data.
⦁ E.g., Looker, Qlik Sense, IBM Cognos, SAP

BI Applications
1.Customer Relationship Management
Maximize the return on marketing campaigns
Improve customer retention

Maximize customer value


2.Healthcare and Wellness
Treatment effectiveness:
Wellness management:

3. Education
Course offerings:
Fund-raising from Alumni and other donors
4. Retail

5.Banking
Automate the loan application process:
6. Financial Services
12.Explain Infrastructure / support requirements for big data analytics

Following are five infrastructure/support requirements for Big Data analytics.

1.Storage: Organizations need to invest in storage solutions that are optimized for big data.
Flash storage is especially attractive due to its performance advantages and high availability.
Another smart option is clustered network-attached storage (NAS). While cloud is also an
option, large organizations find that the expense of constantly transporting data to and from
the cloud makes this option less cost-effective than on-premise storage.
Large users of Big Data — companies such as Google and Facebook — utilize hyperscale
computing environments, which are made up of commodity servers with direct-attached
storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to
reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered
network-attached storage (NAS).

2. Processing: Servers intended for Big Data analytics must have adequate processing power to support this application. Currently, a top choice of processor is Intel Skylake. Some analytics vendors, such as Splunk, offer cloud processing options, which can be especially attractive to agencies that experience seasonal peaks. If an agency has quarterly filing deadlines, for example, that organization might securely spin up on-demand processing power in the cloud to process the wave of data that comes in around those dates, while relying on on-premises processing resources to handle the steadier, day-to-day demands.

3. Analytics Software: Big Data analytics products should be selected based not only on what functions the software can perform, but also on factors such as data security and ease of use. One popular function of Big Data analytics software is predictive analytics: the analysis of current data to make predictions about the future. Predictive analytics is already used across a number of fields, including actuarial science, marketing and financial services. Government applications include fraud detection, capacity planning and child protection, with some child welfare agencies using the technology to flag high-risk cases.

4. Networking: The massive quantities of information that must be shuttled back and forth in a Big Data initiative require robust networking hardware, and capacity for secure network transport is an essential component of Big Data infrastructure. Many organizations are already operating with networking hardware that facilitates 10-gigabit connections, and may have to make only minor modifications – such as the installation of new ports – to accommodate a Big Data initiative. Securing network transports is an essential step in any upgrade, especially for traffic that crosses network boundaries.

5. Support: In addition to hardware expertise and software integration expertise, Nor-Tech has
pioneered a comprehensive suite of Big Data analytics support solutions for straightforward
deployment, operation and maintenance. The suite is a thoughtful response to well-known
obstacles that, until now, have prevented many organizations from fully and cost-effectively
leveraging Big Data analytics. These solutions include remote visualization, storage guard, SATM
(system ambient temperature monitor), bare metal backup, remote monitoring and
management, Open OnDemand Plus, Bright Cluster Manager for Data Science, etc.
13. What is PCAP?

Packet Capture or PCAP (also known as libpcap) is an application programming interface (API)
that captures live network packet data from OSI model Layers 2-7. Network analyzers like
Wireshark create .pcap files to collect and record packet data from a network. PCAP comes in a
range of formats including Libpcap, WinPcap, and PCAPng.

These PCAP files can be used to view TCP/IP and UDP network packets. To record network traffic, you need to create a .pcap file, which can be done with a network analyzer or packet-sniffing tool such as Wireshark or tcpdump.
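A small sketch of creating and reading a .pcap file programmatically, here using the scapy library as an assumed dependency; live capture normally requires root/administrator privileges.

```python
from scapy.all import sniff, wrpcap, rdpcap

# Capture a handful of live packets (typically requires root/administrator rights).
packets = sniff(count=10)

# Record them to a .pcap file, just as Wireshark or tcpdump would.
wrpcap("capture.pcap", packets)

# Later, read the file back and inspect the packets it contains.
for pkt in rdpcap("capture.pcap"):
    print(pkt.summary())
```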
Advantages of Packet Capturing and PCAP

The biggest advantage of packet capturing is that it grants visibility. You can use packet data to
pinpoint the root cause of network problems. You can monitor traffic sources and identify the
usage data of applications and devices. PCAP data gives you the real-time information you need
to find and resolve performance issues to keep the network functioning after a security event.
For example, you can identify where a piece of malware breached the network by tracking the
flow of malicious traffic and other malicious communications.
As a simple file format, PCAP has the advantage of being compatible with almost any packet
sniffing program you can think of, with a range of versions for Windows, Linux, and Mac OS.
Packet capture can be deployed in almost any environment.

Disadvantages of Packet Capturing and PCAP


Although packet capturing is a valuable monitoring technique it does have its limitations.
Packet analysis allows you to monitor network traffic but doesn’t monitor everything. Many
cyberattacks aren’t launched through network traffic, so you need to have other security
measures in place.
For example, some attackers use USBs and other hardware-based attacks. Consequently, PCAP
file analysis should make up part of your network security strategy but it shouldn’t be your only
line of defense.
Another significant obstacle to packet capturing is encryption. Many cyber attackers use
encrypted communications to launch attacks on networks. Encryption stops your packet sniffer
from being able to access traffic data and identify attacks. That means encrypted attacks will
slip under the radar if you’re relying on PCAP.
14. What are the Core Features of Wireshark?

Wireshark consists of a rich feature set including the following:


⦁ Live capture and offline analysis
⦁ Rich VoIP analysis
⦁ Read/write support for many different capture file formats
⦁ Capture files compressed with gzip can be decompressed on the fly
⦁ Deep inspection of hundreds of protocols
⦁ Standard three-pane packet browser
⦁ Captured network packets can be browsed via a GUI or the TShark utility
⦁ Multi-platform: runs on Linux, Windows, OS X, and FreeBSD
⦁ Powerful display filters
⦁ Output can be exported to XML, CSV, PostScript, or plain text
⦁ Packet list can use coloring rules for quick and intuitive analysis
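As a sketch of driving those display filters from a script, the snippet below shells out to the TShark utility (assumed to be installed and on the PATH) to pull HTTP request URIs from a saved capture; the capture filename is hypothetical.

```python
import subprocess

# Ask TShark to read a saved capture, apply a display filter, and print selected fields.
result = subprocess.run(
    [
        "tshark",
        "-r", "capture.pcap",          # read from a saved capture file (hypothetical name)
        "-Y", "http.request",          # Wireshark-style display filter
        "-T", "fields",
        "-e", "ip.src",
        "-e", "http.request.full_uri",
    ],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```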
