CC Unit 3 Imp Questions
Unit 3
Important Questions
1. Characteristics of Big Data
Big data can be described by the following characteristics:
• Volume
• Variety
• Velocity
• Variability
(i) Volume – The name Big Data itself refers to a size that is enormous. The size of data plays a very crucial role in determining the value of the data, and whether a particular dataset can actually be considered Big Data depends on its volume. Hence, ‘Volume’ is one characteristic that needs to be considered while dealing with Big Data solutions.
(ii) Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of data, both structured and unstructured. In earlier days, spreadsheets and databases were the only data sources considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered in analysis applications. This variety of unstructured data poses certain issues for storing, mining and analyzing data.
(iii) Velocity – The term ‘velocity’ refers to the speed at which data is generated. How fast data is generated and processed to meet demand determines the real potential of the data. Big Data velocity deals with the speed at which data flows in from sources such as business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.
(iv) Variability – This refers to the inconsistency that the data can show at times, which hampers the process of handling and managing the data effectively.
2. Explain Big Data framework?
The Big Data Framework
The core objective of the Big Data Framework is to provide a structure for enterprise organizations that aim to benefit from the potential of Big Data.
The Big Data Framework was developed because, although the benefits and business cases of Big Data are apparent, many organizations struggle to embed a successful Big Data practice in their organization. The framework offers an approach that takes into account all organizational capabilities of a successful Big Data practice.
The main benefits of applying a Big Data framework include:
⦁ The Big Data Framework provides a structure for organizations that want to start with
Big Data or aim to develop their Big Data capabilities further.
⦁ The Big Data Framework includes all organizational aspects that should be taken into
account in a Big Data organization.
⦁ The Big Data Framework is vendor independent. It can be applied to any organization
regardless of choice of technology, specialization or tools.
⦁ The Big Data Framework provides a common reference model that can be used across
departmental functions or country boundaries.
⦁ The Big Data Framework identifies core and measurable capabilities in each of its six
domains so that the organization can develop over time.
3. Explain the challenges and trends in big data?
Challenges
Data acquisition
A key challenge is providing the mechanisms to connect data acquisition with data pre- and post-processing (analysis) and storage, in both the historical and real-time layers.
Data growth issues
Data is growing at an enormous rate, so data cleaning usually takes several steps, such as boilerplate removal (i.e., removing HTML headers in web-mining acquisition), language detection and named-entity recognition (for textual resources), and providing extra metadata such as timestamps and provenance information (yet another overlap with data curation).
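As an illustration of such a cleaning step, the sketch below strips HTML boilerplate with a simple regular expression and attaches timestamp and provenance metadata. The regex, field names and example page are assumptions made for this sketch; a real pipeline would normally use a proper HTML parser and a language-detection library.
```python
import re
from datetime import datetime, timezone

def clean_record(raw_html, source_url):
    """Boilerplate removal plus extra metadata for one acquired web page (illustrative only)."""
    # Boilerplate removal: strip tags and collapse whitespace.
    text = re.sub(r"<[^>]+>", " ", raw_html)
    text = re.sub(r"\s+", " ", text).strip()

    # Provide extra metadata: timestamp and provenance information.
    return {
        "text": text,
        "source": source_url,                                  # provenance
        "acquired_at": datetime.now(timezone.utc).isoformat(), # timestamp
    }

record = clean_record("<html><body><h1>News</h1><p>Big Data grows.</p></body></html>",
                      "https://example.com/news")
print(record["text"])   # -> "News Big Data grows."
```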
Lack of data professionals
The acquisition of media (pictures, video) is a significant challenge, but performing the analysis and storage of video and images is an even bigger one, and having professionals in the organization to analyze the data is a must.
Data variety
Data variety requires processing the semantics in the data in order to correctly and effectively merge data from different sources during processing. Work on semantic event processing, such as semantic approximations (Hasan and Curry 2014a), thematic event processing (Hasan and Curry 2014b), and thingsonomy tagging (Hasan and Curry 2015), represents emerging approaches in this area.
Securing data
Acquired data must be analyzed and organized, so data security is a main concern in Big Data.
Trends
5. Edge Processing
Edge processing is about running some processes on a local system, such as a user’s machine, an IoT device, or a nearby server. Edge computing brings computation to the network’s edge and reduces the amount of long-distance communication that has to happen between a customer and a server, which makes it one of the latest trends in Big Data analytics. Edge computing provides a boost to data streaming, including real-time data streaming and processing, without incurring latency.
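A minimal sketch of the idea, assuming a toy set of sensor readings and a placeholder send_to_server function: the edge device filters and summarises a window of readings locally, so only a small summary has to cross the long-distance link.
```python
# Illustrative sketch of edge processing; the readings, threshold and the
# send_to_server stub are assumptions made for this example.

def send_to_server(payload):
    # Placeholder for the actual upload over the network (e.g. MQTT or HTTPS).
    print("uploading:", payload)

def process_at_edge(readings, threshold=75.0):
    """Filter and summarise a window of readings on the edge device."""
    alerts = [r for r in readings if r > threshold]
    summary = {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "alerts": len(alerts),
    }
    send_to_server(summary)   # one small message instead of every raw reading

process_at_edge([70.2, 71.0, 80.5, 69.9, 77.3])
```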
6. Natural Language Processing
Natural Language Processing (NLP) is a branch of artificial intelligence that works to develop communication between computers and humans.
The objective of NLP is to read and decode the meaning of human language. Natural language processing is mostly based on machine learning, and it is used to develop word-processing applications and translation software. NLP techniques need algorithms to recognize and obtain the required data from each sentence by applying grammar rules.
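The toy sketch below illustrates rule-based extraction with two simple patterns standing in for grammar rules; the patterns and the example sentence are assumptions for illustration only, and real NLP systems rely on trained machine-learning models rather than regexes.
```python
import re

# A capitalised-word pattern stands in for named-entity recognition and a
# date regex stands in for a grammar rule; both are illustrative assumptions.
ENTITY_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s[A-Z][a-z]+)*\b")
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract(sentence):
    """Pull candidate entities and dates out of one sentence."""
    return {
        "entities": ENTITY_PATTERN.findall(sentence),
        "dates": DATE_PATTERN.findall(sentence),
    }

print(extract("Alice met Bob Smith in Hyderabad on 12/05/2023."))
```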
7. Hybrid Clouds
A hybrid cloud is a cloud computing system that combines an on-premises private cloud with a third-party public cloud, with orchestration between the two. A hybrid cloud provides excellent flexibility and more data-deployment options by moving processes between the private and public clouds. An organization must have a private cloud to gain adaptability with the desired public cloud. For that, it has to develop a data center, including servers, storage, LAN, and a load balancer, deploy a virtualization layer/hypervisor to support the VMs and containers, and install a private cloud software layer.
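The sketch below illustrates the kind of orchestration decision a hybrid cloud setup has to make, routing workloads between the private and public clouds; the workload fields, thresholds and rules are assumptions invented for this example, not part of any particular cloud platform.
```python
# Illustrative placement rule: keep sensitive or small workloads on the
# private cloud and burst the rest to the public cloud.

def place_workload(workload, private_capacity_free):
    if workload["contains_pii"]:
        return "private-cloud"        # keep regulated data on-premises
    if workload["vcpus"] <= private_capacity_free:
        return "private-cloud"        # fits in the local data center
    return "public-cloud"             # burst excess demand out

jobs = [
    {"name": "payroll", "contains_pii": True, "vcpus": 4},
    {"name": "batch-analytics", "contains_pii": False, "vcpus": 64},
]
for job in jobs:
    print(job["name"], "->", place_workload(job, private_capacity_free=16))
```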
8. Dark Data
Dark data is data that a company does not use in any analytical system. It is gathered from several network operations but is not used to derive insights or make predictions. Organizations might assume this is not useful data because they are not getting any outcome from it, yet it can turn out to be highly valuable. As data grows day by day, the industry should also understand that any unexplored data can be a security risk for the organization.
4. Explain Big Data Strategies?
Big Data Strategy
A well-defined enterprise Big Data strategy should be actionable for the organization. To achieve this, organizations can take the following 5-step approach to formulate their Big Data strategy:
⦁ Define business objectives
In this stage, it is also important to identify and nurture some data evangelists. These people truly believe in the power of data in making decisions and may already be using data and analytics in a powerful way. By involving these people and asking for their input, it becomes easier to formulate the roadmap at a later stage.
Data are key ingredients for any analytical exercise. Hence, it is important to thoroughly
consider and list all data sources that are of potential interest before starting the analysis. The
rule here is the more data, the better. However, real life data can be dirty because of
inconsistencies, incompleteness, duplication, and merging problems.
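A minimal sketch of handling such dirty data with pandas, assuming an invented customer table; it removes duplicates, normalises an inconsistent column and drops incomplete rows.
```python
import pandas as pd

# The column names and values below are invented for illustration.
raw = pd.DataFrame({
    "customer_id": [1, 1, 2, 3],
    "country":     ["IN", "IN", "in", None],   # inconsistent / incomplete
    "spend":       [100.0, 100.0, 250.0, 80.0],
})

clean = (
    raw.drop_duplicates()                                      # duplication
       .assign(country=lambda d: d["country"].str.upper())     # inconsistency
       .dropna(subset=["country"])                             # incompleteness
)
print(clean)
```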
Transactional data is generated from all the daily transactions that take place both online and offline. Invoices, payment orders, storage records, delivery receipts – all are characterized as transactional data. Yet data alone is almost meaningless, and most organizations struggle to make sense of the data they are generating and how it can be put to good use.
Transactional Data
Transactional data includes multiple variables, such as what, how much, how and when
customers purchased as well as what promotions or coupons they used.
It is essential to use good Point of Sale (POS) software, because a business can then automatically store this information in Customer Relationship Management (CRM) software.
Online Marketing Analytics
Every time a user browses a website, information is collected:
⦁ Google Analytics has the ability to provide a lot of demographic insight on each visitor. This information is useful in building marketing campaigns, as well as in website performance analysis.
⦁ Heatmaps provide information on which sections of each website page generate the
most ‘action’ (mouse clicks or interactions).
⦁ Social media analytics allows for customer demographic as well as behavioural analysis.
And powerful Facebook marketing tools can help you market to audiences that mirror your
current following.
Social Media
In today’s day and age, most of humanity are using social media in one form or another. Nearly
every aspect of our lives is affected. Social media is used in many ways on a frequent basis:
networking, procrastinating, gossiping, sharing, educating, games etc.
Loyalty Cards
The loyalty cards system is great as it rewards repeat customers and encourages more
shopping.
There are so many businesses willing to give customers a discount simply in exchange for their
personal information. Loyalty programs have the power to double overall sales by encouraging
repeat shopping.
Maps
This one is a compelling source of satellite big data and it is used on a mass scale thanks to the
rise of Google Maps and Google Earth. This information has the potential to provide businesses
with customer location demographics and a detailed picture of the kind of people who live and
work in certain areas.
9. Explain Transmission methods?
Data transmission is the process of sending digital or analog data over a communication medium to one or more computing, network, communication or electronic devices. It enables the transfer of data and communication between devices in point-to-point, point-to-multipoint and multipoint-to-multipoint environments.
There are two methods used to transmit data between digital devices: serial transmission and
parallel transmission. Serial data transmission sends data bits one after another over a single
channel. Parallel data transmission sends multiple data bits at the same time over multiple
channels.
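A toy simulation of the two methods makes the difference concrete: serial transmission needs one clock tick per bit on its single channel, while parallel transmission moves one bit per channel per tick. The channel count and the example byte are assumptions for the sketch.
```python
# Toy model of serial vs parallel transmission in clock ticks.

def serial_ticks(data_bits):
    return len(data_bits)                     # one bit per tick, one channel

def parallel_ticks(data_bits, channels=8):
    return -(-len(data_bits) // channels)     # ceiling division over channels

bits = [1, 0, 1, 1, 0, 0, 1, 0]               # one byte of data
print("serial ticks:  ", serial_ticks(bits))      # -> 8
print("parallel ticks:", parallel_ticks(bits))    # -> 1
```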
10. Explain the issues of transmission methods?
Transmission methods- Issues
⦁ Theoretical research
⦁ Practical implications
⦁ Big Data (BD) management
⦁ Searching, mining and analysis of BD
⦁ Integration and provenance of BD
⦁ Big Data applications
⦁ Data security
⦁ Privacy
⦁ Data quality
11. Explain business intelligence concepts and applications?
Business intelligence (BI) is a technology-driven process for analyzing data and delivering
actionable information that helps executives, managers and workers make informed business
decisions.
Business Understanding − This initial phase focuses on understanding the project objectives
and requirements from a business perspective, and then converting this knowledge into a data
mining problem definition. A preliminary plan is designed to achieve the objectives. A decision
model, especially one built using the Decision Model and Notation standard can be used.
Data Understanding − The data understanding phase starts with an initial data collection and
proceeds with activities in order to get familiar with the data, to identify data quality problems,
to discover first insights into the data, or to detect interesting subsets to form hypotheses for
hidden information.
Mining − This phase covers all activities to construct the final dataset (the data that will be fed into the modeling tool(s)) from the initial raw data. Data preparation tasks are likely to be performed multiple times, and not in any prescribed order. Tasks include table, record, and attribute selection as well as transformation and cleaning of data for the modeling tools.
Integration − In this phase a model (or models) that appears to have high quality from a data analysis perspective has been built. Creation of the model is generally not the end of the project. Even if the purpose of the model is to increase knowledge of the data, the knowledge gained will need to be organized and presented in a way that is useful to the customer.
Decision types
There are two main kinds of decisions: strategic decisions and operational decisions. Strategic
decisions are those that impact the direction of the company.
e.g.: The decision to reach out to a new customer set would be a strategic decision.
BI Tools
⦁ BI includes a variety of software tools and techniques to provide the managers with the
information and insights needed to run the business.
⦁ BI tools include data warehousing, online analytical processing, social media analytics,
reporting, dashboards, querying, and data mining.
⦁ A spreadsheet tool, such as Microsoft Excel, can act as an easy but effective BI tool by itself (a small pivot-style sketch follows this list).
⦁ A dashboarding system, such as IBM Cognos, can offer a sophisticated set of tools for
gathering, analyzing, and presenting data.
⦁ E.g.: Looker, Qlik Sense, IBM Cognos, SAP
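As a small illustration of the pivot-style reports such tools produce, the sketch below builds an OLAP-like summary with pandas; the sales figures are invented for the example.
```python
import pandas as pd

# Invented sales data, summarised the way a dashboard or spreadsheet would show it.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South"],
    "quarter": ["Q1", "Q2", "Q1", "Q2"],
    "revenue": [120, 150, 90, 110],
})

report = sales.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum")
print(report)
```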
BI Applications
1. Customer Relationship Management
Maximize the return on marketing campaigns
Improve customer retention
3. Education
Course offerings
Fund-raising from alumni and other donors
4. Retail
5. Banking
Automate the loan application process
6. Financial Services
12. Explain the infrastructure/support requirements for Big Data analytics?
1. Storage: Organizations need to invest in storage solutions that are optimized for big data.
Flash storage is especially attractive due to its performance advantages and high availability.
Another smart option is clustered network-attached storage (NAS). While cloud is also an
option, large organizations find that the expense of constantly transporting data to and from
the cloud makes this option less cost-effective than on-premise storage.
Large users of Big Data — companies such as Google and Facebook — utilize hyperscale
computing environments, which are made up of commodity servers with direct-attached
storage, run frameworks like Hadoop or Cassandra and often use PCIe-based flash storage to
reduce latency. Smaller organizations, meanwhile, often utilize object storage or clustered
network-attached storage (NAS).
2. Processing: Servers intended for Big Data analytics must have adequate processing power; currently, a top choice for processors is Intel Skylake. Some analytics vendors, such as Splunk, offer cloud processing options, which can be especially attractive to agencies that experience seasonal peaks. If an agency has quarterly filing deadlines, for example, that organization might securely spin up on-demand processing power in the cloud to process the wave of data that comes in around those dates, while relying on on-premises processing resources to handle the steadier, day-to-day demands.
3. Analytics Software: Agencies must select Big Data analytics products based not only on what functions the software can perform, but also on factors such as data security and ease of use. One popular function of Big Data analytics software is predictive analytics – the analysis of current data to make predictions about the future. Predictive analytics is already used across a number of fields, including actuarial science, marketing and financial services. Government applications include fraud detection, capacity planning and child protection, with some child welfare agencies using the technology to flag high-risk cases.
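A hedged sketch of predictive analytics in this spirit, using scikit-learn's LogisticRegression on invented fraud-detection features and labels; the feature choice and values are assumptions for illustration.
```python
from sklearn.linear_model import LogisticRegression

# Toy historical cases: [anomaly score, transaction count], label 1 = confirmed fraud.
X_train = [[0.1, 2], [0.9, 15], [0.2, 3], [0.8, 12]]
y_train = [0, 1, 0, 1]

model = LogisticRegression()
model.fit(X_train, y_train)

# Analyse current data to predict which new cases are high-risk.
new_cases = [[0.85, 14], [0.15, 2]]
print(model.predict(new_cases))
```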
4. Networking: The massive quantities of information that must be shuttled back and forth in a Big Data initiative require robust networking hardware, so capacity for secure network transport is an essential component of Big Data infrastructure. Many organizations are already operating with networking hardware that facilitates 10-gigabit connections, and may have to make only minor modifications – such as the installation of new ports – to accommodate a Big Data initiative. Securing network transports is an essential step in any upgrade, especially for traffic that crosses network boundaries.
5. Support: In addition to hardware and software-integration expertise, Nor-Tech has pioneered a comprehensive suite of Big Data analytics support solutions for straightforward deployment, operation and maintenance. The suite is a response to well-known obstacles that, until now, have prevented many organizations from fully and cost-effectively leveraging Big Data analytics. These solutions include remote visualization, storage guard, SATM (system ambient temperature monitor), bare-metal backup, remote monitoring and management, Open OnDemand Plus, Bright Cluster Manager for Data Science, etc.
13. What is PCAP?
Packet Capture or PCAP (also known as libpcap) is an application programming interface (API)
that captures live network packet data from OSI model Layers 2-7. Network analyzers like
Wireshark create .pcap files to collect and record packet data from a network. PCAP comes in a
range of formats including Libpcap, WinPcap, and PCAPng.
These PCAP files can be used to view TCP/IP and UDP network packets. If you want to record network traffic, you need to create a .pcap file, which you can do with a network analyzer or packet-sniffing tool such as Wireshark or tcpdump.
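A small sketch of reading such a file programmatically, assuming the scapy package is installed and a previously captured file named capture.pcap exists; it lists the TCP conversations found in the capture.
```python
from scapy.all import rdpcap, IP, TCP   # assumes the scapy package is installed

# "capture.pcap" is a placeholder name for a file recorded earlier with
# Wireshark or tcpdump.
packets = rdpcap("capture.pcap")

for pkt in packets:
    if pkt.haslayer(IP) and pkt.haslayer(TCP):
        # Print each TCP packet's source and destination endpoints.
        print(f"{pkt[IP].src}:{pkt[TCP].sport} -> {pkt[IP].dst}:{pkt[TCP].dport}")
```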
Advantages of Packet Capturing and PCAP
The biggest advantage of packet capturing is that it grants visibility. You can use packet data to
pinpoint the root cause of network problems. You can monitor traffic sources and identify the
usage data of applications and devices. PCAP data gives you the real-time information you need
to find and resolve performance issues to keep the network functioning after a security event.
For example, you can identify where a piece of malware breached the network by tracking the
flow of malicious traffic and other malicious communications.
As a simple file format, PCAP has the advantage of being compatible with almost any packet
sniffing program you can think of, with a range of versions for Windows, Linux, and Mac OS.
Packet capture can be deployed in almost any environment.