Big Data Cat 1
1) What is Big Data?
Big data primarily refers to data sets that are too large or complex to be dealt with by
traditional data-processing application software. Data with many entries (rows) offer
greater statistical power, while data with higher complexity (more attributes or columns)
may lead to a higher false discovery rate.
2) What is Structured Data?
Structured data is data that is organized and designed in a specific way to make it easily
readable and understandable by both humans and machines. This is typically achieved through the use of a
well-defined schema or data model, which provides a structure for the data.
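As an illustrative sketch (the schema and field names here are hypothetical), structured data can be modeled with an explicit schema that fixes each record's attributes and types:

```python
from dataclasses import dataclass

# A hypothetical schema for customer records: every row has the same
# well-defined fields and types, which is what makes the data "structured".
@dataclass
class Customer:
    customer_id: int
    name: str
    email: str
    age: int

rows = [
    Customer(1, "Alice", "alice@example.com", 34),
    Customer(2, "Bob", "bob@example.com", 41),
]

# Because the schema is fixed, both humans and programs can read the data
# predictably, e.g. selecting a single attribute across all rows:
names = [row.name for row in rows]
```

Because every record follows the same schema, any attribute can be queried in the same way across all rows.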
3) Difference between Descriptive Analytics and Predictive Analytics?
Descriptive analytics focuses on understanding past events and provides insights into
what has happened.
Predictive analytics aims to forecast future outcomes and understand what could
happen.
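The contrast can be sketched in a few lines of Python (the sales figures are invented for illustration): a descriptive statistic summarizes past data, while a naive trend forecast looks forward:

```python
import statistics

# Hypothetical monthly sales figures (illustrative numbers only).
sales = [100, 110, 120, 130, 140, 150]

# Descriptive analytics: summarize what has already happened.
mean_sales = statistics.mean(sales)

# Predictive analytics: forecast what could happen next, here with a
# naive linear trend (last value plus the average month-over-month change).
diffs = [b - a for a, b in zip(sales, sales[1:])]
forecast = sales[-1] + statistics.mean(diffs)
```

Real predictive analytics uses far richer models, but the distinction is the same: the first number describes the past, the second estimates the future.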
4) What is Big Data Visualization?
Data visualization is the graphical representation of information and data. By using visual
elements like charts, graphs, and maps, data visualization tools provide an accessible way
to see and understand trends, outliers, and patterns in data.
This practice is crucial in the data science process, as it helps to make data more
understandable and actionable for a wide range of users, from business professionals to
data scientists.
5) List the 5 V's of Big Data?
Volume, Velocity, Variety, Veracity, and Value
6) Give any example of using Big Data in today's life?
Big data refers to large collections of data that are so complex and expansive that they
cannot be interpreted by humans or by traditional data management systems. When
properly analyzed using modern tools, these huge volumes of data give businesses the
information they need to make informed decisions.
New software developments have recently made it possible to use and track big data sets.
Much of this user information would seem meaningless and unconnected to the human eye.
However, big data analytic tools can track the relationships between hundreds of types and
sources of data to produce useful business intelligence.
CHARACTERISTICS:
The characteristics of big data include several key attributes, commonly known as the “Vs.”
These attributes are important for understanding the nature of big data:
Volume
As the name suggests, big data involves large amounts of information. Terabytes,
petabytes, and even larger quantities of data are possible, requiring specialized processing
and storage infrastructure to handle.
Example – Google processes over 3.5 billion searches per day, amounting to an estimated
1.28 trillion searches per year.
Velocity
Velocity refers to the speed at which data is generated, processed, and made available for
analysis. With real-time data sources like social media, sensors, and IoT devices, data is
often produced at high speeds, requiring quick processing capabilities.
Data flows continuously in large quantities. Velocity defines the data’s potential, or how
quickly it can be created and processed to satisfy needs.
Example – Facebook’s user base has grown by approximately 22% year over year. As of the
latest available data, Facebook had around 2.8 billion monthly active users, reflecting the rapid
pace of user growth.
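The need to process fast-moving data can be sketched with a bounded sliding window, a common pattern for high-velocity streams (the generator below is a hypothetical stand-in for a real event feed):

```python
from collections import deque

# Sketch of processing a high-velocity event stream with a fixed-size
# sliding window, so memory stays bounded no matter how fast data arrives.
def event_stream():
    for i in range(1000):
        yield i  # e.g. a sensor reading or a click event

window = deque(maxlen=10)  # keep only the 10 most recent events
for event in event_stream():
    window.append(event)
    running_avg = sum(window) / len(window)  # updated as data flows in

# After the stream ends, the window holds only the last 10 events.
```

The point of the design is that the consumer never stores the full stream, only a bounded window, which is what lets it keep pace with continuously arriving data.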
Variety
Variety refers to the different types of data involved, including structured data (e.g.,
databases), semi-structured data (e.g., XML, JSON), and unstructured data (e.g., text,
images, videos). Managing and analyzing this variety of data requires flexible and
adaptable processing methods.
Example – YouTube has over 500 hours of video content uploaded every minute. This immense
variety includes videos in different formats, resolutions, and content types.
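These three kinds of data can be sketched side by side in Python (the sample records are invented): structured rows parse against a fixed schema, semi-structured JSON carries its own structure, and unstructured text needs extra interpretation:

```python
import csv
import io
import json

# Structured: tabular data with a fixed schema (CSV standing in for a
# database table).
structured = list(csv.DictReader(io.StringIO("id,name\n1,Alice\n2,Bob\n")))

# Semi-structured: JSON carries its own (flexible) structure.
semi_structured = json.loads('{"user": "Alice", "tags": ["video", "hd"]}')

# Unstructured: free text with no schema; extracting meaning needs
# extra processing, here just a trivial word count.
unstructured = "Great video, loved the 4K resolution!"
word_count = len(unstructured.split())
```

Each form demands a different tool, which is why variety forces big data systems to support multiple processing methods rather than a single rigid pipeline.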
Veracity
Veracity focuses on the correctness and dependability of the data. Big Data sources may contain
inconsistencies, errors, or noise, making it crucial to ensure the quality of the information for
meaningful analysis.
Example – Google’s search algorithms are designed to filter through and prioritize accurate
information from the vast volume of web pages indexed.
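A minimal sketch of a veracity check, assuming hypothetical sensor records and plausibility rules, filters out inconsistent values before analysis:

```python
# Hypothetical sensor readings; two records are noisy or incomplete.
readings = [
    {"sensor": "t1", "temp_c": 21.5},
    {"sensor": "t2", "temp_c": -999.0},  # sentinel/error value
    {"sensor": "t3", "temp_c": 22.1},
    {"sensor": "t4", "temp_c": None},    # missing value
]

def is_valid(record):
    # Reject missing values and readings outside a plausible range.
    temp = record["temp_c"]
    return temp is not None and -50.0 <= temp <= 60.0

clean = [r for r in readings if is_valid(r)]  # keeps t1 and t3 only
```

Validation rules like these are what turn raw, error-prone input into data trustworthy enough for meaningful analysis.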
Value
Value focuses on the goal of extracting meaningful insights from the data. The primary
purpose of dealing with big data is to obtain information that leads to improved
decision-making and strategic advantages.
Example – Facebook’s advertising revenue amounted to approximately $84.2 billion in the
most recent fiscal year. The value derived from targeted advertising based on user data
contributes significantly to the company’s revenue.
Variability
Variability represents the dynamic nature of data flow. Big data sources show changes in
volume, velocity, and variety over time, requiring flexible processing methods.
Example – Twitter experiences variability in data flow, especially during major events. The
platform sees a rise in tweets and user interactions during such events, requiring adaptable
processing methods to handle the fluctuating data volume.
Visibility
Visibility refers to how accessible and transparent data is: the ability to see into data
from many sources and surface the insights it holds so they can be acted upon.
Example – Google Maps uses Big Data to provide visibility into real-time traffic conditions.
By analyzing data from smartphones and other sources, Google Maps helps users navigate
efficiently by avoiding busy routes.
Volatility
Volatility captures the temporary nature of certain data. Some data in big data
environments has a short period of validity or relevance, which requires organizations
to adapt quickly to changes in the data landscape.
Example – Financial markets generate vast amounts of data in real-time. Stock prices, currency
exchange rates, and commodity prices can be highly volatile.
Moving on from the characteristics of big data, let’s discuss business intelligence and data warehousing.
Business intelligence is a broad term that encompasses data mining, process analysis,
performance benchmarking, and descriptive analytics. BI parses all the data generated by a
business and presents easy-to-digest reports, performance measures, and trends that inform
management decisions.
A data warehouse, also called an enterprise data warehouse (EDW), is an enterprise data
platform used for the analysis and reporting of structured and semi-structured data from
multiple data sources, such as point-of-sale transactions, marketing automation, customer
relationship management, and more.
Data warehouses include an analytical database and critical analytical components and
procedures. They support ad hoc analysis and custom reporting through data pipelines, queries,
and business applications. They can consolidate and integrate massive amounts of current and
historical data in one place and are designed to give a long-range view of data over time. These
data warehouse capabilities have made data warehousing a primary staple of enterprise
analytics that help support informed business decisions.
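A miniature sketch of warehouse-style ad hoc analysis, using an in-memory SQLite database with hypothetical table and column names, consolidates rows in one place and runs an aggregate query over them:

```python
import sqlite3

# Consolidate transactional rows in one place (an in-memory database
# standing in for a real warehouse), then analyze them with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 100.0), ("north", 150.0), ("south", 200.0)],
)

# Ad hoc query: total sales per region, the kind of long-range,
# aggregate view a warehouse is designed to provide.
totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region")
)
conn.close()
```

A real warehouse operates at vastly larger scale and over many source systems, but the pattern is the same: centralize the data, then query it freely.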
Traditional data warehouses are hosted on-premises, with data flowing in from relational
databases, transactional systems, business applications, and other source systems.
However, they are typically designed to capture a subset of data in batches and store it
based on rigid schemas, making them unsuitable for ad hoc queries or real-time
analysis. Companies also must purchase their own hardware and software with an on-
premises data warehouse, making it expensive to scale and maintain. In a traditional
warehouse, storage is typically limited compared to compute, so data is transformed
quickly and then discarded to keep storage space free.
Today’s data analytics activities have become central to all core business
activities, including revenue generation, cost containment, improving operations, and
enhancing customer experiences. As data evolves and diversifies, organizations need more
robust data warehouse solutions and advanced analytic tools for storing, managing, and
analyzing large quantities of data across their organizations.
These systems must be scalable, reliable, secure enough for regulated industries, and
flexible enough to support a wide variety of data types and big data use cases. They also
need to support flexible pricing and compute, so you only pay for what you need instead of
guessing your capacity. The requirements go beyond the capabilities of most legacy data
warehouses. As a result, many enterprises are turning to cloud-based data warehouse
solutions.
A cloud data warehouse makes no trade-offs against a traditional data warehouse; it
extends those capabilities and runs on a fully managed service in the cloud. Cloud data warehousing
offers instant scalability to meet changing business requirements and powerful data
processing to support complex analytical queries.
With a cloud data warehouse, you benefit from the inherent flexibility of a cloud
environment with more predictable costs. The up-front investment is typically much lower
and lead times are shorter than with on-premises data warehouse solutions, because the
cloud service provider manages and maintains the physical infrastructure.