The Three Vs of Big Data
To really understand big data, it helps to have some historical background. Here’s
Gartner’s definition, circa 2001, which is still the go-to definition: big data is data that
contains greater variety, arriving in increasing volumes and with ever-higher velocity.
These three properties are known as the three Vs.
Put simply, big data refers to larger, more complex data sets, especially those from new
data sources. These data sets are so voluminous that traditional data processing software
can’t manage them. But these massive volumes of data can be used to address business
problems you wouldn’t have been able to tackle before.
Volume
The amount of data matters. With big data, you’ll have to process high volumes of
low-density, unstructured data. This can be data of unknown value, such as Twitter data
feeds, clickstreams on a webpage or a mobile app, or readings from sensor-enabled
equipment. For some organizations, this might be tens of terabytes of data. For others, it
may be hundreds of petabytes.
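One common way to cope with volume is to process records as a stream instead of loading a
whole data set into memory. The Python sketch below assumes a hypothetical clickstream
format of one `user_id,page` record per line; `read_clickstream` and `count_page_views`
are illustrative names, not part of any product or library. Only the running per-page
totals are held in memory, so the same code could consume a file handle of any size.

```python
from collections import Counter
from typing import Iterable, Iterator


def read_clickstream(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Parse hypothetical 'user_id,page' records lazily, one line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, page = line.split(",", 1)
        yield user_id, page


def count_page_views(records: Iterable[tuple[str, str]]) -> Counter:
    """Aggregate views per page; only the totals are kept in memory."""
    views: Counter = Counter()
    for _user_id, page in records:
        views[page] += 1
    return views


if __name__ == "__main__":
    sample = ["u1,/home", "u2,/pricing", "u1,/home", "", "u3,/home"]
    print(count_page_views(read_clickstream(sample)).most_common(2))
```

To run it over a real file, pass an open file object, for example
`count_page_views(read_clickstream(f))` inside a `with open(path) as f:` block. At
petabyte scale the same aggregation would normally be split across many machines, which
this single-process sketch does not attempt.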
Velocity
Velocity is the fast rate at which data is received and (perhaps) acted on. Typically, the
highest-velocity data streams directly into memory rather than being written to disk. Some
internet-enabled smart products operate in real time or near real time and require
real-time evaluation and action.
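Acting on data in memory as it arrives often means keeping only a recent window of events.
The sketch below is a minimal sliding-window counter built on `collections.deque`; the
class name `SlidingWindowCounter`, the 10-second window, and the assumption that events
arrive in timestamp order are illustrative choices, not a standard API.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds`, held only in memory."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps: deque = deque()

    def add(self, timestamp: float) -> None:
        # Assumes events arrive in non-decreasing timestamp order.
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        self._evict(now)
        return len(self._timestamps)

    def _evict(self, now: float) -> None:
        # Keep only events in the window (now - window_seconds, now].
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()


if __name__ == "__main__":
    counter = SlidingWindowCounter(window_seconds=10)
    for t in [0.0, 1.0, 2.0, 11.5]:
        counter.add(t)
    print(counter.count(now=11.5))
```

A real-time rule could, for example, raise an alert whenever `count(now)` exceeds a
threshold. Because expired timestamps are removed from the left end of the deque, each
event is appended and evicted at most once.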
Variety
Variety refers to the many types of data that are available. Traditional data types were
structured and fit neatly in a relational database. With the rise of big data, data now
arrives in new unstructured and semistructured types. These types, such as text, audio,
and video, require additional preprocessing to derive meaning and support metadata.
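The preprocessing step can be sketched as turning each raw input into a record with the
same fields, plus derived metadata. This Python sketch assumes a hypothetical JSON event
shape (`user.id` and `message` fields) and treats plain text as carrying no user; the
function names and the `word_count` and `hashtags` fields are illustrative, not a standard
schema. Audio and video would need far heavier preprocessing than shown here.

```python
import json
import re


def from_json_event(raw: str) -> dict:
    """Flatten a semistructured JSON event into a fixed set of fields."""
    event = json.loads(raw)
    user = event.get("user") or {}  # the nested object may be missing or null
    return {"source": "json", "user": user.get("id"), "text": event.get("message") or ""}


def from_free_text(raw: str) -> dict:
    """Wrap unstructured text in the same fields; plain text carries no user."""
    return {"source": "text", "user": None, "text": raw.strip()}


def add_metadata(record: dict) -> dict:
    """Derive simple metadata so the text can be filtered like a structured column."""
    words = re.findall(r"\w+", record["text"].lower())
    hashtags = re.findall(r"#(\w+)", record["text"])
    return {**record, "word_count": len(words), "hashtags": hashtags}


if __name__ == "__main__":
    records = [
        from_json_event('{"user": {"id": "u42"}, "message": "Loving the new app #release"}'),
        from_free_text("  Sensor 7 reported high temperature  "),
    ]
    for record in map(add_metadata, records):
        print(record["source"], record["user"], record["word_count"], record["hashtags"])
```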
Finding value in big data takes more than analysis. It’s an entire discovery process that
requires insightful analysts, business users, and executives who ask the right questions,
recognize patterns, make informed assumptions, and predict behavior.
Integrate
Big data brings together data from many disparate sources and applications. Traditional
data integration mechanisms, such as ETL (extract, transform, and load), generally
aren’t up to the task. Analyzing big data sets at terabyte, or even petabyte, scale
requires new strategies and technologies.
During integration, you need to bring in the data, process it, and make sure it’s
formatted and available in a form your business analysts can start working with.
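The extract, transform, and load steps can be illustrated at toy scale with Python’s
standard library. The sketch below assumes two hypothetical sources, a CRM export in CSV
and web-app events in JSON Lines, joins them on a normalized email address, and loads the
result into an in-memory SQLite table. It shows the shape of integration work (bringing
data in, reconciling formats, making it queryable), not a tool suited to terabyte-scale
data; the function names and table layout are assumptions made for the example.

```python
import csv
import io
import json
import sqlite3


def extract_crm(csv_text: str) -> list[dict]:
    # Source 1 (hypothetical): CRM export with columns email,full_name
    return list(csv.DictReader(io.StringIO(csv_text)))


def extract_web(jsonl_text: str) -> list[dict]:
    # Source 2 (hypothetical): one JSON object per line, each with an "email" key
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]


def transform(crm_rows: list[dict], web_rows: list[dict]) -> list[tuple[str, str, int]]:
    """Join both sources on a normalized email into (email, name, visits) rows."""
    visits: dict[str, int] = {}
    for event in web_rows:
        email = event["email"].strip().lower()
        visits[email] = visits.get(email, 0) + 1
    rows = []
    for row in crm_rows:
        email = row["email"].strip().lower()
        rows.append((email, row["full_name"].strip(), visits.get(email, 0)))
    return rows


def load(conn: sqlite3.Connection, rows: list[tuple[str, str, int]]) -> None:
    """Write the integrated rows to a table analysts can query with SQL."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(email TEXT PRIMARY KEY, name TEXT, visits INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    crm = "email,full_name\nAda@Example.com,Ada Lovelace\nbob@example.com,Bob Smith\n"
    web = '{"email": "ada@example.com"}\n{"email": "ADA@example.com "}\n'
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract_crm(crm), extract_web(web)))
    for row in conn.execute("SELECT email, name, visits FROM customers ORDER BY email"):
        print(row)
```

Normalizing the join key (trimming whitespace and lowercasing emails) in the transform
step is what lets records from the two sources line up, even though each source formats
the same address differently.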
Manage
Big data requires storage. Your storage solution can be in the cloud, on premises, or
both. You can store your data in any form you want and bring your processing
requirements and the necessary processing engines to those data sets on demand. Many
people choose their storage solution according to where their data currently resides.
The cloud is gradually gaining popularity because it supports your current compute
requirements and lets you spin up resources as needed.
Analyze
Your investment in big data pays off when you analyze and act on your data. Get new
clarity with a visual analysis of your varied data sets. Explore the data further to make
new discoveries. Share your findings with others. Build data models with machine
learning and artificial intelligence. Put your data to work.