Big Data Analytics: What It Is, How It Works, Benefits, And Challenges
Each day, your customers generate an abundance of data. Every time they open your email,
use your mobile app, tag you on social media, walk into your store, make an online purchase,
talk to a customer service representative, or ask a virtual assistant about you, those
technologies collect and process that data for your organization. And that’s just your
customers. Each day, employees, supply chains, marketing efforts, finance teams, and more
generate an abundance of data, too. Big data refers to extremely large and diverse
datasets that come in many forms and from multiple sources. Many organizations have
recognized the advantages of collecting as much data as possible. But it’s not enough just to
collect and store big data—you also have to put it to use. Thanks to rapidly growing
technology, organizations can use big data analytics to transform terabytes of data into
actionable insights.
How Big Data Analytics Works

1. Collect Data
Data collection looks different for every organization. With today’s technology, organizations
can gather both structured and unstructured data from a variety of sources — from cloud
storage to mobile applications to in-store IoT sensors and beyond. Some data will be stored in
data warehouses where business intelligence tools and solutions can access it easily. Raw or
unstructured data that is too diverse or complex for a warehouse may be assigned metadata
and stored in a data lake.
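To make that concrete, here is a minimal Python sketch of landing a raw file in a data lake
alongside a metadata sidecar so downstream tools can discover it. The lake path, metadata
fields, and the land_raw_file helper are illustrative assumptions, not any particular
product's API.

import json
import shutil
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical local "data lake" root; in practice this would be object
# storage such as S3, ADLS, or GCS.
LAKE_ROOT = Path("/data/lake/raw")

def land_raw_file(src: Path, source_system: str) -> Path:
    """Copy a raw file into the lake and write a metadata sidecar."""
    dest_dir = LAKE_ROOT / source_system / datetime.now(timezone.utc).strftime("%Y/%m/%d")
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / src.name
    shutil.copy2(src, dest)

    # Minimal metadata so downstream tools can find and trust the file.
    metadata = {
        "source_system": source_system,  # e.g. "mobile_app", "iot_sensor"
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "size_bytes": dest.stat().st_size,
        "format": src.suffix.lstrip("."),
    }
    meta_path = dest.parent / (dest.name + ".meta.json")
    meta_path.write_text(json.dumps(metadata, indent=2))
    return dest

# Usage (file name invented): land_raw_file(Path("clicks_2024-06-01.json"), "mobile_app")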
2. Process Data
Once data is collected and stored, it must be organized properly to get accurate results on
analytical queries, especially when it’s large and unstructured. Available data is growing
exponentially, making data processing a challenge for organizations. One processing option is
batch processing, which looks at large data blocks over time. Batch processing is useful when
there is a longer turnaround time between collecting and analyzing data. Stream processing
looks at small batches of data at once, shortening the delay time between collection and
analysis for quicker decision-making. Stream processing is more complex and often more
expensive.
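A toy Python sketch can illustrate the trade-off. The batch function waits for the complete
dataset and answers once; the stream function emits rolling answers every few events. The
events generator and window size are invented for illustration.

import time
from collections import Counter
from typing import Iterable, Iterator

def events() -> Iterator[str]:
    """Illustrative event source; real systems read from log files (batch)
    or a message queue such as Kafka (stream)."""
    for page in ["home", "cart", "home", "checkout", "home"]:
        yield page
        time.sleep(0.1)  # simulate events arriving over time

def batch_count(all_events: Iterable[str]) -> Counter:
    # Batch: wait for the whole block of data, then analyze it once.
    return Counter(all_events)

def stream_count(event_iter: Iterator[str], window: int = 2) -> Iterator[Counter]:
    # Stream: update results every `window` events for fresher insight,
    # at the cost of more bookkeeping while data is still arriving.
    counts: Counter = Counter()
    for i, event in enumerate(event_iter, start=1):
        counts[event] += 1
        if i % window == 0:
            yield counts.copy()

print(batch_count(events()))            # one answer, after all data arrives
for snapshot in stream_count(events()):
    print(snapshot)                     # rolling answers while data arrives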
3. Clean Data
Data big or small requires scrubbing to improve data quality and get stronger results; all data
must be formatted correctly, and any duplicative or irrelevant data must be eliminated or
accounted for. Dirty data can obscure and mislead, creating flawed insights.
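A small pandas sketch of typical scrubbing steps follows; the column names and records are
invented, and a real pipeline would add many more checks.

import pandas as pd

# Toy customer records with the usual problems: duplicates, inconsistent
# formatting, and values that cannot be parsed.
raw = pd.DataFrame({
    "email": ["A@x.com", "a@x.com ", "b@y.com", None],
    "signup_date": ["2024-01-05", "2024-01-05", "not a date", "2024-02-10"],
})

clean = (
    raw.assign(
        email=raw["email"].str.strip().str.lower(),            # normalize format
        signup_date=pd.to_datetime(raw["signup_date"], errors="coerce"),
    )
    .drop_duplicates(subset="email")   # eliminate duplicate records
    .dropna()                          # drop rows that cannot be repaired
)
print(clean)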
4. Analyze Data
Getting big data into a usable state takes time. Once it’s ready, advanced analytics processes
can turn big data into big insights. Some of these big data analysis methods include:
Data mining sorts through large datasets to identify patterns and relationships, for
example by flagging anomalies and grouping similar records into clusters (a minimal
clustering sketch follows this list).
Predictive analytics uses an organization’s historical data to make predictions about the
future, identifying upcoming risks and opportunities.
Deep learning imitates human learning patterns, layering machine learning algorithms into
neural networks that find patterns in the most complex and abstract data.
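As a concrete sketch of the clustering step in data mining, the following uses
scikit-learn's KMeans on a made-up table of customer features; the features, cluster count,
and the anomaly heuristic are illustrative assumptions.

import numpy as np
from sklearn.cluster import KMeans  # pip install scikit-learn

# Made-up features: (orders per month, average order value) per customer.
X = np.array([
    [1, 20], [2, 25], [1, 22],        # occasional low spenders
    [12, 180], [11, 200], [13, 190],  # frequent high spenders
])

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(model.labels_)  # cluster assignment per customer, e.g. [0 0 0 1 1 1]

# A crude anomaly flag: distance from each point to its own cluster center;
# unusually large values mark outliers worth a closer look.
distances = np.linalg.norm(X - model.cluster_centers_[model.labels_], axis=1)
print(distances)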
Big Data Analytics Tools and Technology

Hadoop is an open-source framework that efficiently stores and processes big datasets on
clusters of commodity hardware. This framework is free and can handle large amounts of
structured and unstructured data, making it a valuable mainstay for any big data operation.
NoSQL databases are non-relational data management systems that do not require a fixed
schema, making them a great option for big, raw, unstructured data. NoSQL stands for “not
only SQL,” and these databases can handle a variety of data models.
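For instance, a document store can accept records of different shapes in the same
collection. The sketch below uses the pymongo client and assumes a MongoDB instance running
locally; the database, collection, and field names are invented for illustration.

from pymongo import MongoClient  # pip install pymongo

client = MongoClient("mongodb://localhost:27017")  # assumes a local MongoDB
events = client["analytics"]["events"]             # names are illustrative

# No fixed schema: each document can carry different fields.
events.insert_many([
    {"type": "page_view", "page": "/home", "device": "mobile"},
    {"type": "purchase", "sku": "A-100", "amount": 29.99,
     "items": [{"sku": "A-100", "qty": 1}]},
])

for doc in events.find({"type": "purchase"}):
    print(doc["amount"])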
MapReduce is an essential component of the Hadoop framework, serving two functions. The
first is mapping, which filters and distributes data to various nodes within the cluster.
The second is reducing, which organizes and aggregates the results from each node to answer
a query.
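A classic illustration is a word count written for Hadoop Streaming, where the map and
reduce steps are plain Python scripts that read stdin and write stdout. The file names and
job paths below are illustrative assumptions.

# mapper.py -- map step: emit a (word, 1) pair for every word on stdin.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")

# reducer.py -- reduce step (a separate script): Hadoop sorts mapper output
# by key, so all counts for a word arrive together and sum in one pass.
import sys

current, count = None, 0
for line in sys.stdin:
    word, n = line.rsplit("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")

# Submitted with something like (jar and paths are illustrative):
# hadoop jar hadoop-streaming.jar -input /data/in -output /data/out \
#     -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py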
YARN stands for “Yet Another Resource Negotiator.” It is another component of second-
generation Hadoop. This cluster management technology handles job scheduling and resource
allocation across the cluster.
Spark is an open-source cluster computing framework that uses implicit data parallelism and
fault tolerance to provide an interface for programming entire clusters. Spark can handle both
batch and stream processing for fast computation.
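Here is a brief PySpark sketch of both modes; the file path, column names, and the socket
source are illustrative assumptions for a local demo, not a production setup.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F  # pip install pyspark

spark = SparkSession.builder.appName("demo").getOrCreate()

# Batch: read a finite file, aggregate once, show the answer.
sales = (spark.read.option("header", True).csv("/data/lake/raw/sales.csv")
         .withColumn("amount", F.col("amount").cast("double")))
sales.groupBy("region").agg(F.sum("amount").alias("total")).show()

# Stream: the same DataFrame API over an unbounded source (a socket here,
# fed with `nc -lk 9999`; production jobs often read from Kafka instead).
lines = (spark.readStream.format("socket")
         .option("host", "localhost").option("port", 9999).load())
query = (lines.groupBy("value").count()
         .writeStream.outputMode("complete").format("console").start())
query.awaitTermination()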
Tableau is an end-to-end data analytics platform that allows you to prep, analyze, collaborate,
and share your big data insights. Tableau excels in self-service visual analysis, allowing
people to ask new questions of governed big data and easily share those insights across the
organization.
The Big Challenges of Big Data

Making big data accessible. Collecting and processing data becomes more difficult as the
amount of data grows. Organizations must make data easy and convenient for data owners of
all skill levels to use.
Maintaining quality data. With so much data to maintain, organizations are spending more
time than ever before scrubbing for duplicates, errors, absences, conflicts, and
inconsistencies.
Keeping data secure. As the amount of data grows, so do privacy and security concerns.
Organizations will need to strive for compliance and put tight data processes in place before
they take advantage of big data.
Finding the right tools and platforms. New technologies for processing and analyzing big data
are developed all the time. Organizations must find the right technology to work within their
established ecosystems and address their particular needs. Often, the right solution is also a
flexible solution that can accommodate future infrastructure changes.