Data Analytics_Module-1.1
Module-1
Dr. Ramen Pal
Associate Professor
Department of CSE (AI & ML), UEMK
Contact: [email protected]
WhatsApp: 7501038078
01/24/2025 1
Course Details
• Subject Name: Professional Elective - III : Data Analytics
• Credit: 3
• Subject Code: PECCSE602A
• Lecture Hours: 36
Course Outcome
• On completion of the course students will be able to:
CO-1: Discuss with illustration the techniques and methods related to the
area of data collection, pre-processing, and exploratory data analytics.
CO-2: Discuss important terms and techniques in statistics to enable students
to understand the background of different tools or methods used in data
analytics.
CO-3: Use, at a beginning level of proficiency, the tools of machine learning
to ask questions of and explore patterns in data.
CO-4: Demonstrate intermediate proficiency in the visualization of data to
communicate information and patterns that exist in the data.
Syllabus: Module-1
(Introduction to Data Analytics)
Data science workflow, Automated methods for data collection, Data
and Visualization Models, Data wrangling and cleaning, Exploratory
data analysis, Dimensionality Reduction. Building and evaluation of
models for: Association Analysis, Recommendation Systems, Time-
series data, Text Analysis, Data Mining.
Introduction to data
• We often use the term data to refer to computer information
• This information is either transmitted or stored
• Data comes in numerous forms
• Any kind of information, whether numbers, text, or pictures, is
termed data
Types of data
*The grouping method is used to examine nominal data. Nominal data is usually represented using pie charts.
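As a minimal sketch of the grouping method, the snippet below counts how often each category occurs and converts the counts to percentage shares, which are the slice sizes a pie chart would show. The blood-group values are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical nominal observations: categories with no inherent order.
blood_groups = ["A", "O", "B", "O", "AB", "A", "O", "B", "O", "A"]

# Grouping: count how many observations fall into each category.
counts = Counter(blood_groups)

# Convert counts to percentage shares (the slice sizes of a pie chart).
total = sum(counts.values())
shares = {group: 100 * n / total for group, n in counts.items()}

for group, share in sorted(shares.items()):
    print(f"{group}: {share:.0f}%")
```

Because nominal categories have no order, sorting the output alphabetically here is purely for readability.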
Discrete vs Continuous data
Discrete Continuous
Dataset: Types of data
Data comes in different types. Some of the common types of data include:
Text
Image
Video
Numbers
Spreadsheets
Sound
Real world applications of data
Data Visualization
Data Visualization: Charts
Fig: Different types of charts (e.g., heat map)
What Is Data Science?
• Data science is the domain of study that deals with vast volumes of
data using modern tools and techniques to find unseen patterns,
derive meaningful information, and make business decisions.
• Data science uses complex machine learning algorithms to build
predictive models.
• The data used for analysis can come from many different sources and
be presented in various formats.
The Data Science Lifecycle
• Data science’s lifecycle consists of five distinct stages, each with its own tasks:
1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves
gathering raw structured and unstructured data.
2. Maintain: Data Warehousing, Data Cleaning, Data Staging, Data Processing, Data
Architecture. This stage covers taking the raw data and putting it in a form that can be used.
3. Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data
scientists take the prepared data and examine its patterns, ranges, and biases to determine
how useful it will be in predictive analysis.
4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative
Analysis. Here is the real meat of the lifecycle. This stage involves performing the various
analyses on the data.
5. Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making.
In this final step, analysts prepare the analyses in easily readable forms such as charts,
graphs, and reports.
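The five stages above can be sketched as a chain of small functions, one per stage. This is only an illustrative toy under stated assumptions: the function names, the sample readings, and the choice of a simple average as the "analysis" are all hypothetical, and a real project would do far more in every stage.

```python
from statistics import mean

def capture():
    # Capture: gather raw data (hypothetical readings; None marks a missing entry).
    return ["12.5", "13.1", None, "12.9", "bad", "13.4"]

def maintain(raw):
    # Maintain: clean the raw values and convert them to a usable numeric form.
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            continue  # drop entries that cannot be parsed as numbers
    return cleaned

def process(values):
    # Process: summarize the prepared data (count and range).
    return {"n": len(values), "min": min(values), "max": max(values)}

def analyze(values):
    # Analyze: a deliberately simple analysis, the average reading.
    return mean(values)

def communicate(summary, average):
    # Communicate: present the result in an easily readable form.
    return (f"{summary['n']} readings between {summary['min']} and "
            f"{summary['max']}, average {average:.2f}")

raw = capture()
values = maintain(raw)
summary = process(values)
average = analyze(values)
report = communicate(summary, average)
print(report)
```

Each stage consumes the output of the previous one, which mirrors the order Capture, Maintain, Process, Analyze, Communicate.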
Data Science Workflow
Data Wrangling
Data wrangling ensures data is reliable and complete, before professionals
analyze it and use it to create insights. Thanks to this process, those insights are
based on accurate, high-quality data.
Anaconda's “The State of Data Science 2022” report revealed that data scientists
spend about 37.75% of their time data wrangling, a percentage that’s a sharp
reduction from past surveys, which placed the estimate at closer to 80%.
If you’re considering a career in data, at some
point, you’ll likely have to deal with data
wrangling in some capacity.
Data wrangling also goes by a few other names, including data cleaning,
scrubbing, and remediation.
It's an umbrella term that describes several processes designed to transform raw
data from messy, complex data sets into more easily used formats.
When you engage in data wrangling, you find and transform data so you can use
it to answer a question or produce valuable insight needed to make decisions.
Professionals conduct data wrangling in one of two ways: manually or
through automation.
Data scientists and other team members usually head up the data wrangling
process in businesses with a data team.
In smaller organizations, it may fall to non-data professionals to clean data before
use.
Data Wrangling: Why It Matters
Imagine building the Taj Mahal on a shoddy foundation, or a builder slapping
your home together without paying meticulous attention to the quality of the
foundation and the building supplies. Data wrangling plays the same role for
research and analytics: it provides the solid foundation.
Once the process is complete, you'll get results much faster with less chance of
errors or missed opportunities. You make raw data usable when you use data-
wrangling tools and follow the steps. Other benefits include:
• Data wrangling enables you to gather data from multiple sources into a central spot.
• Cleaning and converting data into a standard format enables you to perform cross-data set
analytics.
• Data wrangling prepares data by removing flawed and missing elements, readying it for
data mining, and empowering businesses to make concrete, data-driven decisions.
Data Wrangling: Steps
Harvard Business School Online identifies six common processes used to inform
your approach to data wrangling:
1. Data Discovery
2. Data Structuring
3. Data Cleaning
4. Data Enriching
5. Data Validating, and
6. Data Publishing
If you work with data, you’ll likely also work with several tools to help you easily
navigate the data-wrangling process.
Some popular tools include Tabula, DataWrangler, Pandas, and Python.
Each project might require you to take a slightly different approach and may
present unique challenges throughout the process.
Data Discovery
The first step helps you make sense of
the data you're working with. You'll
also need to keep the primary goal of
the data analysis in mind during this step. For
example, if your organization wants to
gain customer behavior insight, you
might sort customer data according to
location, promotional codes, and
purchases.
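A minimal sketch of this kind of discovery, using only the Python standard library: customer records are grouped by location and promotional code and ranked by purchases. The records and field names (`location`, `promo_code`, `purchases`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical customer records; field names are assumptions for illustration.
customers = [
    {"name": "Asha", "location": "Kolkata", "promo_code": "NEW10", "purchases": 3},
    {"name": "Ravi", "location": "Delhi", "promo_code": None, "purchases": 1},
    {"name": "Meera", "location": "Kolkata", "promo_code": "NEW10", "purchases": 5},
    {"name": "John", "location": "Delhi", "promo_code": "FEST5", "purchases": 2},
]

# Total purchases per location.
purchases_by_location = defaultdict(int)
for c in customers:
    purchases_by_location[c["location"]] += c["purchases"]

# Which customers used which promotional code ("(none)" if no code was used).
customers_by_promo = defaultdict(list)
for c in customers:
    customers_by_promo[c["promo_code"] or "(none)"].append(c["name"])

# Customers ranked from most to fewest purchases.
ranked = sorted(customers, key=lambda c: c["purchases"], reverse=True)

print(dict(purchases_by_location))
print(dict(customers_by_promo))
print([c["name"] for c in ranked])
```

With a library such as Pandas, the same exploration would typically be done with grouping and sorting operations on a table; the plain-Python version is used here only to keep the example self-contained.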
Data Structuring
Once you've finished the first step, you
might find raw data that is disorganized,
incomplete, or misformatted
for your purposes. That's where data
structuring comes into play. This is the
process in which you transform that
raw data into a form appropriate for
the analytical model you want to use to
interpret the data.
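As an illustrative sketch of structuring, the snippet below turns raw text lines that use inconsistent delimiters into uniform records with named, typed fields. The input lines, the field names, and the `structure` helper are hypothetical.

```python
import re

# Hypothetical raw input: the same three fields, but inconsistent delimiters.
raw_lines = [
    "Asha; Kolkata; 3",
    "Ravi,Delhi,1",
    "Meera | Kolkata | 5",
]

FIELDS = ("name", "location", "purchases")

def structure(line):
    # Split on any of the delimiters seen in the raw data and trim whitespace.
    parts = [p.strip() for p in re.split(r"[;,|]", line)]
    record = dict(zip(FIELDS, parts))
    # Give the numeric field a numeric type so it can be analyzed later.
    record["purchases"] = int(record["purchases"])
    return record

records = [structure(line) for line in raw_lines]
for record in records:
    print(record)
```

After this step every record has the same keys and types, which is what most analytical models expect as input.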
Data Cleaning
During the data cleaning step, you
remove data errors that might distort
or damage the value of your analysis.
This includes tasks like standardizing
inputs, deleting empty cells, removing
outliers, and deleting blank rows.
Ultimately, the goal is to ensure the
data is as error-free as possible.
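The cleaning tasks listed above can be sketched in a few lines. The rows are hypothetical, and the outlier rule (drop values more than 5 median absolute deviations from the median) is an assumption chosen for illustration, not a standard prescribed by the course.

```python
from statistics import median

# Hypothetical raw rows with typical problems.
rows = [
    {"city": " kolkata ", "amount": 120.0},
    {"city": "KOLKATA", "amount": 130.0},
    {"city": "Delhi", "amount": None},      # empty cell
    {"city": "", "amount": None},           # blank row
    {"city": "delhi", "amount": 125.0},
    {"city": "Delhi", "amount": 9000.0},    # likely outlier
    {"city": "Kolkata", "amount": 110.0},
]

# 1. Standardize inputs: trim whitespace and use consistent capitalization.
for row in rows:
    row["city"] = row["city"].strip().title()

# 2. Delete blank rows and rows with empty cells.
complete = [r for r in rows if r["city"] and r["amount"] is not None]

# 3. Remove outliers (assumed rule: more than 5 MADs away from the median).
amounts = [r["amount"] for r in complete]
med = median(amounts)
mad = median(abs(a - med) for a in amounts)
cleaned = [r for r in complete if abs(r["amount"] - med) <= 5 * mad]

for row in cleaned:
    print(row)
```

Whether an extreme value is an error or a genuine observation depends on the domain, so outlier removal should be a deliberate decision rather than an automatic one.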
Data Enriching
Once you've transformed your data
into a more usable state, you must
determine if you have all the data you
need for the project. If you don't, you
can enrich it by adding values from
other data sets. And if you do so, you
might have to repeat steps one through
three for that new data.
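A minimal sketch of enrichment: values from a second data set are attached to each record by matching on a shared key. Both data sets and all field names are hypothetical.

```python
# Hypothetical primary data set.
orders = [
    {"order_id": 1, "city": "Kolkata", "amount": 120.0},
    {"order_id": 2, "city": "Delhi", "amount": 125.0},
    {"order_id": 3, "city": "Pune", "amount": 90.0},
]

# Hypothetical second data set, keyed by city.
city_info = {
    "Kolkata": {"state": "West Bengal", "region": "East"},
    "Delhi": {"state": "Delhi", "region": "North"},
}

# Attach the extra columns to every order; keep orders with no match
# and mark their new fields as missing (None).
enriched = []
for order in orders:
    extra = city_info.get(order["city"], {"state": None, "region": None})
    enriched.append({**order, **extra})

# Orders that could not be enriched need follow-up (e.g., more cleaning).
unmatched = [row["order_id"] for row in enriched if row["region"] is None]

print(enriched)
print("unmatched order ids:", unmatched)
```

Unmatched keys are a common sign that the new data set itself needs discovery, structuring, and cleaning before it can be merged reliably.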
Data Validation
When you work on data validation, you
verify that your data is consistent and
of sufficient quality. During this step,
you might find some issues you need to
address or that the data is ready to be
analyzed. This step is typically
completed using automated processes
and requires some programming skills.
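Automated validation usually means encoding quality rules as code and running them over every record. The sketch below assumes three hypothetical rules (unique `order_id`, non-negative numeric `amount`, non-empty `city`); real projects would define rules to match their own data.

```python
def validate(rows):
    """Return a list of (row_index, problem) pairs; an empty list means the data passed."""
    problems = []
    seen_ids = set()
    for i, row in enumerate(rows):
        # Rule 1: order_id must be unique.
        if row.get("order_id") in seen_ids:
            problems.append((i, "duplicate order_id"))
        seen_ids.add(row.get("order_id"))
        # Rule 2: amount must be a non-negative number.
        if not isinstance(row.get("amount"), (int, float)) or row["amount"] < 0:
            problems.append((i, "amount must be a non-negative number"))
        # Rule 3: city must be present.
        if not row.get("city"):
            problems.append((i, "city is missing"))
    return problems

# Hypothetical rows, two of which break a rule.
rows = [
    {"order_id": 1, "city": "Kolkata", "amount": 120.0},
    {"order_id": 2, "city": "", "amount": 125.0},
    {"order_id": 2, "city": "Delhi", "amount": -5},
]

issues = validate(rows)
for index, problem in issues:
    print(f"row {index}: {problem}")
```

Returning a list of problems, instead of stopping at the first failure, lets you see every issue in one pass and decide whether the data is ready to be analyzed.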
Data Publishing
After validating your data, you're
ready to publish it. In this step,
you'll put it into whatever
format you prefer for sharing
with other organization
members for analysis purposes.
Use written reports or digital
files, depending on the nature of
the data and the organization's
overarching goals.
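As a small sketch of publishing to digital files, the snippet below serializes the same validated records to CSV (convenient for spreadsheets) and JSON (convenient for other programs). The records are hypothetical, and the output is built in memory here; in practice the text would be written to a file or shared location.

```python
import csv
import io
import json

# Hypothetical validated records ready to be shared.
validated = [
    {"order_id": 1, "city": "Kolkata", "amount": 120.0},
    {"order_id": 2, "city": "Delhi", "amount": 125.0},
]

# CSV: one header row followed by one line per record.
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer, fieldnames=["order_id", "city", "amount"], lineterminator="\n"
)
writer.writeheader()
writer.writerows(validated)
csv_text = buffer.getvalue()

# JSON: preserves field names and numeric types for downstream tools.
json_text = json.dumps(validated, indent=2)

print(csv_text)
print(json_text)
```

Which format to publish in depends on who will consume the data, which matches the point above about choosing a format that suits the organization's goals.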
Motivation for a Future Employee
The data-wrangling market itself is predicted to remain strong. According to
Mordor Intelligence, the market could reach $2.28 billion USD by 2026, up
from $1.31 billion USD in 2020.
Your job outlook will likely depend on the role you ultimately choose to
pursue. The average annual salaries for several common roles include:
Who Oversees the Data Science Process?
• Business Managers: The business managers are the people in charge of overseeing the data
science training method. Their primary responsibility is to collaborate with the data science
team to characterize the problem and establish an analytical method.
• IT Managers: Data science teams are constantly monitored and resourced accordingly to
ensure that they operate efficiently and safely. They may also be in charge of creating and
maintaining IT environments for data science teams.
• Data Science Managers: The data science managers make up the final section of the team.
They primarily trace and supervise the working procedures of all data science team
members. They also manage and keep track of the day-to-day activities of the three data
science teams. They are team builders who can blend project planning and monitoring with
team growth.
Data scientists are among the most recent analytical data professionals who have the technical
ability to handle complicated issues as well as the desire to investigate what questions need to
be answered.
On a daily basis, a data scientist may do the following tasks:
• Discover patterns and trends in datasets to get insights.
• Create forecasting algorithms and data models.
• Improve the quality of data or product offerings by utilising machine
learning techniques.
• Distribute suggestions to other teams and top management.
• In data analysis, use data tools such as R, SAS, Python, or SQL.
• Stay on top of innovations in the field of data science.
What Does a Data Scientist Do?
• A data scientist analyzes business data to extract meaningful insights. In other words, a data scientist
solves business problems through a series of steps, including:
• Before tackling the data collection and analysis, the data scientist determines the problem by asking the
right questions and gaining understanding.
• The data scientist then determines the correct set of variables and data sets.
• The data scientist gathers structured and unstructured data from many disparate sources—enterprise
data, public data, etc.
• Once the data is collected, the data scientist processes the raw data and converts it into a format
suitable for analysis. This involves cleaning and validating the data to guarantee uniformity,
completeness, and accuracy.
• After the data has been rendered into a usable form, it’s fed into the analytic system—ML algorithm or a
statistical model. This is where the data scientists analyze and identify patterns and trends.
• When the data has been completely rendered, the data scientist interprets the data to find
opportunities and solutions.
• The data scientists finish the task by preparing the results and insights to share with the appropriate
stakeholders and communicating the results.
Use of Data Science
• Data science may detect patterns in seemingly unstructured or
unconnected data, allowing conclusions and predictions to be made.
• Tech businesses that acquire user data can utilise strategies to
transform that data into valuable or profitable information.
• Data Science has also made inroads into the transportation industry,
such as with driverless cars.
• Data Science applications provide a better level of therapeutic
customization through genetics and genomics research.
Where Do You Fit in Data Science?
• Data science offers you the opportunity to focus on and specialize in
one aspect of the field.
1. Data Scientist
Job role: Determine what the problem is, what questions need answers, and where to
find the data. Also, they mine, clean, and present the relevant data.
Skills needed: Programming skills (SAS, R, Python), storytelling and data visualization,
statistical and mathematical skills, knowledge of Hadoop, SQL, and Machine Learning.
2. Data Analyst
Job role: Analysts bridge the gap between the data scientists and the business analysts,
organizing and analyzing data to answer the questions the organization poses. They take
the technical analyses and turn them into qualitative action items.
Skills needed: Statistical and mathematical skills, programming skills (SAS, R, Python),
plus experience in data wrangling and data visualization.
3. Data Engineer
Job role: Data engineers focus on developing, deploying, managing, and optimizing the
organization’s data infrastructure and data pipelines. Engineers support data scientists by
helping to transfer and transform data for queries.
Skills needed: NoSQL databases (e.g., MongoDB, Cassandra DB), programming languages
such as Java and Scala, and frameworks (Apache Hadoop).
Data Science Tools
Data Analysis: SAS, Jupyter, RStudio, MATLAB, Excel, RapidMiner
Data Warehousing: Informatica/Talend, AWS Redshift
Data Visualization: Jupyter, Tableau, Cognos, RAW
Machine Learning: Spark MLlib, Mahout, Azure ML Studio
Applications of Data Science
Data science has found its applications in almost every industry.
Healthcare: Healthcare companies are using data science to build sophisticated medical
instruments to detect and cure diseases.
Gaming: Video and computer games are now being created with the help of data science
and that has taken the gaming experience to the next level.
Image Recognition: Identifying patterns in images and detecting objects in an image is
one of the most popular data science applications.
Recommendation Systems: Netflix and Amazon give movie and product
recommendations based on what you like to watch, purchase, or browse on their
platforms.
Logistics: Data Science is used by logistics companies to optimize routes to ensure faster
delivery of products and increase operational efficiency.
Fraud Detection: Banking and financial institutions use data science and related
algorithms to detect fraudulent transactions.
Internet Search: When we think of search, we immediately think of Google. Right?
Speech recognition: Speech recognition is dominated by data science techniques.
We may see the excellent work of these algorithms in our daily lives. Have you ever
needed the help of a virtual speech assistant like Google Assistant, Alexa, or Siri?
Targeted Advertising: If you thought Search was the most essential data science
use, consider this: the whole digital marketing spectrum. From display banners on
various websites to digital billboards at airports, data science algorithms are utilised
to identify almost anything.
Airline Route Planning: As a result of data science, it is easier to predict flight delays
for the airline industry, which is helping it grow.
Augmented Reality: Last but not least, augmented reality appears to be one of
the most fascinating data science applications for the future.
Example of Data Science
Here are some brief overviews of a couple of use cases, showing data science’s
versatility.
Law Enforcement: In this scenario, data science is used to help police in Belgium to
better understand where and when to deploy personnel to prevent crime.
Pandemic Fighting: The state of Rhode Island wanted to reopen schools, but was
naturally cautious, considering the ongoing COVID-19 pandemic. The state used data
science to expedite case investigations and contact tracing, enabling a small staff to
handle an overwhelming number of concerned calls from citizens. This information
helped the state set up a call center and coordinate preventative measures.
Driverless Vehicles: Lunewave, a sensor manufacturing company, was looking for a way
to make sensor technology more cost-effective and accurate. They turned to data
science and machine learning to train their sensors to be safer and more reliable, as well
as using data to improve their 3D-printed sensor manufacturing process.
Entertainment: Data science enables streaming services to follow and evaluate what
consumers view, which aids in the creation of new TV series and films.
Finance: Banks and credit card firms mine and analyze data in order to detect
fraudulent activities, manage financial risks on loans and credit lines, and assess
client portfolios in order to uncover upselling possibilities.
Manufacturing: Data science applications in manufacturing include supply chain
management and distribution optimization, as well as predictive maintenance to
anticipate probable equipment faults in facilities before they occur.
Healthcare: Machine learning models and other data science components are
used by hospitals and other healthcare providers to automate X-ray analysis and
assist doctors in diagnosing illnesses and planning treatments based on previous
patient outcomes.
Retail: Retailers evaluate client behavior and purchasing trends in order to
provide individualized product suggestions as well as targeted advertising,
marketing, and promotions. Data science also assists them in managing product
inventories and supply chains in order to keep items in stock.