1-Need For Data Science-13!12!2024

This document discusses the importance of data science, its processes, and applications across various industries, emphasizing the need for effective data management and analysis. It outlines the components, tools, and skills required for data scientists, as well as the distinction between business intelligence and data science. The module concludes with a summary of data science's impact on daily life and its role in enhancing decision-making.

Uploaded by

Arth Agrawal
Copyright
© All Rights Reserved

Foundation of Data Science

Module-1
Importance of Data Science
• 1.1 Need for Data Science
• 1.2 What Is Data Science?
• 1.3 Data Science Process
• 1.4 Business Intelligence and Data Science
• 1.5 Prerequisites for a Data Scientist
• 1.6 Components of Data Science
• 1.7 Tools and Skills Needed
• 1.8 Summary
How much data is generated?
• Approximately 402.74 million terabytes of data are created each day
• Around 147 zettabytes of data will be generated this year
• 181 zettabytes of data will be generated in 2025
• Videos account for over half of internet data traffic
• The US has over 2,700 data centers
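The daily and yearly figures above can be cross-checked with simple unit arithmetic. The following is a minimal sketch, assuming decimal (SI) units, where 1 terabyte is 10^12 bytes and 1 zettabyte is 10^21 bytes, and a 365-day year. The variable names are illustrative only.

```python
# Cross-check: does ~402.74 million TB per day add up to ~147 ZB per year?
# Assumes SI units (1 TB = 10**12 bytes, 1 ZB = 10**21 bytes) and 365 days.
TB = 10**12
ZB = 10**21

daily_terabytes = 402.74e6               # daily figure quoted on the slide
daily_bytes = daily_terabytes * TB       # convert terabytes to bytes
yearly_zettabytes = daily_bytes * 365 / ZB

print(f"{yearly_zettabytes:.1f} ZB per year")
```

Under these assumptions the daily figure is consistent with the yearly estimate of roughly 147 zettabytes.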
Proportion of Internet Data Traffic

Category        Traffic
Video           53.72%
Social          12.69%
Gaming          9.86%
Web browsing    5.67%
Messaging       5.35%
Marketplace     4.54%
File sharing    3.74%
Cloud           2.73%
VPN             1.39%
Audio           0.31%
Type of Media                           Amount per Minute   Amount per Day
Emails sent                             231.4 million       333.22 billion
Crypto purchased                        90.2 million        129.89 billion
Texts sent                              16 million          24.04 billion
Google searches                         5.9 million         8.5 billion
Snaps shared on Snapchat                2.43 million        3.5 billion
Pieces of content shared on Facebook    1.7 million         2.45 billion
Swipes on Tinder                        1.1 million         1.58 billion
Hours streamed                          1 million           1.44 billion
USD spent on Amazon                     443,000             637.92 million
USD sent on Venmo                       437,600             630.14 million
Tweets shared on Twitter                347,200             499.97 million
Hours spent in Zoom meetings            104,600             150.62 million
USD spent on DoorDash                   76,400              110.02 million


• YouTube
• Spotify
• Netflix
Rank   Country       Region          Number of Data Centers
1      US            North America   5,387
2      Germany       Europe          517
3      UK            Europe          513
4      China         Asia            449
5      Canada        North America   336
6      France        Europe          315
7      Australia     Oceania         308
8      Netherlands   Europe          297
9      Russia        Europe/Asia     251
10     Japan         Asia            220


Need for Data Science
• Some years ago, there was less data, and most of it was available in a
structured form that could easily be stored in Excel sheets and
processed using BI tools.
• In today's world, data has become vast: approximately 2.5 quintillion
bytes of data are generated every day, which has led to a data
explosion.
• While travelling by road, we can see a lot of data being created,
for example, vehicle speeds, traffic light switching, and Google Maps
data, which is captured through satellites and transmitted to handheld
devices in real time.
Need for Data Science
• The problem is that we are not doing anything with the data.
We are not able to analyze it because we lack efficient scientific
methods for extracting insights.
• We are creating data, but we are not utilizing it for behavioral
analysis or predictions. If we look at the corporate world, we
find that lots of data reports are generated.
Need for Data Science(cont..)
• Every company requires data to work, grow, and improve
its business.
• Handling such a huge amount of data is a challenging
task for every organization.
• To handle, process, and analyze this data, we require
complex, powerful, and efficient algorithms and technology,
and that technology came into existence as data science.
• The following are some main reasons for using data science
technology:
Need for Data Science(cont..)
• With the help of data science technology, we can convert
massive amounts of raw and unstructured data into meaningful
insights.
• Data science technology is being adopted by various companies,
from big brands to startups. Google, Amazon, Netflix, and others,
which handle huge amounts of data, use data science
algorithms for a better customer experience.
• Data science is helping to automate transportation, for example by
creating self-driving cars, which are the future of transportation.
• Data science can help with different predictions, such as survey
results, elections, and flight ticket confirmation.
Need for Data Science
Data science and airline industries
Use of data science in an airline operating system
Use of data science in Fake News Detection
Need for Data Science(cont..)
The main purpose of data science is to support
better decision making.
Data science domains
What Is Data Science?
Generation of Data Science in LinkedIn:
• The LinkedIn story is the best example of the role of a data
scientist.
• Jonathan Goldman joined LinkedIn in 2006.
• When he joined, LinkedIn was not as popular as it is today.
• LinkedIn's management was simply trying to get more people to
connect by searching for related data about people.
• There was little activity or discussion in the network matching
members' interests, so members had started leaving LinkedIn.
• He concluded that something was missing in their data.
What Is Data Science?
Generation of Data Science in LinkedIn:
• Jonathan Goldman introduced the idea known as the LinkedIn
recommendation system.
• Through the recommendation system, they could target advertisements
for various products.
• After multiple features and interest domains were added, LinkedIn's
management saw that traffic increased as people started browsing and
finding like-minded contacts, jobs, and business through the system,
and LinkedIn began to flourish.
• The way Jonathan Goldman utilized the data is the job of a
data scientist.
What is Data Science?
• “Data science, also known as data-driven science, is an
interdisciplinary field of scientific methods, processes, algorithms and
systems to extract knowledge or insights from data in various forms,
either structured or unstructured, similar to data mining.”
• “Data science intends to analyze and understand actual phenomena
with ‘data’. In other words, the aim of data science is to reveal the
features or the hidden structure of complicated natural, human, and
social phenomena with data from a different point of view from the
established or traditional theory and method.”
What Is Data Science?
• Data science is the in-depth study of massive amounts of data. It
involves extracting meaningful insights from raw, structured, and
unstructured data, which is processed using the scientific method,
different technologies, and algorithms.
• It is a multidisciplinary field that uses tools and techniques to
manipulate the data so that you can find something new and
meaningful.
• Data science uses the most powerful hardware, programming
systems, and most efficient algorithms to solve data-related
problems. It is the future of artificial intelligence.
What Is Data Science?
• Asking the correct questions and analyzing the raw data.
• Modeling the data using various complex and efficient algorithms.
• Visualizing the data to get a better perspective.
• Understanding the data to make better decisions and finding the
final result.
Data Science Application
Data Science Process
Data Science Process
• The first step is asking the right question and exploring
the data.
The problem can be understood by asking the right
questions.
The available input data may be incomplete, in which case
exploratory analysis can be done.
Data cleansing, which is a part of exploratory analysis, can also
be done on the raw data to make the input accurate.
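The cleansing step described above can be sketched in a few lines of Python. This is only an illustration: the records, the field names (`city`, `speed_kmph`), and the validity rule are all made up for the example.

```python
# Exploratory analysis and cleansing on a small, made-up data set.
# None marks a missing value; the field names are hypothetical.
from statistics import mean

raw_records = [
    {"city": "Pune", "speed_kmph": 42.0},
    {"city": "Pune", "speed_kmph": None},     # incomplete record
    {"city": "Delhi", "speed_kmph": 35.5},
    {"city": "Delhi", "speed_kmph": -10.0},   # impossible value
    {"city": "Mumbai", "speed_kmph": 28.5},
]

def is_valid(record):
    """Keep records whose speed is present and non-negative."""
    speed = record["speed_kmph"]
    return speed is not None and speed >= 0

# Cleansing: drop records that fail the validity rule.
clean_records = [r for r in raw_records if is_valid(r)]
dropped = len(raw_records) - len(clean_records)

# Exploration: a simple summary of the cleaned input.
average_speed = mean(r["speed_kmph"] for r in clean_records)

print(f"kept {len(clean_records)} records, dropped {dropped}")
print(f"average speed: {average_speed:.2f} km/h")
```

In practice the validity rule depends on the question being asked, which is why asking the right question comes first.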
Data Science Process
• The next step is modeling.
Decide which machine learning algorithm and model to use,
and then train the model accordingly.
This is the modeling process; once the model is trained, the data
is run through it.
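As a rough illustration of the modeling step, the sketch below "trains" one of the simplest possible models, a straight line fitted by ordinary least squares, and then runs a new input through it. The choice of a linear model and the monthly sales figures are assumptions made for the example, not something the slides prescribe.

```python
# Fit y = slope * x + intercept by ordinary least squares, then predict.
# The data points are invented for illustration.

def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Run a new input through the fitted model."""
    slope, intercept = model
    return slope * x + intercept

months = [1, 2, 3, 4, 5]
sales = [12.0, 14.0, 16.0, 18.0, 20.0]   # hypothetical, perfectly linear

model = fit_line(months, sales)          # the "training" step
forecast = predict(model, 6)             # prediction for month 6
print(model, forecast)
```

Real projects usually compare several algorithms and hold back part of the data to check how well the trained model generalizes.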
Data Science Process
• The final step is visualizing and communicating the results.
After the data has been run through the model, the final results
are visualized and prepared in a form that people can
understand.
This could be a set of PowerPoint slides or a dashboard,
so that the results are communicated in a way the intended
users can easily understand.
Business Intelligence
• Business intelligence (BI) is software that ingests business data
and presents it in user-friendly views such as reports, dashboards,
charts and graphs.

• Earlier, manufacturing and selling were managed through
enterprise resource planning (ERP) systems or customer
relationship management (CRM) systems.

• ERP and CRM systems run on relational database management systems
(RDBMS) such as Oracle and MySQL, which are queried with SQL and use
structured data inputs.
Business Intelligence and Data Science
• Business intelligence uses structured data that is reported on
dashboards. Data science uses structured data and also a lot of
unstructured data, for example, web content or comments.
• Customer feedback may be structured if gathered in formatted
rubrics, or unstructured if written as free text for performance
analysis. That is, data sources may be in different formats.
• Business intelligence uses statistics such as correlations and
regressions to predict what future sales might be. Data science
requires many more skills than business intelligence.
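To make the mention of correlations concrete, here is a minimal sketch of the Pearson correlation coefficient computed by hand. The advertising and sales numbers are invented, and the function name `pearson` is just a label chosen for this example.

```python
# Pearson correlation between two equally long lists of numbers.
# Values near +1 or -1 indicate a strong linear relationship.
from math import sqrt

def pearson(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

ad_spend = [10, 20, 30, 40, 50]      # hypothetical monthly ad spend
sales = [25, 41, 62, 79, 103]        # hypothetical monthly sales

r = pearson(ad_spend, sales)
print(f"correlation between ad spend and sales: {r:.3f}")
```

A strong correlation in historical data is what makes a regression-based sales forecast plausible, although correlation alone does not show that one quantity causes the other.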
Business Intelligence and Data Science
The focus of business intelligence is mostly on historical
data.
In data science, historical data is combined with other
required information to try to predict the future.
Difference between BI and Data Science
Business intelligence vs data science

Objectives
Business Intelligence: Focuses on identifying historical trends; answers
questions such as what happened during the last period and what trends
are developing.
Data Science: Extracts information from datasets and creates forecasts;
answers the question of what will happen or which is the most likely
outcome.

Skills requirements
Business Intelligence: Basic statistics and business knowledge, as well
as data transformation and visualization skills.
Data Science: More technical skillset like coding, data mining, as well
as more advanced statistics and domain knowledge.

Data collection and management
Business Intelligence: Designed to manage well-organized data.
Data Science: Designed to manage a large volume of dynamic and less
structured data.

Complexity
Business Intelligence: More practical in daily business management;
less costly and requires fewer resources.
Data Science: More complex in terms of capacity for forecasting, ability
to manage dynamic data, and requirements for more advanced skills.
Prerequisite for Data Science
Non-Technical Prerequisite
Curiosity
Critical Thinking
Communication skills
Technical Prerequisite
Machine learning
Mathematical modelling
Statistics
Computer programming
Databases
Non-Technical Prerequisite
Curiosity: To learn data science, one must be curious. When
you are curious and ask many questions, you can
understand the business problem more easily.
Critical Thinking: A data scientist also needs critical thinking in
order to find multiple new ways to solve a problem efficiently.
Communication skills: Communication skills are most important
for a data scientist because after solving a business problem, you
need to communicate the solution to the team.
Technical Prerequisite
Machine learning: To understand data science, one needs to understand
the concept of machine learning. Data science uses machine learning
algorithms to solve various problems.
Mathematical modeling: Mathematical modeling is required to make fast
mathematical calculations and predictions from the available data.
Statistics: A basic understanding of statistics, such as the mean,
median, and standard deviation, is required. It is needed to extract
knowledge and obtain better results from the data.
Computer programming: For data science, knowledge of at least one
programming language is required. R and Python are commonly used
languages, and Spark is a widely used data-processing framework.
Databases: An in-depth understanding of databases and of SQL is
essential for data science to get the data and to work with data.
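The basic statistics named above (mean, median, and standard deviation) are available in Python's standard `statistics` module. The sketch below uses a small invented sample and computes the population standard deviation with `pstdev`; `stdev` would give the sample version instead.

```python
# Mean, median, and standard deviation with the standard library.
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]    # invented sample

mean_value = statistics.mean(values)       # arithmetic average
median_value = statistics.median(values)   # middle of the sorted values
std_dev = statistics.pstdev(values)        # population standard deviation

print(mean_value, median_value, std_dev)
```

The mean and median differ here because the larger values pull the average upward, which is the kind of observation these summary statistics are meant to surface.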
Data Science Components
1. Statistics: Statistics is one of the most important components of data
science. Statistics is a way to collect and analyze large amounts of
numerical data and to find meaningful insights in it.

2. Domain Expertise: In data science, domain expertise binds data science
together. Domain expertise means specialized knowledge or skills in a
particular area. In data science, there are various areas for which we need
domain experts.

3. Data engineering: Data engineering is a part of data science that
involves acquiring, storing, retrieving, and transforming the data. Data
engineering also includes adding metadata (data about data) to the data.
Data Science Components(cont..)
4. Visualization: Data visualization means representing data in a visual
context so that people can easily understand the significance of the data.
Data visualization makes large amounts of data easy to take in.
5. Advanced computing: Advanced computing does the heavy lifting of data
science. It involves designing, writing, debugging, and maintaining
the source code of computer programs.
6. Mathematics: Mathematics is a critical part of data science. Mathematics
involves the study of quantity, structure, space, and change. For a data
scientist, a good knowledge of mathematics is essential.
7. Machine learning: Machine learning is the backbone of data science.
Machine learning is about training a machine so that it can act like a
human brain. In data science, we use various machine learning algorithms to
solve problems.
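As a small illustration of the machine learning component, the sketch below implements a one-nearest-neighbor classifier: it labels a new point with the label of the closest training example. This is one simple algorithm chosen for the example, and the training points and labels are invented.

```python
# One-nearest-neighbor classification using Euclidean distance.
from math import dist

def nearest_neighbor(training, point):
    """Return the label of the training example closest to point."""
    _features, label = min(training, key=lambda example: dist(example[0], point))
    return label

# Each training example is (features, label); the values are hypothetical.
training = [
    ((1.0, 1.0), "small"),
    ((1.5, 2.0), "small"),
    ((8.0, 9.0), "large"),
    ((9.0, 8.5), "large"),
]

print(nearest_neighbor(training, (2.0, 1.5)))   # closest to the "small" group
print(nearest_neighbor(training, (8.5, 9.5)))   # closest to the "large" group
```

Here the "training" is nothing more than storing labeled examples; most other algorithms instead adjust internal parameters while learning from the data.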
Tools and Skills Needed
• Tools Needed
Data Analysis tools: R, Python, Statistics, SAS, Jupyter, RStudio,
MATLAB, Excel, RapidMiner.

Data Warehousing: ETL, SQL, Hadoop, Informatica/Talend, AWS Redshift.

Data Visualization tools: R, Jupyter, Tableau, Cognos.

Machine learning tools: Spark, Mahout, Azure ML Studio.
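To show how ETL and SQL from the list above fit together, here is a minimal extract-transform-load sketch using only Python's standard library. It uses an in-memory SQLite database as a stand-in for a real warehouse such as Redshift; the CSV content, table name, and column names are all hypothetical.

```python
# A tiny ETL pipeline: CSV text -> cleaned rows -> SQLite table -> SQL query.
import csv
import io
import sqlite3

raw_csv = """order_id,region,amount
1,north,120.50
2,south,
3,north,79.50
4,south,200.00
"""

# Extract: parse the CSV text into dictionaries.
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop rows with a missing amount and convert the types.
cleaned = [
    (int(r["order_id"]), r["region"], float(r["amount"]))
    for r in rows
    if r["amount"]
]

# Load: insert the cleaned rows into an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)

# Query: total sales per region, using plain SQL.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(totals)
```

Dedicated ETL tools add scheduling, monitoring, and connectors for many data sources, but the three stages are the same.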


Tools and Skills Needed

• Skills Needed
To become a data scientist, one should have skills in technical
languages and tools such as R, SAS, SQL, Python, Hive, Pig, Apache
Spark, and MATLAB. Data scientists must also have an understanding
of statistics and mathematics, along with visualization and
communication skills.
Summary

• This module focuses on the need for data science and its impact on daily
life.
• The concept of data science is elaborated in great detail with its
applications in autonomous cars, airline industries, logistics, digital
marketing, and other possible data science domains.
• The data science process is defined precisely with an illustration of the
role of data science in business intelligence.
• The roles and responsibilities of data scientists, components of data
science, and the tools and skills needed to execute data science-based
applications are explored.
