
DATA SCIENCE UNIT-I

Mr. Pramod Jadhao

Unit I: Introduction to Data Science


What Is Data Science?
Data science is the domain of study that deals with vast volumes of data using modern tools and techniques
to find unseen patterns, derive meaningful information, and make business decisions. Data science uses
complex machine learning algorithms to build predictive models.
The data used for analysis can come from many different sources and be presented in various formats.
The Data Science Lifecycle / Data Science Process
Now that you know what data science is, let us focus on the data science lifecycle. The data science lifecycle consists of five distinct stages, each with its own tasks (a short code sketch follows the list):
1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves
gathering raw structured and unstructured data.
2. Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture.
This stage covers taking the raw data and putting it in a form that can be used.
3. Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data
scientists take the prepared data and examine its patterns, ranges, and biases to determine how useful
it will be in predictive analysis.
4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative
Analysis. Here is the real meat of the lifecycle. This stage involves performing the various analyses
on the data.
5. Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making. In this
final step, analysts prepare the analyses in easily readable forms such as charts, graphs, and reports.
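To make the stages concrete, here is a minimal sketch of the lifecycle in Python with pandas. The file name "sales_raw.csv" and the "region" and "revenue" columns are hypothetical stand-ins, not a prescribed workflow.

```python
# A minimal sketch of the five lifecycle stages on a toy dataset.
# Assumptions: "sales_raw.csv" exists with "region" and "revenue"
# columns; pandas and matplotlib are installed.
import pandas as pd
import matplotlib.pyplot as plt

# 1. Capture: acquire raw data (a CSV file stands in for any source).
raw = pd.read_csv("sales_raw.csv")

# 2. Maintain: clean and stage the raw data into a usable form.
clean = raw.dropna().drop_duplicates()

# 3. Process: examine ranges and patterns to judge usefulness.
print(clean.describe())

# 4. Analyze: a simple exploratory aggregation.
summary = clean.groupby("region")["revenue"].mean()

# 5. Communicate: present the result in an easily readable form.
summary.plot(kind="bar", title="Mean revenue by region")
plt.show()
```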
Prerequisites for Data Science
Here are some of the technical concepts you should know about before starting to learn data science.
1. Machine Learning: Machine learning is the backbone of data science. Data Scientists need to have a
solid grasp of ML in addition to basic knowledge of statistics.
2. Modeling: Mathematical models enable you to make quick calculations and predictions based on
what you already know about the data. Modeling is also a part of Machine Learning and involves
identifying which algorithm is the most suitable to solve a given problem and how to train these
models.
3. Statistics: Statistics are at the core of data science. A sturdy handle on statistics can help you extract
more intelligence and obtain more meaningful results.
4. Programming: Some level of programming is required to execute a successful data science project.
The most common programming languages are Python and R. Python is especially popular because it is easy to learn and supports multiple libraries for data science and ML (a short example follows this list).
5. Databases: A capable data scientist needs to understand how databases work, how to manage them,
and how to extract data from them.
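As promised above, here is a small sketch tying the Modeling, Statistics, and Programming prerequisites together: fitting and evaluating a simple predictive model with scikit-learn. The synthetic data keeps it self-contained; it is an illustration, not a recommended model.

```python
# Fitting and evaluating a simple predictive model on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 1))           # one synthetic feature
y = 3.0 * X.ravel() + rng.normal(0, 1.0, 200)   # linear signal + noise

# Hold out part of the data so the evaluation is honest.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```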
Need for Data Science
The principal purpose of Data Science is to find patterns within data. It uses various statistical techniques to analyze and draw insights from the data. From data extraction through wrangling and pre-processing, a Data Scientist must examine the data systematically.
Then comes the responsibility of making predictions from the data. The goal of a Data Scientist is to derive conclusions from the data, and through these conclusions the Data Scientist is able to assist companies in making smarter business decisions.


Big Data
What is Big Data?
Big Data literally means large amounts of data. Big data is the pillar behind the idea that one can make useful inferences from a large body of data that were not possible before with smaller datasets. Extremely large data sets can be analyzed computationally to reveal patterns, trends, and associations that are otherwise not transparent or easy to identify.
Why is everyone interested in Big Data?
Definition: Big Data refers to large volumes of structured, semi-structured, and unstructured data that require advanced tools and techniques for processing and analysis.
Big data is everywhere!
Every time you do something on the web, data is collected; every time you buy something from an e-commerce site, your data is collected. Whenever you go to a store, data is collected at the point of sale; when you do bank transactions, that data is there; when you use social networks like Facebook or Twitter, that data is collected. These are mostly social data, but the same thing is starting to happen with real engineering plants: real-time data is collected from plants all over the world. Beyond these, much more sophisticated simulations, such as molecular simulations, generate tons of data that is also collected and stored.

Characteristics:

o Volume: Massive amounts of data are generated and stored.
o Velocity: Data is generated and processed rapidly.
o Variety: Diverse types of data, including text, images, videos, sensor data, etc.
o Veracity: Concerns the accuracy and reliability of data.

How much data is Big Data?

 Google processes 20 petabytes (PB) per day (2008)
 Facebook has 2.5 PB of user data + 15 TB per day (2009)
 eBay has 6.5 PB of user data + 50 TB per day (2009)
 CERN's Large Hadron Collider (LHC) generates 15 PB a year
So one of the reasons for the acceleration of data science in recent years is the enormous volume of data (i.e., Big Data) currently available and being generated. Not only are huge amounts of data being collected about many aspects of the world and our lives, but we concurrently have the rise of inexpensive computing. This has formed the perfect storm: we have rich data and the tools to analyze it. Advancing computer memory capacities, more enhanced software, more competent processors, and, now, more numerous data scientists with the skills to put all of this to use and answer questions with the data have come together. And that is the big reason why we need data science in the future.

Difference between Big Data and Little Data

Feature | Little Data | Big Data
Technology | Traditional | Modern
Collection | Generally obtained in an organized manner and inserted into a database | Collected by pipelines with queues such as AWS Kinesis or Google Pub/Sub to balance high-speed data
Volume | Data in the range of tens or hundreds of gigabytes | Size of data is more than terabytes
Analysis Areas | Data marts (analysts) | Clusters (data scientists), data marts (analysts)
Quality | Contains less noise, as data is collected in a controlled manner | Usually, the quality of data is not guaranteed
Processing | Requires batch-oriented processing pipelines | Has both batch and stream processing pipelines
Database | SQL | NoSQL
Velocity | A regulated and constant flow of data; data aggregation is slow | Data arrives at extremely high speeds; large volumes are aggregated in a short time
Structure | Structured data in tabular format with a fixed schema (relational) | A wide variety of data sets, including tabular data, text, audio, images, video, logs, JSON, etc. (non-relational)
Scalability | Usually vertically scaled | Mostly based on horizontally scaling architectures, which give more versatility at a lower cost
Query Language | SQL only | Python, R, Java, SQL
Hardware | A single server is sufficient | Requires more than one server
Value | Business intelligence, analysis, and reporting | Complex data mining techniques for pattern finding, recommendation, prediction, etc.
Optimization | Data can be optimized manually (human-powered) | Requires machine learning techniques for data optimization
Storage | Storage within enterprises, local servers, etc. | Usually requires distributed storage systems on the cloud or in external file systems
People | Data Analysts, Database Administrators, and Data Engineers | Data Scientists, Data Analysts, Database Administrators, and Data Engineers
Security | User privileges, data encryption, hashing, etc. | Much more complicated; best practices include data encryption, cluster network isolation, strong access control protocols, etc.
Nomenclature | Database, Data Warehouse, Data Mart | Data Lake
Infrastructure | Predictable resource allocation; mostly vertically scalable hardware | More agile infrastructure with horizontally scalable hardware


The Current Scenario of Data Science


Expansion of Applications: Data science is increasingly being applied across diverse domains such as
healthcare, finance, retail, manufacturing, and transportation. It plays a crucial role in optimizing processes,
improving decision-making, and driving innovation.

Integration with AI and Machine Learning: Data science heavily intersects with artificial intelligence
(AI) and machine learning (ML). AI and ML algorithms are utilized for predictive analytics, pattern
recognition, natural language processing (NLP), and computer vision tasks, among others.

Big Data Infrastructure: With the proliferation of big data, data science projects often involve managing and analyzing large datasets using distributed computing frameworks like Hadoop and Spark, as well as cloud-based solutions provided by Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
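For a flavor of what analyzing a large dataset with Spark looks like, here is a minimal PySpark sketch. It assumes the pyspark package is installed; the input file "events.json" and the "event_type" column are hypothetical.

```python
# A minimal PySpark sketch of distributed analysis.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("big-data-demo").getOrCreate()

# Spark reads the file in parallel across cluster nodes (or local cores).
events = spark.read.json("events.json")   # semi-structured input

# Aggregations are distributed automatically over the partitions.
counts = events.groupBy("event_type").agg(F.count("*").alias("n"))
counts.show()

spark.stop()
```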

Focus on Ethical and Responsible AI: There is a growing emphasis on ethical considerations in data
science and AI applications. Issues such as bias in algorithms, data privacy, transparency, and
accountability are gaining attention, leading to frameworks and guidelines being developed to address these
concerns.

Emerging Technologies: Data science is embracing emerging technologies such as edge computing, the Internet of Things (IoT), and blockchain, which generate new types of data and require innovative approaches for analysis and integration.

Interdisciplinary Collaboration: Data science teams often consist of professionals with diverse
backgrounds in statistics, mathematics, computer science, domain expertise (e.g., healthcare, finance), and
business acumen. Collaborative efforts are essential for successful implementation and deployment of data-
driven solutions.

Demand for Data Professionals: There is a high demand for skilled data scientists, data engineers, and
analysts across industries. Organizations are investing in building data science capabilities to gain
competitive advantage and drive growth.

Education and Training: Educational institutions and online platforms offer a wide range of courses and
programs in data science, catering to individuals seeking to enter or advance their careers in this field.
Continuous learning and upskilling are essential due to the rapid pace of technological change.

Visualization and Communication: Effective data visualization and communication skills are crucial for
data scientists to convey insights and recommendations to stakeholders, aiding in decision-making
processes.

Regulatory Landscape: Data science practices are influenced by regulatory frameworks such as GDPR
(General Data Protection Regulation) in Europe and similar data protection laws globally. Compliance with
these regulations is essential for ethical data handling and user privacy.

What is structured data?
Structured data — typically categorized as quantitative data — is highly organized and easily decipherable
by machine learning algorithms. Developed by IBM in 1974, structured query language (SQL) is the
programming language used to manage structured data. By using a relational (SQL) database, business users
can quickly input, search and manipulate structured data.
Pros and cons of structured data
Examples of structured data include dates, names, addresses, credit card numbers, etc. Their benefits are tied to ease of use and access, while liabilities revolve around data inflexibility:
Pros
 Easily used by machine learning (ML) algorithms: The specific and organized architecture of
structured data eases manipulation and querying of ML data.
 Easily used by business users: Structured data does not require an in-depth understanding of different
types of data and how they function. With a basic understanding of the topic relative to the data, users
can easily access and interpret the data.
 Accessible by more tools: Since structured data predates unstructured data, there are more tools
available for using and analyzing structured data.
Cons
 Limited usage: Data with a predefined structure can only be used for its intended purpose, which
limits its flexibility and usability.
 Limited storage options: Structured data is generally stored in data storage systems with rigid
schemas (e.g., “data warehouses”). Therefore, changes in data requirements necessitate an update of
all structured data, which leads to a massive expenditure of time and resources.

Example of structured data in a tabular format:

Consider a simple table representing sales data for a fictional company:

Order ID | Customer Name | Product Name | Quantity | Unit Price | Total Amount | Order Date
1001 | John Doe | Laptop | 2 | $1200 | $2400 | 2024-07-10
1002 | Jane Smith | Smartphone | 1 | $800 | $800 | 2024-07-11
1003 | David Brown | Tablet | 3 | $500 | $1500 | 2024-07-12
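The same table can be created and queried with SQLite (one of the tools listed below), showing how a fixed schema makes structured data easy to input, search, and manipulate. This is a minimal sketch using Python's built-in sqlite3 module.

```python
# Recreating the sales table above in SQLite and querying it with SQL.
import sqlite3

con = sqlite3.connect(":memory:")   # throwaway in-memory database
con.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    customer TEXT, product TEXT,
    quantity INTEGER, unit_price REAL, order_date TEXT)""")

rows = [
    (1001, "John Doe",    "Laptop",     2, 1200.0, "2024-07-10"),
    (1002, "Jane Smith",  "Smartphone", 1,  800.0, "2024-07-11"),
    (1003, "David Brown", "Tablet",     3,  500.0, "2024-07-12"),
]
con.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?)", rows)

# The fixed schema makes aggregate queries straightforward.
for order_id, total in con.execute(
        "SELECT order_id, quantity * unit_price FROM orders"):
    print(order_id, total)
```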

Structured data tools


 OLAP: Performs high-speed, multidimensional data analysis from unified, centralized data stores.
 SQLite: Implements a self-contained, server-less, zero-configuration, transactional relational
database engine.
 MySQL: Embeds data into mass-deployed software, particularly mission-critical, heavy-load production systems.
 PostgreSQL: Supports SQL and JSON querying as well as high-tier programming languages (C/C++, Java, Python, etc.).
Use cases for structured data
 Customer relationship management (CRM): CRM software runs structured data through analytical
tools to create datasets that reveal customer behavior patterns and trends.
 Online booking: Hotel and ticket reservation data (e.g., dates, prices, destinations, etc.) fits the
“rows and columns” format indicative of the pre-defined data model.
 Accounting: Accounting firms or departments use structured data to process and record financial transactions.


What is unstructured data?
Unstructured data, typically categorized as qualitative data, cannot be processed and analyzed via
conventional data tools and methods. Since unstructured data does not have a predefined data model, it is
best managed in non-relational (NoSQL) databases. Another way to manage unstructured data is to use data
lakes to preserve it in raw form.
The importance of unstructured data is rapidly increasing. Recent projections indicate that unstructured data
is over 80% of all enterprise data, while 95% of businesses prioritize unstructured data management.
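Here is a minimal sketch of what "no predefined data model" means in practice: heterogeneous records kept in their native JSON form, the way a document store or data lake preserves raw data. The record fields and file name below are made up for illustration.

```python
# "Schemaless" storage: each record carries its own fields, with no
# table structure defined up front.
import json

documents = [
    {"type": "tweet",  "text": "Great product!", "likes": 12},
    {"type": "sensor", "device": "thermo-7", "reading": 21.4},
    {"type": "image",  "path": "img/001.jpg", "tags": ["cat", "outdoor"]},
]

# Write the raw documents as JSON Lines, data-lake style.
with open("lake.jsonl", "w") as f:
    for doc in documents:
        f.write(json.dumps(doc) + "\n")

# Data stays raw until a consumer decides how to interpret it.
with open("lake.jsonl") as f:
    records = [json.loads(line) for line in f]
tweets = [r for r in records if r["type"] == "tweet"]
print(tweets)
```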
Pros and cons of unstructured data
Examples of unstructured data include text, mobile activity, social media posts, Internet of Things (IoT)
sensor data, etc. Their benefits involve advantages in format, speed and storage, while liabilities revolve
around expertise and available resources:
Pros
 Native format: Unstructured data, stored in its native format, remains undefined until needed. Its adaptability allows many file formats in the data store, which widens the data pool and enables data scientists to prepare and analyze only the data they need.
 Fast accumulation rates: Since there is no need to predefine the data, it can be collected quickly and
easily.
 Data lake storage: Allows for massive storage and pay-as-you-use pricing, which cuts costs and eases
scalability.
Cons
 Requires expertise: Due to its undefined/non-formatted nature, data science expertise is required to
prepare and analyze unstructured data. This is beneficial to data analysts but alienates unspecialized
business users who may not fully understand specialized data topics or how to utilize their data.
 Specialized tools: Specialized tools are required to manipulate unstructured data, which limits
product choices for data managers.
Unstructured data tools
 MongoDB: Uses flexible documents to process data for cross-platform applications and services.
 DynamoDB: Delivers single-digit millisecond performance at any scale via built-in security, in-memory caching, and backup and restore.
 Hadoop: Provides distributed processing of large data sets using simple programming models and
no formatting requirements.
 Azure: Enables agile cloud computing for creating and managing apps through Microsoft’s data
centers.
Use cases for unstructured data
 Data mining: Enables businesses to use unstructured data to identify consumer behavior, product
sentiment, and purchasing patterns to better accommodate their customer base.
 Predictive data analytics: Alerts businesses to important activity ahead of time so they can properly plan for and adjust to significant market shifts.
 Chatbots: Perform text analysis to route customer questions to the appropriate answer sources.
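To illustrate the chatbot use case above, here is a toy keyword-matching router in Python. The routes and keywords are hypothetical; real chatbots use far more sophisticated text analysis.

```python
# A toy router: send a customer question to an answer source by
# counting keyword overlaps. Routes and keywords are made up.
ROUTES = {
    "billing":  {"invoice", "refund", "charge", "payment"},
    "shipping": {"delivery", "track", "shipping", "package"},
    "support":  {"error", "broken", "crash", "help"},
}

def route(question: str) -> str:
    words = set(question.lower().split())
    # Pick the route whose keyword set overlaps the question the most.
    best = max(ROUTES, key=lambda r: len(ROUTES[r] & words))
    return best if ROUTES[best] & words else "general"

print(route("Where can I track my order?"))          # -> shipping
print(route("I was charged twice for one invoice"))  # -> billing
```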


What is categorical data?
Qualitative variables measure attributes that can be given only as a property of the variable. The political affiliation of a person, the nationality of a person, a person's favorite color, and the blood group of a patient can only be measured using qualitative attributes of each variable. Often these variables have a limited number of possibilities and assume only one of the possible outcomes; i.e., the value is one of the given categories. Therefore, these are commonly known as categorical variables. The possible values can be numbers, letters, names, or any symbol.

What is quantitative data?
A quantitative variable records attributes that can be measured by magnitude or size; i.e., they are quantifiable. Variables measuring temperature, weight, mass, the height of a person, or the annual income of a household are quantitative variables. Not only are all the values of these variables numbers, but each number also carries a sense of magnitude.
Quantitative data belong to one of the three following types: ordinal, interval, or ratio. Categorical data always belong to the nominal type. The types mentioned above are formally known as levels of measurement, and they are closely related to the way the measurements are made and the scale of each measurement.

Since the form of the data in the two categories is different, different techniques and methods are employed when gathering, analyzing, and describing them.

What is the Difference Between Categorical and Quantitative data?


Definitions of Categorical and Quantitative data:
 Quantitative data are information that has a sensible meaning when referring to its magnitude.
 Categorical data are often information that takes values from a given set of categories or groups.
Characteristics of Categorical and Quantitative data:
Class of measurement:
 Quantitative data belong to ordinal, interval, or ratio classes of measurements.
 Categorical data belong to the nominal class of measurements.
Methods:
 Methods used to analyze quantitative data are different from the methods used for categorical data; even if the principles are the same, the application has significant differences.
Analysis:
 Quantitative data are analyzed using statistical methods in descriptive statistics, regression,
time series, and many more.
 For categorical data, usually descriptive methods and graphical methods are employed. Some non-parametric tests are also used (a short example follows).
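A small pandas sketch of this split: magnitude-based summaries for a quantitative column versus frequency counts for a categorical one. The data below is made up.

```python
# Quantitative columns get numeric summaries; categorical columns
# only support counts and proportions.
import pandas as pd

df = pd.DataFrame({
    "blood_group": ["A", "O", "B", "O", "AB", "A"],  # categorical
    "height_cm":   [172, 165, 180, 158, 175, 169],   # quantitative
})
df["blood_group"] = df["blood_group"].astype("category")

# Quantitative: magnitude-based statistics are meaningful.
print(df["height_cm"].describe())       # mean, std, min, max, quartiles

# Categorical: only frequencies make sense.
print(df["blood_group"].value_counts())
```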


Roles & Responsibilities of a Data Scientist


 Management: The Data Scientist plays a modest managerial role, supporting the construction of the base of futuristic and technical abilities within the Data and Analytics field in order to assist various planned and ongoing data analytics projects.
 Analytics: The Data Scientist represents a scientific role where he plans, implements, and assesses
high-level statistical models and strategies for application in the business’s most complex issues.
The Data Scientist develops econometric and statistical models for various problems including
projections, classification, clustering, pattern analysis, sampling, simulations, and so forth.
 Strategy/Design: The Data Scientist performs a vital role in the advancement of innovative
strategies to understand the business’s consumer trends and management as well as ways to solve
difficult business problems, for instance, the optimization of product fulfillment and entire profit.
 Collaboration: The role of the Data Scientist is not a solitary one; in this position, the Data Scientist collaborates with senior data scientists to communicate obstacles and findings to relevant stakeholders in an effort to drive business performance and decision-making.
 Knowledge: The Data Scientist also takes leadership to explore different technologies and tools
with the vision of creating innovative data-driven insights for the business at the most agile pace
feasible. In this situation, the Data Scientist also uses initiative in assessing and utilizing new and
enhanced data science methods for the business, which he delivers to senior management for approval.
 Other Duties: A Data Scientist also performs related tasks and duties as assigned by the Senior Data Scientist, Head of Data Science, Chief Data Officer, or the Employer.

Difference Between Data Scientist, Data Analyst, and Data Engineer


Data Scientist, Data Engineer, and Data Analyst are the three most common careers in data science. So let us understand what a data scientist does by comparing the role with these similar jobs.

Aspect | Data Scientist | Data Analyst | Data Engineer
Focus | The futuristic display of data | The optimization of scenarios, for example how an employee can enhance the company's product growth | Optimization techniques and the construction of data in a conventional manner; the purpose is continuously advancing data consumption
Work | Applies both supervised and unsupervised learning to data, say regression and classification of data, neural networks, etc. | Formation and cleaning of raw data, interpretation and visualization of data to perform the analysis and the technical summary of the data | Frequently operates at the back end, using optimized machine learning algorithms for storing data and preparing it most accurately
Skills | Python, R, SQL, Pig, SAS, Apache Hadoop, Java, Perl, Spark | Python, R, SQL, SAS | MapReduce, Hive, Pig, Hadoop techniques


Some Inspiring Data Scientists


The variety of areas in which data science is used can be seen by looking at examples of data scientists.
 Hilary Mason: She is the co-founder of Fast Forward Labs, a machine learning company acquired by Cloudera, a data science company. She is a Data Scientist at Accel. Broadly, she works with data to answer questions about mining the web and to learn how people interact with each other through social media.
 Nate Silver: He is one of the most prominent data scientists or statisticians in the world today. He
is the founder of FiveThirtyEight. FiveThirtyEight is a website that applies statistical analysis to
tell compelling stories about elections, politics, sports, science, and lifestyle. He utilizes huge
amounts of public data to predict a diversity of topics; most prominently, he predicts who will win elections in the U.S., and he has an extraordinary track record for accuracy in doing so.
 Daryl Morey: He is the general manager of a US basketball team, the Houston Rockets. He was
awarded the job as GM based on his bachelor’s degree in computer science and his M.B.A. from
M.I.T.

