DSA Lecture1 (1)

Data science combines domain expertise, programming skills, and knowledge of mathematics and statistics to extract insights from structured and unstructured data. The data science lifecycle includes stages such as understanding business requirements, data collection, preparation, feature extraction, model building, evaluation, and deployment. Key applications of data science span various industries, including demand prediction, customer analytics, and predictive analytics in healthcare.

Data Science Applications

TCT382
Introduction
Data Science
Data science is the field of study that combines
Domain expertise
programming skills
knowledge of mathematics and statistics
to extract meaningful insights from raw, structured, and
unstructured data.
Structured data is highly specific and is stored in a predefined
format, whereas unstructured data is a conglomeration of many
varied types of data that are stored in their native formats.
Data Science
Data science uses
hardware
programming systems
efficient algorithms
to solve data-related problems.
Data science is all about:
•Asking the correct questions and analyzing the raw data.
•Modeling the data using various complex and efficient algorithms.
•Visualizing the data to get a better perspective.
•Understanding the data to make better decisions and arrive at the
final result.
Companies are expanding as fast as the data!
Data, data everywhere… There's certainly a lot of it!

[Figure: Data produced each year, logarithmic scale from 1 Petabyte to 1 Zettabyte, 2002 to 2015]
2002: 5 EB
2006: 161 EB
2009: 800 EB
2011: 1.8 ZB
2015: 8.0 ZB
Reference points on the chart: 14 PB (human brain's capacity), 60 PB (100 years of HD video + audio), 120 PB.
1 Petabyte = 1000 TB
1 TB = 1000 GB
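The unit relationships on the chart can be checked with a little arithmetic. The sketch below is illustrative only: the helper names `convert` and `annual_growth_factor` are hypothetical, and the calculation assumes decimal (SI) units and steady compound growth between the 2002 and 2015 data points.

```python
# Decimal (SI) storage units, matching the slide: 1 PB = 1000 TB, 1 TB = 1000 GB.
UNITS_IN_BYTES = {
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
    "ZB": 10**21,
}


def convert(value, from_unit, to_unit):
    """Convert a data size between decimal units (hypothetical helper)."""
    return value * UNITS_IN_BYTES[from_unit] / UNITS_IN_BYTES[to_unit]


def annual_growth_factor(start, end, years):
    """Average yearly multiplier that turns `start` into `end` over `years` years.

    Assumes steady compound growth, which is a simplification.
    """
    return (end / start) ** (1 / years)


# Data points taken from the chart: 5 EB in 2002 and 8 ZB in 2015.
start_eb = 5
end_eb = convert(8, "ZB", "EB")
growth = annual_growth_factor(start_eb, end_eb, 2015 - 2002)

print(f"8 ZB = {end_eb:.0f} EB")
print(f"Average growth factor per year: {growth:.2f}")
```

Under these assumptions the yearly multiplier comes out at roughly 1.76, meaning the amount of data produced grew by about three quarters each year over that period.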
References
(2002) 5 EB: http://www2.sims.berkeley.edu/research/projects/how-much-info-2003/execsum.htm
(2006) 161 EB: http://www.emc.com/collateral/analyst-reports/expanding-digital-idc-white-paper.pdf
(2009) 800 EB: http://www.emc.com/collateral/analyst-reports/idc-digital-universe-are-you-ready.pdf
(2011) 1.8 ZB: http://www.emc.com/leadership/programs/digital-universe.htm
(2015) 8 ZB: http://www.emc.com/collateral/analyst-reports/idc-extracting-value-from-chaos-ar.pdf
(brain) 14 PB: http://www.quora.com/Neuroscience-1/How-much-data-can-the-human-brain-store
(life in video) 60 PB: in 4320p resolution, extrapolated from 16MB for 1:21 of 640x480 video (w/sound); almost certainly a gross overestimate, as sleep can be compressed significantly!


Data Science Examples
Demand prediction for the manufacturing industry
Supply chain optimization in the logistics industry
Customer analytics in the retail industry
Recommendation systems in marketing & advertising
Credit scoring for financial institutions
Sales analytics
Predictive analytics in healthcare
Weather prediction in the agriculture sector
Life Cycle
The data science lifecycle involves various roles, tools, and processes, which enable
analysts to glean actionable insights. Typically, a data science project goes through the
following stages:
Business requirement/understanding
Gathering relevant data
Data Preparation
Feature Extraction
Model Building
Performance Evaluation
Communicating to stakeholders
Deployment
Business Requirement/Understanding
In order to build a successful business model, it's very important to
first understand the business problem that the client is facing.
To do so, it is important to consult domain experts and
understand the underlying problems that are present in the system.
A Business Analyst is generally responsible for gathering the
required details from the client and forwarding them to the data
science team for further analysis.
Data Collection
The data science project starts with the identification of various
data sources.
These may include web server logs, social media posts, data from digital
libraries such as the US Census datasets, data accessed through
sources on the internet via APIs, web scraping, or information that
is already present in an Excel spreadsheet.
Data collection entails obtaining information from both known
internal and external sources that can assist in addressing the
business issue.
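As one illustration of turning a raw source into usable records, the sketch below parses web server log lines into dictionaries. It assumes the logs follow the Common Log Format; the sample lines, `LOG_PATTERN`, and `parse_log_line` are hypothetical and not part of the lecture.

```python
import re
from collections import Counter

# Assumed layout: host ident user [time] "METHOD path protocol" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # A "-" size means no response body was sent; treat it as 0 bytes.
    record["size"] = 0 if record["size"] == "-" else int(record["size"])
    return record


# Made-up sample lines for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.7 - - [10/Oct/2023:13:56:01 +0000] "GET /missing HTTP/1.1" 404 -',
    "not a log line",
]

# Keep only lines that parsed successfully, then count responses per status code.
records = [r for r in map(parse_log_line, SAMPLE_LOG) if r is not None]
status_counts = Counter(r["status"] for r in records)

print(records[0]["path"], dict(status_counts))
```

Lines that do not match the expected layout are skipped rather than raising an error, which is a common but not universal choice when collecting data from noisy sources.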
Data Preparation
This stage helps us gain a better understanding of the data and
prepares it for further evaluation.
This stage is also referred to as Data Cleaning or Data
Wrangling.
The data that we obtain in practice is rarely in a state that is
ready for modelling; many preliminary steps may be needed
before moving ahead.
Common steps in this stage include handling missing data,
handling outliers, handling noisy data, and, for some NLP tasks,
removing stop words and featurizing text data.
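The cleaning steps listed above can be sketched in a few lines. This is a minimal illustration using only the standard library; the choice of median imputation, the 1.5 × IQR clipping rule, the tiny stop-word list, and all function names are assumptions made for the example, not methods prescribed by the lecture.

```python
import statistics

# A tiny illustrative stop-word list; real projects use much longer ones.
STOP_WORDS = {"the", "is", "a", "of", "and", "to", "in"}


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values to the range [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


# Made-up data: one missing age and one implausible value.
ages = [23, 25, None, 27, 24, 26, 250]
cleaned = clip_outliers(fill_missing(ages))
tokens = remove_stop_words("The price of the product is high")

print(cleaned)
print(tokens)
```

Clipping keeps the outlying row in the data set with a capped value; dropping the row entirely is an equally valid choice depending on the problem.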
Feature Extraction
By using feature engineering, you can derive new features from
existing ones.
Format the data according to the desired structure and delete any
unnecessary columns or features.
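A small sketch of this idea, assuming a hypothetical table of orders: new features (month, weekend flag, total price) are derived from existing columns, and columns that are not needed for modelling are left out. The column names and the `extract_features` helper are invented for illustration.

```python
from datetime import date


def extract_features(row):
    """Derive new features from one order record and drop unneeded columns."""
    order_date = date.fromisoformat(row["order_date"])
    return {
        # New features derived from the existing order_date column.
        "month": order_date.month,
        "is_weekend": order_date.weekday() >= 5,  # Saturday=5, Sunday=6
        # New feature combining two existing columns.
        "total_price": row["unit_price"] * row["quantity"],
        # order_id and internal_note are deliberately not carried over.
    }


# Made-up raw records for illustration only.
raw_rows = [
    {"order_id": "A1", "order_date": "2024-03-09", "unit_price": 4.5,
     "quantity": 2, "internal_note": "n/a"},
    {"order_id": "A2", "order_date": "2024-03-11", "unit_price": 10.0,
     "quantity": 1, "internal_note": ""},
]

features = [extract_features(row) for row in raw_rows]
for feature_row in features:
    print(feature_row)
```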
Model Building and Performance Evaluation
This is where you select the algorithms and feed them
with the data prepared in the steps above.
The modelling approach involves model building as well as
hyperparameter tuning, which is key to making the modelling phase fruitful.
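Hyperparameter tuning can be illustrated with a simple grid search: try each candidate value, score the model on held-out validation data, and keep the best. The sketch below uses a one-dimensional k-nearest-neighbours classifier purely as a stand-in model; the algorithm, the toy data, and the candidate values of `k` are assumptions for the example, not choices made in the lecture.

```python
from collections import Counter


def knn_predict(train, x, k):
    """Predict the label of x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


def accuracy(train, validation, k):
    """Fraction of validation points whose predicted label matches the true label."""
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)


# Made-up (feature, label) pairs; two of the training points are deliberately noisy.
train = [(1.0, "low"), (1.5, "low"), (2.0, "low"), (2.2, "high"),
         (6.0, "high"), (6.5, "high"), (7.0, "high"), (6.8, "low")]
validation = [(1.2, "low"), (1.8, "low"), (6.2, "high"), (6.85, "high")]

# Grid search: evaluate each candidate k and keep the best-scoring one.
scores = {k: accuracy(train, validation, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)

print(scores, "best k:", best_k)
```

With `k=1` the noisy training point at 6.8 misleads the prediction for 6.85, while larger values of `k` vote it down, which is the kind of trade-off tuning is meant to expose.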
The next step is to evaluate the goodness of our model.
Here we compare the performance of different models with respect
to our key performance indicators and make sure that all our
business constraints are satisfied by our final model.
We look at the confusion matrices, classification reports, etc. and decide whether
we are good to go ahead or whether more fine-tuning is required for our
final model.
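For a binary classifier, the confusion matrix and the usual report metrics can be computed directly from the true and predicted labels. The following is a minimal sketch with made-up labels; the function names are hypothetical, and libraries such as scikit-learn provide equivalent, more complete tools.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


def classification_report(cm):
    """Derive accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = sum(cm.values())
    predicted_pos = cm["tp"] + cm["fp"]
    actual_pos = cm["tp"] + cm["fn"]
    precision = cm["tp"] / predicted_pos if predicted_pos else 0.0
    recall = cm["tp"] / actual_pos if actual_pos else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for illustration only.
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

cm = confusion_matrix(actual, predicted)
report = classification_report(cm)
print(cm)
print(report)
```

Which metric matters most depends on the business constraints: a model with high accuracy can still be unacceptable if, for example, missed positives are costly and recall is low.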
Communicating to Stakeholders
The final model and all the insights will be communicated to the
project stakeholders.
They will validate how our model performs in a wide variety of
corner cases and confirm the results of their inspection.
The best quality of a data scientist is not just doing the above steps,
but convincing the stakeholders that the new model will
provide them with specific advantages over the existing method
employed.
Deployment
Once we get the go-ahead from stakeholders, it's time to put our
model into production.
This may involve collaboration among several teams, such as data
scientists, data engineers, and software developers, depending on the
nature of the architecture and the problem that we are solving.
Data Scientist Characteristics
Know enough about the business to ask pertinent questions and identify
business pain points.
Apply statistics and computer science, along with business acumen, to data
analysis.
Use a wide range of tools and techniques for preparing and extracting data
—everything from databases and SQL to data mining to data integration
methods.
Extract insights from big data using predictive analytics and
artificial intelligence (AI), including machine learning models,
natural language processing, and deep learning.
Write programs that automate data processing and calculations.
Tell—and illustrate—stories that clearly convey the meaning of results to
decision-makers and stakeholders at every level of technical understanding.
Explain how the results can be used to solve business problems.
Collaborate with other data science team members, such as data and
business analysts, IT architects, data engineers, and application developers.
