Unit 2 PPT (BA)

MBA (BA) unit 2 ppts

Uploaded by

Dr Shweta RAI

Unit 2

Data and the Data Science Life Cycle
Data
• Raw facts and figures
• Research data: data that has been collected,
observed, generated, or created to validate
original research findings
• Data are of two types:
I. Primary data
II. Secondary data
Data Collection
• Data collection is the process of gathering and
measuring information on variables of interest
in a systematic and organized manner. It is a
fundamental step in various fields, including
science, research, business, and government,
as it provides the raw material for analysis,
decision-making, and generating insights.
Importance of Data Collection
• It's important to note that data collection should
be conducted rigorously and systematically to
ensure the reliability and validity of the collected
information. The quality of the data directly
impacts the quality of subsequent analyses and
decisions. Additionally, ethical and legal
considerations should always be taken into account
when collecting and handling data, especially in
cases involving human subjects or sensitive
information.
Data Management
• Data management refers to the processes and
activities involved in acquiring, storing,
organizing, securing, and maintaining data to
ensure its accuracy, reliability, and
accessibility. Effective data management is
crucial for organizations of all sizes, as it helps
them make informed decisions, meet
regulatory requirements, and optimize their
operations.
Data Management…
• Data management is a comprehensive
approach to handling data throughout its
lifecycle, ensuring its quality, security, and
usability while adhering to regulatory
requirements. Effective data management is
essential for organizations to derive
meaningful insights and make informed
decisions in a data-driven world.
Big Data Management
• Big data management refers to the strategies,
processes, and technologies used to handle and derive
value from large and complex datasets known as "big
data." Big data is characterized by its volume, velocity,
and variety, and managing it presents unique challenges
and opportunities. Big data management is a complex
and evolving field that encompasses various practices
and technologies to effectively handle large and
diverse datasets. Organizations that can successfully
manage and analyze big data can gain valuable
insights, make data-driven decisions, and gain a
competitive advantage in today's data-centric world.
Data sources
• Data can be obtained from various sources,
depending on the type and purpose of the data
you need.
• Surveys and Questionnaires
• Government and Public Databases
• Websites
• Social Media
• Books, Journals, and Publications
• Mobile Apps
• Financial Markets
Data Sources…
• Medical Records
• APIs (Application Programming Interfaces)
• etc.

• Remember that when collecting or using data, it's
important to consider ethical and legal
considerations, including privacy regulations and
data usage agreements. Additionally, data quality
and accuracy should be assessed to ensure that the
data is reliable for your intended purpose.
Importance of Data Quality
• Data quality is of paramount importance in
various aspects of business, research, and
decision-making.
• To ensure data quality, organizations should
implement data quality management practices,
establish data governance frameworks, and
regularly audit and cleanse their data. Data
quality is an ongoing process that requires
attention and investment to maintain its benefits
over time.
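The audit-and-cleanse practice above can be sketched as a few automated checks. The records, field names, and validation rules below are hypothetical, chosen only to illustrate the idea:

```python
# Minimal sketch of automated data-quality checks on hypothetical records.
records = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "", "age": 29},               # missing email
    {"id": 2, "email": "b@example.com", "age": 29},  # duplicate id
    {"id": 3, "email": "c@example.com", "age": -5},  # out-of-range age
]

def audit(rows):
    """Return a list of (issue, record_id) pairs found in the data."""
    issues = []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            issues.append(("duplicate_id", row["id"]))
        seen_ids.add(row["id"])
        if not row["email"]:
            issues.append(("missing_email", row["id"]))
        if not 0 <= row["age"] <= 120:
            issues.append(("age_out_of_range", row["id"]))
    return issues

print(audit(records))
```

A real data governance framework would run checks like these on a schedule and feed the findings into a cleansing workflow.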
Dealing with Missing or Incomplete Data
• Dealing with missing or incomplete data is a
common challenge in data analysis and machine
learning. Missing data can occur for various
reasons, such as data entry errors, equipment
malfunctions, survey non-responses, or simply
because some information was not collected.
Handling missing data appropriately is crucial to
ensure the accuracy and reliability of your
analyses and models.
• Identify Missing Data
Incomplete Data
• Remove Rows with Missing Data
• Understand the Reasons
• Sensitivity Analysis, etc.
• Remember that the choice of how to handle
missing data should be driven by the specific
context of your analysis and the nature of the
missingness. There is no one-size-fits-all
solution, and the most appropriate approach
may vary from one dataset to another.
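Two of the strategies above, removing rows with missing data and imputing values, can be sketched in plain Python. The tiny height/weight dataset is hypothetical:

```python
# Two common strategies for missing values: drop incomplete rows, or
# impute gaps with the column mean (hypothetical example data).
rows = [
    {"height": 170, "weight": 65},
    {"height": 180, "weight": None},  # missing weight
    {"height": None, "weight": 72},   # missing height
    {"height": 160, "weight": 55},
]

# Strategy 1: listwise deletion - keep only complete rows.
complete = [r for r in rows if None not in r.values()]

# Strategy 2: mean imputation - fill gaps with the column average.
def impute_mean(rows, key):
    observed = [r[key] for r in rows if r[key] is not None]
    mean = sum(observed) / len(observed)
    return [{**r, key: r[key] if r[key] is not None else mean} for r in rows]

imputed = impute_mean(impute_mean(rows, "height"), "weight")
print(len(complete), imputed[1]["weight"])
```

Note how the two strategies trade off: deletion shrinks the dataset, while imputation keeps every row but injects an assumption about the missing values.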
Data Visualization
• Data visualization is the representation of data
through the use of common graphics, such as
charts, plots, infographics, and even
animations. These visual displays of
information communicate complex data
relationships and data-driven insights in a way
that is easy to understand.
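As a minimal illustration of mapping data to visual marks, here is a toy text-based bar chart; the monthly sales figures are made up, and real work would use a charting library rather than ASCII art:

```python
# Toy visualization: render hypothetical monthly sales as an ASCII bar
# chart, scaling each bar relative to the largest value.
sales = {"Jan": 12, "Feb": 7, "Mar": 15, "Apr": 10}

def bar_chart(data, width=30):
    """Return a text bar chart, one line per category."""
    peak = max(data.values())
    lines = []
    for label, value in data.items():
        bar = "#" * round(value / peak * width)
        lines.append(f"{label:>3} | {bar} {value}")
    return "\n".join(lines)

print(bar_chart(sales))
```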
Data Classification
Data science life cycle
• A data science lifecycle indicates the iterative steps taken to
build, deliver, and maintain any data science product. Not all
data science projects are built the same, so their life cycles vary
as well.
• Business Requirement - something the business needs
to do or have in order to stay in business. For example, a
business requirement can be a process they must complete or a
piece of data they need to use for that process.
• Data Acquisition - the process of sampling signals that measure
real-world physical conditions and converting the resulting
samples into digital numeric values that can be manipulated by
a computer.
• Data Preparation-the process of preparing raw data so
that it is suitable for further processing and analysis.
Key steps include collecting, cleaning, and labeling raw
data into a form suitable for machine learning (ML)
algorithms and then exploring and visualizing the data.
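The cleaning-and-labeling step can be sketched as below; the raw survey strings and the name/age layout are hypothetical:

```python
# Data preparation sketch: raw comma-separated survey strings cleaned
# into typed records (trim whitespace, normalize case, coerce numbers).
raw = [" Alice , 34 ", "BOB,29", "carol, 41"]

def prepare(lines):
    """Turn raw 'name,age' strings into clean typed dictionaries."""
    cleaned = []
    for line in lines:
        name, age = (part.strip() for part in line.split(","))
        cleaned.append({"name": name.title(), "age": int(age)})
    return cleaned

print(prepare(raw))
```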
• Hypothesis and modeling-It is the basic idea that has
not been tested. A hypothesis is just an idea that
explains something. It must go through a number of
experiments designed to prove or disprove it.
Model: A hypothesis becomes a model after some
testing has been done and it appears to be a valid
observation.
• Evaluation and Interpretation - Interpretation is the
action of explaining the meaning of something, like
the interpretation of your country's constitution.
Evaluation is the making of a judgment about the
amount, number, or value of something; an
assessment, like evaluating the price of a used car.
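One concrete way to make such a judgment about a model is a numeric error metric. This sketch computes mean absolute error on hypothetical predicted vs. actual used-car prices (in thousands):

```python
# Evaluation sketch: judge predictions with mean absolute error (MAE),
# the average absolute gap between predicted and actual values.
actual = [10.0, 15.0, 12.0]      # hypothetical true prices
predicted = [11.0, 14.0, 12.5]   # hypothetical model outputs

def mean_absolute_error(y_true, y_pred):
    return sum(abs(a - p) for a, p in zip(y_true, y_pred)) / len(y_true)

print(mean_absolute_error(actual, predicted))
```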
• Deployment- Model deployment is the process of
putting machine learning models into production.
This makes the model's predictions available to
users, developers or systems, so they can make
business decisions based on data, interact with their
application (like recognize a face in an image) and so
on.
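A heavily simplified sketch of the idea: train-time code persists the model, and serving code reloads it to answer predictions. Real deployments use serving frameworks and model registries; the linear model and its parameters here are hypothetical:

```python
import pickle

# Deployment sketch: persist trained model parameters as bytes, then
# reload them elsewhere to serve predictions.
model = {"slope": 2.0, "intercept": 1.0}  # hypothetical trained parameters

blob = pickle.dumps(model)                # "ship" the model

def predict(blob, x):
    """Serving-side code: load the model and answer a prediction."""
    params = pickle.loads(blob)
    return params["slope"] * x + params["intercept"]

print(predict(blob, 3))
```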
• Operations - DataOps (data operations) is an agile,
process-oriented methodology for developing and
delivering analytics. It brings together DevOps teams
with data engineers and data scientists to provide the
tools, processes, and organizational structures to
support the data-focused enterprise.
• Optimization - a problem where you maximize or
minimize a real function by systematically choosing
input values from an allowed set and computing the
value of the function. That means when we talk about
optimization we are always interested in finding the
best solution.
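A minimal example of systematically choosing input values: gradient descent on f(x) = (x - 3)^2, whose minimum is at x = 3. The starting point, learning rate, and step count are arbitrary choices for illustration:

```python
# Optimization sketch: minimize f(x) = (x - 3)^2 by gradient descent,
# repeatedly stepping x against the slope f'(x) = 2 * (x - 3).
def minimize(start, lr=0.1, steps=100):
    x = start
    for _ in range(steps):
        x -= lr * 2 * (x - 3)  # move against the gradient
    return x

print(round(minimize(0.0), 4))
```

Each step shrinks the distance to the minimum by a constant factor, so x converges toward 3 regardless of where it starts (for a small enough learning rate).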