DATA analytics previous solved
(CS)
CS-364: Data Analytics
(2019 Credit Pattern) (Semester VI)
d) What is clustering?
Clustering is a machine learning technique that involves grouping similar data points together.
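As a rough illustration of grouping similar points, the sketch below runs a toy one-dimensional k-means. K-means is only one of several clustering algorithms and is not named in the answer above; the function name `kmeans_1d`, the sample points, and the starting centroids are assumptions made for this example.

```python
def kmeans_1d(points, centroids, iterations=10):
    """Toy 1-D k-means: assign points to the nearest centroid, then update centroids."""
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Index of the centroid closest to this point.
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters


# Hypothetical data: two visibly separate groups of values.
points = [1.0, 1.5, 2.0, 9.0, 10.0, 11.0]
centroids, clusters = kmeans_1d(points, centroids=[0.0, 5.0])
print(centroids)
print(clusters)
```

The small values end up in one cluster and the large values in the other, which is the "similar points together" idea in its simplest form.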
g) What is an outlier?
An outlier is an observation that lies an abnormal distance from other values in a random sample
from a population.
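What counts as an "abnormal distance" has to be decided by the analyst. One common convention, assumed here and not stated in the answer above, is to flag values more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sample values and the helper name `iqr_outliers` are made up for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: one value sits far away from the rest.
sample = [10, 12, 11, 13, 12, 11, 10, 95]
outliers = iqr_outliers(sample)
print(outliers)
```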
Confidence:
It is the ratio of the number of transactions that include all items in both {A} and {B} to the
number of transactions that include all items in {A}.
Formula:
Confidence (A -> B) = support(A -> B) / support(A), where support(A -> B) is the share of
transactions that contain all items of both A and B.
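The formula can be checked on a small example. The sketch below computes support and confidence over a made-up list of transactions; the item names and the helper functions `support` and `confidence` are assumptions for illustration only.

```python
# Hypothetical market-basket data: each transaction is a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Confidence(A -> B) = support(A and B together) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


conf = confidence({"bread"}, {"milk"}, transactions)
print(round(conf, 3))
```

Here bread appears in 3 of the 4 transactions and bread with milk in 2 of them, so the confidence of bread -> milk is 2/3.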
One popular regression model is Linear Regression, which models the relationship between a
dependent variable and one or more independent variables by fitting a linear equation to the
observed data. The goal of linear regression is to find the best-fitting line that minimizes the sum
of squared errors between the predicted values and the actual values. The line is defined by the
slope and intercept, which are estimated from the training data using the method of least
squares.
The equation of a simple linear regression model is: y = b0 + b1*x, where y is the dependent
variable, x is the independent variable, b0 is the intercept, and b1 is the slope. The slope
represents the change in y for every one-unit increase in x, and the intercept represents the
value of y when x is zero.
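A minimal sketch of the least-squares estimates for the simple model, using the standard closed-form expressions b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x). The function name and the sample data (generated from y = 1 + 2x) are assumptions for illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) that minimize the sum of squared errors for y = b0 + b1*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = mean_y - b1 * mean_x  # intercept
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(b0, b1)
```

Because the sample points lie exactly on a line, the fit recovers an intercept of 1 and a slope of 2; with noisy data the estimates would only approximate the underlying relationship.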
The model can be extended to multiple linear regression, where there is more than one
independent variable. The equation becomes: y = b0 + b1*x1 + b2*x2 + ... + bn*xn, where n is
the number of independent variables. The slope coefficients b1 to bn represent the change in y
for every one-unit increase in the corresponding x variable, holding all other variables constant.
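For more than one independent variable, the least-squares coefficients can be obtained by solving the normal equations (X^T X) b = X^T y. The sketch below does this in plain Python with Gaussian elimination; in practice a numerical library would normally be used. The function names and the data (generated from y = 1 + 2*x1 + 3*x2) are assumptions for illustration.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bn] from the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 models the intercept b0
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coeffs = fit_multiple_linear_regression(rows, ys)
print([round(c, 6) for c in coeffs])
```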
Linear regression is widely used in fields such as economics, finance, engineering, and social
sciences to model and predict various phenomena, such as stock prices, housing prices, sales,
and customer behavior.
b) Differentiate between stemming and lemmatization.
Stemming | Lemmatization
Stemming reduces words to their base or root form by removing suffixes and prefixes. | Lemmatization reduces words to their base form (known as the lemma) based on the word's context and part of speech.
It uses simple and fast rule-based approaches. | It utilizes more advanced linguistic and language-specific algorithms.
Stemmed words may not always be actual words. | Lemmatized words are always valid words found in a dictionary.
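The contrast in the table can be sketched with two toy functions. Neither is a real stemmer or lemmatizer: the suffix list and the small lemma lookup table are hypothetical stand-ins, chosen only to show rule-based suffix stripping next to a dictionary lookup that depends on part of speech.

```python
# Hypothetical suffix rules, checked in this order.
SUFFIXES = ("ies", "ing", "ed", "es", "s")

# Hypothetical lookup table: (word, part of speech) -> lemma.
LEMMA_TABLE = {
    ("studies", "noun"): "study",
    ("studies", "verb"): "study",
    ("better", "adj"): "good",
    ("running", "verb"): "run",
    ("meeting", "noun"): "meeting",
    ("meeting", "verb"): "meet",
}


def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def toy_lemmatize(word, pos):
    """Look up the lemma for a word and its part of speech; fall back to the word."""
    return LEMMA_TABLE.get((word, pos), word)


print(toy_stem("studies"), toy_lemmatize("studies", "verb"))
print(toy_stem("meeting"), toy_lemmatize("meeting", "noun"))
```

The stem of "studies" comes out as "stud", which is not a dictionary word, while the lemma is "study". The word "meeting" is stemmed to "meet" regardless of use, whereas the lookup keeps "meeting" when it is tagged as a noun.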
o Descriptive analytics: Descriptive analytics is the simplest type of analytics that
summarizes the historical data to provide insights into what happened in the past. It
answers questions such as "What happened?" and "How many?" Examples of descriptive
analytics include summary statistics, frequency distributions, and data visualization.
o Predictive analytics: Predictive analytics is the type of analytics that uses statistical
models and machine learning algorithms to analyze historical data and make predictions
about future events. It answers questions such as "What is likely to happen?" and "How
likely?" Examples of predictive analytics include regression analysis, time series
forecasting, and classification.
o Prescriptive analytics: Prescriptive analytics is the most advanced type of analytics that
uses optimization and simulation techniques to recommend actions that will achieve the
best possible outcome. It answers questions such as "What should we do?" and "How
can we optimize?" Examples of prescriptive analytics include linear programming,
decision trees, and Monte Carlo simulation.
Each type of analytics has its own strengths and limitations, and the choice of which type to use
depends on the specific business problem and the available data.
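As a small illustration of the descriptive type only, the sketch below computes summary statistics and a frequency distribution with Python's standard library. The daily sales figures are made up for this example.

```python
import statistics
from collections import Counter

# Hypothetical historical data: units sold on five days.
daily_sales = [120, 135, 150, 135, 160]

# Summary statistics answer "What happened?"
mean_sales = statistics.mean(daily_sales)
median_sales = statistics.median(daily_sales)

# A frequency distribution answers "How many?"
frequency = Counter(daily_sales)

print(mean_sales, median_sales)
print(frequency.most_common(1))
```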
Phase 1: Discovery –
The data science team learns about and investigates the problem.
The team develops context and understanding.
The team identifies the data sources needed and available for the project.
The team formulates initial hypotheses that can later be tested with data.