UNIT-5
Predictive analytics:
Predictive analytics is a branch of advanced analytics that makes predictions
about future outcomes using historical data combined with statistical modeling,
data mining techniques and machine learning. Companies employ predictive
analytics to find patterns in historical data in order to identify risks and opportunities.
Predictive analytics is often associated with big data and data science.
Classification models
Classification models fall under supervised learning. They use what they learn from
historical, labelled data to assign new records to predefined categories, for example
flagging an email as spam or not spam.
Clustering models
Clustering models fall under unsupervised learning. They group data based on
similar attributes. For example, an e-commerce site can use the model to
separate customers into similar groups based on common features and develop
marketing strategies for each group. Common clustering algorithms include
k-means clustering, mean-shift clustering, density-based spatial clustering of
applications with noise (DBSCAN), expectation-maximization (EM) clustering
using Gaussian Mixture Models (GMM), and hierarchical clustering.
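As a rough sketch of how clustering works in practice, the snippet below runs k-means with
scikit-learn on a tiny, made-up set of customer features (the spend and order counts are
invented purely for illustration):

# Minimal k-means sketch using scikit-learn; the customer data is made up.
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical customer rows: [annual_spend, orders_per_year]
X = np.array([
    [200, 2], [250, 3], [220, 2],        # low-spend customers
    [1200, 15], [1100, 14], [1300, 18],  # high-spend customers
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster labels:", kmeans.labels_)          # group assigned to each customer
print("Cluster centres:\n", kmeans.cluster_centers_)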
Linear regression
Linear regression analysis is used to predict the value of a variable based on the
value of another variable. The variable you want to predict is called the
dependent variable. The variable you are using to predict the other variable's
value is called the independent variable.
This form of analysis estimates the coefficients of the linear equation, involving
one or more independent variables that best predict the value of the dependent
variable. Linear regression fits a straight line or surface that minimizes the
discrepancies between predicted and actual output values. There are simple
linear regression calculators that use a “least squares” method to discover the
best-fit line for a set of paired data. You then estimate the value of Y (the
dependent variable) from X (the independent variable).
To understand the concept, let’s consider a salary dataset that gives the value of the
dependent variable (salary) for every value of the independent
variable (years of experience).
Salary dataset:
The equation of the regression line is given by:
y = a + bx
where y is the predicted response value, a is the y-intercept, x is the feature
value and b is the slope.
To create the model, let’s estimate the values of the regression coefficients a
and b. Once these coefficients are estimated, the response model can be used for
prediction. Here we are going to use the least squares technique.
The principle of least squares is one of the popular methods for fitting a curve to
given data. Let (x1, y1), (x2, y2), …, (xn, yn) be n observations from an
experiment. We are interested in finding the curve that minimizes the sum of the
squared differences between the observed y values and the values predicted by
the curve.
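For a straight line y = a + bx, the standard least-squares estimates are
b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − b·x̄. Below is a minimal sketch of these
calculations in Python; the small salary figures are invented for illustration and are not
taken from the dataset referred to above:

# Least-squares estimates for y = a + b*x (illustrative, invented salary data).
import numpy as np

x = np.array([1, 2, 3, 4, 5])                       # years of experience
y = np.array([40000, 45000, 50000, 55000, 60000])   # salary

x_mean, y_mean = x.mean(), y.mean()
b = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)  # slope
a = y_mean - b * x_mean                                              # intercept

print(f"y = {a:.2f} + {b:.2f} * x")
print("Predicted salary at 6 years of experience:", a + b * 6)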
Multiple Linear Regression:
One of the most common types of predictive analysis is multiple linear
regression. This type of analysis allows you to understand the relationship
between a continuous dependent variable and two or more independent variables.
The independent variables can be either continuous (like age and height) or
categorical (like gender and occupation). It's important to note that if an
independent variable is categorical, you should dummy code it before running the
analysis.
In multiple linear regression, the dependent variable is the outcome or result
you are trying to predict. The independent variables are the factors that
explain your dependent variable. You can use them to build a model that
accurately predicts your dependent variable from the independent variables.
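As an illustration, here is a minimal multiple linear regression sketch with scikit-learn; the
age, height, and income values are invented for the example:

# Multiple linear regression sketch with scikit-learn (illustrative data).
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical rows: [age, height_cm] -> dependent variable (e.g., income)
X = np.array([[25, 170], [32, 165], [40, 180], [28, 175], [50, 160]])
y = np.array([30000, 42000, 58000, 36000, 65000])

model = LinearRegression().fit(X, y)
print("Intercept:", model.intercept_)
print("Coefficients (one per independent variable):", model.coef_)
print("Prediction for age 35, height 172:", model.predict([[35, 172]]))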
For your model to be reliable and valid, there are some essential requirements: the
relationship between the dependent variable and the independent variables should be
approximately linear, the residuals should be independent and roughly normally distributed
with constant variance, and the independent variables should not be highly correlated with
one another (no severe multicollinearity).
Data Visualization:
Our eyes are drawn to colours and patterns. We can quickly tell red from
blue, and a square from a circle. Our culture is visual, including everything from
art and advertisements to TV and movies.
Data visualization is another form of visual art that grabs our interest and
keeps our eyes on the message. When we see a chart, we quickly see trends and
outliers. If we can see something, we internalize it quickly. It’s storytelling with
a purpose. If you’ve ever stared at a massive spreadsheet of data and couldn’t
see a trend, you know how much more effective a visualization can be. Commonly
used data visualization techniques include the following:
Box plots
Histograms
Heat maps
Charts
Tree maps
Word Cloud/Network diagram
Box Plots
A boxplot is a standardized way of displaying the
distribution of data based on a five-number summary (“minimum”, first quartile
(Q1), median, third quartile (Q3), and “maximum”). It can tell you about your
outliers and what their values are. It can also tell you if your data is
symmetrical, how tightly your data is grouped, and if and how your data is
skewed.
A box plot is a graph that gives you a good indication of how the values in the
data are spread out. Although box plots may seem primitive in comparison to
a histogram or density plot, they have the advantage of taking up less space,
which is useful when comparing distributions between many groups or datasets.
For some distributions/datasets, you will find that you need more information
than the measures of central tendency (median, mean, and mode). You need to
have information on the variability or dispersion of the data.
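A minimal box-plot sketch with matplotlib, using randomly generated values purely for
illustration (the group names and parameters are invented):

# Box plot sketch with matplotlib (random data for illustration).
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
groups = [rng.normal(loc=m, scale=1.0, size=100) for m in (0, 2, 4)]

plt.boxplot(groups, labels=["group A", "group B", "group C"])
plt.ylabel("value")
plt.title("Five-number summary per group")
plt.show()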
Histograms
A histogram is a plot that lets you discover, and show, the underlying frequency
distribution (shape) of a set of continuous data. This allows the inspection of
the data for its underlying distribution (e.g., normal distribution), outliers,
skewness, etc. It is an accurate representation of the distribution of numerical
data and relates to only one variable. It uses bins (or buckets): ranges of values
that divide the entire range of the data into a series of intervals, and then counts
how many values fall into each interval.
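A minimal histogram sketch with matplotlib, again on invented, randomly generated data:

# Histogram sketch with matplotlib (random data for illustration).
import numpy as np
import matplotlib.pyplot as plt

values = np.random.default_rng(1).normal(loc=50, scale=10, size=1000)

plt.hist(values, bins=20)      # 20 equal-width bins over the data range
plt.xlabel("value")
plt.ylabel("frequency")
plt.title("Underlying frequency distribution")
plt.show()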
Heat Maps
A heat map is a data visualization tool that uses colour the way a bar graph uses
height and width: to encode the magnitude of values.
If you’re looking at a web page and you want to know which areas get the most
attention, a heat map shows you in a visual way that’s easy to assimilate and
make decisions from. It is a graphical representation of data where the
individual values contained in a matrix are represented as colours. Useful for
two purposes: for visualizing correlation tables and for visualizing missing values
in the data. In both cases, the information is conveyed in a two-dimensional
table.
Note that heat maps are useful when examining a large number of values, but
they are not a replacement for more precise graphical displays, such as bar
charts, because colour differences cannot be perceived accurately.
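As a sketch, the snippet below builds a correlation table from invented data and displays it
as a heat map with matplotlib (pandas is assumed for the correlation step; the column names
and relationships are made up):

# Correlation heat map sketch (matplotlib + pandas; data is invented).
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 0.8 * df["x"] + rng.normal(scale=0.5, size=200)   # correlated with x
df["z"] = rng.normal(size=200)                               # independent of x

corr = df.corr()
plt.imshow(corr, cmap="coolwarm", vmin=-1, vmax=1)
plt.colorbar(label="correlation")
plt.xticks(range(len(corr)), corr.columns)
plt.yticks(range(len(corr)), corr.columns)
plt.title("Correlation matrix as a heat map")
plt.show()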
Charts
Line Chart
The simplest technique, a line plot, is used to show the relationship or dependence
of one variable on another. To plot the relationship between two variables,
we can simply call the plot function, as in the sketch below.
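A minimal line-chart sketch with matplotlib, using invented monthly values:

# Line chart sketch: relationship of one variable to another (invented data).
import matplotlib.pyplot as plt

months = [1, 2, 3, 4, 5, 6]
sales = [10, 12, 15, 14, 18, 21]   # hypothetical monthly sales

plt.plot(months, sales, marker="o")
plt.xlabel("month")
plt.ylabel("sales")
plt.title("Trend over time")
plt.show()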
Bar Charts
Bar charts are used for comparing the quantities of different categories or
groups. Values of a category are represented with the help of bars and they can
be configured with vertical or horizontal bars, with the length or height of each
bar representing the value.
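A minimal bar-chart sketch with matplotlib (the categories and quantities are invented):

# Bar chart sketch comparing categories (invented data).
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]
values = [23, 45, 12, 36]

plt.bar(categories, values)          # use plt.barh(...) for horizontal bars
plt.xlabel("category")
plt.ylabel("quantity")
plt.show()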
Pie Chart
A pie chart represents parts of a whole: each category is drawn as a slice whose
size is proportional to its share of the total.
Scatter Charts
A scatter chart plots individual data points for two variables and is useful for
showing how the two variables are related or correlated.
Bubble Charts
It is a variation of scatter chart in which the data points are replaced with
bubbles, and an additional dimension of data is represented in the size of the
bubbles.
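A minimal bubble-chart sketch with matplotlib, where the s argument maps a third variable
to bubble size (all values are invented):

# Bubble chart sketch: a scatter plot where marker size encodes a third dimension.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [10, 14, 8, 20, 16]
size = [30, 120, 60, 300, 150]   # third dimension mapped to bubble area

plt.scatter(x, y, s=size, alpha=0.5)
plt.xlabel("x")
plt.ylabel("y")
plt.title("Bubble chart (size = third variable)")
plt.show()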
Timeline Charts
A timeline chart displays a sequence of events or values in chronological order,
making it easy to see when things happened and how they relate in time.
Regression Coefficients:
A regression coefficient is the quantity that sits in front of an independent variable in your
regression equation. It is a parameter estimate describing the relationship between one of
the independent variables in your model and the dependent variable.
In the simple linear regression below, the quantity 0.5, which sits in front of the variable X, is
a regression coefficient. The intercept, in this case 2, is also a coefficient, but you’ll hear it
referred to instead as the “intercept,” “constant,” or “β0”.
For the sake of this discussion, we will leave the intercept out.
Ŷ = 2 + 0.5X
Regression coefficients tell us about the line of best fit and the estimated relationship
between an independent variable and the dependent variable in our model. In a simple
linear regression with only one independent variable, the coefficient determines the slope of
the regression line; it tells you whether the regression line is upward or downward-sloping
and how steep the line is.
Regressions can have more than one independent variable and, therefore, more than one
regression coefficient. In the multivariate regression below, there are two independent
variables (X1 and X2). This means you have two regression coefficients: 0.7 and −3.2. Each
coefficient gives you information about the relationship between one of the independent
variables and the dependent (or response) variable, Y.
Ŷ = −4 + 0.7X1 − 3.2X2
In the regression here, the coefficient 0.7 suggests a positive linear relationship between X1
and Y. If all other independent variables in the model are held constant, as X1 increases by
1 unit, we estimate that Y increases by 0.7 units.
In OLS, you can find the regression line by minimizing the sum of squared errors. Here the
errors, or residuals, are the vertical distances between each point on the scatter plot and
the regression line. The regression coefficient on X tells you the slope of the regression line.
We typically perform our regression calculations using statistical software like R or Stata.
When we do this, we not only create scatter plots and lines but also create a regression
output table like the one below. A regression output table is a table summarizing the
regression line, the errors of your model, and the statistical significance of each parameter
estimated by your model.
In the regression output table, the independent variables (X1 and X2) are listed in the first
column, and the coefficients on these variables are listed in the coefficient column, one row
per variable.
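As an illustration, the sketch below simulates data roughly matching the example equation
above and prints a regression output table with statsmodels (the simulated values are
invented, so the printed estimates will only approximate −4, 0.7 and −3.2):

# Sketch of producing a regression output table with statsmodels (simulated data).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
Y = -4 + 0.7 * X1 - 3.2 * X2 + rng.normal(scale=0.5, size=100)

X = sm.add_constant(np.column_stack([X1, X2]))   # adds the intercept column
model = sm.OLS(Y, X).fit()
print(model.summary())   # coefficients, standard errors, p-values, R-squared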
In linear regression, your regression coefficients will be constants that are either positive or
negative. Here is how you can interpret the coefficients.
1. Non-Zero Coefficient: a non-zero coefficient indicates that the model estimates some
relationship between the independent variable and the dependent variable; a coefficient of
zero would mean the variable has no estimated effect on the outcome.
2. Positive Coefficient: if the regression coefficient is positive, there is a positive (or direct)
relationship between the independent variable and the dependent variable. As X increases,
Y tends to increase as well.
3. Negative Coefficient: if the regression coefficient is negative, there is a negative (or
inverse) relationship between the independent variable and the dependent variable. As X
increases, Y tends to decrease, and as X decreases, Y tends to increase.
Remember, your coefficients are only estimates. You’ll never know with certainty what the
true parameters are, or what the exact relationship is between your variables.
In regression, you can estimate how much of the variation in your dependent variable is
explained by the independent variables by calculating R², and you can assess how
confident you can be in your estimates using tests of statistical significance.
In a simple OLS linear regression of the form Y = B0 + B1X, you can find the regression
coefficient B1 using the following equation:
B1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
and the intercept is then B0 = ȳ − B1x̄.
So far, we have mainly discussed the simplest form of regression: a linear regression
with one independent variable. As you continue to study statistics, you’ll encounter many
more complex forms of regression. In these other regression models, the coefficients might
take on slightly different forms and may need to be interpreted differently.
Linear Regression
Linear regression is one of the most basic forms of regression. As you saw earlier, in linear
regression, you find a line of best fit (a regression line) that minimizes the sum of squared
errors. This line models the relationship between a dependent variable and an independent
variable.
Logistic Regression (or Logit Regression)
We use logistic regression when we want to study a binary outcome and are trying to
estimate the likelihood of one of the two possible outcomes occurring. Logistic regression
allows you to predict whether an outcome variable will be true or false, a win or a loss, heads
or tails, 1 or 0, or any other binary set of outcomes.
In logistic regression, you interpret the regression coefficients differently than you would in a
linear model. In linear regression, a coefficient of 2 means that as your independent variable
increases by one unit, your dependent variable is expected to increase by 2 units. In logistic
regression, a coefficient of 2 means that as your independent variable increases by one unit,
the log odds of your dependent variable increase by 2.
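A minimal logistic regression sketch with scikit-learn; the hours-studied/pass-fail data is
invented, and the printed coefficient is the estimated change in the log odds per unit of X:

# Logistic regression sketch with scikit-learn (invented binary data).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical feature (e.g., hours studied) and a pass/fail outcome.
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print("Coefficient (change in log odds per unit of X):", clf.coef_[0][0])
print("Predicted probability of passing at X = 4.5:", clf.predict_proba([[4.5]])[0][1])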
Non-Linear Regression
In a non-linear regression, you estimate the relationship between your variables using a
curve rather than a line. For example, if we know that the relationship between Y and X1
cannot simply be expressed by a line, but rather by a curve, we may want to include X1,
but also its quadratic version X1². In this case, we will get two coefficients related to X1:
one for X1 and one for X1². Something like this:
Ŷ = −8.2 + 1.5X1 − 0.5(X1)²
Here, we can no longer say that if X1 changes by one unit, Y changes by 1.5 units, since X1
appears twice in the regression. Instead, the relationship between Y and X1 is non-linear. If
the level of X1 is 1 and we increase it by 1 unit, then Y increases by roughly (1.5 − 1) = 0.5 units.
However, if the level of X1 is 2 and we increase it by 1 unit, then Y changes by roughly
(1.5 − 2) = −0.5 units, i.e. it decreases. This is because the partial derivative of Y with respect
to X1 is no longer a constant; it is 1.5 − 2 × 0.5 × X1 = 1.5 − X1.
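A sketch of fitting such a quadratic relationship with numpy.polyfit, on data simulated from
the example equation above (the noise level and sample values are invented):

# Quadratic (non-linear in X) regression sketch using numpy.polyfit (simulated data).
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 5, 50)
y = -8.2 + 1.5 * x - 0.5 * x**2 + rng.normal(scale=0.2, size=x.size)

# Fit y = c0 + c1*x + c2*x^2 ; polyfit returns the highest degree first.
c2, c1, c0 = np.polyfit(x, y, deg=2)
print(f"Estimated curve: y = {c0:.2f} + {c1:.2f}*x + {c2:.2f}*x^2")
print("Marginal effect at x = 1:", c1 + 2 * c2 * 1)   # derivative c1 + 2*c2*x
print("Marginal effect at x = 2:", c1 + 2 * c2 * 2)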
Ridge Regression
Ridge regression is a technique used in machine learning. Statisticians and data scientists
use ridge regressions to adjust linear regressions to avoid overfitting the model to training
data. In ridge regression, the parameters of the model (including the regression coefficients)
are found by minimizing the sum of squared errors plus a value called the ridge regression
penalty.
As a result of the adjustment, the model's predictions become less sensitive to changes in
the independent variables. In other words, the coefficients in a ridge regression tend to be
smaller in absolute value than the coefficients in an OLS regression.
Lasso Regression
Lasso regression is similar to ridge regression. It is an adjustment method used with OLS to
adjust for the risk of overfitting a model to training data. In a lasso regression, you adjust
your OLS regression line by a value known as the lasso regression penalty. Similar to the
ridge regression penalty, the lasso penalty shrinks the coefficients in the regression
equation; unlike ridge, lasso can shrink some coefficients exactly to zero, effectively
dropping those variables from the model.
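As a rough illustration of the shrinkage effect, the sketch below fits OLS, ridge, and lasso on
the same simulated data and compares the coefficients (the data, alpha values, and true
coefficients are all invented):

# Ridge and lasso sketch with scikit-learn: the penalties shrink coefficients
# relative to ordinary least squares (simulated data).
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + rng.normal(scale=1.0, size=100)

print("OLS:  ", LinearRegression().fit(X, y).coef_)
print("Ridge:", Ridge(alpha=1.0).fit(X, y).coef_)    # smaller in absolute value
print("Lasso:", Lasso(alpha=0.1).fit(X, y).coef_)    # some driven exactly to zero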