Assigment2 IndividualReport

This report analyzes customer satisfaction in a travel booking company using data from multiple waves. It involves data cleaning, model selection, and the application of statistical techniques to identify factors influencing satisfaction, with a focus on Net Promoter Score and Review Sentiment. The final model chosen for predicting customer satisfaction is a linear regression model that incorporates several key predictors.

Uploaded by

martavallauremartin

Assignment 2 - Individual Report

This assignment delves into understanding customer satisfaction within a travel booking company. It focuses
on data collected during waves 1, 2, and 5 to gain insights into traveler sentiments over time.

Our initial task was to prepare the dataset for analysis, which involved a careful cleaning process. We
aimed to ensure the integrity and reliability of the data by identifying and correcting errors and by
handling missing values. Concretely, we removed records with placeholder ages (-99 and 99), dropped rows
with missing TotalSpending, HomeRegion, or HomeState, and replaced ReviewSentiment values coded as 99 with
predictions from a linear regression fitted on the valid records, clipped to the 0 to 5 range.
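
As a minimal sketch of this kind of cleaning, the R snippet below works on a small made-up data frame. The `toy` values and the choice of `EaseOfUse` as the only predictor are assumptions for illustration; the script in the appendix uses the full dataset and all columns. The sketch drops sentinel ages, fits a regression on the rows with a valid ReviewSentiment, and clips the imputed values to the 0 to 5 range.

```r
# Toy data (hypothetical values, not the real dataset); -99 and 99 are sentinel codes.
toy <- data.frame(
  Age             = c(25, -99, 40, 99, 33, 51, 29, 62),
  EaseOfUse       = c(4, 3, 5, 2, 4, 1, 3, 2),
  ReviewSentiment = c(4.5, 3.0, 99, 2.0, 99, 1.0, 3.5, 2.0)
)

# 1. Drop rows whose Age is a sentinel code.
toy_clean <- subset(toy, !(Age %in% c(-99, 99)))

# 2. Split into rows with a valid sentiment score (0 to 5) and rows coded 99.
valid   <- toy_clean[toy_clean$ReviewSentiment >= 0 & toy_clean$ReviewSentiment <= 5, ]
missing <- toy_clean[toy_clean$ReviewSentiment == 99, ]

# 3. Fit a regression on the valid rows and predict the missing scores.
imputer <- lm(ReviewSentiment ~ EaseOfUse, data = valid)
imputed <- predict(imputer, newdata = missing)

# 4. Clip predictions to the valid range and recombine.
missing$ReviewSentiment <- pmax(pmin(imputed, 5), 0)
cleaned <- rbind(valid, missing)
print(cleaned)
```

Clipping with `pmax(pmin(...))` matters because a linear model can predict values outside the scale on which the variable was measured.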

Throughout this assignment, we will delve into the intricacies of customer satisfaction, exploring the
multitude of variables that may influence this crucial aspect of the travel booking experience. By leveraging
robust statistical techniques and machine learning algorithms, we aim to uncover valuable insights that can
inform strategic decision-making within the travel industry.

Figure 1: BoxPlot of the Variables before cleaning process to understand them

Before presenting the selected model, we review the hypotheses we made prior to the analysis. We expected
traveler satisfaction to be influenced by total spending, ease of use, promotions, age, gender, booking time,
device type, flight availability, review sentiment, repeat behavior, and home region. In particular, we
expected higher spending to come with higher expectations and a more critical attitude, higher Ease of Use
ratings to go with higher satisfaction, promotions to improve perceived value for money, and demographic
differences to affect satisfaction. We also expected booking time (peak versus off-peak hours), device type,
flight availability, review sentiment, and repeat behavior to reflect overall customer contentment.

In more detail, the hypotheses are as follows. Higher spending reflects a greater investment in the booking
process, which raises expectations for service quality. Ease of use matters: travelers who rate the website
as easier to use should be more satisfied. Promotional offers add value and should increase satisfaction.
Demographic factors may also play a role; for example, satisfaction may differ for younger and female
travelers. Booking during off-peak hours can lead to faster response times and smoother transactions.
Device and operating system differences may affect the experience. A wider selection of available flights
gives travelers more options. Positive reviews and mentions from satisfied travelers should go with higher
satisfaction, and repeat behavior and engagement with the platform are indicators of satisfaction.

Before settling on a model for the study of customer satisfaction, I carried out an extensive comparison of
candidate models.
Model Comparison Explanation: To find the most effective model for predicting customer satisfaction, we
examined several approaches, including simple and multiple linear regression, model selection with the
Bayesian Information Criterion (BIC), and decision tree models, among others.

We started with simple linear regression, which looks at the association between each individual predictor
and satisfaction. We then broadened the analysis to multiple linear regression, which includes several
predictors at the same time to improve model accuracy.

Additionally, BIC was used to pick the most reliable model among competing alternatives, since it balances
goodness of fit against model complexity. To assess model performance, we used the Mean Squared Error
(MSE), a prediction accuracy measure that quantifies the average squared difference between observed and
predicted values.
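
To make these two criteria concrete, the sketch below computes BIC and test-set MSE for two competing linear models. It uses R's built-in `mtcars` data as a stand-in, not the travel booking dataset, and the helper names `manual_bic` and `test_mse` are made up for this illustration.

```r
# Stand-in data: the built-in mtcars dataset, NOT the travel booking data.
set.seed(123)
n <- nrow(mtcars)
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Two competing linear models of different complexity.
small <- lm(mpg ~ wt, data = train)
large <- lm(mpg ~ wt + hp + qsec, data = train)

# BIC = -2 * log-likelihood + (number of estimated parameters) * log(n).
manual_bic <- function(fit) {
  ll <- logLik(fit)
  -2 * as.numeric(ll) + attr(ll, "df") * log(nobs(fit))
}

# MSE = mean squared difference between observed and predicted values.
test_mse <- function(fit, newdata, response) {
  mean((newdata[[response]] - predict(fit, newdata = newdata))^2)
}

results <- data.frame(
  model    = c("small", "large"),
  bic      = c(BIC(small), BIC(large)),
  bic_hand = c(manual_bic(small), manual_bic(large)),
  test_mse = c(test_mse(small, test, "mpg"), test_mse(large, test, "mpg"))
)
print(results)
```

A lower BIC is better, and so is a lower test MSE. The two can disagree, because BIC is computed on the training data with a complexity penalty while the MSE here is measured on held-out rows.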

After this comparison, we selected the linear regression model specified by the formula below. It showed
the best predictive performance among the candidates and provided useful insights into the factors that
influence customer satisfaction in our dataset.

model <- lm(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment +
            NetPromoterScore + WebsiteReviewed, data = final_data)
summary(model)

The model output shows that Net Promoter Score is a key predictor of customer satisfaction: the coefficient
estimate indicates that an increase in Net Promoter Score is associated with a substantial increase in
TotalSatisfaction. Review sentiment is also statistically significant, indicating that the sentiment of
reviews plays a role in determining satisfaction. However, BookingCount, TotalSpending, and WebsiteReviewed
do not have a statistically significant effect on TotalSatisfaction.

The model fits the data reasonably well, with a residual standard error of 1.803, a multiple R-squared of
0.6223, and an adjusted R-squared of 0.6189, so the predictors explain roughly 62% of the variance in
TotalSatisfaction. The F-statistic of 180.6 indicates that the predictor variables jointly contribute to
explaining that variance.
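
For readers who want to see where these fit statistics come from, the sketch below recomputes them by hand from a fitted `lm` object and compares them with what `summary()` reports. It uses the built-in `mtcars` data as a stand-in, so the numbers it prints are unrelated to the values reported above.

```r
# Stand-in data: built-in mtcars, NOT the travel booking data.
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

n <- nobs(fit)              # number of observations
p <- length(coef(fit)) - 1  # number of predictors (excluding the intercept)

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2_hand <- 1 - (1 - s$r.squared) * (n - 1) / (n - p - 1)

# The overall F-statistic tests whether all slopes are jointly zero.
f_hand <- (s$r.squared / p) / ((1 - s$r.squared) / (n - p - 1))

# Residual standard error: sqrt(residual sum of squares / residual df).
rse_hand <- sqrt(sum(residuals(fit)^2) / (n - p - 1))

cat("R-squared:", s$r.squared, "\n")
cat("Adjusted R-squared:", s$adj.r.squared, "vs by hand:", adj_r2_hand, "\n")
cat("F-statistic:", s$fstatistic[["value"]], "vs by hand:", f_hand, "\n")
cat("Residual standard error:", s$sigma, "vs by hand:", rse_hand, "\n")
```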

The results suggest that NetPromoterScore and ReviewSentiment play a significant role in determining
TotalSatisfaction. BookingCount, TotalSpending, and WebsiteReviewed show no discernible individual effect,
but they were kept because the model that includes them performed best in the comparison. Further
investigation is needed to explore potential reasons for the lack of significance of certain predictors
and to identify additional factors influencing TotalSatisfaction.

We compared the performance of several models using Mean Squared Error (MSE); the model that was chosen
had the lowest MSE, indicating better predictive ability. The full model results shed light on the
relationships between the predictor variables and customer satisfaction, with some factors showing strong
associations and others having minimal effects. In the confusion matrix analysis, the chosen model also had
greater sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and
balanced accuracy than the other models. The decision tree model was easier to interpret, but the linear
regression model was more accurate at classifying instances with different satisfaction levels. After this
review across a variety of modeling approaches and performance criteria, the linear regression model was
found to be the most appropriate for predicting customer satisfaction in the dataset.
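
A regression model produces numeric predictions, so they have to be turned into classes before a confusion matrix can be built. The sketch below shows one way to do that and how the reported metrics are defined. The scores are invented for illustration, and the cut-off of 6 is taken from the binary encoding used in the appendix script; whether the same cut-off was used for the reported comparison is an assumption.

```r
# Hypothetical observed scores and regression predictions (not real results).
observed  <- c(8, 7, 3, 9, 4, 5, 6, 2)
predicted <- c(7.4, 5.1, 4.2, 8.3, 6.5, 3.9, 6.8, 2.7)

# Turn scores into classes (assumed cut-off: >= 6 counts as "Satisfied").
to_class <- function(x) {
  factor(ifelse(x >= 6, "Satisfied", "Unsatisfied"),
         levels = c("Satisfied", "Unsatisfied"))
}
cm <- table(Predicted = to_class(predicted), Actual = to_class(observed))
print(cm)

# Cell counts, treating "Satisfied" as the positive class.
tp <- cm["Satisfied", "Satisfied"]
fp <- cm["Satisfied", "Unsatisfied"]
fn <- cm["Unsatisfied", "Satisfied"]
tn <- cm["Unsatisfied", "Unsatisfied"]

metrics <- c(
  sensitivity = tp / (tp + fn),  # share of actual positives that were found
  specificity = tn / (tn + fp),  # share of actual negatives that were found
  ppv         = tp / (tp + fp),  # share of predicted positives that are correct
  npv         = tn / (tn + fn)   # share of predicted negatives that are correct
)
metrics["balanced_accuracy"] <- (metrics[["sensitivity"]] + metrics[["specificity"]]) / 2
print(metrics)
```

The `confusionMatrix()` function from the caret package, used in the appendix, reports the same quantities; computing them from a plain `table()` makes the definitions explicit.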

APPENDIX

Prediction tree
#TREE PREDICTION MODEL
tree_model <- rpart(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment + NetPromoterScore +
WebsiteReviewed, data = final_data)
predictions <- predict(tree_model, final_data, type = "class")
rpart.plot(tree_model)

Confusion matrix
conf_matrix <- confusionMatrix(predictions, final_data$Satisfaction)
print(conf_matrix)

Full model:
full_model <- lm(Satisfaction ~ ., data = final_data)

CODE

team_data <-
[Link]("[Link]
[Link]", encoding="UTF-8")

summary(team_data)

names(team_data)

print(colSums(is.na(team_data)))

#CLEANING DATA PROCESS

boxplot(team_data[, c("TotalSpending", "InternationalSpend", "DomesticSpend", "Female", "Age",

"PromotionsUsed", "EaseOfUse", "UsageDuration", "BookingCount",
"AvailableFlights", "Satisfaction", "NetPromoterScore", "TotalSatisfaction",
"ReviewSentiment", "WebsiteReturned")])

summary(team_data[, c("TotalSpending", "InternationalSpend", "DomesticSpend", "Female", "Age",

"PromotionsUsed", "EaseOfUse", "UsageDuration", "BookingCount",
"AvailableFlights", "Satisfaction", "NetPromoterScore", "TotalSatisfaction",
"ReviewSentiment")])

#CHECKING FOR OUTLIERS AND DELETING THOSE THAT AREN'T RELEVANT

install.packages("outliers")
library(outliers)
[Link](team_data$Age)
team_data_clean <- subset(team_data, Age != -99)
[Link](team_data_clean$Age)
team_data_clean <- subset(team_data_clean, Age != 99)
[Link](team_data_clean$Age)
team_data_clean <- team_data_clean[complete.cases(team_data_clean$TotalSpending), ]
team_data_clean <- team_data_clean[!is.na(team_data_clean$HomeRegion), ]
team_data_clean <- team_data_clean[!is.na(team_data_clean$HomeState), ]

summary(team_data[, c("ReviewSentiment")])
summary(team_data[, c("ReviewMentionsWebsite")])

# Load the dataset

# Separate data into two subsets: Subset A (valid data) and Subset B (missing/outlier data)
subset_A <- team_data_clean[team_data_clean$ReviewSentiment >= 0 &
team_data_clean$ReviewSentiment <= 5, ]
subset_B <- team_data_clean[team_data_clean$ReviewSentiment == 99, ]

# CLEANING AND PREDICTING DATA FROM THE COLUMN ReviewSentiment

model <- lm(ReviewSentiment ~ ., data = subset_A)

predictors_B <- subset_B[, !(colnames(subset_B) %in% "ReviewSentiment")]
predicted_values <- predict(model, newdata = predictors_B)
adjusted_values <- pmax(pmin(predicted_values, 5), 0)
subset_B$ReviewSentiment <- adjusted_values
updated_data <- rbind(subset_A, subset_B)

# CLEANING AND PREDICTING DATA FROM THE COLUMN ReviewMentionsWebsite

subset_C <- updated_data[updated_data$ReviewMentionsWebsite != 99, ]
subset_D <- updated_data[updated_data$ReviewMentionsWebsite == 99, ]
predicted_values <- ifelse(subset_D$WebsiteReviewed == 0, 0, subset_D$WebsiteReviewed)
subset_D$ReviewMentionsWebsite <- predicted_values
updated_data_2 <- rbind(subset_C, subset_D)

install.packages("dplyr")
library(dplyr)

# Filter rows where Wave is 1, 2, or 5

team_data_filtered <- updated_data_2 %>%
filter(Wave == 1 | Wave == 2 | Wave == 5)

# Check the first few rows of the filtered data

head(team_data_filtered)

#CONVERTING THE FINAL DATA AND SEPARATING THE WAVES IM GOING TO BE WORKING ON
final_data <- team_data_filtered

summary(final_data)
print(colSums(is.na(final_data)))

#PRELIMINARY ANALYSIS
library(ggplot2)

# Bar chart of Satisfaction by HomeRegion
# (bars stack the individual scores, so bar height is the regional total)

ggplot(final_data, aes(x = HomeRegion, y = Satisfaction)) +
  geom_col(fill = "skyblue", color = "skyblue") +
  labs(title = "Distribution of Satisfaction by Home Region",
       x = "Home Region",
       y = "Satisfaction Score") +
  theme_minimal()
# Bar chart of Satisfaction by HomeState (same construction as above)
ggplot(final_data, aes(x = HomeState, y = Satisfaction)) +
  geom_col(fill = "lightgreen", color = "lightgreen") +
  labs(title = "Distribution of Satisfaction by Home State",
       x = "Home State",
       y = "Satisfaction Score") +
  theme_minimal()

# Explore the structure of the dataset

str(final_data)

# Check the first few rows of the dataset

head(final_data)

# Summary statistics of numerical variables

summary(final_data)

# Check for missing values

colSums(is.na(final_data))

# Split the data into training and testing sets (80-20 split)
set.seed(123) # for reproducibility
train_index <- sample(1:nrow(final_data), 0.8 * nrow(final_data))
train_data <- final_data[train_index, ]
test_data <- final_data[-train_index, ]

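# Fit the model on the training set before predicting
# (assumption: lm_model was not defined in the original script; the final model formula is used here)
lm_model <- lm(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment +
               NetPromoterScore + WebsiteReviewed, data = train_data)

# Make predictions on the test set

predictions <- predict(lm_model, newdata = test_data)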

# Evaluate model performance

mse <- mean((test_data$Satisfaction - predictions)^2)
rmse <- sqrt(mse)
mae <- mean(abs(test_data$Satisfaction - predictions))
r_squared <- summary(lm_model)$r.squared

# Print model evaluation metrics

cat("Mean Squared Error (MSE):", mse, "\n")
cat("Root Mean Squared Error (RMSE):", rmse, "\n")
cat("Mean Absolute Error (MAE):", mae, "\n")
cat("R-squared:", r_squared, "\n")

#PREDICTIVE MODELS
library(dplyr)

lmAll <- lm(TotalSpending ~ ., data =final_data)

summary(lmAll)

#MODEL 1
model_data1 <- lm(TotalSatisfaction ~ TotalSpending + DomesticSpend + ReviewSentiment , data =
final_data)
summary(model_data1)

# MODEL 2
model_data2 <- final_data %>%
select(TotalSatisfaction, TotalSpending, PromotionsUsed, DeviceTypeMobileTablet, Female)
lm_model_2 <- lm(TotalSatisfaction ~ TotalSpending + PromotionsUsed + DeviceTypeMobileTablet + Female,
data = model_data2)
summary(lm_model_2)

# MODEL 3
model_data3 <- final_data %>%
select(TotalSatisfaction, TotalSpending, PromotionsUsed, EaseOfUse, AvailableFlights)
lm_model_3 <- lm(TotalSatisfaction ~ TotalSpending + PromotionsUsed + EaseOfUse + AvailableFlights, data
= model_data3)
summary(lm_model_3)

# MODEL 4
model_data4 <- final_data %>%
select(Satisfaction, TotalSpending, PromotionsUsed, AvailableFlights)
lm_model_4 <- lm(Satisfaction ~ TotalSpending + PromotionsUsed + AvailableFlights, data = model_data4)
summary(lm_model_4)

# MODEL 5
model_data5 <- final_data %>%
select(TotalSatisfaction, TotalSpending, PromotionsUsed, AvailableFlights)
lm_model_5 <- lm(TotalSatisfaction ~ TotalSpending + PromotionsUsed + AvailableFlights, data =
model_data5)
summary(lm_model_5)

# MODEL 5 Variation
model_data5_variation <- final_data %>%
select(TotalSatisfaction, TotalSpending, PromotionsUsed, EaseOfUse)

# Filter out rows with missing values in TotalSpending column

model_data5_variation_clean <-
model_data5_variation[[Link](model_data5_variation$TotalSpending), ]

# Build a linear regression model

lm_model_5_variation <- lm(TotalSatisfaction ~ TotalSpending + PromotionsUsed + EaseOfUse, data =
model_data5_variation_clean)

# Summary of the model

summary(lm_model_5_variation)

# MODEL 6
model_data6 <- final_data %>%
select(TotalSatisfaction, AvailableFlights, TotalSpending, UsageDuration)
lm_model_6 <- lm(TotalSatisfaction ~ AvailableFlights + TotalSpending + UsageDuration, data =
model_data6)
summary(lm_model_6)

confint(lm_model_6)

# MODEL 8
model_data8 <- final_data %>%
select(Satisfaction, DomesticSpend, AvailableFlights)
lm_model_8 <- lm(Satisfaction ~ DomesticSpend + AvailableFlights , data = model_data8)
summary(lm_model_8)

# BEST MODEL
model <- lm(Satisfaction ~ BookingCount + TotalSpending + UsageDuration + NetPromoterScore , data =
final_data)
summary(model)

model2 <- lm(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment + NetPromoterScore , data =

final_data)
summary(model2)

#BEST ONE
model <- lm(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment + NetPromoterScore +
WebsiteReviewed, data = final_data)
summary(model)

# Set the order of the Satisfaction levels for plotting
final_data$Satisfaction <- factor(final_data$Satisfaction, levels = 3:1)
# Create a mosaic plot with the new order
mosaic_model <- mosaicplot(table(final_data$Satisfaction, final_data$ReviewSentiment),
                           main = "Mosaic Plot of Satisfaction and Review Sentiment")
# Filter the data for waves 1, 2, and 5
wave_subset <- subset(final_data, Wave %in% c(1, 2, 5))

# Create a mosaic plot

mosaic_model <- mosaicplot(table(wave_subset$Satisfaction, wave_subset$ReviewSentiment), main =
"Satisfaction vs Review Sentiment (Waves 1, 2, 5)")

install.packages("rpart")
library(rpart)
install.packages("rpart.plot")
library(rpart.plot)

#TREE PREDICTION MODEL

tree_model <- rpart(Satisfaction ~ BookingCount + TotalSpending + ReviewSentiment + NetPromoterScore +
WebsiteReviewed, data = final_data)
predictions <- predict(tree_model, final_data, type = "class")
rpart.plot(tree_model)

install.packages("ggplot2")
library(ggplot2)

install.packages("lattice")
library(lattice)

install.packages("caret")
library(caret)

predictions2 <- factor(predictions)

#DECISION TREE
library(rpart)

# Recode Satisfaction variable

final_data$Satisfaction_category <- ifelse(final_data$Satisfaction <= 5, 0, 1)

# Split data into training and testing sets

set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(final_data), 0.8 * nrow(final_data))
train_data <- final_data[train_indices, ]
test_data <- final_data[-train_indices, ]

# Fit decision tree model

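# Exclude Satisfaction itself from the predictors: the target is derived from it,
# so keeping it would leak the answer into the tree
tree_model <- rpart(Satisfaction_category ~ . - Satisfaction, data = train_data, method = "class")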
# Evaluate model performance on test data
predicted <- predict(tree_model, test_data, type = "class")
accuracy <- mean(predicted == test_data$Satisfaction_category)
accuracy

library(rpart.plot)

# Plot decision tree

rpart.plot(tree_model, type = 4, extra = 2)

# Step 1: Run a multi-linear regression with all predictor variables

full_model <- lm(Satisfaction ~ BookingCount + ReviewSentiment + NetPromoterScore + WebsiteReviewed ,
data = final_data)

# Step 2: Select the most reliable model using BIC

reduced_model <- step(full_model, k = log(nrow(final_data)), direction = "both")

# Step 3: Compare performance using Mean Squared Error (MSE) over test data
# Split data into training and testing sets
set.seed(123) # For reproducibility
train_indices <- sample(1:nrow(final_data), 0.8 * nrow(final_data))
train_data <- final_data[train_indices, ]
test_data <- final_data[-train_indices, ]

# Predict using reduced model

predicted <- predict(reduced_model, newdata = test_data)

# Calculate MSE
mse <- mean((predicted - test_data$Satisfaction)^2)

# Encode satisfaction variable into binary outcome

final_data$Satisfaction_binary <- ifelse(final_data$Satisfaction >= 6, "Satisfied", "Unsatisfied")

# Run classification tree model

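# Exclude the columns the target was derived from, to avoid leaking the answer into the tree
tree_model <- rpart(Satisfaction_binary ~ . - Satisfaction - Satisfaction_category,
                    data = final_data, method = "class")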

# Evaluate performance using accuracy scores

# Predict using the decision tree model
predictions <- predict(tree_model, final_data, type = "class")

# Compute accuracy
accuracy <- sum(predictions == final_data$Satisfaction_binary) / nrow(final_data)

# Fit initial full model

full_model <- lm(Satisfaction ~ ., data = final_data)
summary(full_model)
full_model <- lm(Satisfaction ~ ., data = final_data)
