
Project Presentation

Restaurant recommendation ML project

Uploaded by

nirannjanss
Copyright
© All Rights Reserved

City & Cuisine-Based Restaurant Recommender
Using Yelp Dataset
School of Computer Science and Engineering
Data Mining and Analysis (18ECSC301)
Course Project on Yelp Data Challenge, Round 12

Team Leader: Adarsh Raj
Team Members:
• Abhijeet Prakash
• Abhishek Sawant
• Adarsh Raj
• Apeksha Ninnekar
Introduction
• Yelp's website, Yelp.com, is a crowd-sourced local business review and social networking site active in major metropolitan areas.
• Yelp users can review a business's products or services using a one-to-five-star rating system.
• Yelp has over 135 million restaurant and business reviews worldwide.
Description
• Yelp hosts over 135 million restaurant and business reviews worldwide.
• Whether you're looking for continental food, a great coffee shop nearby, a new salon, or the best handyman in town, Yelp is your city guide to finding the perfect places to eat, shop, drink, relax, visit and play.
Problem statement
"To search for and recommend the best restaurants in a city for different kinds of cuisine, based on reviews given by customers."
Project vision
• Yelp contains review data for the restaurants in a city and helps users choose a restaurant.
• In this project, we use review text to recommend restaurants to users for different cuisines.
• We investigate features of the Yelp data for rating prediction and recommendation tasks.
Dataset
• The dataset totals 6.84 GB, including the business attributes data, and is split into the following sub-files:
• Business dataset (139 MB)
• Check-in dataset (50.3 MB)
• Photo dataset (34.9 MB)
• Review dataset (4.39 GB)
• Tips dataset (203 MB)
• Users dataset (2.03 GB)
Datasets used:
• Business
• Review
Exploratory analysis
• Graph: number of food businesses in each state.
Conclusion: the number of businesses in food categories was highest in Ontario, with 17,907 businesses.
Exploratory analysis
• Top 10 cities with the highest review ratings in Ontario.
Exploratory analysis
• The majority of star ratings for food businesses are 5 stars.
Categories selected
Overall food categories:
• 'Food', 'Restaurants', 'Pizza', 'Mexican', 'American (Traditional)', 'American (New)', 'Italian', 'Indian', 'Pakistani', 'Thai', 'Japanese', 'French', 'Canadian (New)', 'Middle Eastern', 'German', 'Vietnamese', 'Chinese', 'Hungarian'
Cuisines:
• Indian
• Chinese
• Thai
• Italian
• Japanese
Exploratory analysis
• The majority of the selected food businesses fall under the 'Restaurants' category.
Data reduction
After exploratory analysis, we trimmed our dataset. We selected instances with:
• Food-related businesses
• State as 'Ontario'
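The two selection rules above can be sketched as a simple filter. This is a minimal illustration, not the project's actual code: it assumes each business record has a `state` field holding the code `"ON"` for Ontario and a comma-separated `categories` string, and the sample records and the `FOOD_CATEGORIES` set are hypothetical.

```python
# Assumption: a business counts as food-related if it carries one of these tags.
FOOD_CATEGORIES = {"Food", "Restaurants"}

# Hypothetical sample records; field names are assumed, not taken from the slides.
businesses = [
    {"business_id": "b1", "state": "ON", "categories": "Restaurants, Indian"},
    {"business_id": "b2", "state": "ON", "categories": "Hair Salons"},
    {"business_id": "b3", "state": "AZ", "categories": "Food, Coffee & Tea"},
    {"business_id": "b4", "state": "ON", "categories": None},
]


def is_food_business(business):
    """Return True if any category tag is in FOOD_CATEGORIES."""
    categories = business.get("categories") or ""
    tags = {tag.strip() for tag in categories.split(",")}
    return bool(tags & FOOD_CATEGORIES)


def reduce_businesses(rows, state="ON"):
    """Keep only food-related businesses located in the given state."""
    return [b for b in rows if b.get("state") == state and is_food_business(b)]


reduced = reduce_businesses(businesses)
print([b["business_id"] for b in reduced])
```

Records with a missing `categories` value are treated as non-food rather than raising an error.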
Methodology
Pre-processing
Dropped:
• 28 columns from the business file
• 4 columns from the review file
Added:
• New columns: senti-polarity and text clear
Data integration:
• Combined the two datasets, business and review. After integration, the dataset has 36 columns and 482,384 rows.
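The integration step can be sketched as a join of the review rows onto the business rows. The slides do not state the join type, so this sketch assumes an inner join on `business_id`; the tiny tables and every column other than `business_id` and `stars` are hypothetical.

```python
# Hypothetical miniature tables standing in for the business and review files.
businesses = [
    {"business_id": "b1", "name": "Curry House", "city": "Toronto"},
    {"business_id": "b2", "name": "Noodle Bar", "city": "Markham"},
]
reviews = [
    {"business_id": "b1", "stars": 5, "text": "Great curry"},
    {"business_id": "b1", "stars": 4, "text": "Good naan"},
    {"business_id": "b3", "stars": 2, "text": "Closed early"},
]


def integrate(business_rows, review_rows):
    """Inner-join reviews to businesses on business_id (one row per review)."""
    by_id = {b["business_id"]: b for b in business_rows}
    return [
        {**by_id[r["business_id"]], **r}
        for r in review_rows
        if r["business_id"] in by_id
    ]


merged = integrate(businesses, reviews)
print(len(merged), sorted(merged[0]))
```

Each merged row carries the business columns plus the review columns, which is why the integrated table has one row per review rather than one per business.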
Predictive tasks
There are two major tasks in our project:
• Predict rating from review text:
  - Linear support vector machine classifier
• Find the sentiment polarity and recommend the top restaurants for each cuisine type:
  - Sentiment polarity
  - Mean of star ratings
Linear support vector machine classifier
Text pre-processing
• Removed punctuation and stop words, and tokenized the reviews
• Converted each review into a vector using tf-idf
Training the model
• Split the dataset into training and test sets in an 80:20 ratio
• Built a multiclass SVM classifier and fit it to the training set
Testing and evaluating the model
• Tested the model on 5 classes (ratings 1, 2, 3, 4, 5)
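The pipeline above could look roughly like the following sketch. It assumes scikit-learn, which the slides do not name, and uses a hypothetical toy corpus in place of the Yelp reviews; `TfidfVectorizer` here handles lowercasing, tokenization, punctuation stripping and English stop-word removal, and `LinearSVC` handles the five classes with its default one-vs-rest scheme.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy reviews and star labels (not taken from the Yelp data).
reviews = [
    "terrible food and rude staff", "awful service never again",
    "bad pizza and slow service", "not great, rather bland",
    "average meal nothing special", "okay place, decent prices",
    "good food and friendly staff", "nice place, tasty noodles",
    "excellent curry, amazing service", "best sushi ever, loved it",
] * 3
stars = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5] * 3

# 80:20 split; stratifying keeps all five ratings in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, stars, test_size=0.2, random_state=0, stratify=stars
)

# tf-idf vectorization followed by a linear SVM classifier.
model = make_pipeline(TfidfVectorizer(stop_words="english"), LinearSVC())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"test accuracy: {accuracy:.2f}")
```

The accuracy on this toy corpus says nothing about performance on real reviews; the sketch only shows how the steps fit together.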
Using five classes (1, 2, 3, 4, 5)
Task 2: Recommending restaurants to users
• Calculate the sentiment polarity of each review text
• Find the mean sentiment polarity for each business_id
• Find the mean stars for each business_id
• Consider businesses with mean stars greater than 3.5 and mean sentiment polarity greater than 0 as good restaurants
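The aggregation and thresholding steps above can be sketched as follows. This is an illustration under stated assumptions: the polarity of each review is taken as already computed (the slides do not say which tool produced it), and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (business_id, stars, sentiment polarity of the review text).
reviews = [
    ("b1", 5, 0.6), ("b1", 4, 0.2),
    ("b2", 2, -0.3), ("b2", 3, 0.1),
    ("b3", 4, -0.1), ("b3", 4, -0.2),
]


def good_restaurants(rows, min_stars=3.5, min_polarity=0.0):
    """Return {business_id: (mean stars, mean polarity)} for good restaurants."""
    grouped = defaultdict(list)
    for business_id, stars, polarity in rows:
        grouped[business_id].append((stars, polarity))

    result = {}
    for business_id, values in grouped.items():
        mean_stars = mean(s for s, _ in values)
        mean_polarity = mean(p for _, p in values)
        # Both thresholds from the slide must hold (strictly greater than).
        if mean_stars > min_stars and mean_polarity > min_polarity:
            result[business_id] = (mean_stars, mean_polarity)
    return result


good = good_restaurants(reviews)
print(list(good))
```

In this sample, `b3` has a high mean star rating but a negative mean polarity, so it is excluded even though it passes the star threshold.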
Plotting graphs of stars vs sentiment polarity
From the graph, we see that all businesses with stars greater than 3.5 have a senti-polarity above 0.
Displaying restaurants on map
Mapping the restaurants on world map
Finding the best restaurants
Finding the top restaurants on Yelp:
• Based on the stars and the highest senti-polarity value
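One way to read this ranking rule is to sort by mean stars first and break ties by mean sentiment polarity. That ordering is an assumption, since the slide does not say how the two values are combined; the restaurant names and the `cuisine` field below are hypothetical.

```python
# Hypothetical per-business aggregates.
businesses = [
    {"name": "Tandoor", "cuisine": "Indian", "mean_stars": 4.5, "mean_polarity": 0.30},
    {"name": "Masala", "cuisine": "Indian", "mean_stars": 4.5, "mean_polarity": 0.42},
    {"name": "Dosa Hut", "cuisine": "Indian", "mean_stars": 4.0, "mean_polarity": 0.50},
    {"name": "Wok", "cuisine": "Chinese", "mean_stars": 5.0, "mean_polarity": 0.60},
]


def top_restaurants(rows, cuisine, n=3):
    """Top n restaurants of a cuisine, by mean stars, then mean polarity."""
    matches = [r for r in rows if r["cuisine"] == cuisine]
    ranked = sorted(
        matches,
        key=lambda r: (r["mean_stars"], r["mean_polarity"]),
        reverse=True,
    )
    return ranked[:n]


top_indian = [r["name"] for r in top_restaurants(businesses, "Indian", n=2)]
print(top_indian)
```

Calling the same function with a different cuisine name yields the per-cuisine lists shown on the later slides.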
Indian cuisine
• Finding the top restaurants for Indian cuisine.
WORDCLOUD
Chinese cuisine
• Finding the top restaurants for Chinese cuisine.
GUI

Thank you
