Python Project Report
Submitted by
Group 3
Name UB#
Alekya Kumar 50249052
Monalisa Mishra 50250673
Trupti Jadav 50249177
EAS_503_Python_Project_Report_V1.0 1
Abstract
This report analyzes food inspection data obtained from the Chicago Data Portal. The Food
Protection Division of the Chicago Department of Public Health (CDPH) is committed to
maintaining the safety of food bought, sold, or prepared for public consumption in Chicago by
carrying out science-based inspections of all retail food establishments. These inspections promote
public health in areas of food safety and sanitation and prevent the occurrence of food-borne
illness.
Our analysis was done using the Python Pandas library, the Matplotlib library, and SQL. We
studied each of the predictors (variables) in the data set. The data set is mostly textual and
consists largely of categorical variables. Studying the variables showed us what each one
contributes to the meaning of a data point. We identified relationships among several of the
input variables and visualized them.
Table of Contents
1. Introduction
2. Data
2.1. Description
2.2. Steps of Data Cleaning
3. Analysis & Results
3.1. Histogram of Results
3.2. Time Series Plot for Inspection Type and Facility Type
3.3. Facility Type against No. of Inspections
3.4. Result and No. of Violation Analysis
3.5. Inspection Types and its respective counts
3.6. Risk Analysis in different zip codes
3.7. Business Unit Branches that have maximum number of inspections
3.8. What type of ratings do establishments get?
3.9. Which Restaurant chains fail inspections most?
3.10. Where are failures most common?
4. Conclusion
5. Future Research Directions
1. Introduction
Around 15,000 establishments serving food and drink across the City of Chicago are subject
to sanitation inspections by the Department of Public Health at least once a year. Factors
such as workers' hygiene, correct holding and storage temperatures for food, and sourcing
food from authorized suppliers with expiry dates maintained are all critical for ensuring
public health. The annual inspections ensure continued compliance with City ordinances and
regulations and reduce the public's exposure to foodborne disease. In addition to recurring
inspections, restaurants may also be inspected in response to a complaint. Some recurring
inspections, such as the inspection by the Buildings Department, are scheduled, while
others are not.
2. Data
2.1. Description
The data was obtained from inspections of restaurants and other food and drink
establishments in Chicago from January 1, 2010 to the present. Details of the
significant variables:
2.2. Steps of Data Cleaning
Columns not needed for this analysis were removed.
Facility type and inspection type values were not recorded uniformly, so they were
standardized in Python and SQL.
Duplicate inspection reports were removed (180 records).
The Violations column contained free text. For analysis, it was parsed and split into
two columns containing the violation codes and their respective counts. The violation
descriptions were kept in a separate CSV file.
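The cleaning steps above can be sketched in pandas roughly as follows. This is a minimal illustration and not the code used for the report: the column names (`Inspection ID`, `Facility Type`, `Violations`, `Location`), the choice of `Inspection ID` as the duplicate key, and the assumption that each violation entry starts with a numeric code and entries are separated by `|` are all assumptions made for the example, and the rows are toy data.

```python
import re
import pandas as pd

# Assumed format: each "|"-separated violation entry starts with "<code>.".
CODE_RE = re.compile(r"(?:^|\|)\s*(\d+)\.")


def violation_codes(text):
    """Return the numeric violation codes found in one Violations cell."""
    return [int(code) for code in CODE_RE.findall(text)]


def clean(df):
    """Drop an unused column, standardize labels, dedupe, parse violations."""
    out = df.drop(columns=["Location"])  # column not needed here
    # Standardize spelling variants that differ only by case or whitespace.
    out["Facility Type"] = out["Facility Type"].str.strip().str.title()
    # Assumption: a repeated Inspection ID marks a duplicate report.
    out = out.drop_duplicates(subset="Inspection ID").reset_index(drop=True)
    codes = out["Violations"].fillna("").apply(violation_codes)
    out["Violation Codes"] = codes
    out["Violation Count"] = codes.apply(len)
    return out


# Toy rows standing in for the portal export.
raw = pd.DataFrame({
    "Inspection ID": [1, 1, 2, 3],
    "Facility Type": ["Restaurant", "Restaurant", " restaurant ", "GROCERY STORE"],
    "Violations": [
        "32. FOOD CONTACT SURFACES - Comments: worn | 33. UTENSILS - Comments: dirty",
        "32. FOOD CONTACT SURFACES - Comments: worn | 33. UTENSILS - Comments: dirty",
        None,
        "18. NO EVIDENCE OF RODENTS - Comments: droppings",
    ],
    "Location": ["(41.8, -87.6)"] * 4,
})

cleaned = clean(raw)
print(cleaned[["Inspection ID", "Facility Type", "Violation Count"]])
```

Real facility type labels also contain misspellings, which case and whitespace normalization alone will not fix; those would need an explicit mapping of variants to a canonical label.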
3.2. Time Series Plot for Inspection Type and Facility Type
The facility types contained many near-duplicate values, including misspellings of
establishment names. These were corrected before the data was plotted. Using a pivot table,
we calculated the number of inspections each facility type underwent over the entire span
from January 2010 to September 2017. The counts were plotted with one line per facility
type.
To make the graph easier to read, we retrieved the count for each facility type and kept
only the top ten.
Fig2. Time Series Plot for Inspection Type and Facility Type
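A pivot table of this kind could be built along the following lines. This is a sketch under assumptions: the column names, the yearly grouping, and the toy rows are illustrative, and `TOP_N` is set to 2 only so the small example has something to drop (the report kept the top ten).

```python
import pandas as pd

TOP_N = 2  # the report kept the top ten facility types

# Toy rows; column names are assumed, not taken from the report.
df = pd.DataFrame({
    "Inspection ID": [1, 2, 3, 4, 5, 6],
    "Inspection Date": pd.to_datetime([
        "2010-01-05", "2010-06-01", "2011-03-02",
        "2011-04-01", "2011-09-09", "2012-01-01",
    ]),
    "Facility Type": [
        "Restaurant", "Grocery Store", "Restaurant",
        "Grocery Store", "School", "Restaurant",
    ],
})

# One row per year, one column per facility type, cells = inspection counts.
df["Year"] = df["Inspection Date"].dt.year
counts = df.pivot_table(
    index="Year",
    columns="Facility Type",
    values="Inspection ID",
    aggfunc="count",
    fill_value=0,
)

# Keep only the most frequently inspected facility types.
top_types = df["Facility Type"].value_counts().head(TOP_N).index
top_counts = counts[top_types]
print(top_counts)

# With Matplotlib installed, top_counts.plot() draws one line per facility type.
```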
3.3. Facility Type against No. of Inspections
We analyzed the facility types that have undergone 500 or more inspections. From the
table, we can see that 'Restaurant' has the highest number of inspections, followed by
'Grocery/Retail Store' and 'Private/Public School'.
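A threshold filter like this can be expressed in a few lines of pandas. The sketch below uses toy data and a hypothetical helper, `frequent_categories`; the threshold of 2 stands in for the 500 used in the report.

```python
import pandas as pd


def frequent_categories(series, min_count):
    """Return value counts for categories seen at least min_count times."""
    counts = series.value_counts()  # sorted from most to least frequent
    return counts[counts >= min_count]


# Toy data; the report applied a threshold of 500 to the full data set.
facility_type = pd.Series(
    ["Restaurant"] * 5 + ["Grocery Store"] * 3 + ["School"] * 2 + ["Bakery"]
)

frequent = frequent_categories(facility_type, min_count=2)
print(frequent)
```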
3.4. Result and No. of Violation Analysis
The base map of the city of Chicago shows the areas where establishments received more
than 10 violations after inspection. There are three kinds of results: Pass, Fail, and
Pass with conditions. As the map shows, “Pass with conditions” is most prevalent in the
upper part of the city. This result indicates that the establishment was cited for a
violation during the inspection that must be corrected before the next recurring
inspection.
3.5. Inspection Types and its respective counts
The next plot shows the types of inspections that establishments undergo and the
corresponding count for each type.
Only inspection types that occurred more than 100 times were taken into account. As we
can see from the table, the Canvass inspection type has the highest count. According to
the data set description on the Chicago Data Portal, Canvass is the most common type of
inspection and is performed at a frequency relative to the risk of the establishment.
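Since SQL was part of the toolset, a count of this kind could be written as a `GROUP BY` with a `HAVING` filter. The sketch below uses Python's built-in `sqlite3` module with an in-memory table; the table name, column names, and rows are assumptions for illustration, and the threshold of 1 stands in for the 100 used in the report.

```python
import sqlite3

MIN_COUNT = 1  # the report kept types that occurred more than 100 times

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inspections (inspection_id INTEGER, inspection_type TEXT)"
)

# Toy rows: 4 Canvass, 2 Complaint, 1 License.
rows = (
    [(i, "Canvass") for i in range(4)]
    + [(10 + i, "Complaint") for i in range(2)]
    + [(20, "License")]
)
conn.executemany("INSERT INTO inspections VALUES (?, ?)", rows)

query = """
    SELECT inspection_type, COUNT(*) AS n
    FROM inspections
    GROUP BY inspection_type
    HAVING COUNT(*) > ?
    ORDER BY n DESC
"""
type_counts = conn.execute(query, (MIN_COUNT,)).fetchall()
print(type_counts)
conn.close()
```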
3.6. Risk Analysis in different zip codes
Inspections are displayed on the Chicago map according to their 'Risk' level, which shows
the area-wise distribution of risk. The map alone does not tell us much, because the risk
levels are scattered evenly across the city. Risk 1 and Risk 2 are observed most often.
Many locations also show a combination of risk levels (indicated by the brown points on
the map). In addition, there are many dense clusters of points in the main city area of
Chicago (eastern region), indicating the most heavily inspected zone.
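Because the map makes it hard to compare areas, a tabular view can complement it. The following sketch cross-tabulates zip code against risk level; the column names, the risk labels, and the rows are assumptions made for the example.

```python
import pandas as pd

# Toy rows; zip codes are kept as strings so they are not treated as numbers.
df = pd.DataFrame({
    "Zip": ["60601", "60601", "60601", "60614", "60614", "60657"],
    "Risk": [
        "Risk 1 (High)", "Risk 1 (High)", "Risk 2 (Medium)",
        "Risk 1 (High)", "Risk 3 (Low)", "Risk 2 (Medium)",
    ],
})

# Rows = zip codes, columns = risk levels, cells = number of inspections.
risk_by_zip = pd.crosstab(df["Zip"], df["Risk"])
risk_by_zip["Total"] = risk_by_zip.sum(axis=1)

# Zip code with the most inspections overall.
busiest_zip = risk_by_zip["Total"].idxmax()
print(risk_by_zip)
print("Most inspected zip code:", busiest_zip)
```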
3.7. Business Unit Branches that have maximum number of inspections
The business unit branches with the largest number of inspections were extracted from the
data set. The graph depicts the branches that underwent the most inspections. The number
of inspections tends to increase with the risk associated with each business unit.
Surprisingly, the Admiral café has the largest number of inspections in the city of
Chicago.
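One way to extract the most-inspected branches is to group by business name and address together, so that each physical location is counted separately. This is a sketch with placeholder names and addresses; identifying a branch by the `DBA Name` and `Address` pair is an assumption, not a detail stated in the report.

```python
import pandas as pd

# Placeholder businesses and addresses, not real records.
df = pd.DataFrame({
    "DBA Name": ["CAFE A", "CAFE A", "CAFE A", "CAFE A", "DINER B", "DINER B"],
    "Address": [
        "1 MAIN ST", "1 MAIN ST", "1 MAIN ST",
        "9 OAK AVE", "5 ELM ST", "5 ELM ST",
    ],
})

# Count inspections per (business, address) pair, largest first.
branch_counts = (
    df.groupby(["DBA Name", "Address"])
    .size()
    .sort_values(ascending=False)
)
print(branch_counts.head(10))
```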
3.8. What type of ratings do establishments get?
Since the data set has no parameter that rates the establishments, we used the Risk
category as a proxy for this question. As we can see, food-related establishments fall
mostly in the highest risk categories, Risk 1 or Risk 2. Banquets, restaurants, and
bakeries have the highest risk. As indicated earlier, high-risk restaurants will have
undergone a large number of repeated inspections.
3.9. Which Restaurant chains fail inspections most?
Failures are common in restaurants and food outlets. We analyzed the restaurants to check
which of them face the most failures, since failures are more common in areas where food
storage is involved. From the data, it is clear that Subway, Dunkin Donuts and McDonald's
faced the largest number of inspections.
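Comparing chains requires grouping spelling variants of the same business name and counting failures separately from total inspections. The sketch below shows one possible approach; the `chain_name` helper, its normalization rules, the column names, and the rows are all hypothetical and are not taken from the report's code.

```python
import re
import pandas as pd


def chain_name(dba):
    """Collapse common spelling variants of a business name into one label."""
    name = re.sub(r"[^A-Z ]", "", dba.upper())  # drop digits and punctuation
    return re.sub(r"\s+", " ", name).strip()


# Toy rows, not real inspection outcomes.
df = pd.DataFrame({
    "DBA Name": [
        "SUBWAY", "Subway #1234", "McDonald's",
        "MCDONALDS", "Dunkin' Donuts", "SUBWAY",
    ],
    "Results": ["Fail", "Fail", "Fail", "Pass", "Pass", "Pass"],
})

df["Chain"] = df["DBA Name"].map(chain_name)
df["Failed"] = df["Results"].eq("Fail")

# Total inspections and failed inspections per chain, most failures first.
summary = (
    df.groupby("Chain")["Failed"]
    .agg(inspections="size", failures="sum")
    .sort_values("failures", ascending=False)
)
print(summary)
```

Keeping both columns makes it possible to tell a chain that fails often from one that simply has many locations and is therefore inspected often.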
3.10. Where are failures most common?
From the data, we can see that the majority of inspections were done for the facility
type Restaurant. We then checked the Results for the frequently inspected facility types
and identified the facility types with the most failed results. Restaurant has the highest
number of failed records, followed by Grocery/Retail and Private/Public Schools.
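A cross-tabulation of facility type against result gives the failure counts directly. The following is a minimal sketch with toy rows; the column names and result labels are assumed.

```python
import pandas as pd

# Toy rows, not real inspection outcomes.
df = pd.DataFrame({
    "Facility Type": [
        "Restaurant", "Restaurant", "Restaurant", "Restaurant",
        "Grocery Store", "Grocery Store", "School",
    ],
    "Results": [
        "Fail", "Fail", "Pass", "Pass w/ Conditions",
        "Fail", "Pass", "Pass",
    ],
})

# Rows = facility types, columns = results, cells = number of inspections.
results_by_type = pd.crosstab(df["Facility Type"], df["Results"])

# Facility types ordered by number of failed inspections.
fail_counts = results_by_type["Fail"].sort_values(ascending=False)
print(fail_counts)
```

Because restaurants are also the most frequently inspected facility type, dividing each failure count by the row total would show whether they fail at a higher rate or only in larger absolute numbers.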
4. Conclusion
The analysis gave us a fair idea of how food inspection is carried out, not only in the
city of Chicago but also in other parts of the country. The violations provided good
insight into how restaurants are classified according to the number of violations they are
cited for. In addition, we identified the establishments and business units that are
inspected frequently and the risk associated with each of them. Taken together, the
research indicated which establishments should be inspected first.