DATA 219 - Foundations for Data Science - Spring 2019
Homework #7 — Down the Drain
Due: Wednesday, April 24th, midnight
This is a solo assignment, to be completed on your own.
For this homework, you will be comparing the quality of plumbers and other
businesses in various parts of America, a handy thing to know.
- Choose ten U.S. cities that you have lived in, have visited, or have
- Create an account for yelp.com's Fusion API. Create an
"application" and make note of your Client ID and API key.
- Write code to produce a DataFrame with at least 2000 rows and the following columns:
- business — The name of a plumbing (or other) business.
- city — The name of the city in which that business is located.
- price — A number from 1-to-4 indicating how expensive
the business is.
- review — A number from 1-to-5 indicating the number of
stars a Yelp reviewer gave this business.
In order to do this, you'll need to:
- Use Yelp's search API to search for plumbers in one of your
cities. Get at least 100 plumbers. (If Yelp doesn't have information on 100
plumbers, choose a different city instead.)
- Use the business lookup API to find the reviews of each plumber.
Get as many ratings as Yelp has for that business.
- Add all the information to the lists you'll use to create the DataFrame.
- Repeat the three steps above for all ten cities.
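The search, lookup, and append steps above might be organized along the lines of the rough sketch below. It is only one possible arrangement, not the required one. The two callables search_page and fetch_reviews are hypothetical wrappers around your own API requests, and the field names ("id", "name", "price", "rating"), the 50-results-per-request page size, and the "$"-style price strings are assumptions about Yelp's responses that you should check against Yelp's documentation.

```python
def price_to_number(price):
    """Turn a price string such as "$$" into a number (2); None if missing.

    Assumes Yelp reports price as a run of one to four dollar signs.
    """
    return len(price) if price else None


def collect_rows(search_page, fetch_reviews, city, wanted=100, page_size=50):
    """Return four parallel lists with one entry per (business, review) pair.

    search_page(city, limit, offset) -> list of business dicts  (hypothetical)
    fetch_reviews(business_id)       -> list of review dicts    (hypothetical)
    """
    # Step a: keep asking for pages of results until we have enough businesses.
    found = []
    offset = 0
    while len(found) < wanted:
        page = search_page(city, page_size, offset)
        if not page:
            break  # no more results for this city
        found.extend(page)
        offset += len(page)

    # Steps b and c: look up each business's reviews and append to the lists.
    businesses, cities, prices, reviews = [], [], [], []
    for biz in found:
        for review in fetch_reviews(biz["id"]):
            businesses.append(biz["name"])
            cities.append(city)
            prices.append(price_to_number(biz.get("price")))
            reviews.append(review["rating"])
    return businesses, cities, prices, reviews
```

Calling collect_rows once per city and extending four master lists with the results would cover the repeat step. If fewer than 100 businesses come back, that is the cue to pick a different city.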
Note: Yelp's way of authenticating to the API is a bit different from
Twitter's or food2fork's. You specify the API key in an "HTTP header"
rather than in the URL. This can be accomplished by passing a second argument,
a keyword argument called "headers", to requests.get(). The value you pass for
"headers" should be a dictionary with a single key/value pair. The
key is the word "Authorization". The value is the
concatenation of two strings, separated by a space: the first string is the
literal word "Bearer", and the second string is the (huge)
128-character API key you got when you registered your Yelp application.
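As a minimal sketch of the header described in the note above: the helper names auth_headers and search_page are made up for illustration, and the search URL and the "term", "location", "limit", and "offset" parameters are assumptions based on Yelp's Fusion documentation, so confirm them there before relying on them.

```python
# Assumed search endpoint; confirm it in Yelp's Fusion documentation.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"


def auth_headers(api_key):
    """Build the single-entry headers dictionary: "Bearer", a space, the key."""
    return {"Authorization": "Bearer " + api_key}


def search_page(api_key, city, limit=50, offset=0):
    """Fetch one page of plumber search results for a city (hypothetical helper)."""
    # Imported here so that building the headers does not require requests.
    import requests

    response = requests.get(
        SEARCH_URL,
        headers=auth_headers(api_key),
        params={"term": "plumbers", "location": city,
                "limit": limit, "offset": offset},
    )
    response.raise_for_status()  # stop early on a bad key or a bad request
    return response.json()["businesses"]
```

The same headers dictionary can be reused for every request you make, including the ones that look up reviews.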
- Create your Pandas DataFrame from the lists you've been appending to
during all of your API action.
- Save your DataFrame to a .csv file called plumbers.csv.
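One way the create-and-save steps might look is sketched below. The function name build_frame and the sample values are made up for illustration; your own four lists take their place. The file name plumbers.csv is the one listed under "Turning it in".

```python
import pandas as pd


def build_frame(businesses, cities, prices, reviews):
    """Combine four equal-length lists into one DataFrame, one row per review."""
    return pd.DataFrame({
        "business": businesses,
        "city": cities,
        "price": prices,
        "review": reviews,
    })


if __name__ == "__main__":
    # Made-up sample values standing in for the lists built from the API.
    df = build_frame(
        ["Acme Plumbing", "Acme Plumbing", "Drain Pros"],
        ["Fredericksburg", "Fredericksburg", "Richmond"],
        [2, 2, 3],
        [5, 4, 3],
    )
    df.to_csv("plumbers.csv", index=False)  # index=False leaves out row numbers
    print(df)
```

pandas raises a ValueError if the four lists have different lengths, which is a handy check that every append during the API work touched all four lists.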
- Create and save a pretty, well-labeled boxplot that shows the city name on
the x-axis and plumber ratings on the y-axis. Sort this boxplot left-to-right
in order of city-with-best-plumbers on the left all the way down to
city-with-worst-plumbers on the right.
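A sorted boxplot could be produced roughly as follows. This sketch assumes that "best" means the highest median review, which is my choice here and not the only reasonable one (the mean would also work). The function names, axis labels, and title are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # draw straight to a file; no display window needed
import matplotlib.pyplot as plt
import pandas as pd


def cities_best_to_worst(df):
    """Return city names ordered by median review, highest first."""
    medians = df.groupby("city")["review"].median()
    return medians.sort_values(ascending=False).index.tolist()


def save_city_boxplot(df, filename):
    """Save a boxplot of reviews per city, best city on the left."""
    order = cities_best_to_worst(df)
    # One array of reviews per city, in the sorted order.
    data = [df.loc[df["city"] == city, "review"].to_numpy() for city in order]

    fig, ax = plt.subplots(figsize=(10, 5))
    ax.boxplot(data)  # boxes are drawn at x positions 1, 2, 3, ...
    ax.set_xticks(range(1, len(order) + 1))
    ax.set_xticklabels(order, rotation=45, ha="right")
    ax.set_xlabel("City")
    ax.set_ylabel("Yelp review (stars)")
    ax.set_title("Plumber reviews by city")
    fig.tight_layout()
    fig.savefig(filename)
    plt.close(fig)
    return order
```

Calling save_city_boxplot(df, "plumbers.png") on the DataFrame from the earlier steps writes the image and returns the city order it used, so you can check the sort by eye.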
- Repeat everything above, from gathering the data through saving the boxplot,
for a business type of your choosing other than plumbers.
Turning it in
For this homework, you will send me an email with subject line "DATA
219 homework #7 turnin" and four attachments: (1) your
plumbers.csv file, (2) an image file with your plumbers/cities
boxplot, (3) your second .csv file (labeled appropriately for your
second business choice), and (4) your second image file with your boxplot for that second business type.
Come to office hours, or send me email with subject line "DATA 219
Homework #7 help!!"