DATA 219 - Foundations for Data Science - Spring 2019

Homework #7 — Down the Drain

Due: Wednesday, April 24th, midnight

This is a solo assignment, to be completed on your own.

Instructions

For this homework, you will be comparing the quality of plumbers and other businesses in various parts of America, a handy thing to know.

  1. Choose ten U.S. cities that you have lived in, have visited, or have heard of.
  2. Create an account for yelp.com's Fusion API. Create an "application" and make note of your Client ID and API key.
  3. Write code to produce a DataFrame with at least 2000 rows and the following columns:

    In order to do this, you'll need to:

    a. Use Yelp's search API to search for plumbers in one of your cities. Get at least 100 plumbers. (If Yelp doesn't have information on 100 plumbers, choose a different city instead.)
    b. Use the business lookup API to find the reviews of each plumber. Get as many ratings as Yelp has for that business.
    c. Add all the information to the lists you'll use to create the DataFrame.
    d. Repeat steps a through c for all ten cities.

    Note: Yelp's way of authenticating to the API is a bit different from Twitter's or food2fork's. You specify the API key in an "HTTP header" rather than in the URL. This can be accomplished by passing a second argument to requests.get(), named "headers" (pass it by keyword, as headers=..., so it isn't mistaken for the URL parameters). The value you pass for "headers" should be a dictionary with a single key/value pair. The key is the word "Authorization". The value is the concatenation of two strings, separated by a space: the first string is the literal word "Bearer", and the second string is the (huge) 128-character API key you got when you registered your Yelp application.

  4. Create your Pandas DataFrame from the lists you've been appending to during all of your API action.
  5. Save your DataFrame to a .csv file called plumbers.csv.
  6. Create and save a pretty, well-labeled boxplot that shows the city name on the x-axis and plumber ratings on the y-axis. Sort this boxplot left-to-right in order of city-with-best-plumbers on the left all the way down to city-with-worst-plumbers on the right.
  7. Repeat steps 3 through 6, above, for a business type of your choosing other than plumbers.
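
To make the API and sorting steps above more concrete, here is a rough sketch, not a reference solution. Several things in it are assumptions rather than part of this handout: the search URL, the "term", "location", "limit" and "offset" parameters, the 50-results-per-request page size, and the "businesses" key in the JSON response are taken from my reading of Yelp's Fusion documentation, so check them there before relying on them. The function names are made up for illustration, and ranking cities by median rating is just one reasonable way to define "best".

```python
from statistics import median

# Assumed from Yelp's public Fusion documentation; verify before relying on it.
SEARCH_URL = "https://api.yelp.com/v3/businesses/search"
PAGE_SIZE = 50  # assumed maximum number of results per search request


def auth_headers(api_key):
    """Build the "headers" dictionary described in the note above."""
    return {"Authorization": "Bearer " + api_key}


def fetch_businesses(http, city, api_key, term="plumbers", wanted=100):
    """Page through search results until `wanted` businesses are collected.

    `http` is anything with a requests-style get() method, such as the
    requests module itself. (Hypothetical helper, not part of any library.)
    """
    found = []
    for offset in range(0, wanted, PAGE_SIZE):
        params = {"term": term, "location": city,
                  "limit": PAGE_SIZE, "offset": offset}
        response = http.get(SEARCH_URL, headers=auth_headers(api_key),
                            params=params)
        response.raise_for_status()  # stop early on an HTTP error
        page = response.json()["businesses"]
        found.extend(page)
        if len(page) < PAGE_SIZE:
            break  # a short page means there are no more results
    return found[:wanted]


def cities_best_to_worst(cities, ratings):
    """Order city names by median rating, highest first.

    `cities` and `ratings` are parallel lists with one entry per row,
    like the lists you append to before building the DataFrame.
    """
    by_city = {}
    for city, rating in zip(cities, ratings):
        by_city.setdefault(city, []).append(rating)
    return sorted(by_city, key=lambda c: median(by_city[c]), reverse=True)
```

With the requests library imported, a call might look like fetch_businesses(requests, "Richmond, VA", my_api_key), where my_api_key is a placeholder for your own key. The list returned by cities_best_to_worst() can then be handed to your plotting call to fix the left-to-right order of the boxes (for example, seaborn's boxplot() accepts an "order" argument for this).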

Turning it in

For this homework, you will send me an email with subject line "DATA 219 homework #7 turnin" and four attachments: (1) your plumbers.csv file, (2) an image file with your plumbers/cities boxplot, (3) your second .csv file (labeled appropriately for your second business choice), and (4) your second image file with your whatever/cities boxplot.

Getting help

Come to office hours, or send me an email with subject line "DATA 219 Homework #7 help!!"