Philly Center City District Sips 2022: An Interactive Map

R-Ladies Philly workshop on webscraping, geocoding, and interactive map-making

Published: September 29, 2022

Silvia Canelón presents Webscraping, Geocoding & Interactive Map-Making with Center City Sips Data. R-Ladies Philly.

Introduction

This workshop is adapted from a blog post of the same name and is accompanied by slides.

The 2022 Center City District Sips website features all of the restaurants participating in the Center City Sips event, but does not offer a map view. This makes it hard to locate a happy hour special nearby, so we’re going to use the data they provide to build an interactive map!

  1. Scrape restaurants and addresses from the website
  2. Geocode the restaurant addresses to obtain geographical coordinates
  3. Build an interactive map with leaflet

Packages

| Package | Purpose | Version |
|---|---|---|
| tidyverse | Data manipulation and iteration functions | 1.3.2.90 |
| here | File referencing in project-oriented workflows | 0.7.13 |
| knitr | Style data frame output into formatted table | 1.40 |
| robotstxt | Check website for scraping permissions | 0.7.13 |
| rvest | Scrape the information off of the website | 1.0.3 |
| tidygeocoder | Geocode the restaurant addresses | 1.0.5 |
| leaflet | Build the interactive map | 2.1.1 |
| leaflet.extras | Add extra functionality to map | 1.0.0 |
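
The code in the rest of the workshop calls functions from these packages without a prefix, so it assumes they have been attached first. A minimal sketch of that setup step, assuming all of the packages are already installed:

```r
# Attach the packages used throughout the workshop.
library(tidyverse)     # dplyr, purrr, readr, tibble, ...
library(here)          # here()
library(knitr)         # kable()
library(robotstxt)     # get_robotstxt()
library(rvest)         # read_html(), html_table(), html_attr()
library(tidygeocoder)  # geocode()
library(leaflet)       # leaflet(), addCircles(), ...

# leaflet.extras is only called with the leaflet.extras:: prefix later on,
# so it needs to be installed but does not have to be attached.
```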

Scraping the data

We will scrape the data from the 2022 Center City District Sips website, specifically the list view: https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view

Checking site permissions

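First we check the site’s crawling permissions using the robotstxt package, which downloads and parses the site’s robots.txt file.

What we want to look for is whether any pages are disallowed for bots/scrapers. The output below shows Allow: /, meaning nothing is disallowed. Note the warnings, though: the request for a robots.txt file under this page’s path returned a 404, so these permissive rules are the fallback that robotstxt substitutes (see "robots.txt overwrite by: on_suspect_content") rather than rules published by the site.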

get_robotstxt("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view")
Warning in request_handler_handler(request = request, handler = on_not_found, :
Event: on_not_found
Warning in request_handler_handler(request = request, handler =
on_file_type_mismatch, : Event: on_file_type_mismatch
Warning in request_handler_handler(request = request, handler =
on_suspect_content, : Event: on_suspect_content
[robots.txt]
--------------------------------------

# robots.txt overwrite by: on_suspect_content

User-agent: *
Allow: /



[events]
--------------------------------------

requested:   https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt 
downloaded:  https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt 

$on_not_found
$on_not_found$status_code
[1] 404


$on_file_type_mismatch
$on_file_type_mismatch$content_type
[1] "text/html; charset=utf-8"


$on_suspect_content
$on_suspect_content$parsable
[1] FALSE

$on_suspect_content$content_suspect
[1] TRUE


[attributes]
--------------------------------------

problems, cached, request, class
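
Since the request above came back with a 404 status, it can be useful to see how robotstxt answers a question about one specific path. The sketch below parses a small, hypothetical set of rules from text so that it runs offline; the rules and the /admin/ path are made up for illustration and are not the site’s actual robots.txt.

```r
library(robotstxt)

# Hypothetical robots.txt content (NOT the real file for centercityphila.org)
rules_text <- "User-agent: *\nDisallow: /admin/\n"

# Parse the rules from text instead of downloading them
rules <- robotstxt(text = rules_text)

# Ask whether a generic bot ("*") may fetch each path
allowed <- rules$check(
  paths = c("/explore-center-city/ccd-sips/sips-list-view", "/admin/"),
  bot   = "*"
)
allowed

# With network access, the same question can be asked of the live domain:
# paths_allowed(
#   paths  = "/explore-center-city/ccd-sips/sips-list-view",
#   domain = "centercityphila.org"
# )
```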

Harvesting data from the first page

We’ll use the rvest package to scrape the information from the tables of restaurants/bars participating in CCD Sips.

Ideally you would only scrape each page once, so we will check our approach with the first page before writing a function to scrape the remaining pages.

# define the page
url <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1"

# read the page html
html1 <- read_html(url)

# extract table info
table1 <- 
  html1 |> 
  html_element("table") |> 
  html_table()
table1 |> head() |> kable()
| Name | Address | Phone | CCD SIPS Specials |
|---|---|---|---|
| 1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | CCD SIPS Specials |
| 1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | CCD SIPS Specials |
| 1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | CCD SIPS Specials |
| Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | CCD SIPS Specials |
| Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | CCD SIPS Specials |
| ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | CCD SIPS Specials |
# extract hyperlinks to specific restaurant/bar specials
links <- 
  html1 |> 
  html_elements(".o-table__tag.ccd-text-link") |> 
  html_attr("href") |> 
  as_tibble()
links |> head() |> kable()
value
#1028-yamitsuki-sushi-ramen
#1225-raw-sushi-and-sake-lounge
#1518-bar-and-grill
#air-grill-garden-dilworth-park
#aki-nom-nom
#artbar
# add full hyperlinks to the table info
table1Mod <-
  bind_cols(table1, links) |> 
  mutate(Specials = paste0(url, value)) |> 
  select(-c(`CCD SIPS Specials`, value))
table1Mod |> head() |> kable()
| Name | Address | Phone | Specials |
|---|---|---|---|
| 1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen |
| 1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge |
| 1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill |
| Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park |
| Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom |
| ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar |
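
The approach above pairs table rows with links purely by position, so it relies on the page having exactly one link per row, in the same order. A small offline sketch of the same steps on a made-up HTML snippet shows how the pieces fit together and adds a row-count check; the venue names and the example.com URL are hypothetical, and only the CSS classes are taken from the code above.

```r
library(rvest)

# Made-up page with the same table/link structure as the Sips list view
page <- minimal_html('
  <table>
    <tr><th>Name</th><th>Address</th><th>CCD SIPS Specials</th></tr>
    <tr><td>Example Bar</td><td>1 Example St</td>
        <td><a class="o-table__tag ccd-text-link" href="#example-bar">CCD SIPS Specials</a></td></tr>
    <tr><td>Sample Cafe</td><td>2 Sample Ave</td>
        <td><a class="o-table__tag ccd-text-link" href="#sample-cafe">CCD SIPS Specials</a></td></tr>
  </table>
')

# Table contents: one row per venue
toy_table <- page |> html_element("table") |> html_table()

# Link targets: one href per venue, in document order
toy_links <- page |>
  html_elements(".o-table__tag.ccd-text-link") |>
  html_attr("href")

# Pairing by position is only safe if the counts match
stopifnot(nrow(toy_table) == length(toy_links))

# Build the full link to each venue's specials
toy_table$Specials <- paste0("https://example.com/list?page=1", toy_links)
toy_table
```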

Harvesting data from the remaining pages

We confirmed that the above approach harvested the information we needed, so we can adapt the code into a function that we can apply to pages 2-3 of the site.

getTables <- function(pageNumber) {
 
  # wait 2 seconds between each scrape
  Sys.sleep(2)
  
  url <- paste0("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=", pageNumber)
  
  # read the page html
  html <- read_html(url)
  
  # extract table info
  table <- 
    html |> 
    html_element("table") |>
    html_table()
  
  # extract hyperlinks to specific restaurant/bar specials
  links <- 
    html |> 
    html_elements(".o-table__tag.ccd-text-link") |> 
    html_attr("href") |> 
    as_tibble()
  
  # add full hyperlinks to the table info and return the result
  # (returned as a value rather than assigned globally with <<-)
  bind_cols(table, links) |> 
    mutate(Specials = paste0(url, value)) |> 
    select(-c(`CCD SIPS Specials`, value))
}

We can use the getTables() function and the purrr::map_df() function to harvest the table of restaurants/bars from pages 2 and 3.

Then we can combine all the data frames together and save the complete data frame as an .Rds object so that we won’t have to scrape the data again.
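
When mapping a scraping function over several pages, one failed request would normally stop the whole run. As a hedged sketch of one way to guard against that, the example below wraps a stand-in function with purrr::possibly() so that a failing page yields NULL and is dropped when the results are combined. fakeGetTables() is hypothetical and only simulates a failure on page 3.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for getTables(): page 3 fails on purpose
fakeGetTables <- function(pageNumber) {
  if (pageNumber == 3) stop("simulated network error")
  tibble(Name = paste("Venue", 1:2), Page = pageNumber)
}

# Return NULL instead of raising an error when a page fails
safeGetTables <- possibly(fakeGetTables, otherwise = NULL)

# bind_rows() ignores the NULL left behind by the failed page
harvested <- map(2:3, safeGetTables) |> bind_rows()
harvested
```

In a real run it would still be worth checking which pages came back empty, since a silently dropped page means missing restaurants on the map.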

Shortcut

To skip the scraping step below, load the previously saved data from the data/ folder instead:

table <- read_rds(here("data", "specialsScraped.Rds"))

# get remaining tables
table2 <- map_df(2:3, getTables) 

# combine all tables
table <- bind_rows(table1Mod, table2)
table |> head() |> kable()
| Name | Address | Phone | Specials |
|---|---|---|---|
| 1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen |
| 1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge |
| 1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill |
| Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park |
| Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom |
| ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar |
# save full table to file
write_rds(table,
          file = here("data", "specialsScraped.Rds"))
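
The "scrape once, then reuse the saved file" idea can also be wrapped in a small helper. The sketch below is one possible way to do it, not part of the original workshop: read_or_create() is a hypothetical function that only runs the expensive step when the .Rds file does not exist yet, and a temporary folder stands in for the project’s data/ folder.

```r
library(readr)

# Hypothetical helper: run `compute()` only when `path` does not exist yet
read_or_create <- function(path, compute) {
  if (file.exists(path)) {
    return(read_rds(path))
  }
  result <- compute()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  write_rds(result, path)
  result
}

# Temporary file standing in for here("data", "specialsScraped.Rds")
cache_file <- file.path(tempdir(), "specialsScraped.Rds")
unlink(cache_file)  # start from a clean slate for this demonstration

# First call computes and saves; a data frame stands in for the scraped table
first <- read_or_create(cache_file, function() data.frame(Name = "Example Bar"))

# Second call reads the saved file, so the compute function is never run
second <- read_or_create(cache_file, function() stop("not called: cache exists"))
```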

Geocoding addresses

The next step is to use geocoding to convert the restaurant/bar addresses to geographical coordinates (longitude and latitude) that we can map. We can use the tidygeocoder package to help us, and specify that we want to use the ArcGIS geocoding service.

# geocode addresses
specials <- 
  table |> 
  geocode(address = Address,
          method = 'arcgis', 
          long = Longitude,
          lat = Latitude)
Passing 60 addresses to the ArcGIS single address geocoder
Query completed in: 27.2 seconds
specials |> head() |> kable()
| Name | Address | Phone | Specials | Latitude | Longitude |
|---|---|---|---|---|---|
| 1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen | 39.95342 | -75.15750 |
| 1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge | 39.94976 | -75.16089 |
| 1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill | 39.95024 | -75.16664 |
| Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park | 39.95253 | -75.16515 |
| Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom | 39.94874 | -75.16115 |
| ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar | 39.95286 | -75.17046 |
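
Before mapping, it can be worth checking whether any address failed to geocode, since tidygeocoder leaves the coordinates as NA when a service finds no match. The sketch below shows one way to split matched from unmatched rows; it uses a small made-up data frame rather than the real results, with the second row imitating an unmatched address.

```r
library(dplyr)

# Made-up stand-in for the geocoded data
geocoded <- tibble(
  Name      = c("Example Bar", "Sample Cafe"),
  Latitude  = c(39.95, NA),
  Longitude = c(-75.16, NA)
)

# Rows that would need a corrected address or a manual lookup
unmatched <- geocoded |> filter(is.na(Latitude) | is.na(Longitude))

# Rows that are safe to pass to the map
mappable <- geocoded |> filter(!is.na(Latitude), !is.na(Longitude))

unmatched
```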

Make sure to save the new data frame with geographical coordinates as an .Rds object so you won’t have to geocode the data again! This matters even more for larger datasets: geocoding just 60 addresses took about 27 seconds here.

# save table with geocoded addresses to file
write_rds(specials,
          file = here("data", "specialsGeocoded.Rds"))

Building the map

To build the map, we can use the leaflet package.

Tip

Add a Google Font with a CSS chunk that imports the font face(s) and weights you want to use (e.g. Red Hat Text).

```{css}
@import url('https://fonts.googleapis.com/css2?family=Red+Hat+Text:ital,wght@0,300;0,400;1,300;1,400&display=swap');
```

Plotting the restaurants/bars

leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude,
    popup = specials$Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    )

Adding the map background

leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude, 
    popup = specials$Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron)

Setting the map view

leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude, 
    popup = specials$Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16)
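
Centering on the mean coordinates with a fixed zoom works well for a compact area like Center City. As an alternative sketch, fitBounds() frames the map around the bounding box of the points, so every marker is visible regardless of how spread out they are. The toy data frame below reuses three coordinate pairs from the geocoded output shown earlier, and the example also uses the ~Column formula shorthand, which looks columns up in the data passed to leaflet().

```r
library(leaflet)

# Three venues, coordinates copied from the geocoded output above
toy <- data.frame(
  Name      = c("1028 Yamitsuki Sushi & Ramen", "1225 Raw Sushi and Sake Lounge", "ArtBar"),
  Latitude  = c(39.95342, 39.94976, 39.95286),
  Longitude = c(-75.15750, -75.16089, -75.17046)
)

toy_map <- leaflet(data = toy) |>
  addProviderTiles(providers$CartoDB.Positron) |>
  # ~Longitude and ~Latitude refer to columns of `toy`
  addCircles(lng = ~Longitude, lat = ~Latitude, label = ~Name) |>
  # frame the view around the bounding box of all points
  fitBounds(
    lng1 = min(toy$Longitude), lat1 = min(toy$Latitude),
    lng2 = max(toy$Longitude), lat2 = max(toy$Latitude)
  )

toy_map
```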

Adding fullscreen control

leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude, 
    popup = specials$Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |>
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()

Customizing map markers

# style pop-ups for the map with inline css styling

# marker for the restaurants/bars
# paste0() avoids stray spaces, and the href value is quoted so the link stays intact
popInfoCircles <- paste0(
  "<h2 style='font-family: Red Hat Text, sans-serif; font-size: 1.6em; color:#43464C;'>", 
  "<a style='color: #00857A;' href='", specials$Specials, "'>", specials$Name, "</a></h2>",
  "<p style='font-family: Red Hat Text, sans-serif; font-weight: normal; font-size: 1.5em; color:#9197A6;'>", specials$Address, "</p>"
  )
leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude,
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6, 
    stroke = FALSE,
    radius = 12,
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |> 
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()
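
Because the pop-ups are assembled as raw HTML, characters such as & in a restaurant name (e.g. "Sushi & Ramen") end up in the markup unescaped. Browsers usually tolerate this, but a more defensive variant could escape the text first. The sketch below is one way to do that with htmltools::htmlEscape(); make_popup() is a hypothetical helper, and the inline styles are omitted for brevity.

```r
library(htmltools)

# Hypothetical helper: build pop-up HTML with escaped text content
make_popup <- function(name, address, url) {
  sprintf(
    "<h2><a href='%s'>%s</a></h2><p>%s</p>",
    url,
    htmlEscape(name),     # "&" becomes "&amp;", "<" becomes "&lt;", ...
    htmlEscape(address)
  )
}

popup <- make_popup(
  name    = "1028 Yamitsuki Sushi & Ramen",
  address = "1028 Arch Street, Philadelphia, PA 19107",
  url     = "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen"
)
popup
```

Since sprintf() and htmlEscape() are both vectorized, the same helper could be called with whole columns, for example make_popup(specials$Name, specials$Address, specials$Specials).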

Adding a marker at the center

# marker for the center of the map
popInfoMarker <- paste(
  "<h1 style='padding-top: 0.5em; margin-top: 1em; margin-bottom: 0.5em; font-family: Red Hat Text, sans-serif; font-size: 1.8em; color:#43464C;'>", 
  "<a style='color: #00857A;' href='https://centercityphila.org/explore-center-city/ccdsips'>",
  "Center City District Sips 2022", 
  "</a></h1><p style='color:#9197A6; font-family: Red Hat Text, sans-serif; font-size: 1.5em; padding-bottom: 1em;'>", 
  "Philadelphia, PA", "</p>")

# custom icon for the center of the map
centerIcon <-
  makeAwesomeIcon(
    icon = "map-pin",
    iconColor = "#FFFFFF", # accepts any valid CSS color
    markerColor = "darkblue", # must be one of the named marker colors
    library = "fa"
  )
leaflet(data = specials, 
        options = tileOptions(minZoom = 15,
                              maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ specials$Latitude, 
    lng = ~ specials$Longitude, 
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6, 
    stroke = FALSE,
    radius = 12,
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      )
    ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(mean(specials$Longitude), 
          mean(specials$Latitude), 
          zoom = 16) |> 
  # add fullscreen control button
  leaflet.extras::addFullscreenControl() |> 
  # add marker at the center
  addAwesomeMarkers(
    icon = centerIcon,
    lng = mean(specials$Longitude), 
    lat = mean(specials$Latitude), 
    label = "Center City District Sips 2022",
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
      ),
    popup = popInfoMarker,
    popupOptions = popupOptions(maxWidth = 250))