Philly Center City District Sips 2022: An Interactive Map
R-Ladies Philly workshop on webscraping, geocoding, and interactive map-making
Introduction
This workshop is adapted from a blog post of the same name and is accompanied by slides.
The 2022 Center City District Sips website features all of the restaurants participating in the Center City Sips event, but does not offer a map view. This makes it hard to locate a happy hour special nearby, so we’re going to use the data they provide to build an interactive map!
- Scrape restaurants and addresses from the website
- Geocode the restaurant addresses to obtain geographical coordinates
- Build an interactive map with leaflet
Packages
Package | Purpose | Version |
---|---|---|
tidyverse | Data manipulation and iteration functions | 1.3.2.90 |
here | File referencing in project-oriented workflows | 0.7.13 |
knitr | Style data frame output into formatted table | 1.40 |
robotstxt | Check website for scraping permissions | 0.7.13 |
rvest | Scrape the information off of the website | 1.0.3 |
tidygeocoder | Geocode the restaurant addresses | 1.0.5 |
leaflet | Build the interactive map | 2.1.1 |
leaflet.extras | Add extra functionality to map | 1.0.0 |
Scraping the data
We will scrape the data from the 2022 Center City District Sips website, specifically the list view: https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view
Checking site permissions
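First we check the site's scraping permissions using the robotstxt package, which downloads and parses the site's robots.txt file. What we want to look for is whether any pages are disallowed for bots/scrapers.

A robots.txt file always lives at the root of a domain, though. Because the call below passes the full page URL, robotstxt requests .../sips-list-view/robots.txt, receives a 404 HTML page instead (the on_not_found, on_file_type_mismatch, and on_suspect_content warnings), and falls back to a permissive default of Allow: /. So the output does not show the site's actual rules; calling get_robotstxt("centercityphila.org") reads the real file.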
```r
library(robotstxt)
get_robotstxt("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view")
```
Warning in request_handler_handler(request = request, handler = on_not_found, :
Event: on_not_found
Warning in request_handler_handler(request = request, handler =
on_file_type_mismatch, : Event: on_file_type_mismatch
Warning in request_handler_handler(request = request, handler =
on_suspect_content, : Event: on_suspect_content
[robots.txt]
--------------------------------------
# robots.txt overwrite by: on_suspect_content
User-agent: *
Allow: /
[events]
--------------------------------------
requested: https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt
downloaded: https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view/robots.txt
$on_not_found
$on_not_found$status_code
[1] 404
$on_file_type_mismatch
$on_file_type_mismatch$content_type
[1] "text/html; charset=utf-8"
$on_suspect_content
$on_suspect_content$parsable
[1] FALSE
$on_suspect_content$content_suspect
[1] TRUE
[attributes]
--------------------------------------
problems, cached, request, class
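Since robots.txt is served from the root of a host, a permissions check starts from the domain rather than from the page URL. Here is a small base-R sketch that derives both pieces from a page URL. The regular expressions are illustrative assumptions, not part of robotstxt.

```r
# Page we intend to scrape
pageUrl <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view"

# Keep only the scheme and host, then append /robots.txt,
# because crawlers look for robots.txt at the root of the host
robotsUrl <- sub("^(https?://[^/]+).*$", "\\1/robots.txt", pageUrl)

# Path component of the page, which is what robots.txt rules are matched against
pagePath <- sub("^https?://[^/]+", "", pageUrl)

robotsUrl
pagePath
```

With these in hand, the live check would be robotstxt::paths_allowed(paths = pagePath, domain = "centercityphila.org"). It returns TRUE when the path is not disallowed for the default bot "*".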
Harvesting data from the first page
We’ll use the rvest package to scrape the information from the tables of restaurants/bars participating in CCD Sips.
Ideally you would only scrape each page once, so we will check our approach with the first page before writing a function to scrape the remaining pages.
```r
library(tidyverse)
library(rvest)
library(knitr)

# define the page
url <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1"

# read the page html
html1 <- read_html(url)

# extract table info
table1 <-
  html1 |>
  html_element("table") |>
  html_table()

table1 |> head() |> kable()
```
Name | Address | Phone | CCD SIPS Specials |
---|---|---|---|
1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | CCD SIPS Specials |
1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | CCD SIPS Specials |
1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | CCD SIPS Specials |
Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | CCD SIPS Specials |
Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | CCD SIPS Specials |
ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | CCD SIPS Specials |
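You can see what html_element("table") |> html_table() does without requesting the live page. The sketch below runs it on a tiny hand-written table. The HTML is made up and much simpler than the real page. Header cells become column names, and each remaining row becomes a tibble row.

```r
library(rvest)

# Made-up HTML mimicking the structure of the list-view table
page <- minimal_html("
  <table>
    <tr><th>Name</th><th>Address</th></tr>
    <tr><td>Bar A</td><td>1 Main St, Philadelphia, PA 19107</td></tr>
    <tr><td>Bar B</td><td>2 Main St, Philadelphia, PA 19103</td></tr>
  </table>
")

# Select the first <table> element and convert it to a tibble;
# the <th> cells in the first row are used as column names
mockTable <- page |>
  html_element("table") |>
  html_table()

mockTable
```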
```r
# extract hyperlinks to specific restaurant/bar specials
links <-
  html1 |>
  html_elements(".o-table__tag.ccd-text-link") |>
  html_attr("href") |>
  # store the hrefs in a one-column tibble named "value"
  as_tibble_col(column_name = "value")

links |> head() |> kable()
```
value |
---|
#1028-yamitsuki-sushi-ramen |
#1225-raw-sushi-and-sake-lounge |
#1518-bar-and-grill |
#air-grill-garden-dilworth-park |
#aki-nom-nom |
#artbar |
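The selector .o-table__tag.ccd-text-link has no space between the two class names, so it only matches elements that carry both classes. With a space, it would instead look for .ccd-text-link elements nested inside .o-table__tag elements. The sketch below runs the selector on made-up HTML. The class names are copied from the selector above, and the links are invented.

```r
library(rvest)

# Made-up anchors: only the first two carry both classes
page <- minimal_html("
  <a class='o-table__tag ccd-text-link' href='#bar-a'>Specials</a>
  <a class='ccd-text-link o-table__tag' href='#bar-b'>Specials</a>
  <a class='ccd-text-link' href='/about'>About</a>
")

# Compound class selector: both classes required, in any order
hrefs <- page |>
  html_elements(".o-table__tag.ccd-text-link") |>
  html_attr("href")

hrefs
```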
```r
# add full hyperlinks to the table info
table1Mod <-
  bind_cols(table1, links) |>
  mutate(Specials = paste0(url, value)) |>
  select(-c(`CCD SIPS Specials`, value))

table1Mod |> head() |> kable()
```
Name | Address | Phone | Specials |
---|---|---|---|
1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen |
1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge |
1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill |
Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park |
Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom |
ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar |
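bind_cols() pairs rows purely by position. This approach therefore assumes the page has exactly one specials link per table row, in the same order. The sketch below shows a hypothetical helper, attachLinks(), which is not part of the workshop code. It adds the full links only after checking that the counts match, and is shown on made-up data.

```r
library(tidyverse)

# Hypothetical helper: attach full specials links to a scraped table,
# stopping with an error if the number of links differs from the number of rows
attachLinks <- function(table, hrefs, pageUrl) {
  if (length(hrefs) != nrow(table)) {
    stop("Found ", length(hrefs), " links for ", nrow(table), " rows")
  }
  table |>
    mutate(Specials = paste0(pageUrl, hrefs))
}

mockTable <- tibble(Name = c("Bar A", "Bar B"))
pageUrl <- "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1"

withLinks <- attachLinks(mockTable, c("#bar-a", "#bar-b"), pageUrl)

# A count mismatch stops with an informative error message
mismatch <- tryCatch(attachLinks(mockTable, "#bar-a", pageUrl),
                     error = function(e) conditionMessage(e))
```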
Harvesting data from the remaining pages
We confirmed that the above approach harvested the information we needed, so we can adapt the code into a function that we can apply to pages 2-3 of the site.
```r
getTables <- function(pageNumber) {
  # wait 2 seconds between each scrape to avoid overloading the server
  Sys.sleep(2)

  url <- paste0("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=", pageNumber)

  # read the page html
  html <- read_html(url)

  # extract table info
  table <-
    html |>
    html_element("table") |>
    html_table()

  # extract hyperlinks to specific restaurant/bar specials
  links <-
    html |>
    html_elements(".o-table__tag.ccd-text-link") |>
    html_attr("href") |>
    as_tibble_col(column_name = "value")

  # add full hyperlinks to the table info and return the result
  # (returning it avoids overwriting a global variable with <<-)
  bind_cols(table, links) |>
    mutate(Specials = paste0(url, value)) |>
    select(-c(`CCD SIPS Specials`, value))
}
```
We can use the getTables() function and the purrr::map_df() function to harvest the tables of restaurants/bars from pages 2 and 3. Then we can combine all the data frames together and save the complete data frame as an .Rds object so that we won't have to scrape the data again.
Name | Address | Phone | Specials |
---|---|---|---|
1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen |
1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge |
1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill |
Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park |
Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom |
ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar |
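Here is a minimal sketch of the combine-and-save step described above. It assumes the combined data frame is named table, which is the name the geocoding step below uses. To run without contacting the website, it swaps in a hypothetical stand-in, fakeGetTables(), for getTables(). The page-1 data and the .Rds file path are also placeholders.

```r
library(tidyverse)

# Hypothetical stand-in for getTables(): returns a one-row table per page
# instead of scraping, so the combining logic can run offline
fakeGetTables <- function(pageNumber) {
  tibble(
    Name = paste("Bar on page", pageNumber),
    Address = "Philadelphia, PA",
    Phone = NA_character_,
    Specials = paste0("https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=", pageNumber)
  )
}

# Stand-in for the page-1 result (table1Mod in the workshop code)
page1 <- fakeGetTables(1)

# map_df() applies the function to each page number and row-binds the results
otherPages <- map_df(2:3, fakeGetTables)

# Combine page 1 with pages 2-3
table <- bind_rows(page1, otherPages)

# Save the combined data so the site doesn't need to be scraped again
# (a temporary placeholder path)
rdsPath <- file.path(tempdir(), "ccd_sips_2022.rds")
saveRDS(table, rdsPath)
```

With the real scraper, the same pattern would be bind_rows(table1Mod, map_df(2:3, getTables)).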
Geocoding addresses
The next step is to use geocoding to convert the restaurant/bar addresses to geographical coordinates (longitude and latitude) that we can map. We can use the tidygeocoder package to help us, and specify that we want to use the ArcGIS geocoding service.
```r
library(tidygeocoder)

# geocode addresses; `table` holds the combined data from all pages
specials <-
  table |>
  geocode(address = Address,
          method = "arcgis",
          long = Longitude,
          lat = Latitude)
```
Passing 60 addresses to the ArcGIS single address geocoder
Query completed in: 27.2 seconds
Name | Address | Phone | Specials | Latitude | Longitude |
---|---|---|---|---|---|
1028 Yamitsuki Sushi & Ramen | 1028 Arch Street, Philadelphia, PA 19107 | 215.629.3888 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen | 39.95342 | -75.15750 |
1225 Raw Sushi and Sake Lounge | 1225 Sansom St, Philadelphia, PA 19102 | 215.238.1903 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1225-raw-sushi-and-sake-lounge | 39.94976 | -75.16089 |
1518 Bar and Grill | 1518 Sansom St, Philadelphia, PA 19102 | 267.639.6851 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1518-bar-and-grill | 39.95024 | -75.16664 |
Air Grille Garden at Dilworth Park | 1 S 15th St, Philadelphia, PA 19102 | 215.587.2761 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#air-grill-garden-dilworth-park | 39.95253 | -75.16515 |
Aki Nom Nom | 1210 Walnut St, Philadelphia, PA 19107 | 215.985.1838 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#aki-nom-nom | 39.94874 | -75.16115 |
ArtBar | 1800 Market St, Philadelphia, PA 19103 | 215.825.6723 | https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#artbar | 39.95286 | -75.17046 |
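Geocoders occasionally fail to match an address and return missing coordinates, and those rows cannot be placed on the map. The sketch below shows a quick check on a small data frame. The rows are made up and are not workshop results.

```r
library(tidyverse)

# Made-up example: one address the geocoder could not match
geocoded <- tibble(
  Name = c("Bar A", "Bar B", "Bar C"),
  Latitude = c(39.953, NA, 39.949),
  Longitude = c(-75.158, NA, -75.170)
)

# Rows with missing coordinates need a corrected address or a manual fix
unmatched <- geocoded |>
  filter(is.na(Latitude) | is.na(Longitude))

# Keep only the rows that can be mapped
mappable <- geocoded |>
  drop_na(Latitude, Longitude)
```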
Make sure to save the new data frame with geographical coordinates as an .Rds object so you won't have to geocode the data again! This matters even more in larger projects: geocoding these 60 addresses already took about 27 seconds, and the time grows with the number of addresses.
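One way to make that saving automatic is a small caching helper. It reads the .Rds file if it exists, and otherwise runs the slow step, saves the result, and returns it. The helper name cacheRds() and the file path are hypothetical and not part of any package used here.

```r
# Hypothetical helper: return cached results if the .Rds file exists,
# otherwise compute them, save them, and return them
cacheRds <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  result <- compute()
  saveRDS(result, path)
  result
}

# Environment used as a counter of how often the expensive step actually runs
counter <- new.env()
counter$n <- 0
slowStep <- function() {
  counter$n <- counter$n + 1
  data.frame(Name = "Bar A", Latitude = 39.95, Longitude = -75.16)
}

cachePath <- file.path(tempdir(), "specials_geocoded.rds")
unlink(cachePath)  # start from an empty cache for this demo

first <- cacheRds(cachePath, slowStep)   # computes and saves
second <- cacheRds(cachePath, slowStep)  # reads from disk, no recomputation
```

In the workshop, this could wrap the geocoding call, for example cacheRds("specials.rds", function() geocode(table, address = Address, method = "arcgis", long = Longitude, lat = Latitude)).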
Building the map
To build the map, we can use the leaflet package.
Plotting the restaurants/bars
```r
library(leaflet)

leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  )
```
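Formulas such as ~ Name are evaluated against the data frame passed to leaflet(data = ...). So ~ Latitude picks up the Latitude column directly, with no need for specials$ inside the formula. The sketch below checks this on made-up coordinates. A map built with formulas records the same calls as one built from explicit vectors.

```r
library(leaflet)

# Made-up points standing in for the geocoded specials
pts <- data.frame(
  Name = c("Bar A", "Bar B"),
  Latitude = c(39.953, 39.950),
  Longitude = c(-75.158, -75.167)
)

# Columns referenced with formulas, resolved against `pts`
viaFormula <- leaflet(pts) |>
  addCircles(lat = ~ Latitude, lng = ~ Longitude, label = ~ Name)

# Same columns passed as plain vectors
viaVectors <- leaflet(pts) |>
  addCircles(lat = pts$Latitude, lng = pts$Longitude, label = pts$Name)
```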
Adding the map background
```r
leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron)
```
Setting the map view
```r
leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view, centered on the mean coordinates
  setView(lng = mean(specials$Longitude),
          lat = mean(specials$Latitude),
          zoom = 16)
```
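Centering on the mean coordinates with a fixed zoom = 16 works for this compact set of locations. An alternative, which is a swap-in and not what the workshop uses, is fitBounds(). It frames the bounding box of all points, so the zoom adapts to the data. The sketch below uses made-up coordinates.

```r
library(leaflet)

# Made-up coordinates standing in for specials$Longitude / specials$Latitude
lng <- c(-75.158, -75.167, -75.170)
lat <- c(39.953, 39.950, 39.949)

# Center used by setView() in the workshop code
center <- c(lng = mean(lng), lat = mean(lat))

# Bounding box for fitBounds(): south-west and north-east corners
bounds <- c(lng1 = min(lng), lat1 = min(lat), lng2 = max(lng), lat2 = max(lat))

# Map whose initial view fits all points instead of using a fixed zoom
fitted <- leaflet() |>
  fitBounds(bounds[["lng1"]], bounds[["lat1"]], bounds[["lng2"]], bounds[["lat2"]])
```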
Adding fullscreen control
```r
leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    popup = ~ Address,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(lng = mean(specials$Longitude),
          lat = mean(specials$Latitude),
          zoom = 16) |>
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()
```
Customizing map markers
```r
# style pop-ups for the map with inline css styling
# marker for the restaurants/bars
# (paste0 with a quoted href keeps each link URL a well-formed attribute)
popInfoCircles <- paste0(
  "<h2 style='font-family: Red Hat Text, sans-serif; font-size: 1.6em; color:#43464C;'>",
  "<a style='color: #00857A;' href='", specials$Specials, "'>", specials$Name, "</a></h2>",
  "<p style='font-family: Red Hat Text, sans-serif; font-weight: normal; font-size: 1.5em; color:#9197A6;'>", specials$Address, "</p>"
)

leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6,
    stroke = FALSE,
    radius = 12,  # in meters
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(lng = mean(specials$Longitude),
          lat = mean(specials$Latitude),
          zoom = 16) |>
  # add fullscreen control button
  leaflet.extras::addFullscreenControl()
```
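Because the pop-ups are raw HTML assembled by pasting strings, characters such as the & in "1028 Yamitsuki Sushi & Ramen" are inserted unescaped. Browsers usually tolerate this, but escaping text before pasting it into HTML is safer. The sketch below uses htmltools::htmlEscape(); htmltools is installed alongside leaflet. The popupFor() helper is hypothetical, and the inline styles are left out for brevity.

```r
library(htmltools)

# Hypothetical helper: build pop-up HTML, escaping text and the link URL
popupFor <- function(name, url, address) {
  paste0(
    "<h2><a href='", htmlEscape(url, attribute = TRUE), "'>",
    htmlEscape(name), "</a></h2>",
    "<p>", htmlEscape(address), "</p>"
  )
}

popup <- popupFor(
  name = "1028 Yamitsuki Sushi & Ramen",
  url = "https://centercityphila.org/explore-center-city/ccd-sips/sips-list-view?page=1#1028-yamitsuki-sushi-ramen",
  address = "1028 Arch Street, Philadelphia, PA 19107"
)

popup
```

Because paste0() and htmlEscape() are vectorized, the same helper can take whole columns, such as specials$Name, and build one pop-up per row.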
Adding a marker at the center
```r
# marker for the center of the map
popInfoMarker <- paste(
  "<h1 style='padding-top: 0.5em; margin-top: 1em; margin-bottom: 0.5em; font-family: Red Hat Text, sans-serif; font-size: 1.8em; color:#43464C;'>",
  "<a style='color: #00857A;' href='https://centercityphila.org/explore-center-city/ccdsips'>",
  "Center City District Sips 2022",
  "</a></h1><p style='color:#9197A6; font-family: Red Hat Text, sans-serif; font-size: 1.5em; padding-bottom: 1em;'>",
  "Philadelphia, PA", "</p>")

# custom icon for the center of the map
centerIcon <-
  makeAwesomeIcon(
    icon = "map-pin",
    iconColor = "#FFFFFF",     # accepts any CSS color
    markerColor = "darkblue",  # must be one of the predefined marker colors (e.g. "blue", "darkblue", "red")
    library = "fa"
  )
```
```r
leaflet(data = specials,
        options = leafletOptions(minZoom = 15,
                                 maxZoom = 19)) |>
  # add map markers
  addCircles(
    lat = ~ Latitude,
    lng = ~ Longitude,
    # customize markers
    fillColor = "#009E91",
    fillOpacity = 0.6,
    stroke = FALSE,
    radius = 12,  # in meters
    # customize pop-ups
    popup = popInfoCircles,
    label = ~ Name,
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    )
  ) |>
  # add map tiles in the background
  addProviderTiles(providers$CartoDB.Positron) |>
  # set the map view
  setView(lng = mean(specials$Longitude),
          lat = mean(specials$Latitude),
          zoom = 16) |>
  # add fullscreen control button
  leaflet.extras::addFullscreenControl() |>
  # add marker at the center
  addAwesomeMarkers(
    icon = centerIcon,
    lng = mean(specials$Longitude),
    lat = mean(specials$Latitude),
    label = "Center City District Sips 2022",
    # customize labels
    labelOptions = labelOptions(
      style = list(
        "font-family" = "Red Hat Text, sans-serif",
        "font-size" = "1.2em")
    ),
    popup = popInfoMarker,
    popupOptions = popupOptions(maxWidth = 250))
```