We would like to create some maps of fire hydrant violations by borough. We also want to investigate whether we can find hydrants that are most frequently ticketed.
After trying several different geolocating methods, we found that NYCGeoSearch provides the best platform to obtain geographical coordinates (latitude, longitude) from the street addresses we were provided. This platform is advantageous because we can change the search terms by altering the request URL. After passing the URL into the `fromJSON()` function from the `jsonlite` package, we can parse the response for the data we are interested in (coordinates and borough).
hydrant <-
violation %>%
filter(violation %in% c("FIRE HYDRANT"), !is.na(geo_nyc_address))
pb <- progress_bar$new(total = nrow(hydrant)) # adding progress bar for sanity
get_coord <- function(url) {
pb$tick()
json_output <- fromJSON(url(url))$features
coord <- json_output$geometry[2]
borough <- json_output$properties$borough
out_df <-
tibble(
coordinates = coord$coord,
mapped_borough = borough
)
return(out_df)
}
hydrant_lat_long <-
hydrant %>%
mutate(
new_url = paste0("https://geosearch.planninglabs.nyc/v1/search?text=", geo_nyc_address, "&size=25"),
coord = map(new_url, get_coord)
) %>%
unnest(coord) %>%
filter(mapped_borough == borough)
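Street addresses contain spaces and occasionally characters such as `&` or `#`, which can break a query string. The sketch below shows one way to percent-encode an address before building the request URL. `build_geosearch_url()` is a hypothetical helper that is not part of the original analysis; it only reuses the endpoint from the chunk above.

```r
# Hypothetical helper (not in the original analysis): build a NYCGeoSearch
# query URL with the address percent-encoded.
build_geosearch_url <- function(address, size = 25) {
  encoded <- vapply(
    address,
    utils::URLencode,
    character(1),
    reserved = TRUE,
    USE.NAMES = FALSE
  )
  paste0(
    "https://geosearch.planninglabs.nyc/v1/search?text=",
    encoded,
    "&size=", size
  )
}

# The space becomes %20 and the ampersand becomes %26
build_geosearch_url(c("30 Broadway", "1 Main St & 2nd Ave"))
```

Encoding reserved characters keeps an `&` inside an address from being read as the start of a new query parameter.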
The address information in our raw data is incomplete and not standardized: some observations are missing address components (such as house number or street name), and others have components that the platform cannot recognize (e.g. “30 feet west of Broadway”). As a result, the platform returns many candidate coordinates for some addresses and none at all for others.
We will work only with the observations for which NYCGeoSearch was able to find location coordinates. We will further restrict this dataset to coordinate results that fall in the same borough as reported on the citation, and then keep only the first coordinate result when several exist.
Finally, we will process the resulting list column through a `for` loop to extract the latitude and longitude from each coordinate pair into their own columns (named `lat` and `long`).
hydrant_lat_long <-
hydrant_lat_long %>%
group_by(summons_number, geo_nyc_address) %>%
slice(1)
# each coordinate pair is stored as (longitude, latitude)
for (i in seq_len(nrow(hydrant_lat_long))) {
  hydrant_lat_long$lat[i] <- hydrant_lat_long$coordinates[[i]][2]
  hydrant_lat_long$long[i] <- hydrant_lat_long$coordinates[[i]][1]
}
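The loop above can also be written without an explicit index. The following is a minimal base-R sketch on a toy list column; the `coordinates` values here are made up for illustration, and the (longitude, latitude) ordering is taken from the indexing used in the loop.

```r
# Toy list column of coordinate pairs, stored as (longitude, latitude)
coordinates <- list(c(-73.98, 40.75), c(-73.95, 40.78))

# Pull the second element (latitude) and first element (longitude) of each pair
lat  <- vapply(coordinates, function(p) p[2], numeric(1))
long <- vapply(coordinates, function(p) p[1], numeric(1))

lat
long
```

`vapply()` checks that every pair yields exactly one number, so a malformed entry raises an error instead of silently producing a misaligned column.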
Running all of the code chunks up to this point takes approximately 7 hours. We therefore save the output in `hydrant_lat_long.csv` so that we can read the data from this file into R instead of rerunning the previous chunks.
write_csv(hydrant_lat_long, "hydrant_lat_long.csv")
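One way to make this save-then-reload pattern explicit is a small caching wrapper. The sketch below is an illustration only: `cached_csv()` is a hypothetical helper, it uses base R's `read.csv()`/`write.csv()` instead of the `readr` functions used in this report, and a temporary file stands in for `hydrant_lat_long.csv`.

```r
# Hypothetical helper: reuse a saved CSV when it exists, otherwise run the
# (slow) computation once and save its result.
cached_csv <- function(path, compute) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  result <- compute()
  write.csv(result, path, row.names = FALSE)
  result
}

# Toy usage: the first call computes and saves, the second only reads
tmp <- tempfile(fileext = ".csv")
first  <- cached_csv(tmp, function() data.frame(lat = 40.75, long = -73.98))
second <- cached_csv(tmp, function() stop("not called: the cache already exists"))
```

Because the computation is passed as a function, it is only evaluated when the file is missing.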
We will also load a dataset containing the locations of all fire hydrants in NYC from a file named `Hydrants.csv`. After tidying and recoding some of the variables, we end up with a dataset where each record corresponds to a single fire hydrant. This dataset contains the variables `borough` (the borough the fire hydrant is located in), `unitid` (hydrant identifier), and `lat` and `long` (hydrant latitude/longitude coordinates).
hydrant_lat_long <- read_csv("hydrant_lat_long.csv") %>% select(-coordinates)
hydrant_lat_long <- hydrant_lat_long %>% mutate(textlab = paste0("Summons Number: ", summons_number, "\nFine Amount: $", fine_amount))
hydrant_actual <- read_csv("Hydrants.csv")
hydrant_actual <-
hydrant_actual %>%
mutate(borough = case_when(
BORO == 1 ~ "Manhattan",
BORO == 2 ~ "Bronx",
BORO == 3 ~ "Brooklyn",
BORO == 4 ~ "Queens",
BORO == 5 ~ "Staten Island"
)) %>%
rename(
lat = LATITUDE,
long = LONGITUDE,
unitid = UNITID
)
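Because the borough codes run from 1 to 5, the same recoding can be sketched with a simple lookup vector. This is an alternative illustration on toy values, not the approach used above; the `BORO` vector here is made up.

```r
# Borough names in the order of their codes (1 to 5), as in the recoding above
boro_names <- c("Manhattan", "Bronx", "Brooklyn", "Queens", "Staten Island")

# Toy BORO column; indexing by code returns the matching name,
# and a missing code yields NA
BORO <- c(3, 1, 5, NA)
borough <- boro_names[BORO]
borough
```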
We will now plot all the points that we have produced to get a sense of how the fire hydrants are distributed across NYC and where fire hydrant violations occurred in the city.
hydrant_violation_plot <-
hydrant_lat_long %>%
plot_ly(
lat = ~lat,
lon = ~long,
type = "scattermapbox",
mode = "markers",
alpha = 0.2,
color = ~borough) %>%
layout(
mapbox = list(
style = 'carto-positron',
zoom = 9,
center = list(lon = -73.9, lat = 40.7)))
hydrant_violation_location_plot <-
hydrant_violation_plot %>%
add_trace(
data = hydrant_actual %>% mutate(borough = "Fire Hydrant"),
lat = ~lat,
lon = ~long,
type = "scattermapbox",
mode = "markers",
alpha = 0.02,
color = ~borough
) %>%
layout(
mapbox = list(
style = 'carto-positron',
zoom = 9,
center = list(lon = -73.9, lat = 40.7)),
title = "<b> Parking Violation and Hydrant Location </b>",
legend = list(title = list(text = "Borough of Violation or Hydrant", size = 9),
orientation = "h",
font = list(size = 9)))
hydrant_violation_location_plot
This map produces points as expected. We see there are more fire hydrant violations in higher-density areas, particularly regions closer to (or in) Manhattan. We see a similar pattern in the distribution of fire hydrants throughout NYC. If we look a little closer at Manhattan, we can observe some distinct regions of higher fire hydrant violations throughout the borough – notably, around SoHo/downtown, Midtown, around the perimeter of Central Park, and around the Upper East Side.
Finally, we want to find the hydrants that are most commonly ticketed.
According to NYC regulations, drivers are ticketed if they park within 15 feet of a hydrant. We will try to associate each violation location with the fire hydrant that caused the violation by constructing hitboxes of approximately 15 feet in radius around each hydrant and counting how many violation points fall within those boxes. We will double this radius to partially adjust for the lack of precision in the violation coordinates we have obtained. To this end, we will construct square hitboxes around our hydrant points such that each side of the square is roughly 60 feet long.
We will use the function `destPoint()` from the package `geosphere` to calculate this. This function takes a longitude/latitude coordinate, a bearing (measured from north), and a distance in meters. We will use this function to obtain the coordinates of the vertices of a square centered on the position of the fire hydrant, with a half-width of 10 meters (about 33 feet, measured from center to edge).
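As a quick check of the units, the conversion from the 15-foot rule to the 10-meter distance can be worked through in a few lines. The helper name `feet_to_m()` is ours and used only for this illustration.

```r
# 1 foot is exactly 0.3048 meters
feet_to_m <- function(feet) feet * 0.3048

legal_radius_m   <- feet_to_m(15)      # 15 ft no-parking distance
doubled_radius_m <- feet_to_m(2 * 15)  # doubled to allow for geocoding error

legal_radius_m    # 4.572
doubled_radius_m  # 9.144, close to the 10 m used in the analysis
```

Rounding 9.144 m up to 10 m makes each square slightly larger than 60 feet on a side (about 66 feet).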
After we have done so, we will use a few functions from the `sp` package. First, we will use `SpatialPolygons` (and the associated function `Polygons`) to construct the squares around each hydrant. Then, the function `over` will tell us which violation location points land within the hydrant hitboxes we have created.
# distance in meters
d = 10
square_coord <- function(lat, long, dist = 5){
point_init <- destPoint(c(long, lat), b = 0, d = dist)
point1 <- destPoint(point_init, b = 270, d = dist) %>% as.data.frame()
point2 <- destPoint(point1, b = 180, d = dist*2) %>% as.data.frame()
point3 <- destPoint(point2, b = 90, d = dist*2) %>% as.data.frame()
point4 <- destPoint(point3, b = 0, d = dist*2) %>% as.data.frame()
sq <- bind_rows(point1, point2, point3, point4, point1)
x <- sq %>% pull(lat)
y <- sq %>% pull(lon)
out_mat <- cbind(x, y)
return(out_mat)
}
to_poly <- function(polymat, id){
poly <- Polygons(list(Polygon(polymat)), ID = id)
return(poly)
}
first_step <- hydrant_actual %>%
mutate(
sq = map2(.x = lat, .y = long, ~square_coord(lat = .x, long = .y, dist = d)),
polys = map2(sq, unitid, to_poly)
) %>%
select(polys) %>%
pull(polys)
squares <- SpatialPolygons(first_step)
# now, get points ready
x = hydrant_lat_long %>% pull(lat)
y = hydrant_lat_long %>% pull(long)
xy = cbind(x,y)
dimnames(xy)[[1]] = hydrant_lat_long %>% pull(summons_number)
pts = SpatialPoints(xy)
# check if drawn squares have points in them
hits <- over(squares, pts, returnList = TRUE)
hit_hydrants <- tibble(id = names(hits), hits = hits)
hit_hydrants1 <- hit_hydrants %>% unnest(hits)
hit_hydrants2 <- hit_hydrants1 %>% group_by(id) %>% summarise(num_hits = n()) %>% arrange(-num_hits)
isolated_hydrants <- inner_join(hydrant_actual, hit_hydrants2, by = c("unitid" = "id"))
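To make the hitbox logic concrete without the `sp` and `geosphere` machinery, here is a minimal base-R sketch. It swaps in a flat-earth approximation (a fixed number of meters per degree, an assumption that is only reasonable over very short distances) in place of `destPoint()`, and plain comparisons in place of `over()`. The function names and toy coordinates are hypothetical.

```r
# Approximate square hitbox around a hydrant.
# Assumption: ~111,320 m per degree of latitude, scaled by cos(latitude)
# for longitude; adequate only for distances of a few meters.
hitbox <- function(lat, long, half_side_m = 10) {
  dlat <- half_side_m / 111320
  dlon <- half_side_m / (111320 * cos(lat * pi / 180))
  list(lat_min = lat - dlat, lat_max = lat + dlat,
       lon_min = long - dlon, lon_max = long + dlon)
}

# Count the violation points that fall inside a hitbox
count_hits <- function(box, v_lat, v_long) {
  sum(v_lat >= box$lat_min & v_lat <= box$lat_max &
      v_long >= box$lon_min & v_long <= box$lon_max)
}

# Toy data: one hydrant and three violation points
box <- hitbox(lat = 40.7, long = -73.9)
v_lat  <- c(40.70000, 40.70005, 40.70050)
v_long <- c(-73.90000, -73.90000, -73.90000)
count_hits(box, v_lat, v_long)  # 2: the third point lies more than 50 m north
```

The sketch handles a single hydrant; repeating the count for every hydrant and sorting by the result corresponds to the `num_hits` ranking computed above.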
We see that 837 hydrants have at least one violation point inside their hitbox. We will first reverse geocode the hydrant coordinates to make the output more useful to a human reader. We will use the `revgeo` function from the `revgeo` package with Google’s reverse geocoding API to get the addresses. Since using Google’s API is not exactly free, we will perform this once and save the addresses in a CSV file.
hydrant_loc <- revgeo(isolated_hydrants %>% pull(long), isolated_hydrants %>% pull(lat), provider = "google", API = "API") # API key has been removed
address <- hydrant_loc %>% unlist()
hydrant_loc <- bind_cols(isolated_hydrants, address = address)
write_csv(hydrant_loc, "isolated_hydrant_loc.csv")
Let’s now create our plot!
isolated_hydrants <- read_csv("isolated_hydrant_loc.csv")
isolated_hydrants_plot <-
isolated_hydrants %>%
mutate(
borough = "Hydrant",
textlab = paste0("Number of Associated Violations: ", num_hits, "\nAddress: ", address)
)
hydrant_plot <-
hydrant_lat_long %>%
plot_ly(
lat = ~lat,
lon = ~long,
type = "scattermapbox",
mode = "markers",
alpha = 0.01,
color = ~borough,
text = ~textlab,
colors = "viridis")
plot <- hydrant_plot %>% add_trace(
data = isolated_hydrants_plot,
lat = ~lat,
lon = ~long,
type = "scattermapbox",
mode = "markers",
size = ~num_hits,
text = ~textlab,
color = ~borough,
marker = list(
color = "red"
),
alpha = 0.5
)
plot %>% layout(
mapbox = list(
style = 'carto-positron',
zoom = 9,
center = list(lon = -73.9, lat = 40.7)),
title = "<b> Most Frequently Ticketed Hydrants</b>",
legend = list(title = list(text = "Borough of Violation or Hydrant", size = 9),
orientation = "h",
font = list(size = 9)))
Interestingly, the hydrant whose hitbox contains the most violation points is located in Brooklyn. The following table shows the ten most frequently ticketed hydrants:
hydrants_table <-
isolated_hydrants %>%
arrange(-num_hits) %>%
filter(row_number() <= 10) %>%
select(unitid, address, num_hits) %>%
rename(
c("Hydrant ID" = "unitid", "Hydrant Address" = "address", "Number of Tickets" = "num_hits")
)
hydrants_table %>% knitr::kable()
Hydrant ID | Hydrant Address | Number of Tickets |
---|---|---|
H301940 | 2 Atlantic Ave., Pier 7, Brooklyn, NY 11201, USA | 40 |
H202171 | 522 Morris Ave, Bronx, NY 10451, USA | 34 |
H311717 | 150 Graham Ave, Brooklyn, NY 11206, USA | 26 |
H107599 | 25 James St, New York, NY 10038, USA | 21 |
H420999 | 197-50A Peck Ave, Fresh Meadows, NY 11365, USA | 16 |
H205952 | 530 E 144th St, Bronx, NY 10454, USA | 15 |
H203269 | 1468 College Ave, Bronx, NY 10457, USA | 15 |
H101773 | Minetta Green, S/e Corner Minetta Lane and, 6th Ave, New York, NY 10012, USA | 14 |
H332046 | 8 Old Fulton St, Brooklyn, NY 11201, USA | 14 |
H326869 | 11 Schenck Ct, Brooklyn, NY 11207, USA | 13 |
It is important to note that the above spatial analysis can only be as precise as the geocoded violation coordinates.