Lesson 5 Analyze Spatial: sf
5.1 Overview
Questions
- How to elegantly conduct complex spatial analysis by chaining operations?
- What is the percent area of water by region across the United States?
Objectives
- Use the
%>%
operator (aka “then” or “pipe”) to pass output from one function into input of the next. - Calculate metrics on spatial attributes.
- Aggregate spatial data with metrics.
- Display a map of results.
5.2 Prerequisites
R Skill Level: Intermediate - you’ve got basics of R down.
You will use the sf
package for vector data along with the dplyr
package for calculating and manipulating attribute data.
5.3 States: read and plot
Similar to Lesson 9: Handling Spatial Projection & CRS in R, we’ll start by reading in a polygon shapefile using the sf
package. Then use the default plot()
function to see what it looks like.
## here() starts at /Users/bbest/github/nps-r-workshop
states_shp <- here("data/neon-us-boundary/US-State-Boundaries-Census-2014.shp")
# read in states
states <- read_sf(states_shp)
# plot the states
plot(states)
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to
## plot all
Notice the default plot on sf
objects outputs colorized values of the first 9 of 10 columns. Use the suggestion from the warning to plot the 10th column.
## [1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME"
## [7] "LSAD" "ALAND" "AWATER" "region" "geometry"
## Observations: 58
## Variables: 11
## $ STATEFP <chr> "06", "11", "12", "13", "16", "17", "19", "21", "22",...
## $ STATENS <chr> "01779778", "01702382", "00294478", "01705317", "0177...
## $ AFFGEOID <chr> "0400000US06", "0400000US11", "0400000US12", "0400000...
## $ GEOID <chr> "06", "11", "12", "13", "16", "17", "19", "21", "22",...
## $ STUSPS <chr> "CA", "DC", "FL", "GA", "ID", "IL", "IA", "KY", "LA",...
## $ NAME <chr> "California", "District of Columbia", "Florida", "Geo...
## $ LSAD <chr> "00", "00", "00", "00", "00", "00", "00", "00", "00",...
## $ ALAND <dbl> 403483823181, 158350578, 138903200855, 148963503399, ...
## $ AWATER <dbl> 20483271881, 18633500, 31407883551, 4947080103, 23977...
## $ region <chr> "West", "Northeast", "Southeast", "Southeast", "West"...
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON Z (((-118.594 ..., MULTIPOL...
## Simple feature collection with 58 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XYZ
## bbox: xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 58 x 11
## STATEFP STATENS AFFGEOID GEOID STUSPS NAME LSAD ALAND AWATER region
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 06 017797… 0400000… 06 CA Cali… 00 4.03e11 2.05e10 West
## 2 11 017023… 0400000… 11 DC Dist… 00 1.58e 8 1.86e 7 North…
## 3 12 002944… 0400000… 12 FL Flor… 00 1.39e11 3.14e10 South…
## 4 13 017053… 0400000… 13 GA Geor… 00 1.49e11 4.95e 9 South…
## 5 16 017797… 0400000… 16 ID Idaho 00 2.14e11 2.40e 9 West
## 6 17 017797… 0400000… 17 IL Illi… 00 1.44e11 6.20e 9 Midwe…
## 7 19 017797… 0400000… 19 IA Iowa 00 1.45e11 1.08e 9 Midwe…
## 8 21 017797… 0400000… 21 KY Kent… 00 1.02e11 2.39e 9 South…
## 9 22 016295… 0400000… 22 LA Loui… 00 1.12e11 2.38e10 South…
## 10 24 017149… 0400000… 24 MD Mary… 00 2.51e10 6.98e 9 North…
## # ... with 48 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>
## [1] "STATEFP" "STATENS" "AFFGEOID" "GEOID" "STUSPS" "NAME"
## [7] "LSAD" "ALAND" "AWATER" "region" "geometry"
## [1] "sf" "tbl_df" "tbl" "data.frame"
The class of the states
object is both a simple feature (sf
) as well as a data frame, which means the many useful functions available to a data frame (or “tibble”) can be applied.
To plot the column of interest, feed the “slice” of that column to the plot()
function.
Question: To motivate the spatial analysis for the rest of this lesson, you will answer this question: “What is the percent water by region?”
5.4 Challenge: analytical steps?
Outline a sequence of analytical steps needed to arrive at the answer.
5.4.1 Answers
- Sum the area of water (
AWATER
) and land (ALAND
) per region. - Divide the area of water (
AWATER
) by the area of land (ALAND
) per region to arrive at percent water. - Show table of regions sorted by percent water.
- Show map of regions by percent water with a color ramp and legend.
5.5 Regions: calculate % water
- Use the
%>%
operator (aka “then” or “pipe”) to pass output from one function into input of the next.- In RStudio, see menu Help > Keyboard Shortcuts Help for a shortcut to the “Insert Pipe Operator”.
- Calculate metrics on spatial attributes.
- In RStudio, see menu Help > Cheatsheets > Data Manipulation with dplyr, tidyr.
- Aggregate spatial data with metrics.
regions = states %>%
group_by(region) %>%
summarize(
water = sum(AWATER),
land = sum(ALAND)) %>%
mutate(
pct_water = water / land * 100 %>% round(2))
# object
regions
## Simple feature collection with 5 features and 4 fields
## geometry type: GEOMETRY
## dimension: XYZ
## bbox: xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5 x 5
## region water land pct_water geometry
## <chr> <dbl> <dbl> <dbl> <GEOMETRY [°]>
## 1 Midwest 1.84e11 1.94e12 9.49 MULTIPOLYGON Z (((-82.86334 41.693…
## 2 Northe… 1.09e11 8.69e11 12.5 MULTIPOLYGON Z (((-76.04621 38.025…
## 3 Southe… 1.04e11 1.36e12 7.61 MULTIPOLYGON Z (((-81.81169 24.568…
## 4 Southw… 2.42e10 1.46e12 1.66 POLYGON Z ((-94.48587 33.63787 0, …
## 5 West 5.76e10 2.43e12 2.37 MULTIPOLYGON Z (((-118.594 33.0359…
Notice the geometry in the column. To remove the geometry column pipe to st_set_geometry(NULL)
. To arrange in descending order use arrange(desc(pct_water))
.
## # A tibble: 5 x 4
## region water land pct_water
## <chr> <dbl> <dbl> <dbl>
## 1 Northeast 108922434345 869066138232 12.5
## 2 Midwest 184383393833 1943869253244 9.49
## 3 Southeast 103876652998 1364632039655 7.61
## 4 West 57568049509 2432336444730 2.37
## 5 Southwest 24217682268 1462631530997 1.66
5.7 Regions: ggplot
The ggplot2
library can visualise sf objects.
- In RStudio, see menu Help > Cheatsheets > Data Visualization with ggplot2.
5.8 Regions: recalculate area
So far you’ve used the ALAND
column for area of the state. But what if you were not provided the area and needed to calculate it? Because the states
are in geographic coordinates, you’ll need to either transform to an equal area projection and calculate area, or use geodesic calculations. Thankfully, the sf
library provides area calculations with the st_area()
and uses the geosphere::distGeo()
to perform geodesic calculations (ie trigonometric calculation accounting for the spheroid nature of the earth). Since the states
data has the unusual aspect of a z dimension, you’ll need to first remove that with the st_zm()
function.
library(geosphere)
library(units)
regions = states %>%
mutate(
water_m2 = AWATER %>% set_units(m^2),
land_m2 = geometry %>% st_zm() %>% st_area()) %>%
group_by(region) %>%
summarize(
water_m2 = sum(water_m2),
land_m2 = sum(land_m2)) %>%
mutate(
pct_water = water_m2 / land_m2)
# table
regions %>%
st_set_geometry(NULL) %>%
arrange(desc(pct_water))
## # A tibble: 5 x 4
## region water_m2 land_m2 pct_water
## <chr> [m^2] [m^2] [1]
## 1 Northeast 108922434345 9.117041e+11 0.11947126
## 2 Midwest 184383393833 1.987268e+12 0.09278233
## 3 Southeast 103876652998 1.427079e+12 0.07278971
## 4 West 57568049509 2.467170e+12 0.02333363
## 5 Southwest 24217682268 1.483765e+12 0.01632178
5.9 Challenge: project & recalculate area
Use st_transform()
with a USA Contiguous Albers Equal Area Conic Projection that minimizes distoration, and then calculate area using the st_area()
function.
5.9.1 Answers
library(geosphere)
library(units)
# Proj4 of http://spatialreference.org/ref/esri/usa-contiguous-albers-equal-area-conic/
crs_usa = '+proj=aea +lat_1=29.5 +lat_2=45.5 +lat_0=37.5 +lon_0=-96 +x_0=0 +y_0=0 +ellps=GRS80 +datum=NAD83 +units=m +no_defs'
regions = states %>%
st_transform(crs_usa) %>%
mutate(
water_m2 = AWATER %>% set_units(m^2),
land_m2 = geometry %>% st_zm() %>% st_area()) %>%
group_by(region) %>%
summarize(
water_m2 = sum(water_m2),
land_m2 = sum(land_m2)) %>%
mutate(
pct_water = water_m2 / land_m2)
# table
regions %>%
st_set_geometry(NULL) %>%
arrange(desc(pct_water))
## # A tibble: 5 x 4
## region water_m2 land_m2 pct_water
## <chr> [m^2] [m^2] [1]
## 1 Northeast 108922434345 9.117031e+11 0.11947138
## 2 Midwest 184383393833 1.987266e+12 0.09278246
## 3 Southeast 103876652998 1.427078e+12 0.07278973
## 4 West 57568049509 2.467167e+12 0.02333367
## 5 Southwest 24217682268 1.483758e+12 0.01632185
5.10 Key Points
- The
sf
package can take advantage of chaining spatial operations using the%>%
operator. - Data manipulation functions in
dplyr
such asgroup_by()
,summarize()
andmutate()
work onsf
objects. - Area can be calculated a variety of ways. Geodesic is preferred if starting with geographic coordinates (vs projected).