Lesson 5 Analyze Spatial: sf

5.1 Overview

Questions

  • How to elegantly conduct complex spatial analysis by chaining operations?
  • What is the percent area of water by region across the United States?

Objectives

  • Use the %>% operator (aka “then” or “pipe”) to pass output from one function into input of the next.
  • Calculate metrics on spatial attributes.
  • Aggregate spatial data with metrics.
  • Display a map of results.

5.2 Prerequisites

R Skill Level: Intermediate - you’ve got basics of R down.

You will use the sf package for vector data along with the dplyr package for calculating and manipulating attribute data.

5.3 States: read and plot

Similar to Lesson 9: Handling Spatial Projection & CRS in R, we’ll start by reading in a polygon shapefile using the sf package. Then use the default plot() function to see what it looks like.

## here() starts at /Users/bbest/github/nps-r-workshop
## Warning: plotting the first 9 out of 10 attributes; use max.plot = 10 to
## plot all

Notice the default plot on sf objects outputs colorized values of the first 9 of 10 columns. Use the suggestion from the warning to plot the 10th column.

##  [1] "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "STUSPS"   "NAME"    
##  [7] "LSAD"     "ALAND"    "AWATER"   "region"   "geometry"
## Observations: 58
## Variables: 11
## $ STATEFP  <chr> "06", "11", "12", "13", "16", "17", "19", "21", "22",...
## $ STATENS  <chr> "01779778", "01702382", "00294478", "01705317", "0177...
## $ AFFGEOID <chr> "0400000US06", "0400000US11", "0400000US12", "0400000...
## $ GEOID    <chr> "06", "11", "12", "13", "16", "17", "19", "21", "22",...
## $ STUSPS   <chr> "CA", "DC", "FL", "GA", "ID", "IL", "IA", "KY", "LA",...
## $ NAME     <chr> "California", "District of Columbia", "Florida", "Geo...
## $ LSAD     <chr> "00", "00", "00", "00", "00", "00", "00", "00", "00",...
## $ ALAND    <dbl> 403483823181, 158350578, 138903200855, 148963503399, ...
## $ AWATER   <dbl> 20483271881, 18633500, 31407883551, 4947080103, 23977...
## $ region   <chr> "West", "Northeast", "Southeast", "Southeast", "West"...
## $ geometry <MULTIPOLYGON [°]> MULTIPOLYGON Z (((-118.594 ..., MULTIPOL...
## Simple feature collection with 58 features and 10 fields
## geometry type:  MULTIPOLYGON
## dimension:      XYZ
## bbox:           xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 58 x 11
##    STATEFP STATENS AFFGEOID GEOID STUSPS NAME  LSAD    ALAND  AWATER region
##    <chr>   <chr>   <chr>    <chr> <chr>  <chr> <chr>   <dbl>   <dbl> <chr> 
##  1 06      017797… 0400000… 06    CA     Cali… 00    4.03e11 2.05e10 West  
##  2 11      017023… 0400000… 11    DC     Dist… 00    1.58e 8 1.86e 7 North…
##  3 12      002944… 0400000… 12    FL     Flor… 00    1.39e11 3.14e10 South…
##  4 13      017053… 0400000… 13    GA     Geor… 00    1.49e11 4.95e 9 South…
##  5 16      017797… 0400000… 16    ID     Idaho 00    2.14e11 2.40e 9 West  
##  6 17      017797… 0400000… 17    IL     Illi… 00    1.44e11 6.20e 9 Midwe…
##  7 19      017797… 0400000… 19    IA     Iowa  00    1.45e11 1.08e 9 Midwe…
##  8 21      017797… 0400000… 21    KY     Kent… 00    1.02e11 2.39e 9 South…
##  9 22      016295… 0400000… 22    LA     Loui… 00    1.12e11 2.38e10 South…
## 10 24      017149… 0400000… 24    MD     Mary… 00    2.51e10 6.98e 9 North…
## # ... with 48 more rows, and 1 more variable: geometry <MULTIPOLYGON [°]>
##  [1] "STATEFP"  "STATENS"  "AFFGEOID" "GEOID"    "STUSPS"   "NAME"    
##  [7] "LSAD"     "ALAND"    "AWATER"   "region"   "geometry"
## [1] "sf"         "tbl_df"     "tbl"        "data.frame"

The class of the states object is both a simple feature (sf) as well as a data frame, which means the many useful functions available to a data frame (or “tibble”) can be applied.

To plot the column of interest, feed the “slice” of that column to the plot() function.

Question: To motivate the spatial analysis for the rest of this lesson, you will answer this question: “What is the percent water by region?

5.4 Challenge: analytical steps?

Outline a sequence of analytical steps needed to arrive at the answer.

5.4.1 Answers

  1. Sum the area of water (AWATER) and land (ALAND) per region.
  2. Divide the area of water (AWATER) by the area of land (ALAND) per region to arrive at percent water.
  3. Show table of regions sorted by percent water.
  4. Show map of regions by percent water with a color ramp and legend.

5.5 Regions: calculate % water

  • Use the %>% operator (aka “then” or “pipe”) to pass output from one function into input of the next.
    • In RStudio, see menu Help > Keyboard Shortcuts Help for a shortcut to the “Insert Pipe Operator”.
  • Calculate metrics on spatial attributes.
  • Aggregate spatial data with metrics.
## Simple feature collection with 5 features and 4 fields
## geometry type:  GEOMETRY
## dimension:      XYZ
## bbox:           xmin: -124.7258 ymin: 24.49813 xmax: -66.9499 ymax: 49.38436
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 5 x 5
##   region      water     land pct_water                            geometry
##   <chr>       <dbl>    <dbl>     <dbl>                      <GEOMETRY [°]>
## 1 Midwest   1.84e11  1.94e12      9.49 MULTIPOLYGON Z (((-82.86334 41.693…
## 2 Northe…   1.09e11  8.69e11     12.5  MULTIPOLYGON Z (((-76.04621 38.025…
## 3 Southe…   1.04e11  1.36e12      7.61 MULTIPOLYGON Z (((-81.81169 24.568…
## 4 Southw…   2.42e10  1.46e12      1.66 POLYGON Z ((-94.48587 33.63787 0, …
## 5 West      5.76e10  2.43e12      2.37 MULTIPOLYGON Z (((-118.594 33.0359…

Notice the geometry in the column. To remove the geometry column pipe to st_set_geometry(NULL). To arrange in descending order use arrange(desc(pct_water)).

## # A tibble: 5 x 4
##   region           water          land pct_water
##   <chr>            <dbl>         <dbl>     <dbl>
## 1 Northeast 108922434345  869066138232     12.5 
## 2 Midwest   184383393833 1943869253244      9.49
## 3 Southeast 103876652998 1364632039655      7.61
## 4 West       57568049509 2432336444730      2.37
## 5 Southwest  24217682268 1462631530997      1.66

5.6 Regions: plot

Now plot the regions.

5.8 Regions: recalculate area

So far you’ve used the ALAND column for area of the state. But what if you were not provided the area and needed to calculate it? Because the states are in geographic coordinates, you’ll need to either transform to an equal area projection and calculate area, or use geodesic calculations. Thankfully, the sf library provides area calculations with the st_area() and uses the geosphere::distGeo() to perform geodesic calculations (ie trigonometric calculation accounting for the spheroid nature of the earth). Since the states data has the unusual aspect of a z dimension, you’ll need to first remove that with the st_zm() function.

## # A tibble: 5 x 4
##   region        water_m2      land_m2  pct_water
##   <chr>            [m^2]        [m^2]        [1]
## 1 Northeast 108922434345 9.117041e+11 0.11947126
## 2 Midwest   184383393833 1.987268e+12 0.09278233
## 3 Southeast 103876652998 1.427079e+12 0.07278971
## 4 West       57568049509 2.467170e+12 0.02333363
## 5 Southwest  24217682268 1.483765e+12 0.01632178

5.10 Key Points

  • The sf package can take advantage of chaining spatial operations using the %>% operator.
  • Data manipulation functions in dplyr such as group_by(), summarize() and mutate() work on sf objects.
  • Area can be calculated a variety of ways. Geodesic is preferred if starting with geographic coordinates (vs projected).