Due: noon Wed, Jan 28 2015 via GauchoSpace

# Introduction

In this lab, you’ll explore physical and climatic agents that determine vegetation types. In lab 1, you qualitatively assessed that the evergreen land cover type associated with steep, northern exposures tend to recieve less sun and are therefore moister and cooler. Now we’ll quantiatively determine these physical-biotic associations using a more detailed land cover on vegetation available through USGS for the Santa Ynez area of Santa Barbara County with additional data layers prepared by Frank Davis per Davis & Dozier (1990). We’ll determine these associations by fitting a recursive partitioning model.

# Start

To get started:

1. Download *lab3_agents.7z** from GauchoSpace into your course home directory (eg H:\esm215). Right-click on the file -> 7-Zip -> Extract Here. Navigate into the newly expanded lab3_agents folder and double-click on the ArcMap document lab3.mxd.

2. Download *lab3.Rmd** from GauchoSpace into the lab directory (ie H:\esm215\lab3_agents). Right-click on it -> Open with -> RStudio. Replace the working directory with your own (be sure to replace slashes.).

# Data

geol A subregion of the 1:250,000 scale geologic map of CA. 30 m raster. Legend is available at http://ngmdb.usgs.gov/ngm-bin/ILView.pl?sid=329_1.sid&vtype=b

soil A subregion of the ****1:24,000 scale soil survey map (SSURGO, http://soils.usda.gov/survey/geography/ssurgo/description.html. 30 m raster data. SSURGO maps are the most detailed soil survey maps available for most of the U.S. and are used extensively for landscape-scale analysis.

elev - 28m raster: Shuttle imaging radar topographic data. Values are elevations in meters above sea level.

slope****- 28 m raster: slope angle in degrees derived from synezdem

flow 28 m raster: flow accumulation model, derived from synezdem for a subregion corresponding to subsoil30. Pixel values are the drainage area for each pixel. (The data are noisy because errors in the dem propagate to disrupt drainage topology.)

sunw - 28 m the data raster: integrated clearsky shortwave radiation, units are watts/sq. m., for December-Feb.

suns - 28 m the data raster: annual ****clearsky shortwave radiation, units are watts/sq. m.

lc   - 30 m raster: 2010 vegetation/land cover map produced from thematic mapper satellite imagery. Actual vegetation mapped by USFS/TNC to Alliances (dominant overstory species or community types) according to NatureServe/US vegetation standard based on Thematic mapper imagery and elevation data.

In lab spend some time learning to display the data. Overlay individual layers and combinations on the air photo. Zoom in and out. Play with the symbology.

In particular, examine apparent land cover pattern (air photo) and vegetation pattern (SBColfireveg) in relationship to geology, soils, fire history and topographic factors like elevation, slope, radiation and flow accumulation.

## Quantitative analysis of an environmental database: Hierarchical environmental filters (or not).

What controls vegetation pattern in the Santa Ynez Hills and Valley subsection? Landscape theory posits that pattern could vary from one landscape to another and reflect local physical controls, disturbance history and plant dispersal processes.

Various techniques exist to quantify the relationship between vegetation pattern and environmental factors at different scales. Here you will learn a method known as mutual information analysis. Basically, you will measure the statistical association between vegetation types and each of a set of environmental variables.

# load necessary libraries having useful functions, suppressing startup messages and warnings
suppressPackageStartupMessages(suppressWarnings({
library(rpart)   # recursive partitioning
library(party)   # recursive partytioning
library(partykit)
}))

setwd('H:/esm215/lab3_agents')

# get lookup table matching hierarchical vegetative landcovers

# get stack of rasters
r = stack(
raster('rasters/lc.img'),
raster('rasters/elev.tif'),
raster('rasters/slope.tif'),
raster('rasters/aspect.tif'),
raster('rasters/suns.tif'),
raster('rasters/sunw.tif'),
raster('rasters/flow.tif'),
raster('rasters/geol.tif'))

# extract raster values into data frame
D = as.data.frame(getValues(r))
D$lc = factor(D$lc)
D$geol = factor(D$geol)

D = cbind(id=1:nrow(D), D)

# merge D with lookup table for landcovers
D = merge(D, lu, by.x='lc', by.y='VALUE', all.x=T)

# look at just the NLDC high level landcover for Forest & Woodland
nldc = 'Forest & Woodland'
table(subset(D, NVC_CLASS == nldc, lc))
##
##    39    40    41    42    45   165   183   277   278   282   296   297
## 22881  1121 64430  2057  3150   614    25  3319  6178     1     0     0
##   300   302   303   304   383   384   432   471   472   489   539   540
##     0     0     0     0     0     0     0     0     0     0     0     0
##   547   556   557   579   581   582   583   584
##     0     0     0     0     0     0     0     0
# fit model for recursive partitioning
mdl = rpart(lc ~ elev + slope + aspect + suns + sunw + flow,
method='class',
data=subset(D, NVC_CLASS == nldc))

# examine fitted model
print(mdl)
## n= 103776
##
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
##
## 1) root 103776 39350 41 (0.22 0.011 0.62 0.02 0.03 0.0059 0.00024 0.032 0.06 9.6e-06)
##   2) slope>=3.67 97330 34210 41 (0.23 0.0077 0.65 0.021 0.032 0.0059 0.00026 0.032 0.021 1e-05) *
##   3) slope< 3.67 6446  2297 278 (0.046 0.058 0.2 0.0028 0.0084 0.0057 0 0.033 0.64 0)
##     6) suns>=5.548e+05 1324   557 41 (0.15 0.046 0.58 0.0083 0.029 0.003 0 0.016 0.17 0) *
##     7) suns< 5.548e+05 5122  1192 278 (0.018 0.061 0.11 0.0014 0.0029 0.0064 0 0.038 0.77 0) *
summary(mdl)
## Call:
## rpart(formula = lc ~ elev + slope + aspect + suns + sunw + flow,
##     data = subset(D, NVC_CLASS == nldc), method = "class")
##   n= 103776
##
##        CP nsplit rel error xerror     xstd
## 1 0.07226      0    1.0000 1.0000 0.003972
## 2 0.01393      1    0.9277 0.9305 0.003912
## 3 0.01000      2    0.9138 0.9164 0.003898
##
## Variable importance
##  slope   suns   elev   sunw aspect
##     70     12     10      7      2
##
## Node number 1: 103776 observations,    complexity param=0.07226
##   predicted class=41   expected loss=0.3791  P(node) =1
##     class counts: 22881  1121 64430  2057  3150   614    25  3319  6178     1
##    probabilities: 0.220 0.011 0.621 0.020 0.030 0.006 0.000 0.032 0.060 0.000
##   left son=2 (97330 obs) right son=3 (6446 obs)
##   Primary splits:
##       slope  < 3.67   to the right, improve=3777.0, (0 missing)
##       elev   < 86.27  to the right, improve=3428.0, (0 missing)
##       sunw   < 154500 to the left,  improve=1838.0, (0 missing)
##       suns   < 511100 to the left,  improve=1600.0, (0 missing)
##       aspect < 64.99  to the left,  improve= 282.7, (0 missing)
##   Surrogate splits:
##       elev   < 85.87  to the right, agree=0.945, adj=0.115, (0 split)
##       aspect < -0.5   to the right, agree=0.940, adj=0.028, (0 split)
##       flow   < 2518   to the left,  agree=0.938, adj=0.000, (0 split)
##
## Node number 2: 97330 observations
##   predicted class=41   expected loss=0.3514  P(node) =0.9379
##     class counts: 22585   750 63124  2039  3096   577    25  3104  2029     1
##    probabilities: 0.232 0.008 0.649 0.021 0.032 0.006 0.000 0.032 0.021 0.000
##
## Node number 3: 6446 observations,    complexity param=0.01393
##   predicted class=278  expected loss=0.3563  P(node) =0.06211
##     class counts:   296   371  1306    18    54    37     0   215  4149     0
##    probabilities: 0.046 0.058 0.203 0.003 0.008 0.006 0.000 0.033 0.644 0.000
##   left son=6 (1324 obs) right son=7 (5122 obs)
##   Primary splits:
##       suns   < 554800 to the right, improve=638.0, (0 missing)
##       sunw   < 177800 to the right, improve=593.0, (0 missing)
##       elev   < 86     to the right, improve=578.0, (0 missing)
##       slope  < 1.958  to the right, improve=185.6, (0 missing)
##       aspect < 46.47  to the right, improve=105.6, (0 missing)
##   Surrogate splits:
##       sunw < 179900 to the right, agree=0.911, adj=0.566, (0 split)
##       elev < 269.4  to the right, agree=0.830, adj=0.170, (0 split)
##
## Node number 6: 1324 observations
##   predicted class=41   expected loss=0.4207  P(node) =0.01276
##     class counts:   202    61   767    11    39     4     0    21   219     0
##    probabilities: 0.153 0.046 0.579 0.008 0.029 0.003 0.000 0.016 0.165 0.000
##
## Node number 7: 5122 observations
##   predicted class=278  expected loss=0.2327  P(node) =0.04936
##     class counts:    94   310   539     7    15    33     0   194  3930     0
##    probabilities: 0.018 0.061 0.105 0.001 0.003 0.006 0.000 0.038 0.767 0.000
# plot the model using the pretty partytioning class
mdl_p = as.party(mdl)
print(mdl_p)
##
## Model formula:
## lc ~ elev + slope + aspect + suns + sunw + flow
##
## Fitted party:
## [1] root
## |   [2] slope >= 3.67: 41 (n = 97330, err = 35%)
## |   [3] slope < 3.67
## |   |   [4] suns >= 554759.22: 41 (n = 1324, err = 42%)
## |   |   [5] suns < 554759.22: 278 (n = 5122, err = 23%)
##
## Number of inner nodes:    2
## Number of terminal nodes: 3
plot(mdl_p)

# predict landcover classes
D_nldc = subset(D, D$NVC_CLASS == nldc) D_nldc$lc_pred = predict(mdl, D_nldc, type='vector')
D_nldc$lc_pred = levels(D_nldc$lc)[D_nldc$lc_pred] # examine matches table(D_nldc[,c('lc','lc_pred')]) ## lc_pred ## lc 278 41 ## 39 94 22787 ## 40 310 811 ## 41 539 63891 ## 42 7 2050 ## 45 15 3135 ## 165 33 581 ## 183 0 25 ## 277 194 3125 ## 278 3930 2248 ## 282 0 1 ## 296 0 0 ## 297 0 0 ## 300 0 0 ## 302 0 0 ## 303 0 0 ## 304 0 0 ## 383 0 0 ## 384 0 0 ## 432 0 0 ## 471 0 0 ## 472 0 0 ## 489 0 0 ## 539 0 0 ## 540 0 0 ## 547 0 0 ## 556 0 0 ## 557 0 0 ## 579 0 0 ## 581 0 0 ## 582 0 0 ## 583 0 0 ## 584 0 0 # turn into raster D_nldc$lc_pred = as.integer(as.character(D_nldc\$lc_pred))
#D = merge(D, D_nldc[,c('id','lc_pred')], by.x='id', by.y='id', all.x=T)
#...