Short Introduction to Geostatistical and Spatial Data Analysis with GRASS 5.x and R statistical data language

[Table of contents ] DRAFT Document!

Part 2: Using GRASS 5 and R

After having installed the GRASS5/R interface (see: GRASS 5; GRASS 6), you may read the local GRASS/R interface help pages:
help.start(GRASS)

Using R/GRASS 5 with Project ASSIST/Leics dataset

The sample dataset "Leics" is available from http://grass.itc.it/gdp/  -> Project ASSIST, download "leics" dataset there [direct link to ASSIST].

Start GRASS with "leics" dataset:

grass5
   LOCATION:   leics
   MAPSET:     PERMANENT

Or (the "grassdata5" represents the locate database path):

grass5 grassdata5/leics/PERMANENT
Start R within GRASS, call the R-GRASS function library within R:
R
library(GRASS)
Read the GRASS 5 environment into R:
G <-gmeta()

Useful commands:

#current memory size:
gc()
# see current GRASS location settings:
summary.grassmeta(G)
# print list of available raster maps:
system("g.list rast")
# get help in R environment:
help(plot.grassmeta)
# alternative method to get help in R environment:
?plot.grassmeta
# get HTML help in R environment:
help.start(plot.grassmeta)

The GRASS/R interface allows for loading GRASS maps into R for further analysis. We start with an example. Load the landuse raster map into object "leics":

leics <- rast.get(G, rlist=c("landcov"), catlabels=TRUE)
Note: "rlist" can be a character vector of one or more raster layers. This allows to load one or more maps into a R object. The "catlabels" asks for landcov labels rather than values, treating it as factor data (categorial data): "catlabels=F" would load "1, 2, 3, ..." instead of "Background, Industry, ...". To see object details, run:
names(leics)
str(leics)
Now we load two raster maps into object "leics" (catlabels has to be treated differently):
leics <- rast.get(G, rlist=c("topo", "landcov"), catlabels=as.logical(c(F,T)))
As we load these two maps into the same object name, the first which we had loaded before, will be overwritten. See object info:
names(leics)
 [1] "topo"    "landcov.f"
str(leics)
List of 2
 $ topo     : num [1:57600] 80 80 80 80 80 80 80 80 78 76 ...
 $ landcov.f: Factor w/ 9 levels "Background","In..",..: 6 6 6 6 6 6 6 6 6 6 ...
length(leics)
[1] 2
is.factor(leics$landcov)
 [1] TRUE

is.factor(leics$topo)  [1] FALSE

This new object "leics" is a called "dataframe". To access access its contents, use a $ character to separate the member from the frame. To print some statistics of "topo" frame member, enter:
summary(leics$topo)
   Min. 1st Qu.  Median    Mean 3rd Qu.   Max.
   40.0    62.0   100.0  115.3   167.0   278.0
Plot topo values against land coverage:
boxplot(topo ~ landcov.f, data=leics)
Note: if you enlarge the boxplot window you will see the full legend.

Plotting elevation map with different colors:

plot(G, leics$topo)
plot(G, leics$topo, col=terrain.colors(20))
(20) defines the number of colors to be used.

Plotting landcover map (note the "codes" to plot categorial data as we have loaded "landcov" with "catlabels=T"):

plot(G, codes(leics$landcov))
plot(G, codes(leics$landcov), col=rainbow(8))
Applying a function to each cell of a ragged array (see tapply command):
tapply(leics$topo, leics$landcov, summary)
Arable
   Min. 1st Qu.  Median    Mean 3rd Qu.   Max.
    40     63      81     104     153    270

$Industry    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.   40.00   42.00   59.00   83.61 120.00  220.00

$Pasture    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.    40.0    75.0   145.0  133.5   180.0   278.0

$Quarry    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.    40.0    59.0   161.0  135.3   177.0   198.0

$Residential    Min. 1st Qu.  Median    Mean 3rd Qu.  Max.    40.0    53.0    60.0   76.3    80.0   218.0

$Scrub    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.    40.0   100.0   161.0   148.2  196.0   270.0

$Water    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.   47.00   65.00   99.00   99.27 131.00  221.00

$Woodland    Min. 1st Qu.  Median    Mean 3rd Qu.   Max.    40.0   118.0   156.0   142.9  174.0   248.0

Analyse vice verse(... will take some time for calculation):
tapply(leics$landcov, leics$topo, summary)
$"40"
     Arable    Industry    Pasture      Quarry Residential      Scrub
       1545        367        1225          2         460        571
      Water    Woodland          0         90

$"41"      Arable    Industry    Pasture      Quarry Residential      Scrub         132         36          76         0          47         22       Water    Woodland          0          1 [...]

To remove the "leics" object from R memory, enter:
rm(leics)

Null values in maps

Several analysis methods in R don't allow for NA values (NULL, no data) values for calculations. As GIS raster data regularly contain such NA raster cells, these have to be removed for the analysis. Later, when storing the result map back into GRASS, these NA values have to be stored back to keep the pixel positions in the map. Note: This temporary NA elimination is only allowed if your analysis doesn't depend on the pixel positions within the map!

After loading a map, first we can test if any NA raster cells are present in the map (example: slope map from below):

slope <- rast.get(G, rlist="slope")
summary(slope$slope)
If you look at the right column, you find the number of NAs present in slope map listed.
slope$slope[slope$slope == "NA" ]
This command lists all NA values in the slope map. We can also query their cell numbers:
which(is.na(slope$slope))
Most of the NAs are to be found at map boundaries. For later use we store these position numbers of NAs and values:
slope.nas<-which(is.na(slope$slope))
slope.vals<-which(!is.na(slope$slope))
myslope <- slope$slope[!is.na(slope$slope)]
str(myslope)
str(slope$slope)
You can see the difference in number of values. We have the new slope map in "myslope" now. As a second map we load the "landcov" map from GRASS:
landcov <- rast.get(G, rlist="landcov")
landcov.nas <- which(is.na(landcov$landcov))
mylandcov <- landcov$landcov[!is.na(landcov$landcov)]
summary(myslope)
summary(mylandcov)
Now further statistical analysis can be done.

Restoring the NAs is a three step procedure:
1. Create a new empty numeric vector:

new.myslope <- numeric(length(slope$slope))
2. Put the NAs in the right places:
new.myslope[slope.nas] <- NA
3. Put the values in the right places:
new.myslope[slope.vals] <- myslope
For a test you could draw this map, it should be looking identical to the original map:
plot(G, new.myslope)
plot(G, slope$slope)
A mathematical check is a simple subtraction:
summary(slope$slope - new.myslope)
The result should be "0" for Min, Median and Max.

Dealing with legends

Drawing a legend requires some work:
Generate levels, analyse topo range for useful breaks:

library(MASS)
truehist(leics$topo, nbins=100)
legendlevels <- as.ordered(cut(leics$topo, breaks=c(50,100,150,200, 250)))
plot(G, leics$topo, col=terrain.colors(20))
legend(x=c(452000,456000), y=c(316000,320000), pch=c(1:5),
       legend=levels(as.ordered(legendlevels)))
[to be improved, see krige.G example]

Check the loaded objects:

objects()
gc()

----------------------------
Another example with "leics" (the system() function allows calls to external commands):

system("g.list rast")
Calculate slope/aspect directly in GRASS:
system("r.slope.aspect el=topo as=aspect sl=slope")
Load results into R:
terrain<-rast.get(G, rlist=c("slope","aspect"))
Describe data structure:
str(terrain)
is.factor(terrain$slope)
[1] FALSE
Plot the map:
plot(G, terrain$slope)
Choose colors, get some info first:
?rainbow
Plot with colors:
plot(G, terrain$slope, col=terrain.colors(20))
Plot with gray scale:
plot(G, terrain$aspect, col=gray(0:8 / 8))
Clean R memory:
rm(terrain)

Analysing the Maas contaminated soils dataset

(this section is using material provided by Roger Bivand, Norway) This dataset is build-in into the GRASS/R interface and well known from various other geostatistical software and books. Data source: Burrough, P. & McDonnell, R. (1998) Principles of geographical information systems, New York: Oxford, pp. 309-311.

The utm.maas data frame has 98 rows and 13 columns. It records contamination of the environment by selected metals along the Dutch bank of the River Maas just north of Maastricht. The original data were positioned in relation to local coordinates (TDN coordinates offset by x-178600, y-329700; TDN are stereographic projection, Bessel ellipsoid, origin at 5.387638889E, 52.156160556N, false origin x=155000, y=463000), and have been reprojected to UTM zone 32, using the WGS84 ellipsoid. The pre-defined location can also be downloaded (655kb). See additionally utm help entry.

You have to define this (empty) GRASS location:

 projection: UTM
 ellipsoid: WGS84
 zone: 32
 north: 5652930
 south: 5650610
 west: 269870
 east: 272460
 nsres: 10
 ewres: 10
 rows: 232
 cols: 259

This data frame contains the following columns:

[,1]  east  numeric  UTM zone 32 eastings coordinates m
[,2]  north numeric  UTM zone 32 northings coordinates m
[,3]  x     numeric  local coordinates m
[,4]  y     numeric  coordinates m
[,5]  elev  numeric  local elevation m
[,6]  d.river  numeric  distance from main River Maas channel m
[,7]  Cd    numeric  ppm
[,8]  Cu    numeric  ppm
[,9]  Pb    numeric  ppm
[,10] Zn    numeric  ppm
[,11] LOI   numeric  percentage organic matter loss on ignition
[,12] Flfd  numeric  flood frequency class
[,13] Soil  numeric  soil type
Start GRASS and R within GRASS:
grass5 grassdata5/maas/PERMANENT
R
library(GRASS)
G<-gmeta()
data(utm.maas)
Run the built-in example (written by Roger Bivand):
example(utm.maas)
This example is performing as follows (here slightly extended): First a new object is created with ordered data (five classes with defined thresholds);
Zn.o <- as.ordered(cut(utm.maas$Zn, labels = c("insignificant",
    "low", "medium", "high", "crisis"), breaks = c(100,200,
    400, 700, 1000, 2000), include.lowest = T))
Look at this table:
table(Zn.o)
Plot the ordered data as map with legend and title:
plot(utm.maas$east, utm.maas$north, pch = codes(Zn.o), xlab = "", ylab = "", asp = 1)
legend(x = c(269900, 270500), y = c(5652000, 5652700), pch = c(1:5),
legend = levels(Zn.o))
title("Burrough & McDonnell 1998 p. 107")
Plot histogram of flood frequency classes ("h" defines box width) - not in example:
str(utm.maas$Fldf)
library(MASS)
truehist(utm.maas$Fldf, h=0.01)
Plot map of flood frequency classes - not in example:
plot(utm.maas$east, utm.maas$north, pch = utm.maas$Fldf)
legend(x = c(269900, 270500), y = c(5652000, 5652700), pch = c(1:3),
legend = levels(as.ordered(utm.maas$Fldf)))
You can identify points in this map using the mouse (the value will be printed into the map): - not in example
identify(utm.maas$east, utm.maas$north, labels = utm.maas$Fldf)
Analyse the "flood frequency class Fldf": mean and standard deviation
round(tapply(utm.maas$Zn, as.factor(utm.maas$Fldf), mean), 2)
round(tapply(utm.maas$Zn, as.factor(utm.maas$Fldf), sd), 2)
The same we do with log transformed data:
round(tapply(log(utm.maas$Zn), as.factor(utm.maas$Fldf), mean), 3)
round(tapply(log(utm.maas$Zn), as.factor(utm.maas$Fldf), sd), 3)
And again with exp transformed data:
round(exp(round(tapply(log(utm.maas$Zn), as.factor(utm.maas$Fldf), mean), 3)), 2)
Now we look at the Zn concentration distribution (normal and log transformed):
hist(utm.maas$Zn, breaks = seq(0, 2000, 100), col = "grey")
hist(log(utm.maas$Zn), breaks = seq(3.5, 8.5, 0.25), col = "grey")
The histogram of the non-transformed data shows the typical skewness in the distribution as usually found for soil data. Finally some statistical tests will be done: Welch Two Sample t-test (t.test help page):
library(stats)
t.test(utm.maas$Zn[utm.maas$Fldf == 2], utm.maas$Zn[utm.maas$Fldf == 3])
cat("NB: B&McD 1998, p. 108, their relative variance is (1-(RSquared))\n")
Analysis of Variance Table (anova help page, lm linear model page):
anova(lm(Zn ~ as.factor(Fldf), data = utm.maas))
summary(lm(Zn ~ as.factor(Fldf), data = utm.maas))
anova(lm(log(Zn) ~ as.factor(Fldf), data = utm.maas))
summary(lm(log(Zn) ~ as.factor(Fldf), data =
utm.maas))
Contour plot of Kernel density plot:
contour.G(G, kde2d.G(utm.maas$east, utm.maas$north,
h=c(300,300), G=G, Z=utm.maas$Zn)*maasmask)
points(utm.maas$east, utm.maas$north, pch=codes(Zn.o))
legend(x=c(269800, 270500), y=c(5652300, 5653000),
pch=c(1:5), legend=levels(Zn.o))
title("Kernel density representation of soil Zn")
Another nice example of kernel density plots:
example(kde2d.G)
Kriging example (written by Roger Bivand):
library(sgeostat)
maas.pts <- point(utm.maas, x="east", y="north")
maas.pairs <- pair(maas.pts, num.lags=10, maxdist=1000)
maas.evg <- est.variogram(maas.pts, maas.pairs, a1="Zn")
plot(maas.evg)
text(maas.evg$bins, maas.evg$classic/2, labels=maas.evg$n, pos=1)
abline(h=25000)
abline(h=170000)
maas.fit.exp <- fit.exponential(maas.evg, c0=24803.4, ce=158232,
    ae=361.604, plot.it=T, iterations=0)
cat("Using parameters from B&McD, p. 142 without fitting.\n")
res.G1 <- krige.G(G, maas.pts, "Zn", maas.fit.exp, maasmask)
summary(res.G1$zhat)
summary(res.G1$sehat)
plot(G, res.G1$zhat, breaks=round(seq(120,1660,length=9)), col=grey(9:2/9))
plot(G, ifelse(is.na(maasmask), 1, NA), col="wheat", add=T)
legend(c(269900, 270600), c(5652000, 5652900), bty="n", bg="wheat",
 legend=c(rev(legtext(round(seq(120,1660,length=9)))),"mask=NA"),
 fill=c(rev(grey(9:2/9)), "wheat"))
title("Ordinary kriging predictions")
ln.Zn <- data.frame(x=utm.maas$east, y=utm.maas$north, z=log(utm.maas$Zn))
ln.Zn.pts <- point(ln.Zn)
ln.Zn.evg <- est.variogram(ln.Zn.pts, maas.pairs, a1="z")
plot(ln.Zn.evg)
ln.Zn.fit.sph <- fit.spherical(ln.Zn.evg, c0=0.0536187, cs=0.581735,
   as=892.729, plot.it=T, iterations=0)
cat("Using parameters from Gstat user's manual, p. 14\n")
res.G2 <- krige.G(G, ln.Zn.pts, "z", ln.Zn.fit.sph, maasmask)
summary(res.G2$zhat)
summary(res.G2$sehat)
plot(G, res.G2$zhat, breaks=c(0, seq(4.8,7.4,length=9)), col=c("yellow", grey(9:2/9)))
plot(G, ifelse(is.na(maasmask), 1, NA), col="wheat", add=T)
legend(c(269900, 270600), c(5652000, 5652900), bty="n", bg="wheat",
 legend=c(rev(legtext(seq(4.8,7.4,length=9))),"< 4.8", "mask=NA"),
 fill=c(rev(grey(9:2/9)), "yellow", "wheat"))
title("Ordinary kriging predictions ln(Zinc)")

# to be continued...


Running R/GRASS with own data (in your own location)

Start GRASS 5 with your location/mapset settings. Start R within GRASS:

grass5
R
library(GRASS)
G<-gmeta()
Now the following functions are available (subject to change, depends on GRASS/R version):
contour.G           Equal scale contour plots for GRASS raster and site data
east                Reads GRASS metadata from the current LOCATION
get.cygwinstring    Access to underlying environment variables within GRASS
get.GISDBASE        Access to underlying environment variables within GRASS
get.GRASSChkGISRC   Access to underlying environment variables within GRASS
get.LOCATION        Access to underlying environment variables within GRASS
get.MAPSET          Access to underlying environment variables within GRASS
get.mapsets         Access GRASS readable mapsets search path
gmeta               Reads GRASS metadata from the current LOCATION
GRASS.connect       Access to underlying environment variables within GRASS
interp.new.G        Interpolation from sites to raster, Akima's new method
kde2d.G             Two-Dimensional Kernel Density Estimation on a GRASS Grid
krige.G             Kriging prediction for GRASS maps
legtext             Equal scale plots for GRASS raster and site data
maas.metadata       Soil pollution data set
maasmask            Soil pollution data set
make.maas.location  Reads GRASS metadata from the current LOCATION
north               Reads GRASS metadata from the current LOCATION
obsno               Reads GRASS metadata from the current LOCATION
pcbs                PCBs in an area of South Wales
plot.grassmeta      Equal scale plots for GRASS raster and site data
prmat2.G            Kriging prediction for GRASS maps
rast.get            Import GRASS raster files
rast.put            Exports to GRASS raster data files
refresh.mapsets     Access GRASS readable mapsets search path
reverse             Reads GRASS metadata from the current LOCATION
semat2.G            Kriging prediction for GRASS maps
set.cygwinstring    Access to underlying environment variables within GRASS
set.GISDBASE        Access to underlying environment variables within GRASS
set.LOCATION        Access to underlying environment variables within GRASS
set.MAPSET          Access to underlying environment variables within GRASS
sites.get           Import GRASS sites file
sites.put           Export GRASS sites file
sites.put2          Export GRASS sites file
summary.grassmeta   Display GRASS metadata
trmat.G             Evaluate Trend Surface over a GRASS Grid
utm.maas            Soil pollution data set
Note: You can unload the loaded package pkg by
detach("package:GRASS")
To get a list of loaded objects:
objects()

Synthetic data for tests using R and GRASS

You can easily generate synthetic test patters with GRASS and analyse them in R. An example:

To test stationarity (semivariogram modelling) we generate a sine surface using r.mapcalc:

1. Start GRASS with an existing location:

grass5
2. Generate the sine surface
r.mapcalc sinesurface="sin(row())*sin(col())"
3. look at it within GRASS:
d.mon x0
d.rast sinesurface
or use NVIZ:
nviz2.2 elevation=sinesurface
4. Now we generate random points from this surface to produce our "geostatistical sample area":
r.random in=sinesurface sites=sample ns=1000
s.info sample
We have 1000 samples in sites format in a new map "sample".

5. For analysis we start R within GRASS and call the GRASS-interface library:

R
library(GRASS)
G<-gmeta()
The rast.get command loads the GRASS map into R:
sample <- sites.get(G, "sample")
str(sample)
plot(G,sample)

#proceed with variogram stuff


Written by (c) 2000-2005 Markus Neteler