Short Introduction to Geostatistical and Spatial Data Analysis with GRASS and R statistical data language

[Table of contents]
DRAFT Document!

Learning statistical data analysis using "R"

The R examples described here are (should be) platform independent.

1. Basics

To start R, enter:
R
(Windows: click onto the Rgui icon)

R provides a set of base functions, which can be extended by using contributed packages (to be installed additionally).
Say, we needed such a contrib-package and therefore download and install the MASS library from Venable and Ripley. MASS provides many statistical functions and also numerous sample datasets:

#don't get confused: the code package is called VR, the library MASS:
install.packages("VR")
library(MASS)

To get R command help, you can always add a ? to the command:

?library
#or open the browser:
help.start()

Additionally there is the FAQ).
To get out of this (if already feeling sleepy):

q()

R is an object oriented data analysis language. The assignment operator is used to store results:

x <- 1+1

To get the result, just enter the name of the object:

x

Basics: Drawing a function

A simple example is to plot a sine curve. First we have to define the interval and number of data points to support the curve. Note: By redefining x we overwrite the old object:

x <- seq(-2*pi, 2*pi, len = 100)
x
str(x)
summary(x)

Then we can plot it (the type parameter specifies the line type):

matplot(x, sin(x), type="l")

To get an idea about the various "matplot()" options, run:

?matplot

You can see some "matplot()" examples by running:

example(matplot)

R comes along with a large number of published datasets. To get a list about currently accessible datasets, enter:

data()

To get a description of a specific dataset, enter (e.g. volcano dataset):

?volcano

Before using such data, we generate a set of coordinate pairs (500 points) normally distributed (just to get some experience with R):

x <- rnorm(500)
y <- rnorm(500)

Plot them:

plot(x,y)
xy plot
Look at the histogram:
library(MASS)
truehist(x)
truehist(y, nbin=50)
truehist
The nbins parameter allows to define the number of bins (those bars you can see there) in the histogram.

Let's add some z-values. To get numbers larger that 1 we multiply the values generated by rnorm:

z <- 1000 * rnorm(500)

To proceed, we need to link these three variables. They can be bounded into a "data frame":

dataset <- data.frame(x,y,z)

Now we have these variables twice, once individual, once in a dataframe. To see the objects currently available, use:

ls()

You can look at an object by entering its name. Here you will receive a long list of numbers with four columns (number of element, x, y, z):

dataset

Get a summary of it:

summary(dataset)

To look at the data structure (you will need to do this regularly), use the str() function:

str(dataset)

As we have a dataframe now, the access to a variable is different:

mean(x)
which delivers the mean of x (the individual variable). To access x in the frame, you have to enter instead:
mean(dataset$x)

Hopefully you get the same results. Generally you could remove the individual variables if you will continue to work with the dataframe:

rm(x,y,z)
ls()

2. Visualization

If you want to visualize a dataset, you can display individual data points (using points() function) or surfaces/contour lines. To display contour lines on the current point dataset, a surface needs to be interpolated:
library(akima)
surface <- interp(dataset$x, dataset$y, dataset$z)
contour(surface)
contour plot

We want to see a colored surface with legend:

filled.contour(surface)
filled contour plot

... and a perspective view:

persp(surface, col="green", expand=0.3)

New developments (2003) such as iPlots and RGL will improve the visualization capabilities of R. Also xyplot() of the lattice library is very powerful.

perspective plot (wireframe)

Finally we plot a color filled sine curve:

# set "frame":
plot(c(-pi, 2*pi), c(-1,1), type="n", ylab="sin(x)", xlab="x")
x1 <- seq(-pi, 0, length=101)
x2 <- seq(0, 2*pi, length=101)
polygon(c(0, -pi, x1), c(0, 0, sin(x1)), col="red")
polygon(c(2*pi, 0, x2), c(0, 0, sin(x2)), col="green")

# print plot into EPS file:
dev.copy2eps()
color filled sine curve

Import/export using ASCII-files (tables), manual table entry

We already removed the x, y and z variables from the memory in this session. To demonstrate the export of a single variable, we have to restore it from the dataframe (or generate a new one using rnorm() for example):
x <- dataset$x

To export this single variable, use:

write(x, file="~/testfile.asc")

To export a dataframe, use:

write.table(dataset, file="~/testfile2.asc")

You probably want to use a column separator different from white space:

write.table(dataset, file="~/testfile3.asc", sep=":")

Read the table back into R:

dataset.new <- read.table("~/testfile.asc", header=T, sep=":")

Remove the x, y and z variable from memory of R (we still have the dataframe):

rm(x)

List available objects:

ls()

Now we want to get a summary of the dataframe variables (Min, 1st. Quartile, Median, Mean, 3rd Qu., Max, NA's):

summary(dataset)

If you want to enter data tables from keyboard, R provides the function data.entry(). To enter a 3x4 matrix with value 0 pre-defining the fields, enter:

x <- matrix(0,3,4)
data.entry(x)

data.entry

Plotting to file (EPS, PS)

Saving into Postscript files/Xfig files:
postscript("test.ps")
plot(x)
dev.off()
The dev.off() function writes the file. Xfig export:
xfig("test.fig")
plot(x)
dev.off()
Plotting into a EPS file can be done with:
dev.copy2eps()

Graphical user interface

There are a few graphical user interfaces available for R now. One of those is "R commander":
# may not work on all systems:
install.packages("rgl", dependencies=TRUE)

# does not neccesarily need 'rgl':
install.packages("Rcmdr", dependencies=TRUE)
Then launch the GUI with:
library(Rcmdr)
This will open the R Commander.

To re-lauch it, run:

Commander()
Demo (cited from the R Commander page: For a quick test/tour:

End of basic stuff.



© 2000-2004 Markus Neteler
Last update: $Date: 2006-06-23 09:22:45 +0200 (Fri, 23 Jun 2006) $