For the R/PostgreSQL connection we need two extra packages. Since 10/2003 there is a new Rdbi/PostgreSQL interface for R-stats. Download it from BioConductor (but not the old SourceForge version!):
Interface installation (enter in shell terminal, appropriate version number):
R CMD INSTALL Rdbi_1.0.2.tar.gz R CMD INSTALL RdbiPgSQL_1.0.2.tar.gz
Note: The RODBC interface in an alternative, but the data transfer is rather slow.
PostgreSQL installation
We assume that you have a working installation of PostgreSQL 7.3 or later. For GNU/Linux RPM packages are available, for Mac OSX packages from FINK.
psql template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help on internal slash commands
\g or terminate with semicolon to execute query
\q to quit
template1=> \q
#maybe only user "postgres" is available, so start like this:
psql -U postgres template1
Welcome to psql 7.3.2, the PostgreSQL interactive terminal.
[...]
\q
Server does not run? This requires 'root' permissions:
su #start postgresql daemon: service postgresql start exit #To enable the PostgreSQL server to launch after boot: chkconfig --list postgresql chkconfig postgresql on chkconfig --list postgresql #... should be changed to 'on' in some levels now.
Server running? Yes (maybe it tells you that the user name does not exist).
We add a new user now:
#become 'root': su #become special user 'postgres' su postgres #create new PostgreSQL user (warning, we don't use a passowrd here!): createuser -a -d pablo #... CREATE USER should be printed exit exit
Now retry to connect (see above, 'psql ...').
Then we explore a bit more (note that the TAB key expands command and table names). Every query has to be followed with semicolon:
psql template1
[...]
template1=> SELECT * from pg_user;
[...]
#show all tables (called 'relation' in SQL speech):
\dt
No relations found.
#what time is it?
SELECT CURRENT_TIMESTAMP;
timestamptz
-------------------------------
2003-11-01 13:24:42.960403+00
(1 row)
#get some help for all backslash commands:
\?
[...]
#get some help for all commands:
\h
Available help:
ABORT CREATE TABLE EXECUTE
[...]
#print help for 'CREATE TABLE' command:
\h CREATE TABLE
[...]
#get out of PostgreSQL:
\q
Now we want to create an own database.
This requires 'root' permissions:
su #stop postgresql daemon (never ! modify databases with PostgreSQL still running): service postgresql stop #just notes, we won't do it now: # NOTE 1: new users are added in: /var/lib/pgsql/data/pg_hba.conf # it is recommended to use "md5" encryption when defining users # NOTE 2: if network access is desired, activate "tcpip_socket = true" in # /var/lib/pgsql/data/postgresql.conf #so we start postgresql daemon again: service postgresql start #-> should print "ok" #to create a new database, we login as 'postgres' user: su postgres createdb grasscourse exit #exit from root exit #now we are normal user again. #Check available databases (the new DB 'grasscourse' should be listed): psql -l
Now we create a new table within this database:
psql grasscourse
#Note: column definitions are comma separated, no comma after last column,
#to make it more sophisticated, we add a constraint (Y, N) to
# column 'enjoyed' to only accepts these two values for that column:
CREATE TABLE cities
(
id integer,
name varchar(50),
lastvisit date,
enjoyed varchar(1) check (enjoyed in ('Y','N'))
);
# -> CREATE TABLE should be printed
#list all tables:
\d
#get column info:
\d cities
#Insert data into the new table (use single quotes!):
#Date order: YY-mm-dd
INSERT INTO cities VALUES ( 1, 'Trento', '2003-11-01', 'Y');
# -> INSERT some_numbers should be printed
INSERT INTO cities VALUES ( 2, 'Milano', '2003-10-12', 'N');
INSERT INTO cities VALUES ( 3, 'Povo', '2003-11-25', 'Y');
#test for the constraint:
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-23', 'A');
# -> should fail: 'A' not accepted
#test for the date
INSERT INTO cities VALUES ( 4, 'Bolzano', '2003-10-33', 'Y');
# -> should fail: day 33 not accepted
#now we query the table:
SELECT * FROM cities;
id | name | lastvisit | enjoyed
----+--------+------------+---------
1 | Trento | 2003-11-01 | Y
2 | Milano | 2003-10-12 | N
3 | Povo | 2003-11-25 | Y
(3 rows)
#now we query the table, ordered by one colum:
SELECT * FROM cities order by lastvisit;
id | name | lastvisit | enjoyed
----+--------+------------+---------
2 | Milano | 2003-10-12 | N
1 | Trento | 2003-11-01 | Y
3 | Povo | 2003-11-25 | Y
(3 rows)
#now we query the table, but only two columns:
SELECT name,lastvisit FROM cities order by lastvisit;
name | lastvisit
--------+------------
Milano | 2003-10-12
Trento | 2003-11-01
Povo | 2003-11-25
(3 rows)
#leave the 'psql' terminal:
\q
R #it will appear something like this: R : Copyright 2003, The R Development Core Team Version 1.8.0 (2003-10-08) R is free software and comes with ABSOLUTELY NO WARRANTY. You are welcome to redistribute it under certain conditions. Type 'license()' or 'licence()' for distribution details. R is a collaborative project with many contributors. Type 'contributors()' for more information. Type 'demo()' for some demos, 'help()' for on-line help, or 'help.start()' for a HTML browser interface to help. Type 'q()' to quit R. >
plot(rnorm(123)) q() Save workspace image? [y/n/c]: n
Since you are able to start and leave R, we can try a sample session:
R
#get help in terminal window:
?help
#use q key to leave the help, cursor and space to browse the text.
#open help in browser:
help.start()
1+1
# [1] 2
#NOTE: if you get lost while entering a command, CTRL-C is your friend to
#get back to the prompt.
x <- 1+1
x
# [1] 2
#make a sequence of 12 numbers:
seq(1,12)
#assign result to a variable:
x <- seq(1,12)
#print variable contents:
x
# [1] 1 2 3 4 5 6 7 8 9 10 11 12
#what's this variable:
str(x)
#simple statistics of this variable:
summary(x)
#see command help (in browser, if still open):
?summary
#make some categorical data (called "factors" in R speech):
rgb <- c("red", "green", "blue")
rgb
rgb[2]
#have some more:
infra <- c("nir", "mir", "fir")
thermal <- c("tir", NA, NA)
#for convenience we can handle variable composites in a "data frame":
landsat5 <- data.frame(rgb, infra, thermal)
landsat5
str(landsat5)
#Now we look at some sample maps:
# Maunga Whau (Mt Eden) is one of about 50 volcanos in the Auckland
# volcanic field. This data set gives topographic information for
# Maunga Whau on a 10m by 10m grid.
data(volcano)
str(volcano)
#... is a matrix of numerics
plot(volcano)
#...looks strange!
#but a nice map:
filled.contour(volcano, color = terrain.colors, asp = 1)
title(main = "Maunga Whau volcano: filled contour map")
Up to now we used only base functions and data. There are many extensions, called R libraries which provide enormous functionality. One is the "GRASS" library to link R and GRASS:
#load GRASS library (GRASS/R interface): library(GRASS) #there are sample data built-in, e.g. river Maas soil pollution data: ?utm.maas #many commands in R provide an example. You can just run it: example(utm.maas) #get out of this R session: q() Save workspace image? [y/n/c]: n
Next task is to use the Rdbi interface to connect R and PostgreSQL:
R library(Rdbi) conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse") #list available tables: dbListTables(conn)
Reading data (the table we created above) from PostgreSQL into R:
res <- dbSendQuery(conn, "select * from cities") cities <- dbGetResult(res) str(cities) `data.frame': 3 obs. of 4 variables: $ id : int 1 1 1 $ name : chr "Trento" "Milano" "Povo" $ lastvisit: chr "11-01-2003" "10-12-2003" "11-25-2003" $ enjoyed : chr "Y" "N" "Y" #disconnect: dbDisconnect(conn) #print the downloaded data frame: # date: mm-dd-YYYY cities id name lastvisit enjoyed 1 1 Trento 11-01-2003 Y 2 2 Milano 10-12-2003 N 3 3 Povo 11-25-2003 Y #make the character date a real date: tmp <- strptime(cities$lastvisit, "%m-%d-%Y") tmp [1] "2003-11-01" "2003-10-12" "2003-11-25" # YYYY-mm-dd is new date format, so identical to PostgreSQL order #update in the data frame: #cities$lastvisit <- strptime(cities$lastvisit, "%m-%d-%Y")
So far, so nice. Next step is to learn writing data from R to PostgreSQL (e.g. you results): For illustration we use some meteo data, a time series of Nottingham average air temperatures at Nottingham Castle in degrees F for 20 years (note that the MASS library is part of the VR bundle):
library(MASS) data(nottem) #let's look at the data (note: Fahrenheit, not Celsius!)): str(nottem) summary(nottem) nottem plot(nottem) #connect to PostgreSQL: library(Rdbi) conn <- dbConnect(PgSQL(), host="localhost", dbname="grasscourse") #write data to PG database as new table "nottingtemp": dbWriteTable(conn, nottem, "nottingtemp") dbDisconnect(conn)
psql grasscourse \d SELECT * from nottingtemp; \q
grass57
#reset region:
g.region -dP
#within GRASS, start R:
R
[...]
library(GRASS)
#load the GRASS environment into R:
G <- gmeta()
# see current GRASS location settings:
summary.grassmeta(G)
# print list of available raster maps:
system("g.list rast")
#NOTE: You always need to specify "G" for the next commands!
#
#load the 'elevation.dem' raster map into R:
elev <- rast.get(G,"elevation.dem")
str(elev)
#look at the map:
plot(G, elev$elevation.dem)
plot(G, elev$elevation.dem, col=terrain.colors(20))
#draw elevation histogram:
library(MASS)
truehist(elev$elevation.dem, nbins=200)
#Is that normally distributed?
qqnorm(elev$elevation.dem)
qqline(elev$elevation.dem)
#-> no, deviates from qqline
#delete the object from R memory:
rm(elev)
#Now we import two raster maps into a data frame:
#cat: T (true) or F (false) enables to copy over category labels rather than
#numeric values:
spearfish <- rast.get(G, c("elevation.dem","vegcover"),cat=(c(F,T)))
str(spearfish)
summary(spearfish$elevation.dem)
summary(spearfish$vegcover.f)
#we can also plot the categorical 'vegcover' map, converting to numeric
#on-the-fly:
plot(G, unclass(spearfish$vegcover))
#get basic statistics on the distribution of vegetation related to
#elevation:
tapply(spearfish$elevation.dem, spearfish$vegcover, summary)
#make a boxplot:
boxplot(tapply(spearfish$elevation.dem, spearfish$vegcover, summary))
#vice-verse (warning: long calculation...):
tapply(spearfish$vegcover, spearfish$elevation.dem, summary)
grass57 #set resolution/region to Celsius temperature map from lecture 4: g.region rast=temp_celsius_12Jul2000 -p #within GRASS, start R: R library(GRASS) G <- gmeta() #NOTE: You always need to specify "G" for the next commands! # #load the 'elevation.dem' raster map into R: elev <- rast.get(G,"elevation.dem") str(elev) #load the 'temp_celsius_12Jul2000' raster map into R: celsiusmap <- rast.get(G,"temp_celsius_12Jul2000") str(celsiusmap) #Note that R changes underscores to dots #now we extract sample points. First get 200 coordinate pairs: str(G) #East: E.random <- sample(G$xseq,200) #North: N.random <- sample(G$yseq,200) #merge to a data frame: randompoints <- data.frame(E.random,N.random) str(randompoints) #nice, but complicated. Easier to fetch the cell indexes: randompoints <- sample(G$Ncells,200) #now we query the maps at the randomly selected cell numbers: elevsamples <- elev$elevation.dem[randompoints] hist(elevsamples) celsiussamples <- celsiusmap$temp.celsius.12Jul2000[randompoints] hist(celsiussamples) #...the sample data are ready. #look at the plot: plot(celsiussamples,elevsamples) #Linear regression analysis (Linear Model): lm(elevsamples ~ celsiussamples) summary(lm(elevsamples ~ celsiussamples)) #For a related discussion, compare Dalgaard 2002, p.97ff #plot in one graph data and regression line (note parameter order!): plot(celsiussamples,elevsamples) abline(lm(elevsamples ~ celsiussamples)) #The result is: temperature decreasing with height, but the linear #correlations is poor (as expected).
#simulate 4 time series for 4 stations, 500 days:
sim1 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim2 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim3 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
sim4 <- as.double(arima.sim(list(order=c(1,1,0),ar=0.7),n=499))
#station heights:
height1 <- seq(122,122,l=500)
height2 <- seq(332,332,l=500)
height3 <- seq(1543,1543,l=500)
height4 <- seq(534,534,l=500)
#generate 500 days:
ISOdate(2002, 7, 11) - ISOdate(2001, 2, 26)
days <- seq(ISOdate(2001, 2, 26),ISOdate(2002, 7, 10), by="d")
days
#we need a special table structure:
tmpsim <- print(c(sim1,sim2,sim3,sim4))
tmpdate <- print(c(days,days,days,days))
tmpheight <- print(c(height1,height2,height3,height4))
#put all in one data frame:
meteo <- data.frame(tmpheight,tmpdate,tmpsim)
names(meteo) <- c("height","date","var")
rm(tmpheight,tmpdate,tmpsim)
str(meteo)
library(lattice)
xyplot(var ~ date | height, data=meteo, type="l")