Nov 2008

Lucky to have (access to) a cluster? Here are some notes on how to do geospatial number crunching on it, a.k.a. HPC (High Performance Computing).

Preparing the disks
We decided to use the ext3 file system. An initial problem was formatting the RAID5 disk set, since its size exceeded the file system specifications. Setting the ext3 block size to 4k instead of 1k allowed us to format it.
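The arithmetic behind the block size fix can be sketched as follows. This assumes that ext3 addresses blocks with 32-bit block numbers, so the largest volume is roughly 2^32 times the block size; the function name is made up for illustration, and the practical limit also depends on the kernel and e2fsprogs versions in use.

```python
# Rough ext3 volume size limit, assuming 32-bit block numbers.
TIB = 1024 ** 4

def ext3_max_volume_bytes(block_size_bytes):
    """Upper bound on the file system size for a given block size."""
    return (2 ** 32) * block_size_bytes

for block_size in (1024, 2048, 4096):
    limit_tib = ext3_max_volume_bytes(block_size) / TIB
    print(f"block size {block_size:>4} bytes -> about {limit_tib:.0f} TiB")
```

Under this assumption, going from 1k to 4k blocks raises the ceiling by a factor of four, which is why a disk set that was too large for 1k blocks could still be formatted with 4k blocks.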

Storage: a home for GIS data
The disks are available via NFS to all nodes (blades in our case). All raw/original data sets and the GRASS database are sitting in an NFS exported directory which I even link on my laptop to easily add/access/modify stuff.

Front-end machine and blades configuration
The cluster is currently a 56-CPU blade system; we’ll expand to 128 CPUs later this year (16 blades with 2 processors of 4 cores each and 16GB RAM per blade). Additionally, we have a front-end machine to run the job manager and to link in further disks, all via NFS.
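For planning how many jobs fit on a blade, the planned configuration works out as in this small sketch. It uses only the figures quoted above; the variable names are my own.

```python
# Planned cluster size, using the figures quoted in the text.
blades = 16
processors_per_blade = 2
cores_per_processor = 4
ram_gb_per_blade = 16

cores_per_blade = processors_per_blade * cores_per_processor
total_cores = blades * cores_per_blade
ram_gb_per_core = ram_gb_per_blade / cores_per_blade

print(f"{total_cores} cores in total, {ram_gb_per_core:.0f} GB RAM per core")
```

With one job per core, each job would have about 2 GB of RAM available, which is worth keeping in mind for large raster computations.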
The blades are configured diskless, i.e., once started, they receive their operating system from the front-end machine over the network (10 Gb/s Ethernet). This way, a single directory on the front end contains all software, which is then propagated to all blades. Very convenient. We use Scientific Linux (the LiveDVD copied onto the disk; a special directory stores your modifications, which are merged in on the fly when the blades boot, a pretty cool concept). The job software is (Sun) Grid Engine, also free software. I have described job control with GRASS here:

https://grass.osgeo.org/wiki/Parallel_GRASS_jobs
-> Grid Engine
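
As a rough sketch of what such a job can look like, the following Python snippet generates one Grid Engine job script per map. It assumes the GRASS 6 batch mechanism (the GRASS_BATCH_JOB environment variable and the grass64 start script). All paths, the job name prefix, the worker script, and the MAP_TO_PROCESS variable are hypothetical placeholders; the wiki page above remains the authoritative description.

```python
# Sketch: generate one Grid Engine job script per input map.
# All paths and names below are hypothetical placeholders.

JOB_TEMPLATE = """#!/bin/sh
#$ -S /bin/sh
#$ -cwd
#$ -N {job_name}
# GRASS 6 runs the script named in GRASS_BATCH_JOB and then exits
export GRASS_BATCH_JOB={worker_script}
# Hypothetical variable that the worker script reads to find its map
export MAP_TO_PROCESS={map_name}
grass64 -text {mapset_path}
"""

def make_job_script(map_name, worker_script, mapset_path):
    """Return the text of a job script for a single raster map."""
    return JOB_TEMPLATE.format(
        job_name="grass_" + map_name,
        worker_script=worker_script,
        map_name=map_name,
        mapset_path=mapset_path,
    )

if __name__ == "__main__":
    for name in ("modis_001", "modis_002"):
        script = make_job_script(
            name,
            worker_script="/nfs/jobs/process_map.sh",
            # One mapset per job, so that parallel sessions do not collide
            mapset_path="/nfs/grassdata/province/" + name,
        )
        print(script)
```

Each generated script would be saved to a file and handed to qsub. The sketch gives every job its own mapset, which must already exist, and the results would be copied into the target mapset afterwards.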

GRASS: Avoiding replicated import of large data files through virtual linking
New in GRASS 6.4 is that you can register a geodata file on the fly with r.external. Altogether I have 1.4TB of new GIS data from our province, and naturally I didn’t want to buy a new disk array just for my provincial GRASS location! Here r.external comes in handy, reducing the “import” to a few bytes. As expected, it leverages GDAL to get data into GRASS, so the overhead is minimal.
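A minimal sketch of how the linking could be scripted for many files: the snippet below only builds the r.external calls (with the module's input= and output= parameters) and prints them. The file paths and the rule for deriving map names are assumptions for illustration.

```python
# Sketch: build r.external commands that link raster files into GRASS
# instead of importing them. File names are hypothetical.
import os
import re

def map_name_from_file(path):
    """Derive a GRASS-friendly map name from a file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Replace characters that are awkward in map names with underscores
    return re.sub(r"[^A-Za-z0-9_]", "_", stem)

def r_external_command(path):
    """Return the r.external call as an argument list."""
    return ["r.external", "input=" + path, "output=" + map_name_from_file(path)]

files = ["/nfs/raw/ortho-2006.tif", "/nfs/raw/dtm.2m.tif"]
for f in files:
    print(" ".join(r_external_command(f)))
```

Inside a running GRASS session, each argument list could be passed to subprocess.call() to actually register the file.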

Power consumption
Power consumption is measured, too: the entire system consumes around 2000W (each blade less than 200W), so it’s going in the direction of “green” computing (there is no such thing!). If we had a solar panel at least…
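To put the 2000W into perspective, here is the energy use it implies, assuming (as a simplification) that the load is constant around the clock:

```python
# Energy use implied by the measured figure, assuming a constant load.
system_watts = 2000
hours_per_day = 24
days_per_year = 365

kwh_per_day = system_watts * hours_per_day / 1000
kwh_per_year = kwh_per_day * days_per_year

print(f"{kwh_per_day:.0f} kWh per day, {kwh_per_year:.0f} kWh per year")
```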

Outcome
All in all a very nice solution. As a stress test, I removed all internal switches and shut down the blades while processing 8000 MODIS satellite maps. Everything survived, and the Grid Engine job manager collected the crashed jobs and restarted them without complaining. All resulting maps are collected in the target GRASS mapset and could even be exported to common GIS formats, if needed.
If you want to run Web Processing Services (e.g., pyWPS), you can likewise send each session to a node, giving you enormous possibilities for your customers.

Edit 2014:

“Big data” challenges on a cluster – limits and our solutions to overcome them:

  • 2008: internal 10Gb network connection way too slow
    • Solution: TCP jumbo frames enabled (MTU > 8000) to speed up the internal NFS transfer
  • 2009: hitting an ext3 filesystem limitation (no more than 32k subdirectories are allowed, but we had more entries in cell_misc/ since each GRASS GIS raster map consists of multiple files)
    • Solution: adopting the XFS filesystem [yikes! … everything had to be reformatted, i.e. some terabytes had to be “parked” temporarily]
  • 2012: Ran out of free inodes on XFS
    • Solution: Update XFS version [err, reformat everything again]
  • 2013: I/O saturation in NFS connection between chassis and blades
    • Solution: reduction to one job per blade (queue management), 21 blades * 2.5 billion input pixels + 415 million output pixels
  • 2014: GlusterFS saturation
    • Solution: Buy and use a new 48-port switch to implement 8-channel trunking (= 8 Gb/s)
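
Why jumbo frames helped with the NFS transfer (the 2008 item above) can be sketched with a bit of arithmetic. The snippet assumes TCP over IPv4 without header options and standard Ethernet framing; an MTU of 9000 is used as a typical jumbo value, which is my assumption, since the note above only says MTU > 8000.

```python
# Sketch: effect of the MTU on frame count and wire efficiency for TCP
# over Ethernet. Header sizes assume IPv4 and TCP without options.
import math

IP_TCP_HEADERS = 20 + 20              # bytes of each frame's payload
ETHERNET_OVERHEAD = 14 + 4 + 8 + 12   # header, FCS, preamble, inter-frame gap

def frames_needed(data_bytes, mtu):
    """Number of frames needed to carry data_bytes of application data."""
    payload_per_frame = mtu - IP_TCP_HEADERS
    return math.ceil(data_bytes / payload_per_frame)

def wire_efficiency(mtu):
    """Fraction of the bytes on the wire that is application data."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)

one_gib = 1024 ** 3
for mtu in (1500, 9000):
    print(f"MTU {mtu}: {frames_needed(one_gib, mtu)} frames per GiB, "
          f"{wire_efficiency(mtu):.1%} of the wire carries data")
```

The gain in wire efficiency is modest; the larger effect is that far fewer frames have to be handled per transferred gigabyte, which reduces the per-packet work on both ends.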

Just to update you on the GRASS Web statistics development, here are the grass.osgeo.org statistics (remember, we have MANY mirror sites):

Month      Unique visitors   Number of visits   Pages     Hits      Bandwidth
Jan 2008   39223             74088              291166    715946    101.23 GB
Feb 2008   38984             74043              218314    623770    107.09 GB
Mar 2008   40674             73389              223666    621816    107.04 GB
Apr 2008   5490              15702              135134    403726    220.87 GB
May 2008   20613             104556             912263    2242942   1442.31 GB

(this includes of course search engine traffic)

It appears that many visitors came back in May to download the long-awaited GRASS 6.3.0 release of 23 Apr 2008.

Some outstanding hits for May (views, only grass.osgeo.org):
10095 /grass63/binary/mswindows/native/
3271 /grass63/binary/mswindows/native/WinGRASS-6.3.0-Setup.exe

This points out the obvious need for a portable GIS, in this case one that also runs on MS-Windows, which GRASS 6.3.0 now does! Fetch native winGRASS with installer or GRASS for MacOSX or GRASS for Linux or …

GRASS GIS releases version 6.3.0
23 April 2008
https://grass.osgeo.org

GRASS 6.3.0 is a “technology preview” release, the first beta on the path to GRASS 6.4-stable, and also marks the start of work on GRASS 7. As such, GRASS 6.3.0 is not intended to be a stable release with ongoing support, but after five months of quality-assurance review users can be confident in using this version for their day-to-day work; indeed, due to the open development model, many already do.

This release brings hundreds of new module features, supported data formats, and language translations, as well as a number of exciting enhancements to the GIS. A prototype of the new wxPython user interface debuts, and for the first time since its inception with a port from the VAX 11/780 in 1983, GRASS will run on a non-UNIX based platform: MS-Windows. This is currently still in an experimental state and we hope that widespread testing of 6.3.0 will mean the 6.4 release of WinGRASS will be fully functional and robust. Existing users will be happy to know that these new features do not disrupt the base GIS, which remains as solid as ever and fully backwards compatible with earlier GRASS 6.0 and GRASS 6.2 releases.

Several infrastructure changes accompany this release with the project becoming a founding member of the Open Source Geospatial Foundation (OSGeo). This includes a new home for the website, the Wiki help system, source code repository, community add-on module repository, integrated bug tracking system, and formal membership for the project in a non-profit legal entity. We hope that these changes will guarantee that the GRASS community will be well supported and vibrant well into the future.

The Geographic Resources Analysis Support System (GRASS) is a Geographic Information System (GIS) used for spatial modeling, visualization of both raster and vector data, geospatial data management and analysis, processing of satellite and aerial imagery, and production of sophisticated presentation graphics and hardcopy maps. GRASS combines powerful raster, vector, and geospatial processing engines into a single integrated software package.

The GRASS GIS project is developed under the terms of the GNU General Public License (the GPL) by volunteers the world over. GRASS differs from many other GIS software packages used in the professional world in that it is developed and distributed by users for users, mostly on a volunteer basis, in the open, and is given away for free. Emphasis is placed on interoperability and unlimited access to data as well as on software flexibility and evolution rate. The source code is freely available allowing for immediate customization, examination of the underlying algorithms, addition of new features, and fast bug fixing.

GRASS is currently used around the world in academic and commercial settings as well as by many governmental agencies and environmental consulting companies.

Software download at https://grass.osgeo.org/download/ and numerous mirror sites.

Full story at https://grass.osgeo.org/announces/announce_grass630.html

A fifth release candidate of GRASS 6.3.0 is now available:

Source code download:
https://grass.osgeo.org/grass63/source/grass-6.3.0RC5.tar.gz

An initial announcement has been drafted at
https://trac.osgeo.org/grass/wiki/Release/6.3.0RC5-News

Key fixes include improved portability for MS-Windows (native support), several module fixes, and especially the introduction of a new, Python based portable graphical user interface which includes a new vector digitizer.

About GRASS GIS:
Commonly referred to as GRASS, this is a Geographic Information System (GIS) used for geospatial data management and analysis, image processing, graphics/maps production, spatial modeling, and visualization. GRASS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. GRASS is an official project of the Open Source Geospatial Foundation.

Web site: https://grass.osgeo.org/

27 November 2007
https://grass.osgeo.org

The development team is happy to announce that a new bugfix version of GRASS GIS has been released today. This release fixes a number of bugs discovered in the 6.2.2 source code. It is primarily for stability purposes and adds minimal new features. Besides bug fixes it also includes a number of new message translations and updates for the help pages and projection database.

Highlights include further maturation of the GRASS 6 GUI, vector, and database code. Some improvements have been backported from the GRASS 6.3 development branch where new development continues at a strong pace of approximately one code commit every hour, including major work on an all new cross-platform wxPython GUI and a native MS Windows port (from 6.3.0 onwards).

The Geographic Resources Analysis Support System, commonly referred to as GRASS, is a Geographic Information System (GIS) combining powerful raster, vector, remote sensing and geospatial processing engines into a single integrated software suite. GRASS includes tools for spatial modeling, visualization of raster and vector data, management and analysis of geospatial data, and the processing of satellite and aerial imagery. It also provides the capability to produce sophisticated 4D presentation graphics and hardcopy maps.

GRASS is currently used around the world in academic and commercial settings as well as by many governmental agencies and environmental consulting companies. It runs on a variety of popular hardware platforms and is Free open-source software released under the terms of the GNU General Public License.

GRASS is a proposed founding project of the new Open Source Geospatial Foundation. In support of the movement towards consolidation in the open source geospatial software world, GRASS is tightly integrated with the latest GDAL/OGR libraries. This enables access to an extensive range of raster and vector formats, including OGC-conformal Simple Features. GRASS also makes use of the highly regarded PROJ.4 software library with support for most known map projections and the easy definition of new and rare map projections via custom parameterization. Strong links are maintained with the QuantumGIS and R Statistics projects with integrated GRASS toolkits available for both.

Software download at https://grass.osgeo.org/download/ and numerous mirror sites.

Full story at https://grass.osgeo.org/announces/announce_grass623.html

The new edition of Open Source GIS: A GRASS GIS Approach is now available! With this third edition, we enter the new era of GRASS 6, the first release that includes substantial new code developed by the International GRASS Development Team. The dramatic growth in open source software libraries has made GRASS 6 development more efficient, and has enhanced GRASS interoperability with a wide range of open source and proprietary geospatial tools. The book is based on GRASS 6.3.

Thoroughly updated with material related to GRASS 6, the third edition includes new sections on attribute database management and SQL support, vector network analysis, lidar data processing and new graphical user interfaces. All chapters are updated with numerous practical examples using the first release of a comprehensive, state-of-the-art geospatial data set. This new OSGeo Educational data set along with additional material can be downloaded from https://www.grassbook.org/

A MS-Windows native binary of the current 6.3.0 Release Candidate 1 is now available at:
https://geog-pc40.ulb.ac.be/grass/wingrass/wingrass63RC1.zip
Read the README for installation, known issues and other information. This version no longer requires an installation of Cygwin.

The QGIS team announces the release of QGIS 0.9.0 (the package for MS-Windows contains GRASS 6.3.0RC1). The QGIS team also offers packages for Linux and MacOSX.

Also going mad trying to connect your vector points to the graph? Errors like “node is unreachable”? Same thing here, until last week. In GRASS 6.3 (still CVS), the v.net tool was extended to easily connect nodes to a graph. Just specify the vector network map and the nodes map, along with a distance threshold so that only points within a reasonable distance are connected.
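
A minimal sketch of such a call, built as an argument list in Python: the parameter names (input, points, output, operation=connect, thresh) follow the GRASS 6 v.net module, while the map names and the threshold value are hypothetical.

```python
# Sketch: build the v.net call that connects points to a network.
# Parameter names follow GRASS 6; map names and threshold are hypothetical.

def v_net_connect_command(network_map, points_map, output_map, thresh):
    """Return the v.net call as an argument list."""
    return [
        "v.net",
        "input=" + network_map,
        "points=" + points_map,
        "output=" + output_map,
        "operation=connect",
        # Maximum distance (in map units) between a point and the network
        "thresh=" + str(thresh),
    ]

cmd = v_net_connect_command("streets", "hospitals", "streets_net", 500)
print(" ".join(cmd))
```

Points farther from the network than the threshold are left unconnected, so the value should be chosen with the map units and the data quality in mind.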

Screenshot examples for GRASS vector networking can be seen here: travelling salesman analysis, Steiner Trees, and Iso-distances. Enjoy.

The GRASS-News editors and OSGeo Promotion and Visibility Committee announce the first combined GRASS-News / OSGeo-News volume. You can find the full pdf (5.3 MB) as well as PDFs of individual articles on the GRASS webpage:
https://grass.itc.it/newsletter/index.php
or directly via:
https://www.osgeo.org/content/news/news_archive/GRASS_OSGeo_News_vol4.pdf

A first edition of OSGeo-News will be published in 2007 with interesting articles covering various topics of Open Source projects. Please visit https://www.osgeo.org in the next couple of weeks to find more detailed information on how to submit articles.