TripleGeo: An open-source tool for extracting geospatial features into RDF triples

TripleGeo is a utility developed by the Institute for the Management of Information Systems at Athena Research Center under the EU/FP7 project GeoKnow: Making the Web an Exploratory for Geospatial Knowledge. This general-purpose, open-source tool can be used to integrate features from geospatial databases into RDF triples.

TripleGeo is based on the open-source utility geometry2rdf. It is written in Java and is still under development; more enhancements will be included in future releases. All currently supported features have been tested on both MS Windows and Linux platforms.

The web site for TripleGeo provides more details about the project, its architecture, usage tips, and foreseen extensions.

Quick start

There are two ways to use TripleGeo: build it from source with Apache Ant, or use the prepackaged binaries (JARs) shipped with this code.

1.a Build from source

  • Build (with ant):
    mkdir build
    ant compile
  • Package as a jar (with ant):
    ant package
    If the build finishes successfully, the generated JARs are placed under build/jars.

1.b Use prepackaged JARs

To extract triples from a spatial dataset with TripleGeo, follow these steps (shown for Windows; the steps on Linux are similar):
  • Download the current software bundle from https://github.com/GeoKnow/TripleGeo/archive/master.zip
  • Extract the downloaded .zip file into a separate folder, e.g., c:\temp.
  • Open a terminal window (Command Prompt on Windows, or a shell on Linux) and navigate to the directory where TripleGeo has been extracted, e.g., cd c:\temp\TripleGeo-master. This must be the directory that holds the LICENSE file. For convenience, you can place your configuration file (e.g., options.conf) here, although you may also specify another path for it.
  • This folder should also contain a lib/ subdirectory with the required libraries. Make sure that TripleGeo.jar itself is under the bin/ subdirectory.
  • Verify that a Java JRE (or JDK) version 1.7 or later is installed. You can check the installed version by running java -version from the command line.
  • Next, specify all properties in the configuration file, e.g., options.conf. Paths to files (i.e., parameters inputFile, outputFile, and tmpDir) must be correct and are RELATIVE to the executable.
  • In case that triples will be extracted from ESRI shapefiles, give the following command (in one line):
    java -cp lib/*;bin/TripleGeo.jar eu.geoknow.athenarc.triplegeo.ShpToRdf options.conf
    Make sure that the paths to the .jar files are correct. If you run this command from a directory other than the one containing the LICENSE file (see the terminal step above), you must adjust the paths to the libraries and/or the configuration file accordingly.
  • While the conversion is running, it periodically reports its progress. Note that for large datasets (e.g., hundreds of thousands of records), conversion may take several minutes. Once processing finishes and all triples are written to a file, the user is notified of the total number of extracted triples and the overall execution time.
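
Before launching a conversion, it can help to check that the configuration names every path the converter needs. The Java sketch below is a hypothetical pre-flight check, not part of TripleGeo: it assumes that options.conf consists of key=value lines that java.util.Properties can parse, and it only knows about the three path parameters named above (inputFile, outputFile, tmpDir). Relative paths are resolved against a base directory of your choosing, which, following the steps above, would be the folder holding the LICENSE file.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Hypothetical pre-flight check for a TripleGeo configuration file (not part of TripleGeo).
 * ASSUMPTION: options.conf uses key=value lines readable by java.util.Properties.
 * Properties treats the backslash as an escape character, so Windows paths in such a
 * file would need forward slashes or doubled backslashes to survive parsing.
 */
final class ConfigCheck {
    // The path parameters named in the README.
    static final String[] PATH_KEYS = {"inputFile", "outputFile", "tmpDir"};

    // Parses configuration text into a Properties object.
    static Properties parse(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    // Returns the path parameters that are absent or blank, in PATH_KEYS order.
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : PATH_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    // Resolves a relative path parameter against the base directory
    // (e.g., the folder that holds the LICENSE file) and drops "./" segments.
    static Path resolve(Path baseDir, Properties props, String key) {
        return baseDir.resolve(props.getProperty(key).trim()).normalize();
    }
}
```

In practice you would pass a FileReader over options.conf to parse, stop if missingKeys is non-empty, and check the resolved inputFile with java.nio.file.Files.exists before starting the converter.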

2. Usage and examples

The current distribution comes with a sample configuration file, options.conf. This file contains indicative values for the most important properties when accessing data from ESRI shapefiles or a spatial DBMS, along with brief instructions that guide you through the extraction process.

Run the JAR file from the command line in one of several modes, depending on the input data source. The commands below use ";" as the classpath separator, as on Windows; on Unix-like systems, replace it with ":".

To extract triples from ESRI shapefiles, assuming that the binaries are bundled together in triplegeo.jar, give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.ShpToRdf options.conf

Alternatively, to extract triples from a geospatially-enabled DBMS (e.g., Oracle Spatial), give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.wkt.RdbToRdf options.conf
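
Because the classpath separator differs between platforms, a small wrapper can assemble the command portably. The sketch below is hypothetical (the class and method names are not part of TripleGeo); it reuses the classpath entries and converter class names from the commands above and takes the separator from java.io.File.pathSeparator. It launches the converter through ProcessBuilder, so no shell is involved: "./lib/*" reaches the java launcher unexpanded, and the launcher itself expands that classpath wildcard to every JAR in lib/.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher for a TripleGeo converter (not part of TripleGeo). */
final class TripleGeoLauncher {
    // Classpath entries from the README; "./lib/*" is a classpath wildcard that the
    // java launcher expands to every JAR in lib/.
    static String classpath(String separator) {
        return "./lib/*" + separator + "./build/jars/triplegeo.jar";
    }

    // Builds the argument list for a converter class such as
    // eu.geoknow.athenarc.triplegeo.ShpToRdf.
    static List<String> command(String mainClass, String configFile, String separator) {
        return Arrays.asList("java", "-cp", classpath(separator), mainClass, configFile);
    }

    // Runs the converter from the current directory with the platform separator
    // (';' on Windows, ':' elsewhere), shares this console, and returns the exit code.
    static int run(String mainClass, String configFile) throws IOException, InterruptedException {
        List<String> cmd = command(mainClass, configFile, File.pathSeparator);
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, TripleGeoLauncher.run("eu.geoknow.athenarc.triplegeo.wkt.RdbToRdf", "options.conf") would start the DBMS conversion when invoked from the directory that holds the LICENSE file.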

Wait until the process finishes, and verify that the resulting output file meets your specifications.

The current distribution also supports other geographical formats, including GML datasets aligned to the EU INSPIRE Directive. More specifically, TripleGeo can transform geometries available in GML (Geography Markup Language) and KML (Keyhole Markup Language) into RDF triples. It can also handle INSPIRE-aligned GML data for seven Data Themes (Annex I). Assuming that the binaries are bundled together in triplegeo.jar, you may transform such datasets as follows:
  • To extract triples from a GML file, give a command like this:
    java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.GmlToRdf
  • To extract triples from a KML file, give a command like this:
    java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.KmlToRdf
  • To extract triples from an INSPIRE-aligned GML file, first configure the XSL stylesheet Inspire_main.xsl with the required parameters, then give a command like this:
    java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.InspireToRdf
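
If you script conversions over many files, the converter class can be chosen from the input. The helper below is a hypothetical sketch (Java 8 or later, for Optional), not part of TripleGeo: it keys on file extensions (.shp, .kml, .gml, .xml), which is an assumption of this sketch rather than a TripleGeo rule. Since an INSPIRE-aligned GML file cannot be recognized by its name, the caller has to flag it explicitly; InspireToRdf still requires Inspire_main.xsl to be configured first.

```java
import java.util.Locale;
import java.util.Optional;

/**
 * Hypothetical helper that picks a TripleGeo converter class for an input file.
 * Class names come from the README; the extension-based mapping is an assumption.
 */
final class ConverterChooser {
    private static final String PKG = "eu.geoknow.athenarc.triplegeo.";

    static Optional<String> converterFor(String fileName, boolean inspireAligned) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".shp")) {
            return Optional.of(PKG + "ShpToRdf");
        }
        if (name.endsWith(".kml")) {
            return Optional.of(PKG + "KmlToRdf");
        }
        if (name.endsWith(".gml") || name.endsWith(".xml")) {
            // Plain GML and INSPIRE-aligned GML share extensions; the flag decides.
            return Optional.of(PKG + (inspireAligned ? "InspireToRdf" : "GmlToRdf"));
        }
        // Unknown extension; DBMS input (RdbToRdf) has no input file at all.
        return Optional.empty();
    }
}
```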

An alternative way to run the TripleGeo utility (the JAR file) is through ant targets:
  • For a shapefile input:
    ant run-on-shp -Dconfig=options.conf
  • For a relational database:
    ant run-on-rdb -Dconfig=options.conf
  • For a GML input:
    ant run-on-gml -Dinput=sample.gml -Doutput=sample.rdf
  • For a KML input:
    ant run-on-kml -Dinput=sample.kml -Doutput=sample.rdf
  • For an INSPIRE-aligned XML input:
    ant run-on-inspire -Dinput=sample.xml -Doutput=sample.rdf
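
The ant targets follow two argument conventions: shapefile and DBMS runs read a configuration file via -Dconfig, while GML, KML and INSPIRE runs take -Dinput and -Doutput. The hypothetical helper below (not part of TripleGeo) encodes that split so that a script cannot pass the wrong property to a target; the target names are copied from the list above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical builder for the TripleGeo ant invocations listed in the README. */
final class AntInvocation {
    enum Source { SHP, RDB, GML, KML, INSPIRE }

    // Target names follow the pattern run-on-<source>, e.g. run-on-shp.
    static String target(Source source) {
        return "run-on-" + source.name().toLowerCase(Locale.ROOT);
    }

    // Shapefile and relational-database runs are driven by a configuration file.
    static List<String> forConfig(Source source, String configFile) {
        if (source != Source.SHP && source != Source.RDB) {
            throw new IllegalArgumentException(source + " expects -Dinput/-Doutput, not -Dconfig");
        }
        return Arrays.asList("ant", target(source), "-Dconfig=" + configFile);
    }

    // GML, KML and INSPIRE runs take explicit input and output files.
    static List<String> forFiles(Source source, String input, String output) {
        if (source == Source.SHP || source == Source.RDB) {
            throw new IllegalArgumentException(source + " expects -Dconfig, not -Dinput/-Doutput");
        }
        return Arrays.asList("ant", target(source), "-Dinput=" + input, "-Doutput=" + output);
    }
}
```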

Indicative configuration files for several cases are also available to help you prepare your own.

Input

The current version of the TripleGeo utility can access geometries from:

  • ESRI shapefiles, a widely used file-based format for storing geospatial features.
  • Geographical data stored in GML (Geography Markup Language) and KML (Keyhole Markup Language).
  • INSPIRE-aligned datasets for seven Data Themes (Annex I) in GML format: Addresses, Administrative Units, Cadastral Parcels, Geographical Names, Hydrography, Protected Sites, and Transport Networks (Roads).
  • Several geospatially-enabled DBMSs, including Oracle Spatial, PostGIS, MySQL, and IBM DB2 with Spatial Extender.

Sample geographic datasets for testing are available in ESRI shapefile format.

Output

In terms of output serializations, triples can be obtained in one of the following formats: RDF/XML (default), RDF/XML-ABBREV, N-TRIPLES, N3, TURTLE (TTL).
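
When naming the output file, it helps to match its extension to the chosen serialization. The helper below is a hypothetical sketch, not part of TripleGeo: the serialization names are copied from the list above, while the extensions (.rdf, .nt, .n3, .ttl) are common conventions for these syntaxes rather than anything TripleGeo enforces. A missing value falls back to RDF/XML, mirroring the documented default.

```java
import java.util.Locale;

/** Hypothetical mapping from a TripleGeo output serialization to a file extension. */
final class OutputFormats {
    static String extensionFor(String serialization) {
        // RDF/XML is the documented default when no serialization is given.
        if (serialization == null || serialization.trim().isEmpty()) {
            return ".rdf";
        }
        switch (serialization.trim().toUpperCase(Locale.ROOT)) {
            case "RDF/XML":
            case "RDF/XML-ABBREV":
                return ".rdf";
            case "N-TRIPLES":
                return ".nt";
            case "N3":
                return ".n3";
            case "TURTLE":
            case "TTL":
                return ".ttl";
            default:
                throw new IllegalArgumentException("Unsupported serialization: " + serialization);
        }
    }
}
```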

Concerning geospatial representations, triples can be exported according to:

Resulting triples are written into a local file, so that they can be readily imported into a triple store.

License

The contents of this project are licensed under the GPL v3 License.