TripleGeo is a utility developed by the Institute for the Management of Information Systems at Athena Research Center under the EU/FP7 project GeoKnow: Making the Web an Exploratory for Geospatial Knowledge. This general-purpose, open-source tool can be used for integrating features from geospatial databases into RDF triples.
TripleGeo is based on the open-source utility geometry2rdf. It is written in Java and is still under development; more enhancements will be included in future releases. However, all supported features have been tested and work smoothly on both MS Windows and Linux platforms.
The web site for TripleGeo provides more details about the project, its architecture, usage tips, and foreseen extensions.
1.a Build from source
- Build (with ant):
- Package as a jar (with ant):
If build finishes successfully, generated JARs will be placed under
1.b Use prepackaged JARs
In order to use TripleGeo for extracting triples from a spatial dataset, follow these steps (described for a Windows platform; they are similar on Linux):
- Download the current software bundle from https://github.com/GeoKnow/TripleGeo/archive/master.zip
- Extract the downloaded .zip file into a separate folder, e.g.,
- Open a terminal window (in DOS or in Linux) and navigate to the directory where TripleGeo has been extracted, e.g., cd c:\temp\TripleGeo-master. This directory must be the one that holds the LICENSE file. For convenience, this is where you can place your configuration file (e.g., options.conf), although you can specify another path for your configuration if you like.
- Normally, under this same folder there must be a lib/ subdirectory with the required libraries. Make sure that the actual TripleGeo.jar is under the bin/ subdirectory.
- Verify that Java JRE (or SDK) version 1.7 or later is installed. The currently installed version of Java can be checked by running java -version from the command line.
- Next, specify all properties in the required configuration file, e.g., options.conf. You must specify correct paths to files (i.e., in parameters inputFile, outputFile, and tmpDir), which are RELATIVE to the executable.
- If triples are to be extracted from ESRI shapefiles, give the following command (in one line):
java -cp lib/*;bin/TripleGeo.jar eu.geoknow.athenarc.triplegeo.ShpToRdf options.conf
Make sure that the specified paths to the .jar files are correct. If you run this command from a directory other than the one containing the LICENSE file, as specified in step (3), you must adjust the paths to the libraries and/or the configuration file accordingly.
- While the conversion is running, it periodically issues notifications about its progress. Note that for large datasets (e.g., hundreds of thousands of records), conversion may take several minutes. As soon as processing is finished and all triples are written into a file, the user is notified about the total number of extracted triples and the overall execution time.
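The prerequisites from the steps above can be checked before launching a conversion. The following is a minimal sketch for a POSIX shell (on Windows, e.g., under Cygwin or Git Bash); it is not part of TripleGeo. The function names java_major, installed_java_version, check_layout, and preflight are hypothetical, and the directory layout (LICENSE, lib/, bin/TripleGeo.jar) is the one described in the steps.

```shell
#!/bin/sh
# Pre-flight check for a TripleGeo run (illustrative helpers, not shipped with TripleGeo).

# java_major VERSION: print the major Java version for a version string,
# handling both the legacy "1.7.0_80" scheme and the newer "11.0.2" scheme.
java_major() {
  v=${1#1.}              # drop a leading "1." (legacy numbering: 1.7 -> 7)
  echo "${v%%[._-]*}"    # keep what precedes the first ".", "_" or "-"
}

# installed_java_version: print the version string reported by "java -version"
# (which writes to stderr), or nothing if java is not available.
installed_java_version() {
  java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# check_layout DIR: succeed if DIR looks like an extracted TripleGeo bundle.
check_layout() {
  [ -f "$1/LICENSE" ] && [ -d "$1/lib" ] && [ -f "$1/bin/TripleGeo.jar" ]
}

# preflight [DIR]: run both checks and report the first problem found.
preflight() {
  dir=${1:-.}
  check_layout "$dir" || { echo "TripleGeo files not found under $dir" >&2; return 1; }
  ver=$(installed_java_version)
  [ -n "$ver" ] || { echo "Java not found on PATH" >&2; return 1; }
  [ "$(java_major "$ver")" -ge 7 ] || { echo "Java $ver is older than 1.7" >&2; return 1; }
  echo "OK: Java $ver, TripleGeo layout found in $dir"
}
```

After loading these functions, calling preflight from the directory that holds the LICENSE file (or preflight followed by that directory's path) reports the first failed check, if any.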
2. Usage and examples
The current distribution comes with a dummy configuration file, options.conf. This file contains indicative values for the most important properties when accessing data from ESRI shapefiles or a spatial DBMS. The brief, self-contained instructions can guide you through the extraction process.
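Since wrong paths in the configuration are an easy mistake to make, it can help to check them before starting a run. The sketch below assumes that options.conf consists of plain "key = value" lines with "#" comments (as in Java properties files) and reads the inputFile and tmpDir parameters; conf_value and check_conf_paths are hypothetical helper names, not part of TripleGeo. Relative paths are resolved against the current directory, so it should be run from the directory where the conversion will be launched.

```shell
#!/bin/sh
# conf_value FILE KEY: print the value of KEY from a "key = value" file.
# Commented-out lines are ignored, surrounding whitespace is trimmed, and
# only the first occurrence of KEY is reported.
conf_value() {
  sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
    | head -n 1 \
    | sed 's/[[:space:]]*$//'
}

# check_conf_paths FILE: verify that the input file and the temporary
# directory named in FILE exist; returns non-zero if either is missing.
check_conf_paths() {
  conf=$1
  input=$(conf_value "$conf" inputFile)
  tmp=$(conf_value "$conf" tmpDir)
  status=0
  [ -e "$input" ] || { echo "inputFile not found: $input" >&2; status=1; }
  [ -d "$tmp" ] || { echo "tmpDir is not a directory: $tmp" >&2; status=1; }
  return $status
}
```

For a shapefile conversion, check_conf_paths options.conf would be called just before the java command. When the input is a DBMS rather than a file, the inputFile check does not apply.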
Run the jar file from the command line in one of several alternative modes, depending on the input data source (of course, you should change the classpath separator to the one your OS understands, e.g., ":" in the case of *nix systems):
If triples are to be extracted from ESRI shapefiles, and assuming that binaries are bundled together in triplegeo.jar, give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.ShpToRdf options.conf
Alternatively, if triples will be extracted from a geospatially-enabled DBMS (e.g., Oracle Spatial), give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.wkt.RdbToRdf options.conf
Wait until the process finishes, and verify that the resulting output file meets your specifications.
The current distribution also offers transformations from other geographical formats. More specifically, TripleGeo can transform into RDF triples geometries available in GML (Geography Markup Language) and KML (Keyhole Markup Language). It can also handle GML data aligned to the EU INSPIRE Directive for seven Data Themes (Annex I). Assuming that binaries are bundled together in triplegeo.jar, you may transform such datasets as follows:
- If triples are to be extracted from a GML file, give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.GmlToRdf
- If triples are to be extracted from a KML file, give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.KmlToRdf
- If triples are to be extracted from an INSPIRE-aligned GML file, you must first configure the XSL stylesheet Inspire_main.xsl with specific parameters and then give a command like this:
java -cp "./lib/*;./build/jars/triplegeo.jar" eu.geoknow.athenarc.triplegeo.InspireToRdf
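The commands above use ";" as the classpath separator, which is what Windows expects; on Linux and other *nix systems the separator is ":". As a minimal sketch (the helper names cp_separator, triplegeo_classpath, and run_triplegeo are hypothetical, and the script is not part of TripleGeo), a small wrapper can pick the separator from the OS name reported by uname and then launch any of the converter classes listed above:

```shell
#!/bin/sh
# cp_separator OS_NAME: print the Java classpath separator for an OS name as
# reported by "uname -s" (Windows-like shells report CYGWIN*, MINGW* or MSYS*).
cp_separator() {
  case "$1" in
    CYGWIN*|MINGW*|MSYS*) echo ';' ;;
    *) echo ':' ;;
  esac
}

# triplegeo_classpath OS_NAME: build the classpath used in the commands above.
triplegeo_classpath() {
  echo "./lib/*$(cp_separator "$1")./build/jars/triplegeo.jar"
}

# run_triplegeo CLASS [ARGS...]: run a converter class from the
# eu.geoknow.athenarc.triplegeo package, passing any further arguments through.
run_triplegeo() {
  class=$1
  shift
  java -cp "$(triplegeo_classpath "$(uname -s)")" \
    "eu.geoknow.athenarc.triplegeo.$class" "$@"
}
```

With these functions loaded, run_triplegeo ShpToRdf options.conf or run_triplegeo wkt.RdbToRdf options.conf corresponds to the shapefile and DBMS commands given earlier, with the separator adjusted to the platform.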
Alternatively, the conversions can be run through ant targets.
In the case of a shapefile input:
ant run-on-shp -Dconfig=options.conf
In the case of a relational database:
ant run-on-rdb -Dconfig=options.conf
In the case of a GML input:
ant run-on-gml -Dinput=sample.gml -Doutput=sample.rdf
In the case of a KML input:
ant run-on-kml -Dinput=sample.kml -Doutput=sample.rdf
In the case of an INSPIRE-aligned XML input:
ant run-on-inspire -Dinput=sample.xml -Doutput=sample.rdf
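When many files need converting, the ant targets above can be driven from a loop. The following sketch assumes a directory of .gml files and derives each output name by swapping the extension for .rdf; rdf_name and convert_all_gml are hypothetical helpers, and setting the (also hypothetical) DRY_RUN variable prints the ant commands instead of running them. File paths are passed to ant exactly as given.

```shell
#!/bin/sh
# rdf_name FILE: print the output name for a .gml input, e.g. a.gml -> a.rdf.
rdf_name() {
  echo "${1%.gml}.rdf"
}

# convert_all_gml DIR: invoke the run-on-gml target once per .gml file in DIR.
# Stops at the first failing conversion.
convert_all_gml() {
  for f in "$1"/*.gml; do
    [ -e "$f" ] || continue    # nothing matched: the pattern stays unexpanded
    if [ -n "${DRY_RUN:-}" ]; then
      echo ant run-on-gml "-Dinput=$f" "-Doutput=$(rdf_name "$f")"
    else
      ant run-on-gml "-Dinput=$f" "-Doutput=$(rdf_name "$f")" || return 1
    fi
  done
}
```

Running DRY_RUN=1 convert_all_gml data first is a cheap way to review the generated commands before starting the actual conversions.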
Indicative configuration files for several cases are available to help you prepare your own.
The current version of the TripleGeo utility can access geometries from:
- ESRI shapefiles, a widely used file-based format for storing geospatial features.
- Geographical data stored in GML (Geography Markup Language) and KML (Keyhole Markup Language).
- INSPIRE-aligned datasets for seven Data Themes (Annex I) in GML format: Addresses, Administrative Units, Cadastral Parcels, Geographical Names, Hydrography, Protected Sites, and Transport Networks (Roads).
- Several geospatially-enabled DBMSs, including: Oracle Spatial, PostGIS, MySQL, and IBM DB2 with Spatial Extender.
Sample geographic datasets for testing are available in ESRI shapefile format.
In terms of output serializations, triples can be obtained in one of the following formats: RDF/XML (default), RDF/XML-ABBREV, N-TRIPLES, N3, TURTLE (TTL).
Concerning geospatial representations, triples can be exported according to:
- the GeoSPARQL standard for several geometric types (including points, linestrings, and polygons)
- the WGS84 RDF Geoposition vocabulary for point features
- the Virtuoso RDF vocabulary for point features.
Resulting triples are written into a local file, so that they can be readily imported into a triple store.
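Before importing a result into a triple store, a quick sanity check is to compare the number of triples in the output file with the total that TripleGeo reports at the end of a run. In the N-TRIPLES serialization each triple occupies exactly one line, so counting the non-blank, non-comment lines is enough; this shortcut does not apply to the other serializations. The function name count_ntriples is hypothetical.

```shell
#!/bin/sh
# count_ntriples FILE: count the triples in an N-Triples file, where every
# triple is written on its own line; blank lines and "#" comment lines are skipped.
count_ntriples() {
  grep -c -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}
```

For example, count_ntriples output.nt prints a single number that can be compared with the total shown by the converter, where output.nt stands for whatever outputFile was configured.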
The contents of this project are licensed under the GPL v3 License.