Fedora 4 Import/Export Utility


Work in progress

Open issues can be found here.

Requirements:

Additional requirements for building:

Building

mvn clean install

Modes of execution

The standalone import/export utility can be run in either of two ways:

  1. By passing in individual command-line arguments to the executable jar file
  2. By passing in a single configuration file that contains the standard command-line arguments

The first time you run the utility with command-line arguments, it writes those arguments to a configuration file and displays the file's location on the command line:

$ java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest --dir /tmp/test --binaries
INFO 15:15:10.048 (ArgParser) Saved configuration to: /tmp/importexport.config
INFO 15:15:10.091 (Exporter) Running exporter...

Running the import/export utility with command-line arguments

$ java -jar target/fcrepo-import-export-<version>.jar --mode [import|export] [options]

To change the import-export logging level (the default is INFO), set the fcrepo.log.importexport system property when running the command. The available logging levels are TRACE, DEBUG, INFO, WARN, and ERROR. For example:

$ java -Dfcrepo.log.importexport=WARN -jar target/fcrepo-import-export-<version>.jar --mode [import|export] [options]

To control the format of the exported RDF, specify the RDF language (serialization format) with the --rdfLang option, e.g.:

--rdfLang application/ld+json

The list of RDF languages supported:

For example, to export all of the resources from a Fedora repository at http://localhost:8080/rest/, and put binaries and RDF in /tmp/test:

java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest/ --dir /tmp/test --binaries

To then load that data into an empty Fedora repository at the same URL, run the same command, but using --mode import instead of --mode export.

To enable the audit log, use the -a or --auditLog option:

java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest/ --dir /tmp/test --binaries --auditLog

You can also set the audit log directory with -Dfcrepo.log.importexport.logdir=/some/directory.

To export using a predicate other than ldp:contains, use the -p or --predicates option with a comma-separated list of predicates:

java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest/ --dir /tmp/test --binaries --predicates http://pcdm.org/models#hasMember,http://www.w3.org/ns/ldp#contains
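The predicates option takes a single comma-separated string of full URIs. As a minimal sketch, a small Python helper can expand prefixed names into the URIs used in the example command and join them. The names `predicates_arg` and `expand` and the prefix table are hypothetical illustrations, not part of the utility.

```python
# Hypothetical helper that builds the comma-separated value for -p/--predicates.
# The prefix table is an illustrative assumption covering only the two
# namespaces used in the example command.
PREFIXES = {
    "ldp": "http://www.w3.org/ns/ldp#",
    "pcdm": "http://pcdm.org/models#",
}


def expand(name: str) -> str:
    """Expand 'prefix:local' to a full URI; pass full http(s) URIs through unchanged."""
    if name.startswith(("http://", "https://")):
        return name
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {name!r}")
    return PREFIXES[prefix] + local


def predicates_arg(names: list[str]) -> str:
    """Join the expanded predicate URIs with commas."""
    return ",".join(expand(n) for n in names)


if __name__ == "__main__":
    print(predicates_arg(["pcdm:hasMember", "ldp:contains"]))
```

For `["pcdm:hasMember", "ldp:contains"]` this produces the same value as the example command. Because commas separate the list, a predicate URI that itself contains a comma could not be passed this way.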

To map URIs when importing into a Fedora repository running at a different URI, use the -M or --map option to translate the URIs. For example, if you exported from http://localhost:8984/rest/dev/ and are importing into http://example.org:8080/fedora/rest/:

java -jar fcrepo-import-export.jar --mode import --resource http://example.org:8080/fedora/rest/ --dir /tmp/test --binaries --map http://localhost:8984/rest/dev/,http://example.org:8080/fedora/rest/
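Conceptually, --map pairs a source base URI with a destination base URI. The Python sketch below illustrates that prefix-substitution idea under that assumption; it is not the utility's own code, and the functions `parse_map_option` and `map_uri` are hypothetical.

```python
def parse_map_option(value: str) -> tuple[str, str]:
    """Split a 'source,destination' value as passed to -M/--map."""
    source, dest = value.split(",", 1)
    return source, dest


def map_uri(uri: str, source_base: str, dest_base: str) -> str:
    """Rewrite a URI under source_base so it sits under dest_base.

    URIs outside source_base (e.g. external vocabularies) are left unchanged.
    """
    if uri.startswith(source_base):
        return dest_base + uri[len(source_base):]
    return uri


if __name__ == "__main__":
    src, dst = parse_map_option(
        "http://localhost:8984/rest/dev/,http://example.org:8080/fedora/rest/"
    )
    print(map_uri("http://localhost:8984/rest/dev/collection/1", src, dst))
```

Both base URIs in the example end with a slash; keeping the trailing slashes consistent avoids producing paths with a missing or doubled separator.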

To retrieve inbound references (for example, when exporting a collection and you also want to export the members that link to it), use the -i or --inbound option. When enabled, resources connected to the exported resources by the specified predicates, in either direction, will be exported.

To retrieve external content binaries (binaries on other systems linked to with the message/external-body content type), use the -x or --external option. When enabled, the external binaries will be retrieved and included in the export. When disabled, they will not be retrieved, and only pointers to them will be exported.
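To illustrate how an external binary is recognized, here is a minimal Python sketch that parses a message/external-body Content-Type header with the standard email package. It assumes the remote location is carried in a URL parameter alongside access-type=URL; the header value in the usage line and the function name `external_url` are illustrative assumptions, not taken from the utility.

```python
from email.message import Message
from typing import Optional


def external_url(content_type: str) -> Optional[str]:
    """Return the remote URL if content_type describes an external binary, else None."""
    msg = Message()
    msg["Content-Type"] = content_type
    if msg.get_content_type() != "message/external-body":
        return None
    # get_param matches parameter names case-insensitively and strips quotes.
    access_type = msg.get_param("access-type") or ""
    if str(access_type).lower() != "url":
        return None  # assumed: only URL-based access is handled here
    return msg.get_param("url")


if __name__ == "__main__":
    print(external_url('message/external-body; access-type=URL; URL="http://example.org/files/image.tif"'))
```

With --external enabled, a binary like this would be fetched from the extracted URL; without it, only the pointer is exported.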

If running against a version of Fedora in which fedora:lastModified, fedora:lastModifiedBy, fedora:created, and fedora:createdBy cannot be set, run the import in legacy mode. WARNING: the imported resources will have different values for these fields than the original resources!

java -jar fcrepo-import-export.jar --mode import --resource http://example.org:8080/fedora/rest/ --dir /tmp/test --binaries --legacyMode

Running the import/export utility with BagIt support

The import-export-utility supports import and export of BagIt bags and provides BagIt-specific command-line arguments to support a number of use cases. To provide additional support for custom metadata, bag profiles, and serialization, it uses the bagit-support library for bagging operations.

BagIt Profile

BagIt Profiles allow creators and consumers of Bags to agree on optional components of the Bags they are exchanging. Each profile is defined in a JSON file that outlines its constraints according to the BagIt Profiles specification.

To enable a bag profile, use the -g or --bag-profile option. The import/export utility currently supports the following bag profiles:

BagIt Metadata

User-supplied metadata for tag files can be provided in a YAML file specified with the -G or --bag-config option.

The configuration file must have a top-level key matching the name of each metadata file, with a subkey for each field you wish to supply manually. For example, to set metadata elements in bag-info.txt:

bag-info.txt:
  Source-Organization: org.fcrepo
  Organization-Address: https://github.com/fcrepo4-labs/fcrepo-import-export

Note: The import-export-utility will generate values for the Bagging-Date, Payload-Oxum, Bag-Size, and BagIt-Profile-Identifier fields as part of the export process.
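Payload-Oxum is defined by the BagIt specification (RFC 8493) as OctetCount.StreamCount: the total number of bytes in the payload files followed by the number of payload files. The utility computes this value itself; the Python sketch below, with the hypothetical function name `payload_oxum`, only shows the calculation over a bag's data/ directory.

```python
import os


def payload_oxum(bag_dir: str) -> str:
    """Compute Payload-Oxum ("octets.streams") for the files under bag_dir/data."""
    octets = 0
    streams = 0
    for root, _dirs, files in os.walk(os.path.join(bag_dir, "data")):
        for name in files:
            octets += os.path.getsize(os.path.join(root, name))
            streams += 1
    return f"{octets}.{streams}"
```

A consumer can compare this value against bag-info.txt as a quick completeness check before running full checksum validation.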

Profile Requirements

Depending on the BagIt Profile used, certain fields are required:

Serialization

The import-export-utility supports serialization as part of both import and export. In either case, the serialization format MUST be listed in the bag profile's Accepted-Serialization; if it is not, the process fails and reports the list of accepted formats.

Import

During import, if the import-export-utility detects that a bag is a regular file rather than a directory, it will attempt to deserialize the bag based on the file's content type.
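How the utility determines the content type is not described here. As an illustrative sketch only, the formats that appear in the profile table (tar, zip, gzip) can be distinguished by their leading bytes: gzip streams begin with 0x1f 0x8b, zip archives with PK\x03\x04, and POSIX tar archives carry the magic "ustar" at offset 257. The function names `sniff_format` and `sniff_file` are hypothetical.

```python
from typing import Optional


def sniff_format(header: bytes) -> Optional[str]:
    """Guess a bag's serialization from the first 512 bytes of the file."""
    if header[:2] == b"\x1f\x8b":
        return "gzip"
    if header[:4] == b"PK\x03\x04":
        return "zip"
    if header[257:262] == b"ustar":
        return "tar"
    return None  # not a recognized serialization


def sniff_file(path: str) -> Optional[str]:
    """Read the start of a file and classify it."""
    with open(path, "rb") as f:
        return sniff_format(f.read(512))
```

Note that a gzip-compressed tarball is reported as gzip here; a real implementation would need to decompress it before looking for the tar header.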

Export

For export, if a bag profile allows serialization, the format can be specified with the -s or --bag-serialization option. Currently, the following formats are supported:

Profile Requirements

As with bag metadata, each BagIt Profile specifies whether it allows serialization and which formats it accepts:

Bag Profile          Serialization  Supported Formats
default              Optional       tar
aptrust              Optional       tar
beyondtherepository  Optional       tar, zip, gzip
metaarchive          Optional       tar
perseids             Required       tar, zip, gzip
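The rule that a requested format must appear in the profile's Accepted-Serialization can be sketched as a lookup against the table above. The `PROFILES` dictionary simply restates that table and the function name `check_serialization` is hypothetical; the utility takes this information from each profile definition rather than from a hard-coded map.

```python
from typing import Optional

# Restates the table above: profile -> (serialization requirement, accepted formats).
PROFILES = {
    "default": ("optional", {"tar"}),
    "aptrust": ("optional", {"tar"}),
    "beyondtherepository": ("optional", {"tar", "zip", "gzip"}),
    "metaarchive": ("optional", {"tar"}),
    "perseids": ("required", {"tar", "zip", "gzip"}),
}


def check_serialization(profile: str, fmt: Optional[str]) -> None:
    """Raise ValueError if fmt (None = no serialization) is not acceptable for profile."""
    requirement, accepted = PROFILES[profile]
    formats = ", ".join(sorted(accepted))
    if fmt is None:
        if requirement == "required":
            raise ValueError(f"{profile} requires serialization; accepted: {formats}")
        return
    if fmt not in accepted:
        raise ValueError(f"{fmt!r} is not accepted by {profile}; accepted: {formats}")
```

Like the utility's behavior described above, the error message lists the accepted formats so the user can correct the --bag-serialization value.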

BagIt Examples

Note: All examples use a Fedora repository at http://localhost:8080/rest/

Export using the default bag profile as a tarball with user supplied metadata

Create bagit-config.yml with a bag-info.txt section for metadata:

bag-info.txt:
  Source-Organization: York University Libraries
  Organization-Address: 4700 Keele Street Toronto, Ontario M3J 1P3 Canada
  Contact-Name: Nick Ruest
  Contact-Phone: +14167362100
  Contact-Email: [email protected]
  External-Description: Sample bag exported from fcrepo
  External-Identifier: SAMPLE_001
  Bag-Group-Identifier: SAMPLE
  Internal-Sender-Identifier: SAMPLE_001
  Internal-Sender-Description: Sample bag exported from fcrepo

Execute the import-export-utility:

java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest --dir /tmp/example_bag --binaries --bag-profile default --bag-serialization tar --bag-config /tmp/bagit-config.yml

Export using the APTrust profile with user supplied metadata

Create bagit-config-aptrust.yml with bag-info.txt and aptrust-info.txt sections:

bag-info.txt:
  Source-Organization: York University Libraries
  Organization-Address: 4700 Keele Street Toronto, Ontario M3J 1P3 Canada
  Contact-Name: Nick Ruest
  Contact-Phone: +14167362100
  Contact-Email: [email protected]
  External-Description: Sample bag exported from fcrepo
  External-Identifier: SAMPLE_001
  Bag-Group-Identifier: SAMPLE
  Internal-Sender-Identifier: SAMPLE_001
  Internal-Sender-Description: Sample bag exported from fcrepo
aptrust-info.txt:
  Access: Restricted
  Title: Sample fcrepo bag
  Storage-Region: Standard

Execute the import-export-utility:

java -jar fcrepo-import-export.jar --mode export --resource http://localhost:8080/rest --dir /tmp/example_bag --binaries --bag-profile aptrust --bag-config /tmp/bagit-config-aptrust.yml

Additional tag files can be created by adding top-level keys to the user-supplied YAML file, like the aptrust-info.txt key in the bagit-config-aptrust.yml example.
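Each top-level key names a tag file, and each subkey becomes a "Label: Value" line in that file, following the BagIt tag-file format. The Python sketch below shows that mapping on an already-parsed dictionary (YAML parsing is omitted to avoid a third-party dependency). The function name `render_tag_files` is hypothetical, this is not the utility's own code, and generated fields such as Payload-Oxum are not handled.

```python
def render_tag_files(config: dict[str, dict[str, str]]) -> dict[str, str]:
    """Map each tag-file name to its text content as 'Label: Value' lines."""
    files = {}
    for filename, fields in config.items():
        lines = [f"{label}: {value}" for label, value in fields.items()]
        files[filename] = "\n".join(lines) + "\n"
    return files


if __name__ == "__main__":
    config = {
        "bag-info.txt": {"Source-Organization": "York University Libraries"},
        "aptrust-info.txt": {"Access": "Restricted", "Title": "Sample fcrepo bag"},
    }
    for name, text in render_tag_files(config).items():
        print(f"== {name}\n{text}", end="")
```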

Running the import/export utility with a configuration file

$ java -jar target/fcrepo-import-export-<version>.jar -c /path/to/config/file

The easiest way to see an example of the configuration file is to run the utility with command-line arguments and inspect the configuration file created.

That configuration file is YAML, accepts the same options as the command-line arguments, and will look something like the following:

mode: export
dir: /tmp/test
resource: http://localhost:8080/rest/1
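Because the configuration file holds the standard command-line arguments, its keys line up with the long option names (mode, dir, and resource in the example above). A hypothetical Python helper, `config_to_args`, that turns a parsed configuration mapping back into an equivalent argument list might look like this. Treating boolean values as bare flags (for example, binaries: true becoming --binaries) is an assumption about how such options appear in the file.

```python
def config_to_args(config: dict[str, object]) -> list[str]:
    """Convert a parsed configuration mapping into command-line arguments."""
    args: list[str] = []
    for key, value in config.items():
        if isinstance(value, bool):
            if value:
                args.append(f"--{key}")  # assumed: flag options take no value
        else:
            args.extend([f"--{key}", str(value)])
    return args


if __name__ == "__main__":
    cfg = {"mode": "export", "dir": "/tmp/test", "resource": "http://localhost:8080/rest/1"}
    print(["java", "-jar", "fcrepo-import-export.jar", *config_to_args(cfg)])
```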

Namespaces

Currently, the first time a namespace is used in RDF that is POSTed or PUT to the repository, Fedora binds it to a system-generated prefix of the form "ns00x", regardless of any prefix binding supplied in the submitted graph. This behavior is not incorrect, but it is inconvenient: prefixes generated during import will not match the prefix bindings in the source repository. To avoid this behavior, follow the steps below:

Setting -Dfcrepo.modeshape.configuration=file:/path/to/repository.json does not work if deploying Fedora using the one-click jar.

Import/Export Format

Import/export data format

Import/Export Scenarios

Import/export scenarios

Maintainers