Flux can export semantic triples to a variety of RDF file formats, allowing you to easily exchange large numbers of triples and quads with other systems and users.

Usage

The export-rdf-files command requires a query that selects the documents to export and a directory path where the RDF files will be written:

Unix:

    ./bin/flux export-rdf-files \
        --connection-string "flux-example-user:password@localhost:8004" \
        --collections example \
        --path destination

Windows:

    bin\flux export-rdf-files ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --collections example ^
        --path destination

Similar to exporting documents, the export-rdf-files command supports the following options for selecting the documents that contain the triples you wish to export:

Option            Description
--collections     Comma-delimited sequence of collection names.
--directory       A database directory for constraining on URIs.
--graphs          Comma-delimited sequence of MarkLogic graph names.
--options         Name of a REST API search options document; typically used with a string query.
--query           A structured, serialized CTS, or combined query expressed as JSON or XML.
--string-query    A string query utilizing MarkLogic’s search grammar.
--uris            Newline-delimited sequence of document URIs to retrieve.

You may specify any combination of these options, with the exception that --query will be ignored if --uris is specified.

Prior to Flux 1.2.0, Flux required at least one of --collections, --directory, --graphs, --query, --string-query, or --uris to be specified. Starting in Flux 1.2.0, if you do not specify any of those options, then Flux will select all documents that the configured MarkLogic user is able to read.

For each document matching the query defined by the options above, Flux will retrieve the triples from the document and write them to a file. You must use the --path option to specify where files should be written. See Specifying a path for more information on paths.

By default, Flux will write files using the standard Turtle or TTL format. You can change this via the --format option, with the following choices supported:

Specifying the number of files to write

The --file-count option controls how many files Flux writes. By default, Flux writes one file per partition used for reading triples from MarkLogic. The number of partitions is determined by the --partitions-per-forest option, which has a default value of 4. For example, if the database you are querying has 3 forests, Flux will write 12 files by default.
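The default file count in the example above can be worked out with a little shell arithmetic. This is only a sketch of the calculation; the variable names are illustrative and are not Flux options:

```shell
# Number of forests in the database being queried (from the example above).
forests=3
# Default value of --partitions-per-forest.
partitions_per_forest=4

# One file is written per partition unless --file-count says otherwise.
default_file_count=$((forests * partitions_per_forest))
echo "$default_file_count"
```

If that number is higher or lower than you want, pass --file-count with the desired value.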

Specifying a base IRI

With the --base-iri option, you can specify a base IRI to prepend to the graph associated with each triple when the graph is relative. If the graph for a triple is absolute, the base IRI will not be prepended.

Overriding the graph

For some use cases involving exporting triples with their graphs to files containing quads, it may not be desirable to reference the graph that each triple belongs to in MarkLogic. You can use --graph-override to specify an alternative graph value that will then be associated with every triple that Flux writes to a file.

gzip compression

To compress each file written by Flux using gzip, simply include --gzip as an option.
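As an illustration, several of the options described so far can be combined in one command. The sketch below wraps the command in a shell function so it can be reused; the function name, the graph name example-graph, and the choice of a single output file are assumptions for illustration only, while the connection string matches the earlier example:

```shell
# Hypothetical helper: export the triples in one graph to a single
# gzip-compressed file under the directory given as the first argument.
export_graph_gzipped() {
    ./bin/flux export-rdf-files \
        --connection-string "flux-example-user:password@localhost:8004" \
        --graphs example-graph \
        --file-count 1 \
        --gzip \
        --path "$1"
}
```

Calling `export_graph_gzipped destination` would then run the export against the destination directory.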

Exporting consistent results

By default, Flux uses MarkLogic’s support for point-in-time queries when querying for documents, thus ensuring a consistent snapshot of data. Point-in-time queries depend on the same MarkLogic system timestamp being used for each query. Because system timestamps can be deleted when MarkLogic merges data, you may encounter the following error that causes an export command to fail:

Server Message: XDMP-OLDSTAMP: Timestamp too old for forest

To resolve this issue, you must enable point-in-time queries for your database by configuring the merge timestamp setting. The recommended practice is to use a negative value that exceeds the expected duration of the export operation. For example, a value of -864,000,000,000 for the merge timestamp would give the export operation 24 hours to complete.
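The example value above implies 10,000,000 timestamp units per second (864,000,000,000 divided by the 86,400 seconds in a day). Assuming that ratio holds, a small helper can compute a merge timestamp value for other durations. The function name is hypothetical, and you should confirm the units against the MarkLogic documentation before applying a value:

```shell
# Hypothetical helper: print a negative merge timestamp value that covers
# the given number of hours, assuming 10,000,000 units per second.
merge_timestamp_for_hours() {
    hours=$1
    units_per_second=10000000
    echo "-$(($hours * 3600 * $units_per_second))"
}

merge_timestamp_for_hours 24   # prints -864000000000
```

Choose a duration comfortably longer than the export is expected to take.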

Alternatively, you can disable the use of point-in-time queries by including the following option:

--no-snapshot

With this option, Flux does not use a snapshot; instead, its queries see the data at multiple points in time. As noted in the guide for consistent snapshots, you may get unpredictable results if your query matches data that changes during the export operation. If your data is not changing, this approach is recommended because it avoids the need to configure the merge timestamp.