Flux can import archive files containing documents and their associated metadata. This includes archives written via the export-archive-files
command as well as archives written by MarkLogic Content Pump, which are hereafter referred to as “MLCP archives”.
Table of contents
- Usage
- Importing MLCP archives
- Restricting metadata
- Specifying an encoding
- Importing large binary files in archives
- Common errors
Usage
The import-archive-files
command will import the documents and metadata files in a ZIP file produced by the export-archive-files
command. You must specify at least one --path
option along with connection information for the MarkLogic database you wish to write to:
-
./bin/flux import-archive-files \ --path /path/to/files \ --connection-string "flux-example-user:password@localhost:8004" \ --permissions flux-example-role,read,flux-example-role,update
-
bin\flux import-archive-files ^ --path path\to\files ^ --connection-string "flux-example-user:password@localhost:8004" ^ --permissions flux-example-role,read,flux-example-role,update
Importing MLCP archives
You can also import MLCP archives that were produced via the EXPORT
command in MLCP. The import-mlcp-archive-files
command is used instead, and it also requires at least one --path
option along with connection information for the MarkLogic database you wish to write to:
-
./bin/flux import-mlcp-archive-files \ --path /path/to/files \ --connection-string "flux-example-user:password@localhost:8004" \ --permissions flux-example-role,read,flux-example-role,update
-
bin\flux import-mlcp-archive-files ^ --path /path/to/files ^ --connection-string "flux-example-user:password@localhost:8004" ^ --permissions flux-example-role,read,flux-example-role,update
Restricting metadata
By default, all metadata associated with a document in an archive will be included when the document is written to MarkLogic. This is true for both the import-archive-files
command and the import-mlcp-archive-files
command. This is typically desirable so that metadata like collections and permissions in the archive can be applied to the imported documents.
You can instead restrict which types of metadata are included via the --categories
option. This option accepts a comma-delimited sequence of the following metadata types:
collections
permissions
quality
properties
metadatavalues
For example, the following option will only include the collections and properties found in each metadata entry in an archive ZIP file or MLCP archive ZIP file:
--categories collections,properties
Specifying an encoding
MarkLogic stores all content in the UTF-8 encoding. If your archive files use a different encoding, you must specify that via the --encoding
option so that the content can be correctly translated to UTF-8 when written to MarkLogic - e.g.:
-
./bin/flux import-archive-files \ --path source \ --encoding ISO-8859-1 \ --connection-string "flux-example-user:password@localhost:8004" \ --permissions flux-example-role,read,flux-example-role,update
-
bin\flux import-archive-files ^ --path source ^ --encoding ISO-8859-1 ^ --connection-string "flux-example-user:password@localhost:8004" ^ --permissions flux-example-role,read,flux-example-role,update
Importing large binary files in archives
When exporting archives, you can use the --streaming
option introduced in Flux 1.1.0 to ensure that large binary documents in MarkLogic can be streamed to an archive file. When importing archives with large binary files, you should likewise use the --streaming
option to ensure that each large binary can be read into MarkLogic without exhausting the memory available to Flux or MarkLogic.
As streaming each entry requires Flux to only send one document at a time to MarkLogic, you should not use this option when importing smaller files that easily fit into the memory available to Flux.
When using --streaming
, the following options will have no effect due to Flux not reading the file contents into memory and always sending one file per request to MarkLogic:
--batch-size
--encoding
--failed-documents-path
--uri-template
You typically will also not want to use the --transform
option as applying a REST transform in MarkLogic to a very large binary document may exhaust the amount of memory available to MarkLogic.
In addition, when streaming documents to MarkLogic, URIs will be encoded. For example, an entry named /my file.json
will result in a URI of /my%20file.json
. This is due to an issue in the MarkLogic REST API endpoint that will be resolved in a future server release.
Common errors
If you use Flux 1.0.x to import an archive created by Flux 1.1.x or higher, you may receive an error containing the following message:
com.marklogic.spark: Could not find metadata entry for entry
To solve this, you should use Flux 1.1.0 or higher to import the archive. Flux 1.1.0 and higher can also import archives created by Flux 1.0.x.