Flux can import any type of file as-is, with the contents of the file becoming a new document in MarkLogic. The term “generic files” is used in this context to refer to files that do not require any special processing other than potentially decompressing the files.

Usage

The import-files command imports a set of files into MarkLogic, with each file being written as a separate document. You must specify at least one --path option along with connection information for the MarkLogic database you wish to write to. For example:

  • ./bin/flux import-files \
        --path /path/to/files \
        --connection-string "flux-example-user:password@localhost:8004" \
        --permissions flux-example-role,read,flux-example-role,update
    
  • bin\flux import-files ^
        --path path\to\files ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --permissions flux-example-role,read,flux-example-role,update
    

Controlling document URIs

Each document will have an initial URI based on the absolute path of the associated file. See common import features for details on adjusting this URI. In particular, the --uri-replace option is often useful for removing most of the absolute path to produce a concise, self-describing URI.
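
As a rough illustration of the kind of regex replacement that --uri-replace performs, the sketch below previews the effect on a single URI. It uses sed rather than Flux itself, and the file path and pattern are hypothetical; the exact value syntax for --uri-replace is covered in the common import features documentation.

```shell
# Hypothetical absolute path that would become the initial document URI.
initial_uri="/path/to/files/reports/2024/summary.json"

# Remove everything up to and including "/path/to/files".
# sed -E only stands in for the regex replacement; this does not call Flux.
concise_uri=$(printf '%s\n' "$initial_uri" | sed -E 's#^.*/path/to/files##')

echo "$concise_uri"
```

Previewing a pattern this way on a few sample paths can help catch a regex that strips too much or too little before running a full import.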

Specifying a document type

The type of each document written to MarkLogic is determined by the file extension found in the URI along with the set of MIME types configured in MarkLogic. For unrecognized file extensions, or URIs that do not have a file extension, you can force a document type via the --document-type option. The value of this option must be one of JSON, XML, or TEXT.
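
A minimal sketch of forcing a document type, assuming the files under a hypothetical /path/to/files directory should all be stored as XML. The connection details are reused from the example above, and the command is only printed as a dry run so that it can be reviewed first.

```shell
# Build the argument list (this overwrites the script's positional parameters).
# /path/to/files is a hypothetical directory; XML is one of JSON, XML, or TEXT.
set -- import-files \
    --path /path/to/files \
    --document-type XML \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update

# Dry run: show the command instead of executing it.
preview="./bin/flux $*"
echo "$preview"
```

To perform the import, run ./bin/flux "$@" in place of the final echo.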

Specifying an encoding

MarkLogic stores all content in the UTF-8 encoding. If your files use a different encoding, you must specify that via the --encoding option so that the content can be correctly translated to UTF-8 when written to MarkLogic:

  • ./bin/flux import-files \
        --path source \
        --encoding ISO-8859-1 \
        --connection-string "flux-example-user:password@localhost:8004" \
        --permissions flux-example-role,read,flux-example-role,update
    
  • bin\flux import-files ^
        --path source ^
        --encoding ISO-8859-1 ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --permissions flux-example-role,read,flux-example-role,update
    

Importing large binary files

Flux can import binary files of any size, leveraging MarkLogic’s support for large binary documents. For files that may not fit into the memory available to Flux, consider using the --streaming option introduced in Flux 1.1.0. When this option is set, Flux streams the contents of each file from its source directly into MarkLogic, which avoids reading the contents of the file into memory.

Because streaming requires Flux to send only one document per request to MarkLogic, you should not use this option when importing smaller files that easily fit into the memory available to Flux.
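
A sketch of a streaming import, assuming a hypothetical /path/to/large-binaries directory and the same connection details as the earlier examples. It prints the command as a dry run rather than executing it.

```shell
# Build the argument list (this overwrites the script's positional parameters).
# /path/to/large-binaries is a hypothetical directory of large binary files.
set -- import-files \
    --path /path/to/large-binaries \
    --streaming \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update

# Dry run: show the command instead of executing it.
preview="./bin/flux $*"
echo "$preview"
```

To perform the import, run ./bin/flux "$@" in place of the final echo.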

When using --streaming, the following options will have no effect due to Flux not reading the file contents into memory and always sending one file per request to MarkLogic:

  • --batch-size
  • --encoding
  • --failed-documents-path
  • --uri-template

You will typically also want to avoid the --transform option, as applying a REST transform in MarkLogic to a very large binary document may exhaust the memory available to MarkLogic.

In addition, when streaming documents to MarkLogic, URIs will be encoded. For example, a file named my file.json will result in a URI of /my%20file.json. This is due to an issue in the MarkLogic REST API endpoint that will be resolved in a future server release.
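
If you need to look up streamed documents afterwards, it can help to predict the encoded URI. The sketch below reproduces the example above for spaces only; it is a simplification, since other characters may be encoded as well, and it does not call Flux or MarkLogic.

```shell
# File name taken from the example above.
file_name="my file.json"

# Percent-encode spaces only; full URI encoding covers more characters.
encoded_uri="/$(printf '%s' "$file_name" | sed 's/ /%20/g')"

echo "$encoded_uri"
```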

Importing gzip files

To import gzip files with each file being decompressed before being written to MarkLogic, include the --compression option with a value of GZIP. You can also import gzip files as-is, i.e. without decompressing them, by omitting the --compression option. The --streaming option introduced in Flux 1.1.0 can also be used for very large gzip files that may not fit into the memory available to Flux or to MarkLogic.
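
A sketch of importing gzip files with decompression, assuming a hypothetical /path/to/gzip-files directory and the connection details from the earlier examples. The command is printed as a dry run rather than executed.

```shell
# Build the argument list (this overwrites the script's positional parameters).
# /path/to/gzip-files is a hypothetical directory of gzip files.
set -- import-files \
    --path /path/to/gzip-files \
    --compression GZIP \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update

# Dry run: show the command instead of executing it.
preview="./bin/flux $*"
echo "$preview"
```

Dropping the --compression GZIP pair from the argument list would import the gzip files as-is.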

Importing ZIP files

To import each entry in a ZIP file as a separate document, include the --compression option with a value of ZIP. Each document will have an initial URI based on both the absolute path of the ZIP file and the name of the ZIP entry. You can also use the --document-type option as described above to force a document type for any entry that has a file extension not recognized by MarkLogic. The --streaming option introduced in Flux 1.1.0 can also be used for ZIP files containing very large binary files that may not fit into the memory available to Flux or to MarkLogic.
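
A sketch of importing ZIP entries as separate documents, assuming a hypothetical /path/to/zip-files directory and the connection details from the earlier examples. The --document-type option is included only to show how the two options combine and can be left out. The command is printed as a dry run rather than executed.

```shell
# Build the argument list (this overwrites the script's positional parameters).
# /path/to/zip-files is a hypothetical directory of ZIP files.
# --document-type is optional; see "Specifying a document type" above.
set -- import-files \
    --path /path/to/zip-files \
    --compression ZIP \
    --document-type JSON \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update

# Dry run: show the command instead of executing it.
preview="./bin/flux $*"
echo "$preview"
```

To perform the import, run ./bin/flux "$@" in place of the final echo.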