Flux can import any type of file as-is, with the contents of each file becoming a new document in MarkLogic. In this context, the term “generic files” refers to files that require no special processing other than possibly being decompressed.
Table of contents
- Usage
- Controlling document URIs
- Specifying a document type
- Specifying an encoding
- Importing large binary files
- Importing gzip files
- Importing ZIP files
Usage
The import-files command imports a set of files into MarkLogic, with each file being written as a separate document. You must specify at least one --path option along with connection information for the MarkLogic database you wish to write to. For example:
Unix:
./bin/flux import-files \
    --path /path/to/files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
Windows:
bin\flux import-files ^
    --path path\to\files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update
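The one-file-one-document model can be pictured with a short conceptual Python sketch. This is not how Flux is implemented. It assumes, without confirmation from this page, that subdirectories under the --path directory are walked recursively and that the initial URI is the file's absolute path in POSIX form. collect_documents is a hypothetical helper.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def collect_documents(root: str) -> dict[str, bytes]:
    """Return one entry per file under root: initial URI -> raw file bytes.

    Assumptions (not confirmed Flux behavior): subdirectories are walked
    recursively, and the initial URI is the absolute path in POSIX form.
    """
    documents: dict[str, bytes] = {}
    for path in sorted(Path(root).resolve().rglob("*")):
        if path.is_file():
            documents[path.as_posix()] = path.read_bytes()
    return documents


# Demo: two files in a temporary directory become two separate documents.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.json").write_text('{"hello": "world"}')
    Path(tmp, "notes").mkdir()
    Path(tmp, "notes", "b.txt").write_text("plain text")
    docs = collect_documents(tmp)
    for uri in docs:
        print(uri)
```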
Controlling document URIs
Each document will have an initial URI based on the absolute path of the associated file. See common import features for details on adjusting this URI. In particular, the --uri-replace option is often useful for removing most of the absolute path to produce a concise, self-describing URI.
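The effect of a regex-based URI replacement can be approximated with Python's re.sub. This is a minimal sketch, assuming the pattern is a regular expression and that every match is replaced. The exact --uri-replace syntax is documented under common import features. The path /path/to/files/example.json is hypothetical, and apply_uri_replace is a hypothetical helper.

```python
import re


def apply_uri_replace(uri: str, pattern: str, replacement: str) -> str:
    """Approximate a regex-based URI replacement with Python's re.sub.

    Assumption: every match of the pattern is replaced; see the common
    import features documentation for the exact --uri-replace syntax.
    """
    return re.sub(pattern, replacement, uri)


initial_uri = "/path/to/files/example.json"  # hypothetical absolute path
print(apply_uri_replace(initial_uri, ".*/files", ""))  # /example.json
```

Because .* is greedy, the pattern consumes everything up to the last /files in the path, leaving only the part of the path below the import directory.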
Specifying a document type
The type of each document written to MarkLogic is determined by the file extension found in the URI along with the set of MIME types configured in MarkLogic. For unrecognized file extensions, or URIs that do not have a file extension, you can force a document type via the --document-type option. The value of this option must be one of JSON, XML, or TEXT.
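The resolution rule described above can be sketched in Python. The sketch assumes, based on the wording above, that a recognized extension takes precedence and the forced type fills in only for unrecognized or missing extensions. Confirm the actual precedence against the Flux documentation. KNOWN_EXTENSIONS is a hypothetical, tiny stand-in for MarkLogic's MIME type configuration, and resolve_document_type is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical, tiny stand-in for MarkLogic's configured MIME types.
KNOWN_EXTENSIONS = {".json": "JSON", ".xml": "XML", ".txt": "TEXT"}

# The values the --document-type option accepts, per the text above.
FORCEABLE_TYPES = {"JSON", "XML", "TEXT"}


def resolve_document_type(uri: str, forced_type: Optional[str] = None) -> Optional[str]:
    """Return the document type for uri, or None if this sketch cannot decide."""
    if forced_type is not None and forced_type not in FORCEABLE_TYPES:
        raise ValueError(f"document type must be one of {sorted(FORCEABLE_TYPES)}")
    recognized = KNOWN_EXTENSIONS.get(PurePosixPath(uri).suffix.lower())
    if recognized is not None:
        return recognized
    # Unrecognized or missing extension: fall back to the forced type, if any.
    return forced_type


print(resolve_document_type("/orders/order-1.json"))    # JSON
print(resolve_document_type("/orders/order-1", "XML"))  # XML
```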
Specifying an encoding
MarkLogic stores all content in the UTF-8 encoding. If your files use a different encoding, you must specify it via the --encoding option so that the content can be correctly translated to UTF-8 when written to MarkLogic:
Unix:
./bin/flux import-files \
    --path source \
    --encoding ISO-8859-1 \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
Windows:
bin\flux import-files ^
    --path source ^
    --encoding ISO-8859-1 ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update
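The translation to UTF-8 amounts to decoding the bytes with the source encoding and re-encoding them as UTF-8. The following Python sketch illustrates why --encoding matters. It is not Flux's implementation, and to_utf8 is a hypothetical helper. In ISO-8859-1, "é" is a single byte, which is not valid UTF-8 on its own.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Decode bytes from source_encoding, then re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


latin1 = "café".encode("ISO-8859-1")  # b'caf\xe9' (4 bytes)
utf8 = to_utf8(latin1, "ISO-8859-1")   # b'caf\xc3\xa9' (5 bytes)
print(latin1, utf8)

# Without the right source encoding, the same bytes are not valid UTF-8.
try:
    latin1.decode("utf-8")
except UnicodeDecodeError as error:
    print("not valid UTF-8:", error.reason)
```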
Importing large binary files
Flux can take advantage of MarkLogic’s support for large binary documents to import binary files of any size. To ensure that files of any size can be loaded, consider using the --streaming option, introduced in Flux 1.1.0. When this option is set, Flux streams the contents of each file from its source directly into MarkLogic instead of reading the entire file into memory.
Because streaming requires Flux to send only one document at a time to MarkLogic, you should not use this option when importing smaller files that easily fit into the memory available to Flux.
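The memory trade-off behind --streaming can be illustrated with a chunked copy in Python. This is a conceptual sketch, not Flux's implementation. In-memory streams stand in for the source file and for the request sent to MarkLogic, and the 64 KiB chunk size is an arbitrary assumption. The point is that only one chunk is held at a time, rather than the whole file.

```python
import io
from typing import BinaryIO


def stream_copy(source: BinaryIO, sink: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Copy source to sink in fixed-size chunks; return the largest chunk read."""
    largest = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break
        largest = max(largest, len(chunk))
        sink.write(chunk)
    return largest


payload = bytes(1_000_000)    # stand-in for a large binary file
source = io.BytesIO(payload)  # stand-in for the file on disk
sink = io.BytesIO()           # stand-in for the request body sent to MarkLogic
largest = stream_copy(source, sink)
print(f"copied {len(sink.getvalue())} bytes; largest chunk in memory was {largest} bytes")
```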
When using --streaming, the following options have no effect, because Flux does not read the file contents into memory and always sends one file per request to MarkLogic:
- --batch-size
- --encoding
- --failed-documents-path
- --uri-template
You will typically also want to avoid the --transform option, as applying a REST transform in MarkLogic to a very large binary document may exhaust the memory available to MarkLogic.
In addition, when streaming documents to MarkLogic, URIs will be percent-encoded. For example, a file named my file.json will result in a URI of /my%20file.json. This is due to an issue in the MarkLogic REST API endpoint that will be resolved in a future server release.
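Python's urllib.parse.quote reproduces the example above. It is used here only as an approximation. The exact set of characters the MarkLogic endpoint encodes may differ from Python's defaults.

```python
from urllib.parse import quote, unquote

intended_uri = "/my file.json"
streamed_uri = quote(intended_uri)  # "/" is left alone by default
print(streamed_uri)                  # /my%20file.json
print(unquote(streamed_uri))         # /my file.json
```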
Importing gzip files
To import gzip files with each file being decompressed before it is written to MarkLogic, include the --compression option with a value of GZIP. You can also import gzip files as-is, without decompressing them, by omitting the --compression option. The --streaming option, introduced in Flux 1.1.0, can also be used for very large gzip files that may not fit into the memory available to Flux or to MarkLogic.
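The two gzip behaviors can be contrasted with a short Python sketch. content_to_write is a hypothetical helper that mirrors the description above. It is not Flux's implementation. Passing "GZIP" stands in for --compression GZIP, and None stands in for omitting the option.

```python
import gzip
from typing import Optional


def content_to_write(file_bytes: bytes, compression: Optional[str]) -> bytes:
    """Mirror the two gzip behaviors described above (hypothetical helper)."""
    if compression == "GZIP":
        return gzip.decompress(file_bytes)  # write the decompressed content
    return file_bytes                       # no option: write the gzip file as-is


original = b'{"hello": "world"}'
gz_file = gzip.compress(original)
print(content_to_write(gz_file, "GZIP"))    # b'{"hello": "world"}'
print(content_to_write(gz_file, None)[:2])  # b'\x1f\x8b' (gzip magic bytes)
```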
Importing ZIP files
To import each entry in a ZIP file as a separate document, include the --compression option with a value of ZIP. Each document will have an initial URI based on both the absolute path of the ZIP file and the name of the ZIP entry. You can also use the --document-type option as described above to force a document type for any entry whose file extension is not recognized by MarkLogic. The --streaming option, introduced in Flux 1.1.0, can also be used for ZIP files containing very large binary files that may not fit into the memory available to Flux or to MarkLogic.
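The one-entry-one-document model for ZIP files can be sketched with Python's zipfile module. The URI format here, the ZIP file's path, then "/", then the entry name, is a hypothetical illustration of "based on both the absolute path of the ZIP file and the name of the ZIP entry". Flux's exact format may differ. zip_entries_to_documents and /path/to/archive.zip are hypothetical.

```python
import io
import zipfile
from typing import Dict


def zip_entries_to_documents(zip_path: str, zip_bytes: bytes) -> Dict[str, bytes]:
    """Map each file entry in a ZIP archive to a (hypothetical) URI and its bytes."""
    documents: Dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            # Hypothetical URI format: ZIP path + "/" + entry name.
            documents[f"{zip_path}/{info.filename}"] = archive.read(info)
    return documents


# Build a small archive in memory with two entries.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("a.json", '{"id": 1}')
    archive.writestr("nested/b.xml", "<b/>")

docs = zip_entries_to_documents("/path/to/archive.zip", buffer.getvalue())
for uri, content in docs.items():
    print(uri, content)
```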