This guide describes how to get started with Flux, using examples that demonstrate its functionality.
Setup
You can download the latest release of the Flux application zip from the latest Flux release page. The Flux application zip is titled marklogic-flux-1.2.0.zip. You can extract this zip to any location on your filesystem.
Deploying the example application
The examples in this guide, along with examples found throughout this documentation, depend on a small MarkLogic application that can be deployed to your own instance of MarkLogic server. The application can be downloaded from the latest Flux release page in a zip titled marklogic-flux-getting-started-1.2.0.zip. To use Flux with this example application, perform the following steps:
- Extract the marklogic-flux-getting-started-1.2.0.zip file to any location on your local filesystem.
- Run cd marklogic-flux-getting-started-1.2.0 to change to the directory created by extracting the ZIP file.
- Create a file named gradle-local.properties and add mlPassword=(your MarkLogic admin user password) to it.
- Examine the contents of the gradle.properties file to ensure that the value of mlHost points to your MarkLogic server and that the value of mlRestPort is a port available for a new MarkLogic app server to use.
- Run ./gradlew -i mlDeploy to deploy the example application.
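The credential and deploy steps above can be scripted. The following is a minimal sketch, assuming a POSIX shell run from inside the extracted marklogic-flux-getting-started-1.2.0 directory. ML_ADMIN_PASSWORD is a hypothetical environment variable used only for illustration; it is not defined by the example project.

```shell
# ML_ADMIN_PASSWORD is a hypothetical environment variable holding your
# MarkLogic admin password; "changeme" is only a placeholder fallback.
ML_ADMIN_PASSWORD="${ML_ADMIN_PASSWORD:-changeme}"

# Write the file in a subshell with a restrictive umask so that only the
# current user can read the stored password.
(
  umask 077
  printf 'mlPassword=%s\n' "$ML_ADMIN_PASSWORD" > gradle-local.properties
)

# Deploy only when the Gradle wrapper is present in the current directory.
if [ -x ./gradlew ]; then
  ./gradlew -i mlDeploy
fi
```

Keeping the password in gradle-local.properties rather than gradle.properties follows the steps above; the restrictive file permissions are an extra precaution, not a requirement of the example project.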
The example application consists of a REST API app server on port 8004 in your MarkLogic installation. The application also includes a “flux-example-user” MarkLogic user that has the necessary MarkLogic roles and privileges for running the examples in this guide. Finally, the application includes a MarkLogic TDE template that creates a view in MarkLogic for the purpose of demonstrating commands that utilize a MarkLogic Optic query.
It is recommended to extract the Flux application zip into the marklogic-flux-getting-started-1.2.0 directory so that you can easily execute the examples in this guide. After extracting the application zip, the directory should have a structure similar to this (not all files may be shown):
./marklogic-flux-getting-started-1.2.0
build.gradle
./data
./marklogic-flux-1.2.0
./gradle
gradle.properties
gradlew
gradlew.bat
./src
Usage
You can run Flux without any options to see the list of available commands. If you are using Flux to run these examples, first change your current directory to where you extracted Flux:
cd marklogic-flux-1.2.0
And then run the Flux executable without any options:
- Unix:
  ./bin/flux
- Windows:
  bin\flux
As shown in the usage, every command is invoked by specifying its name and one or more options required to run the command. To see the usage for a particular command, such as import-files, run:
- Unix:
  ./bin/flux help import-files
- Windows:
  bin\flux help import-files
Required options are marked with an asterisk (“*”). Additionally, every command requires that either --connection-string or --host and --port be specified so that Flux knows which MarkLogic cluster to connect to.
The --connection-string option provides a succinct way of defining the host, port, username, and password when the MarkLogic app server you connect to requires basic or digest authentication. Its value is of the form (user):(password)@(host):(port)/(optionalDatabaseName). For example:
- Unix:
  ./bin/flux import-files --connection-string "my-user:my-secret@localhost:8000" ...
- Windows:
  bin\flux import-files --connection-string "my-user:my-secret@localhost:8000" ...
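As an illustration of the connection string form described above, the following sketch assembles the value from its parts in a POSIX shell. The variable names are hypothetical; the values are the ones used by the example application in this guide.

```shell
# Hypothetical variable names; values come from this guide's example application.
ML_USER="flux-example-user"
ML_PASSWORD="password"
ML_HOST="localhost"
ML_PORT="8004"

# Form: (user):(password)@(host):(port)
CONNECTION_STRING="${ML_USER}:${ML_PASSWORD}@${ML_HOST}:${ML_PORT}"

# Optionally append /(databaseName) to name a specific database.
CONNECTION_STRING_WITH_DB="${CONNECTION_STRING}/flux-example-content"

echo "$CONNECTION_STRING"
echo "$CONNECTION_STRING_WITH_DB"
```

Either variable can then be passed to a command as --connection-string "$CONNECTION_STRING". Keep the double quotes so the shell treats the value as a single argument.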
Options can also be read from a file; see the Common Options guide for more information.
Importing data
Flux allows for data to be imported from a variety of data sources, such as a local filesystem, S3, or any database accessible via a JDBC driver. The example project contains a gzipped CSV file generated via Mockaroo. Run the following command to load this file; the loaded data will then be used to demonstrate other Flux capabilities:
- Unix:
  ./bin/flux import-delimited-files \
      --path ../data/employees.csv.gz \
      --connection-string "flux-example-user:password@localhost:8004" \
      --permissions flux-example-role,read,flux-example-role,update \
      --collections employee \
      --uri-template "/employee/{id}.json"
- Windows:
  bin\flux import-delimited-files ^
      --path ..\data\employees.csv.gz ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --permissions flux-example-role,read,flux-example-role,update ^
      --collections employee ^
      --uri-template "/employee/{id}.json"
By accessing your MarkLogic qconsole, you can see that the employee collection in the flux-example-content database now has 1000 JSON documents, one for each row of data in the gzipped CSV file. Each JSON document has a URI based on the value of the “id” column in the row used to construct the document.
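To make the URI construction concrete, the sketch below mimics in a POSIX shell how a template such as "/employee/{id}.json" maps a row's "id" value to a document URI. It only illustrates the substitution and is not how Flux implements it; the id value 42 is a hypothetical sample.

```shell
# The template passed to --uri-template in the import command.
URI_TEMPLATE="/employee/{id}.json"

# Hypothetical "id" column value from one CSV row.
ROW_ID="42"

# Replace the {id} placeholder with the row's value.
DOCUMENT_URI=$(printf '%s' "$URI_TEMPLATE" | sed "s/{id}/${ROW_ID}/")

echo "$DOCUMENT_URI"
```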
Importing via JDBC
The import-jdbc command in Flux supports reading rows from any database with a supported JDBC driver. Similar to other tools that support JDBC access, you must first add your database’s JDBC driver to the Flux classpath by adding the JDBC driver jar to the ./ext directory in the Flux installation.
The following shows a notional example of reading rows from a Postgres database (this example will not work as-is because it requires a separate Postgres database; it is only included for reference):
- Unix:
  ./bin/flux import-jdbc \
      --jdbc-url "jdbc:postgresql://localhost/dvdrental?user=postgres&password=postgres" \
      --jdbc-driver "org.postgresql.Driver" \
      --query "select * from customer" \
      --connection-string "flux-example-user:password@localhost:8004" \
      --permissions flux-example-role,read,flux-example-role,update \
      --collections customer
- Windows:
  bin\flux import-jdbc ^
      --jdbc-url "jdbc:postgresql://localhost/dvdrental?user=postgres&password=postgres" ^
      --jdbc-driver "org.postgresql.Driver" ^
      --query "select * from customer" ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --permissions flux-example-role,read,flux-example-role,update ^
      --collections customer
See the Import guide for further details, including how you can aggregate rows together via a SQL join, thus producing hierarchical documents with nested data structures.
Exporting data
Flux supports several commands for exporting data from MarkLogic, either as documents or rows, to a variety of destinations. Commands that export documents support a variety of queries, while commands that export rows use the MarkLogic Optic API to select rows. The following shows an example of exporting the 1000 employee documents to a single ZIP file:
- Unix:
  ./bin/flux export-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --collections employee \
      --path export \
      --compression zip \
      --zip-file-count 1
- Windows:
  bin\flux export-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --collections employee ^
      --path export ^
      --compression zip ^
      --zip-file-count 1
The above command specifies a collection of documents to export. You can also use the --query option to specify a structured query, serialized CTS query, or complex query, either as JSON or XML. You can also use --string-query to leverage MarkLogic’s search grammar for selecting documents.
The following command shows a collection, a string query, and a structured query used together, resulting in 4 JSON documents being written to ./export/employee:
- Unix:
  ./bin/flux export-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --collections employee \
      --string-query Engineering \
      --query '{"query": {"value-query": {"json-property": "job_title", "text": "Junior Executive"}}}' \
      --path export \
      --pretty-print
- Windows:
  bin\flux export-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --collections employee ^
      --string-query Engineering ^
      --query "{\"query\": {\"value-query\": {\"json-property\": \"job_title\", \"text\": \"Junior Executive\"}}}" ^
      --path export ^
      --pretty-print
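Quoting a JSON query inline can become error-prone as the query grows. One possible approach on Unix, sketched here under the assumption of a POSIX shell, is to read the query into a variable from a quoted here-document and pass that variable to --query. The QUERY variable name is hypothetical; the options are the same ones used in the command above.

```shell
# Read the structured query from a quoted here-document so the shell does not
# interpret any characters inside the JSON.
QUERY=$(cat <<'EOF'
{"query": {"value-query": {"json-property": "job_title", "text": "Junior Executive"}}}
EOF
)

# Run the export only when the Flux executable is present in the current directory.
if [ -x ./bin/flux ]; then
  ./bin/flux export-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --string-query Engineering \
    --query "$QUERY" \
    --path export \
    --pretty-print
fi
```

The double quotes around "$QUERY" keep the JSON intact as a single argument.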
See the Export guide for more information.
Exporting to S3
Flux allows for data to be exported to S3, with the same approach working for importing data as well. You can reference an S3 bucket path via the s3a:// prefix. The --s3-add-credentials option will then use the AWS SDK to access your AWS credentials; please see the AWS documentation for information on how to configure your credentials.
The following shows an example of exporting to S3 with a fictitious bucket name. You can use this with your own S3 bucket, ensuring that your AWS credentials give you permission to write to the bucket:
- Unix:
  ./bin/flux export-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --collections employee \
      --compression zip \
      --zip-file-count 1 \
      --path s3a://bucket-name-changeme/ \
      --s3-add-credentials
- Windows:
  bin\flux export-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --collections employee ^
      --compression zip ^
      --zip-file-count 1 ^
      --path s3a://bucket-name-changeme/ ^
      --s3-add-credentials
Exporting rows
Flux allows for exporting rows from MarkLogic via an Optic query and writing the data to a variety of row-oriented destinations, such as Parquet files or an RDBMS. The following demonstrates writing rows to Parquet files:
- Unix:
  ./bin/flux export-parquet-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --path export/parquet \
      --query "op.fromView('example', 'employees', '')"
- Windows:
  bin\flux export-parquet-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --path export\parquet ^
      --query "op.fromView('example', 'employees', '')"
You can also export rows via JDBC. Like the example above for importing via JDBC, this is a notional example only. Change the details in it to match your database and JDBC driver, ensuring that the JDBC driver jar is in the ./ext directory of your Flux installation:
- Unix:
  ./bin/flux export-jdbc \
      --connection-string "flux-example-user:password@localhost:8004" \
      --query "op.fromView('example', 'employees', '')" \
      --jdbc-url "jdbc:postgresql://localhost/postgres?user=postgres&password=postgres" \
      --jdbc-driver "org.postgresql.Driver" \
      --table employees \
      --mode overwrite
- Windows:
  bin\flux export-jdbc ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --query "op.fromView('example', 'employees', '')" ^
      --jdbc-url "jdbc:postgresql://localhost/postgres?user=postgres&password=postgres" ^
      --jdbc-driver "org.postgresql.Driver" ^
      --table employees ^
      --mode overwrite
Previewing commands
For many data movement use cases, it can be helpful to preview the data read from a particular source before any processing occurs to write that data to a destination. Flux supports this via a --preview option that accepts the number of records to read and display, without writing the data anywhere. The following example shows how an export command can preview 10 rows read from MarkLogic without writing any data to files:
- Unix:
  ./bin/flux export-parquet-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --query "op.fromView('example', 'employees')" \
      --path export/parquet \
      --preview 10
- Windows:
  bin\flux export-parquet-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --query "op.fromView('example', 'employees')" ^
      --path export\parquet ^
      --preview 10
See the Common Options guide for more information.
Reprocessing data
The reprocess command in Flux allows for custom code - either JavaScript or XQuery - to be executed for selecting and processing data in MarkLogic. The following shows an example of adding a new collection to each of the employee documents:
- Unix:
  ./bin/flux reprocess \
      --connection-string "flux-example-user:password@localhost:8004" \
      --read-javascript "cts.uris(null, null, cts.collectionQuery('employee'))" \
      --write-javascript "declareUpdate(); xdmp.documentAddCollections(URI, 'reprocessed')"
- Windows:
  bin\flux reprocess ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --read-javascript "cts.uris(null, null, cts.collectionQuery('employee'))" ^
      --write-javascript "declareUpdate(); xdmp.documentAddCollections(URI, 'reprocessed')"
In qconsole, you can see that the 1000 employee documents are now also in the reprocessed collection. You can also verify this with Flux and its --count option, which returns a count of all the data read by a command without processing or writing any of the data:
- Unix:
  ./bin/flux export-files \
      --connection-string "flux-example-user:password@localhost:8004" \
      --path export \
      --collections reprocessed \
      --count
- Windows:
  bin\flux export-files ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --path export ^
      --collections reprocessed ^
      --count
For more information, please see the Reprocessing guide.
Copying data
The copy command in Flux is similar to the commands for exporting data, but instead allows you to read documents from one MarkLogic database and write them to another MarkLogic database. When copying, you may want to include different categories of metadata for each document - collections, permissions, quality, properties, and metadata values. This is accomplished via the --categories option, with the default value of content,metadata returning both the document and all of its metadata.
The following shows how to copy the 1000 employee documents to the out-of-the-box Documents database in your MarkLogic instance via the App-Services app server assumed to be listening on port 8000:
- Unix:
  ./bin/flux copy \
      --connection-string "flux-example-user:password@localhost:8004" \
      --collections employee \
      --output-connection-string "flux-example-user:password@localhost:8000"
- Windows:
  bin\flux copy ^
      --connection-string "flux-example-user:password@localhost:8004" ^
      --collections employee ^
      --output-connection-string "flux-example-user:password@localhost:8000"
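The copy command above relies on the default --categories value. As a sketch of narrowing the metadata that is copied, the following Unix variant requests only document content, collections, and permissions. The category names collections and permissions are assumed to match the metadata types named earlier in this section; confirm the accepted values in the Copying guide before relying on them. CATEGORIES is a hypothetical variable name.

```shell
# Assumed category names: content plus two of the metadata types listed above.
CATEGORIES="content,collections,permissions"

# Run the copy only when the Flux executable is present in the current directory.
if [ -x ./bin/flux ]; then
  ./bin/flux copy \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --categories "$CATEGORIES" \
    --output-connection-string "flux-example-user:password@localhost:8000"
fi
```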
For more information, please see the Copying guide.