This guide describes how to get started with Flux, including examples that demonstrate its functionality.
Setup
You can download the latest release of the Flux application zip from the latest Flux release page. The Flux application zip is titled `marklogic-flux-1.1.3.zip`. You can extract this zip to any location on your filesystem that you prefer.
Deploying the example application
The examples in this guide, along with examples found throughout this documentation, depend on a small MarkLogic application that can be deployed to your own instance of MarkLogic server. The application can be downloaded from the latest Flux release page in a zip titled `marklogic-flux-getting-started-1.1.3.zip`. To use Flux with this example application, perform the following steps:
1. Extract the `marklogic-flux-getting-started-1.1.3.zip` file to any location on your local filesystem.
2. Run `cd marklogic-flux-getting-started-1.1.3` to change to the directory created by extracting the zip file.
3. Create a file named `gradle-local.properties` and add `mlPassword=your MarkLogic admin user password` to it; see the example after this list.
4. Examine the contents of the `gradle.properties` file to ensure that the value of `mlHost` points to your MarkLogic server and that the value of `mlRestPort` is a port available for a new MarkLogic app server to use.
5. Run `./gradlew -i mlDeploy` to deploy the example application.
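For example, a minimal `gradle-local.properties` file (the value shown is a placeholder; use your own admin password):

```
mlPassword=changeme
```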
The example application consists of a REST API app server on port 8004 in your MarkLogic installation. The application also includes a “flux-example-user” MarkLogic user that has the necessary MarkLogic roles and privileges for running the examples in this guide. Finally, the application includes a MarkLogic TDE template that creates a view in MarkLogic for the purpose of demonstrating commands that utilize a MarkLogic Optic query.
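Once deployed, you can sanity-check the view from qconsole. A minimal sketch, assuming the `Example` schema and `Employees` view names used by the Optic queries later in this guide:

```javascript
'use strict';
// Run in qconsole against the flux-example-content database.
const op = require('/MarkLogic/optic');
// Returns rows from the view created by the TDE template; empty until data is imported.
op.fromView('Example', 'Employees').limit(5).result();
```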
It is recommended to extract the Flux application zip into the `marklogic-flux-getting-started-1.1.3` directory so that you can easily execute the examples in this guide. After extracting the application zip, the directory should have a structure similar to this (not all files may be shown):
```
./marklogic-flux-getting-started-1.1.3
    build.gradle
    ./data
    ./marklogic-flux-1.1.3
    ./gradle
    gradle.properties
    gradlew
    gradlew.bat
    ./src
```
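One way to arrive at this layout, assuming both release zips were downloaded to the same directory (paths are illustrative):

```
unzip marklogic-flux-getting-started-1.1.3.zip
cd marklogic-flux-getting-started-1.1.3
unzip ../marklogic-flux-1.1.3.zip
```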
Usage
You can run Flux without any options to see the list of available commands. If you are using Flux to run these examples, first change your current directory to where you extracted Flux:

```
cd marklogic-flux-1.1.3
```
And then run the Flux executable without any options:

Unix:

```
./bin/flux
```

Windows:

```
bin\flux
```
As shown in the usage, every command is invoked by specifying its name and one or more options required to run the command. To see the usage for a particular command, such as `import-files`, run:
Unix:

```
./bin/flux help import-files
```

Windows:

```
bin\flux help import-files
```
Required options are marked with an asterisk - “*”. Additionally, every command requires that either `--connection-string` or both `--host` and `--port` be specified so that Flux knows which MarkLogic cluster to connect to.
The `--connection-string` option provides a succinct way of defining the host, port, username, and password when the MarkLogic app server you connect to requires basic or digest authentication. Its value is of the form `(user):(password)@(host):(port)/(optionalDatabaseName)`. For example:
Unix:

```
./bin/flux import-files --connection-string "my-user:my-secret@localhost:8000" ...
```

Windows:

```
bin\flux import-files --connection-string "my-user:my-secret@localhost:8000" ...
```
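If you would rather not embed credentials in a single string, you can sketch the same connection with individual options; this assumes the `--username` and `--password` options described in the Common Options guide:

```
./bin/flux import-files --host localhost --port 8000 --username my-user --password my-secret ...
```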
Options can also be read from a file; see the Common Options guide for more information.
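As a sketch of that approach - assuming the `@` file-reference syntax covered in that guide - you could keep connection options in a hypothetical `options.txt` file:

```
--connection-string "flux-example-user:password@localhost:8004"
```

and reference it on the command line:

```
./bin/flux import-files @options.txt ...
```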
Importing data
Flux allows for data to be imported from a variety of data sources, such as a local filesystem, S3, or any database accessible via a JDBC driver. The example project contains a gzipped CSV file generated via Mockaroo. Run the command below to load this file; the loaded data will then be used to demonstrate other Flux capabilities:
Unix:

```
./bin/flux import-delimited-files \
    --path ../data/employees.csv.gz \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update \
    --collections employee \
    --uri-template "/employee/{id}.json"
```

Windows:

```
bin\flux import-delimited-files ^
    --path ..\data\employees.csv.gz ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update ^
    --collections employee ^
    --uri-template "/employee/{id}.json"
```
By accessing your MarkLogic qconsole, you can see that the `employee` collection in the `flux-example-content` database now has 1000 JSON documents, one for each line in the gzipped CSV file. Each JSON document has a URI based on the value of the “id” column in the row used to construct the document.
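A document produced by this import looks roughly like the following; apart from `id` and `job_title`, which appear elsewhere in this guide, the field names and values are illustrative:

```json
{
  "id": 1,
  "first_name": "...",
  "last_name": "...",
  "job_title": "Junior Executive",
  "department": "Engineering"
}
```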
Importing via JDBC
The `import-jdbc` command in Flux supports reading rows from any database with a supported JDBC driver. Similar to other tools that support JDBC access, you must first add your database’s JDBC driver to the Flux classpath by adding the JDBC driver jar to the `./ext` directory in the Flux installation.
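For example, for the Postgres driver used below (the jar path and version are illustrative):

```
cp ~/Downloads/postgresql-42.7.3.jar ./ext/
```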
The following shows a notional example of reading rows from a Postgres database (this example will not work as it requires a separate Postgres database; it is only included for reference):
Unix:

```
./bin/flux import-jdbc \
    --jdbc-url "jdbc:postgresql://localhost/dvdrental?user=postgres&password=postgres" \
    --jdbc-driver "org.postgresql.Driver" \
    --query "select * from customer" \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update \
    --collections customer
```

Windows:

```
bin\flux import-jdbc ^
    --jdbc-url "jdbc:postgresql://localhost/dvdrental?user=postgres&password=postgres" ^
    --jdbc-driver "org.postgresql.Driver" ^
    --query "select * from customer" ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update ^
    --collections customer
```
See the Import guide for further details, including how you can aggregate rows together via a SQL join, thus producing hierarchical documents with nested data structures.
Exporting data
Flux supports several commands for exporting data from MarkLogic, either as documents or rows, to a variety of destinations. Commands that export documents support a variety of queries, while commands that export rows use the MarkLogic Optic API to select rows. The following shows an example of exporting the 1000 employee documents to a single ZIP file:
Unix:

```
mkdir export
./bin/flux export-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --path export \
    --compression zip \
    --zip-file-count 1
```

Windows:

```
mkdir export
bin\flux export-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --collections employee ^
    --path export ^
    --compression zip ^
    --zip-file-count 1
```
The above command specifies a collection of documents to export. You can also use the `--query` option to specify a structured query, serialized CTS query, or combined query, either as JSON or XML. You can also use `--string-query` to leverage MarkLogic’s search grammar for selecting documents.
The following command shows a collection, a string query, and a structured query used together, resulting in 4 JSON documents being written to `./export/employee`:
Unix:

```
./bin/flux export-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --string-query Engineering \
    --query '{"query": {"value-query": {"json-property": "job_title", "text": "Junior Executive"}}}' \
    --path export \
    --pretty-print
```

Windows:

```
bin\flux export-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --collections employee ^
    --string-query Engineering ^
    --query "{\"query\": {\"value-query\": {\"json-property\": \"job_title\", \"text\": \"Junior Executive\"}}}" ^
    --path export ^
    --pretty-print
```
See the Export guide for more information.
Exporting to S3
Flux allows for data to be exported to S3, with the same approach working for importing data as well. You can reference an S3 bucket path via the `s3a://` prefix. The `--s3-add-credentials` option will then use the AWS SDK to access your AWS credentials; please see the AWS documentation for information on how to configure your credentials.
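One common setup is a standard AWS credentials file, which the AWS SDK's default credential chain discovers automatically; the key values below are placeholders:

```
# ~/.aws/credentials
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```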
The following shows an example of exporting to S3 with a fictitious bucket name. You can use this with your own S3 bucket, ensuring that your AWS credentials allow you to write to the bucket:
Unix:

```
./bin/flux export-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --compression zip \
    --zip-file-count 1 \
    --path s3a://bucket-name-changeme/ \
    --s3-add-credentials
```

Windows:

```
bin\flux export-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --collections employee ^
    --compression zip ^
    --zip-file-count 1 ^
    --path s3a://bucket-name-changeme/ ^
    --s3-add-credentials
```
Exporting rows
Flux allows for exporting rows from MarkLogic via an Optic query and writing the data to a variety of row-oriented destinations, such as Parquet files or an RDBMS. The following demonstrates writing rows to Parquet files:
Unix:

```
mkdir export/parquet
./bin/flux export-parquet-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --path export/parquet \
    --query "op.fromView('Example', 'Employees', '')"
```

Windows:

```
mkdir export\parquet
bin\flux export-parquet-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --path export/parquet ^
    --query "op.fromView('Example', 'Employees', '')"
```
You can also export rows via JDBC. Like the example above for importing via JDBC, this is a notional example only. Change the details in it to match your database and JDBC driver, ensuring that the JDBC driver jar is in the `./ext` directory of your Flux installation:
Unix:

```
./bin/flux export-jdbc \
    --connection-string "flux-example-user:password@localhost:8004" \
    --query "op.fromView('Example', 'Employees', '')" \
    --jdbc-url "jdbc:postgresql://localhost/postgres?user=postgres&password=postgres" \
    --jdbc-driver "org.postgresql.Driver" \
    --table employees \
    --mode overwrite
```

Windows:

```
bin\flux export-jdbc ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --query "op.fromView('Example', 'Employees', '')" ^
    --jdbc-url "jdbc:postgresql://localhost/postgres?user=postgres&password=postgres" ^
    --jdbc-driver "org.postgresql.Driver" ^
    --table employees ^
    --mode overwrite
```
Previewing commands
For many data movement use cases, it can be helpful to see a preview of the data read from a particular source before any processing occurs to write that data to a destination. Flux supports this via a `--preview` option that accepts the number of records to read and display without writing the data anywhere. The following example shows how an export command can preview 10 rows read from MarkLogic without writing any data to files:
Unix:

```
./bin/flux export-parquet-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --query "op.fromView('Example', 'Employees')" \
    --path export/parquet \
    --preview 10
```

Windows:

```
bin\flux export-parquet-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --query "op.fromView('Example', 'Employees')" ^
    --path export/parquet ^
    --preview 10
```
See the Common Options guide for more information.
Reprocessing data
The `reprocess` command in Flux allows for custom code - either JavaScript or XQuery - to be executed for selecting and processing data in MarkLogic. The following shows an example of adding a new collection to each of the employee documents:
Unix:

```
./bin/flux reprocess \
    --connection-string "flux-example-user:password@localhost:8004" \
    --read-javascript "cts.uris(null, null, cts.collectionQuery('employee'))" \
    --write-javascript "declareUpdate(); xdmp.documentAddCollections(URI, 'reprocessed')"
```

Windows:

```
bin\flux reprocess ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --read-javascript "cts.uris(null, null, cts.collectionQuery('employee'))" ^
    --write-javascript "declareUpdate(); xdmp.documentAddCollections(URI, 'reprocessed')"
```
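A quick way to confirm the result is a qconsole query using standard MarkLogic APIs; a minimal sketch:

```javascript
'use strict';
// Counts documents in the 'reprocessed' collection; expect 1000 after the command above.
cts.estimate(cts.collectionQuery('reprocessed'));
```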
In qconsole, you can see that the 1000 employee documents are now also in the `reprocessed` collection. You can also use Flux and its `--count` option, which allows you to get a count of all the data read by a command without processing or writing any of it:
Unix:

```
./bin/flux export-files \
    --connection-string "flux-example-user:password@localhost:8004" \
    --path export \
    --collections reprocessed \
    --count
```

Windows:

```
bin\flux export-files ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --path export ^
    --collections reprocessed ^
    --count
```
For more information, please see the Reprocessing guide.
Copying data
The `copy` command in Flux is similar to the commands for exporting data, but instead allows you to read documents from one MarkLogic database and write them to another MarkLogic database. When copying, you may want to include different categories of metadata for each document - collections, permissions, quality, properties, and metadata values. This is accomplished via the `--categories` option, whose default value of `content,metadata` returns both the document and all of its metadata.
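For example, to copy documents along with only their collections and permissions, a sketch (assuming `collections` and `permissions` are accepted category names, per the Copying guide):

```
./bin/flux copy \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --categories content,collections,permissions \
    --output-connection-string "flux-example-user:password@localhost:8000"
```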
The following shows how to copy the 1000 employee documents to the out-of-the-box Documents database in your MarkLogic instance via the App-Services app server assumed to be listening on port 8000:
Unix:

```
./bin/flux copy \
    --connection-string "flux-example-user:password@localhost:8004" \
    --collections employee \
    --output-connection-string "flux-example-user:password@localhost:8000"
```

Windows:

```
bin\flux copy ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --collections employee ^
    --output-connection-string "flux-example-user:password@localhost:8000"
```
For more information, please see the Copying guide.