Flux supports importing data from any database that offers a JDBC driver. You can select rows from a table or via a SQL query, and each row will be written as a new document in MarkLogic. Additionally, you can aggregate related rows together to form hierarchical documents that are written to MarkLogic.


JDBC driver installation

To import data from a database, you must obtain the database’s JDBC driver JAR file and add it to the ./ext directory location in the Flux installation directory. Any JAR file placed in the ./ext directory is added to the classpath of Flux.
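As a sketch, assuming you are importing from PostgreSQL, have downloaded its driver JAR, and installed Flux at /opt/flux (the driver filename, version, and install path will vary):

```shell
# Copy the JDBC driver JAR into Flux's ./ext directory so it is added to the classpath.
# The PostgreSQL driver shown here is only an example; use the driver for your database.
cp ~/Downloads/postgresql-42.7.3.jar /opt/flux/ext/
```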

Configuring a JDBC connection

The import-jdbc command requires that you specify connection details for the database you wish to read from via JDBC. Connection details are specified via the following options:

  • --jdbc-url is required and specifies the JDBC connection URL.
  • --jdbc-driver is required and specifies the main class name of the JDBC driver.
  • --jdbc-user specifies an optional user to authenticate as (this may already be specified via --jdbc-url).
  • --jdbc-password specifies an optional password to authenticate with (this may already be specified via --jdbc-url).
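As an illustration, the four options might look like the following for a hypothetical PostgreSQL database; the URL format and driver class name vary by database vendor:

```shell
# Example JDBC connection options; the database name, host, port, and
# credentials shown here are placeholders.
./bin/flux import-jdbc \
    --jdbc-url "jdbc:postgresql://localhost:5432/mydb" \
    --jdbc-driver "org.postgresql.Driver" \
    --jdbc-user "pguser" \
    --jdbc-password "pgpassword" \
    etc...
```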

Importing data

To import all rows in a table, use the --query option with a SQL query selecting all rows:

  • ./bin/flux import-jdbc \
        --query "SELECT * FROM customer" \
        --jdbc-url "..." \
        --jdbc-driver "..." \
        --connection-string "flux-example-user:password@localhost:8004" \
        --permissions flux-example-role,read,flux-example-role,update
    
  • bin\flux import-jdbc ^
        --query "SELECT * FROM customer" ^
        --jdbc-url "..." ^
        --jdbc-driver "..." ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --permissions flux-example-role,read,flux-example-role,update
    

The SQL query can contain any syntax supported by your database.
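Because the query is executed by your database, you are not limited to selecting all rows; a hypothetical example that filters and projects columns (the table and column names are illustrative):

```shell
# Import only active customers, selecting a subset of columns.
./bin/flux import-jdbc \
    --query "SELECT id, name, email FROM customer WHERE active = true" \
    --jdbc-url "..." \
    --jdbc-driver "..." \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
```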

Specifying a JSON root name

By default, each column in a row will become a top-level field in the JSON document written to MarkLogic. It is often useful to have a single “root” field in a JSON document so that it is more self-describing; a root field can also help with indexing in MarkLogic. To include a JSON root field, use the --json-root-name option with a value for the name of the root field. The data read from a row will then be nested under this root field.
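For example (the root field name "customer" is illustrative):

```shell
# With --json-root-name customer, a row such as {"id": 1, "name": "Jane"}
# is written as {"customer": {"id": 1, "name": "Jane"}} instead.
./bin/flux import-jdbc \
    --query "SELECT * FROM customer" \
    --json-root-name customer \
    etc...
```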

Creating XML documents

To create an XML document for each row instead of a JSON document, include the --xml-root-name option to specify the name of the root element in each XML document. You can optionally include --xml-namespace to specify a namespace for the root element that will then be inherited by every child element as well.
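A sketch combining both options; the root element name and namespace URI are illustrative:

```shell
# Writes each row as an XML document with a root element named "customer"
# in the given namespace, which the child elements inherit.
./bin/flux import-jdbc \
    --query "SELECT * FROM customer" \
    --xml-root-name customer \
    --xml-namespace "org:example:customer" \
    etc...
```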

Ignoring null fields

By default, Flux will include any fields in a data source that have a null value when creating JSON or XML documents. You can instead ignore fields with a null value via the --ignore-null-fields option:

  • ./bin/flux import-jdbc \
        --ignore-null-fields \
        --query "..." \
        etc...
    
  • bin\flux import-jdbc ^
        --ignore-null-fields ^
        --query "..." ^
        etc...
    

The decision on whether to include null fields will depend on your application requirements. For example, if your documents have large numbers of null fields, you may find them to be noise and decide to ignore them. In another case, it may be important to query for documents that have a particular field with a value of null.
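As an illustration, consider a row whose email column is null (the column names are hypothetical):

```shell
# Without --ignore-null-fields: {"id": 1, "name": "Jane", "email": null}
# With --ignore-null-fields:    {"id": 1, "name": "Jane"}
./bin/flux import-jdbc \
    --ignore-null-fields \
    --query "SELECT id, name, email FROM customer" \
    etc...
```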

Aggregating rows

The import-jdbc command supports aggregating related rows together to produce hierarchical documents. See Aggregating rows for more information.

Advanced options

The import-jdbc command reuses Spark’s support for reading via a JDBC driver. You can include any of the Spark JDBC options via the -P option to control how JDBC is used. These options are expressed as -PoptionName=optionValue.
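For example, Spark's JDBC data source supports a fetchsize option that controls how many rows are retrieved per round trip to the database; a sketch (the value shown is illustrative, and the right setting depends on your database and data):

```shell
# fetchsize is a standard Spark JDBC option; passing it via -P tells the
# driver to read rows from the database in batches of 1000.
./bin/flux import-jdbc \
    --query "SELECT * FROM customer" \
    -Pfetchsize=1000 \
    etc...
```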