Flux supports importing data from any database that offers a JDBC driver. You can select rows from a table or via a SQL query, and each row will be written as a new document in MarkLogic. Additionally, you can aggregate related rows together to form hierarchical documents that are written to MarkLogic.
## Table of contents
- JDBC driver installation
- Configuring a JDBC connection
- Importing data
- Specifying a JSON root name
- Creating XML documents
- Ignoring null fields
- Aggregating rows
- Generating a TDE template
- Advanced options
## JDBC driver installation

To import data from a database, you must obtain the database’s JDBC driver JAR file and add it to the `./ext` directory in the Flux installation directory. Any JAR file placed in the `./ext` directory is added to the classpath of Flux.
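For example, if you are reading from PostgreSQL and Flux is installed at `/opt/flux` (both the driver version and the installation path below are hypothetical), you would copy the driver JAR as follows:

```
# Hypothetical driver JAR and installation path; use your database's
# driver and your actual Flux location.
cp postgresql-42.7.3.jar /opt/flux/ext/
```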
## Configuring a JDBC connection

The `import-jdbc` command requires that you specify connection details for the database you wish to read from via JDBC. Connection details are specified via the following options:

- `--jdbc-url` is required and specifies the JDBC connection URL.
- `--jdbc-driver` is required and specifies the main class name of the JDBC driver.
- `--jdbc-user` specifies an optional user to authenticate as (this may already be specified via `--jdbc-url`).
- `--jdbc-password` specifies an optional password to authenticate with (this may already be specified via `--jdbc-url`).
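For example, the following sketch connects to a PostgreSQL database; the URL, driver class, and credentials are hypothetical placeholders to adjust for your database:

```
# Hypothetical PostgreSQL connection details.
./bin/flux import-jdbc \
    --query "SELECT * FROM customer" \
    --jdbc-url "jdbc:postgresql://localhost:5432/example" \
    --jdbc-driver "org.postgresql.Driver" \
    --jdbc-user "example-user" \
    --jdbc-password "example-password" \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
```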
## Importing data

To select data to import, use either the `--query` option with a SQL query, or the `--table` option with a table name. Support for the `--table` option was added in Flux 1.4.0.

The first example below shows usage of the `--query` option, which accepts any valid SQL query that your database supports:
```
./bin/flux import-jdbc \
    --query "SELECT * FROM customer" \
    --jdbc-url "..." \
    --jdbc-driver "..." \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
```

```
bin\flux import-jdbc ^
    --query "SELECT * FROM customer" ^
    --jdbc-url "..." ^
    --jdbc-driver "..." ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update
```
The next example shows usage of the `--table` option, which will select all rows from the specified table:
```
./bin/flux import-jdbc \
    --table "customer" \
    --jdbc-url "..." \
    --jdbc-driver "..." \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
```

```
bin\flux import-jdbc ^
    --table "customer" ^
    --jdbc-url "..." ^
    --jdbc-driver "..." ^
    --connection-string "flux-example-user:password@localhost:8004" ^
    --permissions flux-example-role,read,flux-example-role,update
```
The `--table` option maps to the `dbtable` option for the Spark JDBC data source. Per the Spark documentation, in addition to a table name, you can also specify “anything that is valid in the `FROM` clause of a SQL query… such as a subquery in parentheses”.
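For example, a parenthesized subquery can be passed as the table value; the table, columns, and alias below are hypothetical:

```
# Hypothetical subquery in place of a table name; most databases
# require an alias on a derived table.
./bin/flux import-jdbc \
    --table "(SELECT id, name FROM customer WHERE country = 'US') AS us_customers" \
    --jdbc-url "..." \
    --jdbc-driver "..." \
    --connection-string "flux-example-user:password@localhost:8004" \
    --permissions flux-example-role,read,flux-example-role,update
```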
## Specifying a JSON root name

By default, each column in a row becomes a top-level field in the JSON document written to MarkLogic. It is often useful to have a single “root” field in a JSON document so that it is more self-describing; this can help with indexing in MarkLogic as well. To include a JSON root field, use the `--json-root-name` option with a value for the name of the root field. The data read from a row will then be nested under this root field.
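For example, with a hypothetical root name of `customer`, a row such as `{"id": 1, "name": "Jane"}` would be written as `{"customer": {"id": 1, "name": "Jane"}}`:

```
./bin/flux import-jdbc \
    --json-root-name "customer" \
    --query "..." \
    etc...
```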
Creating XML documents
To create an XML document for each row instead of a JSON document, include the --xml-root-name
option to specify the name of the root element in each XML document. You can optionally include --xml-namespace
to specify a namespace for the root element that will then be inherited by every child element as well.
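For example, the following sketch (the root name and namespace are hypothetical) produces documents with a root element of `customer` in the `org:example` namespace:

```
./bin/flux import-jdbc \
    --xml-root-name "customer" \
    --xml-namespace "org:example" \
    --query "..." \
    etc...
```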
## Ignoring null fields

By default, Flux will include any fields in a data source that have a null value when creating JSON or XML documents. You can instead ignore fields with a null value via the `--ignore-null-fields` option:
```
./bin/flux import-jdbc \
    --ignore-null-fields \
    --query "..." \
    etc...
```

```
bin\flux import-jdbc ^
    --ignore-null-fields ^
    --query "..." ^
    etc...
```
The decision on whether to include null fields will depend on your application requirements. For example, if your documents have large numbers of null fields, you may find them to be noise and decide to ignore them. In another case, it may be important to query for documents that have a particular field with a value of null.
## Aggregating rows

The `import-jdbc` command supports aggregating related rows together to produce hierarchical documents. See Aggregating rows for more information.
## Generating a TDE template

The `import-jdbc` command supports generating TDE templates to make imported data immediately available for relational queries. See TDE template generation for more information.
## Advanced options

The `import-jdbc` command reuses Spark’s support for reading via a JDBC driver. You can include any of the Spark JDBC options via the `-P` option to control how JDBC is used. These options are expressed as `-PoptionName=optionValue`.
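For example, `fetchsize` is a standard Spark JDBC option that controls how many rows are fetched per database round trip; the value below is arbitrary:

```
./bin/flux import-jdbc \
    --query "..." \
    -Pfetchsize=100 \
    etc...
```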