The custom-export-rows and custom-export-documents commands allow you to read rows and documents respectively from MarkLogic and write the results to a custom target.

Usage

The required --target option accepts the name of any Spark data source or third-party Spark connector. For a third-party Spark connector, you must include the necessary JAR files for the connector in the ./ext directory of your Flux installation. If the connector is not available as a single “uber” jar, ensure that the connector and all of its dependencies are added to the ./ext directory.
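
For example, here is a minimal sketch of installing and using a hypothetical third-party connector; the jar file name and the example-source data source name below are placeholders, not a real connector:

    # Copy the connector's uber jar (or the connector jar plus all of its
    # dependency jars) into the ./ext directory of your Flux installation.
    cp /path/to/example-spark-connector-assembly.jar ./ext/

    # Reference the connector's data source name via --target and pass any
    # connector-specific options via -P.
    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target example-source \
        -PsomeConnectorOption=value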

As an example, Flux does not provide an out-of-the-box command that uses the Spark Text data source. You can use this data source via custom-export-rows:

Unix:

    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target text \
        -Ppath=export

Windows:

    bin\flux custom-export-rows ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --query "op.fromView('schema', 'view')" ^
        --target text ^
        -Ppath=export
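
The -P prefix passes options through to the data source selected via --target. As a further sketch, assuming the same connection details as above, you can pass any other write option supported by the Spark text data source, such as its compression option:

    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target text \
        -Ppath=export \
        -Pcompression=gzip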

Exporting rows

When using custom-export-rows with an Optic query to select rows from MarkLogic, each row sent to the connector or data source defined by --target will have a schema based on the output of the Optic query. You may find the --preview and --preview-schema options helpful in understanding what data will be in these rows. See Common Options for more information.
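
As a sketch, assuming --preview accepts the number of rows to display (see Common Options for the exact behavior), you can inspect the rows produced by the Optic query before writing anything to the target:

    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target text \
        -Ppath=export \
        --preview 10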

Exporting documents

When using custom-export-documents, each document returned by MarkLogic will be represented as a Spark row with the following column definitions:

  1. URI containing a string.
  2. content containing a byte array.
  3. format containing a string.
  4. collections containing an array of strings.
  5. permissions containing a map of strings to arrays of strings, representing role names and their capabilities.
  6. quality containing an integer.
  7. properties containing an XML document serialized to a string.
  8. metadataValues containing a map of string keys and string values.

These are normal Spark rows that can be written via Spark data sources such as Parquet and ORC. If you use a third-party Spark connector, you will likely need to understand how that connector handles rows with the above schema in order to get the results you want.
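
As a minimal sketch, assuming the documents to export are selected by collection via --collections (Flux supports other document selection options as well), the document rows described above could be written to Parquet files in a local export directory:

    ./bin/flux custom-export-documents \
        --connection-string "flux-example-user:password@localhost:8004" \
        --collections example \
        --target parquet \
        -Ppath=export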