The custom-export-rows and custom-export-documents commands allow you to read rows and documents respectively from MarkLogic and write the results to a custom target.


Usage

The required --target option accepts any Spark data source or the name of a third-party Spark connector. For a third-party Spark connector, you must include the necessary JAR files for the connector in the ./ext directory of your Flux installation. If the connector is not packaged as a single "uber" jar, you must add the connector and all of its dependencies to the ./ext directory.
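
For example, to make a hypothetical third-party connector available to Flux, you might copy its jar and the jars of its dependencies into the ./ext directory; the file names below are illustrative only:

    # Copy the connector jar and its dependency jars into the Flux ./ext directory.
    cp my-connector.jar my-connector-dependency.jar ./ext/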

As an example, Flux does not provide an out-of-the-box command that uses the Spark Text data source. You can use this data source via custom-export-rows:

Unix:

    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target text \
        --spark-prop path=export

Windows:

    bin\flux custom-export-rows ^
        --connection-string "flux-example-user:password@localhost:8004" ^
        --query "op.fromView('schema', 'view')" ^
        --target text ^
        --spark-prop path=export

Exporting rows

When using custom-export-rows with an Optic query to select rows from MarkLogic, each row sent to the connector or data source defined by --target will have a schema based on the output of the Optic query. You may find the --preview and --preview-schema options helpful in understanding what data will be in these rows. See Common Options for more information.
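
For example, you might run the earlier command with --preview-schema before performing a full export, in order to see the schema Flux will produce for the Optic query. This sketch assumes --preview-schema can be added on its own; see Common Options for the exact behavior of the preview options:

    ./bin/flux custom-export-rows \
        --connection-string "flux-example-user:password@localhost:8004" \
        --query "op.fromView('schema', 'view')" \
        --target text \
        --spark-prop path=export \
        --preview-schema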

Exporting documents

When using custom-export-documents, each document returned by MarkLogic will be represented as a Spark row with the following column definitions:

  1. URI containing a string.
  2. content containing a byte array.
  3. format containing a string.
  4. collections containing an array of strings.
  5. permissions containing a map of strings to arrays of strings, with each key being a role name and each value listing that role's capabilities.
  6. quality containing an integer.
  7. properties containing an XML document serialized to a string.
  8. metadataValues containing a map of string keys and string values.

These are normal Spark rows that can be written via Spark data sources such as Parquet and ORC. If you use a third-party Spark connector, you will likely need to understand how that connector handles rows with the above schema in order to get the results you want.
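
As a sketch, the following command writes documents to Parquet files via the built-in Spark Parquet data source. The --collections option and the "example" collection name are assumptions used here to select documents; adjust the selection options to fit how you normally choose documents with Flux export commands:

    ./bin/flux custom-export-documents \
        --connection-string "flux-example-user:password@localhost:8004" \
        --collections example \
        --target parquet \
        --spark-prop path=export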

