The instructions below should be followed before attempting any of the examples in the guides for specific Spark environments, as those examples depend on an application being deployed to MarkLogic.
## Obtaining the connector
The MarkLogic connector can be downloaded from this repository’s Releases page. Each Spark environment should have its own documentation on including third-party connectors; consult that documentation for details.
## Deploy an example application
The connector allows a user to specify an Optic query to select rows to retrieve from MarkLogic. The query depends on a MarkLogic view that projects data from documents in MarkLogic into rows.
To facilitate trying out the connector, perform the following steps to deploy an example application to your MarkLogic server that includes a TDE view and some documents that conform to that view. These instructions depend on using Docker to install and initialize an instance of MarkLogic. If you already have an instance of MarkLogic running, you can skip step 4 below, but ensure that the gradle.properties file in the extracted directory contains valid connection properties for your instance of MarkLogic.
1. From this repository’s Releases page, select the latest release and download the `marklogic-spark-getting-started-2.6.0.zip` file.
2. Extract the contents of the downloaded zip file.
3. Open a terminal window and go to the directory created by extracting the zip file; the directory should be named `marklogic-spark-getting-started-2.6.0`.
4. Run `docker-compose up -d` to start an instance of MarkLogic.
5. Ensure that the `./gradlew` file is executable; depending on your operating system, you may need to run `chmod 755 gradlew` to make the file executable.
6. Run `./gradlew -i mlDeploy` to deploy the example application.
After the deployment finishes, your MarkLogic server will now have the following:
- An app server named `spark-example` listening on port 8003.
- A database named `spark-example-content` that contains 1000 JSON documents in a collection named `employee`.
- A TDE with a schema name of `example` and a view name of `employee`.
- A user named `spark-example-user` with a password of `password` that can be used with the Spark connector and MarkLogic’s qconsole tool.
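The TDE mentioned above is what projects rows from the `employee` documents into the `example.employee` view. The actual template ships inside the downloaded zip file; the sketch below only illustrates the general shape of such a template, and the column name and path in it are hypothetical:

```json
{
  "template": {
    "context": "/",
    "collections": ["employee"],
    "rows": [
      {
        "schemaName": "example",
        "viewName": "employee",
        "columns": [
          {"name": "lastName", "scalarType": "string", "val": "LastName"}
        ]
      }
    ]
  }
}
```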
To verify that your application was deployed correctly, access your MarkLogic server’s qconsole tool at http://localhost:8000/qconsole. Authenticate as the `spark-example-user` user created above, as it’s generally preferable to test as a non-admin user.
After authenticating, perform the following steps:
1. In the “Database” dropdown, select `spark-example-content`.
2. In the “Query Type” dropdown, select `Optic DSL`.
3. Enter the following query into an editor in qconsole: `op.fromView('example', 'employee').limit(10)`.
4. Click the “Run” button. This should display 10 JSON objects, each being a projection of a row from an employee document in the database.
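The same rows can later be read from Spark through the connector. The sketch below shows how the connection details from the example application could be assembled into read options; the `spark.marklogic.*` option names are assumptions based on the connector’s documented option prefix, so verify them against your connector release before use:

```python
# Sketch: connector options for reading the example rows from Spark.
# The option names below are assumptions (spark.marklogic.* prefix);
# check them against your connector version's documentation.
optic_query = "op.fromView('example', 'employee')"

options = {
    "spark.marklogic.client.host": "localhost",
    # The spark-example app server deployed above listens on port 8003.
    "spark.marklogic.client.port": "8003",
    "spark.marklogic.client.username": "spark-example-user",
    "spark.marklogic.client.password": "password",
    "spark.marklogic.read.opticQuery": optic_query,
}

# With a SparkSession named `spark` and the connector jar on the classpath,
# the read would then look like:
# df = spark.read.format("marklogic").options(**options).load()
# df.show(10)
```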