The instructions below should be followed before attempting any of the examples in the guides for specific Spark environments, as those examples depend on an application being deployed to MarkLogic.
Obtaining the connector
The MarkLogic connector can be downloaded from this repository’s Releases page. To include it, consult your Spark environment’s documentation on adding third-party connectors.
Deploy an example application
The connector allows a user to specify an Optic query to select rows to retrieve from MarkLogic. The query depends on a MarkLogic view that projects data from documents in MarkLogic into rows.
To try out the connector, perform the following steps to deploy an example application to your MarkLogic server that includes a TDE view and some documents that conform to that view. These instructions depend on using Docker to install and initialize an instance of MarkLogic. If you already have an instance of MarkLogic running, you can skip the docker-compose step below, but ensure that the gradle.properties file in the extracted directory contains valid connection properties for your instance of MarkLogic.
- From this repository’s Releases page, select the latest release and download the marklogic-spark-getting-started-2.4.2.zip file.
- Extract the contents of the downloaded zip file.
- Open a terminal window and go to the directory created by extracting the zip file; the directory should have a name of “marklogic-spark-getting-started-2.4.2”.
- Run docker-compose up -d to start an instance of MarkLogic.
- Ensure that the ./gradlew file is executable; depending on your operating system, you may need to run chmod 755 gradlew to make the file executable.
- Run ./gradlew -i mlDeploy to deploy the example application.
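The terminal steps above can also be scripted. The following is a minimal sketch, not part of the example project: DEPLOY_COMMANDS and run_deploy are hypothetical names, and the commands are copied from the steps above. It assumes the zip file has already been extracted and that docker-compose is on your PATH.

```python
import subprocess
from pathlib import Path

# Commands taken from the steps above, in the order they are run.
DEPLOY_COMMANDS = [
    ["docker-compose", "up", "-d"],   # start an instance of MarkLogic in Docker
    ["chmod", "755", "gradlew"],      # make the Gradle wrapper executable
    ["./gradlew", "-i", "mlDeploy"],  # deploy the example application
]


def run_deploy(project_dir, runner=subprocess.run):
    """Run each command inside project_dir, stopping at the first failure.

    `runner` defaults to subprocess.run; it is a parameter only so the
    sequence can be checked without actually starting Docker.
    """
    for command in DEPLOY_COMMANDS:
        # check=True raises CalledProcessError if a command exits non-zero.
        runner(command, cwd=Path(project_dir), check=True)
```

Calling run_deploy("marklogic-spark-getting-started-2.4.2") from the directory that contains the extracted folder runs the three commands in sequence. If you already have an instance of MarkLogic running, drop the first command from the list.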
After the deployment finishes, your MarkLogic server will have the following:
- An app server named spark-example listening on port 8003.
- A database named spark-example-content that contains 1000 JSON documents in a collection named employee.
- A TDE with a schema name of example and a view name of employee.
- A user named spark-example-user with a password of password that can be used with the Spark connector and MarkLogic’s qconsole tool.
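The values listed above map onto connector options roughly as follows. This is a sketch under assumptions: it assumes the connector is registered under the format name marklogic and reads the option names spark.marklogic.client.uri and spark.marklogic.read.opticQuery, so confirm these against the connector documentation for your release. connector_options and read_employees are hypothetical helper names, and spark is expected to be a SparkSession that already has the connector available.

```python
def connector_options(user, password, host, port, optic_query):
    """Build the option map for a read.

    The option names are assumptions about the connector, not taken
    from the text above.
    """
    return {
        "spark.marklogic.client.uri": f"{user}:{password}@{host}:{port}",
        "spark.marklogic.read.opticQuery": optic_query,
    }


def read_employees(spark):
    """Return a DataFrame of rows from the example/employee view."""
    options = connector_options(
        user="spark-example-user",
        password="password",
        host="localhost",
        port=8003,  # the spark-example app server
        optic_query="op.fromView('example', 'employee')",
    )
    reader = spark.read.format("marklogic")  # format name is an assumption
    for key, value in options.items():
        reader = reader.option(key, value)
    return reader.load()
```

In a PySpark session, read_employees(spark).show() would then display rows from the view, provided the assumptions above hold in your environment.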
To verify that your application was deployed correctly, access your MarkLogic server’s qconsole tool via http://localhost:8000/qconsole . You can authenticate as the spark-example-user user that was created above, as it’s generally preferable to test as a non-admin user.
After authenticating, perform the following steps:
- In the “Database” dropdown, select spark-example-content.
- In the “Query Type” dropdown, select Optic DSL.
- Enter the following query into an editor in qconsole: op.fromView('example', 'employee').limit(10)
- Click on the “Run” button. This should display 10 JSON objects, each being a projection of a row from an employee document in the database.
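The same check can be scripted instead of run through qconsole. The sketch below uses only the Python standard library and rests on assumptions that are not stated above: that the spark-example app server on port 8003 is a MarkLogic REST API server exposing the /v1/rows endpoint, that the endpoint accepts Optic DSL text with the content type shown, and that the server uses digest or basic authentication. build_rows_request and fetch_rows are hypothetical helper names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8003"  # the spark-example app server
OPTIC_QUERY = "op.fromView('example', 'employee').limit(10)"


def build_rows_request(optic_query, base_url=BASE_URL):
    """Build a POST to /v1/rows carrying the Optic DSL query as the body."""
    return urllib.request.Request(
        url=f"{base_url}/v1/rows",
        data=optic_query.encode("utf-8"),
        headers={
            # Content type for Optic DSL text; an assumption to verify.
            "Content-Type": "application/vnd.marklogic.querydsl+javascript",
            "Accept": "application/json",
        },
        method="POST",
    )


def fetch_rows(optic_query=OPTIC_QUERY, user="spark-example-user",
               password="password", base_url=BASE_URL):
    """Send the query and return the parsed JSON response."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    # Register both handlers so either digest or basic auth is answered.
    opener = urllib.request.build_opener(
        urllib.request.HTTPDigestAuthHandler(passwords),
        urllib.request.HTTPBasicAuthHandler(passwords),
    )
    with opener.open(build_rows_request(optic_query, base_url), timeout=30) as response:
        return json.load(response)
```

Calling fetch_rows() with MarkLogic running returns the parsed response for the same query used in qconsole. An HTTP 401 error suggests that the credentials or the authentication scheme differ from the assumptions above.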