The MarkLogic REST rows service supports operations for querying for rows via several query languages. The MarkLogic Python client simplifies submitting queries for rows and converting responses into useful data structures.
Table of contents
- Setup
- Optic queries
- SQL queries
- SPARQL queries
- GraphQL queries
- Choosing an output format
- Integration with pandas
- Providing additional arguments
Setup
The examples below require documents to be loaded along with a TDE view that projects rows from the documents. You must also have performed the instructions in the setup guide.
Run the following in a Python shell to load 4 documents, each capturing details about a musician:
from marklogic import Client
from marklogic.documents import Document, DefaultMetadata
client = Client('http://localhost:8000', digest=('python-user', 'pyth0n'))
client.documents.write([
DefaultMetadata(permissions={"rest-reader": ["read", "update"]}, collections=["musician"]),
Document("/musician1.json", {"lastName": "Armstrong", "firstName": "Louis", "dob": "1901-08-04"}),
Document("/musician2.json", {"lastName": "Byron", "firstName": "Don", "dob": "1958-11-08"}),
Document("/musician3.json", {"lastName": "Coltrane", "firstName": "John", "dob": "1926-09-23"}),
Document("/musician4.json", {"lastName": "Davis", "firstName": "Miles", "dob": "1926-05-26"})
])
Now load a TDE view via the following:
tde_view = {
"template": {
"context": "/",
"collections": ["musician"],
"rows": [{
"schemaName": "example",
"viewName": "musician",
"columns": [
{"name": "lastName", "scalarType": "string", "val": "lastName"},
{"name": "firstName", "scalarType": "string", "val": "firstName"},
{"name": "dob", "scalarType": "date", "val": "dob"}
]
}]
}
}
client.documents.write(
Document(
"/tde/musicians.json", tde_view,
permissions={"rest-reader": ["read", "update"]},
collections=["http://marklogic.com/xdmp/tde"]
),
params={"database": "Schemas"}
)
Optic queries
The MarkLogic Optic API allows for rows to be queried from documents via a SQL-like syntax. The MarkLogic REST API rows service accepts Optic queries either as an Optic Query DSL statement or as a serialized plan.
Since using an Optic DSL query is often the easiest approach, a DSL query can be submitted as the first argument without any name:
client.rows.query("op.fromView('example', 'musician')")
The above will return a JSON object that captures each of the matching rows along with definitions for each column. See the section below on choosing an output format for controlling how data is returned.
You can use a named argument as well:
client.rows.query(dsl="op.fromView('example', 'musician')")
For some use cases, it may be helpful to capture an Optic query as a serialized plan. A serialized plan can then be submitted via the plan
argument:
plan = '{"$optic":{"ns":"op", "fn":"operators", "args":[{"ns":"op", "fn":"from-view", "args":["example", "musician"]}]}}'
client.rows.query(plan=plan)
Optic supports many different types of queries and operations; please see the documentation for further information on much more powerful and flexible queries than shown in these examples, which are intended solely for demonstration of how to submit an Optic query.
SQL queries
MarkLogic supports SQL queries against views. SQL queries can be submitted via the sql
argument:
client.rows.query(sql="select * from example.musician order by lastName")
This will return a JSON object that captures each of the matching rows along with definitions for each column. See the section below on choosing an output format for controlling how data is returned.
SPARQL queries
MarkLogic supports the SPARQL query language, allowing for SPARQL queries to be run against views, similar to Optic and SQL. SPARQL queries can be submitted via the sparql
argument:
sparql = "PREFIX musician: <http://marklogic.com/column/example/musician/> SELECT * WHERE {?s musician:lastName ?lastName} ORDER BY ?lastName"
client.rows.query(sparql=sparql)
This will return a JSON object that captures each of the matching rows along with definitions for each column. See the section below on choosing an output format for controlling how data is returned.
GraphQL queries
MarkLogic supports GraphQL queries to be run against views. A GraphQL query can be submitted via the graphql
argument:
client.rows.query(graphql="query myQuery { example_musician { lastName firstName dob } }")
This will return a JSON object containing the matching rows. MarkLogic will only return a JSON object for GraphQL queries; the format
argument described below will not have any impact when submitting a GraphQL query.
Choosing an output format
The MarkLogic REST endpoint for querying rows supports several options for how data is returned via the format
parameter. The client.rows.query
function allows for an output format to be selected via a format
argument. The following table defined the possible values:
format value | Output format |
---|---|
json | JSON object with keys of ‘columns’ and ‘rows’. |
xml | XML document defining the columns and rows. |
csv | CSV text with the first row defining the columns. |
json-seq | A line-delimited JSON sequence with the first row defining the columns. |
Integration with pandas
pandas is a widely used data analysis tool. A pandas DataFrame can easily be constructed by retrieving rows via format=csv
and invoking pandas.read_csv
.
The following example assumes that you have pandas available in your Python shell already. If not, exit your shell, run pip install pandas
, and then restart your shell, re-creating the client
object as shown at the top of this page.
import io
import pandas
rows = client.rows.query("op.fromView('example', 'musician')", format="csv")
df = pandas.read_csv(io.StringIO(rows))
Printing the df
object will yield the following:
example.musician.lastName example.musician.firstName example.musician.dob
0 Byron Don 1958-11-08
1 Davis Miles 1926-05-26
2 Armstrong Louis 1901-08-04
3 Coltrane John 1926-09-23
Providing additional arguments
The client.rows.query
method provides a **kwargs
argument, so you can pass in any other arguments you would normally pass to requests
. For example:
response = client.rows.query("op.fromView('example', 'musician')", params={"database": "Documents"})
Please see the rows endpoint documentation for information on additional parameters. If you are submitting a GraphQL query, then see the GraphQL endpoint documentation for information on parameters for that endpoint.