Searching documents

The POST /v1/search endpoint in the MarkLogic REST API supports returning content and metadata for each matching document. Similar to reading multiple documents via the GET /v1/documents endpoint, the data is returned in a multipart HTTP response. The MarkLogic Python client simplifies use of this operation by returning a list of Document instances via the client.documents.search method.

Setup for examples
Searching via a search string
Searching via a complex query
Controlling search results
Providing additional arguments
Returning the original HTTP response
Referencing a transaction
Error handling

Setup for examples

The examples below all assume that you have created a new MarkLogic user named “python-user” as described in the setup guide. To run these examples, please run the following script first, which will create a Client instance that interacts with the out-of-the-box “Documents” database in MarkLogic:

from marklogic import Client
from marklogic.documents import Document, DefaultMetadata

client = Client('http://localhost:8000', digest=('python-user', 'pyth0n'))
client.documents.write([
    DefaultMetadata(permissions={"rest-reader": ["read", "update"]}, collections=["python-search-example"]),
    Document("/search/doc1.json", {"text": "hello world"}),
    Document("/search/doc2.json", {"text": "hello again"})
])

Searching via a search string

The search endpoint in the REST API provides several ways of submitting a query. The simplest approach is to submit a search string that uses the MarkLogic search grammar:

# Find documents with the term "hello" in them.
docs = client.documents.search("hello")
assert len(docs) == 2

# Find documents with the term "world" in them.
docs = client.documents.search("world")
assert len(docs) == 1

The search string in the example corresponds to the q argument, which is the first argument in the method and thus does not need to be named.

With a search string, you may wish to reference a set of MarkLogic search options as well. You can configure a set of options via the MarkLogic REST API and then refer to them by name via the options argument. For example, if your options are named myOptions, you would include options="myOptions" as an argument to client.documents.search.
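
As a minimal sketch, a named options set can be passed alongside the search string as shown below. The helper name search_with_options is hypothetical, and the options set (myOptions in the text above) is assumed to have been installed on the server already.

```python
def search_with_options(client, q, options_name):
    """Run a string search that applies a named set of search options.

    `client` is assumed to be a marklogic Client like the one created in
    the setup section; `options_name` must already exist on the server.
    """
    return client.documents.search(q, options=options_name)
```

With the client from the setup section, search_with_options(client, "hello", "myOptions") would issue the search using those options.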

Searching via a complex query

More complex queries can be submitted via the query parameter. The value of this parameter must be one of the following: a structured query, a serialized CTS query, or a combined query.

For each of the above approaches, the query can be either a dictionary (for use when defining the query via JSON) or a string of XML. Based on the type, the client will set the appropriate Content-type header.
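
The type-based choice of header can be pictured with the following sketch. It is an illustration only, not the client's actual implementation; the function name content_type_for and the exact header values are assumptions.

```python
def content_type_for(query):
    """Pick a Content-type header based on the Python type of the query."""
    if isinstance(query, dict):
        # Dictionaries are assumed to be serialized as JSON.
        return "application/json"
    if isinstance(query, str):
        # Strings are assumed to contain XML.
        return "application/xml"
    raise TypeError("query must be a dict (JSON) or a str (XML)")
```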

Examples of a structured query:

# JSON
docs = client.documents.search(query={"query": {"term-query": {"text": "hello"}}})
assert len(docs) == 2

# XML
query = "<query xmlns='http://marklogic.com/appservices/search'>\
        <term-query><text>hello</text></term-query></query>"
docs = client.documents.search(query=query)
assert len(docs) == 2

Examples of a serialized CTS query:

# JSON
query = {"ctsquery": {"wordQuery": {"text": "hello"}}}
docs = client.documents.search(query=query)
assert len(docs) == 2

# XML
query = "<word-query xmlns='http://marklogic.com/cts'><text>hello</text></word-query>"
docs = client.documents.search(query=query)
assert len(docs) == 2

Examples of a combined query:

# JSON
options = {"constraint": {"name": "c1", "word": {"element": {"name": "text"}}}}
# As in the XML form below, qtext sits inside the "search" object.
query = {
    "search": {"options": options, "qtext": "c1:hello"},
}
docs = client.documents.search(query=query)
assert len(docs) == 2

# XML
query = "<search xmlns='http://marklogic.com/appservices/search'><options>\
        <constraint name='c1'><word><element name='text'/></word></constraint>\
        </options><qtext>c1:hello</qtext></search>"
docs = client.documents.search(query=query)
assert len(docs) == 2

Controlling search results

The search endpoint supports a variety of parameters for controlling the search request. For convenience, several of the more commonly used parameters are available as arguments in the client.documents.search method:

# Specify the starting point and page length.
docs = client.documents.search("hello", start=2, page_length=5)
assert len(docs) == 1

# Search via a collection without any search string.
docs = client.documents.search(collections=["python-search-example"])
assert len(docs) == 2
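
To walk through every match rather than a single page, the start and page_length arguments can be combined in a loop. The generator below is a sketch under stated assumptions: iter_all_matches is a hypothetical helper, start is treated as 1-based (consistent with the example above), and any non-list result is treated as an error.

```python
def iter_all_matches(search, q, page_length=10):
    """Yield every matching document by requesting one page at a time.

    `search` is any callable that accepts the same arguments as
    client.documents.search.
    """
    start = 1
    while True:
        page = search(q, start=start, page_length=page_length)
        if not isinstance(page, list):
            # A non-list result is assumed to be an error response.
            raise RuntimeError(f"search failed: {page!r}")
        yield from page
        if len(page) < page_length:
            break  # A short page means there are no further results.
        start += page_length
```

For example, list(iter_all_matches(client.documents.search, "hello", page_length=5)) would collect all matching documents into one list.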

Metadata for each document can be retrieved via the categories argument. The acceptable values for this argument match those of the category parameter in the search endpoint documentation: content, metadata, metadata-values, collections, permissions, properties, and quality.

The following shows different examples of configuring the categories argument:

# Retrieve all content and metadata for each matching document.
docs = client.documents.search("hello", categories=["content", "metadata"])
assert "python-search-example" in docs[0].collections
assert "python-search-example" in docs[1].collections

# Retrieve only permissions for each matching document.
docs = client.documents.search("hello", categories=["permissions"])
assert docs[0].content is None
assert docs[1].content is None
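
Because a misspelled category is easy to miss, it may help to validate the list before sending the request. The sketch below is a hypothetical client-side check (check_categories is not part of the client) that uses only the values listed above.

```python
ALLOWED_CATEGORIES = {
    "content", "metadata", "metadata-values",
    "collections", "permissions", "properties", "quality",
}

def check_categories(categories):
    """Return the categories as a list, or raise if any value is unknown."""
    unknown = [c for c in categories if c not in ALLOWED_CATEGORIES]
    if unknown:
        raise ValueError(f"unsupported categories: {unknown}")
    return list(categories)
```

The checked list can then be passed on, for example categories=check_categories(["content", "permissions"]).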

Providing additional arguments

The client.documents.search method provides a **kwargs argument, so you can pass in any other arguments you would normally pass to requests. For example:

docs = client.documents.search("hello", params={"database": "Documents"})
assert len(docs) == 2

Please see the application developer’s guide for more information on searching documents.

Returning the original HTTP response

Starting in the 1.1.0 release, the client.documents.search method accepts a return_response argument. When that argument is set to True, the original response is returned. This can be useful for custom processing of the response or debugging requests.
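
As a small sketch of what this might look like in practice, the helper below requests the original response and reports two details that are often useful when debugging. The name search_raw is hypothetical, and the returned object is assumed to be a requests Response.

```python
def search_raw(client, q):
    """Return (status code, Content-Type header) of the raw search response."""
    response = client.documents.search(q, return_response=True)
    # The response is assumed to be a requests Response object.
    return response.status_code, response.headers.get("Content-Type")
```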

Referencing a transaction

Starting in the 1.1.0 release, you can reference a REST API transaction via the tx argument. See the guide on transactions for further information.

Error handling

If the client.documents.search method receives an HTTP response with a status code of 200, then the client will return a list of Document instances. For any other status code, the client will return the requests Response object, providing access to the error details returned by the MarkLogic REST API.

The status_code and text fields in the Response object will typically be of the most interest when debugging a problem. Please see the Response API documentation for complete information on what’s available in this object.
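
Since the return type differs between success and failure, callers may want to turn the failure case into an exception. The following is one possible sketch; SearchError and documents_or_raise are hypothetical names, not part of the client.

```python
class SearchError(Exception):
    """Raised when a search does not return a list of documents."""


def documents_or_raise(result):
    """Return the list of documents, or raise with the error details."""
    if isinstance(result, list):
        return result
    # Anything else is assumed to be a requests Response describing the error.
    raise SearchError(f"HTTP {result.status_code}: {result.text}")
```

A call such as documents_or_raise(client.documents.search("hello")) then either yields the documents or raises with the status code and response text.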
