The POST /v1/search endpoint in the MarkLogic REST API supports returning content and metadata for each matching document. Similar to reading multiple documents via the GET /v1/documents endpoint, the data is returned in a multipart HTTP response. The MarkLogic Python client simplifies use of this operation by returning a list of Document instances via the client.documents.search method.
Table of contents
- Setup for examples
- Searching via a search string
- Searching via a complex query
- Controlling search results
- Providing additional arguments
- Returning the original HTTP response
- Referencing a transaction
- Error handling
Setup for examples
The examples below all assume that you have created a new MarkLogic user named “python-user” as described in the setup guide. Before trying the examples, run the following script, which creates a Client instance that interacts with the out-of-the-box “Documents” database in MarkLogic and writes two sample documents to it:
from marklogic import Client
from marklogic.documents import Document, DefaultMetadata
client = Client('http://localhost:8000', digest=('python-user', 'pyth0n'))
client.documents.write([
    DefaultMetadata(permissions={"rest-reader": ["read", "update"]}, collections=["python-search-example"]),
    Document("/search/doc1.json", {"text": "hello world"}),
    Document("/search/doc2.json", {"text": "hello again"})
])
Searching via a search string
The search endpoint in the REST API provides several ways of submitting a query. The simplest approach is to submit a search string that uses the MarkLogic search grammar:
# Find documents with the term "hello" in them.
docs = client.documents.search("hello")
assert len(docs) == 2
# Find documents with the term "world" in them.
docs = client.documents.search("world")
assert len(docs) == 1
The search string in the example corresponds to the q argument, which is the first argument in the method and thus does not need to be named.
With a search string, you may wish to reference a set of MarkLogic search options as well. You can configure a set of options via the MarkLogic REST API and then refer to them by name via the options argument. For example, if your options are named myOptions, you would include options="myOptions" as an argument to client.documents.search.
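As a minimal sketch of the options argument, the helper below forwards a search string and an options name to client.documents.search. The helper name search_with_options is hypothetical, and the options named myOptions are assumed to have been installed already through the MarkLogic REST API.

```python
def search_with_options(client, search_string, options_name="myOptions"):
    """Run a string search that references a named set of search options.

    `client` is expected to be a marklogic Client. `options_name` must match
    options previously installed via the REST API (an assumption here).
    """
    return client.documents.search(search_string, options=options_name)
```

Calling search_with_options(client, "hello") is then equivalent to passing options="myOptions" directly.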
Searching via a complex query
More complex queries can be submitted via the query parameter. The value of this parameter must be one of the following:
- A structured query
- A serialized CTS query
- A combined query
For each of the above approaches, the query can be either a dictionary (for use when defining the query via JSON) or a string of XML. Based on the type, the client will set the appropriate Content-type header.
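The dictionary-versus-string rule can be pictured with the short sketch below. It is not the client's own code: the function name content_type_for is hypothetical, and the exact header values are an assumption based on the usual JSON and XML media types.

```python
def content_type_for(query):
    """Mirror the documented rule: dict means JSON, string means XML.

    Illustration only; the real client makes this decision internally.
    """
    if isinstance(query, dict):
        return "application/json"  # assumed header value for JSON queries
    if isinstance(query, str):
        return "application/xml"  # assumed header value for XML queries
    raise TypeError("query must be a dict (JSON) or a string of XML")
```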
Examples of a structured query:
# JSON
docs = client.documents.search(query={"query": {"term-query": {"text": "hello"}}})
assert len(docs) == 2
# XML
query = "<query xmlns='http://marklogic.com/appservices/search'>\
<term-query><text>hello</text></term-query></query>"
docs = client.documents.search(query=query)
assert len(docs) == 2
Examples of a serialized CTS query:
# JSON
query = {"ctsquery": {"wordQuery": {"text": "hello"}}}
docs = client.documents.search(query=query)
assert len(docs) == 2
# XML
query = "<word-query xmlns='http://marklogic.com/cts'><text>hello</text></word-query>"
docs = client.documents.search(query=query)
assert len(docs) == 2
Examples of a combined query:
# JSON
options = {"constraint": {"name": "c1", "word": {"element": {"name": "text"}}}}
# In a combined query, qtext sits inside "search", alongside the options.
query = {
    "search": {"options": options, "qtext": "c1:hello"},
}
docs = client.documents.search(query=query)
assert len(docs) == 2
# XML
query = "<search xmlns='http://marklogic.com/appservices/search'><options>\
<constraint name='c1'><word><element name='text'/></word></constraint>\
</options><qtext>c1:hello</qtext></search>"
docs = client.documents.search(query=query)
assert len(docs) == 2
Controlling search results
The search endpoint supports a variety of parameters for controlling the search request. For convenience, several of the more commonly used parameters are available as arguments in the client.documents.search method:
# Specify the starting point and page length.
docs = client.documents.search("hello", start=2, page_length=5)
assert len(docs) == 1
# Search via a collection without any search string.
docs = client.documents.search(collections=["python-search-example"])
assert len(docs) == 2
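The start and page_length arguments can be combined to walk through a large result set one page at a time. The generator below is a minimal sketch of that idea; the name iter_search is hypothetical, and it assumes start is 1-based, which is consistent with the example above where start=2 returns the second of two matches.

```python
def iter_search(client, search_string, page_length=10):
    """Yield every matching document, fetching one page per request."""
    start = 1  # assumed 1-based, matching the start=2 example above
    while True:
        page = client.documents.search(
            search_string, start=start, page_length=page_length
        )
        if not isinstance(page, list):
            # A non-list result is the error response; surface its status code.
            raise RuntimeError(f"search failed with status {page.status_code}")
        yield from page
        if len(page) < page_length:
            break  # a short (or empty) page means there is nothing left
        start += page_length
```

A caller can then write for doc in iter_search(client, "hello", page_length=50) without tracking offsets by hand.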
Metadata for each document can be retrieved via the categories argument. The acceptable values for this argument match those of the category parameter in the search endpoint documentation: content, metadata, metadata-values, collections, permissions, properties, and quality.
The following examples show different ways of configuring the categories argument:
# Retrieve all content and metadata for each matching document.
docs = client.documents.search("hello", categories=["content", "metadata"])
assert "python-search-example" in docs[0].collections
assert "python-search-example" in docs[1].collections
# Retrieve only permissions for each matching document.
docs = client.documents.search("hello", categories=["permissions"])
assert docs[0].content is None
assert docs[1].content is None
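Once metadata has been requested, it can be summarized across the returned documents. The sketch below gathers the distinct collection names from a list of results; the helper name collections_in_results is hypothetical, and it assumes collections is None or empty when that category was not requested.

```python
def collections_in_results(docs):
    """Return the sorted, de-duplicated collection names found across docs."""
    found = set()
    for doc in docs:
        # Treat a missing or None value as "no collections returned".
        found.update(doc.collections or [])
    return sorted(found)
```

With the setup data above, collections_in_results(client.documents.search("hello", categories=["collections"])) would be expected to include python-search-example.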
Providing additional arguments
The client.documents.search method provides a **kwargs argument, so you can pass in any other arguments you would normally pass to requests. For example:
docs = client.documents.search("hello", params={"database": "Documents"})
assert len(docs) == 2
Please see the application developer’s guide for more information on searching documents.
Returning the original HTTP response
Starting in the 1.1.0 release, the client.documents.search method accepts a return_response argument. When that argument is set to True, the original response is returned. This can be useful for custom processing of the response or debugging requests.
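A minimal sketch of using return_response for debugging follows. It assumes the returned object is the requests Response, so that status_code and headers are available; the helper name search_status is hypothetical.

```python
def search_status(client, search_string):
    """Return (status code, Content-Type header) of the raw search response."""
    response = client.documents.search(search_string, return_response=True)
    # `headers` behaves like a dict on a requests Response (assumed type).
    return response.status_code, response.headers.get("Content-Type")
```

Inspecting the Content-Type header is one quick way to confirm that the multipart response described at the top of this page was actually returned.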
Referencing a transaction
Starting in the 1.1.0 release, you can reference a REST API transaction via the tx argument. See the guide on transactions for further information.
Error handling
If the client.documents.search method receives an HTTP response with a status code of 200, then the client will return a list of Document instances. For any other status code, the client will return the requests Response object, providing access to the error details returned by the MarkLogic REST API.
The status_code and text fields in the Response object will typically be of the most interest when debugging a problem. Please see Response API documentation for complete information on what’s available in this object.
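Because a failed search returns a Response rather than a list, callers may want to convert that into an exception. The wrapper below is one possible sketch of that pattern; SearchError and search_or_raise are hypothetical names, not part of the client.

```python
class SearchError(Exception):
    """Raised when the search call does not return a list of documents."""


def search_or_raise(client, *args, **kwargs):
    """Call client.documents.search and raise on a non-200 outcome."""
    result = client.documents.search(*args, **kwargs)
    if isinstance(result, list):
        return result  # status 200: a list of Document instances
    # Otherwise the result is the Response; report its status_code and text.
    raise SearchError(f"search failed: HTTP {result.status_code}: {result.text}")
```

This wrapper should not be combined with return_response=True, since that argument makes a successful call return a Response as well.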