Corb is a Java tool for bulk content reprocessing. It lists all the documents in a collection (or all the documents in the database), then uses a pool of worker threads to apply an XQuery module to each document.
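The processing model described above (enumerate the URIs, then fan the work out to a fixed pool of worker threads) can be sketched in plain Java. This is not Corb's source: the class WorkerPoolSketch and its processAll method are hypothetical, and a Function<String, String> stands in for invoking the XQuery module over XCC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

/** Hypothetical sketch of the "list URIs, then apply a module on a thread pool" model. */
final class WorkerPoolSketch {
    private WorkerPoolSketch() {}

    /**
     * Applies {@code transform} (standing in for the XQuery module) to every URI on a
     * fixed pool of {@code threadCount} workers and returns the results in input order.
     */
    static List<String> processAll(List<String> uris, Function<String, String> transform, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            // Submit one task per URI; the pool runs up to threadCount of them at once.
            List<Future<String>> pending = new ArrayList<>();
            for (String uri : uris) {
                pending.add(pool.submit(() -> transform.apply(uri)));
            }
            // Wait for each task in submission order.
            List<String> results = new ArrayList<>();
            for (Future<String> task : pending) {
                try {
                    results.add(task.get());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for workers", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a worker task failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown(); // let the worker threads exit once the queued work is done
        }
    }
}
```

Results are collected in submission order, so the output lines up with the input URIs even though the tasks may finish in any order.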
If the target XDBC server is configured with a library location on the filesystem of /opt/MarkLogic/myapp/src/, and the modules are installed in /opt/MarkLogic/myapp/src/reprocessing, then MODULE-ROOT should be set to /reprocessing/.
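The rule above amounts to taking the install directory relative to the server's library root and wrapping it in slashes. A minimal sketch, assuming both locations are plain absolute filesystem paths; ModuleRootSketch and moduleRoot are hypothetical names, not part of Corb:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper that derives MODULE-ROOT for a filesystem-based library location. */
final class ModuleRootSketch {
    private ModuleRootSketch() {}

    static String moduleRoot(String libraryRoot, String installDir) {
        Path root = Paths.get(libraryRoot).normalize();
        Path dir = Paths.get(installDir).normalize();
        if (!dir.startsWith(root)) {
            throw new IllegalArgumentException(installDir + " is not under " + libraryRoot);
        }
        if (dir.equals(root)) {
            return "/"; // modules installed directly in the library root, by the same rule
        }
        // Join the relative path's segments with '/' so the result is a module path,
        // independent of the local file separator.
        StringBuilder sb = new StringBuilder("/");
        for (Path segment : root.relativize(dir)) {
            sb.append(segment).append('/');
        }
        return sb.toString();
    }
}
```

With the example paths, moduleRoot("/opt/MarkLogic/myapp/src/", "/opt/MarkLogic/myapp/src/reprocessing") yields /reprocessing/.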
If the target XDBC server is configured with a library location pointing to a database:
The entry point is the main method in the com.marklogic.developer.corb.Manager class. Corb requires 3 command-line arguments and accepts up to 5 optional ones, in this order:
com.marklogic.developer.corb.Manager \
  XCC-CONNECTION-URI COLLECTION-NAME XQUERY-MODULE [ THREAD-COUNT [ URIS-MODULE [ MODULE-ROOT [ MODULES-DATABASE [ INSTALL ] ] ] ] ]
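Because the optional arguments are nested, they are positional: to pass MODULE-ROOT, for example, THREAD-COUNT and URIS-MODULE must also be given. The hypothetical CorbArgsSketch below (not Corb's own parser) illustrates that mapping; arguments that are not supplied are left as null rather than guessing at Corb's defaults.

```java
import java.util.Arrays;

/** Hypothetical illustration of Corb's positional arguments; not Corb's own parser. */
final class CorbArgsSketch {
    final String xccConnectionUri;
    final String collectionName;   // may be the empty string
    final String xqueryModule;
    final Integer threadCount;     // optional: null when omitted
    final String urisModule;       // optional
    final String moduleRoot;       // optional
    final String modulesDatabase;  // optional
    final String install;          // optional, e.g. "false"

    private CorbArgsSketch(String[] a) {
        xccConnectionUri = a[0];
        collectionName = a[1];
        xqueryModule = a[2];
        threadCount = a[3] == null ? null : Integer.valueOf(a[3]);
        urisModule = a[4];
        moduleRoot = a[5];
        modulesDatabase = a[6];
        install = a[7];
    }

    static CorbArgsSketch parse(String... args) {
        if (args.length < 3 || args.length > 8) {
            throw new IllegalArgumentException("expected 3 to 8 arguments, got " + args.length);
        }
        // Pad to 8 slots: trailing optional arguments that were not given become null.
        return new CorbArgsSketch(Arrays.copyOf(args, 8));
    }
}
```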
The URIS-MODULE must be an XQuery main module, and must return a sequence of (xs:integer, xs:string*). The first item must be the number of URIs in the sequence that follows it.
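The contract above can be checked mechanically. This sketch assumes the module's results have already been read as strings, one per item (real XCC results are typed items); UrisResultSketch and urisFrom are hypothetical names.

```java
import java.util.List;

/** Hypothetical check of the (xs:integer, xs:string*) result a URIS-MODULE must return. */
final class UrisResultSketch {
    private UrisResultSketch() {}

    /**
     * Takes the module's result items as strings, verifies that the leading count
     * equals the number of URIs that follow it, and returns just the URIs.
     */
    static List<String> urisFrom(List<String> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("empty result: the leading count is missing");
        }
        int declared = Integer.parseInt(items.get(0).trim());
        List<String> uris = items.subList(1, items.size());
        if (declared != uris.size()) {
            throw new IllegalArgumentException(
                    "count is " + declared + " but " + uris.size() + " URIs follow it");
        }
        return uris;
    }
}
```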
For example, this simple URIS-MODULE would return all available URIs from the URI Lexicon, behaving just as Corb normally would with COLLECTION-NAME="" and no URIS-MODULE.
(: simple URIS-MODULE example :)
let $uris := cts:uris('', 'document')
return (count($uris), $uris)
This example may be extended, for instance to restrict the URIs to those matching a cts:query.
The following sample invocation uses a sample medline-reprocessing XQuery module, which is included in corb.jar. You can also download medline-iso8601.xqy.
java -cp $HOME/lib/java/xcc.jar:$HOME/lib/java/corb.jar \
  com.marklogic.developer.corb.Manager \
  xcc://admin:admin@localhost:9002/ "" \
  medline-iso8601.xqy
Another sample invocation uses a processing module loaded from the filesystem, an alternate URI selection module, and 2 threads:
java -cp $HOME/lib/java/xcc.jar:$HOME/lib/java/corb.jar \
  com.marklogic.developer.corb.Manager \
  xcc://admin:admin@localhost:9002/ "" \
  /home/myproject/src/custom-transform.xqy 2 \
  /home/myproject/src/custom-uri-selection.xqy
A third sample invocation uses 4 threads and custom modules pre-installed in the 'mydb' database. It processes documents with /preprocessing/custom-transform.xqy, using the URIs returned by /preprocessing/custom-uri-selection.xqy, and sets INSTALL to false because the modules are already installed:
java -cp $HOME/lib/java/xcc.jar:$HOME/lib/java/corb.jar \
  com.marklogic.developer.corb.Manager \
  xcc://admin:admin@localhost:9002/ "" \
  custom-transform.xqy 4 \
  custom-uri-selection.xqy \
  /preprocessing/ \
  mydb false
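The same third invocation can be assembled as an argument list, for example to launch Corb from another Java program with ProcessBuilder. This is a sketch: CorbCommandSketch and command are hypothetical, and the home directory is passed in to replace $HOME. Since ProcessBuilder does not go through a shell, on Unix-like systems the empty COLLECTION-NAME is simply an empty string element rather than a quoted "".

```java
import java.util.List;

/** Hypothetical sketch: the third sample invocation as an argument vector. */
final class CorbCommandSketch {
    private CorbCommandSketch() {}

    /** Builds the argument vector for the third sample invocation; {@code home} replaces $HOME. */
    static List<String> command(String home) {
        String classpath = home + "/lib/java/xcc.jar:" + home + "/lib/java/corb.jar";
        return List.of(
                "java", "-cp", classpath,
                "com.marklogic.developer.corb.Manager",
                "xcc://admin:admin@localhost:9002/", // XCC-CONNECTION-URI
                "",                                  // COLLECTION-NAME (empty; no shell quoting needed)
                "custom-transform.xqy",              // XQUERY-MODULE
                "4",                                 // THREAD-COUNT
                "custom-uri-selection.xqy",          // URIS-MODULE
                "/preprocessing/",                   // MODULE-ROOT
                "mydb",                              // MODULES-DATABASE
                "false");                            // INSTALL
    }
}
```

For example, new ProcessBuilder(CorbCommandSketch.command(System.getProperty("user.home"))).inheritIO().start() would run the invocation with Corb's output sent to the current console.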
As Corb processes the documents, various progress messages will be logged.