Deploy to MarkLogic Data Hub Service

Data Hub Service

You can deploy your DHF project in the cloud instead of setting up and maintaining your own MarkLogic environment. The MarkLogic Data Hub Service (DHS) is a cloud-based solution that provides a preconfigured MarkLogic cluster in which you can run flows and from which you can serve harmonized data.

In a DHS environment, the databases, app servers, and security roles are automatically set up. Admins can create user accounts.

To learn more about MarkLogic Data Hub Service (DHS), see MarkLogic Data Hub Service and the DHS documentation.

Deploying a DHF Project to DHS

After you create and test your project locally (your development environment) using Data Hub Framework, you can deploy your project to a DHS cluster (your production environment).

Compared with a local DHF project, a DHS project has the following default configurations:

  • Ports and load balancers for app servers

    App server   Port   DHS load balancer
    staging      8010   curation
    final        8011   operations
    jobs         8013   analytics
  • Roles — The DHS roles are automatically created as part of provisioning your DHS environment. See Data Hub Service Roles.

    DHF role              DHS role
    flow-developer-role   flowDeveloper
    flow-operator-role    flowOperator
  • Database names, if customized in the DHF environment

  • Some DHS-only settings in the properties file, including mlIsHostLoadBalancer and mlIsProvisionedEnvironment, which are set to true to enable DHF to work correctly in DHS

If your endpoints are private, you need a bastion host inside a virtual private cloud (VPC) that can access the MarkLogic VPC. The bastion host securely relays:

  • the requests from the outside world to MarkLogic
  • the results from MarkLogic to the requester

If your endpoints are publicly available, you can use any machine that is set up as a peer of the MarkLogic VPC.
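For example, with private endpoints you might relay requests through the bastion host with an SSH tunnel. This is a sketch only: the bastion host name, curation endpoint, user, and key file are placeholders, and your DHS administrator may prescribe a different relay mechanism.

```shell
# Forward local port 8010 through the bastion host to the private
# curation endpoint (all names below are hypothetical placeholders).
ssh -i ~/.ssh/bastion-key.pem -N \
  -L 8010:CURATION-ENDPOINT:8010 \
  ec2-user@BASTION-HOST
```

Requests sent to localhost:8010 are then relayed to the staging app server inside the MarkLogic VPC.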


Prerequisites

  • A DHF project that has been set up and tested locally
  • A provisioned MarkLogic Data Hub Service environment
    • For private endpoints only: A bastion host inside a virtual private cloud (VPC)
    • Information from your DHS administrator:
      • Your DHS host name (typically, the curation endpoint)
      • REST curation endpoint URL (including port number) for testing
      • The username and password of the user account associated with each of the following roles. (See Creating a User.)
        • endpointDeveloper
        • endpointUser
        • flowDeveloper
        • flowOperator
  • MarkLogic Content Pump


Procedure

  1. Copy your entire DHF project directory to the machine from which you will access the endpoints, and perform the following steps on that machine.
  2. Open a command-line window, and navigate to your DHF project root directory.
  3. At your project root, create a new file named gradle-DHS.properties.

    NOTE: If you use a different name for the properties file:

    • The filename must be in the format gradle-{env}.properties, where {env} is any string you want to represent an environment. For example, you could store the settings for a development environment in gradle-dev.properties.
    • Remember to update the value of the -PenvironmentName parameter to {env} in the Gradle commands in the following steps.

    a. Copy the following to the new file:
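A minimal sketch of the file's contents, assuming standard ml-gradle/DHF property names and the default DHS ports; every value shown is a placeholder to be replaced in the next step:

```properties
# gradle-DHS.properties (sketch; replace all placeholder values)
mlDHFVersion=4.x.x
mlHost=YOUR-DHS-HOST

# Required for DHF to work correctly in DHS
mlIsHostLoadBalancer=true
mlIsProvisionedEnvironment=true

# User account assigned to the flowOperator role
mlUsername=FLOW-OPERATOR-USER
mlPassword=FLOW-OPERATOR-PASSWORD

# User account assigned to the flowDeveloper role
mlManageUsername=FLOW-DEVELOPER-USER
mlManagePassword=FLOW-DEVELOPER-PASSWORD

# Default DHS app server ports; change only if customized
mlStagingPort=8010
mlFinalPort=8011
mlJobPort=8013
```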


    b. Replace the values.

    Key                     Replace the value with …
    mlDHFVersion            The DHF version to use in your production environment.
    mlHost                  The name of your DHS host. Tip: The host name is the domain name of the DHS final endpoint (remove ‘http://’ and the ‘:’ and port number from the endpoint URL).
    mlUsername, mlPassword  The username and password of the user account assigned to the flowOperator role. Note: This can also be a user account assigned to the flowDeveloper role if additional permissions are required.
    mlManageUsername, mlManagePassword  The username and password of the user account assigned to the flowDeveloper role.
    ml*DbName               The names of the DHS databases, if customized.
    ml*AppserverName        The names of the DHS app servers, if customized.
    ml*Port                 The ports that your DHS project is configured with, if not the defaults.
  4. Install the DHF core modules.

    ./gradlew hubInstallModules -PenvironmentName=DHS
    gradlew.bat hubInstallModules -PenvironmentName=DHS
  5. Install the plugins for your project.

    ./gradlew mlLoadModules -PenvironmentName=DHS
    gradlew.bat mlLoadModules -PenvironmentName=DHS
  6. If you are using DHF 4.0.2 or later, load the indexes in the DHS databases.

    ./gradlew mlUpdateIndexes -PenvironmentName=DHS
    gradlew.bat mlUpdateIndexes -PenvironmentName=DHS
  7. Run the input flows using MarkLogic Content Pump (MLCP).

    You can also run input flows using other ingestion tools, such as the MarkLogic Java Client API or the REST Client API.
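A sketch of an MLCP import command against the staging port. The host, credentials, paths, and the entity and flow names are placeholders; the transform module path shown follows the DHF 4 convention and may differ in your DHF version.

```shell
# Ingest documents through the curation endpoint, invoking a DHF input
# flow via MLCP's transform options (all values are placeholders).
mlcp.sh import -mode local \
  -host "YOUR-DHS-HOST" -port 8010 \
  -username "FLOW-OPERATOR-USER" -password "PASSWORD" \
  -restrict_hosts true \
  -input_file_path /path/to/input -input_file_type documents \
  -output_collections "MyAwesomeEntity" \
  -transform_module "/data-hub/4/transforms/mlcp-flow-transform.sjs" \
  -transform_param "entity-name=MyAwesomeEntity,flow-name=MyInputFlow"
```

The -restrict_hosts option keeps MLCP connected to the load balancer host rather than the individual cluster nodes.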

  8. Run the harmonization flows.

    ./gradlew hubRunFlow -PentityName=MyAwesomeEntity -PflowName=MyHarmonizeFlow -PflowType=harmonize -PenvironmentName=DHS
    gradlew.bat hubRunFlow -PentityName=MyAwesomeEntity -PflowName=MyHarmonizeFlow -PflowType=harmonize -PenvironmentName=DHS
  9. Verify that your documents are in the databases.

    a. In the following URLs, replace OPERATIONS-REST-ENDPOINT-URL and CURATION-REST-ENDPOINT-URL with the appropriate endpoint URLs from your DHS administrator.

    Final database http://OPERATIONS-REST-ENDPOINT-URL:8011/v1/search
    Staging database http://CURATION-REST-ENDPOINT-URL:8010/v1/search


    b. In a web browser, navigate to one of the URLs.

    The result is an XML list of all your documents in the database. Each item in the list includes the document’s URI, path, and other metadata, as well as a preview of the content.
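Because the search endpoint requires authentication, a browser prompts for credentials; you can issue the same request from the command line with curl. This is a sketch: the endpoint URL and credentials are placeholders from your DHS administrator.

```shell
# Query the staging database through the curation endpoint.
# MarkLogic REST endpoints typically use digest auth; --anyauth negotiates it.
curl --anyauth -u "FLOW-OPERATOR-USER:PASSWORD" \
  "http://CURATION-REST-ENDPOINT-URL:8010/v1/search?pageLength=5"
```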


If you update your flows after the initial project upload, you can redeploy your flow updates by running ./gradlew mlLoadModules -PenvironmentName=DHS again and then rerunning the flows.
