Tutorial: Install the Data Hub Framework

1 - Set Up the Project Directory and Sample Data

  1. Create a directory called data-hub. This directory will be referred to as “your project root” or simply “root”.
  2. Download the quick-start-4.3.2.war file and place it in your project root directory.
  3. Under your project root, create a directory called input.
  4. Download the sample data .zip file and expand it if your system does not do so automatically.
  5. Copy the subdirectories (e.g., campaigns, customers, orders) from the expanded sample data into the input directory.
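
The directory steps above can be sketched as a short shell script. This is a minimal sketch, not an official installer: SAMPLE_DIR is a hypothetical path to the folder where you expanded the sample data .zip file, and PROJECT_ROOT defaults to data-hub. Downloading the .war file and the sample data is still done by hand.

```shell
#!/bin/sh
# Sketch of steps 1, 3, and 5. SAMPLE_DIR and PROJECT_ROOT are
# illustrative variable names, not part of the Data Hub Framework.
set -eu

PROJECT_ROOT="${PROJECT_ROOT:-data-hub}"
SAMPLE_DIR="${SAMPLE_DIR:-sample-data}"   # assumed location of the expanded .zip

# Steps 1 and 3: create the project root and its input directory.
mkdir -p "$PROJECT_ROOT/input"

# Step 5: copy each subdirectory of the expanded sample data into input.
if [ -d "$SAMPLE_DIR" ]; then
  for dir in "$SAMPLE_DIR"/*/; do
    [ -d "$dir" ] || continue
    # Strip the trailing slash so cp copies the directory itself,
    # not just its contents, on every platform.
    cp -R "${dir%/}" "$PROJECT_ROOT/input/"
  done
else
  echo "Sample data not found at $SAMPLE_DIR; expand the .zip file first." >&2
fi
```

Run it from the parent of your intended project root, for example with SAMPLE_DIR=~/Downloads/sample-data if that is where the expanded files are.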

Result

Your project directory structure will be as follows:

  data-hub
  ├─ quick-start-4.3.2.war
  └─ input
     ├─ campaigns
     ├─ customers
     ├─ issuehistories
     ├─ issues
     ├─ orders
     ├─ parties
     ├─ products
     │  ├─ games
     │  └─ misc
     ├─ responses
     └─ supportcustomers

2 - Start QuickStart

  1. Open a command-line window, and navigate to your project root directory.
  2. Run the QuickStart .war.
    • To use the default port number for the internal web server (port 8080):
      java -jar quick-start-4.3.2.war
      
    • To use a custom port number (e.g., port 9000):
      java -jar quick-start-4.3.2.war --server.port=9000
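
Before running either command, it can help to confirm that Java is available and that the .war file is in the current directory. The following is a minimal sketch under those assumptions; the preflight function and the WAR and PORT variables are hypothetical helpers for illustration, not part of QuickStart.

```shell
#!/bin/sh
# Preflight sketch: run from your project root directory.
# WAR and PORT are illustrative variables; adjust them as needed.
WAR="${WAR:-quick-start-4.3.2.war}"
PORT="${PORT:-8080}"

# Return 0 only if java is on PATH and the .war file exists.
preflight() {
  rc=0
  if ! command -v java >/dev/null 2>&1; then
    echo "java was not found on PATH" >&2
    rc=1
  fi
  if [ ! -f "$WAR" ]; then
    echo "$WAR was not found in $(pwd)" >&2
    rc=1
  fi
  return "$rc"
}

if preflight; then
  echo "Ready. Start QuickStart with: java -jar $WAR --server.port=$PORT"
else
  echo "Fix the problems above before starting QuickStart." >&2
fi
```

The script only reports problems and prints the launch command; it does not start the server itself.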
      

3 - Install the Data Hub

  1. Open a web browser, and navigate to http://localhost:8080. If you started QuickStart on a custom port, use that port number instead.
  2. Browse to your project root directory. Then click NEXT.
  3. Click INITIALIZE to initialize your project directory.
  4. Initialization adds files and directories to your project directory. When it finishes, click NEXT.
  5. Choose the local environment, then click NEXT.
  6. Enter your MarkLogic Server credentials, then click LOGIN.
  7. Click INSTALL to install the data hub into MarkLogic.
  8. Wait for the installation to complete.
  9. When installation is complete, click FINISHED.

Result

When installation is complete, the Dashboard page displays the three initial databases and the number of documents in each.

  • Staging contains incoming data.
  • Final contains harmonized data.
  • Jobs contains data about the jobs that are run and tracing data about each harmonized document.

See Also