Tutorial: Harmonize the Order Data by Custom Code

Harmonization of the Order entity is more complex.

  • The total property of the entity model is the total amount for the entire order; therefore, it must be calculated.
  • The product property is an array of the products ordered, but they are not represented as an array in the source.

Therefore, we must use DHF code scaffolding to generate the harmonization code and then customize it.

We have already loaded the raw Order data into the STAGING database by running the Load Orders input flow.

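In this section, we will:

  • Define the entity model.
  • Create the harmonize flow.
  • Customize the harmonize flow (the Collector and Content plugins).
  • Run the harmonize flow.
  • View the harmonized orders.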

1 - Define the Entity Model

We assume the following about the Order data:

  • Each product is identified by its SKU.
  • Each order can have more than one product.
  • Each product in the order has a specified quantity.
  • Each order includes a total amount, which must be calculated.

Based on these assumptions, we will add the following properties to the Order entity model for harmonization:

Name      Type            Other settings                     Notes
--------  --------------  ---------------------------------  -----------------------------------------------------------------
id        string          Primary key, element range index   Used as the primary key because order ID is unique for each order.
total     decimal         -                                  The calculated total amount of the entire order.
products  Product entity  Cardinality: 1..∞                  An array of pointers to the Product entities in our FINAL database.

To define the Order entity model,

  1. In QuickStart's navigation bar, click Entities.
  2. At the top of the Order entity card, click the pencil icon to edit the Order entity definition.
  3. In the Order entity editor, click + in the Properties section to add a new property.
    1. Set Name to id.
    2. Set Type to string.
    3. To make id the primary key, click the area in the key column for the id row.
    4. To specify that id needs an element range index, click the area in the lightning bolt column for the id row.
  4. Click + again to add another property.
    1. Set Name to total.
    2. Set Type to decimal.
  5. Click + again to add another property.
    1. Set Name to products.
    2. Set Type to the entity Product.
    3. To indicate that the entity can have multiple instances of this property, set Cardinality to 1..∞.
  6. Click SAVE.
  7. If prompted to update the index, click Yes.
  8. Drag the bottom-right corner of the entity card to resize it and see the newly added properties.

Result

Because the Order entity contains pointers to the Product entity, an arrow connects the Order entity card to the Product entity card with the cardinality we selected (1..∞).

2 - Create the Harmonize Flow

Harmonization uses the data in your STAGING database to generate canonical entity instances (documents) in the FINAL database.

To create a harmonization flow for the Order entity,

  1. In QuickStart’s navigation bar, click Flows.
  2. Expand the tab named Order in the left panel.
  3. Click the + for Harmonize Flows.
  4. In the Create Harmonize Flow dialog, set Harmonize Flow Name to Harmonize Orders.

  5. Click CREATE.

Because we used the default Create Structure from Entity Definition and we did not specify a mapping, DHF creates boilerplate code based on the entity model. This code includes default initialization for the entity properties, which we will customize.

3 - Customize the Harmonize Flow

3a - Customize the Collector Plugin

The Collector plugin generates a list of IDs for the flow to operate on. The IDs can be whatever your application needs (e.g., URIs, relational row IDs, Twitter handles). The default Collector plugin produces a list of source document URIs.

An options parameter is passed to the Collector plugin, and it contains the following properties:

  • entity: the name of the entity this plugin belongs to (e.g., “Order”)
  • flow: the name of the flow this plugin belongs to (e.g., “Harmonize Orders”)
  • flowType: the type of flow being run (“input” or “harmonize”; e.g., “harmonize”)

The Load Orders input flow automatically groups the source documents into a collection named Order. The default Collector plugin uses that collection to derive a list of URIs.

```javascript
cts.uris(null, null, cts.collectionQuery(options.entity))
```

In our source Order CSV file, each row represents one line item in an order. For example, if an order has three line items, the input flow created three documents for that order in the STAGING database. To produce a single Order entity, harmonization must combine all three documents.

Each of those three documents would have the same order ID but different URIs. Therefore, we must customize the collector plugin to return a list of unique order IDs, instead of a list of URIs.
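
To make the difference concrete, here is a minimal plain-JavaScript sketch that contrasts the two collection strategies on an in-memory list. It does not use the MarkLogic APIs: the stagingDocs array, its URIs and field values, and the collectUris and collectOrderIds helpers are all hypothetical stand-ins invented for this illustration.

```javascript
// Hypothetical stand-in for the STAGING database: one entry per line-item
// document created by the input flow, each stored under its own URI.
const stagingDocs = [
  { uri: '/orders/1.json', collections: ['Order'], instance: { id: '1001', sku: 'A-1' } },
  { uri: '/orders/2.json', collections: ['Order'], instance: { id: '1001', sku: 'B-2' } },
  { uri: '/orders/3.json', collections: ['Order'], instance: { id: '1001', sku: 'C-3' } },
  { uri: '/orders/4.json', collections: ['Order'], instance: { id: '1002', sku: 'A-1' } },
  { uri: '/products/1.json', collections: ['Product'], instance: { sku: 'A-1' } }
];

// Mimics the default collector: one URI per document in the entity's collection.
function collectUris(options, docs) {
  return docs
    .filter((doc) => doc.collections.includes(options.entity))
    .map((doc) => doc.uri);
}

// Mimics the custom collector: one entry per distinct order id.
function collectOrderIds(options, docs) {
  const ids = docs
    .filter((doc) => doc.collections.includes(options.entity))
    .map((doc) => doc.instance.id);
  return Array.from(new Set(ids)); // a Set drops duplicates and keeps first-seen order
}

// The same three properties that DHF passes to the Collector plugin.
const options = { entity: 'Order', flow: 'Harmonize Orders', flowType: 'harmonize' };

const uris = collectUris(options, stagingDocs);
const orderIds = collectOrderIds(options, stagingDocs);

console.log(uris.length); // 4 line-item documents
console.log(orderIds);    // [ '1001', '1002' ]
```

With URIs, the flow would run four times and produce four partial orders; with order IDs, it runs once per order, which is what the Content plugin needs.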

Technical Notes

  • In our custom collector plugin code, we use the jsearch library to find all the values of id in the Order collection and return the result.
  • By default, jsearch paginates results; therefore, we call slice(0, Number.MAX_SAFE_INTEGER) to get all results at once.

Steps

To customize the Collector plugin,

  1. Click the COLLECTOR tab.
  2. Replace the collector plugin code with the following:

    /*
     * Collect IDs plugin
     * @param options - a map containing options. Options are sent from Java
     * @return - an array of ids or uris
     */
    function collect(options) {
      const jsearch = require('/MarkLogic/jsearch.sjs');
      // Return every distinct value of id found in the entity's collection.
      return jsearch
        .values('id')
        .where(cts.collectionQuery(options.entity))
        .slice(0, Number.MAX_SAFE_INTEGER)
        .result();
    }

    module.exports = {
      collect: collect
    };
    
  3. Click SAVE.

3b - Customize the Content Plugin

The list of order IDs collected by our custom Collector plugin is passed to the Content plugin, specifically to its createContent function.

We will customize createContent to do the following:

  • Collect all the line items of the same order into a single Order entity.
  • Calculate the total cost of the order.

Technical Notes

  • A jsearch library query searches the Order collection for all source documents that have the same order id.

    We also apply a map function to each matching document to extract the original content inside the envelope.

    The orders variable will contain an array of original JSON objects.

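    const jsearch = require('/MarkLogic/jsearch.sjs');

    // Find every source document in the Order collection with this order id,
    // then unwrap each one to the original content inside the envelope.
    var orders = jsearch
      .collections('Order')
      .documents()
      .where(
        jsearch.byExample({
          'id': id
        })
      )
      .result('value')
      .results.map(function(doc) {
        return doc.document.envelope.instance;
      });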
      
  • After collecting the line items in the same order,

    • We calculate the total amount of the order and
    • We store the appropriate Product entity references (using the SKU) in the products property of the Order instance.

    /* The following property is a local reference. */
    var products = [];
    var price = 0;
    for (var i = 0; i < orders.length; i++) {
      var order = orders[i];
      // Skip line items that have no SKU.
      if (order.sku) {
        products.push(makeReferenceObject('Product', order.sku));
        // Add unit price times quantity to the running total.
        price += xs.decimal(parseFloat(order.price)) * xs.decimal(parseInt(order.quantity, 10));
      }
    }
      
  • The default code includes some additional functions that we will remove because we do not need them.

    • extractInstanceProduct: Extracts a Product instance in a form suitable for insertion into an Order instance. Because we reference Product entities within the Order instance, we do not need this function.
    • extractInstanceOrder: Extracts an Order instance from an order source document. Because several source documents (one per line item) correspond to a single Order entity, there is no one-to-one correspondence, and we cannot use this function.

    However, although we do not use extractInstanceOrder, our customized createContent function must produce a similar structure.

    
    return {
      '$attachments': attachments,
      '$type': 'Order',
      '$version': '0.0.1',
      'id': id,
      'total': price,
      'products': products
    }
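
The notes above can be tried outside MarkLogic with a minimal plain-JavaScript sketch. Everything in it is an assumption made for illustration: the sample search results, the summarizeLineItems helper, and the shape returned by makeReferenceObject (the real helper comes from the generated plugin code). Because xs.decimal is not available outside MarkLogic, this sketch swaps in integer cents to avoid floating-point drift; that is a substitute technique, not what the plugin itself does.

```javascript
// Hypothetical search results for one order: each source document wraps the
// original line item in an envelope, as described in the notes above.
const results = [
  { document: { envelope: { instance: { id: '1001', sku: 'A-1', price: '19.99', quantity: '3' } } } },
  { document: { envelope: { instance: { id: '1001', sku: 'B-2', price: '0.10', quantity: '1' } } } },
  { document: { envelope: { instance: { id: '1001', sku: 'C-3', price: '5.00', quantity: '2' } } } },
  { document: { envelope: { instance: { id: '1001', price: '1.00', quantity: '1' } } } } // no SKU
];

// Unwrap each document to its original content, as the map() step does.
const orders = results.map((doc) => doc.document.envelope.instance);

// Assumed shape of a reference to another entity instance.
function makeReferenceObject(type, ref) {
  return { '$type': type, '$ref': ref };
}

// Collect product references and the order total from the line items.
function summarizeLineItems(lineItems) {
  const products = [];
  let totalCents = 0;
  for (const item of lineItems) {
    if (item.sku) { // line items without a SKU are skipped
      products.push(makeReferenceObject('Product', item.sku));
      const unitCents = Math.round(parseFloat(item.price) * 100);
      totalCents += unitCents * parseInt(item.quantity, 10);
    }
  }
  return { products: products, total: totalCents / 100 };
}

const summary = summarizeLineItems(orders);
console.log(summary.products.length); // 3
console.log(summary.total);           // 70.07
```

The two values in summary correspond to the products array and the calculated amount that createContent places in the returned Order instance.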
      

Steps

To customize the content plugin code,

  1. Click the CONTENT tab.
  2. Replace the content plugin code with the following:
  3. Click SAVE.

4 - Run the Harmonize Flow

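When you create a flow, QuickStart automatically generates harmonization code based on the entity model (and on the mapping, if you specified one) and then deploys the code to MarkLogic Server. In this tutorial we did not specify a mapping, so the flow runs the plugin code that we customized above.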

To run the harmonization flow,

  1. Click the Flow Info tab.
  2. Click Run Harmonize.

5 - View the Harmonized Orders

As with other flow runs, you can view the job status.

  1. In the QuickStart menu, click Jobs to open the Jobs list.
  2. In the list, click >_ for .

You can also explore your harmonized data in the FINAL database.

  1. In the QuickStart menu, click Browse Data.
  2. From the database selection dropdown, choose the FINAL database.
  3. (Optional) To narrow the list to include entities only, check the Entities Only box.
  4. In the list, click the row of the first Order dataset item.