Tutorial: Harmonize the Product Data by Mapping

A harmonize flow is another series of plugins that harmonizes the data in the staging database and stores the results in the final database. Harmonization includes standardizing formats, enriching data, resolving duplicates, indexing, and other tasks.

We can specify the source of an entity property value using one of two methods:

  • By customizing the default harmonization code.
  • By defining mappings that specify which fields in the raw datasets correspond with which properties in the entity model.

Model-to-model mapping (between the source data model and the canonical entity model) was introduced in DHF v4.0.0 so that users can create a harmonization flow without coding. Mappings are ideal when the source data can be easily converted for use as the value of an entity property; simple conversions include a change in label case or a change between simple data types.

We have already loaded the raw Product data into the staging database.

In this section, we will:

  • Define the entity model.
  • Define the mappings.
  • Create and run the harmonize flow.

1 - Define the Entity Model

We first define the entity model, which specifies the standard labels for the fields we want to harmonize. For the Product dataset, we will harmonize two fields: sku and price. Therefore, we must add those fields as properties to our Product entity model.

Name  | Type    | Other settings | Notes
------+---------+----------------+------
sku   | string  | key            | Used as the primary key because the SKU is unique for each product.
price | decimal |                | Set as a decimal because we need to perform calculations with the price.
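
To illustrate why price is typed as a decimal rather than kept as a string or a binary float, here is a minimal Python sketch. The Product class, the order_total helper, and the sample values are hypothetical names chosen for illustration; they are not part of DHF or QuickStart.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Product:
    """Hypothetical in-memory stand-in for the Product entity model."""
    sku: str        # primary key: unique for each product
    price: Decimal  # decimal, so arithmetic on prices stays exact


def order_total(product: Product, quantity: int) -> Decimal:
    """Multiply the unit price by a quantity without float rounding error."""
    return product.price * quantity


item = Product(sku="SKU-001", price=Decimal("0.10"))  # made-up sample values
print(order_total(item, 3))  # prints 0.30
print(0.10 * 3 == 0.30)      # prints False: binary floats are not exact here
```

A string price would have to be converted before every calculation, and a float price can accumulate rounding error, which is the usual argument for a decimal type in a canonical model.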

To define the Product entity model,

  1. In QuickStart's navigation bar, click Entities.
  2. At the top of the Product entity card, click the pencil icon to edit the Product entity definition.
  3. In the Product entity editor, click + in the Properties section to add a new property.
    1. Set Name to sku.
    2. Set Type to string.
    3. To make sku the primary key, click the area in the key column for the sku row.
  4. Click + again to add another property.
    1. Set Name to price.
    2. Set Type to decimal.
  5. Click SAVE.
  6. If prompted to update the index, click Yes.
  7. Drag the bottom-right corner of the entity card to resize it and see the newly added properties.

2 - Define the Mappings

For the Product entity, we define the following simple mappings:

Field in raw dataset (type) | Property in entity model (type) | Notes
----------------------------+---------------------------------+------
SKU (string)                | sku (string)                    | Field names differ in letter case
price (string)              | price (decimal)                 | Types differ
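
The effect of these two mappings on a single raw record can be sketched in Python as follows. This is an illustration under stated assumptions, not how QuickStart stores or applies a mapping: PRODUCT_MAPPING, apply_mapping, and the sample record (including its extra title field) are all hypothetical.

```python
from decimal import Decimal

# Hypothetical mapping table: entity property -> (source field, type converter).
PRODUCT_MAPPING = {
    "sku": ("SKU", str),          # same type, different letter case in the name
    "price": ("price", Decimal),  # same name, string converted to decimal
}


def apply_mapping(raw: dict, mapping: dict) -> dict:
    """Build an entity instance by reading each mapped source field and converting it."""
    return {
        prop: convert(raw[source_field])
        for prop, (source_field, convert) in mapping.items()
    }


raw_record = {"SKU": "SKU-001", "price": "12.50", "title": "Sample"}  # made-up row
entity = apply_mapping(raw_record, PRODUCT_MAPPING)
print(entity)
```

In this sketch, source fields that have no mapping (here, title) are simply left out of the resulting entity instance.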

To create a mapping named Product Mapping,

  1. In QuickStart’s navigation bar, click Mapping.
  2. In the left panel, click the + icon for the Product entity.
  3. In the Create New Mapping form, set Mapping Name to Product Mapping.
  4. Click CREATE.

Your new mapping appears under the tab named Product in the left panel.

The mapping editor displays a row for each property in your entity model. In each row,

  • the right column displays the entity property, and
  • the left column contains a dropdown list from which you can select the source field that corresponds to that entity property.

To configure the mapping,

  1. For each entity property, expand the dropdown list under Source and select the source field that corresponds to that entity property.

  2. Click SAVE MAPPING.

3 - Create and Run the Harmonize Flow

Harmonization uses the data in your STAGING database to generate canonical entity instances in the FINAL database.

To create a harmonization flow for the Product entity,

  1. In QuickStart’s navigation bar, click Flows.
  2. Expand the tab named Product in the left panel.
  3. Click the + for Harmonize Flows.
  4. In the Create Harmonize Flow dialog, set Harmonize Flow Name to Harmonize Products.
  5. Under Mapping Generation, select Product Mapping.
  6. Click CREATE.

When you create a flow with mapping, QuickStart automatically generates harmonization code based on the entity model and the mapping and then deploys the code to MarkLogic Server.
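
Conceptually, a mapping-based harmonize flow reads each raw document from staging, applies the mapping, and writes the resulting entity instance to final. The Python sketch below models both databases as plain dictionaries to show that data flow. It is not the code QuickStart generates; the names staging, final, map_product, and harmonize, the sample URIs, and the choice to reuse the source URI for the harmonized document are all assumptions made for illustration.

```python
from decimal import Decimal

# Hypothetical stand-ins for the STAGING and FINAL databases: URI -> document.
staging = {
    "/products/1.json": {"SKU": "SKU-001", "price": "12.50"},
    "/products/2.json": {"SKU": "SKU-002", "price": "3.99"},
}
final = {}


def map_product(raw: dict) -> dict:
    """Apply the two Product mappings: SKU -> sku, and price cast to a decimal."""
    return {"sku": str(raw["SKU"]), "price": Decimal(raw["price"])}


def harmonize(source: dict, target: dict) -> int:
    """Write one harmonized entity instance to target for every document in source."""
    for uri, raw in source.items():
        target[uri] = map_product(raw)  # assumption: the URI is reused as-is
    return len(source)


count = harmonize(staging, final)
print(count, final["/products/2.json"])
```

The raw documents in staging are left untouched, which mirrors the separation between the staging and final databases described above.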

To run the harmonization flow,

  1. Click the Flow Info tab.
  2. Click Run Harmonize.
