Configure Consolidation
Canopy provides a set of features that help you consolidate (that is, merge) entities to form a Master Entity. This document outlines the steps for using Consolidation Rules to achieve the entity detection results best suited to your data.
A Master Entity is a collection of merged entities that identify a single, unique person. There are two types of merged entities: entities merged automatically via consolidation rules and manually merged entities.
Canopy’s Global Rule Set can be replaced by importing custom consolidation rules. To import rules, click on Consolidation Rules from the kebab menu on the Entity List page:
Click on the Delete icon to either import rules or create your own rules:
Create rules interactively by clicking on the Create Rule button or via a JSON configuration file by clicking on the Import JSON button.
Canopy recommends that you start by using Global Rules, a universal set of rules suitable for all regions. Alternatively, you can select a trimmed down set of rules focused on a specific region:
- Australia
- Canada
- United Kingdom
- United States
Rules Last Updated May 24, 2023
Users have the ability to edit Canopy’s Global Rule Set, or whichever rule set is currently loaded.
Click on the Pencil icon to edit a rule set.
Editing examples:
Users may want to consolidate entities when the First Name AND Last Name AND Social Security Number are the same.
In each condition subgroup below, the selection on top says Any of these, indicating that any of these conditions is enough to consolidate the entities.
In the rule below, entities would be consolidated if they share the First Name AND Last Name AND any of Passport, Military ID and SSN. This is indicated by the All of these toggle in the top condition set and Any of these toggle in the bottom condition set.
In the example below, multiple condition groups are joined with an OR condition, meaning that at least one of the condition groups needs to match.
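The All of these / Any of these logic in the examples above can be sketched in Python. The data model used here (a `mode` and a list of `fields` per subgroup) and the function names are hypothetical and only illustrate the boolean logic; they are not Canopy's rule format or implementation. The sketch also assumes every condition uses an exact match.

```python
def subgroup_matches(subgroup, a, b):
    """Return True if two entities satisfy one condition subgroup.

    `subgroup` is a hypothetical dict: {"mode": "all" | "any", "fields": [...]}.
    A field matches when both entities hold the same value (exact match).
    """
    results = [a.get(field) == b.get(field) for field in subgroup["fields"]]
    return all(results) if subgroup["mode"] == "all" else any(results)


def group_matches(group, a, b):
    """Subgroups inside one condition group are joined with AND in this sketch."""
    return all(subgroup_matches(subgroup, a, b) for subgroup in group)


def rule_matches(groups, a, b):
    """Condition groups joined with OR: at least one group must match."""
    return any(group_matches(group, a, b) for group in groups)


# First Name AND Last Name AND any of Passport, Military ID, SSN.
rule = [[
    {"mode": "all", "fields": ["first_name", "last_name"]},
    {"mode": "any", "fields": ["passport", "military_id", "ssn"]},
]]

a = {"first_name": "John", "last_name": "Smith",
     "passport": "X1", "military_id": "M1", "ssn": "123-45-6789"}
b = {"first_name": "John", "last_name": "Smith",
     "passport": "X2", "military_id": "M2", "ssn": "123-45-6789"}
c = {"first_name": "John", "last_name": "Smith",
     "passport": "X3", "military_id": "M3", "ssn": "987-65-4321"}

print(rule_matches(rule, a, b))  # True: names match and the SSN matches
print(rule_matches(rule, a, c))  # False: names match but no identifier does
```

Adding a second condition group to `rule` would make the rule match whenever either group is satisfied.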
There are two field matching options: Exact and Phonetic. Phonetic matching applies only to name fields. With Phonetic Match selected, “Jon” and “John” are treated as the same name.
Canopy uses a combination of two phonetic matching algorithms, Soundex and Metaphone, to detect names that sound the same but have different spellings.
Soundex converts word sounds into codes and then compares these codes to report similarities. Originally used to index the US census records of 1880, 1900, and 1910, Soundex doesn’t fare well with non-English names.
Metaphone was created as an alternative to Soundex. Consonant sounds were added to its analysis, helping it perform better on non-English names.
Canopy considers two words to be phonetically matched when both Soundex and Metaphone report exact matches. Use of both algorithms provides the most accurate results.
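To make the idea concrete, here is a minimal sketch of American Soundex together with a helper that reports a match only when every supplied encoder agrees. It is an illustration, not Canopy's implementation. Only Soundex is implemented; a Metaphone encoder would be passed in as a second function in the same way. The names `soundex` and `phonetic_match` are chosen for this example.

```python
# Letter-to-digit table for American Soundex.
_CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        _CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code for `name` (ASCII letters only)."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters that share a code
        code = _CODES.get(ch, "")
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets `previous`, so a repeated code counts again
    return (encoded + "000")[:4]


def phonetic_match(a, b, encoders):
    """True only when every encoder produces the same code for both names."""
    return all(encode(a) == encode(b) for encode in encoders)


print(soundex("Robert"))                          # R163
print(soundex("John"))                            # J500
print(phonetic_match("Jon", "John", [soundex]))   # True
```

Requiring agreement from every encoder makes the check stricter than any single algorithm, which is the effect described above.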
Canopy recommends using phonetic match detection on “First Name” or “Last Name” only. If used on both, a name like “Emmanuel Moreno” will match with “Immanuel Morano,” who may be two different people.
There are three ignore options: Blanks, Case Sensitivity, and Special Characters. Ignore options cannot be set to Numbers for numeric fields.
If you choose to ignore blank fields, Entity Consolidation will not group two entities where the fields are blank.
For example, you have two entities:
Entity | First Name | Last Name |
---|---|---|
1 | blank | Smith |
2 | blank | Smith |
- Rule configuration 1: First Name AND Last Name Exact Match. Entity Consolidation will cluster all people with the last name of Smith and a blank entry for the first name.
- Rule configuration 2: First Name (Ignore Blanks) AND Last Name Exact Match. Entity Consolidation will ignore the blank fields and not create clusters based on them. In this example, these two entities with last name of Smith and a blank first name would not be clustered together.
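The difference between the two rule configurations can be sketched as follows. This is an illustrative model, not Canopy's code. The helper names are hypothetical, and treating a blank compared against a non-blank value as a non-match is an assumption.

```python
def is_blank(value):
    """A value is blank when it is None, empty, or only whitespace."""
    return value is None or value.strip() == ""


def field_matches(a, b, ignore_blanks=False):
    """Exact match on one field, with optional Ignore Blanks behavior."""
    if is_blank(a) and is_blank(b):
        # Two blanks count as equal unless Ignore Blanks is set.
        return not ignore_blanks
    if is_blank(a) or is_blank(b):
        return False  # assumption: a blank never matches a non-blank value
    return a == b


entity_1 = {"first_name": "", "last_name": "Smith"}
entity_2 = {"first_name": "", "last_name": "Smith"}

# Rule configuration 1: First Name AND Last Name Exact Match
config_1 = (field_matches(entity_1["first_name"], entity_2["first_name"])
            and field_matches(entity_1["last_name"], entity_2["last_name"]))

# Rule configuration 2: First Name (Ignore Blanks) AND Last Name Exact Match
config_2 = (field_matches(entity_1["first_name"], entity_2["first_name"],
                          ignore_blanks=True)
            and field_matches(entity_1["last_name"], entity_2["last_name"]))

print(config_1)  # True: the two entities are clustered
print(config_2)  # False: the blank first names do not produce a cluster
```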
If Case Sensitivity is selected, Entity Consolidation will ignore letter case when comparing fields.
If Special Characters is selected, Entity Consolidation will ignore these characters when comparing fields:
- Hyphen -
- Parentheses ()
- Plus +
- Underscores _
- Numbers 1, 2, 3, …
- Spaces
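One way to picture the Case Sensitivity and Special Characters options is as a normalization step applied to both values before they are compared. The sketch below assumes that model. The function name `normalize` and its parameters are hypothetical, and "Spaces" is taken to mean the plain space character.

```python
import re

# Hyphen, parentheses, plus, underscore, digits, and space.
_SPECIAL = re.compile(r"[-()+_0-9 ]")


def normalize(value, ignore_case=False, ignore_special=False):
    """Return `value` with the selected ignore options applied."""
    if ignore_special:
        value = _SPECIAL.sub("", value)
    if ignore_case:
        value = value.casefold()
    return value


left = normalize("Anne-Marie", ignore_case=True, ignore_special=True)
right = normalize("anne marie", ignore_case=True, ignore_special=True)
print(left == right)  # True: both values normalize to "annemarie"
```

Because digits are among the ignored characters, applying this option to a purely numeric value would leave nothing to compare.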
Once conditions have been set, merge settings can be defined. Merge Settings can be accessed via the blue button on the bottom right of the screen:
In Merge Settings, the following options can be selected:
- Use Canopy’s nickname database to resolve first name conflicts
- Toggle this option ON to use the internal Canopy nickname database to resolve conflicts. For example, Jon Favreau and Jonathan Favreau will not be treated as a conflict.
- Combine address fields into a single line before comparing
- Toggle this option ON to collate all Address fields before comparing. Address will be combined only if the address field is configured to Do Not Merge in Field Conflict Settings. For example, the following two addresses will be treated the same:
- Street: 751 ML King Avenue, Unit: Apt 51, City: Neverland, State: Narnia, Country: US
- Street: 751 ML King Avenue, Apt 51, Unit: , City: Neverland, State: Narnia, Country: US
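The effect of combining address fields can be sketched as below. This is a simplified illustration under stated assumptions, not Canopy's implementation: the field names and their order are hypothetical, and the only normalization applied is splitting on commas and dropping empty parts.

```python
ADDRESS_FIELDS = ("street", "unit", "city", "state", "country")


def address_line(entity):
    """Collapse the address fields of an entity into one comparable line."""
    parts = []
    for field in ADDRESS_FIELDS:
        value = entity.get(field) or ""
        # Split on commas so "751 ML King Avenue, Apt 51" yields two parts.
        parts.extend(p.strip() for p in value.split(",") if p.strip())
    return ", ".join(parts)


first = {"street": "751 ML King Avenue", "unit": "Apt 51",
         "city": "Neverland", "state": "Narnia", "country": "US"}
second = {"street": "751 ML King Avenue, Apt 51", "unit": "",
          "city": "Neverland", "state": "Narnia", "country": "US"}

print(address_line(first))
# 751 ML King Avenue, Apt 51, Neverland, Narnia, US
print(address_line(first) == address_line(second))  # True
```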
The Field Conflict Settings determine how to resolve conflicts between a Master Entity and a Raw Entity.
Create your own field conflict settings interactively, or import JSON field settings by clicking on Import Merge Settings.
These settings can be set on a field-by-field basis. There are three columns to consider: Action, Merging Method, and Blank Field. For example, consider the following two entities:
Entity | First Name | Middle Name | Last Name | Entity Type |
---|---|---|---|---|
1 | John | William | Federer | Master |
2 | John | blank | Federer | Raw |
The Action column determines what happens when a field, such as Middle Name, conflicts between two entities that the rules would otherwise consolidate. If this is set to Do not merge, the two entities above will remain separate, but clustered. If this is set to Merge, the merging behavior will be determined by the next two columns.
The Merging Method column will only activate when the Action column is set to Merge. Select Append to keep both values in the field, separated by a comma. Select Secondary Field to add the value of the secondary entity to the Additional Info field.
Only some fields, like “Names,” have the secondary field capability.
The Blank Field column determines what happens if one of the values is blank (like Entity 2 above). This column will only activate if Action is set to Do not Merge. If Prevent Merge is selected, the entities will stay clustered, but will not be automatically merged. If Merge is selected, the two entities will be merged.
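The interaction of the three columns for a single field can be sketched as a small decision function. This is an interpretation of the behavior described above, not Canopy's implementation. The option names (`"merge"`, `"do_not_merge"`, `"append"`, `"secondary_field"`, `"prevent_merge"`) and the return shape are hypothetical, and the handling of a blank value when Action is Merge is an assumption.

```python
def is_blank(value):
    return value is None or value.strip() == ""


def resolve_field(master, raw, action,
                  merging_method="append", blank_field="prevent_merge"):
    """Return (merge_allowed, field_value, additional_info) for one field."""
    m = "" if is_blank(master) else master
    r = "" if is_blank(raw) else raw
    if m == r:
        return True, m, None  # no conflict
    if not m or not r:
        # Exactly one value is blank: the Blank Field column applies
        # when Action is Do not merge.
        if action == "do_not_merge" and blank_field == "prevent_merge":
            return False, m, None
        return True, m or r, None  # assumption: keep the non-blank value
    # Both values are present and differ.
    if action == "do_not_merge":
        return False, m, None
    if merging_method == "append":
        return True, f"{m}, {r}", None
    return True, m, r  # secondary_field: raw value goes to Additional Info


# Middle Name for the two entities above: "William" (Master) vs blank (Raw).
print(resolve_field("William", "", "do_not_merge", blank_field="merge"))
# (True, 'William', None)
print(resolve_field("William", "", "do_not_merge", blank_field="prevent_merge"))
# (False, 'William', None)
```

In this model, a result of `False` for any field corresponds to the entities staying clustered without being merged automatically.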
The Update and Run option will be enabled when one or more of the following conditions are met:
- New rules are imported
- Current rules are edited
- Reset consolidation and/or Delete manual decisions is checked, as explained below.
Consolidation can be reset via a checkbox on the lower left of the Update Consolidation Rules page. This checkbox defaults to unchecked each time Update and Run is clicked.
If the Reset consolidation box is checked, the current consolidated entity list will be deleted, including all automated clustering, grouping, and merging actions. Manual decisions will be remembered. Running consolidation will create clusters and automatically merge entities using your new rules and merge settings.
If the Delete manual decisions box is checked, all manual decisions made to the consolidated entity list will be deleted when consolidation is run.
If both the Reset consolidation and Delete manual decisions boxes are checked, clicking Update and Run will completely delete the current consolidated entity list, including manual and automated clustering, grouping, and merging actions.
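The checkbox behavior can be summarized in a short sketch. The function and parameter names are hypothetical; the code restates the conditions described above and is not Canopy's implementation.

```python
def update_and_run_enabled(rules_imported, rules_edited,
                           reset_consolidation, delete_manual_decisions):
    """Update and Run is enabled when at least one condition holds."""
    return any((rules_imported, rules_edited,
                reset_consolidation, delete_manual_decisions))


def deleted_on_run(reset_consolidation, delete_manual_decisions):
    """Return which kinds of decisions are deleted when consolidation runs."""
    deleted = set()
    if reset_consolidation:
        deleted.add("automated")  # automated clustering, grouping, merging
    if delete_manual_decisions:
        deleted.add("manual")     # manual decisions
    return deleted


print(update_and_run_enabled(False, False, False, False))  # False
print(sorted(deleted_on_run(True, False)))   # ['automated']
print(sorted(deleted_on_run(True, True)))    # ['automated', 'manual']
```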
- Condition Group
- Condition groups contain one or more subgroups of conditions separated by AND or OR statements. Condition groups can also be separated by AND or OR statements.
- Clustering
- Consolidation will create clusters based upon the values in Raw Entities, according to your consolidation rules. Based upon your merge settings, clustered entities will fall into two groups, Merged Entities or Related Entities.
- Master Entity
- A Master Entity is a collection of entities merged to identify a single, unique person. Merging entities can be done manually or via a consolidation rule. There is one Master Entity per cluster.
Changes to the elements in Master or Clustered entities will not be considered during consolidation. For changes to be considered during consolidation, edits must be made at the Raw entity level.
- Merged Entities
- Merged Entities form a Master Entity. There are two types of Merged Entities, automatically merged and manually merged. You can compare entities and remove a merged entity from a cluster using the Unmerge and Ungroup function. Click on the View Details icon in the Action column to access these functions.
- Order of Evaluation
- Rules are processed sequentially.
- Related Entities
- Related entities are entities that are clustered together during consolidation, but could not be merged. You can manually merge the entity, or remove the related entity from the cluster using the Ungroup function. Click on this box in the Action column to access these functions:
You can also compare, merge, ungroup or delete entities from the detail view:
- Subgroup
- A group of conditions that you can configure to return a value of true if any of the conditions are met or if all of the conditions are met.