Getting started dedup: updated procedures and screenshots #455

Open
wants to merge 2 commits into
base: master
94 changes: 36 additions & 58 deletions docs/010-getting-started/040-deduplication.md
@@ -11,94 +11,79 @@ tags: ["getting-started"]
1. TOC
{:toc}

The deduplication process helps you find and merge duplicate records based on a set of rules that you define. This process involves [creating a deduplication project](#create-deduplication-project), [configuring the matching rules](#configure-matching-rule) for identifying duplicates, and [fixing duplicates](#fix-duplicates).
The deduplication process helps you find and merge duplicate records based on certain criteria. This process involves creating a deduplication project, adding a matching rule for identifying duplicates, and fixing duplicates.

<div class="videoFrame">
<iframe src="https://player.vimeo.com/video/850839188?badge=0&amp;autopause=0&amp;player_id=0&amp;app_id=58479" frameborder="0" allow="autoplay; fullscreen; picture-in-picture" allowfullscreen title="Getting started with data deduplication in CluedIn"></iframe>
</div>

In this guide, you will learn how to deduplicate the data that you have ingested into CluedIn.
In this guide, you will learn how to find and merge duplicates in the data you ingested into CluedIn.

**Before you start:** Make sure you have completed all steps in the [Ingest data guide](/getting-started/data-ingestion).

**Context:** This guide focuses on identifying duplicates based on the same first name and last name.

## Create deduplication project

As a first step, you need to create a deduplication project that allows you to check for duplicates that belong to a certain entity type.
Creating a deduplication project involves setting up filters to identify the data that will be checked for duplicates.

**To create a deduplication project**

1. On the navigation pane, go to **Management**. Then, select **Deduplication**.
1. On the navigation pane, go to **Management** > **Deduplication**.

![create-dedup-project-1.png](../../assets/images/getting-started/deduplication/create-dedup-project-1.png)

1. Select **Create Deduplication Project**.

1. On the **Create Deduplication Project** pane, do the following:
1. Enter the name of the deduplication project.

1. Enter the name of the deduplication project.
1. In the **Choose project type** section, select an option for identifying the golden records that will be checked for duplicates. Depending on the selected option, provide the required details:

1. Select the entity type that you want to use as a filter for all records.
- **By entity type** – select the entity type; all golden records belonging to the selected entity type will be checked for duplicates. You can choose multiple entity types.

![dedup-2.png](../../assets/images/getting-started/deduplication/dedup-2.png)
- **Using advanced filters** – set up the filter parameters; all golden records that meet the filter criteria will be checked for duplicates. You can add multiple filter rules. Read more about filters [here](/Documentation/Key-terms-and-features/Filters).

1. In the lower-right corner, select **Create**.
![create-dedup-project-2.png](../../assets/images/getting-started/deduplication/create-dedup-project-2.png)

You created the deduplication project.
1. Select **Create**.

![dedup-3.png](../../assets/images/getting-started/deduplication/dedup-3.png)

Now, you can proceed to define the rules for checking duplicates within the selected entity type.

## Configure matching rule

When creating a matching rule, you need to specify certain criteria. CluedIn uses these criteria to check for matching values among records belonging to the selected entity type.
## Add a matching rule

**To configure a matching rule**
A matching rule is used to compare golden records and identify duplicates based on specified criteria. Adding a matching rule involves defining the criteria that will be used to detect duplicates.

1. Go to the **Matching Rules** tab and select **Add Matching Rule**.
**To add a matching rule**

The **Add Matching Rule** pane opens on the right side of the page.
1. Go to the **Matching Rules** tab, and then select **Add Matching Rule**.

1. On the **Matching Rule Name** tab, enter the name of the matching rule, and then select **Next**.
1. Enter the name of the matching rule.

![dedup-4.png](../../assets/images/getting-started/deduplication/dedup-4.png)
![add-matching-rule-1.png](../../assets/images/getting-started/deduplication/add-matching-rule-1.png)

1. On the **Matching Criteria** tab, do the following:
1. Select **Next**.

1. Enter the name of the matching criteria.
1. Select the vocabulary key; all values associated with this vocabulary key will be checked for duplicates.

1. Select the vocabulary key. All values associated with this vocabulary key will be checked for duplicates.
1. In the **Matching Function** dropdown list, select the method for detecting duplicates.

1. In the **Matching Function** dropdown list, select the method for detecting duplicates.

![dedup-5.png](../../assets/images/getting-started/deduplication/dedup-5.png)
![add-matching-rule-2.png](../../assets/images/getting-started/deduplication/add-matching-rule-2.png)

1. In the lower-right corner, select **Next**.
1. Select **Next**.

1. On the **Preview** tab, review the defined matching criteria.

![dedup-6.png](../../assets/images/getting-started/deduplication/dedup-6.png)
To add more matching criteria to the rule, select **Add Matching Criteria**, and then repeat steps 4–6.

If you want to add more matching criteria to the rule, select **Add Matching Criteria**.
![add-matching-rule-3.png](../../assets/images/getting-started/deduplication/add-matching-rule-3.png)

1. After you have added the needed matching criteria, in the lower-right corner of the **Preview** tab, select **Add Rule**.
1. Select **Add Rule**.

The status of the deduplication project becomes **Ready to generate**.

![dedup-7.png](../../assets/images/getting-started/deduplication/dedup-7.png)

1. In the upper-right corner, select **Generate Results**. Then, confirm that you want to generate the results for the deduplication project.
1. Select **Generate Results**, and then confirm your choice.

{:.important}
The process of generating results may take some time.
The process of generating results may take some time. After the process is completed, you will receive a notification. If duplicates are detected, the results will be displayed on the page. The results are organized into groups containing records that match your criteria. For example, in the following screenshot, the group consists of two duplicates.

After the process is completed, you will receive a notification. If duplicates are detected, the results will be displayed on the page. The results are organized into groups containing records that match your criteria. For example, in the following screenshot, the group consists of two duplicates. The name of the group corresponds to the value of the vocabulary key from the matching rule.

![dedup-8.png](../../assets/images/getting-started/deduplication/dedup-8.png)

Now, you can proceed to fix the duplicates.
![add-matching-rule-4.png](../../assets/images/getting-started/deduplication/add-matching-rule-4.png)
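
To make the idea of a matching rule more concrete, here is a minimal Python sketch of exact matching on first name and last name, the context this guide uses. It is an illustration only, not how CluedIn implements matching functions: the function `group_duplicates`, the field names `first_name` and `last_name`, and the sample records are all hypothetical.

```python
from collections import defaultdict


def group_duplicates(records, keys):
    """Group records whose values for all `keys` match (hypothetical helper).

    Values are compared after trimming whitespace and lowercasing, which is
    an assumption made for this sketch, not a documented CluedIn behavior.
    """
    groups = defaultdict(list)
    for record in records:
        # Build a normalized signature from the fields used for matching.
        signature = tuple(str(record.get(k, "")).strip().lower() for k in keys)
        # Skip records that lack a value for any of the matching fields.
        if not all(signature):
            continue
        groups[signature].append(record)
    # Only signatures shared by two or more records form a duplicate group.
    return [members for members in groups.values() if len(members) > 1]


# Illustrative sample data.
records = [
    {"id": 1, "first_name": "Ann", "last_name": "Lee", "email": "ann@example.com"},
    {"id": 2, "first_name": "ann", "last_name": "Lee ", "email": "a.lee@example.com"},
    {"id": 3, "first_name": "Bob", "last_name": "Ray", "email": "bob@example.com"},
]
duplicate_groups = group_duplicates(records, ["first_name", "last_name"])
```

In this sketch, records 1 and 2 end up in one group because both matching fields agree after normalization, which mirrors a rule with two matching criteria.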

## Fix duplicates

@@ -112,37 +97,30 @@ The process of fixing duplicates involves reviewing the values from duplicate re

1. In the **Conflicting** section, select the values that you want to merge into the deduplicated record.

![dedup-9.png](../../assets/images/getting-started/deduplication/dedup-9.png)

1. In the upper-right corner of the page, select **Next**.
![fix-duplicates-1.png](../../assets/images/getting-started/deduplication/fix-duplicates-1.png)

The **Preview Merge** tab opens. Here, you can view the values that will be merged into the deduplicated record.
1. Select **Next**.

![dedup-10.png](../../assets/images/getting-started/deduplication/dedup-10.png)
1. On the **Preview Merge** tab, review the values that will be merged into the deduplicated record.

1. In the upper-right corner of the page, select **Approve**. Then, confirm that you want to approve your selection of values for the group.
1. Select **Approve**. Then, confirm that you want to approve your selection of values for the group.

1. Select the checkbox next to the group name. Then, select **Merge**.
1. Select the checkbox next to the group name, and then select **Merge**.

![dedup-11.png](../../assets/images/getting-started/deduplication/dedup-11.png)
![fix-duplicates-2.png](../../assets/images/getting-started/deduplication/fix-duplicates-2.png)

1. Confirm that you want to merge the records from the group:

1. Review the group that will be merged and select **Next**.

1. Select an option to handle the data merging process if more recent data becomes available for the entity. Then, select **Confirm**.

![dedup-12.png](../../assets/images/getting-started/deduplication/dedup-12.png)

{:.important}
The process of merging data may take some time.

After the process is completed, you will receive a notification. As a result, the duplicate records have been merged into one record.
![fix-duplicates-3.png](../../assets/images/getting-started/deduplication/fix-duplicates-3.png)

You fixed the duplicate records.
The process of merging data may take some time. After the process is completed, you will receive a notification. As a result, the duplicate records have been merged into one record. On the **Merges** tab, you can view the merged records.

{:.important}
All changes to the data records in CluedIn are tracked. You can search for the needed data record and, on the **Topology** pane, view the visual representation of the records that were merged through the deduplication process.
All changes to the records in CluedIn are tracked. You can search for the needed record and, on the **Topology** pane, view the visual representation of the records that were merged through the deduplication process.
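
The review-and-merge flow in this section can be pictured with a small Python sketch: fields that agree across the group are carried over as-is, and each conflicting field takes the value the reviewer selected. This is a simplified illustration under stated assumptions, not CluedIn's merge logic: `merge_group`, the `chosen` mapping, and the sample records are hypothetical.

```python
def merge_group(group, chosen):
    """Merge duplicate records into one record (hypothetical helper).

    `chosen` maps each conflicting field to the value selected during review.
    """
    merged = {}
    conflicts = set()
    for record in group:
        for field, value in record.items():
            # A field conflicts when two records hold different values for it.
            if field in merged and merged[field] != value:
                conflicts.add(field)
            merged.setdefault(field, value)
    for field in conflicts:
        # Every conflict must be resolved explicitly before merging.
        if field not in chosen:
            raise ValueError(f"no value selected for conflicting field {field!r}")
        merged[field] = chosen[field]
    return merged


# Illustrative duplicate group with one conflicting field (email).
group = [
    {"first_name": "Ann", "last_name": "Lee", "email": "ann@example.com"},
    {"first_name": "Ann", "last_name": "Lee", "email": "a.lee@example.com", "city": "Oslo"},
]
merged = merge_group(group, {"email": "a.lee@example.com"})
```

Raising an error for an unresolved conflict reflects the review step in the procedure, where values in the **Conflicting** section are selected before the group is approved.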

## Results & next steps

2 changes: 1 addition & 1 deletion docs/080-management/020-deduplication.md
@@ -14,7 +14,7 @@ You can reduce the number of duplicates in the system proactively even before cr

The following diagram shows the basic steps for merging duplicates in CluedIn.

![dedup-main.gif](../../assets/images/management/deduplication/dedup-main.gif)
![deduplication-steps.gif](../../assets/images/management/deduplication/deduplication-steps.gif)

This section covers the following areas:
