Updates in the Integration section #506

Binary file modified assets/images/integration/data-sources/process-data.gif
grand_parent: Integration
permalink: /integration/additional-operations-on-records/monitoring
title: Monitoring
tags: ["integration", "monitoring"]
last_modified: 2024-08-15
---

In this article, you will learn about the features on the **Monitoring** tab that give you insight into what is happening with your records and help you quickly identify issues.

## Monitoring for all types of data sources

Regardless of the type of data source, the **Monitoring** tab in the data set includes the following sections:

- **Total** – here you can view general information about the records, including the total number of records, original columns, mapped columns, and records in quarantine. This is a useful tool to compare the number of original columns and mapped columns.

- **Global queues** – here you can view global statistics on ingestion and processing requests from all data sets. This is a useful tool to ensure that the system runs correctly.

- **Queues** – here you can view messages containing records that are waiting to enter the next stage of their life cycle (loading, mapping, processing).

In the data set created using an endpoint, these sections are located in the **Overview** area.

If the number of messages of any type is greater than 0 while the number of consumers is 0, there may be an issue with your data. The following screenshot illustrates a situation where troubleshooting is needed to fix the processing of records.

![monitoring-1.png](../../assets/images/integration/additional-operations/monitoring-1.png)
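The rule of thumb above can be sketched as a simple check over queue statistics. This is a minimal illustration only; the queue snapshot below is hypothetical sample data, not output of a CluedIn API:

```python
# Flag queues that have pending messages but no consumers attached --
# the situation described above that calls for troubleshooting.
def find_stuck_queues(queues):
    """queues: list of dicts with 'name', 'messages', and 'consumers' keys."""
    return [q["name"] for q in queues if q["messages"] > 0 and q["consumers"] == 0]

# Hypothetical snapshot of the Queues section:
stats = [
    {"name": "Submitting Messages", "messages": 120, "consumers": 0},
    {"name": "Processing Messages", "messages": 0, "consumers": 2},
    {"name": "Quarantine Messages", "messages": 5, "consumers": 1},
]
print(find_stuck_queues(stats))  # ['Submitting Messages']
```

Here only **Submitting Messages** is flagged: it has pending messages but no consumers, so its troubleshooting action from the table below applies.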

The following table provides descriptions of each message queue and corresponding troubleshooting actions.

| Queue | Description | Troubleshooting |
|--|--|--|
| Ingestion data set | Messages representing JSON objects sent to various endpoints. | If you are a system administrator, restart the pod named "datasource-processing". |
| Commit data set | Messages representing requests for data set processing. Messages can be added by selecting the **Process** button on the **Process** tab of the data set or each time the endpoint receives data with auto-submission enabled. | If you are a system administrator, restart the pod named "datasource-processing". |
| Submitting Messages | Messages containing JSON objects sent to the mapping service to be converted into records during processing. | Go to the **Process** tab and select **Cancel**. If you are a system administrator, verify the status of the mapping service and restart the pod named "annotation". |
| Processing Messages | Messages containing records sent to the processing pipeline. | If you are a system administrator, restart the pod named "submitter". |
| Quarantine Messages | Messages containing records that were approved on the **Quarantine** tab and sent to the processing pipeline. | If you are a system administrator, restart the pod named "submitter". |
| Loading Failures | Messages containing records from the data set that cannot be fully loaded. | Go to the **Preview** tab and select **Retry**. |
| Error Processing Messages | Messages containing records that could not be processed because the processing pipeline does not respond. | Go to the **Process** tab and select **Retry**. If you are a system administrator, verify the status of processing pods. |

## Monitoring for endpoints

In the data set created using an endpoint, the **Monitoring** tab includes two areas: **Overview** and **Ingestion reports**. The **Overview** area contains the statistics described in the previous section. This section focuses on the **Ingestion reports** area.

![monitoring-4.png](../../assets/images/integration/additional-operations/monitoring-4.png)

The **Ingestion reports** area contains a table with detailed reports generated for each request sent to an endpoint. The table contains the following columns:

- **ReceiptID** – unique identifier of a request. Every request you send to an endpoint, whether successful or not, receives a unique receipt ID. This ID allows you to quickly locate the request report in CluedIn. Simply copy the receipt ID from the request response and paste it into the search field above the table.

- **Received** – the number of records received by CluedIn. If the number is 0, it means that the request contained errors and CluedIn rejected it.

- **Loaded** – the number of records loaded into CluedIn. This column contains three categories:

- **Success** – the number of records that were successfully loaded into CluedIn.

- **Failed** – the number of records that failed to load into CluedIn.

- **Retry** – the number of records for which loading into CluedIn was retried.

- **Logs** – the number of logs generated for a specific request. You can view the log details by selecting the content of the cell. Keep in mind that for endpoints, we only log warnings. These logs are the same as those found on the **Logs** tab of the data set. The difference is that the **Logs** tab contains logs for all requests, while the **Ingestion reports** table provides logs for each specific request. For more information on how to read logs, see the [Logs](/integration/additional-operations-on-records/logs) documentation.

- **Processed** – the number of records that were processed in CluedIn.

- **Created at** – the timestamp indicating when the ingestion report was generated. This corresponds to the time when the HTTP request was executed.
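The receipt-ID workflow described above can be sketched as follows. The endpoint URL, the authorization scheme, and the `receiptId` response field are hypothetical placeholders for illustration, not the documented CluedIn API:

```python
import json
import urllib.request

def send_records(endpoint_url, token, records):
    """POST a batch of JSON records to an ingestion endpoint.

    Returns the parsed JSON response. The URL and Bearer-token auth
    are assumptions for this sketch.
    """
    req = urllib.request.Request(
        endpoint_url,
        data=json.dumps(records).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical auth scheme
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def extract_receipt_id(response_payload):
    """Pull the receipt ID out of the response so it can be pasted into
    the search field above the Ingestion reports table.

    'receiptId' is a hypothetical field name for this sketch."""
    return response_payload.get("receiptId")
```

For example, if a response were `{"receiptId": "abc-123"}`, `extract_receipt_id` would return `"abc-123"`, which you could then search for in the **Ingestion reports** table.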
docs/040-integration/data-sources/190-process-data.md (106 changes: 101 additions & 5 deletions)
grand_parent: Integration
permalink: /integration/process-data
title: Process data
tags: ["integration", "processing"]
last_modified: 2024-08-15
---

In this article, you will learn about the processing of data that you ingested into CluedIn. The goal of processing is to turn your records into standalone golden records or to use them to enhance existing golden records.

Depending on the type of data source, there are three processing options:

- For file, endpoint, and database: Manual processing

- For endpoint only: Auto-submission

- For endpoint only: Bridge mode

You can start processing a data set as many times as you want. However, once a record has been processed, it won’t be processed again: when processing starts, CluedIn checks for identical records, and any that have already been processed are skipped. If you change the origin code of previously processed records, CluedIn treats them as new and processes them again.
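The process-once behavior described above can be modeled roughly as skipping a record when the same origin code and content have been seen before, and treating it as new when the origin code changes. This is a simplified toy model for illustration, not CluedIn's actual implementation:

```python
import hashlib
import json

class ProcessOnceModel:
    """Toy model of process-once semantics keyed by origin code + record content."""

    def __init__(self):
        self.seen = set()

    def _key(self, origin_code, record):
        # A stable fingerprint of the record combined with its origin code.
        content = json.dumps(record, sort_keys=True)
        return hashlib.sha256(f"{origin_code}|{content}".encode()).hexdigest()

    def process(self, origin_code, record):
        key = self._key(origin_code, record)
        if key in self.seen:
            return "skipped"    # identical record was already processed
        self.seen.add(key)
        return "processed"

model = ProcessOnceModel()
rec = {"name": "Alice"}
print(model.process("/Contact#HubSpot", rec))      # processed
print(model.process("/Contact#HubSpot", rec))      # skipped (identical record)
print(model.process("/Contact#Salesforce", rec))   # processed (origin code changed)
```

The last call shows the effect noted above: changing the origin code makes an otherwise identical record look new, so it is processed again.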

After the processing is completed, the [processing log](#processing-logs) appears in the table. Any records that fail to meet specific conditions outlined in [property](/Documentation/Data-sources/Additional-operations/Property-rules) or [pre-process](/Documentation/Data-sources/Additional-operations/Pre%2Dprocess-rules) rules will be sent to quarantine. To learn more about managing these records, see [Quarantine](/Documentation/Data-sources/Additional-operations/Quarantine). Records that were processed successfully are displayed on the **Data** tab.

If the processing takes a long time, go to the **Monitoring** tab and check the number of messages in the queues. Depending on the type of message queue with a high message count, you can perform specific troubleshooting actions. For further details, see [Monitoring](/Documentation/Data-sources/Additional-operations/Monitoring).

## Manual processing

Manual processing is available for the data coming from a file, an endpoint, or a database. With manual processing, the original data that was initially sent to CluedIn remains in the temporary storage on the **Preview** tab. After the data has been processed, the resulting golden records appear on the **Data** tab.

**To process the data**


![process-data.gif](../../assets/images/integration/data-sources/process-data.gif)

## Auto-submission

Auto-submission is available for the data coming from an endpoint. When auto-submission is enabled, data received from the endpoint is processed automatically. With auto-submission, the original data that was initially sent to CluedIn remains in the temporary storage on the **Preview** tab. After the data has been processed, the resulting golden records appear on the **Data** tab.

**To enable auto-submission**

1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set for which you want to enable auto-submission.

1. Go to the **Process** tab, and then turn on the toggle next to **Auto-submission**.

1. Confirm that you want to enable automatic processing of records once they are received by CluedIn.

![auto-submission.gif](../../assets/images/integration/data-sources/auto-submission.gif)

If you no longer want the records to be processed automatically, turn off the toggle next to **Auto-submission**.

## Bridge mode

Bridge mode is available for the data coming from an endpoint. When bridge mode is enabled, all your JSON records are transformed into golden records directly, without being stored in the temporary storage on the **Preview** tab. However, you can rely on data set logs and ingestion receipts for debugging purposes.

Bridge mode allows you to use less storage and memory, resulting in increased performance. Use this mode when your mapping will not change over time, and you want to use the ingestion endpoint only as a mapper.

**To switch to bridge mode**

1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set that you want to switch to bridge mode.

1. Go to the **Process** tab. Open the three dots menu, and then select **Switch to bridge mode**.

1. Confirm that you want to switch to bridge mode by entering _BRIDGE_. Then, select **Confirm bridge mode**.

![enable-bridge-mode.gif](../../assets/images/integration/data-sources/enable-bridge-mode.gif)

If you no longer want your endpoint to operate in bridge mode, you can switch it back to the default mode. After switching back to the default mode, the **Preview** tab will appear. However, it will not contain records received while bridge mode was enabled.

**To switch back to default mode**

1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set that you want to switch back to default mode.

1. Go to the **Process** tab. Open the three dots menu, and then select **Switch to default mode**.

1. Confirm that you want to switch back to default mode by entering _DEFAULT_. Then, select **Confirm default mode**.

![disable-bridge-mode.gif](../../assets/images/integration/data-sources/disable-bridge-mode.gif)

## Processing logs

Every time the records are processed, a new processing log appears on the **Process** tab of the data set. If the number of processing logs is growing, consider removing older logs. You can also configure the retention settings to automatically remove processing logs after a specific period.

### Remove processing logs

Removing processing logs frees up disk space without impacting the processed records. However, once removed, processing logs cannot be recovered.

**To remove processing logs**

1. Near the upper-right corner of the processing logs table, open the three dots menu, and then select **Purge processing logs**.

1. Select the statuses of the processing logs that you want to remove.

1. Confirm that you want to remove processing logs by entering _DELETE_. Then, select **Purge**.

![remove-processing-logs.gif](../../assets/images/integration/data-sources/remove-processing-logs.gif)

After processing logs are removed, the **Process** tab will display information about the user who removed them and the time of removal.

### Configure retention settings

Retention settings allow you to automatically delete processing logs after a specified period.

**To configure retention settings**

1. Near the upper-right corner of the processing logs table, open the three dots menu, and then select **Retention settings**.

1. Select the checkbox to enable retention.

1. Select a period to specify which processing logs should be removed. For example, if you select **2 months old**, each processing log is automatically removed once it becomes 2 months old.

1. Select **Save**.

![retention-settings.gif](../../assets/images/integration/data-sources/retention-settings.gif)

1. If you want to change the retention period, repeat step 1. Then, select another period and save your changes.

1. If you want to remove the retention settings, repeat step 1. Then, clear the checkbox to disable retention and save your changes.
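The retention behavior described above can be sketched as filtering logs by age against a cutoff. The day-based period is an approximation of options such as **2 months old**, and the tuple shape of a log entry is an assumption for illustration, not CluedIn internals:

```python
from datetime import datetime, timedelta

def apply_retention(logs, now, max_age_days):
    """Keep only processing logs younger than the retention period.

    logs: list of (log_id, created_at) tuples (illustrative shape).
    Logs older than max_age_days are dropped, mirroring automatic removal.
    """
    cutoff = now - timedelta(days=max_age_days)
    return [log_id for log_id, created_at in logs if created_at > cutoff]

now = datetime(2024, 8, 15)
logs = [
    ("run-1", datetime(2024, 5, 1)),   # older than 60 days -> removed
    ("run-2", datetime(2024, 8, 1)),   # within the period -> kept
]
print(apply_retention(logs, now, 60))  # ['run-2']
```

With a 60-day period, the May log falls before the cutoff and is dropped, while the August log is kept, matching how a log is removed once it ages past the selected period.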