diff --git a/assets/images/integration/additional-operations/monitoring-4.png b/assets/images/integration/additional-operations/monitoring-4.png new file mode 100644 index 00000000..b917cc31 Binary files /dev/null and b/assets/images/integration/additional-operations/monitoring-4.png differ diff --git a/assets/images/integration/data-sources/auto-submission.gif b/assets/images/integration/data-sources/auto-submission.gif new file mode 100644 index 00000000..6edbfaff Binary files /dev/null and b/assets/images/integration/data-sources/auto-submission.gif differ diff --git a/assets/images/integration/data-sources/disable-bridge-mode.gif b/assets/images/integration/data-sources/disable-bridge-mode.gif new file mode 100644 index 00000000..71532d65 Binary files /dev/null and b/assets/images/integration/data-sources/disable-bridge-mode.gif differ diff --git a/assets/images/integration/data-sources/enable-bridge-mode.gif b/assets/images/integration/data-sources/enable-bridge-mode.gif new file mode 100644 index 00000000..06be70b6 Binary files /dev/null and b/assets/images/integration/data-sources/enable-bridge-mode.gif differ diff --git a/assets/images/integration/data-sources/process-data.gif b/assets/images/integration/data-sources/process-data.gif index 1186b1e1..37857c44 100644 Binary files a/assets/images/integration/data-sources/process-data.gif and b/assets/images/integration/data-sources/process-data.gif differ diff --git a/assets/images/integration/data-sources/remove-processing-logs.gif b/assets/images/integration/data-sources/remove-processing-logs.gif new file mode 100644 index 00000000..5da7a419 Binary files /dev/null and b/assets/images/integration/data-sources/remove-processing-logs.gif differ diff --git a/assets/images/integration/data-sources/retention-settings.gif b/assets/images/integration/data-sources/retention-settings.gif new file mode 100644 index 00000000..fcf39e72 Binary files /dev/null and 
b/assets/images/integration/data-sources/retention-settings.gif differ diff --git a/docs/040-integration/additional-operations-on-records/050-monitoring.md b/docs/040-integration/additional-operations-on-records/050-monitoring.md index 0058f5cf..a371f8ea 100644 --- a/docs/040-integration/additional-operations-on-records/050-monitoring.md +++ b/docs/040-integration/additional-operations-on-records/050-monitoring.md @@ -6,16 +6,24 @@ grand_parent: Integration permalink: /integration/additional-operations-on-records/monitoring title: Monitoring tags: ["integration", "monitoring"] -last_modified: 2023-11-07 +last_modified: 2024-08-15 --- -Monitoring provides an overview of your records and helps you quickly identify issues. The **Monitoring** tab consists of two sections: +In this article, you will learn how the features on the **Monitoring** tab give you insight into what is happening with your records and help you quickly identify issues. + +## Monitoring for all types of data sources + +Regardless of the type of data source, the **Monitoring** tab in the data set includes the following sections: - **Total** – here you can view general information about the records, including the total number of records, original columns, mapped columns, and records in quarantine. This is a useful tool to compare the number of original columns and mapped columns. +- **Global queues** – here you can view global statistics on ingestion and processing requests from all data sets. This is a useful tool to verify that the system is running correctly. + - **Queues** – here you can view messages containing records that are waiting to enter the next stage of their life cycle (loading, mapping, processing). -**Important!** If the number of messages of any type is greater than 0 while the number of consumers is 0, there may be an issue with your data. The following screenshot illustrates a situation where troubleshooting is needed to fix the processing of records.
+In the data set created using an endpoint, these sections are located in the **Overview** area. + +If the number of messages of any type is greater than 0 while the number of consumers is 0, there may be an issue with your data. The following screenshot illustrates a situation where troubleshooting is needed to fix the processing of records. ![monitoring-1.png](../../assets/images/integration/additional-operations/monitoring-1.png) @@ -29,8 +37,36 @@ The following table provides descriptions of each message queue and correspondin | Queue | Description | Troubleshooting | |--|--|--| -| Submitting Messages | Messages containing JSON objects sent to the mapping service to be converted into records during processing. | Go to the **Process** tab and select **Cancel**. If you are a system administrator, verify the status of the mapping service and restart the pod containing the name "annotation". | -| Processing Messages | Messages containing records sent to the processing pipeline. | If you are a system administrator, restart the pod containing the name "submitter". | -| Quarantine Messages | Messages containing records that were approved on the **Quarantine** tab and sent to the processing pipeline. | If you are a system administrator, restart the pod containing the name "submitter". | +| Ingestion data set | Messages representing JSON objects sent to various endpoints. | If you are a system administrator, restart the pod named "datasource-processing". | +| Commit data set | Messages representing requests for data set processing. Messages can be added by selecting the **Process** button on the **Process** tab of the data set or each time the endpoint receives data with auto-submission enabled. | If you are a system administrator, restart the pod named "datasource-processing". | +| Submitting Messages | Messages containing JSON objects sent to the mapping service to be converted into records during processing. | Go to the **Process** tab and select **Cancel**.
If you are a system administrator, verify the status of the mapping service and restart the pod named "annotation". | +| Processing Messages | Messages containing records sent to the processing pipeline. | If you are a system administrator, restart the pod named "submitter". | +| Quarantine Messages | Messages containing records that were approved on the **Quarantine** tab and sent to the processing pipeline. | If you are a system administrator, restart the pod named "submitter". | | Loading Failures | Messages containing records from the data set that cannot be fully loaded. | Go to the **Preview** tab and select **Retry**. | -| Error Processing Messages | Messages containing records that could not be processed by the processing pipeline because it does not respond. | Go to the **Process** tab and select **Retry**. If you are a system administrator, verify the status of processing pods. | \ No newline at end of file +| Error Processing Messages | Messages containing records that could not be processed because the processing pipeline does not respond. | Go to the **Process** tab and select **Retry**. If you are a system administrator, verify the status of processing pods. | + +## Monitoring for endpoints + +In the data set created using an endpoint, the **Monitoring** tab includes two areas: **Overview** and **Ingestion reports**. The **Overview** area contains statistics described in the previous section. This section focuses on the **Ingestion reports** area. + +![monitoring-4.png](../../assets/images/integration/additional-operations/monitoring-4.png) + +The **Ingestion reports** area contains a table with detailed reports generated for each request sent to an endpoint. The table contains the following columns: + +- **ReceiptID** – unique identifier of a request. Every request you send to an endpoint, whether successful or not, receives a unique receipt ID. This ID allows you to quickly locate the request report in CluedIn.
Simply copy the receipt ID from the request response and paste it into the search field above the table. + +- **Received** – the number of records received by CluedIn. If the number is 0, it means that the request contained errors and CluedIn rejected it. + +- **Loaded** – the number of records loaded into CluedIn. This column contains three categories: + + - **Success** – the number of records that were successfully loaded into CluedIn. + + - **Failed** – the number of records that failed to load into CluedIn. + + - **Retry** – the number of records that CluedIn attempted to load again. + +- **Logs** – the number of logs generated for a specific request. You can view the log details by selecting the content of the cell. Keep in mind that for endpoints, we only log warnings. These logs are the same as those found on the **Logs** tab of the data set. The difference is that the **Logs** tab contains logs for all requests, while the **Ingestion reports** table provides logs for each specific request. For more information on how to read logs, see the [Logs](/integration/additional-operations-on-records/logs) documentation. + +- **Processed** – the number of records that were processed in CluedIn. + +- **Created at** – the timestamp indicating when the ingestion report was generated. This corresponds to the time when the HTTP request was executed. \ No newline at end of file diff --git a/docs/040-integration/data-sources/190-process-data.md b/docs/040-integration/data-sources/190-process-data.md index 4fb175c2..bbfc74b3 100644 --- a/docs/040-integration/data-sources/190-process-data.md +++ b/docs/040-integration/data-sources/190-process-data.md @@ -6,12 +6,28 @@ grand_parent: Integration permalink: /integration/process-data title: Process data tags: ["integration", "processing"] -last_modified: 2023-11-07 +last_modified: 2024-08-15 --- -In this article, you will learn about the processing of data that you ingested into CluedIn.
The goal of processing is to turn your data into golden records that can be cleaned, deduplicated, and streamed. +In this article, you will learn about the processing of data that you ingested into CluedIn. The goal of processing is to turn your records into standalone golden records or to use them to enhance existing golden records. -The processing of data is the same regardless of the source of data (file, ingestion point, or database.) +Depending on the type of data source, there are three processing options: + +- For file, endpoint, and database: Manual processing + +- For endpoint only: Auto-submission + +- For endpoint only: Bridge mode + +You can process the data set as many times as you want. However, once a record has been processed, it won’t undergo processing again: when processing starts, CluedIn checks for identical records and skips any that have already been processed. If you change the origin code for previously processed records, CluedIn will treat these records as new and process them. + +After the processing is completed, the [processing log](#processing-logs) appears in the table. Any records that fail to meet specific conditions outlined in [property](/integration/additional-operations-on-records/property-rules) or [pre-process](/integration/additional-operations-on-records/preprocess-rules) rules will be sent to quarantine. To learn more about managing these records, see [Quarantine](/integration/additional-operations-on-records/quarantine). Records that were processed successfully are displayed on the **Data** tab. + +If the processing takes a long time, go to the **Monitoring** tab and check the number of messages in the queues. Depending on the type of message queue with a high message count, you can perform specific troubleshooting actions. For further details, see [Monitoring](/integration/additional-operations-on-records/monitoring).
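The identical-record check described above can be sketched in Python. This is purely illustrative — the function names and the way the identity key is computed are assumptions for the sake of the example, not CluedIn's actual implementation:

```python
def record_key(origin: str, payload: dict) -> tuple:
    # A record's identity combines its origin code with its content
    # (hypothetical key construction for illustration).
    return (origin, tuple(sorted(payload.items())))

processed_keys = set()

def process(origin: str, payload: dict) -> str:
    """Process a record once; identical (origin, content) pairs are skipped."""
    key = record_key(origin, payload)
    if key in processed_keys:
        return "skipped"       # already processed, not processed again
    processed_keys.add(key)
    return "processed"

print(process("/CRM", {"id": "1", "name": "Alice"}))  # processed
print(process("/CRM", {"id": "1", "name": "Alice"}))  # skipped: identical record
print(process("/ERP", {"id": "1", "name": "Alice"}))  # processed: changed origin code
```

The third call shows why changing the origin code causes reprocessing: the same payload under a new origin produces a new identity.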
+ +## Manual processing + +Manual processing is available for the data coming from a file, an endpoint, or a database. With manual processing, the original data that was initially sent to CluedIn remains in the temporary storage on the **Preview** tab. After the data has been processed, the resulting golden records appear on the **Data** tab. **To process the data** @@ -36,6 +52,86 @@ The processing of data is the same regardless of the source of data (file, inges ![process-data.gif](../../assets/images/integration/data-sources/process-data.gif) - After the processing is completed, review the statistics. Any records that fail to meet specific conditions outlined in [property](/integration/additional-operations-on-records/property-rules) or [pre-process](/integration/additional-operations-on-records/preprocess-rules) rules will be sent to quarantine. To learn more about managing these records, see [Quarantine](/integration/additional-operations-on-records/quarantine). +## Auto-submission + +Auto-submission is available for the data coming from an endpoint. When auto-submission is enabled, data received from the endpoint is processed automatically. With auto-submission, the original data that was initially sent to CluedIn remains in the temporary storage on the **Preview** tab. After the data has been processed, the resulting golden records appear on the **Data** tab. + +**To enable auto-submission** + +1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set for which you want to enable auto-submission. + +1. Go to the **Process** tab, and then turn on the toggle next to **Auto-submission**. + +1. Confirm that you want to enable automatic processing of records once they are received by CluedIn. + + ![auto-submission.gif](../../assets/images/integration/data-sources/auto-submission.gif) + +If you no longer want the records to be processed automatically, turn off the toggle next to **Auto-submission**. 
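Whether you process manually or rely on auto-submission, data reaches an endpoint as a JSON POST request. The following sketch shows such a request with Python's standard library; the endpoint URL, the bearer-token scheme, and the `receiptId` field in the response are assumptions for illustration and may differ in your environment:

```python
import json
import urllib.request

def build_request(endpoint_url: str, api_token: str, records: list) -> urllib.request.Request:
    """Build a POST request carrying a JSON array of records for an ingestion endpoint."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_token}",  # auth scheme is an assumption
        },
        method="POST",
    )

def extract_receipt_id(response_text: str) -> str:
    """Pull the receipt ID out of the endpoint's JSON response (field name assumed)."""
    return json.loads(response_text)["receiptId"]

if __name__ == "__main__":
    req = build_request(
        "https://cluedin.example.com/api/endpoint/<endpoint-id>",  # hypothetical URL
        "<api-token>",
        [{"id": "1", "name": "Alice"}],
    )
    # urllib.request.urlopen(req) would send the request; the response body is
    # assumed to include a receipt ID you can search for on the Monitoring tab:
    sample_response = '{"success": true, "receiptId": "abc-123"}'
    print(extract_receipt_id(sample_response))  # abc-123
```

Keeping the receipt ID from each response makes it easy to find the matching ingestion report later.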
+ +## Bridge mode + +Bridge mode is available for the data coming from an endpoint. When bridge mode is enabled, all your JSON records will be transformed into golden records directly, without being stored in the temporary storage on the **Preview** tab. However, you can rely on data set logs and ingestion receipts for debugging purposes. + +Bridge mode allows you to use less storage and memory, resulting in increased performance. Use this mode when your mapping will not change over time, and you want to use the ingestion endpoint only as a mapper. + +**To switch to bridge mode** + +1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set that you want to switch to bridge mode. + +1. Go to the **Process** tab. Open the three dots menu, and then select **Switch to bridge mode**. + +1. Confirm that you want to switch to bridge mode by entering _BRIDGE_. Then, select **Confirm bridge mode**. + + ![enable-bridge-mode.gif](../../assets/images/integration/data-sources/enable-bridge-mode.gif) + +If you no longer want your endpoint to operate in bridge mode, you can switch it back to the default mode. After switching back to the default mode, the **Preview** tab will appear. However, it will not contain records received while bridge mode was enabled. + +**To switch back to default mode** + +1. On the navigation pane, go to **Integrations** > **Data Sources**. Then, find and open the data set that you want to switch back to default mode. + +1. Go to the **Process** tab. Open the three dots menu, and then select **Switch to default mode**. + +1. Confirm that you want to switch back to default mode by entering _DEFAULT_. Then, select **Confirm default mode**. + + ![disable-bridge-mode.gif](../../assets/images/integration/data-sources/disable-bridge-mode.gif) + +## Processing logs + +Every time the records are processed, a new processing log appears on the **Process** tab of the data set.
If the number of processing logs is growing, consider removing older logs. You can also configure the retention settings to automatically remove processing logs after a specific period. + +### Remove processing logs + +Removing processing logs frees up disk space without impacting the processed records. However, once removed, processing logs cannot be recovered. + +**To remove processing logs** + +1. Near the upper-right corner of the processing logs table, open the three dots menu, and then select **Purge processing logs**. + +1. Select the statuses of the processing logs that you want to remove. + +1. Confirm that you want to remove processing logs by entering _DELETE_. Then, select **Purge**. + + ![remove-processing-logs.gif](../../assets/images/integration/data-sources/remove-processing-logs.gif) + + After processing logs are removed, the **Process** tab will display information about the user who removed them and the time of removal. + +### Configure retention settings + +Retention settings allow you to automatically delete processing logs after a specified period. + +**To configure retention settings** + +1. Near the upper-right corner of the processing logs table, open the three dots menu, and then select **Retention settings**. + +1. Select the checkbox to enable retention. + +1. Select a period to specify which processing logs should be removed. For example, selecting **2 months old** means that each processing log will be removed automatically once it turns 2 months old. + +1. Select **Save**. + + ![retention-settings.gif](../../assets/images/integration/data-sources/retention-settings.gif) + +1. If you want to change the retention period, repeat step 1. Then, select another period and save your changes. -If the processing takes a long time, go to the **Monitoring** tab and check the number of messages in the queues.
Depending on the type of message queue with a high message count, you can perform specific troubleshooting actions. For further details, see [Monitoring](/integration/additional-operations-on-records/monitoring). \ No newline at end of file +1. If you want to remove the retention settings, repeat step 1. Then, clear the checkbox to disable retention and save your changes. \ No newline at end of file
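The retention behavior above can be sketched as a simple date filter. This is a simplified illustration only — CluedIn's actual scheduling and month arithmetic may differ (here a month is approximated as 30 days):

```python
from datetime import datetime, timedelta

def apply_retention(logs: list, now: datetime, months: int) -> list:
    """Keep only processing logs newer than the retention window (sketch only)."""
    cutoff = now - timedelta(days=30 * months)  # approximate a month as 30 days
    return [log for log in logs if log["created_at"] >= cutoff]

now = datetime(2024, 8, 15)
logs = [
    {"id": 1, "created_at": datetime(2024, 5, 1)},   # older than 2 months -> removed
    {"id": 2, "created_at": datetime(2024, 7, 20)},  # within the window -> kept
]
print([log["id"] for log in apply_retention(logs, now, 2)])  # [2]
```

With a 2-month retention period, only the log created within the last two months survives the filter.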