Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PROD-2700 Implement Soft Delete for PrivacyRequests #5321

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

erosselli
Copy link
Contributor

@erosselli erosselli commented Sep 23, 2024

Closes PROD-2700

Description Of Changes

Adds support for soft deleting PrivacyRequests. The deletion is defined by the new column deleted_at, and we also stored who deleted it in the deleted_by column. A new DELETE /privacy-request/id endpoint is added in order to allow soft deleting requests, as well as an endpoint to bulk delete multiple requests at the same time POST /privacy-request/bulk/delete.

Code Changes

  • Added columns deleted_at and deleted_by to PrivacyRequest , both nullable
  • Added a new endpoint to allow soft deleting privacy requests, DELETE /privacy-request/id and for bulk deleting them POST /privacy-request/bulk/delete
  • Added a new optional param to the existing search / list privacy requests endpoints, include_deleted_requests, that is False by default
  • Added validation to the existing privacy request endpoints that modify privacy request data, to throw a 422 error when attempting to update any data related to a deleted privacy request

Steps to Confirm

  • Run the admin UI and the privacy center (or alternatively, just run the entire test project)
  • Create four different privacy requests, each with a different email, by clicking the "Access my data" blob of the privacy center
  • If you go to the Admin UI, you should see all privacy requests
  • Open SwaggerUI (or your Postman collection if you prefer) and call the DELETE endpoint with the id of one of the privacy requests
  • Next, use the bulk delete endpoint to delete two other requests
  • Go back to Admin UI -- you should now see only one privacy request, the one that hasn't been deleted

Error cases

Trying to do anything that modifies the deleted privacy request should cause a 422 error. Some examples:

  • approving the request
  • deleting the request
  • restarting the request

Pre-Merge Checklist

  • All CI Pipelines Succeeded
  • Documentation:
    • documentation complete, PR opened in fidesdocs
    • documentation issue created in fidesdocs
    • if there are any new client scopes created as part of the pull request, remember to update public-facing documentation that references our scope registry
  • Issue Requirements are Met
  • Relevant Follow-Up Issues Created
  • Update CHANGELOG.md
  • For API changes, the Postman collection has been updated
  • If there are any database migrations:
    • Ensure that your downrev is up to date with the latest revision on main
    • Ensure that your downgrade() migration is correct and works
      • If a downgrade migration is not possible for this change, please call this out in the PR description!

Copy link

vercel bot commented Sep 23, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

1 Skipped Deployment
Name Status Preview Comments Updated (UTC)
fides-plus-nightly ⬜️ Ignored (Inspect) Visit Preview Sep 24, 2024 2:50pm

Comment on lines +14 to +15
revision = "75bb9ee843f5"
down_revision = "9de4bb76307a"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self: remember to update before merging

Comment on lines 193 to +194
def get_privacy_request_or_error(
db: Session, privacy_request_id: str
db: Session, privacy_request_id: str, error_if_deleted: Optional[bool] = True
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the privacy request is deleted, we error by default, unless the error_if_deleted argument is set to False

@@ -434,6 +443,7 @@ def _filter_privacy_request_queryset(
external_id: Optional[str] = None,
action_type: Optional[ActionType] = None,
include_consent_webhook_requests: Optional[bool] = False,
include_deleted_requests: Optional[bool] = False,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, by default we will filter out deleted requests, unless explicitly included

Comment on lines -885 to +905
get_privacy_request_or_error(db, privacy_request_id)
get_privacy_request_or_error(db, privacy_request_id, error_if_deleted=False)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I allow get operations on the deleted privacy request , but not update ones. TBH there weren't any specific requirements regarding this on the ticket, so I used my best judgement. Any feedback / opinions are welcome

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great thing to run by Michael early on! There's a case to be made either way.

Maybe I would strive for consistency here:get_request_status_logs, view_uploaded_manual_webhook_data, view_uploaded_erasure_manual_webhook_data, and get_individual_privacy_request_tasks return very specific details about potentially deleted privacy requests by default.

But the search endpoint which is the place to get primary privacy request details does not return deleted privacy requests by default. This latter detail tips me towards wanting to hide everything by default or add query params to the GET ones so the default is "don't show" instead of "show"

Screenshot 2024-09-24 at 3 05 00 PM

Comment on lines +469 to +471
self.save(db)
# TODO: do we need to remove cache keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the question is because the "real" delete method (defined right above) does the following:

        cache: FidesopsRedis = get_cache()
        all_keys = get_all_cache_keys_for_privacy_request(privacy_request_id=self.id)
        for key in all_keys:
            cache.delete(key)
        for provided_identity in self.provided_identities:  # type: ignore[attr-defined]
            provided_identity.delete(db=db)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question, I wouldn't yet since you're still allowing these privacy requests to come up in search with certain query params.

Comment on lines 43 to 45
.filter(PrivacyRequest.status == PrivacyRequestStatus.awaiting_email_send)
.filter(PrivacyRequest.deleted_at.is_(None))
.order_by(PrivacyRequest.created_at.asc()) # oldest first
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

again, there were no specfic requirements here, but I think it makes sense not to send emails for deleted requests

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart

Comment on lines +328 to +332
if privacy_request.deleted_at is not None:
logger.info(
"Terminating privacy request {}: request deleted.", privacy_request.id
)
return
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, I didn't think it made sense to run an access request for a deleted privacy reqyest

Comment on lines +1315 to +1316
# FIXME: don't skip this
@pytest.mark.skip("temporary cause it hangs?")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Un related to this PR, see #5301 . I've skipped it just so that this file would run on the CI. I can remove the skip before merging

Copy link

cypress bot commented Sep 23, 2024

fides    Run #10085

Run Properties:  status check passed Passed #10085  •  git commit 306a46123a ℹ️: Merge f465edfc8ebf12d0c2fe4fd18c4ce7d9cfff9c39 into 0a0ea4f400ca6d1e10ad3b154f90...
Project fides
Branch Review refs/pull/5321/merge
Run status status check passed Passed #10085
Run duration 00m 38s
Commit git commit 306a46123a ℹ️: Merge f465edfc8ebf12d0c2fe4fd18c4ce7d9cfff9c39 into 0a0ea4f400ca6d1e10ad3b154f90...
Committer erosselli
View all properties for this run ↗︎

Test results
Tests that failed  Failures 0
Tests that were flaky  Flaky 0
Tests that did not run due to a developer annotating a test with .skip  Pending 0
Tests that did not run due to a failure in a mocha hook  Skipped 0
Tests that passed  Passing 4
⚠️ You've recorded test results over your free plan limit.
Upgrade your plan to view test results.
View all changes introduced in this branch ↗︎

@erosselli erosselli marked this pull request as ready for review September 24, 2024 14:30
Copy link
Contributor Author

@erosselli erosselli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wanted to call out that I tried to filter out the deleted requests from the places where it made sense, but also I don't know if I found all the places where we get a list of privacy requests. Wanted to point it out to get some extra eyes around that if possible

@pattisdr
Copy link
Contributor

starting review -

Copy link
Contributor

@pattisdr pattisdr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good but let's step back for a second and clarify with product about expected behavior around "soft deletes". What should be able to be retrieved about a privacy request once it has been soft deleted?

@@ -2441,3 +2488,80 @@ def request_task_async_callback(
queue_request_task(request_task, privacy_request_proceed=True)

return {"task_queued": True}


@router.delete(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn't a hard DELETE and it shows up under GET requests, I'd consider a different http method, maybe patch?

Comment on lines +26 to +33
op.create_foreign_key(
"privacyrequest_deleted_at_key",
"privacyrequest",
"fidesuser",
["deleted_by"],
["id"],
ondelete="SET NULL",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't that dangerous of an operation, adding a nullable FK, but it could prevent privacy requests from being written while the table is being scanned? So that might be something we need to call out. If we're concerned about it, we could also add the constraint as not valid, and then validate the constraint in the other. The privacy request table can be large https://squawkhq.com/docs/adding-foreign-key-constraint/

Alternatively, we might consider removing the FK constraint. This has an added benefit that we can track if the root user made the change, which is not technically a real user. This would be similar to AuditLog.user_id instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah that's a good idea I think, it may be overkill to have the FK constraint anyways -- I was following what was done for the approved_by column (I think), but it could be argued that this is a different use case anyways

continue

privacy_request.soft_delete(db, client.user_id)
succeeded.append(privacy_request)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While it's nice to get to re-use the BulkPostPrivacyRequests schema, since the single delete API endpoint returns a 204, this feels a little jarring to return the whole object.

Perhaps we create a new schema, where the "succeeded" list has just the privacy request id, deleted_at, and deleted_by? or similar

Comment on lines +469 to +471
self.save(db)
# TODO: do we need to remove cache keys?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great question, I wouldn't yet since you're still allowing these privacy requests to come up in search with certain query params.

privacy_request.soft_delete(db, user_id)


@router.post(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe patch here as well? what do you think?

Comment on lines -885 to +905
get_privacy_request_or_error(db, privacy_request_id)
get_privacy_request_or_error(db, privacy_request_id, error_if_deleted=False)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great thing to run by Michael early on! There's a case to be made either way.

Maybe I would strive for consistency here:get_request_status_logs, view_uploaded_manual_webhook_data, view_uploaded_erasure_manual_webhook_data, and get_individual_privacy_request_tasks return very specific details about potentially deleted privacy requests by default.

But the search endpoint which is the place to get primary privacy request details does not return deleted privacy requests by default. This latter detail tips me towards wanting to hide everything by default or add query params to the GET ones so the default is "don't show" instead of "show"

Screenshot 2024-09-24 at 3 05 00 PM

Comment on lines 43 to 45
.filter(PrivacyRequest.status == PrivacyRequestStatus.awaiting_email_send)
.filter(PrivacyRequest.deleted_at.is_(None))
.order_by(PrivacyRequest.created_at.asc()) # oldest first
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

smart

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants