Brainstorm data quality features #53

Open
robertkossendey opened this issue Dec 30, 2022 · 6 comments · May be fixed by #59
Labels
enhancement New feature or request

Comments

@robertkossendey
Collaborator

robertkossendey commented Dec 30, 2022

Constraints are great data quality features that allow users to define filters / rules that identify invalid records.
But they only allow failing on invalid records, and I think we could do better.

Some ideas:

  • Ability to automatically drop invalid rows
  • Ability to automatically mark rows as invalid in target table
    - Ability to write invalid rows to "Quarantine" table
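
A rough sketch of what these three modes could look like. This is plain Python over lists of dicts rather than Spark DataFrames or Delta tables, and `drop_invalid`, `mark_invalid`, `quarantine`, and the `is_valid` column are hypothetical names used only for illustration, not anything that exists in mack:

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Rule = Callable[[Row], bool]  # returns True when the row is valid


def drop_invalid(rows: List[Row], rule: Rule) -> List[Row]:
    """Keep only the rows that satisfy the rule."""
    return [row for row in rows if rule(row)]


def mark_invalid(rows: List[Row], rule: Rule, flag: str = "is_valid") -> List[Row]:
    """Keep every row and add a boolean column holding the rule result."""
    return [{**row, flag: rule(row)} for row in rows]


def quarantine(rows: List[Row], rule: Rule) -> Tuple[List[Row], List[Row]]:
    """Split rows into (valid, invalid) so the invalid ones can be written elsewhere."""
    valid: List[Row] = []
    invalid: List[Row] = []
    for row in rows:
        (valid if rule(row) else invalid).append(row)
    return valid, invalid


# Hypothetical sample data and rule
rows = [{"id": 1, "age": 34}, {"id": 2, "age": -5}]


def age_is_non_negative(row: Row) -> bool:
    return row["age"] >= 0


clean_rows = drop_invalid(rows, age_is_non_negative)
flagged_rows = mark_invalid(rows, age_is_non_negative)
valid_rows, quarantined_rows = quarantine(rows, age_is_non_negative)
```

In a real implementation the rule would presumably be a SQL expression or Spark column condition, and the quarantined rows would be appended to a separate table.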

WDYT @MrPowers

@robertkossendey robertkossendey added the enhancement New feature or request label Dec 30, 2022
@MrPowers
Owner

@robertkossendey - these sound like good suggestions. I'm guessing some external libs would help for this type of functionality (Great Expectations or PyDeequ perhaps), but don't want to add any dependencies to this lib. Let's keep this open as a "meta-issue". When you have ideas for individual functions, feel free to open up a separate issue and we can chat in detail before you put in the work. Thanks!

@robertkossendey
Collaborator Author

@MrPowers I wouldn't like to use any other framework tbh. If you're okay with it, I would create a PoC PR that lets you specify a condition; if that condition is not fulfilled, the write fails.
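
Roughly the behavior I have in mind, sketched in plain Python with a list standing in for the target table. `checked_write` and `ConstraintViolation` are hypothetical names for illustration only; the actual PoC would work on a DataFrame and a Delta table:

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


class ConstraintViolation(Exception):
    """Raised when at least one row does not fulfil the condition."""


def checked_write(
    rows: List[Row],
    condition: Callable[[Row], bool],
    target: List[Row],
) -> None:
    """Append rows to target only if every row fulfils the condition."""
    violations = [row for row in rows if not condition(row)]
    if violations:
        # Fail before touching the target, so the write is all-or-nothing
        raise ConstraintViolation(
            f"{len(violations)} row(s) failed the condition; nothing was written"
        )
    target.extend(rows)
```

Usage would be something like `checked_write(new_rows, lambda row: row["age"] >= 0, table)`, which either appends everything or raises and leaves `table` untouched.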

@MrPowers
Owner

@robertkossendey - yep, PoC PR sounds like a great next step!

@souvik-databricks
Collaborator

@robertkossendey @MrPowers

Hey guys, I actually built a library to mock the DLT behaviors outside of Databricks: dlt-with-debug

I think I can take out the expectation mock APIs and add them here in mack.

@robertkossendey
Collaborator Author

@souvik-databricks very cool! Maybe you can open up a PR and we can collaborate on that then :)

@souvik-databricks souvik-databricks self-assigned this Dec 30, 2022
@souvik-databricks
Collaborator

I will raise the PR on this @robertkossendey

3 participants