
Skip processing empty dataset at any stage #12

Open
bhavitvyamalik opened this issue Mar 7, 2024 · 0 comments
Labels: feature (New feature or request), need investigation (Unknown scope)


@bhavitvyamalik

In the pipeline, the same categories.json is copied into every next step, followed by copying all datasets from the previous step to the current step. However, there are edge cases where:

  • the downloaded dataset is empty (affects the clean (next) step)
  • the dataset becomes empty after cleaning (affects the decontaminate (next) step)
  • the dataset becomes empty after decontamination (affects the gather (next) step) -- very rare, but still a possibility

I think it would be prudent to add a sanity check so that we don't copy empty datasets from the previous step to the next step. After that, we can remove the "if dataset is empty" check in _cmd_exit_str.
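For illustration, a minimal sketch of such a check in Python is below. The function name copy_nonempty_datasets, the directory layout, and the .jsonl extension are assumptions made for the example, not the project's actual API.

```python
from pathlib import Path

def copy_nonempty_datasets(prev_step_dir: str, next_step_dir: str) -> None:
    """Copy each dataset file from the previous step to the next step,
    skipping any dataset that is empty.

    Hypothetical sketch: the directory layout and file extension are
    assumptions, not the pipeline's real structure.
    """
    prev_dir = Path(prev_step_dir)
    next_dir = Path(next_step_dir)
    next_dir.mkdir(parents=True, exist_ok=True)

    for dataset_path in prev_dir.glob("*.jsonl"):
        # Treat a zero-byte file as an empty dataset and skip it.
        if dataset_path.stat().st_size == 0:
            print(f"Skipping empty dataset: {dataset_path.name}")
            continue
        (next_dir / dataset_path.name).write_bytes(dataset_path.read_bytes())
```

With a guard like this applied before each stage, the downstream "if dataset is empty" check in _cmd_exit_str would become redundant, as suggested above.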
