Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make glob consistent with glob.glob #1382

Merged
merged 9 commits into from
Oct 28, 2023

Conversation

mariosasko
Copy link
Contributor

Makes glob(path) consistent with glob.glob(path, recursive=True, include_hidden=True).

This means ** would only be allowed between the path separators. Although breaking, I think this change is good as the current handling of ** is somewhat unexpected (and not particularly useful?).

Also fixes #1380.

@mariosasko
Copy link
Contributor Author

Should be easy to fix the failing tests:

  • gcsfs: replace the line assert fn in gcs.glob(TEST_BUCKET + "/**1") in test_gcs_glob with assert fn in gcs.glob(TEST_BUCKET + "/**/*1")
  • downstream (dask) : replace the line root = "http://localhost:8999/" in test_open_glob with root = "http://localhost:8999"

(I can open PRs there after this one gets approved.)

@martindurant
Copy link
Member

"**" would only be allowed between the path separators

So I understand, is this the only change in the behaviour of glob that we expect from your change?

@mariosasko
Copy link
Contributor Author

So I understand, is this the only change in the behaviour of glob that we expect from your change?!

Yes, that's correct.

@martindurant
Copy link
Member

@mariosasko , do you have an opinion on #1394 ?

@mariosasko
Copy link
Contributor Author

Commented on the issue :)

@martindurant
Copy link
Member

Due to the test breakages downstream, I will release before merging, and then merge maybe on Monday, so that users won't see this until we've tidied up the other libraries too.

@mariosasko
Copy link
Contributor Author

Sounds good!

@martindurant
Copy link
Member

Will you be available to make the PR to dask? I will merge when you have time to do it, so that we don't cause their CI to break for too long. gcsfs is less urgent, and I can do that anyway.

@mariosasko
Copy link
Contributor Author

PR opened in dask :)

@martindurant martindurant merged commit 096e9a1 into fsspec:master Oct 28, 2023
9 of 11 checks passed
@martindurant
Copy link
Member

Do you intend to fix gcsfs, or will I need to deal with it?

@mariosasko
Copy link
Contributor Author

I linked a gcsfs PR with the fix.

@mariosasko mariosasko deleted the consistent-glob branch November 2, 2023 13:16
efiop added a commit to efiop/adlfs that referenced this pull request Dec 22, 2023
This makes it consistent with localfs/gcsfs/s3fs and also makes glob work
after changes in fsspec/filesystem_spec#1382
efiop added a commit to efiop/adlfs that referenced this pull request Dec 23, 2023
This makes it consistent with localfs/gcsfs/s3fs and also makes glob work
after changes in fsspec/filesystem_spec#1382
efiop added a commit to fsspec/adlfs that referenced this pull request Dec 23, 2023
This makes it consistent with localfs/gcsfs/s3fs and also makes glob work
after changes in fsspec/filesystem_spec#1382
@efiop
Copy link
Member

efiop commented Dec 23, 2023

For the record: turns out this also broke adlfs but we didn't have daily tests to notice (which were also broken for another reason 🤦‍♂️ ), so I've added those and fixed the tests.

@martindurant
Copy link
Member

turns out this also broke adlfs

Do you mean that things are now in a good place?

@efiop
Copy link
Member

efiop commented Dec 23, 2023

@martindurant Yes, all good now, no action needed. Here's the main PR fixing that fsspec/adlfs#448 and a few more (fsspec/adlfs#447, fsspec/adlfs#446, fsspec/adlfs#445) related to the other reasons for adlfs tests failing. So all fixed now and should be sustainable long term.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.glob("**/filename") returns incorrect results
3 participants