Allow filtering sitemap urls #9
This may not be necessary because URLs are first output to a `.sitemap.json` file, and someone could manually adjust that file. But it could be possible to allow passing in some sort of regex that lets the user decide which pages they want.
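As a rough sketch of what that could look like: filter the URL list from the exported file with a user-supplied regex. The file name and shape here (a flat array of URL strings) are assumptions for illustration, not necessarily SelectorHound's actual output format.

```javascript
// Hypothetical: URLs as they might appear in a .sitemap.json export
const urls = [
  'https://example.com/',
  'https://example.com/blog/post-1',
  'https://example.com/blog/post-2',
  'https://example.com/about',
];

// Keep only URLs matching a user-supplied regex string
function filterUrls(urls, pattern) {
  const re = new RegExp(pattern);
  return urls.filter((url) => re.test(url));
}

console.log(filterUrls(urls, '/blog/'));
// keeps only the two /blog/ URLs
```

The same function could back a CLI flag, with the pattern string coming straight from the user.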
If we offered this, there's a question: should this modify the existing exported sitemap, or should it create a new one?

- Option 1:
- Option 2:
- Option 3:

Seems like regardless, the …

Other questions worth asking:
Note: There's a URL Pattern API now: https://developer.mozilla.org/en-US/docs/Web/API/URL_Pattern_API
Some of this work was done under #4: the option to honor robots involved creating a … But the … The thinking here is that we're asking SelectorHound to behave just like any other crawling bot. What we still want is an option where a user can simply say, "don't do these URLs". That could mean following the same pattern we do for sitemap:
But doing this with Robots might be overkill for users who may want to just give a disallow/allow list.
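A plain disallow/allow list could borrow robots.txt's matching semantics without parsing an actual robots.txt file. Hypothetical sketch (the option names here are made up, not SelectorHound's real config): a URL is skipped when a disallow prefix matches its path, unless an equally specific or more specific allow prefix also matches.

```javascript
// robots.txt-style precedence: the longest matching rule wins,
// and ties go to allow.
function shouldCrawl(url, { disallow = [], allow = [] } = {}) {
  const path = new URL(url).pathname;
  const longest = (rules) =>
    rules
      .filter((rule) => path.startsWith(rule))
      .reduce((best, rule) => (rule.length > best.length ? rule : best), '');
  return longest(allow).length >= longest(disallow).length;
}

const rules = {
  disallow: ['/private/'],
  allow: ['/private/shared/'],
};

console.log(shouldCrawl('https://example.com/private/page', rules));   // false
console.log(shouldCrawl('https://example.com/private/shared/x', rules)); // true
console.log(shouldCrawl('https://example.com/about', rules));          // true
```

This keeps the mental model users already have from robots.txt while staying a simple pair of string lists.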
Allow the ability to give a wildcard to filter URLs (useful if there are a LOT of URLs).
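Wildcard support could be layered on top of the regex path by translating a shell-style glob into a `RegExp` internally, so users never have to write regex themselves. A minimal sketch, supporting only `*`:

```javascript
// Escape regex metacharacters, then turn the escaped \* back into .*
function wildcardToRegExp(glob) {
  const escaped = glob.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\\\*/g, '.*') + '$');
}

const re = wildcardToRegExp('https://example.com/blog/*');
console.log(re.test('https://example.com/blog/post-1')); // true
console.log(re.test('https://example.com/about'));       // false
```

Anchoring with `^` and `$` makes `*` behave like users expect from globs (whole-string match) rather than regex's default substring match.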