Adding v5 support #2
Comments
I was hiding v4 / v5 just so we didn't have 3x lots of essentially the same docs, and was being lazy about just removing the POD. I would class those modules as internal (though I guess folks might want to load them instead of the top layer?). Sounds good re test ideas; I'll have a poke tomorrow unless someone else gets to it. Current default is v4, unless v5 is requested. I'm happy with either variant. |
I think v5 should be the default. If we're going to change the default, then the v5.0.0 release of the module is the time to do it. I think it's OK to break things in a big version leap like that, and HTML5 is what folks want in this day & age. |
Makes sense to me |
Btw in case it wasn't clear (and the labelling is somewhat misleading, I borrowed it from DBIx::Class) .. the technique doesn't hide the modules from PAUSE, but more from metacpan / cpan websites. |
Looking at writing the tests of course makes me think of more things.. HTML5 has been through several iterations, some of which have added, then later removed, elements (grrr).. What approach should we take? My initial thought would be: |
I guess I'm not seeing it as a problem to have the v4/v5 searchable in metacpan et al. I've removed them: 9d62e33 |
Do you have examples of things that were added to HTML5 and then removed? I didn't think there were any. |
So far I've found "keygen" and "menuitem", from the Mozilla list of obsolete elements here: https://developer.mozilla.org/en-US/docs/Web/HTML/Element (bottom of page) |
Anything else we need to be doing? |
I was under the impression you were still working on things. Did I misunderstand? Right now I've got a |
Oh, oops, miscommunication then. I was done after I submitted the tests update. |
@castaway have you used this code anywhere? Have you used it with existing code that needs HTML::Tagset and tried it with the new 5.0.0? |
I'm making the minimum version for HTML::Tagset be Perl 5.10.1, which came out in 2009. |
I've cleaned up a bunch of stuff on formatting in the new files. All my changes have been pushed back to github on the What else do we need to do in order to release it? |
I did run it yes, I can retry the current content |
There are some places that we need more documentation. I've marked them with |
Added docs: https://github.com/castaway/html-tagset-1/tree/v5 I also ran brewbuild --revdeps (tests reverse dependencies) - and got several which have tests containing html4 style html, so I'm about to write them patches and link back here. |
I also forked HTML::Tagset to Github (PhilterPaper/HTML-Tagset) and have made a bunch of changes to clean up the code, add all the HTML 5 tags and their attributes list, and fixed a few bugs. It's still waiting for a consensus on how to deal with different HTML versions (see my issues PhilterPaper/HTML-Tagset/issues/1 and PhilterPaper/HTML-Tagset/issues/2). I would have no problem with someone merging my changes into some "official" repository and then into the CPAN release. I have my hands full with PDF::Builder, which uses HTML::TreeBuilder, which uses HTML::Tagset; I could take on managing HTML::Tagset if no one else wants the job, but would rather that someone else do it. Somebody, please? |
I would be glad to fold in any changes that add v5 support without breaking existing behavior. Does your fork do that? |
I added all the tags and attributes I could find up through HTML 5, and very limited testing suggests nothing broke. One thing needed is some thorough testing. I also added to the "phrase" (inline) tags list and added the "block" tags list, but it's not clear exactly what the criteria are for inclusion in either list (discussion needed!). The POD should probably list everything available. My changes don't address v4 versus XHTML versus v5 (see my issues 1 and 2), which might break existing usage. That needs to be settled, not necessarily in the next release. New methods that take the HTML version (and whether to discard removed or deprecated tags) as input may be the answer. We should consolidate discussion of all these issues in one Github repository. If you want to pull my changes over to your repository, and conduct discussions there, I would be willing to erase my repository at Github as redundant. At the very minimum |
I don't know what you're asking for there. Please make a separate ticket for it if something is wrong with the HTML 4 tags that can be updated, separate from any potential HTML 5 overhaul. |
https://rt.cpan.org/Public/Bug/Display.html?id=151970 Yes, these are missing HTML 4 tags. Also (should all be HTML 4):
Some stuff in the lists are deprecated/removed after HTML 3 and should be taken care of with "HTML version" control. |
I see that you have released 3.22, with some of my requested changes (ins and del, POD typo fixed). Thank you -- PDF::Builder now runs correctly when it encounters I guess that besides the HTML 5 handling, there are a number of HTML 3 (only ?) tags such as bgsound and plaintext, that we have to decide what to do with. There are also some HTML 4 tags and attributes that you probably want to default to, and clean up %isPhraseMarkup and %isHeadElement. I see also that Strawberry Perl now installs HTML::Tagset in /Strawberry/perl/site instead of /Strawberry/perl/vendor (if that was intentional -- it didn't seem to break anything). Throughout the life of HTML, there have been a number of tags (elements) added, and some removed. For a given HTML version, we would need a switch to specify "everything that has accumulated to this point" versus "just the official, supported items" (and maybe separate deprecated and removed switches for those). In order not to break compatibility with existing applications using HTML::Tagset, I think you should add accessor methods that return about the same thing (hash) as the raw variable access, but permit input switches to detail exactly what level of HTML to provide. HTML 4.01 seems to be approximately what the current product is, so that would make a good default. So, in addition to I'm not sure what to do about the |
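To make the accessor-method idea above concrete, here is a minimal sketch, not anything HTML::Tagset actually provides. The sub name `known_tags`, its `version` and `include_removed` switches, and the small `%TAGS` table are all hypothetical and for illustration only; the table is a tiny excerpt, not the real tag lists. It defaults to HTML 4.01, as suggested above.

```perl
use strict;
use warnings;

# Hypothetical, illustrative data only: a tiny excerpt, not the real tables.
# "since" is the HTML level that introduced the tag; "removed" is the level
# that withdrew it (absent if the tag is still current).
my %TAGS = (
    font   => { since => 3.2, removed => 5 },
    center => { since => 3.2, removed => 5 },
    div    => { since => 3.2 },
    table  => { since => 3.2 },
    audio  => { since => 5 },
    video  => { since => 5 },
);

# Hypothetical accessor: returns a hashref { tag => 1 } for one HTML level.
#   version         - HTML level to report (defaults to 4.01)
#   include_removed - if true, keep tags withdrawn at or before that level
sub known_tags {
    my (%opt) = @_;
    my $version = $opt{version} // 4.01;
    my %known;
    for my $tag (keys %TAGS) {
        my $info = $TAGS{$tag};
        next if $info->{since} > $version;          # not introduced yet
        next if defined $info->{removed}
             && $info->{removed} <= $version
             && !$opt{include_removed};             # withdrawn by this level
        $known{$tag} = 1;
    }
    return \%known;
}

my $v4  = known_tags();                                     # default level
my $v5  = known_tags(version => 5);                         # official v5 only
my $all = known_tags(version => 5, include_removed => 1);   # everything so far

print join(' ', sort keys %$v5), "\n";
```

Existing code that reads the package variables directly would be untouched by something like this; only callers that opt in to the accessor would see version-specific sets.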
Ooops, I missed those. Sorry. Please make an RT ticket that includes any changes that need to make HTML::Tagset correct for HTML 4, and I will update for it. Let's keep that separate from anything to be done to be able to handle HTML5. Also, please comment on https://rt.cpan.org/Public/Bug/Display.html?id=74627 if you want. I don't see any reason to NOT include it. |
Hey both, reading all the notes and wondering if this chatter has gotten a bit off topic. Where are we on the actual PR I submitted to handle HTML5 explicitly, while not confusing folks expecting 4? Looks like I volunteered to write test patches for a bunch of other dists; that was daft of me. |
Looks like Andy closed it years ago, with the request that you migrate it to the v5 branch. Is having a separate v5 branch a good idea at this point? It seems to be inactive. Ultimately there should be a single product release, with a way to switch among desired HTML levels. |
First, thank you for your work with getting HTML::Tagset to handle HTML5. That said...
We are at zero right now. I ask for your patience. I have handed off HTML::Tagset to the libwww-perl group for stewardship. (Yes, I'm part of the group, but for this discussion pretend I'm not) A big part of that is that I think other folks besides only me should steer the future of HTML::Tagset. I don't know what libwww-perl will want to do as far as HTML5. Three options immediately come to mind:
I'm not the best person to steer this, and this past week or two has made this clear. I'm not doing any work any more that relies on HTML::Tagset. Clearly, other people will have more informed opinions. HTML::Tagset is really part of an ecosystem, and so I'm very happy that it's moved under the libwww-perl umbrella. |
Yes, I started that branch and no, it hasn't had anything done on it in years.
As I said up above, that's an option, and the one that I initially was hoping for, but at this point I don't know that it's the best, and am leaving it to others to steer. |
Hmm. If libwww-perl is now running the show, whose hand is on the steering wheel? It's got to be someone. I don't think you need to get permission to get v4 up to date with a full set of tags, cleaned-up formatting, etc., but we should reach a consensus on how to handle v5 (as well as 3.2 and XHTML) before adding in v5 tags/attributes and doing anything else in that area. I think there's general agreement that whatever is done, it should not break existing applications using HTML::Tagset (especially HTML::TreeBuilder, but there are others). Beyond that, what sort of compatibility should be maintained? Should v4 continue to be the default tag set, as everyone uses v5 now? What is the best way to introduce HTML level switches, and possibly [switches for] removal of deprecated and withdrawn tags? There are a number of tags which are perfectly functional, but whose function should better be done with CSS (e.g., |
Nobody right now. I just handed it over 12 hours ago. Patience, please.
I don't think there's any general agreement of anything yet. |
On behalf of the libwww-perl org, I'm happy to release new versions etc, but since I don't really use this module directly, my main concern is not having it break anything that depends on it. I'd like to defer to people who are familiar with the internals, but I'm happy to help keep things moving along. |
Oops: I meant my copy of the v5 one (there's a link around here somewhere..) ah here: https://github.com/castaway/html-tagset-1/tree/v5 |
(currently re-running brewbuild to see where I got to re rev deps, man installing that was fun) Personally: My interest is in parsing HTML, generally from whole pages from active websites, so they are 99% likely to be v5. I also use TreeBuilder (which I sent a test fix for, see above; it's untouched, so we may need to prod them to release one). TreeBuilder uses HTML::Parser and HTML::Tagset. For a whole page, TreeBuilder could catch the Parser event "declaration", figure out which HTML version is being stated, load the correct part of Tagset, and bob's yer uncle. ( Patch for TreeBuilder goes here? https://metacpan.org/module/HTML::TreeBuilder/source#L1371 ) Somewhat more tricky is when we parse chunks of HTML without declarations. My suggestion would be: v5 has existed 13+ years, so default to HTML5, "ignoring" tags not in v4 (like we do now for v5 tags), and document well in Parser and co how to enforce use of v4 if required. FWIW that's what the above linked v5 branch of mine does, so imo "rebase that on current main branch, retest, call done". :) |
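As a rough illustration of the declaration-sniffing idea above, here is a minimal sketch in plain Perl. The helper name `html_level_from_doctype` and its return labels are hypothetical; it assumes the caller passes in the raw declaration text (for example, what a "declaration" handler would receive) and it falls back to v5 when there is no declaration, matching the default proposed in the comment. It is not code from TreeBuilder or from the v5 branch.

```perl
use strict;
use warnings;

# Hypothetical helper: guess the HTML level from a raw declaration string.
# Returns one of 'v5', 'v4', 'v3.2' or 'xhtml'.
sub html_level_from_doctype {
    my ($decl) = @_;

    # No declaration at all (e.g. an HTML fragment): use the proposed default.
    return 'v5' unless defined $decl;

    # The short HTML5 doctype.
    return 'v5' if $decl =~ /^\s*<!DOCTYPE\s+html\s*>\s*$/i;

    # Older doctypes carry a public identifier naming the DTD.
    return 'xhtml' if $decl =~ m{//DTD\s+XHTML\s}i;
    return 'v4'    if $decl =~ m{//DTD\s+HTML\s+4\.}i;
    return 'v3.2'  if $decl =~ m{//DTD\s+HTML\s+3\.2}i;

    # Anything unrecognised: fall back to the default as well.
    return 'v5';
}

for my $decl (
    '<!DOCTYPE html>',
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">',
    '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">',
) {
    print html_level_from_doctype($decl), "\n";
}
```

The caller would then pick which tag tables to load based on the returned label; how that loading works is a separate design question.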
Mmmm, only 25 reverse dep test fails.. (out of 30) |
For posterity, here are the original tickets requesting HTML5 support. |
Items for discussion:
PhilterPaper/HTML-Tagset should be pretty up to date, as far as v4 goes, but will need to accommodate different levels (v5 are commented out). Finally, we need to address having a consistent architecture of the lists regarding permissible/required children/parents and ensuring that a tag or attribute gets listed only once in a given list. For example, |
Hi Phil, I feel like we're talking past each other a bit, so I've created a draft PR of the current work state, as last seen in 2022 or so. I hadn't realised the local v5 branch had been removed, which makes it difficult to refer to! See #12
Agreed. Mostly the plan here was to a) patch any "uses Tagset" CPAN modules to not use non-v5 tags in their tests. b) release this as a shiny new (version 5.0.0) module, and document breaking changes. (see POD)
W3C / WHATWG lists should be used, in my opinion. There's been a lot of change over the lifetime of v5 already. Do we put in all the tags etc. that existed over its lifetime? (Probably easier than making users try subsets ad infinitum.)
We picked a way to do this, see PR.
Belongs in a separate issue, I think. Aka nice to have, but not directly related to "make v5 work in general". Personally I don't need this, I guess it depends on your use of TreeBuilder et al. I want to parse existing pages, so for me "all ever existing v5 tags in the v5 set" will do. We can ponder how to make subsets of v5 work as well as "the whole thing"
Twas decided to be v5, any strong arguments for it not being?
In another issue? This also doesn't feel like a direct dependency of "make v5 work". |
NB most of these are "uses HTML::Tree, which fails" |
Maybe I missed something going by, but I was not aware that the structure had been definitely decided upon. Once Andy either says, "this is the way we'll do it" or accepts your PR, I'll believe that it's been settled once and for all. Personally, it doesn't matter all that much how it's done, so long as HTML::TreeBuilder works properly. If it were up to me, I would probably go with methods (in addition to the existing variables), but whatever works... As far as being side issues to "make v5 work", I feel that it's all part of the whole architecture and needs to be addressed holistically. Splitting them out to separate issues raises the possibility of their getting lost and not addressed. Some tags aren't all that supported by W3C documentation, for instance |
I think it's too early to talk about code. I want to get some high-level design and interface ironed out first. I've started issue #13 to discuss it. |
Given the lack of any actual use case where someone needs to be able to handle both HTML4 and HTML5 using the same module, I don't see any reason to add HTML5 (or XHTML or HTML3) functionality to a 25-year-old module. If there is such a case, I'm glad to hear it and we can discuss strategy from that point of view. |
I've created a new v5 branch. Let's work against that, and use this ticket for discussion.
Why do you want to hide HTML::Tagset::v[45] from PAUSE? I don't think we do.
As to tests:
(font, i, center, etc)
(audio, video, mark, etc)
(table, div, etc)
We should probably test the differences in attributes. https://www.w3.org/TR/html5-diff/ notes, for example, "A new placeholder attribute can be specified on the input and textarea elements." I think it should be pretty exhaustive. We only have to do this once.
Finally, as I read it, you have v5 as the default, which is what we should do. We just need to make it dead simple, one line of code ideally, for someone to change back to v4 in their existing code.