Creating 'valid' HTML 5

The HTML 5 standard as such does not yet exist. All we have at the moment is a series of proposals for what might be added in that version. That the proposals are nowhere near a finished version is clear from the way some of them overlap or contradict one another: for example, the proposed "required" attribute on form fields duplicates what the proposed "pattern" attribute already provides through pattern=".+". Several tags that were flagged as obsolete back in 1997, when HTML 4 became the current standard, are also proposed for reintroduction. Those obsolete tags continued to be used to work around older browsers that did not support the HTML 4 replacements. The last of those browsers are now effectively dead, so the obsolete tags have become unnecessary just as some people are proposing to reinstate them.

Anyway, this article isn't about what is or isn't appropriate to include in the final HTML 5 standard; it is about how to use the proposed HTML 5 tags in a way that gives you clean and valid HTML. The problem is that HTML 5 does not have its own doctype to validate against: the doctype HTML 5 uses is equally valid for HTML 2. A page therefore cannot be validated properly on the basis of that doctype, because any tag valid in any version from HTML 2 to HTML 5 would be considered valid, and the doctype provides no means of telling whether the page uses obsolete tags. Since most pages on the web are still written using HTML 3.2 tags that have been obsolete since 1997, the likelihood is that most pages validated using the HTML 2-5 doctype will consist mostly of HTML 3.2 with a few HTML 5 tags thrown in. Many of the tags used will be obsolete, but the validator will not tell you about them.

HTML 5 is basically HTML 4 with a few extra tags and attributes added. So if we temporarily remove the tags and attributes that are specific to HTML 5 from our page, the page should validate as HTML 4 strict. (The transitional HTML 4 doctype existed only to let pages written in HTML 3.2 move gradually to HTML 4, by allowing the new HTML 4 tags to be used while still accepting obsolete HTML 3.2 tags.) If you don't comment out the HTML 5 tags and attributes and simply validate the page as HTML 4 strict, then the only errors reported should be those caused by the new HTML 5 tags and attributes.
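The check described above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not a validator: the function names swap_doctype and html5_features are made up for this example, and the two lists of HTML 5 tags and attributes are short, incomplete samples that you would need to extend to match whatever proposals your page actually uses.

```python
import re
from html.parser import HTMLParser

# HTML 4.01 Strict doctype, swapped in only for the validation run.
HTML4_STRICT = (
    '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" '
    '"http://www.w3.org/TR/html4/strict.dtd">'
)

# Assumption: illustrative and incomplete samples, not an official list.
HTML5_TAGS = {"section", "article", "nav", "header", "footer", "aside"}
HTML5_ATTRIBUTES = {"required", "pattern", "placeholder", "autofocus"}


def swap_doctype(page):
    """Return a copy of the page with the short doctype replaced by HTML 4.01 Strict."""
    return re.sub(
        r"<!DOCTYPE\s+html\s*>", HTML4_STRICT, page, count=1, flags=re.IGNORECASE
    )


class Html5Finder(HTMLParser):
    """Collects the HTML 5 only tags and attributes used in a page."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (line number, description) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag and attribute names in lower case.
        line = self.getpos()[0]
        if tag in HTML5_TAGS:
            self.found.append((line, "<%s>" % tag))
        for name, _value in attrs:
            if name in HTML5_ATTRIBUTES:
                self.found.append((line, "%s attribute on <%s>" % (name, tag)))


def html5_features(page):
    """List the HTML 5 only features that an HTML 4 strict validation should flag."""
    finder = Html5Finder()
    finder.feed(page)
    finder.close()
    return finder.found
```

Run the output of swap_doctype through a validator, then compare the errors it reports with the list from html5_features. Any error that does not correspond to an entry in that list points to something that is not valid HTML 4 strict, such as an obsolete HTML 3.2 tag.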

While changing the doctype and possibly commenting out parts of the HTML is more involved than simply running a validator on what is there, it gives you a more accurate indication of whether your page is written entirely in HTML 5 (since, with very few exceptions, HTML 4 is a subset of HTML 5) or is mostly written in HTML 3.2 with a few HTML 5 tags thrown in.

At the moment there is no effective way to validate web pages that contain HTML 5 without carrying out these additional steps. Hopefully, once HTML 5 is actually released as a standard, it will be given its own identifier that specifies that the page should be validated as HTML 5, and not simply as HTML 2+ as the doctype currently recommended for HTML 5 pages implies.


This article was written by Stephen Chapman, Felgall Pty Ltd.
