HTML Validation

The official standards for what is and isn't HTML are set by the W3C (not to be confused with W3Schools, a privately run site that just happens to have a similar name). On the W3C site you can find definitions of which HTML tags and attributes are currently valid and which are obsolete (meaning that browsers are required to keep supporting them so as not to break old web pages, but they are not supposed to be used in new web pages). The standards also used to specify deprecated tags and attributes (obsolete and due to be removed in the next version), but that category has been dropped. Not only have the obsolete tags in old pages not been replaced, people continue to use them in new pages, so actually dropping support for them would break too many antiquated web pages (and no browser wants to lose market share by being the first to do that).

There is also an alternative language for defining your web page content called XHTML. It follows slightly different rules to HTML and is not supported by older browsers such as Internet Explorer 8, so it is not used very much yet.

At the time of writing, the current HTML standard is HTML 4 (version 4.01), identified by the following doctype above the start of the HTML:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">

The current version of XHTML is XHTML 1.0, identified by the following doctype above the start of the HTML:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

New versions of both HTML and XHTML (called HTML 5 for obvious reasons and XHTML 5 for consistency) are about to become the new standard. They don't follow the SGML standard for defining markup languages and so don't have a doctype in the SGML sense. HTML 5 does use a short version of the doctype, <!DOCTYPE html>, as the first thing in the page, but only to act as a switch between standards mode and quirks mode. Quirks mode reproduces a pre-release version of certain standards that some browsers and many web pages adopted, where the standard changed slightly between the creation of the browser and the standard becoming final. XHTML doesn't suffer from that problem and so doesn't need this extra tag at all.
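As a rough illustration of how the doctypes discussed so far can be told apart, here is a minimal sketch using Python's standard html.parser module. The names DoctypeFinder and classify_doctype are made up for this example, and the labels it returns are only this sketch's own wording. It does not reproduce how any browser or the W3C validator actually decides.

```python
from html.parser import HTMLParser


class DoctypeFinder(HTMLParser):
    """Records the first <!DOCTYPE ...> declaration seen in a page."""

    def __init__(self):
        super().__init__()
        self.doctype = None

    def handle_decl(self, decl):
        # decl is the declaration text without the leading "<!" and trailing ">".
        if self.doctype is None and decl.lower().startswith("doctype"):
            # Collapse line breaks and repeated spaces into single spaces.
            self.doctype = " ".join(decl.split())


def classify_doctype(page):
    """Return a rough label for the doctype a page declares (hypothetical helper)."""
    finder = DoctypeFinder()
    finder.feed(page)
    finder.close()
    decl = finder.doctype
    if decl is None:
        return "no doctype"
    if decl.lower() == "doctype html":
        return "HTML 5 short doctype"
    if "-//W3C//DTD XHTML 1.0 Strict//EN" in decl:
        return "XHTML 1.0 Strict"
    if "-//W3C//DTD HTML 4.01//EN" in decl:
        return "HTML 4.01 Strict"
    return "other doctype"


print(classify_doctype("<!DOCTYPE html>\n<html></html>"))  # prints: HTML 5 short doctype
```

A page with no doctype at all is reported as "no doctype", which is the case where you would need to tell the validator which standard to use.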

The W3C site provides a facility that lets you validate your (X)HTML to see whether all of your tags and attributes comply with the standard. Where the page doesn't contain a doctype, or where you want to test whether the page complies with a more recent standard before changing the doctype, you can override the doctype in the page and validate against the standard you want. There are even "transitional" doctypes for web pages that have been partly updated to a new version but still contain tags from the older standard that have since been made obsolete. Now that there are no deprecated tags, though, there is little point in validating against the transitional standards.

When you use this facility to validate your HTML, it validates the page as it is before any of the scripts attached to the page have run. As scripts can change the HTML, validating at this point does not prove that the final page is actually valid. Some people fool themselves into thinking that their page is valid by applying all of the invalid tags and attributes using scripts. The standard makes no such exception: it does not say that its list applies only to what is originally in the HTML, leaving you free to add whatever tags and attributes you like using scripts. The listed tags and attributes are the only ones that are valid, both in the original HTML and in any updates applied by scripts.

The W3C validator does not directly cater for testing the validity of the page after a script has applied changes, and it used to be difficult to validate the updated page at all. Browser extensions that link to the W3C validator have rectified this problem. They allow you to display a web page and let the scripts run. You can then press the validate button that the extension added to your browser, and the HTML as it currently stands, with the script updates applied so far, gets passed to the W3C validator for validation.

This difference becomes particularly obvious where you have your actual HTML written to the HTML 4 standard and have some of the proposed HTML 5 attributes being added to your page by third party scripts. Validating the page via the extension identifies all of the HTML 5 attributes as not being valid in HTML 4.
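To make the gap between the two validations concrete, here is a small sketch that lists the tags and attributes present in the page after scripts have run but absent from the original source. It is not a validator. It assumes you have already captured the updated markup as a string (for example by copying it out of the browser), and the names MarkupInventory, inventory and added_by_scripts are invented for this example.

```python
from html.parser import HTMLParser


class MarkupInventory(HTMLParser):
    """Collects every tag and tag[attribute] combination found in some HTML."""

    def __init__(self):
        super().__init__()
        self.used = set()

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us.
        self.used.add(tag)
        for name, _value in attrs:
            self.used.add(f"{tag}[{name}]")


def inventory(html):
    """Return the set of tags and attributes used in the given markup."""
    parser = MarkupInventory()
    parser.feed(html)
    parser.close()
    return parser.used


def added_by_scripts(source_html, updated_html):
    """Return what appears only in the markup captured after scripts ran."""
    return inventory(updated_html) - inventory(source_html)


# Sample markup: a third party script has added an HTML 5 attribute.
source = '<form><input type="text" name="q"></form>'
updated = '<form><input type="text" name="q" placeholder="Search"></form>'
print(sorted(added_by_scripts(source, updated)))  # prints: ['input[placeholder]']
```

Anything this reports is markup that validating the original source alone never gets to see.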

Is this a problem? Of course not. One of the reasons that following the SGML standard for defining HTML was abandoned with HTML 5 is that browsers have never implemented anything that actually uses the markup definition for the particular version that the page is using. Regardless of which version of HTML your page uses, the browser feeds it through the same HTML parser. Browsers don't care what version of HTML your page is using. The only use that the SGML doctype has with HTML is in validating the HTML against a particular standard. The validator will report as errors any tags and attributes from prior versions that are now obsolete and any tags and attributes added since the version you are testing against. It will also report any tags and attributes you invented that are not a part of any standard. With the first of these groups the browser knows exactly how to process the code; with the second group a modern enough browser will also know how to handle it. Only invented tags and attributes may or may not work.
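The three groups of reported errors can be pictured with a short sketch. The tag lists below are deliberately tiny, hand-picked samples rather than the real contents of any standard, and classify_tag is a hypothetical helper written only for this illustration.

```python
# Hypothetical, deliberately tiny samples of each group; the real lists
# live in the HTML 4.01 and HTML 5 specifications.
VALID_IN_HTML4_STRICT = {"p", "div", "span", "em", "strong", "a", "img"}
OBSOLETE = {"font", "center"}
ADDED_IN_HTML5 = {"section", "article", "nav"}


def classify_tag(tag):
    """Place a tag name into one of the groups described above."""
    name = tag.lower()
    if name in VALID_IN_HTML4_STRICT:
        return "valid"
    if name in OBSOLETE:
        return "obsolete"
    if name in ADDED_IN_HTML5:
        return "added later"
    return "invented"


for tag in ["p", "FONT", "section", "sparkle"]:
    print(tag, "->", classify_tag(tag))
```

Only the last group, "invented", is the one where you cannot predict what a browser will do.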

Assuming that you care enough to validate your HTML (and if you don't, then why are you reading this?), you will not want to include any obsolete or invented tags or attributes in your page and will want to fix any issues regarding where tags are allowed to be used. Once you have corrected all of those errors, the only errors your validation should be showing are for tags and attributes that have been added since the standard your doctype identifies. The simplest way to check that you haven't missed anything obsolete or invented is to validate the page a second time, overriding the doctype to specify the newer version. If all of the errors that the first validation reported disappear in the second validation, you can consider your page to be valid. Of course, you may also want to look at any errors that appear in the second validation that were not in the first, as these indicate things that might need to be changed to comply with the new standard.

You need to be careful, though, to pick the right version to validate against. While HTML 5 is not yet a standard, the validator provides only one option for validating HTML 5. That validation does not distinguish between current and obsolete tags, so using it will make all of the errors relating to obsolete tags disappear along with the errors for the new tags. Until the validator offers both a strict and a transitional option for HTML 5, you need to rely on your own knowledge of what is being added in HTML 5 to distinguish those tags from the others that are reported as errors when you validate as HTML 4.
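The two-pass comparison described above amounts to a little set arithmetic. The sketch below assumes you have copied the errors from each validation run into lists of strings. The function names and the sample messages are made up for the example and are not the validator's real output.

```python
def compare_passes(first_errors, second_errors):
    """Compare errors from validating against the page's own doctype (first)
    with errors from validating against the newer standard (second)."""
    first, second = set(first_errors), set(second_errors)
    return {
        "cleared": sorted(first - second),        # went away under the newer standard
        "still_failing": sorted(first & second),  # reported by both validations
        "new_in_second": sorted(second - first),  # may need changing for the new standard
    }


def passes_two_step_check(first_errors, second_errors):
    """True when every error from the first pass disappears in the second."""
    return not compare_passes(first_errors, second_errors)["still_failing"]


# Hypothetical error descriptions, written by hand for this example.
first = ["attribute placeholder on input", "element sparkle"]
second = ["element sparkle"]

print(compare_passes(first, second))
print(passes_two_step_check(first, second))  # prints: False
```

As the caution above explains, an empty "still_failing" list only means something if the second validation can still flag obsolete tags, so this check is not a substitute for knowing what HTML 5 actually adds.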

 

This article written by Stephen Chapman, Felgall Pty Ltd.
