Standards or What Works

From what I have seen, most people do not bother to learn how to write HTML in accordance with the standards. Even fewer try to use it correctly by choosing the semantically appropriate tags for their content. Most people write HTML that just works, despite its use of long dead tags and content wrapped in completely the wrong tag in order to get it to display the way they want in specific browsers.

Most web pages are written in something that might be HTML 3.2 if it weren't for the various proprietary tags that the page uses.

The cause of this is that very few people ever attend a course to learn how to write HTML properly. They therefore miss out on the reasons why it is best to follow the standards and use semantically correct tags. Of course there are exceptions where an alternative approach might be required, but you need to know the rules very well before you can properly identify the rare instances where it is appropriate to break them. Most people simply never learn the rules that ought to be followed in writing HTML.

It is because most people use what works rather than doing things properly that those working on HTML 5 have had to ignore the fact that HTML 4 listed a number of tags that were to be removed completely and no longer supported in a later version of the standard. Most people still use those tags even though their removal was announced back in 1997. Not only have the pages that existed back then not been updated to remove the tags, but millions of new web pages have been created since then that use tags that were flagged for removal.

As a result of this continuing use of garbage code, browsers are now forced to continue to support that garbage. The code works in some browsers but almost certainly produces a mess in others, which the web page owners don't know about because they never checked their page in those browsers. If one browser drops support and others do not, then people will switch to the browser that tries to make sense of the garbage rather than stick with the one that has abandoned the long dead tags. Of course there is always the possibility that markup is so poor that, while it works in one version of a browser, it ceases to work in the next version of the same browser because the browser bug that allowed the poor markup to work has been fixed.

Of course browsers also need to support the standards. The latest versions of all browsers support the HTML 4 standard and so can understand web pages that use that version. They will also all process it the same way, unlike non-standard markup, which different browsers may treat differently. The biggest problem here is that some old browsers take a long time to die. Internet Explorer 6 was the most standard compliant browser at the time it was introduced, but it was still used by a significant number of people five years after it had become the least standard compliant browser. Other languages don't suffer from this problem, as the author of the code usually has more control over the environment that it runs in.

While a markup language such as HTML is extremely simple when compared to any programming language, that doesn't mean that you can learn it properly simply by examining the markup that other people have used. They may well have copied it from somewhere else because it worked way back in whichever versions of the various browsers were around when the original page that used it was written. That doesn't mean that it is even remotely close to being the best way of coding it, or even that it will work at all now. I often see people asking for help where their web page doesn't work properly in the latest web browsers (but where it presumably works fine in some antiquated browser that may or may not still have a significant number of people using it).

By learning HTML properly and using the standard tags in a semantic way you also make your web page source easier for search engines to read and understand. This means that the page is likely to be listed much higher in the search results than the same page content would be if the HTML used was a jumbled mess. While the search engine is primarily concerned with the content, the HTML is what tells the search engine what the content is.

The entire purpose of HTML is to identify what the various pieces of content are. It needs to clearly identify which parts are headings, paragraphs, lists, tabular data etc. HTML 5 is going one step further in this direction by introducing specific tags to identify larger portions of the content as page headers and footers, navigation, articles, asides, sections etc, which will make the significant part of the content easier to identify. Converting a page to make proper use of these new tags will be much easier if the semantically correct tags are already being used for the smaller components of the page. A semantically coded HTML 4 page will generally use divs with specific ids or classes to identify those larger portions while waiting for the next standard to be completed and to introduce them properly into HTML. Pages that do not use semantic markup will require a complete rewrite in order to make use of these new tags. People might still decide to use the new tags in such pages, but they will not be of much use if the rest of the tags are meaningless with respect to their content.
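To give a rough idea of why that conversion is mostly mechanical for a semantically coded page, here is a minimal sketch in JavaScript. The id names in the lookup table and the upgradeDivs function are my own assumptions for illustration, not part of any standard, and the sketch assumes well-formed markup where every div is closed. It swaps each identified wrapper div for the matching HTML 5 element and keeps the existing attributes so that any style sheet or script relying on the ids still works.

```javascript
"use strict";

// Hypothetical id names; a real page may well use different ones.
const ID_TO_ELEMENT = new Map([
  ["header", "header"],
  ["footer", "footer"],
  ["nav", "nav"],
  ["sidebar", "aside"],
]);

// Replace <div id="..."> wrappers with the matching HTML 5 element.
// Assumes well-formed markup in which every <div> has a closing </div>.
function upgradeDivs(markup) {
  const stack = []; // element name chosen for each div that is still open
  return markup.replace(/<div\b([^>]*)>|<\/div>/g, (tag, attrs) => {
    if (tag === "</div>") {
      // Close with whatever name the matching opening tag was given.
      return "</" + (stack.pop() || "div") + ">";
    }
    const idMatch = /\sid="([^"]*)"/.exec(attrs);
    const name = idMatch ? ID_TO_ELEMENT.get(idMatch[1]) : undefined;
    if (name === undefined) {
      stack.push("div"); // not a recognised wrapper, leave it alone
      return tag;
    }
    stack.push(name);
    return "<" + name + attrs + ">"; // keep the id and any other attributes
  });
}

const before =
  '<div id="header"><h1>Title</h1></div>' +
  '<div id="nav"><div class="menu">Links</div></div>';

console.log(upgradeDivs(before));
// <header id="header"><h1>Title</h1></header><nav id="nav"><div class="menu">Links</div></nav>
```

A page where the same areas are marked up with tables or unlabelled divs gives a script like this nothing to work from, which is why such pages need rewriting by hand.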

Over time the difference between web pages written using standard compliant semantic HTML and pages written using whatever worked at the time is growing. The same is happening with JavaScript (where there are even people still teaching how to write JavaScript for the long dead Netscape browser rather than for modern browsers) and to a lesser extent with style sheets (where people writing HTML that works rather than trying to follow the standards still tend not to use style sheets at all, or to use only a small part of what they offer). If this continues then a more formal split of web pages into two groups will eventually occur: the professional web, where the pages work in all standard compliant browsers, and the amateur web, where anything goes and the pages may not work in some browsers.

The way that the standards are now being defined makes this split even more likely. The new HTML 5 standard also has an XHTML equivalent called XHTML 5. XHTML is far less forgiving of some types of error in the markup, so it is far easier to check whether a page complies with the standard simply by seeing what works, because some of the non-standard things that work in HTML will not work in XHTML. The only thing holding back the introduction of XHTML is that Internet Explorer 8 doesn't support it, so once that browser is dead those following the HTML standards can make their testing easier by using XHTML instead of HTML.

The fifth edition of the JavaScript standard (ECMAScript 5, published in 2009) also introduced an alternative "strict" version of that language. Strict mode makes it easier for those following the latest standards to find and fix errors in their code. It also reduces the likelihood of errors in the first place by doing away with some of the obsolete problematic commands and restricting how other problematic commands can be used.
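As a small illustration of the kind of error that strict mode catches, consider the sketch below. The function names are made up for the example. The misspelt variable name would quietly create a global variable in ordinary JavaScript and the function would return the wrong total, whereas with "use strict" the same mistake is reported as an error the first time the line runs.

```javascript
// Adds up a list of prices, but contains a deliberate typo.
function strictTotal(prices) {
  "use strict"; // strict mode for this function only
  let total = 0;
  for (const price of prices) {
    totl = total + price; // typo: "totl" was never declared
  }
  return total;
}

// Report what happened instead of letting the error stop the script.
function describeOutcome() {
  try {
    return "returned " + strictTotal([1, 2, 3]);
  } catch (err) {
    return err.name;
  }
}

console.log(describeOutcome()); // ReferenceError
```

Without the "use strict" line the function would run to completion and return 0, and the mistake could go unnoticed for a long time.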

The one thing that I do know is that writing semantically correct and standard compliant HTML in the first place will make it far easier to maintain the page in the future, as it will be far less likely to have issues appear when a new browser version is introduced. The same is true of modern unobtrusive JavaScript written using strict mode.


This article written by Stephen Chapman, Felgall Pty Ltd.
