HTML and XHTML - What's the Difference

Many people mix up these two markup languages because they don't understand the differences between them. To start with, Internet Explorer 8 and earlier do not understand XHTML at all and will offer it for download. The same was true of older browsers such as Netscape 4 that were still in use when XHTML was first introduced.

To make the transition from HTML to XHTML easier back when few browsers actually supported XHTML, the standard permitted the use of the XHTML 1.0 transitional doctype with HTML, provided that only the subset of XHTML that was compatible with HTML was used to mark up the page. The idea was to serve these pages as HTML until all browsers supported XHTML and then switch them across.

The doctype doesn't determine whether a page is HTML or XHTML - the MIME type the page is served with is what makes that determination. Pages served as text/html are HTML while XHTML is served as application/xhtml+xml. Which MIME type a given page uses depends on the headers sent before the start of the page. These can be set by default for given file extensions or set using a server side language. It is also possible to use content negotiation to set the MIME type based on what the browser supports.
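The content negotiation mentioned above can be sketched in a few lines. This is only an illustration, not a complete implementation: the function name choose_mime_type is made up for this example, and the sketch assumes the server simply checks whether the browser's Accept header lists application/xhtml+xml explicitly, ignoring the q weightings a full implementation would honour.

```python
XHTML_MIME = "application/xhtml+xml"
HTML_MIME = "text/html"


def choose_mime_type(accept_header: str) -> str:
    """Return the XHTML MIME type only when the browser lists it explicitly.

    Hypothetical helper for illustration; q values are ignored.
    """
    for item in accept_header.split(","):
        # Drop parameters such as ";q=0.9" and normalise case and spacing.
        media_type = item.split(";")[0].strip().lower()
        if media_type == XHTML_MIME:
            return XHTML_MIME
    # Anything else, including a bare "*/*", falls back to HTML.
    return HTML_MIME


# A browser that advertises XHTML support gets XHTML.
print(choose_mime_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
# A browser that does not mention XHTML gets the same page as HTML.
print(choose_mime_type("image/gif, image/jpeg, */*"))
```

The returned value would then be sent in the Content-Type header before the start of the page.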

The most noticeable difference between HTML and XHTML is that XHTML requires all tags to be closed. A bare <br> is invalid in XHTML. XHTML allows tags that don't have any content to be closed in two ways. You can either add the closing tag straight after the opening one <br></br> or you can make the tag self closing by adding a / just inside the end of the tag <br/> - both work equally well in XHTML.
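One way to see the closing rule in action is to feed the three forms to an XML parser. The sketch below uses the XML parser from Python's standard library as a stand-in for a browser's XHTML parser; it only checks that the markup is well formed, not that it is valid XHTML, and the helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Self-closing and explicitly closed forms are both accepted;
# the unclosed form is rejected.
for markup in ("<p>one<br/>two</p>", "<p>one<br></br>two</p>", "<p>one<br>two</p>"):
    print(markup, is_well_formed(markup))
```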

All the examples you see of this in actual web pages use the self-closing variant, not only because it is shorter but also because it is the one that can be handled as HTML. The <br></br> variant would be invalid as HTML because that tag is not allowed to have a closing tag in HTML. When the <br/> variant is processed as HTML, the browser simply discards the / as an invalid attribute and treats the tag as if it were <br>.

Note that XHTML does not require a space before the /. You can put a space there but that just makes the tag one character longer than it needs to be. When you see a space in that location it means that the page was either originally written back when Netscape 4 was still in use or the person who wrote the page has copied it from one of those pages. The space was necessary when the page was served as HTML to Netscape 4 because, without the space, that browser would also discard whatever preceded the / as an error.

Another difference between HTML and XHTML is the use of the doctype tag. With HTML 2 through 4 and XHTML 1.0 through 1.1 an SGML doctype could be added above the start of the (X)HTML in the page. This doctype would identify that (X)HTML was being used and could also identify the version. It could also contain a link to the SGML standard that defined what tags are allowed to be used and what they do. The reason this tag was optional is that no browser ever used the SGML standard to determine how to render the web page. Each browser simply continued to use its own built in rendering to render all pages the same way regardless of which version of HTML or XHTML they used. Since browsers don't use it, the entire concept of using SGML doctypes was dropped from (X)HTML 5.

When Internet Explorer 5 was released the CSS standard was not quite finished but Microsoft decided to implement it anyway. A couple of last minute changes to the standard meant that IE5 wasn't fully compliant with the standards, and so Microsoft needed a way for IE6 to determine whether a web page used the standard version of CSS or the variant used by IE5. Microsoft decided that the simplest solution was to use the presence of the SGML doctype tag as a switch. Without the tag the page would render as in IE5 and with the tag the browser would assume the standards were being followed. This meant that the presence of a doctype actually changed the way the browser would render the page, but not in the way that the tag was meant to. With the removal of the SGML doctype from HTML 5, a new HTML doctype tag was added that acts purely as a switch. The rest of the information that would be found in an SGML doctype is not valid in this new tag, which is an HTML tag rather than an SGML tag. XHTML only has one rendering mode and so XHTML 5 doesn't have this tag at all.
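The switch described above boils down to a single test: is there a doctype at the top of the page or not? The sketch below models only that test. The function name rendering_mode and the labels "standards" and "quirks" are assumptions made for this example, and real browsers apply more detailed rules than this.

```python
import re

# Matches a doctype declaration at the very start of the markup,
# allowing for leading whitespace. Simplified for illustration.
DOCTYPE_RE = re.compile(r"\s*<!DOCTYPE\s+[^>]*>", re.IGNORECASE)


def rendering_mode(markup: str) -> str:
    """Return "standards" when a doctype is present, otherwise "quirks"."""
    return "standards" if DOCTYPE_RE.match(markup) else "quirks"


# The short HTML 5 doctype acts purely as the switch.
print(rendering_mode("<!DOCTYPE html>\n<html><body></body></html>"))
# With no doctype the page falls back to the older rendering.
print(rendering_mode("<html><body></body></html>"))
```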

The other significant difference between HTML and XHTML is in how browsers handle errors in the markup. HTML is forgiving and the browser will simply ignore anything that it doesn't recognise (such as the / at the end of tags). XHTML is not as forgiving and the browser is supposed to stop rendering the page as soon as it finds an error. It should then display an error message telling you what is wrong so that you can fix the errors before publishing the page.
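The difference in error handling can be imitated with the two parsers in Python's standard library, used here as rough stand-ins for a browser's HTML and XHTML parsers. The class and function names (TagCollector, parse_as_html, parse_as_xml) and the sample markup are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The <b> tag is never closed, so this is not well-formed XML.
BROKEN = "<p>An <b>unclosed tag</p>"


class TagCollector(HTMLParser):
    """Records every start tag the forgiving HTML parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parse_as_html(markup: str) -> list:
    """Parse leniently and return the start tags found."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


def parse_as_xml(markup: str) -> str:
    """Parse strictly and report the first error instead of continuing."""
    try:
        ET.fromstring(markup)
        return "ok"
    except ET.ParseError as error:
        return f"error: {error}"


print(parse_as_html(BROKEN))  # the HTML parser carries on regardless
print(parse_as_xml(BROKEN))   # the XML parser stops and reports the problem
```

The message from the XML parser includes the position of the error, which is the kind of information the error display is meant to give you.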

There are other differences between HTML and XHTML, such as being able to add additional XML namespaces to an XHTML document and being able to add tags to distinguish between CDATA and PCDATA (also in XHTML). In fact about the only thing that the two markup languages have in common is that they both use the same tags to represent the same things.
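Both of these features can be seen by parsing a small XHTML document with an XML parser. The document below is invented for this example: it assumes an SVG element as the content from a second namespace and a script wrapped in a CDATA section.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML document mixing in the SVG namespace and a CDATA section.
DOCUMENT = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:svg="http://www.w3.org/2000/svg">
  <body>
    <svg:svg width="10" height="10"/>
    <script><![CDATA[ if (a < b && b > c) { run(); } ]]></script>
  </body>
</html>"""

XHTML = "{http://www.w3.org/1999/xhtml}"
SVG = "{http://www.w3.org/2000/svg}"

root = ET.fromstring(DOCUMENT)
svg = root.find(f"{XHTML}body/{SVG}svg")
script = root.find(f"{XHTML}body/{XHTML}script")

# Each element is identified by its namespace as well as its tag name.
print(root.tag)
print(svg.tag)
# Inside CDATA the < and & characters are plain text, not markup.
print(script.text)
```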


This article was written by Stephen Chapman, Felgall Pty Ltd.
