The Standard Generalised Markup Language (SGML) provides a standard way of defining how documents are structured. This allows the same document content to be shared across a range of different applications, provided that they all understand the particular way in which the document structure is defined.

All SGML-based documents start with an SGML doctype tag. This tag contains, or points to, the Document Type Definition (or DTD), which defines the markup tags that will be found throughout the rest of the document. Any program can read the DTD to understand the meaning of those tags and so process them in accordance with its own needs.

The Document Type Declaration (or doctype) not only specifies the DTD to use for interpreting the document, it also specifies the name of this type of document and whether the DTD is a public standard or is privately defined. An application that correctly follows the SGML standard will use the doctype to access the DTD (which can either be embedded in the doctype tag or, more commonly, be referenced from it) and will use that DTD to interpret the rest of the document. Where the document does not conform to the DTD, that program will not be able to process it.

For example, the following doctype declares that the document it precedes uses the standard markup for archival finding aids.

<!DOCTYPE ead PUBLIC "-//Society of American Archivists//DTD ead.dtd (Encoded Archival Description (EAD) Version 1.0)//EN">

A short form of all SGML doctypes is also available where only the name is provided without a reference to the DTD. The application can then presumably determine which DTD to use to process the document based on the name provided.

<!DOCTYPE  ead>
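To make the parts of a doctype a little more concrete, here is a minimal Python sketch that pulls a declaration apart into its name, the PUBLIC or SYSTEM keyword and the quoted identifiers. The function name `parse_doctype` and the regular expression are assumptions made for illustration only. A real SGML parser does far more than this, and this sketch does not handle a DTD embedded in the doctype.

```python
import re

# Hypothetical pattern for illustration; it is not a full SGML parser.
DOCTYPE_PATTERN = re.compile(
    r"""<!DOCTYPE\s+(?P<name>[^\s>]+)       # the document type name
        (?:\s+(?P<kind>PUBLIC|SYSTEM)       # optional keyword
           \s+"(?P<identifier>[^"]*)"       # quoted public or system identifier
           (?:\s+"(?P<system>[^"]*)")?      # optional second (system) identifier
        )?
        \s*>""",
    re.IGNORECASE | re.VERBOSE,
)


def parse_doctype(declaration):
    """Return the parts of a doctype as a dict, or None if nothing matches."""
    match = DOCTYPE_PATTERN.search(declaration)
    if match is None:
        return None
    kind = match.group("kind")
    identifier = match.group("identifier")
    if identifier is not None:
        # Collapse line breaks and repeated spaces inside the identifier.
        identifier = " ".join(identifier.split())
    return {
        "name": match.group("name"),
        "kind": kind.upper() if kind else None,
        "identifier": identifier,
    }


print(parse_doctype("<!DOCTYPE  ead>"))
```

With the short form only the name is filled in, so an application relying on this would have to decide for itself which DTD goes with the name `ead`.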

This is the minimum requirement for document types defined using SGML. There is an alternative way of defining documents called XML. While a doctype is mandatory in SGML, it is optional in XML. There are some additional rules built into XML that do not apply in SGML (such as all markup tags needing to be closed). Where a doctype is not present, an XML application can work out which set of tags applies to which content from the namespace defined on the root tag wrapped around that content.
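The rule that every tag must be closed is easy to see in action with an XML parser. The following sketch uses Python's standard `xml.etree.ElementTree` module. The helper name `is_well_formed` and the sample markup are made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# Every tag is closed, so an XML parser accepts this.
print(is_well_formed("<list><item>one</item></list>"))

# The item tag is never closed, so an XML parser rejects this.
print(is_well_formed("<list><item>one</list>"))
```

Note that this only checks the rules built into XML itself. It does not check the content against any DTD.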

A DTD can define elements (the tags the document uses), attributes (parameters that can be attached to those tags), and entities (codes that represent particular values throughout the document). The DTD defines a hierarchy for where the particular elements are allowed to appear and also which attributes are allowed to be used on which elements.
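As a rough illustration of those three kinds of definition, the sketch below embeds a tiny DTD in a doctype and asks Python's standard `xml.parsers.expat` module to report each declaration as it reads it. The `note` document type, its tags and the `author` entity are all invented for this example, and expat is an XML parser rather than an SGML one, so treat this as a sketch of the idea only.

```python
import xml.parsers.expat

# Hypothetical document with its DTD embedded in the doctype.
SAMPLE_DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
  <!ATTLIST note priority CDATA #IMPLIED>
  <!ENTITY author "Example Author">
]>
<note priority="high"><to>Reader</to><body>Written by &author;</body></note>
"""


def summarise_dtd(document):
    """Collect the element, attribute and entity declarations in a document."""
    elements, attributes, entities = [], [], {}

    def on_element(name, model):
        elements.append(name)

    def on_attlist(element_name, attribute_name, attribute_type, default, required):
        attributes.append((element_name, attribute_name))

    def on_entity(name, is_parameter, value, base, system_id, public_id, notation):
        entities[name] = value

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.EntityDeclHandler = on_entity
    parser.Parse(document, True)
    return elements, attributes, entities


elements, attributes, entities = summarise_dtd(SAMPLE_DOCUMENT)
print("elements:", elements)
print("attributes:", attributes)
print("entities:", entities)
```

The `(to, body)` part of the first declaration is where the hierarchy comes from: it says that a `note` must contain a `to` followed by a `body`.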

So now that we know what a doctype is and how and where it needs to be used let's look at the place where you are most likely to see them - in HTML and XHTML documents.

When HTML was first created no thought was given to using any sort of standards for the markup language. The idea of actually using a standard for defining the HTML markup was introduced with HTML 2. Basically HTML 2 was a rewrite of HTML 1 to comply with SGML. To be compliant with SGML, HTML 2 documents therefore require a doctype, with <!DOCTYPE html> being the short version and <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> being the full version. HTML is not case sensitive, so it doesn't matter whether you specify html or HTML as the name for this type of document.

Unfortunately the browsers at the time were not SGML compliant applications and so neither recognised the doctype statement (as it wasn't a part of HTML) nor made any attempt to parse the document according to a DTD. Basically the browsers just continued to treat HTML as a proprietary custom markup language that they interpreted in accordance with hard coded rules.

Both HTML 3 and HTML 4 continued with the idea of defining the different versions of HTML according to the SGML standards. Browsers went so far as to recognise that HTML might be defined with a doctype, and so would at least ignore the doctype tag before applying their custom rules for interpreting the rest of the document. If a browser exists that correctly uses the doctype to determine which rules to apply to the document, then that browser has never gained enough of a following to be noticed. What did happen was that some browsers needed a way to distinguish between web pages written for older browsers, which rendered the page according to proposed standards, and web pages written to comply with the actual standard where it differed from the proposal. The doctype became popular in between these two browser generations, and so checking for the existence of a doctype provided an easy way for these browsers to choose between two different ways of rendering the page. This use of the doctype of course has nothing to do with the reason for the doctype being there in the first place.
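The switching idea can be sketched in a few lines of Python. This is a deliberate simplification that only looks for the existence of a doctype at the start of the page, as described above, and real browsers apply more detailed rules than this. The function name `rendering_mode` is made up for the example, and the labels "standards" and "quirks" are the names commonly given to the two ways of rendering.

```python
import re

# Matches a doctype at the very start of the page, ignoring leading whitespace.
LEADING_DOCTYPE = re.compile(r"\s*<!DOCTYPE\b", re.IGNORECASE)


def rendering_mode(page_source):
    """Pick a rendering mode purely from whether the page starts with a doctype."""
    if LEADING_DOCTYPE.match(page_source):
        return "standards"
    return "quirks"


print(rendering_mode('<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"><html></html>'))
print(rendering_mode("<html><body>An old page with no doctype</body></html>"))
```

Notice that nothing in this check reads the DTD or even looks at which version of HTML the doctype names, which is exactly the point being made here.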

A second attempt at getting HTML to follow a markup standard was introduced with XHTML 1. This was a reworking of HTML 4 to comply with the XML rules rather than SGML. This has had slightly better success, in that at least some browsers that support XHTML will refuse to render the document if its content is not well-formed XML. As with any XML document, the doctype is optional in XHTML. As the browsers that use an antiquated way of rendering the page don't support XHTML, there is no need for any way of telling the browser to use a different rendering, and so the rendering switch that the doctype is used for in HTML doesn't apply with XHTML.

No browser actually makes use of SGML to render web pages differently depending on which version of HTML they are using. This means that while the HTML markup language follows SGML and validators can validate the content as SGML, when it comes to rendering the page the browser will interpret the content according to its own rules and will completely ignore whichever version of HTML you have defined the page to be via the doctype.

When it came to defining the new versions of (X)HTML 5, the decision was made that since browsers are not SGML applications there is no need for the documents they read to be defined as SGML documents. The SGML doctype was therefore abandoned for HTML 5, taking us back to the days of HTML 1 where the tags are hard coded into the browser processing. All browsers have always worked this way, but now the attempt to follow the standard has been abandoned. Browsers still need a way to identify which way to render the page though, and so a switching mechanism is still required. Since the doctype was already being misused for this purpose, the decision was made to turn the short version of the HTML 2 SGML doctype into an HTML tag that precedes all HTML 5 pages so as to get the browsers to render them correctly. This extra HTML tag only applies with HTML 5. XHTML 5 is still at least partly based on XML and so doesn't require a doctype.


This article written by Stephen Chapman, Felgall Pty Ltd.
