Anatomy of a Web Page

There is more to a web page than many people realise. When you view the source of a web page you see only the content of the page, and when you write HTML you write only the content of the page. If you are using a server side scripting language then you can also generate other parts of the page, but because those other parts are not visible when you view the source, many people are confused about just what is going on.

When a web page is sent to the browser it actually consists of two parts: the headers and the content. The content of the page is the doctype declaration (if you use one, which you should) and everything inside the <html></html> container. What makes it slightly confusing is that the content itself is split into two parts, with what is in the <head></head> tags supplying information about the page and what is in the <body></body> tags being what actually appears in the browser viewport. Everything in the HTML is part of the content of the page; the page headers are completely separate and are sent before the content.

Sending any content for the page at all (even a single space) is enough to indicate that all of the page headers have been sent. When you use a server side language to generate headers for your web page you therefore need to do one of two things. Either write all of the headers first, making sure that you don't output any content until you have finished writing headers, or write all of the content to a buffer and output it from the buffer only after all of the headers have been set.
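The buffering approach can be sketched in PHP as follows. This is a minimal illustration, not a complete page, and it assumes that the script has produced no output at all before ob_start() is called (not even whitespace ahead of the opening PHP tag).

```php
<?php
// Start buffering: from here on, echoed content is held in memory
// instead of being sent to the browser.
ob_start();

echo "<p>This content is written before the header is set.</p>\n";

// Nothing has reached the browser yet, so headers can still be changed.
// (Assumes no output was produced before ob_start() was called.)
$headersAlreadySent = headers_sent();
if (!$headersAlreadySent) {
    header("Content-Type: text/html; charset=utf-8");
}

// Send the headers, followed by the buffered content.
ob_end_flush();
```

Without the ob_start() call, the echo would go straight to the browser and the later header() call would fail with a "headers already sent" warning.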

Just to give you an idea of what headers look like, here are the headers that get sent ahead of the content of my home page.

HTTP/1.1 200 OK
Date: Thu, 23 Jul 2009 21:13:00 GMT
Server: Apache/2.2.11 .....
X-Powered-By: PHP/5.2.9
Connection: close
Content-Type: text/html

The important ones here from the viewpoint of someone viewing the page are the first and last lines. The 200 in the first line is the status code that tells the browser that the page was actually found and that its content follows. The last line defines the content type of the page as being HTML so that the browser knows how to process the content.
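To make the structure of those lines concrete, here is a small sketch that pulls the status code and the content type out of a block of header text. The function name parse_response_headers() is a hypothetical helper invented for this illustration, not a built-in PHP function, and the sample text is the header list above without the truncated Server line.

```php
<?php
// Split a block of raw response headers into the status code and a
// name => value map. parse_response_headers() is a hypothetical helper
// written for this illustration, not a built-in PHP function.
function parse_response_headers(string $raw): array
{
    $lines = preg_split('/\r\n|\n/', trim($raw));

    // The first line is the status line, e.g. "HTTP/1.1 200 OK".
    $statusParts = explode(' ', array_shift($lines), 3);
    $status = (int) ($statusParts[1] ?? 0);

    $headers = [];
    foreach ($lines as $line) {
        // Each remaining line has the form "Name: value".
        if (strpos($line, ':') === false) {
            continue;
        }
        [$name, $value] = explode(':', $line, 2);
        // Header names are case-insensitive, so store them in lower case.
        $headers[strtolower(trim($name))] = trim($value);
    }

    return ['status' => $status, 'headers' => $headers];
}

$raw = "HTTP/1.1 200 OK\r\n"
     . "Date: Thu, 23 Jul 2009 21:13:00 GMT\r\n"
     . "X-Powered-By: PHP/5.2.9\r\n"
     . "Connection: close\r\n"
     . "Content-Type: text/html\r\n";

$parsed = parse_response_headers($raw);
echo $parsed['status'], "\n";                    // prints 200
echo $parsed['headers']['content-type'], "\n";   // prints text/html
```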

You can write your own headers using the appropriate command in whatever server side language you are using. In PHP, for example, the header() function is used to write out headers. A very common header to send from PHP is the Location header, which tells the browser to load the specified page in place of the current one. The script itself keeps running after the header() call, so it is usual to end the script straight afterwards. This is one of two special headers that can be set from PHP, the other being

header("HTTP/1.0 404 Not Found");

which sets the status of the response to a different value from the one it would normally have. Any headers that you write out that don't start with Location or HTTP are written exactly as they appear. For example, to prevent the page being cached you can send

header("Cache-Control: no-cache, must-revalidate");
header("Expires: Sat, 26 Jul 1997 05:00:00 GMT");
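As a hedged sketch of how a Location header is typically sent, the helper below wraps the header() call with two safeguards. The name redirect_to() is hypothetical, chosen only for this illustration, and any address passed to it is the caller's own.

```php
<?php
// redirect_to() is a hypothetical helper written for this illustration.
function redirect_to(string $url): void
{
    // header() only works if no content has been sent yet.
    if (headers_sent()) {
        throw new RuntimeException("Cannot redirect: output has already started");
    }

    // PHP also sets a 302 status for a Location header unless a 3xx or
    // 201 status has already been set.
    header("Location: " . $url);

    // The script keeps running after header(), so stop it explicitly.
    exit;
}
```

Calling redirect_to("http://www.example.com/") at the top of a script, before any output, would send the browser to that page; the address here is only a placeholder.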

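Other things are passed in the headers as well. Any cookies associated with the page, for example, are passed in the headers. POST data is the exception: it travels in the body of the request, after the request headers, which describe it with fields such as Content-Type and Content-Length.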

As you can see there is a lot of information that is passed in the page headers that is not a part of the content of the web page. In order to properly understand how all these things work we'll take a quick look at the four main types of requests that can be sent to the server. These four are HEAD, GET, PUT, and POST.

The most common call relating to web pages is a GET call. This call is intended to retrieve a web page (both headers and content) for display in the browser. When made from Ajax in order to update a part of the page, the call still retrieves both headers and content, which your script can then test before updating the page. The HEAD call works similarly to GET except that only the headers are retrieved. You would use HEAD when you don't need to retrieve the page content.

Each of these two calls is intended for retrieval, and the assumption is that it does not do any updates on the server. The results retrieved from the first call can therefore be cached by the browser, so that a subsequent GET or HEAD call for the same page can use the cached copy instead of going back to the server.

PUT and POST are the two calls that can be used when you want to actually update something on the server. Multiple calls of this type are assumed to change things in between, and so the responses are normally not cached. POST is the more commonly used of the two: HTML forms can send only GET or POST, and a POST hands the submitted values to the page for processing, whereas a PUT asks the server to store the request body as the new content of the specified URL.
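To show what these requests look like on the wire, here is a rough sketch that assembles the request text by hand. The helper build_request() is hypothetical, as are the host, paths, and form fields used with it. The point of interest is that the request headers end at a blank line and any form data follows them.

```php
<?php
// build_request() is a hypothetical helper that assembles the text of an
// HTTP/1.1 request by hand so the headers and the body can be seen apart.
function build_request(string $method, string $host, string $path, array $fields = []): string
{
    $body = http_build_query($fields);   // e.g. "name=Ann&age=30"

    $lines = [
        "$method $path HTTP/1.1",
        "Host: $host",
    ];
    if ($body !== '') {
        // These headers describe the body that follows them.
        $lines[] = "Content-Type: application/x-www-form-urlencoded";
        $lines[] = "Content-Length: " . strlen($body);
    }

    // A blank line separates the headers from the body (if any).
    return implode("\r\n", $lines) . "\r\n\r\n" . $body;
}

// Placeholder host, paths, and values, for illustration only.
$head = build_request("HEAD", "www.example.com", "/index.html");
$post = build_request("POST", "www.example.com", "/form.php", ["name" => "Ann", "age" => 30]);

echo $head;
echo $post, "\n";
```

In practice the browser or an HTTP library builds this text for you; writing it out by hand is only useful for seeing where the headers stop and the data starts.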

Also of interest are the meta tags within the content of your web page that use http-equiv. These are tags in the head section of your page content that attempt to emulate the equivalent HTTP headers. Depending on exactly which headers you are trying to emulate, this may or may not work, since some of the values need to be decided before the browser starts loading the content.

By realising that there is more to a web page than just the content you get your first glimpse at how the web actually operates and will have a better idea of what you need to look at when it doesn't work quite right. Note that there are plugins/extensions available for some browsers that will allow you to view the HTTP headers sent in front of a web page so that you can see for yourself how the headers relate to what the browser displays.


This article written by Stephen Chapman, Felgall Pty Ltd.
