At the time of writing, most pages on the web still use the old HTML 3.2 tags, so the HTML is "polluted" to at least some extent with tags and attributes that affect the appearance of the page instead of leaving all of the appearance to the CSS. This discussion also applies to XHTML 1.0 if that is the doctype you are using, since XHTML 1.0 is just HTML 4 using XML syntax.
In many cases the owner of the page thinks that they are using HTML 4 because that is what the doctype of the page says. They are only fully using HTML 4, though, if the doctype also specifies "strict". The HTML 4 transitional doctype is for those who are still in the process of moving from HTML 3.2 to HTML 4, and it accepts both the old presentational markup and its HTML 4 replacements as valid. Only if your page validates against an HTML 4 strict doctype is it really using HTML 4 and nothing else.
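For reference, the two doctypes look like this in HTML 4.01, the current revision of HTML 4. This is only a side-by-side sketch: a real page carries exactly one of them, as its very first line.

The strict doctype, under which presentational tags and attributes such as font and align fail validation:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
```

The transitional doctype, under which the old presentational markup still validates:

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
   "http://www.w3.org/TR/html4/loose.dtd">
```

Note that the word "strict" appears only in the DTD address of the strict version, while the transitional version names itself in the public identifier, so a quick glance at the first line of a page is enough to tell which one it claims to follow.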
That so few web pages yet use HTML 4 means that there is little point in even thinking about HTML 5 unless you are creating experimental web pages. An HTML 5 page also cannot be validated against a DTD the way an HTML 4 page can, since HTML 5 is not based on SGML and instead treats the doctype as a part of HTML itself. The short doctype chosen for it names no version, so nothing in it identifies that the page should be validated as HTML 5 rather than as an earlier version such as HTML 2.
Anyway, back to HTML 4 and how to upgrade pages that are already written but do not fully follow the HTML 4 standard. One option is obviously to rewrite the HTML of each page completely. If you have hundreds or thousands of pages and don't want to spend a huge amount of time rewriting everything, a better approach is to change the code gradually, getting rid of the old code a piece at a time and replacing it with the HTML 4 equivalent. That is why the transitional doctype exists: it lets both old and new markup validate while you slowly move from one to the other.
What I suggest is that you upgrade your HTML one thing at a time. Pick a particular tag or attribute that no longer exists in HTML 4 and go through all your pages replacing each occurrence of it with the HTML 4 equivalent.
A good place to start, if your pages use them, is <font> tags. If a font tag is directly inside or directly outside another tag, so that the two wrap exactly the same content, add a class to that other tag, apply the same styling to that class in your CSS as the font tag was applying, and remove the font tag completely. If it is not, replace the font tag with a span tag and apply the class to that.
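The two cases might look something like the following. This is a minimal sketch rather than a recipe: the class name "notice", the colour, the font and the sample text are all made up for illustration, and in a real site the style rule would normally live in an external stylesheet rather than in the page.

```html
<!-- Before: HTML 3.2 style markup -->

<!-- Case 1: the font tag wraps exactly the same content as the paragraph -->
<p><font color="#990000" face="Arial, sans-serif">Special offer this week only.</font></p>

<!-- Case 2: the font tag wraps only part of the paragraph -->
<p>Order before <font color="#990000" face="Arial, sans-serif">Friday</font> to qualify.</p>


<!-- After: the styling moves into a class (hypothetical name "notice") -->
<style type="text/css">
.notice {
  color: #990000;                   /* was color="#990000" */
  font-family: Arial, sans-serif;   /* was face="Arial, sans-serif" */
}
</style>

<!-- Case 1: the class goes on the existing tag and the font tag disappears -->
<p class="notice">Special offer this week only.</p>

<!-- Case 2: the font tag becomes a span carrying the class -->
<p>Order before <span class="notice">Friday</span> to qualify.</p>
```

Because every font tag with the same attributes maps to the same class, the conversion is largely a repeated search and replace once the class has been defined.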
Updating your HTML to get rid of the attributes that control positioning on the page is slightly more involved. Some cases are fairly simple, such as replacing align left, right, and center for text: the text-align property in CSS is a direct substitute for those. Replacing the align attribute is complicated, though, by the fact that it can also be used to align blocks of content. So while the old HTML 3.2 attribute was the same in both cases, the CSS to use with HTML 4 is different. The simplest substitution for aligning a block left or right is CSS that specifies float left or right. Replacing align center for a block is the most difficult one. Provided that the block has a width specified (or is of a type that will use a default width other than 100%), you can centre it by specifying "auto" for the left and right margins.
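The three substitutions described above can be sketched as follows. The class names, the image file name and the widths are all assumptions chosen for the example, not anything your pages need to match.

```html
<!-- Before: the same align attribute doing three different jobs -->
<p align="center">This text is centred within the paragraph.</p>
<p><img src="photo.jpg" alt="Photo" align="right" width="200" height="150">
Text that wraps around the image.</p>
<table align="center" width="400">
  <tr><td>A table centred on the page.</td></tr>
</table>


<!-- After: three different CSS rules (hypothetical class names) -->
<style type="text/css">
.centred-text  { text-align: center; }   /* aligning text inside a block */
.float-right   { float: right; }         /* aligning a block left or right */
.centred-block {                         /* centring a block that has a width */
  width: 400px;
  margin-left: auto;
  margin-right: auto;
}
</style>

<p class="centred-text">This text is centred within the paragraph.</p>
<p><img class="float-right" src="photo.jpg" alt="Photo" width="200" height="150">
Text that wraps around the image.</p>
<table class="centred-block">
  <tr><td>A table centred on the page.</td></tr>
</table>
```

The width in the last rule matters: a block that fills the full width of its container has no spare space for the auto margins to share out, so there is nothing to centre.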
By progressively replacing each of the old HTML 3.2 tags and attributes you will eventually end up with web pages that consist only of HTML 4, and you can finally change your doctype to specify strict. Your HTML may not be as clean as a complete rewrite would have made it, and the upgrade may have taken longer overall than a rewrite would. The gradual approach does have several advantages, though. Your pages remain available online throughout, since each change takes only seconds to make. Because each change is so quick you can also easily fit the changes in between other tasks, which means that your pages do eventually end up using HTML 4. If you wait until you have time to rewrite the pages in one go, you may never find the time to do it.
This article written by Stephen Chapman, Felgall Pty Ltd.