Why Normalise a Database

Normalising your datais always an appropriate first step in designing your database.By normalising your data you remove redundancies and identify relationships.

With a normalised database there is only one place in the database where each piece of information is held. While there may be duplicate copies of what appears to be the same information in multiple places that information has different meanings in each place where it occurs - such as the same address being held as both the buyer's address and the shipping address. By not storing the same information in multiple places you avoid the possibility of updating one place with a new value and not updating the other places that hold the same information which would lead to your not knowing which copy of the information is correct.

Now some people take this idea of avoiding storing the same information in multiple places to mean that you should avoid using indexes. This is not what it means at all though since any indexes you create that deliberately duplicate information in the database are there to improve the efficiency of the access to the information and the duplication is completely under the control of the database so that the copies can't become inconsistent.

This idea of duplicating data to improve efficiency is the reason why the fully normalised version of a database is never implemented. There is always some normalisation that is undone prior to the actual physical implementation taking place. This is because there are actually two contradictory situations with regard to the data. Any actual updates to the data in your database are most efficient with a fully normalised database where everything to be updated occurs in one place. Not having to update the data in multiple places makes the updates more efficient. It is when you come to reading the data where full normalisation is less than optimal and duplication of all the data so that whatever is being requested can all be retrieved from the one record is the most efficient way.

In almost all cases the efficiencies gained for updating a normalised database far outweigh the loss off efficiency that means for reading back the data (particularly with regard to knopwing that you have the correct copy of the information) and so for most databases the amount by which you would undo the normalisations in coming up with your physical design will be relatively minor. With the exception of normalisations that you'd most obviously want to undo such as the parts of an address you are best off to actually work out the fully normalised logical design as a first step before deciding on which of them to undo for the physical design. In most instances the read efficiencies gained by undoing normalisations will be negligible and so may even require that you perform actual volume testing of the normalised design in order to identify if there are any actual bottlenecks that can be resolved by undoing normalisations.

You can even preempt any possible undoing of normalisations by defining views on the database that look the way that the tables would end up if you undo the normalisation. That way if you do need to undo normalisations then the amount of code changes required in the programs accessing the database will be minimal and hence deferring the decision until a later point in the testing becomes a more practical option.

Normalising the data also makes it easier to change the data that is being stored. If you are recording the subjects that each student is taking and each student currently takes eight subjects then storing all eight in the one record means a major redesign if the situation changes and students now need to take nine subjects whereas if you normalise the information into a separate table then the change just means you will have nine records per student instead of eight.

Many of the problems that occur after an application using a database has been created are due to the data in the database not having been correctly normalised first. In some instances this is because the database designer was given the incorrect information at the start as to just what fields that there are and how those fields are supposed to be used but more often it is a result of the database having been thrown together by someone who does not understand database design properly in the first place and who therefore hasn''t carried out the essential first step of normalising the data.


This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow