Primary Keys

One mistake that I have seen many people make who don't really understand databases is to use a generated id field as the primary key on each and every table that they create.Such generated fields are rarely needed since in most instances there is a field or combination of fields that already exists that will allow each record in the table to be uniquely identified. It is somewhat like if you were building houses and each house were built with a dog kennel alongside without any consideration for whether there would ever be a dog.

Adding an id to a table to use as the primary key should be the last option that you consider and not the first. In many instances there will be a single field within the table that already uniquely identifies the record and which can be expected to remain relatively static (for example an email address is relatively static since people don't go changing email addresses all that often) Of course there may be other unique identifiers on the data that are even less likely to change (such as membership numbers) and so the most static of the fields that uniquely identifies the data should be selected as the primary key.

Where there is no one unique field to identify the record there may be a combination of fields that can serve that purpose. There is nothing that says that the primary key must bo a single field and if there are two or three relatively short fields in combination that are needed to identify the record then these should together be considered to be the primary key. It is only where this results in an extremely long primary key that you would consider adding one or more id fields as a part of your denormalisation of the logical design in creating the physical design. Those long fields would then be split out into a table of their own with a generated id as the primary key for that table and with that id then being used in place of the long field in every other place where the field is referenced. Of course if that field by itself is already a primary key on a table then you'd simply add the generated id as the primary key for that table.

Such changes to use generated keys in those situations should be done only for the purpose of making the physical database more efficient and you still need to define the original primary key as being unique in order to maintain the integrity of the data. It is then obvious to anyone looking at the way that the table is defined because of the unique constraint that the generated key has been added simply to make joining tables more efficient so that if those joins cease to be needed the generated key can be removed and the table revert back to using its original primary key.

The one situatiobn where you need to generate an id field from the start is where there is no suitable field to use as the primary key for the data. Here we are adding a key to uniquely identify each of the records where no such unique identifier previously existed. In most cases therefore this key that we are adding will have uses outside of the database where ever the data needs to be uniquely identified. For most of these instances just generating sequential numbers will not really satisfy the requirement fully since there is probably some sort of grouping within the table and a key that at least gives a clue as to which group that a record belongs to will be more useful. So when you do need to add a field as the primary key you should first consider whether there is anything in the data of each record that could be used as a part of this key in order to provide that clue. The key that you add for a parts table would be a part_number or product_code and would hopefully contain some sort of clue as to what sort of part that it is.

An id with a straight numerical value would only be useful as the identifier in those instances where the records to be distinguished do not contain anything that provides any form of grouping information at all such as the posts and threads in a forum database where the number of the post or thread is as meaningful a key value as you can get. Of course even here the primary keys that we are adding will be used outside of the database itself and so the values must remain constant. No attempt should ever be made to fill in any gaps caused by records being deleted or the external references to the data will be invalidated


This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow