Referential Integrity

A properly designed database will contain a number of different related tables. These relationships may or may not be defined within the database itself.

Where the relationships are defined in the database there will be foreign key constraints defined on the fields in one table that reference the primary key of another table which not only make it clear that the relationship exists when you read the definition of the tables but which also enforces the relationship.

Some people believe that by leaving out these foreign key references they make their database design simpler, faster, or more flexible. In fact leaving out the foreign key references just moves the responsibility for maintaining referential integrity from the database to the associated programs that are using the database. Instead of the programs being able to assume that the data in the database is valid they must all perform the checks themselves in order to confirm that the data is valid.

The problem with relying on the code in the programs to maintain the integrity of the data between tables in your database is that those programs are not necessarily the only way that the database will be updated. Any ad hoc queries run on the database from outside of the programs specifically written to maintain the database will not perform the necessary referential tests and so an ad hoc query could be run that updates the database into an invalid state. Also if a new program were to be added to perform additional processing on the database you are reliant on the person who writes that program to perform all the necessary referential checks manually within their code.

Now assuming that you are using a database that allows you to define foreign keys, you can simplify this entire process by defining the foreign keys that are needed to specify just what referential integrity is supposed to exist within the database. Despite some people's opinion that omitting them simplifies things, the opposite is in fact the case since with the database itself enforcing referential integrity the code required in your programs becomes simpler in what is required in order to test integrity there. It also ensures that the integrity constraints are also applied to any ad hoc queries. In short, it becomes far less likely that the integrity of the database can be corrupted.

Now applying referential integrity within the database itself does enforce a particular order in which updates must be applied. You can no longer insert a row into a child table without first adding the row in the parent table to which it refers and similarly you can no longer delete the row from the parent table without first deleting all the related rows from all child tables. Without the referential integrity in the database you could apply these updates in whatever order you liked however the end result would only be valid if all of the updates were run successfully. With integrity in the database there is now no longer a possibility of the data ending up in an invalid state if some but not all of the updates were to run. To achieve the same level of integrity without the constraints on the database you would need to either perform the updates in the same order as if the constraints were there or you'd need to use transaction processing to ensure that if any of the updates fail that they all get backed out. Neither of these alternatives actually makes your code any simpler than it would be with the constraints in the database.

Applying the constraints in the database actually allows you to simplify any deletes in a way that is not possible if the constraints are not there. Where you wish to delete a row from a parent table along with all of the related rows in whatever child tables exist, it is not necessary to specify all of the deletes separately where the relationships are defined in the database as you can simply specify that you wish to delete them all in one query by specifying the row in the parent table that is to be deleted and requesting that the delete request be cascaded to all of the related child rows.

Any performance gains achieved by leaving out the foreign key constraints are also illusory. The code that you need to add to maintain integrity without those constraints will negate most if not all of the performance savings achieved by removing the constraints. Also since you can no longer rely on the constraints being 100% enforced it will be necessary at some point to run some form of integrity check on the data in order to find any data where the referential integrity has not been maintained correctly by the code in your programs. Once this is deducted from any performance gains achieved by removing the referential integrity checks from the database you will find that overall performance would have been better had the referential integrity checks in the database been used.

Provided that the database you are using supports your defining foreign key constraints, there is no benefit to be obtained by not using them to apply those constraints between tables that should always exist. Only where your database does not support foreign keys or where the relationship is an optional one should you rely on your program to test the constraint.

 

This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow
Donate