Databases and Null

If you are used to programming in just about any language at all then you may have come across the concept of null as a value that variables or properties in that language can have. While null is a rather special value in those languages it is still just another value and it doesn't make any significant changes to how things work if you do or don't use it.

The same is NOT true of relational databases. Where databases allow the concept of NULL as a value that a field in the database can contain that value represents an unknown value, not the specific value of NULL. The entire logic process for comparisons is changed because using NULL in your database introduces the concept of unknown as a third "boolean" value. Any field containing NULL is considered to be unknown. Since the logic of all the comparisons in your database is changed when you use NULL, you should only ever use NULL when you really do want the logic to be changed that way. (Strictly praking if you do allow NULL then your relational database is no longer relational).

Now any comparisons done on a field that can be NULL can return true, false, or unknown. With AND comparisons true and unknown equals unknown, unknown and unknown equals unknown, and unknown and false equals false. With OR comparisons true and unknown equals true, unknown and unknown equals unknown, and unknown and false equals unknown. NOT unknown is still unknown (despite the apparent contradiction in that statement).

Including fields in your database that contain NULL can lead to illogical results being returned that make no sense. Consider the following comparison.

(A <> B) OR (B <> C)

Using normal two value logic this should only return false when A = B = C regardless of what those values represent. Where A and C are known values that are not equal then the value returned will be true regardless of the value of B.However if B is a database field that is allowed to be NULL and where the value of B is unknown (and so the field is actually set to NULL) the result of the comparison will not be the true that it ought to be but rather will be unknown.

So using NULL in your database can result in a loss of information. If A and C in our example are not equal then since B can only have a single value it can't possibly be equal to both and yet by NULL logic the database determines that each half of the comparison is unknown and so provides unknown as the final result even though the two unknowns actually cancel out in the real world to produce a known result regardless of which particular value the NULL represents.

There is another reason why using NULL in your database doesn't make any sense though. That you do not know the value of a field is not a value in the same sense as any other value that the field can hold. Every other value in the field represents the value of the field itself while NULL simply means that you don't know the value and isn't actually the real value of the field.

At this point you should be asking yourself why you don't know the value of a field? When collecting the information to be entered into the database you should be asking for the values you need for all the fields. If you ask for all the information then none of the fields will be unknown when you enter that information into the database. Sure there may be instances where the person refuses to tell you what the value is for a given field but that isn't the same as the value being unknown since you actually do know something about the value. While you don't know what the real value of the field is, the value isn't the complete unknown that it was before you asked since you know that the supply of that information about that value has been refused. Refused and unknown are not the same value anda field where the value is refused should not be treated the same as one where the value is unknown.

Note that there may also be fields where the value is empty. That too is a specific value that is neither unknown nor refused.

In the real world the only time that you will ever have information in your database that is unknown will be where you expand the database to include extra fields and where you have not yet collected the information to fill in the new fields from the people that the existing information in those entries belongs to. Since introducing new fields will not affect the existing logic that doesn't reference the new fields it shouldn't affect anything in your existing code if you add new fields where NULL is an allowed initial value that is used as a placeholder while you collect the information from everyone to fill the fields in properly. If you give everyone a specific amount of time to provide the additional information then at the end of that time you will either have the information for the fields or will know that the person has refused to supply it. As before, refused is not the same as unknown and so at that point you can modify the database again so that NULL is no longer an acceptable value for the fields to hold. You can then implement your new logic involving the new fields using the regular two value logic that always returns the expected results without getting involved in the confusing and not always correct three value logic that using NULL involves.

So while NULL may be useful temporarily while you are in the process of modifying the structure of the database, there is no reason for using it with your regular live data since there should never be an instance where the value of any field in unknown. Sure there may be fields where the value is supposed to be empty or where the information has been refused but those should be represented by actual values in the field rather than by NULL. So if you have a field for middle name and the person doesn't have one you might store a space in the field to represent no middle name and may place '{refused}' in the field if you have given them the opportunity to tell you that they have a middle name but are refusing to tell you what it is. If you ask for their income and they don't have an income then you would store zero in the field. If they refuse to tell you their income then any negative value can be used to identify that situation since no one has a negative income.

go to top

FaceBook Follow
Twitter Follow