Char or Varchar

When designing your database you eventually reach the point of needing to allocate types to each of the fields. For some fields the type should be obvious as if you have a date to store then it obviously belongs in a date field and if you have a number to store it needs to be one of the numeric types.

When it comes to text that you are going to store in your database table the most likely choices for most of the text fields will be to make them either CHAR or VARCHAR fields. Deciding which of these to use is not necessarily obvious (certainly not as obvious as the choice of a date or numeric field is for those types of data).

Selecting the appropriate one of these for each of your text fields will mean that you can end up with a table that is significantly smaller as it will not be wasting more space than it needs to by having text fields longer than they need to be.

With both CHAR and VARCHAR fields you specify the length that the field has. The difference between the two is that the length that you specify for a CHAR field is a fixed length allocated to that field for every row in your table. For VARCHAR the length is the maximum length that can be allocated to the field and the field can be shorter than that maximum thus saving space in the table.

Why not always use VARCHAR then if it can save space when the content is shorter than the maximum size? The answer to that is that a VARCHAR has an overhead in that it needs to use an extra character (or more) to track the length of each specific field. This means that for any given data a VARCHAR field will take up more space in the database than a CHAR field just big enough to hold the data.

The choice between the two is therefore dependent on the expected content for the field. If all of the data for your field will always be the exact same length then a CHAR is the obvious choice since that saves you the overhead of using VARCHAR That does not mean that using VARCHAR is necessarily the best choice if the content varies in length.

Whether it is better to use a VARCHAR instead of a CHAR when the data varies in length depends on how it varies. If all of the data for the field is either one or two characters in length then a CHAR is the obvious choice since the overhead of using a VARCHAR means that the VARCHHAR will always end up using more space since it ads to the length of all the records unnecessarily instead of just adding to the length of the one character fields.

The same is true of any data where the length of most of the data is almost the same (vvarying by only one or perhaps two chharacters in length). When the data is all approximately the same length use a CHAR.

Where the data varies significantly in length from one record to the next with the longest being significantly longer than the average then you need to use a VARCHAR. There is no point in using a CHAR(40) if the average length is only 15 since that will resuult in a lot of wasted space in most records. Using a VARCHAR(40) instead means that you can still fit the leng 40 chharacter values while the average space used will be 15 plus the record length marker overhead.

Where it becomes difficult to decide is where you don't know enough about the data that may be entered into the database to know whether the majority will be the same length or whether they are spread over a wide range of different lengths. If you don't know how the actual lengths will be distributed then all you can base your decision on is the maximum length. As data that is always relatively short will not waste an enormous amount of space if you define it as CHAR then that is the most reasonable choice for when the maximum length is relatively short (say 10 or less) since unless the data is decidedly skewed (eg. almost all one character but with a small number that are 10 characters) the average will normally tend toward being only a few characters short of your maximum length anyway. For data that can be very long a VARCHAR is the more obvious choice since it will provide significant space savings for all of the data that are more than a couple of characters shorter than the maximum length. Even where it turns out that all of the characters are the maximum length you still haven't made the data take up significantly more space than it outherwise would have since the length marker is relatively small compared to the long length of all the records and so your VARCHAR data may only take up one percent more space than if you had made it a CHAR field in even the worst case.

As a general rule then use CHAR for short fields and VARCHAR for long fields. Where do you draw the line between short and long? Well that is your choice to make but the better that you know the expected length distribution for your actual data the more likely you are to be able to choose whichever of the two is expected to be more efficient. You may even find that once you properly analyse your data that you end up with a few CHAR fields that are as long or longer than some of the VARCHAR fields.

One final thing to note is that some database systems will automatically treat char fields bigger than a certain size as varchar on the assumption that doing so will be more efficient. This makes it less important when you are using one of those databases to make the choice between these two alternative definitions for yourself.

 

This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow
Donate