Don't Use Wildcards

One very common construct that newcomers to SQL use is SELECT * to retrieve all of the fields in each row. While this is easier to type than specifying a comma separated list of the actual fields that you want, there are a number of issues with using the * wildcard in a select statement which means that your select is effectively a bug just waiting to happen - as soon as the table the select is referring to is changed the code relying on the select may break - something which will not occur if you specify the field names instead.

The * retrieves all of the fields that the query references in the order that they exist within the database. The order is further complicated when the select merges data from more than one table and is still more complicated where fields with the same name exist in each table. Once you add to that the possibility that one of the tables may be updated to add or delete columns you effectively have no way to predict what order the fields will be returned in. If instead you provide a comma separated list then the fields will be returned in the order you specify.

The next benefit to specifying the list explicitly rather than using * is that you only need to return the fields that you specifically need. Some field types are less efficient to access than others and so the table may be physically split in two with the less efficient to access fields being stored separately from the rest of the table. This is done internally by the database system itself as one of the ways it makes accessing the data more efficient. Referencing those less efficient field types that are stored separately will slow down the database access and since specifying * whether you need all the fields or not will retrieve all of the fields the select can never take advantage of the performance improvements the database provides through splitting the data internally. If you specify the list of exactly what you need and you don't need any of the fields that are less efficient to access then the query will run much faster.

Specifying which fields you need also protects your code against changes to the definition of the table. An extra column added anywhere in the table will not have any impact on your code whatever where the new field is not relevant to what the code is doing. Had you used * then the order of all of the fields you are retrieving may have been changed and the subsequent code may no longer work correctly even though you don't need to use the new field. Similarly if a column is deleted from the table then if you use an explicit list of fields you can easily tell straight away if that change impacts on your code as if the select refers to the deleted column then your code needs changing whereas if it doesn't reference the deleted column it will not be affected by the column having been deleted. If you used * then it is much harder to tell if the code is affected or not.

Yet another benefit to using a list of the field names is where you have multiple tables and the same column name appears in each. You can easily qualify the names as to which table they refer to and if you want to be able to reference both separately can also apply aliases to each so that they each have a different name in the following code. If you use * then only one of the fields will be returned and you will have little control over which of the two that it will be.

The thing is that when you are writing queries to run in a program, you only need to write the query once and then it will be run each time the program runs. There is very little extra work required to list the fields that are needed once in a query for a program that will run many many times. The benefits of explicitly listing the fields far outweighs the miniscule time saving gained by specifying * instead.

Ad Hoc queries is a different matter. Where you are going to be writing a large number of queries and running each of them just once in order to extract information from the database on an ad hoc basis it is perfectly appropriate to use * in order to save the time that is needed to list all the fields. As the query is only going to run once there are not going to be any changes to the table structure to potentially break your query and the extra overhead in running the query will not be significant enough to compensate for the time extra time required to write all the field names in the query (eg, you save a minute of typing time by using * and it takes 1/10th of a second longer for the query to run).

 

This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow
Donate