SQL Efficiency

More often than not the fewer database calls that you can rearrange your code to make, the more efficient that code will be. The database optimiser will make each call in the most efficient way that it can based on what it knows and so by combining calls together you provide more information on what the end result is that you are after and the process for the one more complex call will in most cases be more efficient than the multiple individual calls.

One obvious example of this is where you have a loop in your code that iterates over a list of those entries you want to retrieve with one SELECT statement in the loop that retrieves the record corresponding to the current entry in the list. Far more efficient is to have the loop build the list first and then make a single SELECT call outside the loop to retrieve all of the entries at once. That way the database optimiser does a single pass looking for all of the required entries rather than having to start over again for each search. Even where each entry can be obtained directly via an index lookup the single call will still be more efficient because the database can arrange the order in which it retrieves the records into the most efficient order based on how the index is organised.

A second perhaps not quite so obvious example is where you have a SELECT statement to work out whether or not an entry exists followed by an UPDATE if it does exist and an INSERT if it doesn't. This can be made more efficient by getting rid of the SELECT completely. Just how best to handle this really depends on whether the more common situation is where records already exist and are to be updated or whether it is more common that they don't exist and are to be inserted. Where insert is the more common requirement then you can use a single database INSERT call that includes a DUPLICATE KEY test that contains the UPDATE instructions. Where having the entry already exist is more common you could simply use a REPLACE call which updates if the record exists and inserts if it doesn't. Another alternative would be to simply call whichever of the INSERT and UPDATE commands is the one most likely to be applicable first and then if a duplicate key or not found error occurs then make the second call. While this alternative has two calls instead of the one that the earlier alternative used this will be more efficient where the vast majority will not require the second call - since that means that the single call made most of the time will be simpler and hence will run faster than the single call that also caters for the rare exception.

Combining calls together in order to remove an extra call that is there only to handle rare exceptions is one of the few instances where it is not necessarily going to improve efficiency and where even if it does improve efficiency very slightly the SQL will be more complicated and hence harder to maintain. A slight efficiency saving is not worth it if it is going to make maintaining the code a lot more difficult. The increased time to maintain the code will more than offset the slight increase in execution speed.

Another situation where two calls are better than one is where you are doing a random retrieval of a single record that meets specified criteria. In mySQL you could use ORDER BY RAND LIMIT 1 to do this in a single call. In fact this is the more efficient way to retrieve a single record at random provided that there are fewer than 1000 records that meet the criteria that need to be 'sorted' into random order. As the number of records increases the time to 'sort' them will increase. If instead you do two calls - the first to retrieve the number of records that meet the criteria and the second which uses a random number generated by the surrounding code to retrieve a single record then the time taken for the two calls will be fairly constant regardless of whether there are ten or ten billion records that meet the criteria.

When deciding how to split your processing between SQL and the surrounding code you should always look to get the SQL to perform as much of the necessary processing as you can as that will generally make the code more efficient. Only where reducing the code to use a single database call means getting the database to do unnecessary things (such as randomly 'sorting' all the records when you only want to retrieve one) or where the SQL will end up overly complex and hard to maintain should you consider using multiple SQL statements.

The other thing that you will need to consider where you do have multiple SQL calls that are all part of a single task is what you want to happen if one or more work but others fail. Usually in this situation you will want to process them as a single transaction so that you can ROLLBACK if any of the calls fail and only COMMIT if they all work.


This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow