Sorting Databases

When you are storing data in a relational database one thing that you do not need to concern yourself with is the order that the records are stored within the database itself. ly time that you need to concern yourself with what order the data is in is when you pull the data out of the database.This is done using the ORDER BY clause on your select statement to specifry what field or fields to sort the results into and whether the sort should be in ascending or descending sequence.

Using the ORDER BY clause of course means that the selected data needs to be sorted prior to delivering it to your program and so just how many records you are selecting and what you are sorting them on will make a big difference to how efficient that the database can sort your resultset.

Using a WHERE clause to restrict the number of records to be extracted and hence needing to be sorted is one way of making your sort more efficient provided that you can actually specify criteria to reduce the number of records needed that way. The alternatives of using the HAVING clause or some proprietary method of reducing the number of records to be extracted (such as the mySQL LIMIT clause) do not reduce the number of recirds to be sorted as they apply to the sorted data in determining what to include in the resultset. Similarly the GROUP BY clause requires that the records be sorted first before they can be grouped.

What probably has the biggest impact on the efficiency of your sort is what fields you are sorting on. The most efficient sorts are the ones where all of the fields to be sorted are contained within an index as then only the index needs to be sorted and any other fields requested can be easily retrieved from the main data table after the index has been sorted.Adding an index in this situation where one doesn't already exist may be one of the simplest ways that you can make a dramatic difference to the speed of the required processing.

The types of sort to avoid are those where it isn't possible to add an index. For example where you want to order the content by a part of a field. Needing to specify a part of a field in an ORDER BY clause is a good indication tha there is a problem with your database design. While modifying your database and the attached application to split up the data in that field into multiple fields may involve some work, it is probably far better to do that than to actually use the part field in the sort and so require that the sort run against the main data table.which in turn will require a significant amount of storage to be used in order to make a temporary copy of the entire table (or at least those rows identified in the WHERE clause) in order to perform the sort.

Performing a random sort on a table involves as much overhead as sorting on a part field. Again the entire table needs copying in order that the data can be "sorted". A lot of the time a random sort is performed simply in order to extract one record at random. Where only one record is actually required a far more efficient solution is to run two queries on the database, the first to determine how many entries there are that meet the criteria to qualify for for the random selection and then after selecting a random number between 1 and that number of entries you perform a second query to retrieve the record in the database's natural order that corresponds to that location in your resultset. For example mySQL can specify LIMIT 1 and then use OFFSET to specify the random number entry to retrieve.

By remioving the need to sort when you only want one record and by reducing the amount of data to be sorted when you do actually need to extract more than one record you can greatly speed up your database read processing.

 

This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow
Donate