Select and Union

Most of the time when writing database queries you will be able to specify the criteria that your query is meant to retrieve quite simply. The WHERE clause will simply identify the criteria that needs to be matched. Where more than one table is involved you will simply specify each table and include how the entries in the tables are to be joined either as part of your WHERE or separately in an ON clause.

Things become slightly more complicated where you have alternate tables that you want to match against. In this situation you really need to first ask yourself why you have the information in multiple tables. In most instances there is no real reason why equivalent data should be split into more than one table as doing so will sometimes make accessing the data less efficient and will at best be as efficient as the single table version - unless most of the time you are only referencing the data in one of the tables and rarely reference the other table.

One situation where I split a table into two tables with identical data where I considered such a split to be justified is with a club membership with a Member's Area. Here I split the main data for the members into two tables - one for those current and recently resigned members who were to have access to the member's area and the second table for past members where the main requirement was simply to be able to look up to see if a particular person had been a member previously (so as to determine whether discounts for rejoining were applicable). Well over 99% of the data access would be on the current member table which would only grow slowly in size as the actual membership of the club grew while the second table for past members would grow much more rapidly and soon be many times the size of the other table but with the data seldom needing to be referenced. Anyway that was my logic in deciding to make this two separate but identically structured tables (the member's area access info being retained in the past member table so that in the rare instance where we reinstated a membership that the data could simply be copied back).

In splitting the membership data into two tables I had overlooked one area where the data from both tables would be needed. That area is the audit trail part of the receipt issuing where all the receipt numbers need to be able to be reported along with what they were for. This involved joining the receipt and member tables on the member number so as to extract the membership information to report on the receipt. By moving those who are no longer members to a second table I broke this audit trail facility because receipts issued to those no longer members would not find a match in the member table and so would not be reported.

Now that we have considered a situation where a UNION statement in the SQL is required, let's now move on to how to actually define such a UNION.

First let's start with a simplified version of our original query which references two tables - let's call the tables receipt and member to match with the situation I described above.

SELECT fld1, fld2, ... FROM receipt, member WHERE .... ORDER BY fld1

That is enough of the query for us to be able to see exactly how we need to proceed in order to add to it those receipts where the member information is in the oldmember table instead of in member. To retrieve just those receipts we'd use:

SELECT fld1, fld2, ... FROM receipt, oldmember WHERE .... ORDER BY fld1

So what we need to do is to get the results from both of these queries to use in our processing. We can combine the two together into a single query using a UNION like this:

(SELECT fld1, fld2, ... FROM receipt, member WHERE .... ORDER BY fld1)
(SELECT fld1, fld2, ... FROM receipt, oldmember WHERE .... ORDER BY fld1)

The problem with this is that while the entries from each of the two sub queries are sorted into the desired order, those from the second SELECT all follow those from the first and so we have not gained anything by combining the queries rather than running them one after the other. We need to move the ORDER BY outside of the UNION in order to sort the result from the union. To do this we need to have the UNION build a temporary table for the ORDER BY to sort for us. The resultant code is:

(SELECT fld1, fld2, ... FROM receipt, member WHERE .... )
(SELECT fld1, fld2, ... FROM receipt, oldmember WHERE .... )
) AS temp_table ORDER BY fld1

Note that the code from the original query is basically intact having just been duplicated and wrapped in additional code in order to effectively join the two member tables as if they were one table. That's why I didn't need to include all of the exact details in the original query for this example - because that isn't affected by using UNION.

There are two other things to note with the final code that we need to do this. First is that the UNION is creating a temporary table and even though we will not be keeping that table after we complete the query we still need to give the table a name in order to comply with the appropriate syntax for the SELECT statement.

The other thing to note is that this is one of the few places where it makes sense to use * rather than listing all of the fields that we wish to retrieve. You should always list all of the fields that your particular query needs to retrieve in a SELECT statement so as to not retrieve any fields that you don't need. That helps to ensure that your code will not be impacted if extra fields are added that your code doesn't need to reference and also makes the code easier to read as you can see exactly what is being retrieved. In this instance though we can use * and still know exactly what fields we are retrieving because we have already listed those fields in the sub-queries.


This article written by Stephen Chapman, Felgall Pty Ltd.

go to top

FaceBook Follow
Twitter Follow