"Behind the Scenes"
August 2014 | The monthly newsletter by Felgall Pty Ltd
What is the Purpose?
Many people writing computer programs today follow something of a cookbook approach. They look up the rules for a particular step in the process and make sure that what they create follows those rules, without ever thinking about why the rules are there. As a result, their code may follow the rules and yet still suffer from the very issues those rules exist to prevent. The rules are there for a purpose, but simply following the rules does not guarantee that their purpose is met.
The other side of this is rules created for one particular purpose where the resulting code is then used for an entirely different purpose. That code turns out to have issues when used for this alternate purpose it was never intended for, and the code gets a bad reputation even though it still serves its original purpose quite properly.
Let's look at a couple of examples to make it clearer what I mean by the difference between the purpose and the rules that people are following.
There are a number of rules to follow when designing a relational database. These rules are called normalisation, and as the rules are applied to the data one by one, the data achieves 1NF, 2NF, 3NF, BCNF, and where necessary 4NF, 5NF and even 6NF. Each of these Normal Forms indicates that the data for your database has gone through a process to ensure that it complies with all of the normalisation rules up to that stage. Originally there were only three Normal Forms, as these were all that was originally felt necessary to meet the purpose of normalisation. Boyce-Codd Normal Form was introduced to cover something that had been overlooked in the original rules, and 4th and 5th Normal Forms were introduced to cover special cases.
So just what purpose is normalisation intended to serve? The intention is that when you query the database for a particular item, the value you get back from the query is guaranteed to always be the same regardless of how that query gets processed in retrieving the item. A database in BCNF is one where every item in a given table is dependent on 'the key, the whole key, and nothing but the key' that identifies that particular row in the table. 4NF deals with ambiguities that can occur if you use three-way joins, and 5NF deals with cases where there are multiple paths through the data that can potentially end up in different places in the same table depending on which path you take. These descriptions are only loosely accurate, but they are sufficient when considering the purpose of normalisation.
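To make 'the key, the whole key, and nothing but the key' concrete, here is a small sketch using plain Python data structures rather than a real database. The employee and department names and ids are entirely hypothetical, invented just for this illustration.

```python
# Hypothetical illustration of "the key, the whole key, and nothing but
# the key". In the unnormalised rows, department_name depends on
# department_id rather than on the employee key, so it repeats.

unnormalised = [
    {"employee_id": 1, "name": "Alice", "department_id": 10, "department_name": "Sales"},
    {"employee_id": 2, "name": "Bob",   "department_id": 10, "department_name": "Sales"},
]

# Normalised: each non-key item depends only on its own table's key,
# so the department name is stored exactly once.
employees = [
    {"employee_id": 1, "name": "Alice", "department_id": 10},
    {"employee_id": 2, "name": "Bob",   "department_id": 10},
]
departments = {10: "Sales"}

# Renaming the department is now a single update, and every query that
# follows the key is guaranteed to see the same value.
departments[10] = "Sales & Marketing"
for e in employees:
    assert departments[e["department_id"]] == "Sales & Marketing"
```

Because the name lives in one place, there is no way for two queries that follow the keys to disagree about it.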
Two things that ought to hold when a database is properly normalised so as to meet the purpose of normalisation are that each item (other than those used as keys) should occur only once in the design, and that any item that can be derived from other values should not appear at all. Now this doesn't mean that the same field name and content can't appear multiple times, provided that each occurrence has a different meaning (for example, you might have a home address, a shipping address, and various other addresses that are all different items of data as far as the design is concerned, regardless of whether they are stored in the same table or in separate tables).
Normalisation does not guarantee that each item will occur only once in the final design. It simply provides a set of rules that help you reorganise the way the database is structured so as to hopefully arrive at that point. It is possible for your database to be fully normalised and still have the same data appearing in multiple places, so that querying that data can return the value from either place, with no guarantee within the database design itself that the values stored in both places will be the same. Since the database is normalised, and that process is supposed to ensure that multiple values for the same item are not possible, the person creating programs based on this design will probably not even consider the possibility of data integrity errors.
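The kind of silent duplication described above can be sketched in a few lines. The customer and invoice records here are hypothetical; the point is that each table can pass the rules in isolation while the design as a whole still stores the same item twice.

```python
# Hypothetical: two tables that each look fine on their own can still
# duplicate the same item, so the design alone no longer guarantees
# that a query returns one consistent value.

customers = {42: {"name": "Alice", "email": "alice@example.com"}}
invoices  = {7:  {"customer_id": 42, "email": "alice@example.com"}}  # duplicated item

# An update applied to only one copy silently breaks integrity.
customers[42]["email"] = "alice@new-domain.example"

# Two equally "correct" queries for the same item now disagree -
# exactly the situation normalisation was meant to rule out.
via_customer = customers[42]["email"]
via_invoice  = invoices[7]["email"]
assert via_customer != via_invoice
```

Which value a program sees now depends on which path its query takes, and nothing in the structure itself flags the inconsistency.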
One side effect of designing your database to satisfy the purpose for which the normalisation rules exist is that writing and updating the data becomes more efficient, as each item is stored in only one place. This does come at a cost, though: simplifying all of the writes like this means that reading the data back (particularly when the values required include derivative fields) is more complicated and slower. Where the number of reads is relatively high and the data is relatively static (few writes), undoing some of the normalisations to speed up the read processing may be a completely appropriate thing to do. The purpose of avoiding slow processing overrides the purpose of maintaining data integrity entirely within the database. Of course, doing this means that you then have to maintain that integrity in the way the writes are combined together outside of the database itself.
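This read/write trade-off around derived values can be sketched as follows. The order and line-item shapes are invented for the illustration: the normalised approach recomputes the derived total on every read, while the denormalised approach stores it and must keep the copy in step on every write.

```python
# Hypothetical sketch of the trade-off with a derived field (an order
# total that can always be computed from the line items).

order_lines = [
    {"order_id": 1, "qty": 2, "unit_price": 9.50},
    {"order_id": 1, "qty": 1, "unit_price": 4.00},
]

# Normalised: derive the total on demand. Slower reads, but no stored
# copy that could ever disagree with the line items.
def order_total(lines, order_id):
    return sum(l["qty"] * l["unit_price"] for l in lines if l["order_id"] == order_id)

# Denormalised: cache the total for fast reads. The integrity work has
# now moved out of the database design and into the writing code, which
# must update both places together.
orders = {1: {"total": order_total(order_lines, 1)}}

def add_line(lines, orders, order_id, qty, unit_price):
    lines.append({"order_id": order_id, "qty": qty, "unit_price": unit_price})
    orders[order_id]["total"] += qty * unit_price  # keep the copies in step

add_line(order_lines, orders, 1, 3, 2.00)
assert orders[1]["total"] == order_total(order_lines, 1)
```

If every write goes through add_line the cached total stays correct; any write that bypasses it reintroduces exactly the integrity problem the normalised design avoided.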
For our second example, let's consider hashing. The original purpose of hashing was also data integrity - this time relating to any file, not databases specifically. The purpose of a hash is to make it easy to detect whether any changes have been made to the data. Given the content of a particular document, the hash of that document will always be the same value (assuming the same hashing algorithm is used), so you can confirm at any time that the document is unchanged by regenerating the hash and checking that it matches the one you got last time. Hashes make it easy to detect even the smallest change in the document because they are designed so that even the smallest change results in a completely different hash. The hash doesn't tell you what has changed; it just tells you that something has when the hash you get this time differs from the one you got last time.
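This integrity check is a few lines in practice. The sketch below uses Python's standard hashlib; SHA-256 is used here simply because it is built in, and as the text notes, any hashing algorithm serves this first purpose.

```python
# Sketch of hashing for change detection using the standard library.
import hashlib

document = b"The quick brown fox jumps over the lazy dog"
original_hash = hashlib.sha256(document).hexdigest()

# Later: regenerate and compare. Same content always gives the same hash.
assert hashlib.sha256(document).hexdigest() == original_hash

# Even a one-character change ("dog" -> "cog") produces a completely
# different hash, which is all we learn: something changed, not what.
tampered = b"The quick brown fox jumps over the lazy cog"
assert hashlib.sha256(tampered).hexdigest() != original_hash
```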
When used for this purpose, it doesn't matter whether it is easy or hard to work backwards from a hash to a possible original value, since the original document is available: provided the document isn't tampered with, you already know what value generates that hash.
There is an alternative purpose for which hashes are now used. This second purpose also takes advantage of the fact that small changes in the original result in large changes in the hash but this second use also requires that it not be easy to determine a possible original value from the hash. This new purpose is the storing of passwords in a database in a format that will prevent anyone other than the person who originally set the password from finding out what the password is by examining the value stored in the database.
While all of the hashing algorithms that have been created meet the original purpose of hashes, only some of them satisfy this second purpose. In addition, as computing power increases, working out a password to match a hash (whether that is in fact the actual password the owner selected or simply another string that hashes to the same value doesn't matter) becomes easier. Even where the amount of processing required to find a value that generates a given hash using a given hashing algorithm is more than is available today, it is quite probable that the necessary processing power will be available in the future, and then using a different (probably stronger) hash will become necessary in order to achieve the purpose.
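A sketch of this second purpose, again using only Python's standard library: PBKDF2 (available as hashlib.pbkdf2_hmac) combines a per-user random salt with a deliberately high iteration count, so that finding a value matching the stored hash stays expensive, and the iteration count can be raised later as computing power grows. The function names and the iteration count here are illustrative choices, not a definitive recipe.

```python
# Sketch of password storage: keep a salted, slow hash rather than the
# password itself. Verification re-derives the hash and compares.
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 200_000):
    salt = os.urandom(16)  # unique random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, iterations, digest

def verify_password(password: str, salt: bytes, iterations: int, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(candidate, digest)

salt, iters, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, iters, stored)
assert not verify_password("wrong guess", salt, iters, stored)
```

The stored salt and digest reveal nothing useful on their own; when the iteration count is later deemed too low, new hashes can simply be generated with a higher one.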
So any hashing algorithm will serve one purpose while only some hashing algorithms will serve the second purpose. That a given hashing algorithm doesn't serve the second purpose doesn't mean that it is useless or bad, it just means that it is an inappropriate one for that purpose while continuing to be perfectly usable for its original purpose.
Blindly following the rules will in most cases actually achieve the desired purpose (after all, that's what the rules were created for), but where following the rules doesn't achieve the purpose, more is required. Where the actual purpose of the rules is not considered, and the rules do not completely serve the purpose for which they are intended, there will be potential holes in the final code that might just end up being exploited by someone. At best there will eventually be some sort of integrity problem where the data cannot be relied on in all cases.
I recently discovered that the new hosting I moved to a couple of months ago has additional security built in around the sending of emails, which was preventing a lot of the emails from being sent. I have now updated the script so that it uses authenticated SMTP to send the emails instead of the PHP mail() function, so hopefully the email issues are now resolved. My apologies if you didn't get the last couple of newsletters - they will soon be available via the web site so that you can catch up.
The following links will take you to all of the various pages that have been added to the site or undergone major changes in the last month.