What is Validation?

I have seen questions posted in forums that indicate that some people writing code have no idea what validation actually is. I suppose this is mainly due to people wanting to learn how to program in a specific language without first having learnt the basics of program design.

While there are specific validations that can be applied to specific types of field (e.g. email addresses), most validations depend to a large extent on the exact purpose of the field in business terms. Validating a field means testing the content that has been entered into the field against the business rules that you have decided apply to that field. If you are writing the program for someone else then they should be the ones defining most of the business rules as part of their requirements. Of course they are unlikely to give you a comprehensive set of business rules at the start, and if you are designing the program yourself you may not come up with all the rules at the start either.

So how can we ensure that we have an accurate set of business rules that we can apply to validate the input? For those fields I mentioned where specific validations can be applied, there is a very clear definition of what is and isn't allowed in any field of that type, and so that definition can be applied as the validation. In the case of an email address the validation could be done using a regular expression that is several thousand characters long (as found in at least one Perl module), but in most languages where such a field is common you should find that a validation module is provided, which makes including such a complex expression in your own code unnecessary (e.g. PHP has a validation filter specifically for email addresses, FILTER_VALIDATE_EMAIL, which presumably has a comparable regular expression embedded into PHP itself). JavaScript does not have anything built in for validating email addresses, but since the validation will need to be done again on the server, simply checking for the most obvious mistakes is sufficient in JavaScript. Of course it's easy to get too specific in defining an email validation in JavaScript and so reject valid email addresses.
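As a minimal sketch of "checking for the most obvious mistakes", the function below is one possible loose client-side check. The name looksLikeEmail and the particular mistakes it looks for (whitespace, a missing @, nothing before the @, a domain without a usable dot) are assumptions made for illustration, not a complete definition of a valid address. It deliberately accepts many strings that a full validation would reject, on the basis that the server validates again.

```javascript
// Hypothetical helper: a deliberately loose check for obvious typing mistakes.
// It is NOT a full email validation; the server must validate again.
function looksLikeEmail(input) {
  const value = String(input).trim();
  // Empty input or embedded whitespace is treated as an obvious mistake.
  if (value === "" || /\s/.test(value)) return false;
  // Use the last @ so that unusual local parts are not rejected needlessly.
  const at = value.lastIndexOf("@");
  if (at < 1) return false; // no @ at all, or nothing in front of it
  const domain = value.slice(at + 1);
  const dot = domain.indexOf(".");
  // Assumption: the domain needs a dot that is neither its first nor its last character.
  return dot > 0 && !domain.endsWith(".");
}
```

Requiring a dot in the domain is itself a business rule: it rejects addresses such as someone@localhost, which may or may not matter for a given form.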

So our business rules for any field need to identify the difference between valid input and invalid input. To build such a set of rules where there is no clear-cut definition, start by listing those rules that you think should be applied. You then make two lists of sample input for that field - one list of values that should be valid and the other of values that should be invalid. Try to make the entries on each list as different from one another as possible, so as to capture as many of the alternatives as possible - there's no point in listing lots of similar values that would all pass or fail the same rule that you have already identified.

The next step is to check each of the values you have listed against the rules you have so far. Would applying the rules to any of the entries on your valid list result in that valid value being rejected? If so, you need to modify the rule so that it doesn't misidentify valid input as invalid. More importantly, is there anything on the invalid list that would pass the validation with the existing business rules? If so, you need to add more rules to ensure that the invalid data is detected.
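The checking step described above can be sketched in code once the rules exist as functions. Everything here is hypothetical: the "quantity" field, its two rules, and the names passesAll and checkLists are invented for illustration. The sketch reports any valid sample the rules would reject and any invalid sample they would let through.

```javascript
// Hypothetical business rules for a "quantity" field, each a named predicate.
const quantityRules = [
  { name: "digits only", test: (s) => /^[0-9]+$/.test(s) },
  { name: "between 1 and 99", test: (s) => Number(s) >= 1 && Number(s) <= 99 },
];

// A value is accepted only if it passes every rule.
function passesAll(rules, value) {
  return rules.every((rule) => rule.test(value));
}

// Report the sample values that the current rules classify wrongly.
function checkLists(rules, validSamples, invalidSamples) {
  return {
    wronglyRejected: validSamples.filter((v) => !passesAll(rules, v)),
    wronglyAccepted: invalidSamples.filter((v) => passesAll(rules, v)),
  };
}

// The two lists of sample input, kept as different from one another as possible.
const validSamples = ["1", "42", "99", "07"];
const invalidSamples = ["", "0", "100", "-5", "4.5", "12abc", " 7", "1e1"];
```

With both rules, checkLists(quantityRules, validSamples, invalidSamples) reports two empty arrays. With only the first rule, "0" and "100" show up as wrongly accepted, which is the signal that another rule is needed.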

As you modify the business rules for the data on your lists, more alternatives will occur to you that should be added to the lists and checked against the business rules. Once you have a set of business rules that allows all of the entries on your valid list to pass and rejects all of the entries on your invalid list, you need to put the whole thing down and leave it for a few days. As you go about other tasks you may come up with additional alternatives. Remember that no value is too ridiculous to include on your list. Just because legitimate users would never attempt to enter anything remotely like what you have thought of doesn't mean that someone trying to break your code wouldn't try it.

Once a few days have passed and you have added to the lists all those extra entries that didn't occur to you straight away, repeat the process of checking each entry against all of the business rules, and modify the rules if you find a value that would produce the wrong result. Don't forget to go back and recheck all the prior values if the rules have changed since you first checked them.

You need to keep these lists. They represent your test data for the field. Not only will they be used to test your validation once you have written the code, to make sure that the code matches the business rules, they will also form the data for your regression testing should you ever need to add further values to either list that require a change to the validation.

It is only at this point in the process (after you have repeated the above steps for every single input field) that you should start writing your code. Now you know exactly what validation is and exactly what validation you need to apply to each field, because you have a list of business rules for each field that defines what is and isn't allowed in it. Apply your validation code to each field at the very start of your processing, as soon as those fields become available. That way you can be certain throughout all of the subsequent processing that the content of each field is meaningful input for that field, because it passes the applicable business rules, and that it is unlikely to be junk (or worse). Of course a field can contain valid data and still be meaningless (such as John Smith entering his name as Joe Blow). Testing for such meaningless data is generally far more complicated and is not necessary for most applications. Should you decide that it is necessary for a particular field then that check should also be included in the business rules.
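A rough sketch of validating every field before any other processing might look like the following. The form fields (name, quantity), their rules, and the function names findInvalidFields and processOrder are all assumptions made up for this example.

```javascript
// Hypothetical per-field rules; each rule returns true when the value is acceptable.
const fieldRules = {
  name: [(v) => v.trim().length > 0, (v) => v.length <= 50],
  quantity: [(v) => /^[0-9]+$/.test(v), (v) => Number(v) >= 1 && Number(v) <= 99],
};

// Returns the names of the fields that fail any of their rules.
// A missing or non-string field is treated as an empty string.
function findInvalidFields(input, rules) {
  return Object.keys(rules).filter((field) => {
    const value = typeof input[field] === "string" ? input[field] : "";
    return !rules[field].every((rule) => rule(value));
  });
}

function processOrder(input) {
  // Validation runs first, before anything else touches the input.
  const invalid = findInvalidFields(input, fieldRules);
  if (invalid.length > 0) {
    return { ok: false, invalid };
  }
  // From here on every field is known to satisfy its business rules.
  return { ok: true, quantity: Number(input.quantity) };
}
```

Because the check sits at the top of processOrder, none of the later code needs to defend itself against empty names or non-numeric quantities.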

Of course some of these more advanced validations cannot be performed immediately, and so you may need to provisionally accept a field as valid while the business rule is being tested. In these cases an extra field may need to be stored with the data to identify whether it is provisionally accepted or has actually been confirmed to be valid. An example of this would be where an email address is entered that passes all the email address rules. To confirm that it exists and belongs to the person who entered it (your extra two business rules) you send an email to that address containing a link to code that updates the flag from provisional to confirmed.
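The provisional flag can be sketched as follows. This is only an outline under several assumptions: an in-memory Map stands in for whatever storage is really used, the caller supplies an unpredictable token that would be embedded in the emailed link, and the actual sending of the email is left out. The names records, registerEmail and confirmEmail are hypothetical.

```javascript
// Hypothetical in-memory store standing in for a database table,
// keyed by the token that would be placed in the emailed link.
const records = new Map();

// Store an address that has passed the format rules but is not yet confirmed.
function registerEmail(email, token) {
  records.set(token, { email, status: "provisional" });
}

// Called by the code behind the emailed link; flips the flag to confirmed.
// Returns false when the token is unknown.
function confirmEmail(token) {
  const record = records.get(token);
  if (!record) return false;
  record.status = "confirmed";
  return true;
}
```

Any later processing that depends on the address being real can then check the status field rather than assuming that a well-formed address is a working one.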

The work on defining the business rules is best done at the same time as you are working out the rest of what the code is supposed to be doing. Often how you are going to use an input defines some of the business rules for you (one formula I saw someone asking about would fail if a number less than 52 was entered into the input field, and so for that particular code there would need to be a business rule that the number must be 52 or greater). Even if you leave actually adding the validation code until after you test the rest of the processing, it will still be useful to have worked out the business rules first. Your processing needs to work correctly with all of the values you wrote on your valid data list. You don't need to test the actual processing code with any of the values on your invalid data list, because once you add the validation step into your code none of those values should get that far. You will, however, need to retest all the values on both lists once the code is thought to be complete, and every time you change the code after that, in order to ensure that the business rules have been correctly implemented in your validation.
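The "52 or greater" rule could be sketched like this. The original formula is not given in the text, so the formula function below is purely a hypothetical stand-in (a square root that breaks for smaller numbers); the names isValidInput and calculate, and the decision to accept only plain decimal digits, are likewise assumptions.

```javascript
const MINIMUM = 52; // the limit mentioned in the text

// Business rule: the input must be a plain decimal number that is 52 or greater.
function isValidInput(raw) {
  const text = String(raw).trim();
  // Assumption: only digits with an optional decimal part are acceptable.
  if (!/^[0-9]+(\.[0-9]+)?$/.test(text)) return false;
  return Number(text) >= MINIMUM;
}

// Hypothetical stand-in formula, NOT the one from the forum question.
// It yields NaN for anything below the minimum, which is why the rule exists.
function formula(n) {
  return Math.sqrt(n - MINIMUM);
}

// Validation comes first, so formula() never sees a value it cannot handle.
function calculate(raw) {
  if (!isValidInput(raw)) return null;
  return formula(Number(raw));
}
```

Here the rule falls directly out of how the input is used: the processing only has to cope with values from the valid list, because everything else is stopped by isValidInput.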


This article written by Stephen Chapman, Felgall Pty Ltd.
