"Behind the Scenes"
June 2014 | The monthly newsletter by Felgall Pty Ltd
From recent discussions I have been involved in on a couple of different forums, I have discovered that quite a number of people writing scripts for the web have no idea of some of the most basic security concepts. The one that came up in particular was the concept of tainted and untainted fields. The way these people were arguing made it obvious that they had never heard of this concept and so were clearly not applying it to their code.
The concept of tainted and untainted fields in programming is a very basic one. An untainted field is one whose content is guaranteed to be valid: either the value was set within the current program itself or, if it came from elsewhere, it was validated or sanitized before being moved into the untainted field. A tainted field is one where the validity of the content is potentially unknown because there is at least one way in which the user of the script can insert their own value into the field.
Note that a field is either tainted or untainted, and fields do not change their status when their content is confirmed to be valid. A tainted field remains a tainted field even after its content has been validated or sanitized. Only by moving the content into an untainted field as part of the validation or sanitizing process do we get the value into an untainted field.
I am sure that at least one of the terms I have used so far is one whose meaning in programming you don't fully understand, so let's define a few terms before I get to discussing why having untainted fields is important.
Validation is the process of testing the content of a field against a set of rules that define what the field is allowed to contain. If the entire content of the field complies with the rules then the value passes validation. If anything in the value doesn't match the rules then the field fails validation and an error message is produced to tell the person who entered the value that it is invalid. Validation is used on fields where the person using the code is expected to enter the value. For example, if the field is supposed to contain an email address then the rules to compare the content against will be the rules defining what an email address is allowed to contain. In this particular case the rules are quite complex, but email addresses are so common that many languages have a built-in validation routine for them that can be used.
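A minimal Python sketch of validation, assuming a deliberately simplified email rule (exactly one @, no whitespace, and a dot in the domain part) rather than the much fuller rules a built-in routine would apply. The names `EMAIL_PATTERN`, `ValidationError` and `validate_email` are hypothetical. The whole value must comply with the rule; anything that doesn't is rejected with an error message rather than altered.

```python
import re

# Simplified email rule, an assumption for illustration only (far looser
# than the real rules): one "@", no whitespace, a dot in the domain part.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


class ValidationError(ValueError):
    """Raised when a user-entered value does not comply with the rules."""


def validate_email(raw_value: str) -> str:
    """Return the value unchanged if it passes validation, otherwise raise."""
    # fullmatch() requires the *entire* value to comply, not just part of it.
    if EMAIL_PATTERN.fullmatch(raw_value) is None:
        raise ValidationError(f"{raw_value!r} is not a valid email address")
    return raw_value
```

For example, `validate_email("someone@example.com")` returns the address unchanged, while `validate_email("someone@@example.com")` raises `ValidationError`, whose message can be shown to the person who entered the value.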
Where the value is not entered directly by the user but is potentially accessible to someone trying to crack the system, we use sanitizing instead of validation. Sanitizing also compares the value against the rules for the particular type of field, but instead of rejecting the value and producing an error message when it doesn't match the rules, it alters the content so that it does match. What this generally means is that any characters in the input that do not match the rules are stripped out (so when sanitizing an email address, where only one @ is allowed, a second @ would be removed). Sanitizing is done on fields where the value was originally defined in the code and is expected to be passed through unchanged, and so be valid. The purpose of sanitizing the field is to cover the case where someone deliberately changes the value. Sanitizing removes anything from the content that would be invalid and so greatly reduces the possibility that tampering with the data will actually achieve anything.
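By contrast, a sanitizing routine alters the value instead of rejecting it. This sketch uses a hypothetical `sanitize_email` helper and an assumed set of permitted characters (letters, digits and `.-_+@`); every other character is stripped, and only the first @ is kept.

```python
import string

# Characters assumed to be permitted in an email address for this sketch.
ALLOWED_EMAIL_CHARS = set(string.ascii_letters + string.digits + ".-_+@")


def sanitize_email(raw_value: str) -> str:
    """Strip anything that breaks the rules instead of rejecting the value."""
    kept = []
    seen_at = False
    for char in raw_value:
        if char not in ALLOWED_EMAIL_CHARS:
            continue  # drop characters the rules never allow
        if char == "@":
            if seen_at:
                continue  # only one "@" is allowed, so drop any extras
            seen_at = True
        kept.append(char)
    return "".join(kept)
```

For example, `sanitize_email("some one@@example.com")` returns `"someone@example.com"`: the space and the second @ are removed, and no error message is produced.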
Where a field has its value set directly in the code, we know exactly what value it has and that the value is valid. Where the field is set directly from the output of a validation routine, we don't know exactly what value it will hold, but we do know that the value is valid because we can only have got to this point if the value passed validation. Where a field is set directly from the output of sanitizing, we cannot be sure that the field is valid, but we do know that every character still in the field is one that passes the validation rules. The only way the value can be invalid is if characters that the rules require are missing. This type of invalid data is comparatively safe and does not have much potential to do harm. Since this situation only occurs if someone tampers with values they are not supposed to touch, it is unlikely to arise, and if it does the worst that is likely to happen is that the script will fail at some later point when it tries to process the incomplete data.
Since an untainted field is only ever loaded in one of these three ways within the script itself, we can be certain that the value in an untainted field is either valid or at least harmless at every point within our code. Every time a value is loaded into an untainted field it has been confirmed to be valid or harmless, so wherever the untainted field is referenced we can rely on its content without needing to backtrack to find how the value was loaded. This control over the values loaded into it is what makes a field untainted. No matter how much code the program or script contains, if we follow the rules for how values are loaded into untainted fields then we can be certain that those fields always contain valid (or at least harmless) values.
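The three ways of loading an untainted field can be sketched in Python as follows. The `raw_` prefix for tainted fields and the `clean_` prefix for untainted ones are a hypothetical naming convention, as are the helper names and the simplified email rule; `sanitize_digits` stands in for sanitizing something like a page number passed in a URL.

```python
import re
import string

# Simplified email rule, an assumption for illustration only.
EMAIL_RULE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def validate_email(raw_value: str) -> str:
    """Return the value if the whole of it matches the rule, else raise."""
    if EMAIL_RULE.fullmatch(raw_value) is None:
        raise ValueError(f"{raw_value!r} is not a valid email address")
    return raw_value


def sanitize_digits(raw_value: str) -> str:
    """Keep only the ASCII digits 0-9 and silently drop everything else."""
    return "".join(char for char in raw_value if char in string.digits)


def load_untainted(raw_email: str, raw_page: str) -> tuple[str, str, str]:
    # 1. Set directly in code: we know both the value and that it is valid.
    clean_site_name = "Example Site"
    # 2. Output of validation: the value is unknown, but this line is only
    #    reached when validate_email() did not raise.
    clean_email = validate_email(raw_email)
    # 3. Output of sanitizing: every remaining character passed the rules,
    #    although required characters may be missing (even an empty string).
    clean_page = sanitize_digits(raw_page)
    return clean_site_name, clean_email, clean_page
```

For example, `load_untainted("someone@example.com", "12abc")` returns `("Example Site", "someone@example.com", "12")`. A tampered page value such as `"page-x"` sanitizes to an empty string: incomplete, but harmless.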
We do not have any such certainty with tainted fields. A tainted field may or may not contain a valid value. Even if we include code to validate a tainted field, there is no certainty that it will contain a valid value at some subsequent point in the code, since there may be a path through the code that bypasses the validation. Yes, it is possible to backtrack through the code and confirm that the only way to reach a given point passes through the code that validates the field, but you have to do that backtracking from each and every point where the field is referenced. There is also no guarantee that a minor change to the code in the future will not introduce a path that bypasses the validation. Many of the security holes introduced into existing code come from accidentally adding just such a path. Tainted fields are always considered tainted because there is no way to be absolutely certain that their content is valid without backtracking through the code from every point where they are used.
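A hypothetical Python illustration of the bypass problem with a tainted field: `greeting` validates `raw_name` on one path only, and the line that uses the value has no way of knowing whether that path was taken. The function and the letters-only name rule are assumptions made for the example.

```python
def greeting(raw_name: str, is_returning_visitor: bool) -> str:
    """Build a greeting from raw_name, a tainted field the user controls."""
    if is_returning_visitor:
        # Validation happens on this path only.
        if not raw_name.isalpha():
            raise ValueError(f"{raw_name!r} is not a valid name")
    # BUG: first-time visitors reach this line without any validation, and
    # nothing here can tell which path was taken, so raw_name may be anything.
    return "<p>Hello " + raw_name + "</p>"
```

Here `greeting("<script>", True)` raises `ValueError`, but `greeting("<script>", False)` quietly returns `"<p>Hello <script></p>"`.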
The key to using untainted fields is that tainted and untainted fields follow completely different naming conventions. The code is written so that fields following the untainted naming convention only ever have values assigned to them either directly in the code or as the output of validation or sanitizing. Accidentally setting up a path that bypasses validation or sanitizing will be detected immediately in testing, because the value will never have been transferred to the untainted field and so the code that relies on the value being there will not work as expected. Re-validating a field just in case it somehow bypassed the original validation is never necessary: no matter how far a piece of code is from where the validation took place, the fact that the value is in an untainted field means that it must have gone through validation.
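Applying a naming convention to the same kind of hypothetical example: `clean_name` is only ever assigned from the output of `validate_name`, so a path that bypasses validation leaves it unset. In Python, reading a local variable that was never assigned raises `UnboundLocalError`, so the bypass fails the first time that path is tested instead of quietly passing the raw value through. The names and the `raw_`/`clean_` prefixes are assumptions for illustration.

```python
def validate_name(raw_value: str) -> str:
    """Return the value if it contains only letters, otherwise raise."""
    if not raw_value.isalpha():
        raise ValueError(f"{raw_value!r} is not a valid name")
    return raw_value


def greeting(raw_name: str, is_returning_visitor: bool) -> str:
    if is_returning_visitor:
        # clean_ fields are only ever loaded from validation output.
        clean_name = validate_name(raw_name)
    # The first-visit path still skips validation by mistake, but it now
    # reaches this line with clean_name unassigned, so Python raises
    # UnboundLocalError instead of outputting unvalidated data.
    return "<p>Hello " + clean_name + "</p>"
```

Calling `greeting("Alice", True)` returns `"<p>Hello Alice</p>"`, while `greeting("Alice", False)` raises `UnboundLocalError`, exposing the missing validation even though the test input was valid.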
Now most of you reading this are not writing programs or scripts of your own. My purpose in presenting this information is to show you a simple security concept that people could be using when they write their code but that many are not. A significant percentage of the patches that you are constantly downloading for the programs on your computer, and that various web sites are constantly installing for the scripts they use, fix security holes that could never have occurred in the first place had the author of the code distinguished between tainted and untainted fields. Note that this is by no means the only security measure that they should be (but possibly are not) applying in their code, but it is certainly the simplest one.
Perhaps the reason so many do not implement this feature is that most books and web sites teaching programming present their examples with the validation and sanitizing code left out. They do this because the omitted validation and sanitizing code is usually the greater portion of the complete script, while the part that is the subject of the particular lesson sits within the remaining smaller portion. Leaving out the validation makes the example easier to read with respect to what is being taught at that point. After all, the person learning the code is going to enter a valid value when they test that particular piece of code. This gets learners into the habit of simply assigning the value across from the tainted field to what is supposed to be the untainted field without any validation in between, so that what is supposed to be an untainted field ends up tainted as well (since there is now at least one point in the code where it can contain an invalid value). Once you have one or more tainted fields using what is supposed to be the untainted naming convention, all the benefit of being able to clearly identify untainted fields is lost. At that point the only way to reintroduce untainted fields into the code is to introduce a new naming convention for them.
I have just started a new series of CSS tutorials complete with working examples. This new series covers styling form fields, one of the most difficult parts of a web page to style consistently, although it is becoming easier as more browsers support more of CSS3.