Tainted data is anything that is not guaranteed to be valid

This comment by Chris Shiflett, on page 16 of his book "Essential PHP Security", sums up the biggest security issue with most code today. Many people writing code allow tainted data to flow through their code rather than eliminating it at the very start.

So how do you eliminate tainted data? You validate it (or at least sanitise it) so that the data is known to be valid. It isn't good enough to validate or sanitise in place, though. You need to be able to tell that the data is untainted just by looking at the variable name. Otherwise, anyone reading one small part of the code cannot tell whether the content is untainted, and must therefore treat it as tainted again.

What we need to do is divide the variables into two groups: those where the content is known to be valid and those that are tainted. Chris Shiflett suggests using an array called $clean to hold all of the valid values and treating everything else as tainted. Strictly speaking, this approach is only necessary where proper validation has not been implemented from the start, since PHP itself names most of the tainted variables in an easily recognisable way (the superglobals such as $_GET, $_POST and $_COOKIE). That built-in naming only helps provided that you don't copy those values to other variables without first ensuring the content is valid. Most people seem to copy tainted variables unnecessarily as the first step in their code, which means that ordinary variable names, which would otherwise be known to hold valid values, are tainted by the coder (a backward step for security right at the start). As you can never be 100% certain that you have fixed all of these coder-introduced security holes, you can't rely on the built-in naming for security unless the project has enforced the naming conventions right from the start. Introducing a new naming standard for untainted variables is therefore always going to be more secure.
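
The $clean convention might look something like the following. This is only a minimal sketch: the field names `colour` and `comment`, the list of allowed colours, and the hand-filled $_POST array (used so the script runs from the command line) are all assumptions made for illustration.

```php
<?php
// Simulated request data: stands in for a real form submission
// so that this sketch can be run from the command line.
$_POST = ['colour' => 'red', 'comment' => '<script>alert(1)</script>'];

// Start with an empty array: nothing is clean until it has been checked.
$clean = [];

// Hypothetical whitelist of acceptable values for the 'colour' field.
$allowedColours = ['red', 'green', 'blue'];

// Copy the value into $clean only once it is known to be valid.
if (isset($_POST['colour']) && in_array($_POST['colour'], $allowedColours, true)) {
    $clean['colour'] = $_POST['colour'];
}

// 'comment' has not been validated, so it never enters $clean.
// From here on, the code would read $clean['colour'] and nothing from $_POST.
```

Anyone reading a later part of the script can then tell at a glance that $clean['colour'] has been checked, while anything still in $_POST has not.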

Once you have a naming convention that identifies all of the variables that are not tainted, how do you ensure that those variables really are untainted? There are two ways to put a value into a variable without tainting it. The first is to assign a fixed value defined within the code: you know the variable is untainted because the value is set in the code itself. The second is to filter the content of a tainted field to make sure it contains a valid value and then immediately copy it to the untainted variable. This filtering can be either validation, where an invalid value is rejected and the user is asked to enter a valid one, or sanitisation, where any invalid data is removed from the tainted value so that whatever is left will be valid.
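
The two ways of filling an untainted variable, and the two kinds of filtering, could be sketched roughly as follows. The field names (`role`, `age`, `token`), the age range and the $errors array are hypothetical choices for this example, and $_POST is again filled by hand so the script runs on its own.

```php
<?php
// Simulated tainted input; in a real script this would arrive with the request.
$_POST = ['age' => '42', 'token' => 'abc123;--'];

$clean  = [];
$errors = [];

// Way 1: a value set in the code itself is untainted by construction.
$clean['role'] = 'guest';

// Way 2, validation: reject anything that is not an integer from 0 to 150
// and ask the user to try again instead of guessing what they meant.
$age = filter_var(
    $_POST['age'] ?? null,
    FILTER_VALIDATE_INT,
    ['options' => ['min_range' => 0, 'max_range' => 150]]
);
if ($age === false) {
    $errors[] = 'Please enter a valid age.';
} else {
    $clean['age'] = $age;
}

// Way 2, sanitisation: remove every character that is not a letter or digit,
// so that whatever is left is valid, then copy the result straight into $clean.
$clean['token'] = preg_replace('/[^a-zA-Z0-9]/', '', (string) ($_POST['token'] ?? ''));
```

In both filtering cases the result is copied into $clean immediately after the check, so there is no point at which a checked value sits under a name that looks tainted, or an unchecked value under a name that looks clean.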

Validation is required for user input, where you have no control over what is entered in the first place. Sanitisation is required for all the other inputs, where the expected valid value has been set elsewhere but there is a possibility, no matter how slight, of the value being tampered with. Unless the data has been tampered with, sanitising should not change the value; it should simply pass the already valid value through unchanged. Where sanitising does change the value, you know someone is trying to tamper with the data, and you should immediately terminate processing without producing an error message that would give the tamperer any clue as to what has happened.
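
One way to detect tampering is to compare the sanitised value with the original, as in the sketch below. The helper name sanitiseDigits and the idea of a digits-only product id are assumptions for illustration. The helper returns null instead of calling exit so that the example can run to completion.

```php
<?php
/**
 * Hypothetical helper: keeps only digits and returns the value if it
 * survived sanitising unchanged, or null if tampering is suspected.
 */
function sanitiseDigits(string $tainted): ?string
{
    $sanitised = preg_replace('/[^0-9]/', '', $tainted);

    // A value our own page set should be non-empty and pass through untouched.
    if ($sanitised === null || $sanitised === '' || $sanitised !== $tainted) {
        return null;
    }
    return $sanitised;
}

// Simulated values: the first is intact, the second has been tampered with.
foreach (['1057', '1057 OR 1=1'] as $tainted) {
    $result = sanitiseDigits($tainted);
    // The message is only for this demonstration; a live script would
    // produce no output at all in the tampered case.
    echo $result === null ? "stop silently\n" : "clean: $result\n";
}
```

In a real request handler the null case would be followed by a bare exit, with nothing sent back to explain why processing stopped.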

All of the actual processing done by your code should use the variables where the content is known to be valid. The tainted variables should not be referenced anywhere after the initial filtering stage. Note, however, that if no filtering is being done then your code will be more secure if you continue using the original tainted variables rather than concealing the fact that they are tainted by copying them to another name.


This article was written by Stephen Chapman, Felgall Pty Ltd.
