It has come to my attention that some people mistakenly believe that escaping data is a security measure. After much discussion I was eventually able to work out why they had this misunderstanding. It seems that many people believe injection attacks are the only security issue there is. Since they can use an escape function to patch that particular hole, they think that is all they need to concern themselves with regarding security.
Now injection is just one of the many possible security issues that can occur. These people have code that is still just as exposed to any other type of attack as if they hadn't patched injection. There are potentially lots of places that the code can be attacked long before it gets to the escape function.
If you forget about patching code that is potentially riddled with security holes and instead design your processing with security in mind from the start then you will never use escaping as a security measure. Instead you will use sanitizing as a security measure and will keep escaping for its intended purpose. Where the code and data can be separate then you do not need to escape anything. Where they must be jumbled together but valid data cannot be mistaken for code then there is no need to escape. Only where legitimate data can contain characters that can be mistaken for code do you need to escape. In that situation the escaping is both necessary for handling valid data and it is specific to the language the code is written in.
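As one illustration of the first case, where code and data can be kept separate, here is a minimal sketch using a prepared statement. The in-memory SQLite database and the shareholders table are assumptions made purely for the example; the point is only that the name travels as a bound parameter and so never needs escaping, apostrophe and all.

```php
<?php
// Assumption: an in-memory SQLite database reached through PDO, for illustration only.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE shareholders (name TEXT)'); // hypothetical table

// The SQL (code) is fixed text; the name (data) is sent separately as a bound parameter.
$insert = $pdo->prepare('INSERT INTO shareholders (name) VALUES (:name)');
$insert->execute([':name' => "T & M O'Brian-Smith"]);

// The value comes back exactly as it went in, with no escape call anywhere.
$stored = $pdo->query('SELECT name FROM shareholders')->fetchColumn();
```

Because the statement text never contains the data, there is nothing in the name that could be mistaken for code.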
Let's look at a couple of simple examples to show how sanitizing can protect all of your code from security issues where escaping achieves nothing.
Now in Australia we have a concept called franked dividends. Australian companies that pay 30% tax on their profits can pass on credits for the tax paid when they pay dividends to their shareholders. So a $70 dividend paid can have a $30 franking credit attached. The recipient declares $100 income with $30 of that already paid as tax. We'll assume that the nett dividend and the shareholder's name have been entered into a form, that the form content has been validated, and that once found to be valid those two values are loaded into the querystring and passed to the page we are about to examine in more detail. It doesn't really matter how that data gets passed from one page to another. The only reason we are using the querystring is that it makes it more obvious that someone can tamper with the data on its way from one page to the other. The same possibility of tampering exists no matter how you pass the data between pages, but this way makes it easier for you to replicate the possible attacks.
Let's look at the dividend amount first and imagine that we want to display the gross dividend amount (including franking credit) on the second page. To keep things simple let's imagine the amount gets passed in cents rather than dollars and that we will only convert to dollars at the end.
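// sanitize: anything that is not an integer becomes 0
$frdiv = (int) ($_GET['frdiv'] ?? 0);
// calculate gross amount in cents
$grdiv = $frdiv * 10 / 7;
// convert to dollars (money_format() was removed in PHP 8, so format by hand)
$moneydiv = 'AUD ' . number_format($grdiv / 100, 2);
// escape for output to HTML
$dispdiv = htmlentities($moneydiv);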
In this example we have used casting to an integer for sanitizing and htmlentities for escaping. If the dividend is $14 then the value passed to the sanitizing step is 1400, the gross amount works out to 1400 * 10 / 7 = 2000 cents, and the value echoed to the HTML is 'AUD 20.00'. Since 'AUD ' followed by a number with two decimal places can never contain a character that can be misinterpreted as HTML, the escaping call does nothing when the data getting this far through the code is valid.
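The valid case can be run end to end with a small sketch like the one below. The function name grossDividendDisplay is hypothetical, and number_format() is used here as a stand-in for money_format(), which is no longer available in PHP 8.

```php
<?php
// Hypothetical helper wrapping the steps described above.
function grossDividendDisplay($rawCents): string
{
    $frdiv = (int) $rawCents;      // sanitize: force an integer number of cents
    $grdiv = $frdiv * 10 / 7;      // gross up: the nett amount is 70% of the gross
    // number_format() stands in for money_format(), which was removed in PHP 8
    return 'AUD ' . number_format($grdiv / 100, 2);
}

$moneydiv = grossDividendDisplay('1400');   // 'AUD 20.00'
$dispdiv  = htmlentities($moneydiv);        // escaping leaves it unchanged
```

Comparing $moneydiv with $dispdiv shows the two strings are identical, which is the sense in which the escaping call does nothing for valid data here.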
Now let's consider what would happen if someone were to tamper with that 1400 and change it to '<script src="http://example.com/inject.js"></script>' instead (appropriately escaped to be part of the query string so that the $_GET variable will be loaded with this text).
With the code exactly as shown, the sanitizing converts that script tag to the number 0 (a string that doesn't start with a number gets cast to zero when you convert it to an integer). That results in 'AUD 0.00' being passed to the escape function, which again does nothing because the text doesn't contain anything that can be misinterpreted as HTML.
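To see what the integer cast does with a few different querystring values, a sketch along these lines may help. The strictCents function is an assumption of mine rather than part of the code above; it uses PHP's filter_var with FILTER_VALIDATE_INT as one possible stricter alternative, since a plain cast still lets through values such as negative amounts.

```php
<?php
$tampered = '<script src="http://example.com/inject.js"></script>';

// What (int) does with various inputs
$casts = [
    (int) '1400',      // 1400: a valid amount is unchanged
    (int) $tampered,   // 0: no leading digits at all
    (int) '1400abc',   // 1400: leading digits are kept, the rest is dropped
    (int) '-1400',     // -1400: still an integer, though not a sensible dividend
];

// Hypothetical stricter alternative (not the method used above):
// returns the integer, or false when the input is not a whole number >= 0.
function strictCents(string $raw)
{
    return filter_var($raw, FILTER_VALIDATE_INT, ['options' => ['min_range' => 0]]);
}
```

Either way the tampered script tag never reaches the arithmetic as text. Which approach suits depends on whether you would rather silently coerce a bad value or reject it.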
Now let's consider what would happen if we didn't sanitize this input. In this case the value in $frdiv would be the text for the script tag. The first line of the processing now tries to multiply that by 10 and, since the value is not a number, our code crashes (in PHP 8 the multiplication throws a TypeError; PHP 7 instead raises a warning and treats the value as zero, and that warning can itself leak details if errors are displayed). Depending on how the server is configured and what other code has already been run, this crash may expose information either about the server or about the other data to whoever has managed to generate it. Again the escaping achieves nothing, as when the code crashes it doesn't even get to run.
As you can see, in this particular situation the sanitizing protects the following code from crashing (a potential security hole) while the escaping either receives data that doesn't need escaping or never gets to run at all and therefore is completely unnecessary.
Now let's consider the name field. Let's imagine that the shareholder in this case is "T & M O'Brian-Smith". (Note that in Australia any accent marks in foreign names are usually ignored, so we'll assume that any such characters were converted to their unaccented equivalents when the name was validated. In countries where these are kept you would need to change the regular expression in the sanitize step to allow those characters.)
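// sanitize: remove every character that cannot appear in a valid name
$name = preg_replace('/[^A-Za-z& \-\'.]+/', '', $_GET['name'] ?? '');
// escape for output to HTML (ENT_QUOTES also encodes the apostrophe)
$dispname = htmlentities($name, ENT_QUOTES, 'UTF-8');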
With this code the sanitizing strips out any characters that are not valid in a name. As our example shows, a valid name can contain upper and lowercase letters, spaces, apostrophes, hyphens and ampersands, and the pattern also allows periods. So our sanitizing removes everything that is not one of these characters. At the moment we don't have any actual processing that uses this input, so the sanitizing is not strictly necessary, but including it makes it far easier if we do need to add processing at a later date, as we will know that all our inputs are safe for processing because they can only contain valid values.
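The effect of the sanitizing step on a valid and a tampered name can be checked with a sketch like this. The sanitizeName wrapper is a hypothetical name introduced for the example; the pattern is the same whitelist of letters, ampersands, spaces, hyphens, apostrophes and periods.

```php
<?php
// Hypothetical wrapper around the whitelist pattern.
function sanitizeName(string $raw): string
{
    // remove every run of characters that is not allowed in a name
    return preg_replace('/[^A-Za-z& \-\'.]+/', '', $raw) ?? '';
}

$valid    = sanitizeName("T & M O'Brian-Smith");        // unchanged
$tampered = sanitizeName('<script>alert(1)</script>');  // 'scriptalertscript'
```

The valid name passes through untouched, while the script tag is reduced to a string of harmless letters because the angle brackets, slash, parentheses and digit are all stripped.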
In this instance the escaping is more necessary than the sanitizing because valid names can contain ampersands and that is one character that always needs to be escaped when outputting to HTML. Note that this escaping is necessary to allow a valid name to be displayed and that this call would only become a security measure if we left out the sanitizing step.
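A short sketch of what the escaping does to the valid example name follows. Passing ENT_QUOTES and 'UTF-8' explicitly is my assumption about sensible settings, since the defaults for htmlentities differ between PHP versions.

```php
<?php
$name = "T & M O'Brian-Smith";

// ENT_QUOTES encodes the apostrophe as well as the ampersand, which matters
// if the name is ever written inside a single-quoted HTML attribute.
$dispname = htmlentities($name, ENT_QUOTES, 'UTF-8');

// Decoding gives back the original, so nothing about the valid name is lost.
$roundTrip = html_entity_decode($dispname, ENT_QUOTES, 'UTF-8');
```

Here $dispname is 'T &amp; M O&#039;Brian-Smith', which a browser renders as the original name. The escaping is doing its intended job of letting valid data be displayed correctly.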
So as you can see, sanitizing inputs that can potentially be tampered with protects all of your code from security issues, while leaving escaping for those instances where it is needed for valid data to be distinguished from code. With both of the above examples it is the sanitizing step that protects the echo statement from injection attacks involving invalid data. The escaping becomes completely unnecessary for fields where valid values can't be confused with code and is only required where valid data can contain characters that can cause that problem. The only security application for escaping is as defence in depth, in case junk values somehow manage to get past the validation/sanitizing step.
This article written by Stephen Chapman, Felgall Pty Ltd.