Escaping Data and Security

There are a number of different functions for 'escaping' data, depending on where the data is to be used. For example, in PHP, when you intend to output text into a web page and that text might legitimately contain a less-than sign or an ampersand, you need to escape it using htmlspecialchars(). Similarly, it used to be common to use mysql_real_escape_string() for any data being loaded into a database where the data might contain quote characters.
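As a minimal sketch of the web page case, the snippet below escapes a piece of text at the point where it is written into HTML. The variable names and the sample text are made up for illustration.

```php
<?php
// Hypothetical text that legitimately contains <, >, & and quote characters.
$comment = 'Use x < y && y > 0 for "safe" values';

// Escape for the HTML context at the point of output.
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

echo '<p>' . $escaped . '</p>', PHP_EOL;
// <p>Use x &lt; y &amp;&amp; y &gt; 0 for &quot;safe&quot; values</p>
```

The browser displays the original characters, but none of them can be interpreted as markup.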

Note that in the above paragraph I have specified the sorts of characters that could legitimately appear in the content, since those are what make escaping necessary. The purpose of escaping data is to ensure that characters that have a special meaning in the context you are sending them to (the HTML of a web page, or an SQL command that will run to update a database, in the above examples) are not interpreted as having those special meanings when they are meant to be a part of the data. Now that database calls allow the data to be specified separately from the SQL by using prepare and bind statements, the entire reason for needing to escape the data in that situation no longer exists.
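The prepare and bind approach might look something like the following sketch. It assumes the PDO extension with the SQLite driver is available, and uses an in-memory database and a made-up `people` table purely so the example is self-contained; a real application would connect to its own database.

```php
<?php
// Assumption: pdo_sqlite is installed. The in-memory database and the
// "people" table are hypothetical stand-ins for a real schema.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE people (surname TEXT)');

// Hypothetical value that legitimately contains a quote character.
$surname = "O'Brien";

// The SQL is prepared with a placeholder; the data is bound separately,
// so the quote is never parsed as part of the SQL command.
$insert = $db->prepare('INSERT INTO people (surname) VALUES (:surname)');
$insert->bindValue(':surname', $surname, PDO::PARAM_STR);
$insert->execute();

// Read the value back to show it was stored unchanged.
$stored = $db->query('SELECT surname FROM people')->fetchColumn();
echo $stored, PHP_EOL;
```

No escaping function is called anywhere, because the data never becomes part of the SQL text.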

One mistake that a lot of people make is in thinking that escaping data has something to do with security. It actually has as much to do with security as the airbags in a car have to do with preventing a collision. With a car it is the brakes that are intended to prevent a collision, and in a script it is validation that provides you with security. Just as an airbag might save your life if you crash a car that has no brakes, escaping data might (or might not) prevent a major security breach on a web site that has no security. Of course it will not prevent someone from filling a database with millions of records of meaningless characters, so the application can still become unusable.

What you need to remember is that escaping data is an output function. By this point in the process the data should already have gone through the input process of validation, so that you know the data is meaningful. It should not be necessary at all to escape any field that cannot legitimately contain a character that needs to be escaped. Running data through an escaping function when that data cannot contain such characters in the first place simply wastes processing time, as there is nothing in the data for the function to escape.

Security should be taken care of as the first thing in your input processing. A PHP script receives its input through the $_GET[] and $_POST[] arrays. Before you do anything else with that data you should validate it to make sure that what each field contains is meaningful data for that field. Only by ensuring that each field contains meaningful data do you have proper security of those fields throughout the rest of your script. It isn't just the database calls and writing to the web page where malicious data can create security issues - such issues can occur anywhere within the code, and so the security measures must come before anything else in order to ensure that there aren't going to be security issues with early statements in your code.

One reason why 'register globals' was turned off by default in PHP, and then removed completely in PHP 5.4, is that it interferes with your ability to make sure that the security checks are performed before the data is used. Knowing that the data has come in through those two arrays and has no other way into the code means that you can begin by validating those entries as you move them from those tainted arrays (tainted in this context means potentially insecure) into the fields used throughout the rest of the code. Only by validating the content as you move it for the first time can you be sure that the field you are moving it to will not be tainted and therefore a potential source of security issues elsewhere in the code.
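One way to picture validating entries as they are moved out of the tainted arrays is sketched below. The helper `validated_field()`, the field names, and the patterns are all hypothetical, and a plain array stands in for $_POST so the snippet can be run on its own. It assumes PHP 7.1 or later for the nullable return type.

```php
<?php
/**
 * Hypothetical helper: copies a field out of a tainted input array only if
 * it matches the supplied pattern; returns null for anything else.
 */
function validated_field(array $tainted, string $name, string $pattern): ?string
{
    if (!isset($tainted[$name]) || !is_string($tainted[$name])) {
        return null;
    }
    return preg_match($pattern, $tainted[$name]) === 1 ? $tainted[$name] : null;
}

// Stand-in for $_POST so the example is self-contained.
$post = ['username' => 'alice_42', 'age' => '27; DROP TABLE users'];

// Only values that pass validation reach the variables used by later code.
$username = validated_field($post, 'username', '/\A[a-z0-9_]{3,20}\z/');
$age      = validated_field($post, 'age', '/\A[0-9]{1,3}\z/');

var_dump($username); // string(8) "alice_42"
var_dump($age);      // NULL
```

After this point the rest of the script reads only `$username` and `$age`, never the tainted array itself.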

Looking at it from a different viewpoint makes the situation more obvious. If a field is supposed to contain only numbers and you start by checking that it contains only numbers, then it can never contain any of the characters that could cause problems if you failed to escape the field at some point. If, instead of validating that the field is numeric, you just escape it before adding it to a database or writing it to a web page, then the person entering the data can still insert junk into your database (even if you use prepare/bind to do away with the need to escape the data) and can still write junk into the web page. This junk is just as much a breach of security as any other security breach. It is just that in this case the problem is limited to your application and doesn't expose your entire hosting account.
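The difference can be illustrated with a short sketch. The submitted value is invented, and ctype_digit() is just one possible way to check that a string holds only digits (it assumes the ctype extension, which is normally enabled).

```php
<?php
// Hypothetical submitted value for a field that should hold only digits.
$quantity = '12 <b>cheap pills</b>';

// Escaping alone changes how the junk is displayed, but it is still junk.
$escapedOnly = htmlspecialchars($quantity, ENT_QUOTES, 'UTF-8');

// Validating first rejects the value before it reaches anything else.
$isValid = ctype_digit($quantity);

echo $escapedOnly, PHP_EOL;
echo $isValid ? 'accepted' : 'rejected', PHP_EOL;
```

The escaped string is harmless as HTML yet still meaningless as a quantity, whereas a value that passes the digit check has nothing left in it to escape.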

Escaping data does nothing to make your code more secure. If you ensure that the data is valid from the start then your data is already secure and you only need to escape fields that can legitimately contain data that needs to be escaped. If your data can be invalid when it gets to the point where you are escaping it then your security is already compromised.

 

This article written by Stephen Chapman, Felgall Pty Ltd.
