Proper Processing of Variables in PHP

Most beginners are confused about how much processing their scripts need to apply to the variables they use. The problem is that most courses leave out large chunks of the necessary code so that they can demonstrate the overall application, on the assumption that the data being processed is valid and in the right format. This gives beginners the wrong idea of what code they need: the code to validate, sanitize, format, and escape data usually makes up the largest portion of any program, and the code they learn to write leaves out most if not all of it.

One possible solution to the missing code (as distinct from the problem of educating beginners) is to use an object in place of each variable. All of the extra code can then be encapsulated in the class that the object belongs to, and the code that beginners have learnt to write will need only minimal modification to gain all the necessary functionality. The variable itself becomes a private property of the object and so can only be accessed via the methods provided by the class.

The methods that we would add to such a class would be divided into two groups - those to load the value into the object, and those that use the value in the object.

The first of these groups requires at least one of each of two different types of method.

The first type validates the value before loading it into the private property. It would be called when the value has been entered by a person and passed into the code via $_GET[] or $_POST[], and it would ensure that the value entered is actually a valid value for that variable to contain. If the format we decide to store the value in internally is different from the one entered, it would also convert the valid value to that internal format (for example, dates would be entered as a text string but stored internally as a date).

The second type sanitizes the data rather than validating it. It would be used where we have a reasonable expectation that the data is valid but where there is a remote possibility that it has been tampered with, so we run it through a sanitizing filter that strips out any invalid characters. This method would typically be called with the value returned from a database call, so that if the database were tampered with the data could still be processed correctly by our code.

Additional variants of either method might be required if there are multiple formats in which the value can be supplied and you don't want to use a generic method that will accept any valid format.
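The two kinds of loading method might look something like the following minimal sketch for a date. The class name DateValue, its method names, and the choice of ccyy-mm-dd as the accepted input format are all assumptions made for illustration; a real class would accept whatever formats its form actually allows.

```php
<?php
declare(strict_types=1);

// Hypothetical class: the names here are illustrative, not from any library.
final class DateValue
{
    // Internal format is a date object; it is only reachable via methods.
    private ?DateTimeImmutable $value = null;

    // Validating loader, intended for values typed in by a person.
    // Returns false and leaves the object empty if the value is not a real date.
    public function loadFromInput(string $raw): bool
    {
        $this->value = null;
        if (!preg_match('/^(\d{4})-(\d{2})-(\d{2})$/', trim($raw), $m)) {
            return false;
        }
        // checkdate() takes month, day, year and rejects dates like 30 February.
        if (!checkdate((int) $m[2], (int) $m[3], (int) $m[1])) {
            return false;
        }
        $this->value = new DateTimeImmutable("$m[1]-$m[2]-$m[3]");
        return true;
    }

    // Sanitizing loader, intended for values we expect to be valid already
    // (for example from a database). It strips every character that cannot
    // appear in a ccyy-mm-dd date, then parses what is left the same way.
    public function loadFromDatabase(string $raw): bool
    {
        $clean = preg_replace('/[^0-9-]/', '', $raw) ?? '';
        return $this->loadFromInput($clean);
    }

    public function isLoaded(): bool
    {
        return $this->value !== null;
    }
}

// In a real script the argument would be something like
// $_POST['start_date'] ?? '' (the field name is an assumption).
$date = new DateValue();
var_dump($date->loadFromInput('2024-02-29'));            // bool(true)
var_dump($date->loadFromInput('2023-02-29'));            // bool(false)
var_dump($date->loadFromDatabase('2024-03-01<script>')); // bool(true)
```

Because the property is private, the only way a value can get into the object is through one of these two methods, so nothing unchecked can reach the rest of the code.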

The second group of methods are those that take the value and format it for use. You would almost certainly have multiple formatting methods, since the format that the data needs to be in varies depending on where it is to be used. Typically you'd have one method to format the value for use in database calls and another to format it for display in a web page. For example, the database format method for a date field needs to take the date stored in the private property and convert it to a text string in "ccyy-mm-dd" format, while the method for outputting to a web page would take that same date and format it the way people are more used to seeing it, probably with the month name in words so that readers from countries that write the day and month in a different order can easily tell which is which.
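As a rough illustration of the two formatting methods for a date, here is a small sketch. The class name StoredDate and the method names forDatabase() and forWebPage() are hypothetical, and the web page layout ("5 March 2024") is just one reasonable choice.

```php
<?php
declare(strict_types=1);

// Hypothetical class for illustration; the names are not from any library.
final class StoredDate
{
    // The internal format: a date object, reachable only through the methods.
    private DateTimeImmutable $value;

    public function __construct(DateTimeImmutable $value)
    {
        $this->value = $value;
    }

    // Text string in ccyy-mm-dd form for use in database calls.
    public function forDatabase(): string
    {
        return $this->value->format('Y-m-d');
    }

    // Month name in words so the day and month cannot be mixed up.
    public function forWebPage(): string
    {
        return $this->value->format('j F Y');
    }
}

$date = new StoredDate(new DateTimeImmutable('2024-03-05'));
echo $date->forDatabase(), "\n"; // 2024-03-05
echo $date->forWebPage(), "\n";  // 5 March 2024
```

The calling code never needs to know how the date is held internally; it simply asks for the format that suits the place where the value is going.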

Where the field might validly contain characters that could be confused with the code the value is to be inserted into (for example, if the field can validly contain & < or > characters and you are outputting to a web page), your format method would also need to call the appropriate escape function so as to escape those characters. In this instance you'd call htmlspecialchars(). Not all fields need to be escaped, since you only need to escape fields that are allowed to contain characters that can be confused with code. A date field would never need to be escaped since, once validated, it can never contain any characters to escape. Another example of where no escaping would be required is a numeric field.
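The difference between a field that needs escaping and one that does not could be sketched as follows. The class names TextValue and IntegerValue are hypothetical, and the flags passed to htmlspecialchars() (ENT_QUOTES | ENT_SUBSTITUTE with UTF-8) are one common choice rather than the only correct one.

```php
<?php
declare(strict_types=1);

// Hypothetical classes for illustration only.
final class TextValue
{
    private string $value;

    public function __construct(string $value)
    {
        $this->value = $value;
    }

    // Free text may contain & < > or quotes, so escape it for HTML output.
    public function forWebPage(): string
    {
        return htmlspecialchars($this->value, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    }
}

final class IntegerValue
{
    private int $value;

    public function __construct(int $value)
    {
        $this->value = $value;
    }

    // An integer can only produce digits and a minus sign: nothing to escape.
    public function forWebPage(): string
    {
        return (string) $this->value;
    }
}

$title = new TextValue('Fish & Chips <b>');
echo $title->forWebPage(), "\n"; // Fish &amp; Chips &lt;b&gt;

$count = new IntegerValue(42);
echo $count->forWebPage(), "\n"; // 42
```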

Current best practice with database accesses is to use separate prepare and bind statements, through either mysqli or PDO. This keeps the SQL in the prepare statement and the data in the bind statement, so no escaping of the data is required because it is impossible for the data to be mistaken for code. This solution is far superior to the old way of using query calls with the SQL and data jumbled together, which required the use of mysql_real_escape_string() to encode those characters in the data that could be mistaken for SQL.
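A minimal sketch of the prepare and bind approach using PDO is shown below. To keep it runnable without a database server it assumes the pdo_sqlite extension and an in-memory SQLite database; the events table and its columns are invented for the example. With MySQL only the connection string would differ.

```php
<?php
declare(strict_types=1);

// Assumption: pdo_sqlite is available. The table and column names are made up.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE events (title TEXT, starts_on TEXT)');

// Data containing characters that would break SQL built by concatenation.
$title = "Bob's \"launch\"; DROP TABLE events; --";
$startsOn = '2024-03-05';

// The SQL lives in prepare(); it contains placeholders, never data.
$insert = $pdo->prepare(
    'INSERT INTO events (title, starts_on) VALUES (:title, :starts_on)'
);

// The data lives in the bind calls, so it is never parsed as SQL.
$insert->bindValue(':title', $title, PDO::PARAM_STR);
$insert->bindValue(':starts_on', $startsOn, PDO::PARAM_STR);
$insert->execute();

// Read the row back the same way to confirm it was stored unchanged.
$select = $pdo->prepare('SELECT title FROM events WHERE starts_on = :starts_on');
$select->bindValue(':starts_on', $startsOn, PDO::PARAM_STR);
$select->execute();
$storedTitle = $select->fetchColumn();

var_dump($storedTitle === $title); // bool(true)
```

The title is stored exactly as supplied, quotes and all, without any call to an escape function.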

By replacing each variable with an object that has these methods defined, and then updating the code to call the appropriate methods at the appropriate times, a beginner can take existing code that relies on visitors entering valid values and convert it into a robust, secure version that correctly handles any value thrown at it.


This article written by Stephen Chapman, Felgall Pty Ltd.
