"Behind the Scenes" Newsletter

July 2011 - The monthly newsletter by Felgall Pty Ltd

My Word

Form Validation and Security

A lot of people get confused about how security works with forms in web pages. There are many different aspects to this security and a lot of people get one mixed up with another. Here I'll go through all of the different aspects of form security, what they are for and how they work. Even if you don't create your own web forms you may find the following useful when it comes to filling out forms on other people's sites.

The best way to ensure that any forms in your web page do not create security holes is to validate all the form fields properly. If your form fields only allow values that actually make sense then the application the form passes that data to will be far more secure than if you attempt to apply generic security solutions (which are often misused and therefore don't provide the expected security anyway). Validate form input properly and the security measures relating to those form fields will be taken care of automatically.

Even without considering security you need to validate form field inputs anyway if you want the application that uses that input to be able to process it correctly. If a field is supposed to collect an email address then someone entering the address of their post office box into the field isn't going to allow emails to be sent.

There are a number of considerations when it comes to validating form fields, not all of which provide security. We also need to consider the purpose of each alternative so that each can be used in the most appropriate way and so that, where there is a choice, we use the most appropriate one.

The most obvious consideration with respect to validating form fields is where the validation should be done. Here we have two choices - we can perform the validation in the browser itself using JavaScript or we can wait until the form is submitted and use a server side language to perform the validation.

Each of these two places that we can validate actually serves a different purpose. Client side validation is there entirely for the benefit of your visitor. By using JavaScript to detect when a field contains invalid content we enable the person entering the data into the form to correct their mistakes before they submit the form. This saves them filling out the whole form and submitting it only to find once the next web page loads that what they entered was full of mistakes that need to be fixed. JavaScript can validate form fields as they are entered so that a mistake can be corrected before you enter the next field. This means they shouldn't make the same mistake in all of their entries because they know it is incorrect after entering the first one. Using JavaScript validation also provides more instant feedback because it doesn't need to load an entire new web page just to display one error message. What JavaScript can't do though is to apply any security whatsoever to your form as anyone can easily bypass the JavaScript validation simply by turning JavaScript off.

To be able to ensure that all the form fields are properly validated we must perform that validation on the server after the form has been submitted. That does not mean that performing validation in JavaScript first is pointless. JavaScript validation may do nothing for security but it does make the form more pleasant to fill out for those with JavaScript enabled, as they get more immediate feedback on any mistakes.

It may not be practical to implement all of the validations that you need for a form using JavaScript but that doesn't matter. Since the form must be validated again on the server after it is submitted, any validations too difficult to implement in JavaScript will simply mean that the form has to be submitted before those particular errors can be detected (just as applies to all errors for those with JavaScript turned off).

Our server side validation must be as thorough in validating the form field content as we can be, not only because it makes the subsequent processing more secure but it also helps to ensure that the information is actually meaningful in that what has been entered is at least valid content for that field. We might be able to exactly duplicate the validation in JavaScript so as to make it as convenient as possible for those filling out the form but we are not compromising security if we leave a particular validation out of the JavaScript through deciding that it is too difficult to implement there.
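As a minimal sketch of that arrangement, assume a hypothetical sign-up form with "username" and "age" fields. The field names, the rules and the validateSignup function are all made up for illustration. The same function could also be loaded in the browser to give early feedback, but the server runs it on every submission regardless.

```javascript
// Hypothetical rules for an illustrative sign-up form.
// Each rule returns an error message, or null when the value is acceptable.
const rules = {
  username: (v) =>
    /^[a-z0-9_]{3,20}$/i.test(v) ? null : 'Use 3 to 20 letters, digits or underscores.',
  age: (v) =>
    /^\d{1,3}$/.test(v) && Number(v) >= 18 ? null : 'Age must be a whole number of 18 or more.',
};

// Runs every rule against the submitted fields and collects the failures.
// Missing or non-text fields are treated as empty strings so they fail too.
function validateSignup(fields) {
  const problems = {};
  for (const [name, check] of Object.entries(rules)) {
    const value = typeof fields[name] === 'string' ? fields[name].trim() : '';
    const message = check(value);
    if (message !== null) problems[name] = message;
  }
  return problems;
}

// Server side: never assume the browser has already checked anything.
const errors = validateSignup({ username: 'felgall', age: '17' });
console.log(errors); // reports a problem with age only
```

An empty result object means every field passed, and only then would the rest of the processing go ahead.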

The next aspect to consider is how the data in the form is to be sent to the server. There are two methods that you can specify for a form to determine how the form data is to be handled - GET and POST. Which of these you choose should be based on the purpose for which these options are provided since this choice has nothing whatever to do with security.

Using the GET method tells the server that the intention is to retrieve information from the server. The information to be retrieved is presumably relatively static and so the browser is to be allowed to cache the results so that if the same request is made again the results can be redisplayed without having to call the server.

The POST method implies that you are updating something on the server and therefore if the same form data is posted again it does need to be passed to the server since the action to be performed the second time may not be the same as it was the first time and the results will also possibly be different.

What does determine how secure the data is that you are passing from your form to the server is whether the form is on a web page using HTTP or on one using HTTPS. In the former case the form content is passed to the server in plain text. If someone intercepts the transmission (what is known as a "man in the middle" attack) then they will be able to read/use the data that is being passed. The only way of securing the data to prevent a man in the middle from accessing it is to use HTTPS, where the security certificate installed on the server is used to set up an encrypted connection, so that the form data is encrypted before it is sent and can only be decrypted by the server.

Any claims that POST is more secure than GET or that you can secure the transmission without using HTTPS are false. It is just as easy to access/update POST data as it is GET data since the same data is being passed, just in slightly different spots in the transmission. Anyone trying to break the security of your transmission will have the necessary tools to access the POST data just as easily as they can access the GET data. Also there is no way to encrypt the data without using a security certificate - at least not for all users. The only way you can encrypt anything even for some users is to use JavaScript. Using JavaScript to properly encrypt the data to be sent means adding at least a 4-5k script to your page. Even then you still have to allow for plain text data from people without JavaScript and so the only effect using JavaScript can have is to remove the ability to read the data for some people. It does not prevent the data being used if someone captures it and resends it to the server at a later time.
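To make the "slightly different spots" point concrete, here is a rough sketch of what the two requests look like on the wire for the same form data. The field names, the /login path and the example.com host are made up, and real browsers send more headers than are shown here.

```javascript
// Encode the same (made-up) form fields once.
const fields = { user: 'alice', pin: '1234' };
const encoded = new URLSearchParams(fields).toString(); // "user=alice&pin=1234"

// Roughly what the browser sends for method="get": data in the request line.
const getRequest = [
  `GET /login?${encoded} HTTP/1.1`,
  'Host: example.com',
  '',
  '',
].join('\r\n');

// Roughly what it sends for method="post": data in the request body.
// URL-encoded text is plain ASCII, so its length is also its size in bytes.
const postRequest = [
  'POST /login HTTP/1.1',
  'Host: example.com',
  'Content-Type: application/x-www-form-urlencoded',
  `Content-Length: ${encoded.length}`,
  '',
  encoded,
].join('\r\n');

console.log(getRequest);
console.log(postRequest);
```

Over plain HTTP, anyone who can read one of these requests can read the other just as easily.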

So now we've considered the overall aspects of validation and security for our form it's time to think about how we are going to validate each field. There are a number of different approaches that we can take all of which can achieve the same end result. We can write code that breaks the field content up into its component parts and test each part for specific values, we can examine the field character by character to see whether it is one that is allowed at that spot, we can use a regular expression to see if the content matches a particular pattern, and depending on the language and what the field can contain we may have a simple function call available to use that can do the validation for us. While each of these approaches produces the same result there are specific reasons for choosing one approach over the others - the less code you need to write yourself and the more processing you can hand off to functionality built into the language, the less error prone your code will be and the less likely that it will contain security holes. In other words "don't reinvent the wheel".

Where the language you are using for your validation provides a single function call that can be used to validate the field then that is the best choice to use for validating that field. So if you are validating a field that must be numeric using PHP then testing is_numeric() reduces your validation to a single statement and eliminates the possibility of any non-numeric data getting through that validation (bear in mind that is_numeric() accepts decimal fractions and exponent notation as well as whole numbers). Of course if the number needs to be within a specific range of values you'd need to test for that too but having already eliminated everything that isn't a number first means that the range testing will also be much simpler.
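JavaScript has no direct equivalent of is_numeric(), so the sketch below uses Number() together with Number.isFinite() as a stand-in. The isNumberInRange helper is a made-up name. Like is_numeric(), this accepts more than plain digits (exponent notation, for instance), so a stricter test would be needed where only whole numbers make sense.

```javascript
// Returns true only when the submitted text is a number within [min, max].
function isNumberInRange(text, min, max) {
  // Number('') is 0, so blank input has to be rejected before converting.
  if (typeof text !== 'string' || text.trim() === '') return false;
  const value = Number(text); // NaN for anything that is not numeric
  if (!Number.isFinite(value)) return false; // rejects NaN and Infinity
  return value >= min && value <= max; // the range test only ever sees real numbers
}

console.log(isNumberInRange('42', 1, 100));  // true
console.log(isNumberInRange('420', 1, 100)); // false
console.log(isNumberInRange('4 2', 1, 100)); // false
console.log(isNumberInRange('', 1, 100));    // false
```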

Of course not everything has its own validation function built right into the language and so most fields will require a bit more than a simple function call. PHP does have many of the main field types covered though with the filter_var() function, which has a number of common field types able to be specified using the second parameter to the call (for example filter_var($email, FILTER_VALIDATE_EMAIL) will test whether $email contains a valid email address, returning the address if it does and false if it doesn't).

At this point it is also worth considering what the difference is between validating a field and sanitizing a field. Validating means testing if the field is valid or not whereas sanitizing simply strips out any characters that are invalid for the particular field type. Even if you sanitize a field it doesn't necessarily mean that what you have left is valid.
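A small sketch of the difference, using a made-up rule that a phone number must be exactly ten digits (both function names are hypothetical):

```javascript
// Sanitizing: strip every character that is not a digit.
function sanitizeDigits(text) {
  return text.replace(/\D/g, '');
}

// Validating: accept only exactly ten digits (an assumed rule for this example).
function isValidPhone(text) {
  return /^\d{10}$/.test(text);
}

const cleaned = sanitizeDigits('call me: 555-0100');
console.log(cleaned);               // "5550100"
console.log(isValidPhone(cleaned)); // false: sanitized, but still not a valid value
```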

Where a specific function is not available to do the validation for us the next best option is to use a regular expression since that too keeps the amount of code we need for our validation to a minimum. Here too you don't want to go reinventing the wheel when you don't have to. For example, JavaScript doesn't have a simple function call that can validate an email address for us and so we will use a regular expression to do it instead. We don't need to write our own regular expression to do it though as there are plenty of places where a regular expression to do that particular validation is readily available to save you the trouble of writing and testing your own.
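As an illustration only, here is how such a test looks in JavaScript. The pattern is a deliberately simple one chosen for this sketch rather than a complete definition of what an email address may contain, and a published, well tested pattern may be stricter or more permissive.

```javascript
// A deliberately simple pattern: something@something.something with no spaces
// and exactly one "@". It is an illustration, not a complete address grammar.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

function looksLikeEmail(text) {
  return EMAIL_PATTERN.test(text);
}

console.log(looksLikeEmail('someone@example.com')); // true
console.log(looksLikeEmail('PO Box 123, Sydney'));  // false
```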

Where a single regular expression is insufficient to validate the field (or where a single regular expression would be hugely complex) you might consider using perhaps two or three regular expressions in combination. You would be really struggling to find a field that requires anything more complex than that to do the validation.
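For instance, suppose a username may contain letters, digits and dots, but must not start or end with a dot or contain two dots in a row. The rule and all of the names below are invented for this sketch. Three short patterns are easier to read and test than one large one.

```javascript
const ALLOWED = /^[a-z0-9.]{3,20}$/i; // right characters, right length
const EDGE_DOT = /^\.|\.$/;           // starts or ends with a dot
const DOUBLE_DOT = /\.\./;            // two dots in a row

// Valid only if the first pattern matches and neither of the others does.
function isValidUsername(text) {
  return ALLOWED.test(text) && !EDGE_DOT.test(text) && !DOUBLE_DOT.test(text);
}

console.log(isValidUsername('j.smith'));  // true
console.log(isValidUsername('.jsmith'));  // false
console.log(isValidUsername('j..smith')); // false
```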

The only excuses that anyone has for using any other approach for doing the validation are that the language they are using doesn't allow regular expressions, that they are a very new coder who hasn't learnt about regular expressions yet and who has fields so unusual that they can't locate a regular expression that will do the validation for them, or that they need to provide more specific advice as to exactly what is wrong with the input value. There's not much you can do other than to write the validation out the long way if you don't have access to a language that uses regular expressions, but for the other two you still don't really have an excuse not to use one. If you don't know how to write a regular expression to validate your field and you can't find the code to do it anywhere then go to an appropriate forum and ask for help in writing it. If you want to provide more information on what specifically is invalid about a value then use a regular expression to validate it first and only if it is invalid proceed to the more longwinded way of checking the content to identify exactly what's wrong. That way you know that the field is invalid before you spend the time analysing it (which both makes your code more efficient when the field is valid and eliminates the possibility of errors in the analysis code resulting in invalid data being accepted as valid).
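Here is a sketch of that "pattern first, explanation second" order, using an invented product code format of two capital letters, a hyphen and four digits. The format and the function names are assumptions made for the example.

```javascript
const CODE_PATTERN = /^[A-Z]{2}-\d{4}$/; // hypothetical format, e.g. "AB-1234"

// Only called once the pattern has already rejected the value, so a bug in
// here can produce a poor message but can never let bad data through.
function explainProblem(text) {
  if (text.length !== 7) return 'The code must be exactly 7 characters long.';
  if (!/^[A-Z]{2}/.test(text)) return 'The code must start with two capital letters.';
  if (text[2] !== '-') return 'The third character must be a hyphen.';
  return 'The code must end with four digits.';
}

// Returns null for a valid code, otherwise a message describing the problem.
function checkCode(text) {
  if (CODE_PATTERN.test(text)) return null; // valid: no further analysis needed
  return explainProblem(text);
}

console.log(checkCode('AB-1234')); // null
console.log(checkCode('ab-1234')); // The code must start with two capital letters.
```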

For your server side validation you need to be as precise as you can in determining the validity of a given field content. For example in validating an email address you need to validate that the field contains something which is allowed to be an email address. There isn't any way to verify if an email address actually exists or not unless you send an email to that address and get the recipient to reply (eg by clicking on a link that only exists in that email). You need to decide depending on the use that you are going to make of the email address as to whether you need to verify the address or to just validate that the content is allowed to be an email address. There's no reason though why you wouldn't do server side validation that eliminates everything that obviously isn't an email address.
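Where verification is needed, the usual shape is a one-time token placed in the emailed link. The sketch below assumes Node.js and keeps the pending tokens in memory purely for illustration. The function names are made up, and actually sending the email is not shown.

```javascript
const crypto = require('crypto');

// In-memory stand-in for wherever pending sign-ups would really be stored.
const pending = new Map();

// Creates an unguessable token for the address. The token would be placed in
// a link inside the email sent to that address.
function startVerification(email) {
  const token = crypto.randomBytes(16).toString('hex');
  pending.set(token, email);
  return token;
}

// Called when the link is visited: returns the verified address, or null if
// the token is unknown or has already been used.
function confirmVerification(token) {
  const email = pending.has(token) ? pending.get(token) : null;
  pending.delete(token);
  return email;
}
```

Only someone who can read mail sent to the address can present the token, which is what turns a merely valid address into a verified one.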

Performing the strictest validation that is reasonable on the server automatically improves the security of your form because it eliminates the possibility of something harmful being input into most of the fields on your form. It is impossible to do any sort of HTML or SQL code injection into a field that is being validated as a number, and very difficult with one strictly validated as an email address, because the code that they are trying to inject contains characters that are not valid for those field types. (A few characters that have meaning in SQL, such as the apostrophe, are permitted in email addresses, so how strict your email pattern is does matter here.)

Once you have all of the proper validations set up for all your fields then the only fields that will accept HTML or SQL being entered into the field are those fields where you expect that type of input. That means that those fields need to be able to handle the HTML or SQL entered into those fields all the time and not just when someone is trying to bypass your security. Since outputting user entered HTML into a web page as HTML creates an automatic security hole in your processing you'd obviously never want to do that without having some very thorough validation to restrict what tags can be used. If instead the HTML is to be displayed as text you'd use htmlspecialchars() or the equivalent for whichever server side language you are using to convert the HTML into plain text immediately before outputting it to the page. Since converting those special characters to entity codes is something you need to do for all the valid input in that field, its use there has nothing to do with security. While using such a call on other fields that are not expected to contain HTML when outputting them into a web page prevents any successfully injected HTML from being able to be processed as HTML, such content is still invalid for those fields and if it gets far enough into your code for that call to make a difference then it has already shown that your validation is inadequate and that your data has been compromised. If your barn has no walls then locking the door does prevent the horse getting out through the door but doesn't prevent it getting out some other way. Using htmlspecialchars() on an email address is as effective at preventing code injection as locking that barn door.
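For a field where HTML is legitimate content to be shown as text, the conversion might look like this in JavaScript. The escapeHtml function is a hand-written stand-in that is similar in spirit to PHP's htmlspecialchars(), not a built-in.

```javascript
// Replaces the characters that are special in HTML with entity codes so that
// the text is displayed rather than interpreted as markup.
function escapeHtml(text) {
  const entities = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(escapeHtml('<b>bold</b> & "quoted"'));
// &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quoted&quot;
```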

In fact any form of "escaping" data is there simply to prevent the data from being confused with the commands that are acting on that data. Often there are alternative ways to write your code so that the possibility of the code and data being confused can be avoided and escaping the data then becomes unnecessary. For example if you use prepared statements for your database calls then you can use placeholders for almost all of the places where you want to be able to insert data into the call. About the only place where you can't use a placeholder is for specifying the table name (and if you're letting your visitors enter the name of the table to be accessed in your database through a form then you really need to think carefully about how you are going to handle the security of your entire database). By eliminating the possibility of the code and data getting confused you do away with the need to escape the data being input into the database completely.
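The exact call depends on the database library being used, so the sketch below does not talk to a real database. It only contrasts the two ways of building a query: gluing the value into the SQL text, versus keeping the SQL text fixed with a ? placeholder and handing the value over separately for the driver to bind. The table and column names are invented.

```javascript
const input = "x' OR '1'='1"; // a classic injection attempt

// Risky: the data becomes part of the command text.
const glued = "SELECT * FROM members WHERE email = '" + input + "'";

// Placeholder style: the command text never changes, whatever the input is.
// A real driver's prepared statement call would be given these two pieces.
const prepared = {
  sql: 'SELECT * FROM members WHERE email = ?',
  params: [input],
};

console.log(glued);        // the WHERE clause has been rewritten by the input
console.log(prepared.sql); // still exactly the statement the programmer wrote
```

Because the value travels separately from the statement, there is nothing in it that needs escaping.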

By validating the fields from your form properly when they are first passed to the server you can greatly simplify all of the subsequent code, since once you have eliminated all of the invalid values you no longer need to worry about those invalid values interfering with any of the processing that you perform on those fields. If invalid values can't get into your system in the first place then you not only avoid potential security holes, you also avoid processing meaningless garbage. While there are only a small number of people who deliberately try to insert malicious code into forms so as to bypass security, there are far more people who may accidentally type garbage values into your form, and so using the most appropriate validation will have its greatest impact in the area of data integrity.
 

On Site

As you can see, I have started publishing a new series of references on CSS 3 this month. I do have a number of further articles prepared for this series to publish in coming months and will be assembling even more. I have also been working on designing the second phase of my club membership site application (the first phase created a site for a specific club). This second phase will start the process of extracting all the club specific parts so that the same script will be able to be used for other clubs.
 


http://www.felgall.com/