I regularly see forum posts about injection where the code included in the post is completely inappropriate for the purpose. The posters apply escaping functions to their input in the mistaken belief that this is an appropriate way of preventing injection.
Escaping is an output function and should never be used on input. The appropriate place to use escaping is immediately before writing the data to wherever it needs to go, and the escaping function to use depends on where it is being written. When writing data into HTML you need to escape those characters that would otherwise be misinterpreted as HTML markup, such as the less-than sign that opens a tag or the ampersand that starts an entity. When writing to a database you are unlikely to need an escaping function at all, as most modern database interfaces support prepared statements or parameterized queries that keep the code and the data completely separate, making escaping unnecessary.
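A minimal sketch in Python, assuming an SQLite database and an HTML page are the two destinations (the table name `customers` and the helper names are hypothetical, chosen for illustration). The value is stored exactly as entered through a parameterized query, so no SQL escaping is applied, and `html.escape` from the standard library is called only at the moment the value is written into HTML.

```python
import html
import sqlite3

def save_customer(conn: sqlite3.Connection, name: str, address: str) -> None:
    # The ? placeholders keep the SQL code and the data separate: the values
    # are handed to the driver, never spliced into the SQL text, so the
    # apostrophe in a name like O'Brien needs no escaping.
    conn.execute("INSERT INTO customers (name, address) VALUES (?, ?)", (name, address))

def address_as_html(address: str) -> str:
    # Escape for HTML only at the point of output. quote=False is enough
    # for text placed between tags (not inside an attribute value).
    return "<p>" + html.escape(address, quote=False) + "</p>"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, address TEXT)")
save_customer(conn, "Ann O'Brien", "12 Smith & Jones St")

(stored,) = conn.execute("SELECT address FROM customers").fetchone()
print(stored)                   # 12 Smith & Jones St  (stored as entered)
print(address_as_html(stored))  # <p>12 Smith &amp; Jones St</p>
```

Note that the database copy is left unescaped; escaping happens separately for each output destination, so the same stored value can later be written to HTML, a CSV file or an email with whichever escaping each of those requires.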
Note that even where escaping is necessary, it only matters on those rare occasions where the data can validly contain characters that could be misinterpreted as code. If you have fields collecting names, dates of birth, and phone numbers, you will not need to escape them to write them into HTML, as none of those fields can validly contain any characters that would be misinterpreted as HTML. An address should be escaped in case it contains an ampersand, something which might validly appear in an address.
Escaping fields that can never contain characters needing escaping will slow the code slightly, as the escaping function works out that there is nothing for it to do. It does, however, provide defence in depth against injection attacks that become possible when junk is accepted as valid input into those fields.
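A quick Python check makes the point concrete. The field names and sample values below are made up for illustration. `html.escape` with `quote=False` (suitable for text placed between tags) is run over each field, and only the address comes back altered; for the other fields the call does nothing beyond scanning the string.

```python
import html

# Hypothetical sample record; only the address can validly contain "&".
record = {
    "first_name": "Anne",
    "date_of_birth": "1985-04-23",
    "phone": "0412 345 678",
    "address": "12 Smith & Jones St",
}

# Collect the fields whose value is actually changed by HTML escaping.
changed = [field for field, value in record.items()
           if html.escape(value, quote=False) != value]
print(changed)  # ['address']
```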
That is the problem with all of these questions about injection: the code accepts any junk as input for any of its fields. The posters are so involved in trying to prevent injection that they have failed to stop anyone giving their name as '@#$%^!!@' and their address as 'select * from users'. Preventing injection doesn't stop people adding millions of junk entries like those to the database.
As mentioned above, the instances where valid data needs escaping are few, but the instances where an injection attempt would count as valid data are even fewer. This means that the correct way to prevent injection is almost the same as the way you prevent junk being processed: you validate your inputs to make sure that the content of each field is at least possible for that particular field. A first name shouldn't contain anything other than letters and possibly a hyphen. A last name might also contain spaces and apostrophes. Addresses can also contain numbers and possibly ampersands. Limiting each field to its permitted characters is the most basic validation you can perform, and on its own it still lets obvious junk combinations through. You would probably also want to limit how many times some of the valid characters can occur, and the order in which they can appear, to further reduce the chance of junk input.
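A minimal sketch of such validation in Python, using allow-list regular expressions. The character sets follow the paragraph above, restricted to ASCII letters for simplicity; real forms often need to accept accented letters (such as "José") and, for addresses, commas or full stops, so treat these patterns as illustrative assumptions. Each pattern also constrains order and repetition: a first name may contain at most one hyphen and never at either end, and separators in last names and addresses may not be doubled up.

```python
import re

# First name: letters, with at most one hyphen, never at the start or end.
FIRST_NAME = re.compile(r"[A-Za-z]+(?:-[A-Za-z]+)?")
# Last name: runs of letters separated by a single space, hyphen or apostrophe.
LAST_NAME = re.compile(r"[A-Za-z]+(?:[ '-][A-Za-z]+)*")
# Address: runs of letters/digits separated by a single space, hyphen or
# apostrophe, or by a spaced ampersand (" & ").
ADDRESS = re.compile(r"[A-Za-z0-9]+(?:(?: & |[ '-])[A-Za-z0-9]+)*")

def is_valid(pattern: re.Pattern, value: str) -> bool:
    # fullmatch() requires the entire value to match, so junk cannot hide
    # before or after an otherwise valid part (unlike search()).
    return pattern.fullmatch(value) is not None

print(is_valid(FIRST_NAME, "Anne-Marie"))         # True
print(is_valid(FIRST_NAME, "@#$%^!!@"))           # False
print(is_valid(LAST_NAME, "O'Brien"))             # True
print(is_valid(ADDRESS, "12 Smith & Jones St"))   # True
print(is_valid(ADDRESS, "select * from users"))   # False
```

As the paragraph above warns, patterns like these still accept some obvious junk (an address of "Zzzz Qqqq" matches), but anything containing characters such as `*`, `;`, `<` or `=` is rejected before it is ever stored or displayed.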
When you provide proper validation that eliminates as much junk input as possible, the chance of an injection attempt getting past it is extremely slight. In those cases where escaping is required, the only data being escaped will be data where those characters are a valid part of the field value. You should never see the code of an injection attempt getting escaped unless that attempt happens to match what is considered valid data for the field in the first place, in which case everyone will be entering that type of data in the field and you wouldn't consider it an injection attempt.
Injection attempts are a small subset of junk data. You don't just want to stop injection attempts; you want to stop junk data. The appropriate way of preventing junk data from user inputs is to use validation to ensure that the content of each field appears valid for that particular field. You can't always prevent all junk data through validation, but you can prevent a lot of it, and what you can prevent will always include injection attempts.
The way to prevent injection is not to use output escaping functions; it is to use input validation functions.
This article was written by Stephen Chapman, Felgall Pty Ltd.