htmlentities and htmlspecialchars

These two PHP functions perform similar processing, so in many instances you can swap one for the other without significantly affecting the result. They do have some differences, though, so in most cases one of them will be the better choice.

Let's take a step back, though, and first consider where we would want to use either of these functions at all. Some people believe that these functions have something to do with security. They are mistaken: these functions have nothing to do with security.

These are output functions: they process data immediately before it is presented to be viewed. All security processing needs to be done with input functions that prevent bad data from getting into your system in the first place, so all of it will have run long before any output function is called.

Perhaps this misconception about security goes back in part to some of the early tutorials people see, where input fields are fed through one of these functions so that the values can be displayed directly in the web page. Because these examples are made as simple as possible, they include no validation at all, and in fact no input processing whatsoever. Since such an example probably does pass each field through one of these functions, someone learning PHP may come to believe that the functions are there for security, because without them it is possible to type data into an input field and have it break the page.

These two functions do only one thing: they convert plain text so that some of its characters are replaced by entity codes. htmlspecialchars converts only those characters that, if left unconverted, could be misinterpreted by the browser as part of the HTML rather than part of the content. The goal of running plain text through this function is to make sure that the text displays in the web page exactly as intended. The main character it converts is <, which, if sent to the web page as is, would be interpreted as the start of a tag; converting it to &lt; ensures that the browser cannot misinterpret it. htmlspecialchars also converts a few other characters (&, > and, depending on the flags passed to it, double and single quotes) to their entity codes so that they too are displayed correctly.
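The conversion just described can be seen in a short sketch. It assumes PHP 8 and UTF-8 text, and it passes ENT_QUOTES and an explicit charset rather than relying on the defaults, which have changed between PHP versions (since PHP 8.1 the default flags already include ENT_QUOTES). The sample string is made up for illustration.

```php
<?php
// Sample text containing characters a browser could mistake for HTML.
$text = 'Use <b> for bold & "quotes", it\'s fine';

// ENT_QUOTES converts both double and single quotes; the charset is
// given explicitly instead of relying on the default.
$safe = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

echo $safe, "\n";
// Use &lt;b&gt; for bold &amp; &quot;quotes&quot;, it&#039;s fine
```

When this output is placed in a web page, the browser shows the original text, including the literal <b>, instead of switching to bold.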

htmlentities differs from htmlspecialchars in that it doesn't limit itself to converting the characters that might be confused with HTML; instead it converts every character that has an entity code equivalent into that entity code. By default this function (like htmlspecialchars) also converts the & at the start of any entity code already present in the text, so content that has already been converted to entity codes will be double encoded unless you pass false for the optional double_encode parameter. Double encoding is useful when your text contains entity codes that you want to display in your web page as the codes themselves rather than as the characters they represent.
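A minimal comparison of the two functions, including the double encoding behaviour, might look like this. It assumes PHP 8, a UTF-8 source file and the default HTML 4.01 entity table; the sample strings are invented for illustration.

```php
<?php
$text = 'café © 5 < 6';

// htmlspecialchars only touches characters that could be read as HTML.
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
// htmlentities converts every character that has a named entity.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // café © 5 &lt; 6
echo $entities, "\n"; // caf&eacute; &copy; 5 &lt; 6

// Text that already contains an entity code.
$code = 'Type &copy; to get the symbol';

// Default (double_encode = true): the & is converted again, so the
// browser displays the literal code "&copy;".
$double = htmlentities($code, ENT_QUOTES, 'UTF-8');
// double_encode = false: existing entity codes are left alone, so the
// browser displays the © symbol.
$single = htmlentities($code, ENT_QUOTES, 'UTF-8', false);

echo $double, "\n"; // Type &amp;copy; to get the symbol
echo $single, "\n"; // Type &copy; to get the symbol
```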

Note that in each case these functions assume that the data fed into them has already been validated and that any invalid data and security issues have already been dealt with; the functions simply ensure that the text passed into them displays on the web page without being mistaken for HTML.
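The separation between input validation and output conversion can be sketched as follows. The validateName and renderGreeting functions are hypothetical, and the validation rule (letters, spaces, apostrophes and hyphens, 1 to 50 characters) is an assumed example policy rather than a recommendation for every form. Note that a perfectly valid name such as O'Brien still contains a character that htmlspecialchars converts.

```php
<?php
// Input side (hypothetical rule): accept only letters, spaces,
// apostrophes and hyphens, 1 to 50 characters; reject anything else.
function validateName(string $input): ?string
{
    $name = trim($input);
    return preg_match('/^[\p{L} \'-]{1,50}$/u', $name) === 1 ? $name : null;
}

// Output side: convert only at the moment the value is written into HTML.
function renderGreeting(string $name): string
{
    return '<p>Hello, ' . htmlspecialchars($name, ENT_QUOTES, 'UTF-8') . '</p>';
}

$name = validateName("O'Brien");      // accepted by the validation rule
$rejected = validateName('<script>'); // null: rejected before it is stored

if ($name !== null) {
    echo renderGreeting($name), "\n"; // <p>Hello, O&#039;Brien</p>
}
```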


This article written by Stephen Chapman, Felgall Pty Ltd.
