HTTP Header Manipulation Series
Author: David Zimmer
Site:
-----------------------------------------------------------------------------------

This document is the first in a series on HTTP header manipulation. The primary focus of this paper is to introduce the reader to what HTTP headers are and how they work. This common-sense understanding will then lead us, in later papers in the series, through how HTTP headers may be manipulated and to what end.

HTTP headers are the unseen workforce that brings the web, as we know it, into our lives. From the point of view of the web surfer, we enter a web address or click a link and the page is simply given to us. If you have ever wondered about the logistics behind this seemingly simple process, then you have wondered about HTTP headers.

For a little background, let me describe a few concepts. Programming is a very cut-and-dried kind of logic. Computers can only "understand" what they are explicitly told to look for. It thus follows that for two computers to exchange meaningful data, the way in which they do so must be predefined so that each side knows what the other wants. The definition of such a communication standard is termed a "protocol". The protocol used by web browsers to request a web page is known as HTTP, which stands for "Hypertext Transfer Protocol". (HTML, the language of web pages, stands for Hypertext Markup Language, which is just a fancy way of saying that the text can be visually marked up and organized.)

So what exactly is the protocol? What is it responsible for? How come I have never seen it when surfing the web?

The actual HTTP protocol is a system of commands, messages and data fields that internet browsers (such as Internet Explorer) and web servers use to transfer web pages, images and other web content. The HTTP protocol includes commands to request and submit data, as well as commands to allow file uploads, resume interrupted downloads and even to request debugging information from the server.
Aside from the actual command interaction, there are also a number of data fields exchanged, such as your web browser type, the page that you were previously on, cookie information and what types of documents your browser is capable of understanding. It is this rich, yet relatively simple, environment that allows us to have the diverse web experience we have all become so accustomed to.

Now comes the question of exactly how these command interactions take place and what they look like. Thankfully, HTTP headers are a relatively simple beast. When a browser requests a web page, it takes the URL you give it, connects to the web server named in it and issues a request for the page you specify. A standard browser-generated request may look like this:

----------------------------------
GET /index.html HTTP/1.1
HOST:
Referer:
ACCEPT: */*
Accept-Encoding: None
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Connection: Close
Accept-Transfer-Encoding: None
----------------------------------

This HTTP request has one command and 7 data fields. From its format you can probably deduce a couple of key features of the protocol: the command is always the first entry, and each line consists of a single entry. (Note that the field naming the previous page is spelled "Referer" on the wire; the misspelling is part of the HTTP specification.)

When the web server receives this request, it first looks for the specified page and then replies with another HTTP header signifying the status of the page, followed by the page's data if it exists. In the case above, the server might send the following HTTP header and page data in response:

----------------------------------
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 38


----------------------------------

Even though the HTTP header is sent in the same breath as the HTML page, how come I never see it? In the above example you can see that the HTTP header is set apart from the page content by a blank line. It is this double carriage return/line feed pair that signifies the end of the HTTP header. Browsers, being the automated and user-friendly programs that they are, purposely hide all the details of the transaction from you.

Now that we are up to speed on the basics of what HTTP headers are, where they live and what they do for us, we will turn our focus to the main topic of this paper: a definition of HTTP header manipulation with respect to web application security.

Because HTTP headers commonly do all their work behind the scenes, people tend to take them for granted. Web application developers may take for granted that the User-Agent string is a de facto standard, limited to a few known safe values and not open to tampering. They may believe that the referrer value is a set-in-stone, definitive answer to where the browser is coming from, and they may believe that the cookie values they depend on come from a tamper-proof source. With our simple view of HTTP headers above, we can quickly see that all of these fields can be subject to manipulation with a program as simple as telnet!

The next paper in this series will expand on this introduction and go into specific scenarios: what goals people could look to achieve through HTTP header manipulation, and some of the common tools of the trade they may use to perform it. It will also show the reader how to monitor HTTP request transactions to and from a server, as well as how to manipulate them on the fly with the use of a local proxy.
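
To make the two central points of this paper concrete (the header ends at a blank line, and every data field is simply text the client chooses to send), here is a minimal sketch in Python. Python is used purely for illustration. The helper names build_request and split_response, the host www.example.com and every field value shown are assumptions invented for this example and do not come from any real site or tool.

```python
# Illustrative sketch only: helper names, host and field values are
# hypothetical. Nothing is sent over the network here.

def build_request(path, host, fields):
    """Return raw HTTP/1.1 request bytes: command line, data fields, blank line."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in fields.items()]
    # The empty line (CRLF CRLF on the wire) marks the end of the header.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def split_response(raw):
    """Split raw response bytes into (status line, field dict, body bytes)."""
    # Everything before the first blank line is the header.
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *field_lines = head.decode("iso-8859-1").split("\r\n")
    fields = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so store them in lowercase.
        fields[name.strip().lower()] = value.strip()
    return status_line, fields, body


# The client decides what each field says; these values are made up.
forged = build_request("/index.html", "www.example.com", {
    "User-Agent": "AnythingILike/1.0",
    "Referer": "http://www.example.com/a-page-never-visited",
    "Cookie": "session=not-issued-by-the-server",
    "Connection": "close",
})
print(forged.decode("ascii"))

# A hand-written response, standing in for what a server might return.
sample = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 5\r\n"
          b"\r\n"
          b"hello")
print(split_response(sample))
```

Typing the same lines into a telnet session connected to a web server's port would have the same effect as sending the bytes built above. From these fields alone, a server cannot tell whether a browser or a hand-written program produced them.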

If you would like to read further about the HTTP protocol and its defined command and data field structure, you are encouraged to read through the relevant materials listed below:

RFC: 2616, Hypertext Transfer Protocol -- HTTP/1.1
Book: HTTP Essentials, by Stephen Thomas