HTTP Header Manipulation Series
Author: David Zimmer
Site: http://sandsprite.com/Sleuth
-----------------------------------------------------------------------------------

This document is the first in a series on HTTP header manipulation. The primary focus of this paper is to introduce the reader to what HTTP headers are and how they work. In later papers in the series, this common-sense understanding will lead us through how HTTP headers may be manipulated, and to what end.

HTTP headers are the unseen workforce that brings the web, as we know it, into our lives. From the point of view of the web surfer, we enter a web address or click a link and the page is simply given to us. If you have ever wondered about the logistics behind this seemingly simple process, then you have wondered about HTTP headers.

For a little background, let me describe a few concepts. Programming is very cut-and-dried logic. Computers can only "understand" what they are explicitly told to look for. It follows that for two computers to exchange meaningful data, the way in which they do so must be predefined so that each side knows what the other wants. The definition of such a communication standard is termed a "protocol". The protocol used by web browsers to request a web page is known as HTTP, which stands for "Hypertext Transfer Protocol". (HTML, the language of web pages, stands for Hypertext Markup Language, which is just a fancy way of saying that the text can be visually marked up and organized.)

So what exactly is the protocol? What is it responsible for? How come I have never seen it when surfing the web? The HTTP protocol is a system of commands, messages and data fields that internet browsers (such as Internet Explorer) and web servers use to transfer web pages, images and other web content.
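
As a rough illustration of what "commands and data fields" means in practice, here is a minimal sketch in Python that builds the text of a request and then splits it back into its parts. The function names build_request and parse_request are made up for this example, the field values are placeholders, and nothing is sent over the network. It assumes the usual HTTP/1.1 layout of one command line followed by one "Name: value" line per field.

```python
CRLF = "\r\n"  # HTTP/1.1 separates header lines with carriage return + line feed


def build_request(command, path, fields, version="HTTP/1.1"):
    """Hypothetical helper: return the header text of a request."""
    # The command line comes first, then one "Name: value" line per data field.
    lines = [f"{command} {path} {version}"]
    lines += [f"{name}: {value}" for name, value in fields.items()]
    # An empty line marks the end of the header block.
    return CRLF.join(lines) + CRLF + CRLF


def parse_request(text):
    """Hypothetical helper: split header text into its command and data fields."""
    head = text.split(CRLF + CRLF, 1)[0]
    first, *rest = head.split(CRLF)
    command, path, version = first.split(" ", 2)
    fields = {}
    for line in rest:
        # Split on the first colon only, since values may contain colons.
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return command, path, version, fields


request = build_request(
    "GET", "/index.html", {"Host": "someserver.com", "Connection": "close"}
)
command, path, version, fields = parse_request(request)
```

Real browsers and servers handle many more cases than this (repeated fields, message bodies, malformed input), so treat it only as a picture of the basic shape.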

The HTTP protocol includes commands to request and submit data, as well as commands to upload files, resume interrupted downloads and even request debugging information from the server. Aside from the commands themselves, a number of data fields are also exchanged, such as your web browser type, the page you were previously on, cookie information and the types of documents your browser is capable of understanding. It is this rich, yet relatively simple, environment that allows us to have the diverse web experience we have all become so accustomed to.

Now comes the question of exactly how these command interactions take place and what they look like. Thankfully, HTTP headers are a relatively simple beast. When a browser requests a web page, it takes the URL you give it, uses it to connect to the web server and issues a request for the page you specified. A standard browser-generated request may look like this:

----------------------------------
GET /index.html HTTP/1.1
HOST: someserver.com
Referer: http://someserver.com
ACCEPT: */*
Accept-Encoding: None
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)
Connection: Close
Accept-Transfer-Encoding: None
----------------------------------

This HTTP request has one command and 7 data fields. From its format you can probably deduce a couple of key features of the protocol: the command is always the first entry, and each line consists of a single entry. (Note that the field naming the previous page is spelled "Referer" in the protocol, and that the Host field carries only the server name, not a full URL.)

When the web server receives this request, it first looks for the specified page and then replies with an HTTP header of its own signifying the status of the page, followed by the page’s data if the page exists. In the case above the server might send the following HTTP header and page data in response:

----------------------------------
HTTP/1.1 200 OK
Content-Type: text/html
Content-Length: 38