Overview of HTTP

URLs and resources

URLs specify the location of a resource. A resource is often a static file, such as an HTML document, an image, or a sound file. But increasingly, it may be a dynamically generated object, perhaps based on information stored in a database.

When a user agent requests a resource, what is returned is not the resource itself, but some representation of that resource. For example, if the resource is a static file, then what is sent to the user agent is a copy of the file.

Multiple URLs may point to the same resource, and an HTTP server will return appropriate representations of the resource for each URL. For example, an company might make product information available both internally and externally using different URLs for the same product. The internal representation of the product might include information such as internal contact officers for the product, while the external representation might include the location of stores selling the product.

This view of resources means that the HTTP protocol can be fairly simple and straightforward, while an HTTP server can be arbitrarily complex. HTTP has to deliver requests from user agents to servers and return a byte stream, while a server might have to do any amount of processing of the request.

HTTP characteristics

HTTP is a stateless, connectionless, reliable protocol. In the simplest form, each request from a user agent is handled reliably and then the connection is broken. Each request involves a separate TCP connection, so if many resources are required (such as images embedded in an HTML page) then many TCP connections have to be set up and torn down in a short space of time.

Thera are many optimisations in HTTP which add complexity to the simple structure, in order to create a more efficient and reliable protocol.

Versions

There are 3 versions of HTTP

  • Version 0.9 - totally obsolete
  • Version 1.0 - almost obsolete
  • Version 1.1 - current

Each version must understand requests and responses of earlier versions.

HTTP 0.9

Request format

Request = Simple-Request

Simple-Request = "GET" SP Request-URI CRLF

Response format

A response is of the form

Response = Simple-Response

Simple-Response = [Entity-Body]

HTTP 1.0

This version added much more information to the requests and responses. Rather than "grow" the 0.9 format, it was just left alongside the new version.

Request format

The format of requests from client to server is

Request = Simple-Request | Full-Request

Simple-Request = "GET" SP Request-URI CRLF

Full-Request = Request-Line
        *(General-Header
        | Request-Header
        | Entity-Header)
        CRLF
        [Entity-Body]

A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response.

A Request-Line has format

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

where

Method = "GET" | "HEAD" | POST |
     extension-method

e.g. GET http://jan.newmarch.name/index.html HTTP/1.0

Response format

A response is of the form

Response = Simple-Response | Full-Response

Simple-Response = [Entity-Body]

Full-Response = Status-Line
        *(General-Header 
        | Response-Header
        | Entity-Header)
        CRLF
        [Entity-Body]

The Status-Line gives information about the fate of the request:

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

e.g.

HTTP/1.0 200 OK

The codes are

Status-Code =      "200" ; OK
        | "201" ; Created
        | "202" ; Accepted
        | "204" ; No Content
        | "301" ; Moved permanently
        | "302" ; Moved temporarily
        | "304" ; Not modified
        | "400" ; Bad request
        | "401" ; Unauthorised
        | "403" ; Forbidden
        | "404" ; Not found
        | "500" ; Internal server error
        | "501" ; Not implemented
        | "502" ; Bad gateway
        | "503" | Service unavailable
        | extension-code

The Entity-Header contains useful information about the Entity-Body to follow

Entity-Header =    Allow
        | Content-Encoding
        | Content-Length
        | Content-Type
        | Expires
        | Last-Modified
        | extension-header

For example

HTTP/1.1 200 OK
Date: Fri, 29 Aug 2003 00:59:56 GMT
Server: Apache/2.0.40 (Unix)
Accept-Ranges: bytes
Content-Length: 1595
Connection: close
Content-Type: text/html; charset=ISO-8859-1

HTTP 1.1

HTTP 1.1 fixes many problems with HTTP 1.0, but is more complex because of it. This version is done by extending or refining the options available to HTTP 1.0. e.g.

  • there are more commands such as TRACE and CONNECT
  • you should use absolute URLs, particularly for connecting by proxies e.g
GET http://www.w3.org/index.html HTTP/1.1
  • there are more attributes such as If-Modified-Since, also for use by proxies

The changes include

  • hostname identification (allows virtual hosts)
  • content negotiation (multiple languages)
  • persistent connections (reduces TCP overheads - this is very messy)
  • chunked transfers
  • byte ranges (request parts of documents)
  • proxy support

The 0.9 protocol took one page. The 1.0 protocol was described in about 20 pages. 1.1 takes 120 pages.

results matching ""

    No results matching ""