1
HTTP HyperText Transfer Protocol
Miguel Leitão, 2012
2
HTTP
• HTTP is the protocol that supports
communication between Web browsers and Web
servers.
• From the RFC: “HTTP is an application-level
protocol with the lightness and speed necessary
for distributed, hypermedia information systems.”
• The HTTP communication generally takes place
over a TCP connection, but the protocol itself is
not dependent on a specific transport layer
3
HTTP Transaction
[Diagram: the client browser opens a TCP connection to the Web server; the HTTP transaction then takes place over that connection.]
4
Request - Response
• HTTP has a simple structure:
– client sends a request
– server returns a reply.
• HTTP can support multiple request-reply
exchanges over a single TCP connection.
5
Well Known Address
• A “Web Server” is an HTTP server
• The “well known” TCP port for HTTP
servers is port 80.
• Other ports can be used as well...
6
HTTP Versions
• The original version is known as “HTTP Version 0.9”
– HTTP/0.9 was used for many years.
• Starting with HTTP 1.0 the version number is part of
every request.
• HTTP is still changing...
7
HTTP 1.x Request
• Lines of text (ASCII).
• Lines end with CRLF “\r\n”
• First line is called “Request-Line”
Request-Line
Headers . . .
Content...
8
Request Line
Method URI HTTP-Version \r\n
• The request line contains 3 tokens (words).
• space characters “ “ separate the tokens.
• Newline (\n) seems to work by itself (but the protocol requires CRLF)
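The three-token structure of the Request-Line can be illustrated with a minimal Python sketch. The function name `parse_request_line` is an assumption made for this example; it accepts a bare "\n" as well as CRLF, mirroring the lenient behaviour noted above.

```python
def parse_request_line(line: bytes):
    """Split an HTTP/1.x Request-Line into (method, uri, version)."""
    # Strip the line terminator: CRLF is required, but a bare LF is tolerated.
    text = line.decode("ascii").rstrip("\r\n")
    # Tokens are separated by single space characters.
    tokens = text.split(" ")
    if len(tokens) != 3:
        raise ValueError(f"expected 3 tokens, got {len(tokens)}: {text!r}")
    method, uri, version = tokens
    return method, uri, version
```

For example, `parse_request_line(b"GET /index.html HTTP/1.1\r\n")` yields the method, the URI and the version as three separate strings.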
9
Request Method
The Request Method can be:
GET HEAD PUT
POST DELETE TRACE
OPTIONS
future expansion is supported
10
Methods
• GET: retrieve information identified by the URI.
• HEAD: retrieve meta-information about the URI.
• POST: send information to a URI and retrieve result.
GET, HEAD and POST are supported everywhere.
11
Methods (other)
• PUT: Store information in location named by URI.
• DELETE: remove entity identified by URI.
• TRACE: used to trace HTTP forwarding through
proxies, tunnels, etc.
• OPTIONS: used to determine the capabilities of
the server, or characteristics of a named resource.
12
URI: Uniform Resource Identifier
• URIs defined in RFC 2396.
• Absolute URI: scheme://hostname[:port]/path
http://www.cs.rpi.edu:80/blah/foo
• Relative URI (no server mentioned): /path
/blah/foo
/absolute/path/to/resource.txt
relative/path/to/resource.txt
13
URI Usage
• When dealing with an HTTP 1.1 server, only a path is used (no scheme or hostname).
– HTTP 1.1 servers are required to be capable of handling an absolute URI, but there are still some out there that won’t…
• When dealing with a proxy HTTP server, an absolute URI is used.
– The client has to tell the proxy where to get the document!
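As a rough illustration of the two cases, the sketch below builds the Request-Line for the same URL when talking to the origin server and when talking to a proxy. The helper name `request_line` and its `via_proxy` parameter are assumptions for this example; `urlsplit` comes from the Python standard library.

```python
from urllib.parse import urlsplit


def request_line(method: str, url: str, via_proxy: bool) -> str:
    """Build a Request-Line for an origin server or for a proxy."""
    parts = urlsplit(url)
    if via_proxy:
        # A proxy needs the absolute URI to know where to fetch the document.
        target = url
    else:
        # An origin server only needs the path (plus the query, if any).
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
    return f"{method} {target} HTTP/1.1\r\n"
```

With the absolute URI `http://www.cs.rpi.edu:80/blah/foo`, the direct form sends only `/blah/foo`, while the proxy form sends the whole URI.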
14
HTTP Version Number
“HTTP/1.0” or “HTTP/1.1”
HTTP 0.9 did not include a version
number in a request line.
If a server gets a request line with no
HTTP version number, it assumes 0.9
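A server could apply this rule roughly as follows. This is only a sketch: the function name `request_version` is hypothetical, and a two-token line is taken to mean HTTP 0.9.

```python
def request_version(request_line: str) -> str:
    """Return the HTTP version implied by a request line."""
    tokens = request_line.strip().split(" ")
    if len(tokens) == 2:
        # Method and URI only: no version token, so assume HTTP 0.9.
        return "HTTP/0.9"
    if len(tokens) == 3:
        return tokens[2]
    raise ValueError(f"malformed request line: {request_line!r}")
```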
15
The Header Lines
• After the Request-Line come a number
(possibly zero) of HTTP headers.
• Each header line contains an attribute
name followed by a “:” followed by the
attribute value.
16
Headers
• Request Headers provide information to
the server about the client
– what kind of client
– what kind of content will be accepted
– who is making the request
– Web site
• There can be 0 headers
17
Example HTTP Headers
Accept: text/html
From: [email protected]
User-Agent: Mozilla/4.0
Referer: http://foo.com/blah
18
End of the Headers
• Each header ends with a CRLF
• The end of the header section is marked with a blank line.
– just CRLF
• For GET and HEAD requests, the end of the headers is the end of the request!
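The blank-line rule makes it easy to separate the headers from whatever follows. The sketch below is a simplified parser, not a complete one: the name `parse_head` is an assumption, and header names are lower-cased because HTTP header names are case-insensitive.

```python
def parse_head(raw: bytes):
    """Split a raw request into (request_line, headers, body)."""
    # The first empty line (CRLF CRLF) ends the header section.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    request_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Each header is "name: value"; split on the first colon only.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return request_line, headers, body
```

For a GET or HEAD request the returned body is empty, since the end of the headers is the end of the request.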
19
Post
• A POST request includes some content (some
data) after the headers (after the blank line).
• There is no format for the data (just raw bytes).
• A POST request must include a
Content-Length line in the headers:
– Content-Length: 267
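One way to avoid a wrong Content-Length is to compute it from the body bytes. The sketch below does that; the function name `build_post`, the host name and the form data are assumptions chosen for illustration.

```python
def build_post(path: str, host: str, body: bytes) -> bytes:
    """Assemble a POST request whose Content-Length matches the body."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        # The length is counted in bytes, not characters.
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # blank line: end of the headers
    )
    return head.encode("ascii") + body


# Hypothetical host and form data, used only to exercise the function.
request = build_post(
    "/~hollingd/changegrade.cgi",
    "monte.cs.rpi.edu",
    b"stuid=6660182722&item=test1&grade=99",
)
```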
20
Example GET Request
GET /~hollingd/testanswers.html HTTP/1.0
Accept: */*
User-Agent: Internet Explorer
From: [email protected]
Referer: http://foo.com/
There is a blank line here!
21
Example POST Request
POST /~hollingd/changegrade.cgi HTTP/1.1
Accept: */*
User-Agent: SecretAgent V2.3
Content-length: 36
Referer: http://monte.cs.rpi.edu/blah

stuid=6660182722&item=test1&grade=99
22
HTTP Response
• ASCII Status Line
• Headers Section
• Content can be anything (not just text)
– typically an HTML document or some kind of image.
Status-Line
Headers . . .
Content...
23
Response Status Line
HTTP-Version Status-Code Message
• Status Code is a 3-digit number (for computers)
• Message is text (for humans)
Status Codes:
1xx Informational
2xx Success
3xx Redirection
4xx Client Error
5xx Server Error
Examples:
HTTP/1.0 200 OK
HTTP/1.0 301 Moved Permanently
HTTP/1.0 400 Bad Request
HTTP/1.0 500 Internal Server Error
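A client can read the class of a response from the first digit of the status code. The following is a minimal sketch; the names `STATUS_CLASSES` and `parse_status_line` are assumptions for this example.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str):
    """Return (version, code, message, status_class) for a Status-Line."""
    # The message may contain spaces, so split at most twice.
    version, code, message = line.rstrip("\r\n").split(" ", 2)
    code = int(code)
    return version, code, message, STATUS_CLASSES.get(code // 100, "Unknown")
```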
24
Response Headers
Provide information about the returned entity (document).
– what kind of document
– how big the document is
– how the document is encoded
– when the document was last modified
Example:
Date: Wed, 30 Jan 2002 12:48:17 EST
Server: Apache/1.17
Content-Type: text/html
Content-Length: 1756
Content-Encoding: gzip
26
Content
• Content can be anything (sequence of
raw bytes).
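• A Content-Length header should accompany any response that
includes content, unless the end of the content is signalled another way
(closing the connection, or chunked transfer encoding in HTTP 1.1).
• A Content-Type header should also be sent.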
27
Try it with telnet
> telnet www.dee.isep.ipp.pt 80
GET / HTTP/1.0
HTTP/1.0 200 OK
Server: Apache
...
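The same experiment can be scripted. The sketch below sends a minimal HTTP/1.0 GET over a TCP socket and reads until the server closes the connection; the function names `build_get` and `fetch` are assumptions, and nothing is sent until `fetch` is called.

```python
import socket


def build_get(path: str = "/") -> bytes:
    """Request-Line followed by the blank line that ends the headers."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send one GET request and return the raw reply (status line, headers, content)."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_get(path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closed its socket: reply complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `fetch("www.dee.isep.ipp.pt")` would be the scripted equivalent of the telnet session, assuming that host is reachable.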
28
Single Request/Reply
• The client sends a complete request.
• The server sends back the entire reply.
• The server closes its socket.
• If the client needs another document it
must open a new connection.
29
Persistent Connections
• HTTP 1.1 supports persistent
connections (this is supposed to be the
default).
• Multiple requests can be handled.
• Most servers seem to close the
connection after the first response…
30
Virtual Hosts
• HTTP 1.1 can use virtual hosts.
– Allows multiple hosts to share a single server.
– Each host has a different name.
– The name of the destination host is given as
part of the page request.
31
HTTP 1.1 Head Request
$ telnet linuxzoo.net 80
HEAD / HTTP/1.1
Host: tiger.net
HTTP/1.1 200 OK
Date: Mon, 01 Nov 2008 15:06:44 GMT
Server: Apache/2.0.46 (Red Hat)
Last-Modified: Fri, 29 Oct 2008 14:47:22 GMT
ETag: "4981dd-920-22ea7280"
Accept-Ranges: bytes
Content-Length: 2336
Content-Type: text/html; charset=UTF-8
32
HTTP Proxy Server
[Diagram: Browser, Proxy, HTTP Server. The proxy sits between the browser and the HTTP server.]
33
HTTPS
HTTPS
SSL
TCP
34
HTTPS Transaction
[Diagram: the client browser opens a TCP connection to the Web server, then an SSL connection on top of it; the HTTPS GET transaction runs inside the SSL connection.]
35
Typical HTTP use
• A Web page is set of
many items.
• Each item is
downloaded separately.
• Items from the same
server are downloaded
sequentially.
36
Domain Sharding
Use of multiple domains to increase the number of
resources downloaded simultaneously for a particular website
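A site that shards its resources has to decide which domain serves each one. The sketch below shows one possible approach, hashing the resource path so that the same resource always maps to the same domain (which keeps browser caches effective). The shard host names and the function name `shard_host` are hypothetical.

```python
import zlib

# Hypothetical shard domains, all serving the same content.
SHARDS = ["static1.example.com", "static2.example.com", "static3.example.com"]


def shard_host(path: str, shards=SHARDS) -> str:
    """Pick a shard for a resource path, deterministically."""
    # crc32 is stable across runs, unlike Python's built-in hash() for strings.
    return shards[zlib.crc32(path.encode("utf-8")) % len(shards)]
```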
37
Domain Sharding
Pros
• Several resources are downloaded in parallel
• Faster page load time
Cons
• Increased DNS lookup times
• Requires website modifications
• Increased TCP overhead
38
SPDY
• Google, 2009-2015
• Multiplexed Stream Support
SPDY can carry many streams concurrently over a single TCP
connection without serializing requests. The aim is to be as efficient as HTTP
while using only a single connection.
• Request Prioritization
A client can request as many items as it wants from the server.
The server returns the higher-priority contents first.
• HTTP Header Compression
HTTP headers are compressed, leading to fewer bytes transmitted.
• Server Initiated Streams (aka "Server Push")
SPDY allows either the client or the server to initiate a stream once the client
has established a connection.
• Server Hint
The server often knows a client will need a resource. It can inform the
client about a resource it would otherwise discover much later.
39
HTTP/2
• First major update since HTTP/1.1
• Binary, instead of textual.
• Fully multiplexed – Allows sending multiple requests in parallel over a single TCP connection.
• Uses header compression HPACK to reduce overhead.
• Allows servers to PUSH responses to clients.
• Uses the new ALPN extension which allows for faster
encrypted connections
• Domain sharding and asset concatenation are no longer needed.
40
Binary Framing
© 2013 Ilya Grigorik. Published by O'Reilly Media, Inc.
41
Connection
42
Frame Header
43
Frame Types
• DATA transports HTTP message bodies
• HEADERS transports header fields for a stream
• PRIORITY communicates sender-advised priority of a stream
• RST_STREAM signals termination of a stream
• SETTINGS communicates configuration parameters for the connection
• PUSH_PROMISE signals a promise to serve the referenced resource
• PING used to check the roundtrip time and the "live" state
• GOAWAY orders the peer to stop creating streams
• WINDOW_UPDATE used to implement stream and connection flow control
• CONTINUATION used to continue a sequence of header block fragments
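Every frame starts with a fixed 9-byte header: a 24-bit payload length, an 8-bit type, an 8-bit flags field, one reserved bit and a 31-bit stream identifier. The sketch below decodes that header and maps the type code to the names listed above (codes 0x0 to 0x9, as assigned in the HTTP/2 specification). The function name `parse_frame_header` is an assumption for this example.

```python
FRAME_TYPES = {
    0x0: "DATA", 0x1: "HEADERS", 0x2: "PRIORITY", 0x3: "RST_STREAM",
    0x4: "SETTINGS", 0x5: "PUSH_PROMISE", 0x6: "PING", 0x7: "GOAWAY",
    0x8: "WINDOW_UPDATE", 0x9: "CONTINUATION",
}


def parse_frame_header(header: bytes):
    """Decode a 9-byte HTTP/2 frame header into (length, type, flags, stream_id)."""
    if len(header) != 9:
        raise ValueError("an HTTP/2 frame header is exactly 9 bytes")
    length = int.from_bytes(header[0:3], "big")   # 24-bit payload length
    frame_type = header[3]                        # 8-bit type code
    flags = header[4]                             # 8-bit flags
    # Mask off the reserved high bit to get the 31-bit stream identifier.
    stream_id = int.from_bytes(header[5:9], "big") & 0x7FFFFFFF
    return length, FRAME_TYPES.get(frame_type, "UNKNOWN"), flags, stream_id
```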
44
HPACK
45
Push
46
HTTP/2 Upgrade
A client supporting both HTTP/1.1 and HTTP/2 wants to make a
request without prior knowledge of HTTP/2 support on
the server.
=> The client must use the HTTP Upgrade mechanism:
• starts an HTTP/1.1 request.
• includes an Upgrade header field with the "h2c" token.
• includes one HTTP2-Settings header field.
=> The server can
• accept the upgrade and produce an HTTP/2 reply.
• ignore the Upgrade header and produce an HTTP/1.1 reply.
[RFC7230]
47
HTTP/2 Upgrade
Initial HTTP/1.1 request with HTTP/2 upgrade header
(HTTP2-Settings carries the Base64 URL encoding of the HTTP/2 SETTINGS payload):
GET /page HTTP/1.1
Host: server.example.com
Connection: Upgrade, HTTP2-Settings
Upgrade: h2c
HTTP2-Settings: (SETTINGS payload)

Server declines upgrade, returns response via HTTP/1.1:
HTTP/1.1 200 OK
Content-length: 243
Content-type: text/html
(... HTTP/1.1 response ...)

(or)

Server accepts HTTP/2 upgrade, switches to new framing:
HTTP/1.1 101 Switching Protocols
Connection: Upgrade
Upgrade: h2c
(... HTTP/2 response ...)
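The value of the HTTP2-Settings header can be produced as sketched below. A SETTINGS payload is a sequence of 6-byte entries (16-bit identifier, 32-bit value), and the header carries its base64url encoding with the trailing "=" padding removed. The function names are assumptions, and the two settings used in the usage note are just sample values.

```python
import base64
import struct


def encode_http2_settings(settings: dict) -> str:
    """Encode {identifier: value} as an HTTP2-Settings header value."""
    # Each setting is a 16-bit identifier plus a 32-bit value, network byte order.
    payload = b"".join(struct.pack("!HI", ident, value)
                       for ident, value in settings.items())
    # base64url, with trailing '=' padding omitted.
    return base64.urlsafe_b64encode(payload).rstrip(b"=").decode("ascii")


def decode_http2_settings(token: str) -> dict:
    """Inverse of encode_http2_settings."""
    padded = token + "=" * (-len(token) % 4)  # restore the padding
    payload = base64.urlsafe_b64decode(padded)
    return {ident: value for ident, value in struct.iter_unpack("!HI", payload)}
```

For instance, `encode_http2_settings({0x3: 100, 0x4: 65535})` encodes a limit of 100 concurrent streams (identifier 0x3) and an initial window size of 65535 (identifier 0x4).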
48
HEADERS frame in Wireshark
50
Apache
• Very well known.
• Respected HTTP server.
• Used commercially.
• Freely available from http://www.apache.org
• Plenty of plugins.
• Relatively easy and flexible to configure.
• Fast and Reliable.
• Supports HTTP/2
51
Multi-thread server
• Most servers follow one of two models:
– Forking model
– Threaded model
• needs special OS support
• uses fewer resources
• Apache is built as a hybrid multi-process, multi-threaded server.
– Keeps multiple child processes available.
– Each child process runs many threads.
– Each thread processes a request.
52
Apache Forking Model
[Diagram: an incoming HTTP request reaches a multiplexer (MUX), which allocates an idle child; the child gets the data from disk and sends the response.]
53
Forking Configuration
Most servers use default values…
Most important options:
Parameter             Initial Value
StartServers          8
MinSpareServers       5
MaxSpareServers       20
MaxClients            150
MaxRequestsPerChild   1000
54
Important Files
• /etc/init.d/httpd – the server control script
• /etc/httpd/conf/httpd.conf – the main config file.
• /var/log/httpd/access_log
• /var/log/httpd/error_log
The main configuration file is only reread on a
server reload or restart
55
Reload or Restart
Restart shuts down then starts the server…
• If configuration file contains errors, start up can fail.
With a Reload,
• Apache checks the configuration file
– if it contains no errors, it is used.
– If it has errors, Apache keeps running the old configuration.
• This allows a server to be reconfigured with no downtime.
Error log can be checked for help
• /var/log/httpd/error_log
• /var/log/messages (syslog)
56
Virtual Hosts
• The sharing of a single IP to provide multiple
hostnames is well supported in Apache.
• A Virtual Host is defined in the config file in a
<VirtualHost> block.
• Each block holds a list of hostnames it can handle
• The first host found in the file is always considered
the default, so if no VirtualHost section matches,
the first block is used.
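The selection rule described above (match on the requested host name, otherwise fall back to the first block) can be sketched as follows. This only illustrates the idea and is not how Apache is implemented; the function name `select_vhost` and the first entry of `VHOSTS` are hypothetical.

```python
def select_vhost(host_header: str, vhosts: list) -> dict:
    """Pick the virtual host matching the Host header, else the first one."""
    # Drop an optional ":port" suffix; host names are case-insensitive.
    name = host_header.split(":")[0].lower()
    for vhost in vhosts:
        if name in vhost["names"]:
            return vhost
    # No block matches: the first block in the file acts as the default.
    return vhosts[0]


VHOSTS = [
    {"names": {"www.example.org"}, "docroot": "/var/www/html"},  # hypothetical default
    {"names": {"tele.isep.ipp.pt", "www.tele.org", "tele.isep.pt"},
     "docroot": "/home/tele/public_html"},
]
```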
57
VirtualHost config
<VirtualHost *:80>
ServerAdmin [email protected]
DocumentRoot /home/tele/public_html
ServerName tele.isep.ipp.pt
ServerAlias www.tele.org tele.isep.pt
ErrorLog logs/tele-error_log
CustomLog logs/tele-access_log combined
</VirtualHost>
58
Personal Web pages
Typical environment:
• Apache runs on a server used by many users.
• Each user has his own directory in /home.
• Each user wants to build his own web pages.
Apache allows personal Web pages in the user's home directory, under a dedicated subdirectory:
• public_html
• WWW
59
public_html access
• URLs of the form
– http://our.webserver.net/~JohnSmith/file.html
• Refer to
– /home/JohnSmith/public_html/file.html
• This feature can be activated in httpd.conf:
UserDir public_html
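The URL-to-file mapping above can be written down as a small function. This is a sketch of the mapping only, assuming home directories live under /home; the name `userdir_path` is hypothetical, and no protection against ".." in the path is attempted.

```python
import posixpath


def userdir_path(url_path: str, userdir: str = "public_html", home: str = "/home"):
    """Map /~user/rest to home/user/userdir/rest; return None for other URLs."""
    if not url_path.startswith("/~"):
        return None
    # Everything up to the next "/" is the user name.
    user, _, rest = url_path[2:].partition("/")
    if not user:
        return None
    return posixpath.join(home, user, userdir, rest)
```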
60
URL Rewriting
• mod_rewrite is a module in Apache.
• Allows changing URLs dynamically.
• Can be useful to:
– Change the URL of aliases in a domain so that they always give the correct name.
– Support directories and files being moved without breaking bookmarked URLs.
– Provide a variety of proxying methods.
61
Methods
• mod_rewrite has many directives:
– RewriteCond – an IF statement
– RewriteRule – an action (do it) statement.
– …
• Can be placed in several Apache configuration files:
– in VirtualHost areas of httpd.conf.
– In .htaccess at specific directories
– …
• To work, the area must also have:
RewriteEngine on
62
RewriteRule
Basic format:
RewriteRule URL-reg-exp New-URL
Example:
If /old.txt was moved to /new.txt
RewriteRule ^/old\.txt$ /new.txt
63
Regular Expressions
• Text comparison uses regular expressions.
• Text matching:
. Any single Character
[chars] One of the characters in chars
[^chars] None of the characters in chars
Text1|Text2 Either “Text1” or “Text2”
^ Beginning of the URL
$ End of the URL
\ Escaping
64
Quantifiers and Grouping
Quantifiers:
? 0 or 1 of the preceding text
* 0 or more of the preceding text
+ 1 or more of the preceding text
{n} exactly n occurrences of the preceding text
Grouping
(text) Marks a text group:
- Can limit an alternative.
- Can be back referenced as $n
65
Back References
$n refers to the nth group from the URL match.
Example:
– rewrite any URL ending in .txt to .html:
RewriteRule ^(.*)\.txt$ $1.html
66
More complex example
Rewrite URLs in all directories …/demo/ to use
directories /exp/ in the same position
RewriteRule ^(.*)/demo/(.*)$ $1/exp/$2
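Patterns and back references can be tried out in Python before putting them in a server configuration. The sketch below imitates a single RewriteRule: if the pattern matches, the whole URL is replaced by the substitution with $n expanded. It is an approximation for experimenting, since Python's regular expressions are close to, but not identical with, those used by Apache; the function name `rewrite` is an assumption.

```python
import re


def rewrite(pattern: str, substitution: str, url: str) -> str:
    """Imitate one RewriteRule: return the rewritten URL, or url if no match."""
    match = re.search(pattern, url)
    if match is None:
        return url
    # Translate mod_rewrite's $n into Python's \g<n> group reference.
    template = re.sub(r"\$(\d)", r"\\g<\1>", substitution)
    # The substitution replaces the entire URL, not just the matched part.
    return match.expand(template)
```

For example, `rewrite(r"^(.*)/demo/(.*)$", "$1/exp/$2", "/a/demo/b.html")` gives `/a/exp/b.html`.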
67
Additional Flags
• At the end of the line, the RewriteRule
can have several Flags.
• Flags are listed in [brackets],
eg [F,G] for flags F and G.
• These change or enhance the
behaviour of the match.
68
Options:
• R or R=code
– Sends the browser the new URL as an external REDIRECTION. The code is the type of redirection, such as 301 or 302 (the default is 302).
• F
– Send back FORBIDDEN.
• G
– Send back GONE
• P
– Proxy: Forward the request
69
Options Cont…
• L
– Last: do not look at any more rules.
• C
– Chain: If the pattern matches, do the next rule,
otherwise skip the remaining rules in the chain.
• NC
– case insensitive.
• There are many more options….
70
Complex example
• If the URL has /work/ in it,
rewrite /work/ to /home/.
• In addition, if the URL did have /work/ in
it, replace “hello.txt” with “bye.txt”.
RewriteRule ^(.*)/work/(.*)$ $1/home/$2 [C]
RewriteRule ^(.*)hello\.txt$ $1bye.txt [L]
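The effect of the [C] flag can be imitated in Python to check the intended behaviour: the second rule is only tried when the first one matched. This is a sketch of the logic, not of Apache itself; the function name `apply_chained` is hypothetical. The second substitution is written as `$1bye.txt` here because `$1` already ends with a slash.

```python
import re


def apply_chained(url: str) -> str:
    """Imitate the two chained rules: /work/ -> /home/, then hello.txt -> bye.txt."""
    first = re.search(r"^(.*)/work/(.*)$", url)
    if first is None:
        # [C]: the first rule did not match, so the chained rule is skipped.
        return url
    url = f"{first.group(1)}/home/{first.group(2)}"
    second = re.search(r"^(.*)hello\.txt$", url)
    if second is not None:
        url = f"{second.group(1)}bye.txt"
    return url
```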
71
RewriteCond
• This directive performs tests (conditions).
• If the test matches, then the next test is
checked.
• If all tests match, then the RewriteRule
which follows the tests is performed.
• If any Cond does not match, processing
skips past the Rule that follows this block of tests.
72
RewriteCond
Basic Form:
RewriteCond TestString ConditionString
• Compares the value of TestString to the
ConditionString.
• ConditionString can be a regular expression.
• TestString can include variables and file tests.
73
Variables:
• Some variables are available:
• REMOTE_ADDR
• REMOTE_HOST
• HTTP_HOST
• REQUEST_URI ( /index.html )
• REQUEST_FILENAME ( /home/mike/www/… )
• …
• Vars can be used as %{REMOTE_ADDR}
74
Flags
• RewriteCond can take 2 flags
– NC – case insensitive
– OR – or the Conds together.
• Normally all Conds have to be true before
the Rule is done.
• With OR the Rule is done if ANY Cond is
true.
75
Example 1
If 10.20.0.5 tries to view
/electro/index.html
redirect the page reference to
/electro/bye.html.
RewriteCond %{REMOTE_ADDR} ^10\.20\.0\.5$
RewriteRule ^/electro/index.html$ /electro/bye.html [L]
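The logic of this Cond/Rule pair can be checked with a short Python imitation: the rule is only applied when the condition on the client address holds. The function name `rewrite_for_client` is an assumption for this sketch.

```python
import re


def rewrite_for_client(remote_addr: str, uri: str) -> str:
    """Imitate Example 1: rewrite the index page only for client 10.20.0.5."""
    # RewriteCond %{REMOTE_ADDR} ^10\.20\.0\.5$
    if not re.search(r"^10\.20\.0\.5$", remote_addr):
        return uri
    # RewriteRule: applied only because the condition above matched.
    if re.search(r"^/electro/index\.html$", uri):
        return "/electro/bye.html"
    return uri
```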
76
Example 2
Rewrite:
• isep.org,
• www.isep.org,
• www.isep.org.pt.
to isep.org.
RewriteEngine on
RewriteCond %{HTTP_HOST} !^isep\.org$
RewriteRule ^(.*)$ http://isep.org$1 [L,R]
77
Example 3
Rewrite *.isep.org to isep.org,
and *.isep.org.pt to isep.org.pt.
RewriteEngine on
RewriteCond %{HTTP_HOST} ^.+\.isep\.org$
RewriteRule ^(.*)$ http://isep.org$1 [L,R]
RewriteCond %{HTTP_HOST} ^.+isep\.org\.pt$
RewriteRule ^(.*)$ http://isep.org.pt$1 [L,R]
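The host patterns of Example 3 can be tested against sample host names with the sketch below. It returns the redirect target, or None when no rule applies. The function name `canonical_redirect` is hypothetical, and the dots in the patterns are escaped so that they only match literal dots.

```python
import re


def canonical_redirect(host: str, path: str):
    """Imitate Example 3: return the redirect URL, or None if no rule applies."""
    # *.isep.org -> isep.org
    if re.search(r"^.+\.isep\.org$", host):
        return "http://isep.org" + path
    # *.isep.org.pt -> isep.org.pt
    if re.search(r"^.+\.isep\.org\.pt$", host):
        return "http://isep.org.pt" + path
    return None
```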
78
Documentation:
• RFC 1945 (HTTP 1.0)
• RFC 2616 (HTTP 1.1)
• Apache HTTP Server Version 2.4 Documentation