Web Essentials: Clients, Servers, and Communication

World Wide Web

Originally, one of several systems for organizing Internet-based information
• Competitors: WAIS, Gopher, ARCHIE
Distinctive feature of the Web: support for hypertext (text containing links)
• Communication via Hypertext Transfer Protocol (HTTP)
• Document representation using Hypertext Markup Language (HTML)

World Wide Web

The Web is the collection of machines (Web servers) on the Internet that provide information, particularly HTML documents, via HTTP.
Machines that access information on the Web are known as Web clients. A Web browser is software used by an end user to access the Web.

Hypertext Transfer Protocol (HTTP)

HTTP is based on the request-response communication model:
• Client sends a request
• Server sends a response
HTTP is a stateless protocol:
• The protocol does not require the server to remember anything about the client between requests.
Date: 02-10-2024 4
HTTP

Normally implemented over a TCP connection (80 is the standard port number for HTTP)
Typical browser-server interaction:
• User enters Web address in browser
• Browser uses DNS to locate IP address
• Browser opens TCP connection to server
• Browser sends HTTP request over connection
• Server sends HTTP response to browser over connection
• Browser displays body of response in the client area of the browser window

HTTP

The information transmitted using HTTP is often entirely text
Can use the Internet's Telnet protocol to simulate a browser request and view the server response

HTTP

Connect:
  $ telnet www.example.org 80
  Trying 192.0.34.166...
  Connected to www.example.com (192.0.34.166).
  Escape character is '^]'.

Send request:
  GET / HTTP/1.1
  Host: www.example.org

Receive response:
  HTTP/1.1 200 OK
  Date: Thu, 09 Oct 2003 20:30:49 GMT

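The Telnet exchange can also be reproduced from a program. The Python sketch below is only an illustration and is not part of the original slides: the helper names `build_request` and `fetch` are hypothetical, and the `Connection: close` header is an added assumption so that the client can simply read until the server closes the connection.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",   # start line
        f"Host: {host}",          # required Host header field
        "Connection: close",      # assumption: ask the server to close when done
        "",                       # blank line ends the header fields
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> str:
    """Open a TCP connection, send the request, and return the raw response text."""
    # create_connection resolves the host name (DNS) and opens the TCP connection.
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:          # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("iso-8859-1")
```

Given network access, `fetch("www.example.org")` would go through the same connect, send, and receive steps as the Telnet session and return the status line, header fields, and body as one string.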
HTTP Request

Structure of the request:
• start line
• header field(s)
• blank line
• optional body

HTTP Request

Start line
• Example: GET / HTTP/1.1
Three space-separated parts:
• HTTP request method
• Request-URI (Uniform Resource Identifier)
• HTTP version
  • We will cover HTTP/1.1, in which the version part of the start line must be exactly as shown

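As a rough illustration of the three-part start line, the following Python sketch splits a start line on spaces and checks the version. The names `StartLine` and `parse_start_line` are hypothetical, and accepting only `HTTP/1.1` is a simplification that mirrors the slides' focus on version 1.1.

```python
from typing import NamedTuple


class StartLine(NamedTuple):
    method: str
    request_uri: str
    version: str


def parse_start_line(line: str) -> StartLine:
    """Split a request start line into method, Request-URI, and HTTP version."""
    parts = line.rstrip("\r\n").split(" ")
    if len(parts) != 3:
        raise ValueError(f"expected three space-separated parts, got {len(parts)}")
    method, request_uri, version = parts
    if version != "HTTP/1.1":
        # Simplification: this sketch only accepts the version covered here.
        raise ValueError(f"unsupported HTTP version: {version!r}")
    return StartLine(method, request_uri, version)


print(parse_start_line("GET / HTTP/1.1\r\n"))
```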
HTTP Request

Uniform Resource Identifier (URI)
• Syntax: scheme : scheme-dependent-part
  • Ex: In http://www.example.com/ the scheme is http
• Request-URI is the portion of the requested URI that follows the host name (which is supplied by the required Host header field)
  • Ex: / is the Request-URI portion of http://www.example.com/

URI

URIs are of two types:
• Uniform Resource Name (URN)
  • Can be used to identify resources with unique names, such as books (which have unique ISBNs)
  • Scheme is urn
• Uniform Resource Locator (URL)
  • Specifies location at which a resource can be found
  • In addition to http, some other URL schemes are https, ftp, mailto, and file

HTTP Request

Common request methods:
• GET
  • Used if link is clicked or address typed in browser
  • No body in request with GET method
• POST
  • Used when submit button is clicked on a form
  • Form information contained in body of request
• HEAD
  • Requests that only header fields (no body) be returned in the response

HTTP Response

Structure of the response:
• status line
• header field(s)
• blank line
• optional body

HTTP Response

Status line
• Example: HTTP/1.1 200 OK
Three space-separated parts:
• HTTP version
• status code
• reason phrase (intended for human use)

HTTP Response

Status code
• Three-digit number
• First digit is class of the status code:
  • 1=Informational
  • 2=Success
  • 3=Redirection (alternate URL is supplied)
  • 4=Client Error
  • 5=Server Error
• Other two digits provide additional information
See http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html

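The status line and the status-code classes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the split is limited to two separators because a reason phrase such as "Not Found" may itself contain spaces.

```python
from typing import Tuple

# Class names taken from the list of status-code classes above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split a status line into HTTP version, status code, and reason phrase."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    if len(code) != 3 or not code.isdigit():
        raise ValueError(f"status code must be three digits: {code!r}")
    return version, int(code), reason


def status_class(code: int) -> str:
    """Return the class named by the first digit of a three-digit status code."""
    return STATUS_CLASSES[code // 100]


version, code, reason = parse_status_line("HTTP/1.1 200 OK")
print(version, code, reason, "->", status_class(code))
```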
HTTP Response

Common header fields:
• Connection, Content-Type, Content-Length
• Date: date and time at which response was generated (required)
• Location: alternate URI if status is redirection
• Last-Modified: date and time the requested resource was last modified on the server
• Expires: date and time after which the client's copy of the resource will be out-of-date
• ETag: a unique identifier for this version of the requested resource (changes if resource changes)

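To show how header fields such as these are laid out on the wire, the Python sketch below reads "Name: value" lines up to the blank line. It is a simplified illustration: `parse_header_fields` is a hypothetical helper, the Content-Type value in the example is made up, and repeated or folded header fields are not handled.

```python
from typing import Dict


def parse_header_fields(head: str) -> Dict[str, str]:
    """Parse header field lines (up to the blank line) into a dict.

    Field names are lowercased because they are case-insensitive in HTTP.
    """
    fields: Dict[str, str] = {}
    for line in head.split("\r\n"):
        if line == "":            # blank line: end of header fields
            break
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header field: {line!r}")
        fields[name.strip().lower()] = value.strip()
    return fields


example = (
    "Date: Thu, 09 Oct 2003 20:30:49 GMT\r\n"
    "Content-Type: text/html\r\n"      # hypothetical value for illustration
    "\r\n"
    "<html>...</html>"
)
print(parse_header_fields(example))
```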
Client Caching

A cache is a local copy of information obtained from some other source
Most web browsers use a cache to store requested resources so that subsequent requests for the same resource will not necessarily require an HTTP request/response
• Ex: icon appearing multiple times in a Web page

Client Caching

[Figure: client (browser and cache) and server (Web server)]
1. HTTP request for image
2. HTTP response containing image
3. Store image in cache

Client Caching

[Figure: the browser needs that image again. It can either send another HTTP request and receive an HTTP response containing the image, or get the image from the cache.]

Client Caching

Cache advantages
• (Much) faster than HTTP request/response
• Less network traffic
• Less load on server
Cache disadvantage
• Cached copy of resource may be invalid (inconsistent with remote version)

Client Caching

Validating cached resource:
• Send HTTP HEAD request and check Last-Modified or ETag header in response
• Compare current date/time with Expires header sent in response containing resource
• If no Expires header was sent, use heuristic algorithm to estimate value for Expires
  • Ex: Expires = 0.01 * (Date - Last-Modified) + Date

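The heuristic can be worked through with concrete values. In the Python sketch below, `estimate_expires` is a hypothetical helper implementing the example formula with its 0.01 factor, and the Last-Modified value (100 days before the Date) is made up for illustration.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime


def estimate_expires(date: datetime, last_modified: datetime,
                     factor: float = 0.01) -> datetime:
    """Estimate Expires as factor * (Date - Last-Modified) + Date."""
    return date + factor * (date - last_modified)


# Date header from the earlier example response; Last-Modified is hypothetical.
date = parsedate_to_datetime("Thu, 09 Oct 2003 20:30:49 GMT")
last_modified = parsedate_to_datetime("Tue, 01 Jul 2003 20:30:49 GMT")

# The resource is 100 days old, so 0.01 * 100 days = 1 day of estimated freshness.
print(estimate_expires(date, last_modified))
```

With these values the cached copy would be treated as fresh for one day after the response was generated. A resource that changed recently gets a correspondingly shorter estimate.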
Web Clients

Many possible web clients:
• Text-only “browser” (lynx)
• Mobile phones
• Robots (software-only clients, e.g., search engine “crawlers”)
• etc.
We will focus on traditional web browsers

Web Browsers

First graphical browser running on general-purpose platforms: Mosaic (1993)

Web Browsers

Primary tasks:
• Convert web addresses (URLs) to HTTP requests
• Communicate with web servers via HTTP
• Render (appropriately display) documents returned by a server

HTTP URLs

http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5
• host (FQDN): www.example.org
• port: 56789
• path: /a/b/c.txt
• query: t=win&s=chess
• fragment: para5
• authority (host and port): www.example.org:56789
• Request-URI (path and query): /a/b/c.txt?t=win&s=chess

Browser uses authority to connect via TCP
Request-URI included in start line (/ used for path if none supplied)
Fragment identifier not sent to server (used to scroll browser client area)

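The same decomposition can be checked with Python's standard `urllib.parse` module. This is a small sketch rather than a description of how any particular browser works: `split_http_url` is a hypothetical helper, and the default of port 80 applies only to the http scheme.

```python
from urllib.parse import urlsplit


def split_http_url(url: str) -> dict:
    """Break an http URL into the parts a browser needs."""
    parts = urlsplit(url)
    path = parts.path or "/"                      # "/" is used if no path is supplied
    request_uri = path + ("?" + parts.query if parts.query else "")
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else 80,  # http default
        "authority": parts.netloc,                # used to open the TCP connection
        "request_uri": request_uri,               # sent in the request start line
        "fragment": parts.fragment,               # kept by the browser, not sent
    }


info = split_http_url("http://www.example.org:56789/a/b/c.txt?t=win&s=chess#para5")
for name, value in info.items():
    print(f"{name}: {value}")
```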
Web Browsers

Standard features
• Save web page to disk
• Find string in page
• Fill forms automatically (passwords, CC numbers, …)
• Set preferences (language, character set, cache and HTTP parameters)
• Modify display style (e.g., increase font sizes)
• Display raw HTML and HTTP header info (e.g., Last-Modified)
• Choose browser themes (skins)
• View history of web addresses visited
• Bookmark favorite pages for easy return

Web Browsers

Additional functionality:
• Execution of scripts (e.g., drop-down menus)
• Event handling (e.g., mouse clicks)
• GUI for controls (e.g., buttons)
• Secure communication with servers
• Display of non-HTML documents (e.g., PDF) via plug-ins

Web Servers

Basic functionality:
• Receive HTTP request via TCP
• Map Host header to specific virtual host (one of many host names sharing an IP address)
• Map Request-URI to specific resource associated with the virtual host
  • File: Return file in HTTP response
  • Program: Run program and return output in HTTP response
• Map type of resource to appropriate MIME type and use to set Content-Type header in HTTP response
• Log information about the request and response

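The mapping steps can be sketched in Python for the simple case where every resource is a file. Everything specific here is an assumption made for illustration: the `VIRTUAL_HOSTS` table, its directory names, the `resolve` helper, and the use of index.html as the welcome file. Real servers make all of this configurable and do considerably more checking.

```python
import mimetypes
import posixpath
from typing import Tuple

# Hypothetical virtual hosts sharing one server, each with its own document root.
VIRTUAL_HOSTS = {
    "www.example.org": "/srv/example-org",
    "www.example.com": "/srv/example-com",
}
WELCOME_FILE = "index.html"  # assumed welcome file name


def resolve(host_header: str, request_uri: str) -> Tuple[str, str]:
    """Map a Host header and Request-URI to (file path, Content-Type)."""
    host = host_header.split(":", 1)[0].lower()   # drop an optional port
    doc_root = VIRTUAL_HOSTS[host]                # KeyError for an unknown host
    path = request_uri.split("?", 1)[0]           # ignore the query string
    if path.endswith("/"):
        path += WELCOME_FILE
    path = posixpath.normpath(path)               # collapse "." and ".." segments
    file_path = doc_root + path
    content_type, _encoding = mimetypes.guess_type(file_path)
    return file_path, content_type or "application/octet-stream"


print(resolve("www.example.org", "/"))
print(resolve("www.example.com:80", "/a/b/c.txt?t=win"))
```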
Web Servers

httpd: UIUC, primary Web server c. 1995
Apache: “A patchy” version of httpd, now the most popular server (esp. on Linux platforms)
IIS: Microsoft Internet Information Server
Tomcat:
• Java-based
• Provides container (Catalina) for running Java servlets (HTML-generating programs) as back-end to Apache or IIS
• Can run stand-alone using Coyote HTTP front-end

Tomcat Web Server

HTML-based server administration
Browse to http://localhost:8080 and click on Server Administration link
• localhost is a special host name that means “this machine”

Tomcat Web Server

Context provides mapping from Request-URI path to a web application
Document Base field is directory (possibly relative to Application Base) that contains resources for this web application
For this example, browsing to http://localhost:8080/ returns resource from c:\jwsdp-1.3\webapps\ROOT
• Returns index.html (standard welcome file)

Secure Servers

Man-in-the-Middle Attack
[Figure: the browser asks "What's IP address for www.example.org?" and a fake DNS server answers 100.1.1.1, the address of a fake www.example.org. The browser then sends "My credit card number is…" to the fake server rather than the real www.example.org.]

Secure Servers

Preventing Man-in-the-Middle
[Figure: same setup with the fake DNS server and the fake www.example.org at 100.1.1.1, but the browser now asks the server "Send me a certificate of identity".]

Internet Protocol (IP)

IP is the fundamental protocol defining the Internet (as the name implies!)
IP address:
• 32-bit number (in IPv4)
• Associated with at most one device at a time (although device may have more than one)
• Written as four dot-separated bytes, e.g. 192.0.34.166

Guy-Vincent Jourdan :: CSI 3140 :: based on Jeffrey C. Jackson’s slides 45
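The relationship between the dotted notation and the underlying 32-bit number can be made concrete. In the Python sketch below, `dotted_to_int` is a hypothetical helper written out by hand for clarity, and the standard `ipaddress` module is used only as a cross-check.

```python
import ipaddress


def dotted_to_int(addr: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer value."""
    octets = [int(part) for part in addr.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted IPv4 address: {addr!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # shift in one byte at a time
    return value


n = dotted_to_int("192.0.34.166")
print(n, format(n, "032b"))
# Cross-check against the standard library's own conversion.
print(n == int(ipaddress.IPv4Address("192.0.34.166")))
```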


IP

IP function: transfer data from source device to destination device
IP source software creates a packet representing the data
• Header: source and destination IP addresses, length of data, etc.
• Data itself
If destination is on another LAN, packet is sent to a gateway that connects to more than one network

IP

[Figure: Source, Destination, and two gateways connecting Network 1, Network 2, and Network 3]

IP

[Figure: Source, Destination, and two gateways connecting LAN 1, the Internet Backbone, and LAN 2]


Transmission Control Protocol (TCP)

Limitations of IP:
• No guarantee of packet delivery (packets can be dropped)
• Communication is one-way (source to destination)
TCP adds concept of a connection on top of IP
• Provides a guarantee that packets are delivered
• Provides two-way (full duplex) communication


TCP

[Figure: message exchange between Source and Destination]
Establish connection:
• Source: Can I talk to you?
• Destination: OK. Can I talk to you?
• Source: OK.
Send packet with acknowledgment:
• Source: Here's a packet.
• Destination: Got it.
Resend packet if no (or delayed) acknowledgment:
• Source: Here's a packet.
• Source: Here's a resent packet.
• Destination: Got it.
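The acknowledge-and-resend idea can be illustrated with a toy simulation in Python. This is deliberately not real TCP, which uses sequence numbers, timers, and windows of in-flight data. Here `send_reliably` and its `ack_arrives` callback are hypothetical names, and a missing acknowledgment is simulated by the callback returning False.

```python
from typing import Callable, List, Sequence, Tuple


def send_reliably(packets: Sequence[str],
                  ack_arrives: Callable[[int, int], bool],
                  max_tries: int = 5) -> List[Tuple[str, int]]:
    """Send packets one at a time, resending until each is acknowledged.

    ack_arrives(seq, attempt) simulates the network: it returns True if the
    acknowledgment for packet number seq arrives on the given attempt.
    """
    log: List[Tuple[str, int]] = []
    for seq, _payload in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            log.append(("send" if attempt == 1 else "resend", seq))
            if ack_arrives(seq, attempt):
                log.append(("ack", seq))
                break
        else:
            raise TimeoutError(f"packet {seq} was never acknowledged")
    return log


# Simulate losing the first transmission of packet 1 only.
events = send_reliably(["a", "b"], lambda seq, attempt: not (seq == 1 and attempt == 1))
print(events)
```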


TCP

TCP also adds concept of a port
• TCP header contains port number representing an application program on the destination computer
• Some port numbers have standard meanings
  • Example: port 25 is normally used for email transmitted using the Simple Mail Transfer Protocol (SMTP)
• Other port numbers are available first-come, first-served to any application


User Datagram Protocol (UDP)

Like TCP in that:
• Builds on IP
• Provides port concept
Unlike TCP in that:
• No connection concept
• No transmission guarantee
Advantage of UDP vs. TCP:
• Lightweight, so faster for one-time messages

Domain Name System (DNS)

DNS is the “phone book” for the Internet
• Map between host names and IP addresses
• DNS often uses UDP for communication
Host names
• Labels separated by dots, e.g., www.example.org
• Final label is top-level domain
  • Generic: .com, .org, etc.
  • Country-code: .us, .il, etc.


DNS

Domains are divided into second-level domains, which can be further divided into subdomains, etc.
• E.g., in www.example.com, example is a second-level domain
A host name plus domain name information is called the fully qualified domain name of the computer
• Above, www is the host name, www.example.com is the FQDN

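The label structure of a host name is easy to see by splitting on dots. The Python sketch below is a simple illustration: `describe_fqdn` is a hypothetical helper, and it assumes the first label is the host name, as in the www.example.com example.

```python
def describe_fqdn(fqdn: str) -> dict:
    """Split a fully qualified domain name into its labels."""
    labels = fqdn.rstrip(".").split(".")
    if len(labels) < 2 or any(not label for label in labels):
        raise ValueError(f"not a fully qualified domain name: {fqdn!r}")
    return {
        "host": labels[0],               # assumed: first label names the machine
        "domain": ".".join(labels[1:]),  # remaining labels
        "second_level": labels[-2],
        "top_level": labels[-1],         # final label
    }


print(describe_fqdn("www.example.com"))
```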
DNS

nslookup program provides command-line access to DNS (on most systems)
Looking up a host name given an IP address is known as a reverse lookup
• Recall that a single host may have multiple IP addresses.
• Address returned is the canonical IP address specified in the DNS system.

DNS

ipconfig (on Windows) can be used to find the IP address (addresses) of your machine
ipconfig /displaydns displays the contents of the DNS Resolver Cache (ipconfig /flushdns to flush it)


Analogy to Telephone Network

IP ~ the telephone network
TCP ~ calling someone who answers, having a conversation, and hanging up
UDP ~ calling someone and leaving a message
DNS ~ directory assistance