Introduction to Networking

Internet: Example

• Click -> get page• Specifies

- protocol (http) - location (www.cnn.com)

Internet: Locating Resource

• www.cnn.com – name of a computer – Implicitly also a file

• Map name to IP address– DNS

host local comcnn.com? cnn.com?

a.b.c.d a.b.c.d

Internet: Connection

• Http sets up a connection (tcp) – between the host and cnn.com to transfer the page

• The connection transfers page as a byte stream – without errors: flow control + error control

Host www.cnn.com

Connect

OK

Get page

Page; close

Internet: End-to-end• Byte stream flows end to end across many links/switches:

– routing (+ addressing)

• That stream is regulated and controlled by both ends:– retransmission of erroneous or missing bytes; flow control

HOST

CNN.COM

end-to-end pacing and error control

routing

Internet: Packets• The network transports bytes grouped into packets• Packets are “self-contained”; routers handle them 1 by 1• The end hosts worry about errors and pacing

– Destination sends ACKs; Source checks losses

C

HOST: B

CNN.COM: AA | B | # , CRC | bytes

B: toC

Internet: Bits

• Equipment in each node sends packets as string of bits• That equipment is not aware of the meaning of the bits• Frames (packetizing) vs. streams

01011...011...110

Transmitter Physical Medium Receiver

01011...011...110

OpticalCopperWireless

Internet: Points to remember

• Separation of tasks– send bits on a link: transmitter/receiver [clock, modulation,…]

– send packet on each hop [framing, error detection,…]

– send packet end to end [addressing, routing]

– pace transmissions [detect congestion]

– retransmit erroneous or missing packets [acks, timeout]

– find destination address from name [DNS]

• Scalability– routers don’t know full path– names and addresses are hierarchical

Internet : Challenges

• Addressing ?• Routing ?• Reliable transmission ?• Interoperability ?• Resource management ?• Quality of service ?

Concepts at heart of the Internet

• Protocol• Layered Architecture• Packet Switching• Distributed Control• Open System

Protocol

• Two communicating entities must agree on:– Expected order and meaning of messages they exchange– The action to perform on sending/receiving a message

• Asking the time

Layered Architectures

• Human beings can handle lots of complexity in their protocol processing.– Ambiguously defined protocols– Many protocols all at once

• How computers manage complex protocol processing?– Specify well defined protocols to enact.– Decompose complicated jobs into layers;

• each has a well defined task

Layered Architectures

• Break-up design problem into smaller problems– More manageable

• Modular design: easy to extend/modify.• Difficult to implement

– careful with interaction of layers for efficiency

Layered Architecture

Web, e-mail, file transfer, ...

Reliable/ordered transmission, QOS,security, compression, ...

End-to-end transmission,resource allocation, routing, ...

Point-to-point links,LANs, radios, ...

Applications

Middleware

Routing

Physical Links

usersnetwork

The OSI Model

• Open Systems Interconnect model is a standard way of understanding conceptual layers of network comm.

• This is a model, nobody builds systems like this.• Each level provides certain functions and guarantees, and

communicates with the same level on remote notes.• A message is generated at the highest level, and is passed

down the levels, encapsulated by lower levels, until it is sent over the wire.

• On the destination, it makes its way up the layers,until the high-level msg reaches its high-level destination.

OSI Levels

Presentation

Transport

Network

Data Link

Physical

Application

Presentation

Transport

Network

Data Link

Physical

ApplicationNode A Node B

Network

OSI Levels

• Physical Layer: electrical details of bits on the wire• Data Link: sending “frames” of bits and error detection• Network Layer:” routing packets to the destination• Transport Layer: reliable transmission of messages,

disassembly/assembly, ordering, retransmission of lost packets• Session Layer; really part of transport, typ. Not impl.• Presentation Layer: data representation in the message• Application: high-level protocols (mail, ftp, etc.)

Internet protocol stack

HTTP, SMTP, FTP, TELNET, DNS, …

TCP, UDP.

IP

Point-to-point links,LANs, radios, ...

Application

Transport

Network

Physical

usersnetwork

Air travel

Ticket (purchase)

Baggage (check)

Gates (load)

Runway (take off)

Passenger Origin

Ticket (complain)

Baggage (claim)

Gates (unload)

Runway (landing)

Passenger Destination

Airplane routing

Protocol stack

e-mail client

TCP server

IP server

ethernetdriver/card

user X

SMTP

TCP

IP

e-mail server

TCP server

IP server

ethernetdriver/card

user Y

IEEE 802.3 standard

electric signals

English

Protocol interfaces

e-mail client

TCP server

IP server

ethernetdriver/card

user X

e-mail server

TCP server

IP server

ethernetdriver/card

user Y

s = open_socket();socket_write(s, buffer);…

Addressing

• Each network interface has a hardware address– Multiple interfaces multiple addresses

• Each application communicates via a port– Port is a logical connection endpoint

– Allows multiple local applications to use network resources

– Up to 65535• < 1024 : used by privileged applications• 1024 ≤ available for use ≤ 49151• 49152 ≤ Dynamic ports/private ports ≤ 65535

– http ports 80 and 8080

– telnet 23, ftp 21, etc

• Think of a telephone network …

Addressing and Packet Format

• The ``Data'' segment contains higher level protocol information.

– Which protocol is this packet destined for?

– Which process is the packet destined for?

– Which packet is this in a sequence of packets?

– What kind of packet is this?

• This is the stuff of the OSI reference model.

Start (7 bytes)

Destination (6)

Source (6)

Length (2)

Msg Data (1500)

Checksum (4)

Ethernet packet dispatching

• An incoming packet comes into the Ethernet controller.• The Ethernet controller reads it off the network into a buffer.• It interrupts the CPU.• A network interrupt handler reads the packet out of the controller into

memory.• A dispatch routine looks at the Data part and hands it to a higher level

protocol• The higher level protocol copies it out into user space.• A program manipulates the data.• The output path is similar.• Consider what happens when you send mail.

Example: MailHi Dad.

Hi Dad.

To: Dad

SrcAddr: 128.95.1.2DestAddr: 128.95.1.3SrcPort: 110, DestPort: 110Bytes: 1-20

Hi Dad.

To: Dad

SrcEther: 0xdeadbeefDestEther: 0xfeedfaceSrcAddr: 128.95.1.2DestAddr: 128.95.1.3SrcPort: 100DestPort: 200Bytes: 1-20

Hi Dad.

To: Dad

Mail Composition And Display

Mail Transport Layer

Network Transport Layer

Link Layer

Network

User

Kernel

Hi Dad.

Hi Dad.

To: Dad

SrcAddr: 128.95.1.2DestAddr: 128.95.1.3SrcPort: 110, DestPort: 110Bytes: 1-20

Hi Dad.

To: Dad

SrcEther: 0xdeadbeefDestEther: 0xfeedfaceSrcAddr: 128.95.1.2DestAddr: 128.95.1.3SrcPort: 100DestPort: 200Bytes: 1-20

Hi Dad.

To: Dad

Protocol encapsulation

e-mail client

TCP server

IP server

ethernetdriver/card

user X

e-mail server

TCP server

IP server

ethernetdriver/card

user Y“Hello”

“Hello”

“Hello”

“Hello”

“Hello”

End-to-End Argument

• What function to implement in each layer?• Saltzer, Reed, Clarke 1984

– A function can be correctly and completely implemented only with the knowledge and help of applications standing at the communication endpoints

– Argues for moving function upward in a layered architecture

• Should the network guarantee packet delivery ?– Think about a file transfer program

– Read file from disk, send it, the receiver reads packets and writes them to the disk

End-to-End Argument

• If the network guaranteed packet delivery– one might think that the applications would be simpler

• No need to worry about retransmits– But need to check that file was written to the remote disk

intact

• A check is necessary if nodes can fail– Consequently, applications need to perform their retransmits

• No need to burden the internals of the network with properties that can, and must, be implemented at the periphery

End-to-End Argument

• An Occam’s razor for Internet design– If there is a problem, the simplest explanation is probably the

correct one

• Application-specific properties are best provided by the applications, not the network– Guaranteed, or ordered, packet delivery, duplicate

suppression, security, etc.

• The internet performs the simplest packet routing and delivery service it can– Packets are sent on a best-effort basis– Higher-level applications do the rest

Two ways to handle networking

• Circuit Switching– What you get when you make a phone call– Dedicated circuit per call

• Packet Switching– What you get when you send a bunch of letters– Network bandwidth consumed only when sending– Packets are routed independently

Circuit Switching

• End-to-end resources reserved for “call”– Link bandwidth, switch capacity

– Dedicated resources: no sharing

– Circuit-like (guaranteed) performance

– Call setup required

Packet Switching

• Each end-to-end data stream divided into packets– User’s packets share network resources

• Compared to dedicated allocation

– Each packet uses full link bandwidth• Compared to dividing bandwidth into pieces

– Resources are used as needed• Compared to resource reservation

• Resource contention:– Aggregate demand can exceed amount available– Congestion: packets queue, wait for link use– Store and forward: packets move one hop at a time

• Transmit over link• Wait turn at next link

Routing

• Goal: move data among routers from source to dest.• Datagram packet network:

– Destination address determines next hop

– Routes may change during session

– Analogy: driving, asking directions

– No notion of call state

• Circuit-switched network:– Call allocated time slots of bandwidth at each link

– Fixed path (for call) determined at call setup

– Switches maintain lots of per call state: resource allocation

Packet vs. Circuit Switching

• Reliability: no congestion, in-order data in circuit-switch• Packet switching: better bandwidth use• State, resources: packet switching has less state

– Good: less control plane processing resources along the way

– More data plane (address lookup) processing

• Failure modes (routers/links down)– Packet switch reconfigures sub-second timescale

– Circuit switching: more complicated• Involves all switches in the path

A small Internet

A

V

R

B

W

r1,e1r2,e2

r3

a,e3

w,e5 b,e4

Scenario:A wants to send data to B.

Packet forwarding

HTTP

TCP

IP

ethernet

Host A

IP

eth

Router R

link

HTTP

TCP

IP

ethernet

Router W

Host B

IP

ethlink

The Link Layer

What is purpose of this layer?

• Physically encode bits on the wire• Link = pipe to send information

– E.g. point to point or broadcast

• Can be built out of:– Twisted pair, coaxial cable, optical fiber, radio waves, etc

• Links should only be able to send data– Could corrupt, lose, reorder, duplicate, (fail in other ways)

How to connect routers/machines?

• WAN/Router Connections– Commercial:

• T1 (1.5 Mbps), T3 (44 Mbps)• OC1 (51 Mbps), OC3 (155 Mbps)• ISDN (64 Kbps)• Frame Relay (1-100 Mbps, usually 1.5 Mbps)• ATM (some Gbps)

– To your home:• DSL• Cable

• Local Area:– Ethernet: IEEE 802.3 (10 Mbps, 100 Mbps, 1 Gbps, 10Gbps)– Wireless: IEEE 802.11 b/g/a (11 Mbps, 22 Mbps, 54 Mbps)

Link level Issues

• Encoding: map bits to analog signals• Framing: Group bits into frames (packets)• Arbitration: multiple senders, one resource• Addressing: multiple receivers, one wire

Encoding

• Map 1s and 0s to electric signals• Simple scheme: Non-Return to Zero (NRZ)

– 0 = low voltage, 1 = high voltage

• Problems:– How to tell an error? When jammed? When is bus idle?

– When to sample? Clock recovery is difficult.• Idea: Recover clock using encoding transitions

1 0 1 1 0

Manchester Encoding

• Used by Ethernet• Idea: Map 0 to low-to-high transition, 1 to high-to-low

• Plusses: can detect dead-link, can recover clock• Bad: reduce bandwidth, i.e. bit rate = ½ baud rate

– If wire can do X transition per second?

0 1 1 0

Framing

• Why send packets?– Error control

• How do you know when to stop reading?– Sentinel approach: send start and end sequence

– For example, if sentinel is 11111

– 11111 00101001111100 11111 10101001 11111 010011 11111

– What if sentinel appears in the data?• map sentinel to something else, receiver maps it back

– Bit stuffing

Example: HDLC

• Same sentinel for begin and end: 0111 1110• packet format:

• Bit stuffing– Sender: If 5 1s then insert a 0

– Receiver: if 5 1s followed by a 0, remove 0

• Else read next bit

• Packet size now depends on the contents

0111 1110 header data CRC 0111 1110

0111 1110 0111 1101 0

0111 1101 0 0111 1110

Arbitration

• One medium, multiple senders– What did we do for CPU, memory, readers/writers?

– New Problem: No centralized control

• Approaches– TDMA: Time Division Multiple Access

• Divide time into slots, round robin among senders• If you exceed the capacity do not admit more (busy signal)

– FDMA: Frequency Division Multiple Access (AMPS)• Divide spectrum into channels, give each sender a channel• If no more channels available, give a busy signal

– Good for continuous streams: fixed delay, constant data rate

– Bad for bursty Internet traffic: idle slots

Ethernet

• Developed in 1976, Metcalfe and Boggs at Xerox• Uses CSMA/CD:

– Carrier Sense Multiple Access with Collision Detection

• Easy way to connect LANs

Metcalfe’s Ethernet sketch

CSMA/CD

• Carrier Sense:– Listen before you speak

• Multiple Access:– Multiple hosts can access the network

• Collision Detection:– Can make out if someone else started speaking

Older Ethernet Frame

CSMA

Wait until

carrier free

CSMA/CA

Garbled signals

If the sender detects a collision, it will stop and then retry!

What is the problem?

CSMA/CD

Packet?

Sense Carrier

Discard Packet

Send Detect Collision

Jam channel b=CalcBackoff();

wait(b);attempts++;

No

Yes

attempts < 16

attempts == 16

Ethernet’s CSMA/CD (more)

Jam Signal: make sure all other transmitters are aware of collision; 48 bits;

Exponential Backoff: • Goal: adapt retransmission attempts to estimated current load

– heavy load: random wait will be longer• first collision: choose K from {0,1}; delay is K x 512 bit

transmission times• after second collision: choose K from {0,1,2,3}…• after ten or more collisions, choose K from {0,1,2,3,4,…,1023}

Packet Size

If packets are too small, the collision goes unnoticed Limit packet size Limit network diameter Use CRC to check frame integrity truncated packets are filtered out

Ethernet Problems

• What if there is a malicious user?– Might not use exponential backoff– Might listen promiscuously to packets

• Integrating Fast and Gigabit Ethernet

Addressing & ARP

• ARP is used to discover physical addresses• ARP = Address Resolution Protocol

“What is the physical address of the host named 128.84.96.89”

128.84.96.90128.84.96.89

128.84.96.91

“I’m at 1a:34:2c:9a:de:cc”

Addressing & RARP

• RARP is used to discover virtual addresses• RARP = Reverse Address Resolution Protocol

“I just got here. My physical address is 1a:34:2c:9a:de:cc. What’s my name ?”

128.84.96.90RARP Server

???

128.84.96.91

“Your name is 128.84.96.89”

Repeaters and Bridges

• Both connect LAN segments• Usually do not originate data• Repeaters (Hubs): physical layer devices

– forward packets on all LAN segments– Useful for increasing range– Increases contention

• Bridges: link layer devices– Forward packets only if meant on that segment– Isolates congestion– More expensive

Backbone Bridge

The Network Layer

Purpose of Network layer

• Given a packet, send it across the network to destination• 2 key issues:

– Portability: • connect different technologies

– Scalability• To the Internet scale

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

application

transport

networkdata link

physical

application

transport

networkdata link

physical

What does it involve?

Two important functions:• routing: determine path from source to dest. • forwarding: move packets from router’s input to

output

T1T3

Sts-1

T3

T1

Network service modelQ: What service model for

“channel” transporting packets from sender to receiver?

• guaranteed bandwidth?

• preservation of inter-packet timing (no jitter)?

• loss-free delivery?

• in-order delivery?

• congestion feedback to sender?? ??

virtual circuitor

datagram?

The most important abstraction provided

by network layer:

service abstraction

Which things can be “faked” at the transport layer?

Two connection models• Connectionless (or “datagram”):

– each packet contains enough information that routers can decide how to get it to its final destination

• Connection-oriented (or “virtual circuit”)– first set up a connection between two nodes– label it (called a virtual circuit identifier (VCI))– all packets carry label

BAb b

C

BA1 1

C

1

Virtual circuits: signaling protocols

• used to setup, maintain teardown VC• setup gives opportunity to reserve resources• used in ATM, frame-relay, X.25• not used in today’s Internet

application

transportnetwork

data linkphysical

application

transportnetwork

data linkphysical

1. Initiate call 2. incoming call

3. Accept call4. Call connected5. Data flow begins 6. Receive data

Virtual circuit switching• Forming a circuit:

– send a connection request from A to B. Contains VCI + address of B– rule: VCI must be unique on the link its used on– switch creates an entry mapping input messages with VCI to output

port– switch picks a new VCI unique between it and next switch

a b

25

21

c12

1

(Input link,VCI) (output link, new VCI)(1, 2) (?, ?)(1, 5) (?, ?)

Virtual circuit forwarding• For each VCI switch has a table which maps input link to

output link and gives the new VCI to use– if a’s messages come into switch 1 on link 2 and go out on link 3 then

the table will be:

a b

25

21

c12

1

Switch 1

Switch 2

Switch 3

Virtual Circuits: Discussion

• Plusses: easy to associate resources with VC– Easy to provide QoS guarantees (bandwidth, delay)– Very little state in packet

• Minuses:– Not good in case of crashes

• Requires explicit connect and teardown phases

– What if teardown does not get to all routers?– What if one switch crashes?

• Will have to teardown and rebuild route

Datagram networks• no call setup at network layer• routers: no state about end-to-end connections

– no network-level concept of “connection”

• packets typically routed using destination host ID– packets between same source-dest pair may take different paths

• Best effort: data corruption, packet drops, route loops

application

transportnetwork

data linkphysical

application

transportnetwork

data linkphysical

1. Send data 2. Receive data

Datagrams: ForwardingHow does packet get to the destination?• switch creates a “forwarding table”, mapping destinations to

output port (ignores input ports)• when a packet with a destination address in the table arrives, it

pushes it out on the appropriate output port• when a packet with a destination address not in the table

arrives, it must find out more routing information (next problem)

a b

c1d

22

00S1

S2

S3

1

01

Datagrams

• Plusses:– No round trip connection setup time

– No explicit route teardown

– No resource reservation each flow could get max bandwidth

– Easily handles switch failures; routes around it

• Minuses– Difficult to provide resource guarantees

– Higher per packet overhead

• Internet uses datagrams: IP (Internet Protocol)

Datagrams Forwarding

• How to build forwarding tables?– Manually enter it

• What if nodes crashed

• What about scale?

• The graph-theoretic routing problem– Given a graph, with vertices (switches), edges (links), and

edge costs (cost of sending on that link)– Find the least cost path between any two nodes

• Path cost = (cost of edges in path)

Simple Routing Algorithm

• Choose a central node– All nodes send their (nbr, cost) information to this node

– Central node uses info to learn entire topology of the network

– It then computes shortest paths between all pairs of nodes• Using All Pair Shortest Path Algorithm

– Sends the new matrix to every node

• Nice, simple, elegant!• What is the problem?

– Scalability: centralization hurts scalability

– Central node is “crushed” with traffic

Link State Routing

• Basic idea:– Every node propagates its (nbr, cost) information

– This information at all nodes is enough to construct topology

– Can use a graph algorithm to find the shortest routes

• Mechanisms required:– Reliable flooding of link information

– Method to calculate shortest route (Dijkstra’s algorithm)

• Example link state update packet:– [node id, (nbr, cost) list, seq. no., ttl]

• Seq. no. to identify latest updates, ttl specifies when to stop msg.

Reliable flooding

receive(pkt) If already have a copy of LSP from pkt.ID

if pkt’s sequence number <= copy’sdiscard pkt

elsedecrement pkt.TTLreplace copy with pktforward pkt to all links besides the one that we received it on

# done every 10 minutes or sogen_LSP()

increment node’s sequence # by onerecompute cost vectorsend created LSP to all neighbors

Discussion: Link-State Routing• Plusses:

– Simple, determines the optimal route most of the time– Used by OSPF

• Minuses:– Might have oscillations

– Avoid using load as cost metric, reduce herding effect

A

D

C

B1 1+e

e0

e

1 1

0 0

A

D

C

B2+e 0

001+e1

A

D

C

B0 2+e

1+e10 0

A

D

C

B2+e 0

e01+e1

Initially start withalmost equal routes

… everyone goes with least loaded

… recomputeLeast loaded =>

Most loaded

… recompute

Is our routing algorithm scalable?

• Route table size grows with size of network– Because our address structure is flat!

• Solution: have a hierarchical structure– Used by OSPF– Divide the network into areas, each area has unique number

• Nodes carry their area number in the address 1.A, 2.B, etc.

– Nodes know complete topology in their area– Area border routers (ABR) know how to get to any other

area

Hierarchical Addressing

1.a3.b

2.b11.b2

2

00S1

S2

S3

1

01

2.a

3

3.a

2

Zone 3

Zone 2

Forwarding table for switch 1Destination switch port

2. ?3. ? 1.b ? 1.a ?

IP has 2-layer addressing• Each IP address is 32 bits

– Network part: which network the host is on?– Host part: identifies the host.

• All hosts on same network have the same network part

• 3 classes of addresses: A, B and C

18.26.0.1

network 32-bits host

0 net host

1 7 24 bits

1 0 net host

2 14 16 bits

110 net host

3 21 8 bits

IP addressing• The different classes:

• Problems: inefficient, address space exhaustion

0network host

10 network host

110 network host

1110 multicast address

A

B

C

D

class1.0.0.0 to127.255.255.255

128.0.0.0 to191.255.255.255

192.0.0.0 to223.255.255.255

224.0.0.0 to239.255.255.255

32 bits

Unicast

Multicast

1111 reservedE 240.0.0.0 to255.255.255.255Reserved

IP addressing: CIDR

• Classless InterDomain Routing– network portion of address of arbitrary length– address format: a.b.c.d/x, where x is # bits in network portion

– Examples:• Class A: /8• Class B: /16• Class C: /24

11001000 00010111 00010000 00000000

networkpart

hostpart

200.23.16.0/23

Internet Protocol Datagram

ver length

32 bits

data (variable length,typically a TCP or UDP segment)

16-bit identifier Internet

checksumtime tolive

32 bit source IP address

IP protocol versionNumber

header length

max numberremaining hops

(decremented at each router)

forfragmentation/reassembly

total datagramlength (bytes)

upper layer protocolto deliver payload to

head.len

type ofservice

“type” of data flgsfragment offset

upper layer

32 bit destination IP address

Options (if any) E.g. timestamp,record routetaken, pecifylist of routers to visit.

Datagram Portability

• IP Goal: To create one logical network from multiple physical networks– All intermediate routers should understand IP

– IP header information sufficient to carry the packet to destination

– Goal: Run over anything!

• Problem:– Physical networks have different MTUs

– “max. transmission unit”: 1500 for Ethernet, 48 for ATM

• Solution 1:– Fit everything in the MTU (!)

IP Fragmentation & Reassembly

• Solution 2: (the one used)– If packet size > MTU of network, then fragment into pieces

• Each fragment is less than MTU size• Each has IP headers + frag bit set + frag id + offset

– Packets may get refragmented on the way to destination

– Reassembly only done at the destination

– What is a good initial packet size?

fragmentation: in: one large datagramout: 3 smaller datagrams

reassembly

Internet: Names and Addresses

Naming in the Internet

• What are named? All Internet Resources.– Objects: www.cs.cornell.edu/~einar

– Services: weather.yahoo.com/forecast

– Hosts: planetlab1.cs.cornell.edu

• Characteristics of Internet Names– human recognizable

– unique

– Persistent?

• Universal Resource Names (URNs)

Locating the resources

• Internet services and resources are provided by end-hosts– ex. www1.cs.cornell.edu and www2.cs.cornell.edu host Einar’s home

page.

• Names are mapped to Locations– Universal Resource Locators (URL)

– Embedded in the name itself: ex. weather.yahoo.com/forecast

• Semantics of Internet naming human recognizable

uniqueness

x persistent

Locating the Hosts?

• Internet Protocol Addresses (IP Addresses)– ex. planetlab1.cs.cornell.edu 128.84.154.49

• Characteristics of IP Addresses– 32 bit fixed-length

– enables network routers to efficiently handle packets in the Internet

• Locating services on hosts– port numbers (16 bit unsigned integer) 65536 ports

– standard ports: HTTP 80, FTP 20, SSH 22, Telnet 20

Mapping Not 1 to 1

• One host may map to more than one name – One server machine may be the web server (www.foo.com),

mail server (mail.foo.com)etc.

• One host may have more than one IP address– IP addresses are per network interface

• But IP addresses are generally unique!– two globally visible machines should not have the same IP

address– Anycast is an Exception:

• routers send packets dynamically to the closest host matching an anycast address

How to get a name?

• Naming in Internet is Hierarchical– decreases centralization– improves name space management

• First, get a domain name then you are free to assign sub names in that domain– How to get a domain name coming up

• Example: weather.yahoo.com belongs to yahoo.com which belongs to .com– regulated by global non-profit bodies

Domain name structure

ccTLDs

root (unnamed)

com milgovedu grorgnet fr ukus ......

cornell ustreas second level (sub-)domainslucent

gTLDs

gTLDs= Generic Top Level Domains ccTLDs = Country Code Top Level Domains

Top-level Domains (TLDs)

• Generic Top Level Domains (gTLDs)– .com - commercial organizations– .org - not-for-profit organizations– .edu - educational organizations– .mil - military organizations– .gov - governmental organizations– .net - network service providers– New: .biz, .info, .name, .xxx (nearly..)

• Country code Top Level Domains (ccTLDs)– One for each country

How to get a domain name?• In 1998, non-profit corporation, Internet

Corporation for Assigned Names and Numbers (ICANN), was formed to assume responsibility from the US Government

• ICANN authorizes other companies to register domains in com, org and net and new gTLDs– Network Solutions is largest and in transitional

period between US Govt and ICANN had sole authority to register domains in com, org and net

ICANN and politics..

• Why should a US company control Internet naming?

• Should companies (from whatever country) be able to profit from internet names?– 28th Aug 2006: “ICANN to allow domain registries to charge ‘what the market will bear’ for domain names & renewals”

How to get an IP Address?

• Answer 1: Normally, answer is get an IP address from your upstream provider– This is essential to maintain efficient routing!

• Answer 2: If you need lots of IP addresses then you can acquire your own block of them.– IP address space is a scarce resource - must prove you

have fully utilized a small block before can ask for a larger one and pay $$ (Jan 2002 - $2250/year for /20 and $18000/year for a /14)

How to get lots of IP Addresses? Internet

RegistriesRIPE NCC (Riseaux IP Europiens Network Coordination

Centre) for Europe, Middle-East, Africa

APNIC (Asia Pacific Network Information Centre )for Asia and Pacific

ARIN (American Registry for Internet Numbers) for the Americas, the Caribbean, sub-saharan Africa

Note: Once again regional distribution is important for efficient routing!

Can also get Autonomous System Numbers (ASNs from these registries

Are there enough addresses?

• Unfortunately No!– 32 bits 4 billion unique addresses– but addresses are assigned in chunks– ex. cornell has four chunks of /16 addressed

• ex. 128.84.0.0 to 128.84.255.255• 128.253.0.0, 128.84.0.0, 132.236.0.0, and 140.251.0.0

• Expanding the address space!– IPv6 128 bit addresses– difficult to deploy (requires cooperation and

changes to the core of the Internet)

DHCP and NATs

• Dynamic Host Control Protocol– lease IP addresses for short time intervals– hosts may refresh addresses periodically only live hosts need valid IP addresses

• Network Address Translators– Hide local IP addresses from rest of the world– only a small number of IP addresses are visible outside solves address shortage for all practical purposes access is highly restricted

• ex. peer-to-peer communication is difficult

NATs in operation

• Translate addresses when packets traverse through NATs

• Use port numbers to increase number of supportable flows

DNS: Domain Name System

Domain Name System:• distributed database implemented in

hierarchy of many name servers• application-layer protocol host, routers, name

servers to communicate to resolve names (address/name translation)– Note how a core Internet function is implemented

as application-layer protocol– complexity at network’s “edge”

DNS name serversName server: process

running on a host that processes DNS requests

local name servers:– each ISP, company has

local (default) name server– host DNS query first goes to

local name server

authoritative name server:– can perform name/address

translation for a specific domain or zone

How could we provide this service? Why not centralize DNS?

• single point of failure• traffic volume• distant centralized database• maintenance

doesn’t scale!

• no server has all name-to-IP address mappings

Name Server Zone Structure

root

Structure based onadministrative issues.

lucent

com miledugov grorgnet fr ukus

ustreas

www

irs Zone: subtree with commonadministration authority.

Name Servers (NS)

root

cornelllucent

com ...edugov

ustreas

customs

www

irsIRS NS

Ustreas NSLucent NS

Root NS

Name Servers (NS)

• NSs are duplicated for reliability.

• Each domain must have a primary and secondary.

• Each host knows the IP address of the local NS.

• Each NS knows the IP addresses of all root NSs.

DNS: Root name servers

• contacted by local name server that can not resolve name

• root name server:– Knows the

authoritative name server for main domain

• ~ 60 root name servers worldwide– real-world

application of anycast

Simple DNS example

host surf.eurecom.fr wants IP address of www.cs.cornell.edu

1. Contacts its local DNS server, dns.eurecom.fr

2. dns.eurecom.fr contacts root name server, if necessary

3. root name server contacts authoritative name server, dns.cornell.edu, if necessary (what might be wrong with this?)

requesting hostsurf.eurecom.fr

www.cs.cornell.edu

root name server

authorititive name serverdns.cornell.edu

local name serverdns.eurecom.fr

1

23

4

5

6

DNS exampleRoot name server:• may not know

authoritative name server

• may know intermediate name server: who to contact to find authoritative name server

requesting hostsurf.eurecom.fr

www.cs.cornell.edu

root name server


1

23

4

authoritative name serverdns.cs.cornell.edu

intermediate name server

dns.cornell.edu

7

10

56

.edu name server

89

DNS Architecture

• Hierarchical Namespace Management– domains and sub-domains– distributed and localized authority

• Authoritative Nameservers– server mappings for specific sub-domains– more than one (at least two for failure resilience)

• Caching to mitigate load on root servers– time-to-live (ttl) used to delete expired cached

mappings

DNS: query resolution

iterated query:• contacted server

replies with name of server to contact

• “I don’t know this name, but ask this server”

• Takes burden off root servers

recursive query:• puts burden of name

resolution on contacted name server

• reduces latency requesting hostsurf.eurecom.fr

www.cs.cornell.edu

root name server


1

23

4

8 7

authoritative name serverdns.cs.cornell.edu

intermediate name serverdns.cornell.edu

9

10

iterated query

56

.edu name server

recursive query

DNS records: More than Name to IP Address

DNS: distributed db storing resource records (RR)

• Type=NS– name is domain (e.g. foo.com)– value is IP address of

authoritative name server for this domain

RR format: (name, value, type,ttl)

• Type=A– name is hostname– value is IP address

– One we’ve been discussing; most common

• Type=CNAME– name is an alias name

for some “cannonical” (the real) name

– value is cannonical name

• Type=MX– value is hostname of

mailserver associated with name

nslookup

• Use to query DNS servers (not telnet like with http – why?)

• Examples:– nslookup www.yahoo.com– nslookup www.yahoo.com

dns.cs.cornell.edu• specify which local nameserver to use

– nslookup –type=mx cs.cornell.edu• specify record type

PTR Records

• Do reverse mapping from IP address to name

• Why is that hard? Which name server is responsible for that mapping? How do you find them?

• Answer: special root domain, arpa, for reverse lookups

Arpa top level domain

root

com miledugov grorgnet fr ukusarpa

In-addr

128

30 33 1

ietf

www

1.33.30.128.in-addr.arpa.

www.ietf.org.

Want to know machine name for 128.30.33.1?Issue a PTR request for 1.33.30.128.in-addr.arpa

Why is it backwards?

• Notice that 1.30.33.128.in-addr.arpa is written in order of increasing scope of authority just like www.cs.foo.edu

• Edu largest scope of authority; foo.edu less, down to single machine www.cs.foo.edu

• Arpa largest scope of authority; in-addr.arpa less, down to single machine 1.30.33.128.in-addr.arpa (or 128.33.30.1)

http://www.cs.foo.edu/

In-addr.arpa domain

• When an organization acquires a domain name, they receive authority over the corresponding part of the domain name space.

• When an organization acquires a block of IP address space, they receive authority over the corresponding part of the in-addr.arpa space.

• Example: Acquire domain berkeley.edu and acquire a class B IP Network ID 128.143

DNS protocol, messagesDNS protocol : query and reply messages, both with same message format

msg header• identification: 16 bit # for

query, reply to query uses same #

• flags:

– query or reply

– recursion desired

– recursion available

– reply is authoritative

– reply was truncated

DNS protocol, messages

Name, type fields for a query

RRs in reponseto query

records forauthoritative servers

additional “helpful”info that may be used

The Transport Layer

Purpose of this layer• Interface end-to-end applications and protocols

– Turn best-effort IP into a usable interface

• Data transfer b/w processes:– Compared to end-to-end IP

• We will look at 2:– TCP– UDP

application

transport

networkdata link

physical

application

transport

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physical

networkdata link

physicalnetworkdata link

physical

logical end-end transport

UDP• Unreliable Datagram Protocol

• Best effort data delivery between processes– No frills, bare bones transport protocol

– Packet may be lost, out of order

• Connectionless protocol:– No handshaking between sender and receiver

– Each UDP datagram handled independently

UDP Functionality

• Multiplexing/Demultiplexing– Using ports

• Checksums (optional)– Check for corruption

applicationtransportnetwork

MP2

applicationtransportnetwork

receiver

HtHnsegment

segment Mapplicationtransportnetwork

P1M

M MP4

segmentheader

application-layerdata

P3

Multiplexing/Demultiplexing• Multiplexing:

– Gather data from multiple processes, envelope data with header – Header has src port, dest port for multiplexing

• Why not process id?

• Demultiplexing:– Separate incoming data in machine to different applications– Demux based on sender addr, src and dest port

source port #dest port #

32 bits

Applicationdata

(message)

UDP segment format

length checksumLength, in

bytes of UDPsegment,including

header

Implementing Ports

• As a message queue– Append incoming message to the end– Much like a mailbox file

• If queue full, message can be discarded• When application reads from socket

– OS removes some bytes from the head of the queue

• If queue empty, application blocks waiting

UDP Checksum

• Over the headers and data– Ensures integrity end-to-end– 1’s complement sum of segment contents

• Is optional in UDP• If checksum is non-zero, and receiver computes

another value:– Silently drop the packet, no error message detected

UDP Discussion

• Why UDP?– No delay in connection establishment

– Simple: no connection state

– Small header size

– No congestion control: can blast packets

• Uses:– Streaming media, DNS, SNMP– Could add application specific error recovery

TCP

• Transmission Control Protocol– Reliable, in-order, process-to-process, two-way byte stream

• Different from UDP– Connection-oriented– Error recovery: Packet loss, duplication, corruption,

reordering

• A number of applications require this guarantee– Web browsers use TCP

Handling Packet Losssender receiver

time

message

There are a number of reasons why the packet may get lost:- router congestion, lossy medium, etc.

How does sender know of a successful packet send?

Lost Acks

sender receiver

time

message

ack

What if packet/ack is lost?

timeout

Delayed ACKssender receiver

time

message

ack

timeout

What will happen here? Due to congestion, small timeout, …Delayed ACKs duplicate packets

message

Delayed ACKssender receiver

time

m1

ack

timeout

m1

timeout

m2

ack

How to solve this scenario?

Insertion of Packetssender receiver

time

m1

ack1

m2

ack2

m2’

m2’ could be from an old expired session!

Message Identifiers

• Each message has <message id, session id>– Message id: uniquely identifies message in sender– Session id: unique across sessions

• Message ids detect duplication, reordering• Session ids detect packet from old sessions• TCP’s sequence number has similar functionality:

– Initial number chosen randomly– Unique across packets– Incremented by length of data bytes

TCP Packetssource port # dest port #

32 bits

applicationdata

(variable length)

sequence numberacknowledgement numberrcvr window size

ptr urgent datachecksum

FSRPAUheadlen

notused

Options (variable length)

URG: urgent data (generally not used)

ACK: ACK #valid

PSH: push data now(generally not used)

RST, SYN, FIN:connection estab(setup, teardown

commands)

# bytes rcvr willingto accept

countingby bytes of data(not segments!)

Internetchecksum

(as in UDP)

TCP Connection Establishment

sender receiver

(ack x, seq # y)

(open, seq # x)

(ack y)

TCP is connection-oriented. Starts with a 3-way handshake.Protects against duplicate SYN packets.

TCP Usagesender

receiver

(ack x, seq

# y)

(open, seq # x)

(ack y)Data

Data, ACK

Fin, ACK

Fin, ACK

TCP timeouts

• What is a good timeout period ?– Want to improve throughput without unnecessary transmissions

• Timeout is thus a function of RTT and deviation

NewAverageRTT = (1 - ) OldAverageRTT + LatestRTTNewAverageDev = (1 - ) OldAverageDev + LatestDevwhere LatestRTT = (ack_receive_time – send_time), LatestDev = |LatestRTT – AverageRTT|, = 1/8, typically.Timeout = AverageRTT + 4*AverageDev

TCP Windows

• Multiple outstanding packets can increase throughput

TCP Windows

• Can have more than one packet in transit

• Especially over fat pipes, e.g. satellite connection

• Need to keep track of all packets within the window

• Need to adjust window size

DATA, id=17DATA, id=18DATA, id=19DATA, id=20

ACK 17

ACK 18

ACK 19

ACK 20

TCP Congestion Control

• TCP increases its window size when no packets dropped

• It halves the window size when a packet drop occurs

– A packet drop is evident from the acknowledgements

• Therefore, it slowly builds to the max bandwidth, and hover around the max

– It doesn’t achieve the max possible though

– Instead, it shares the bandwidth well with other TCP connections

• This linear-increase, exponential backoff in the face of congestion is termed TCP-friendliness

TCP Window Size

• Linear increase• Exponential

backoff

• Assuming no other losses in the network except those due to bandwidth

Time

Ban

dwid

th

Max Bandwidth

TCP Fairness

• Want to share the bottleneck link fairly between two flows

Bandwidth for Host B

Ban

dwid

th f

or H

ost A

B

A

BottleneckLink

D

TCP Slow Start

• Linear increase takes a long time to build up a window size that matches the link bandwidth*delay

• Most file transactions are not long enough

• Consequently, TCP can spend a lot of time with small windows, never getting the chance to reach a sufficiently large window size

• Fix: Allow TCP to build up to a large window size initially by doubling the window size until first loss

TCP Slow Start

• Initial phase of exponential increase

• Assuming no other losses in the network except those due to bandwidth

Time

Ban

dwid

th

Max Bandwidth

TCP Summary

• Reliable ordered message delivery• Connection oriented, 3-way handshake

• Transmission window for better throughput• Timeouts based on link parameters

• Congestion control• Linear increase, exponential backoff

• Fast adaptation• Exponential increase in the initial phase

Introduction to Networking

Documents

Transcript of Introduction to Networking