University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland...

67
The Web: Moving Data Around the World LBSC 690: Jordan Boyd-Graber University of Maryland September 17, 2012 Adapted from Jimmy Lin’s Slides LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 1 / 45

Transcript of University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland...

Page 1: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

The Web: Moving Data Around the World

LBSC 690: Jordan Boyd-Graber

University of Maryland

September 17, 2012

Adapted from Jimmy Lin’s Slides

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 1 / 45

Page 2: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Goals (Computer - Hardware / Computer - Computer)

How data are stored

How the web works

Create your first webpage

Learn how to transfer files

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 2 / 45

Page 3: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Outline

1 Storage

2 Protocols and the Internet

3 Making a Webpage

4 Discussion

5 Practice Problems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 3 / 45

Page 4: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

What are some kinds of storage?

RAM

Flash memory

Magnetic (Hard Disk)

Optical memory

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 4 / 45

Page 5: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

RAM

Lots of little electronic switches

Jay Forrester (MIT): First practical RAM (1951)

Little magnetic donuts; orientation could be switched / read bysending appropriate electric pulses

Unlike tape, you could read anything at any time (random access)

Volatile

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 5 / 45

Page 6: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

RAM

Lots of little electronic switchesJay Forrester (MIT): First practical RAM (1951)Little magnetic donuts; orientation could be switched / read bysending appropriate electric pulsesUnlike tape, you could read anything at any time (random access)Volatile

But don’t count on volatility for security

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 5 / 45

Page 7: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Flash

Like RAM, lots of little electronicswitches

Retains memory when powered o↵

Fairly cheap, getting denser

Slower than RAM, faster than HDD

Where can you find Flash memory?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 6 / 45

Page 8: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Flash

Like RAM, lots of little electronicswitches

Retains memory when powered o↵

Fairly cheap, getting denser

Slower than RAM, faster than HDD

Where can you find Flash memory?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 6 / 45

Page 9: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Hard Drives

Little magnetic flakes that get spunaround

Retains memory when powered o↵

For consumers, cheapest per MB

Relatively slow

What made the iPod popular (inaddition to its UI)

RAID (Redundant Array of InexpensiveDisks)

I Backup and speedupI Duplicated data across disks so the

head doesn’t have to move as far onaverage

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 7 / 45

Page 10: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Optical

Lasers detect little pits in media

Retains memory when powered o↵

Very cheap to produce

Relatively slow

Can be fairly durable

(With some e↵ort) Rewriteable

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 8 / 45

Page 11: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Cloud

Physical storage doesn’t matter (you can’t see it)

Follows you wherever you go

Requires network access for update

Not as cheap as buying a HD (backup costs?)I Google DocsI DropboxI Mozy

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 9 / 45

Page 12: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Filesystem

How does your computer know where stu↵ is, physically, on your disk?

Examples: ZFS, ReiserFS, NTFS, FAT32, AFS, Ext3

The folder metaphor

I Hierarchically nested directoriesI Absolute vs. relative paths (look out for this!)

F../index.html

Fc:/windows/index.html

I File extensions

Operating systems have their favorite file systems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 10 / 45

Page 13: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Filesystem

How does your computer know where stu↵ is, physically, on your disk?

Examples: ZFS, ReiserFS, NTFS, FAT32, AFS, Ext3

The folder metaphorI Hierarchically nested directoriesI Absolute vs. relative paths (look out for this!)

F../index.html

Fc:/windows/index.html

I File extensions

Operating systems have their favorite file systems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 10 / 45

Page 14: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Outline

1 Storage

2 Protocols and the Internet

3 Making a Webpage

4 Discussion

5 Practice Problems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 11 / 45

Page 15: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

The tubes of the Internets

Packet-basedEach transmission is broken up into pieces and routed separately

High network load results in long delays

Circuit-basedFixed connection between caller and called

High network load results in busy signals

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 12 / 45

Page 16: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Packet Switching

Break long messages into short “packets”

Keeps one user from hogging a line

Each packet is tagged with where it’s going

Route each packet separately

Each packet often takes a di↵erent route

Packets often arrive out of order

Receiver must reconstruct original message

Questions:I How do packet-switched networks deal with continuous data?I What happens when packets are lost?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 13 / 45

Page 17: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Web 6= Internet

Internet = collection of global networks

Web = particular way of accessing information on the Internet

Uses the HTTP protocol

Other ways of using the InternetI UsenetI FTPI email (SMTP, POP, IMAP, etc.)I Internet Relay Chat

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 14 / 45

Page 18: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

The Internet is a Collection of Networks

What are Firewalls? Why can’t you do stu↵ behind them?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 15 / 45

Page 19: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

The Internet is a Collection of Networks

VPN = Virtual Private Network

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 15 / 45

Page 20: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

The Web is Built on Standards

Basic protocols for the InternetI TCP/IP (Transmission Control Protocol/Internet Protocol): basis for

communicationI DNS (Domain Name Service): basis for naming computers on the

network

Protocol for the WebI HTTP (HyperText Transfer Protocol): protocol for transferring Web

pages

Protocol for E-mailI SMTP, IMAP: broken?

Fprivacy

Fspam

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 16 / 45

Page 21: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

IP Address

Every computer on the Internet is identified by a address

IP address = 32 bit number, divided into four “octets”

Example: go in your browser and type “http://128.8.237.26/”

Also used for “geolocation” (which language Google uses, no Hulu forCanadians)

Questions:I What’s the di↵erence between static and dynamic IP?I Are there enough IP addresses to go around?

I Even with 4 billion, things are getting crowded

Not enough IP addresses?

I IPv6 - 128 bits long (5 ⇤ 1028 IP Addresses per person)

I Network Address Translation - Not everybody gets a private IP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 17 / 45

Page 22: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

IP Address

Every computer on the Internet is identified by a address

IP address = 32 bit number, divided into four “octets”

Example: go in your browser and type “http://128.8.237.26/”

Also used for “geolocation” (which language Google uses, no Hulu forCanadians)

Questions:I What’s the di↵erence between static and dynamic IP?I Are there enough IP addresses to go around?I Even with 4 billion, things are getting crowded

Not enough IP addresses?

I IPv6 - 128 bits long (5 ⇤ 1028 IP Addresses per person)

I Network Address Translation - Not everybody gets a private IP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 17 / 45

Page 23: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

IP Address

Every computer on the Internet is identified by a address

IP address = 32 bit number, divided into four “octets”

Example: go in your browser and type “http://128.8.237.26/”

Also used for “geolocation” (which language Google uses, no Hulu forCanadians)

Questions:I What’s the di↵erence between static and dynamic IP?I Are there enough IP addresses to go around?I Even with 4 billion, things are getting crowded

Not enough IP addresses?

I IPv6 - 128 bits long (5 ⇤ 1028 IP Addresses per person)

I Network Address Translation - Not everybody gets a private IP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 17 / 45

Page 24: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Historical Bias of IPv4

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 18 / 45

Page 25: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

IPv6

Written as eight 4-digit hexadecimal numbers (base 16)

Plenty of room!

Harder to write down

e.g. Google: 2001:4860:4860::8888

Some technical advantagesI “ephemeral” addressed for privacyI multicast

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 19 / 45

Page 26: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Hexadecimal

Huh?More when we do HTML colors!

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 20 / 45

Page 27: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Hexadecimal

Huh?More when we do HTML colors!

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 20 / 45

Page 28: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Domain Name Service

“Domain names” improve usabilityI Easier to remember than numeric IP addressesI DNS coverts between names and numbersI Written like a postal address: specific-to-general

Each name server knows one level of namesI “Top level” name server knows .edu, .com, .mil, . . .I .edu name server knows umd, caltech, mit, stanford, princeton, . . .I .umd.edu name server knows ischool, wam, . . .

Recent developmentsI New TLDsI Non-Latin addresses

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 21 / 45

Page 29: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

TCP/IP

Transport Control Protocol specifies how data moves across theInternetEach node has address and ports

I Loopback: 127.0.0.1I Local: 10.x.x.x, 192.168.x.x (What does it mean if this is your IP

address?)

A port is a number to channel tra�c20 FTP22 SSH25 SMTP80 HTTP

2710 Bittorrent trackerUses

I Block applicationsI Have computers specialize (e.g. behind NAT)I Security (Firewall only opens port 80)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 22 / 45

Page 30: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

TCP/IP

Transport Control Protocol specifies how data moves across theInternetEach node has address and ports

I Loopback: 127.0.0.1I Local: 10.x.x.x, 192.168.x.x (What does it mean if this is your IP

address?)

A port is a number to channel tra�c20 FTP22 SSH25 SMTP80 HTTP

2710 Bittorrent trackerUses

I Block applicationsI Have computers specialize (e.g. behind NAT)I Security (Firewall only opens port 80)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 22 / 45

Page 31: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

TCP/IP

(Quite simplified) Routing table for 4.8.15.2Destination Next Hop52.55.*.* 63.6.9.1218.1.*.* 192.28.2.5 or 63.6.9.124.*.*.* 225.2.55.1. . .

Can also include

Cost

Quality

Filtering

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 23 / 45

Page 32: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

TCP/IP

TCP is how, IP is whatFundamental unit of IP communication is the packetIP Provides support for:

I Missing dataI Repeated arrivalsI Out of order arrivalI Data corruption

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 24 / 45

Page 33: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

TCP/IP

IP is just a way of breaking up data

Doesn’t even have to be on computers

Pigeons: 1 hr latency, 55% packet loss

This is why the Internet is in so many places on so many devices

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 25 / 45

Page 34: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Last Mile

Fiber Optics

EthernetI Hub - Everyone talks at once, shuts up if they conflictI Router - There’s a moderator

IEEE 802.11(a/g) (Wireless) - Radio in your building

EDGE (Enhanced Data rates for GSM Evolution) - Radio to yourphone

Takeaway

To improve connectivity, focus on the weakest link. In a crowded dorm,don’t upgrade the T1 if the wireless is saturated. In rural Iowa, don’tinstall fiber optic cable to every room.

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 26 / 45

Page 35: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Outline

1 Storage

2 Protocols and the Internet

3 Making a Webpage

4 Discussion

5 Practice Problems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 27 / 45

Page 36: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Why Code HTML by Hand?

The only way to learn is by doing

WSIWYG editors . . .I Often generate unreadable codeI Ties you down to that particular editorI Cannot help you connect to backend databases

Hand coding HTML allows you to have finer-grained control

HTML is merely demonstrative of other important concepts:I Structured documentsI Metadata

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 28 / 45

Page 37: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Editing Plaintext

Used to be the norm!

Stu↵ you already have:I Notepad (Windows)I TextEdit (Mac)I pico (Linux)

Good options:I TextWrangler (Mac)I Editpad (Windows)I VI, Emacs, gedit (Linux)

One-to-one correspondence between characters and ASCII written todisk

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 29 / 45

Page 38: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Hello World

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 30 / 45

Page 39: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Hello World

Trivia

Brian Kernighan: engineer at AT&T who helped create UNIX, C, AWK,AMPL, other programming languages. Created an example program thatprinted “hello world” and nothing else to show o↵ C. Now everybody doesit.

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 31 / 45

Page 40: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Tips

Edit files on your own machine, upload when youre happy

Save early, save often, just save!

Reload browser

File namingI Don’t use spaces!I Punctuation matters!

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 32 / 45

Page 41: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Uploading Your Page

Connect to “terpconnect.umd.edu”

Change directory to “public html” (Assignment 0)

Upload files

Your very own home page at:

http://terpconnect.umd.edu/⇠USERID/

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 33 / 45

Page 42: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

WinSCP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 34 / 45

Page 43: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

WinSCP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 34 / 45

Page 44: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

WinSCP

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 34 / 45

Page 45: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Fetch

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 35 / 45

Page 46: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Fetch

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 35 / 45

Page 47: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Fetch

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 35 / 45

Page 48: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Outline

1 Storage

2 Protocols and the Internet

3 Making a Webpage

4 Discussion

5 Practice Problems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 36 / 45

Page 49: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

What’s wrong with this picture?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 37 / 45

Page 50: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

This week’s discussion

As part of your schools technology committee, you need to plan the networkinghardware purchases. Describe what hardware components you might need in yourschool to connect all of your classrooms to the school network and the Internet(server, wireless access points, switches, storage, cables etc.). How will youhandle addressing the computers; what use cases would change your decision?Context: Your schools has a special room for your server(s) with the outside T1connection to your Internet Service Provider (ISP); it receives a single static IP.The school is also wired with a single 10Mbs ethernet connector into eachclassroom from the server room. All computers connect to a DHCP server thatgives it a 192.168.1.X address.

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 38 / 45

Page 51: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

This week’s discussion

Your vendor wants you to upgrade your wiring. Is it worth it?

A teacher wants to use a classroom computer as a webserver. Whocan see what webpages its serving?

Students are going to be allowed to bring in their personal laptops.How might you change the way your system is set up?

Disney caught one of the computers on your network serving abittorrent of a popular film. How did they know it was your school?How can you prevent this from happening?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 39 / 45

Page 52: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

This week’s discussion

Your vendor wants you to upgrade your wiring. Is it worth it?

A teacher wants to use a classroom computer as a webserver. Whocan see what webpages its serving?

Students are going to be allowed to bring in their personal laptops.How might you change the way your system is set up?

Disney caught one of the computers on your network serving abittorrent of a popular film. How did they know it was your school?How can you prevent this from happening?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 39 / 45

Page 53: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

This week’s discussion

Your vendor wants you to upgrade your wiring. Is it worth it?

A teacher wants to use a classroom computer as a webserver. Whocan see what webpages its serving?

Students are going to be allowed to bring in their personal laptops.How might you change the way your system is set up?

Disney caught one of the computers on your network serving abittorrent of a popular film. How did they know it was your school?How can you prevent this from happening?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 39 / 45

Page 54: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

This week’s discussion

Your vendor wants you to upgrade your wiring. Is it worth it?

A teacher wants to use a classroom computer as a webserver. Whocan see what webpages its serving?

Students are going to be allowed to bring in their personal laptops.How might you change the way your system is set up?

Disney caught one of the computers on your network serving abittorrent of a popular film. How did they know it was your school?How can you prevent this from happening?

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 39 / 45

Page 55: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Outline

1 Storage

2 Protocols and the Internet

3 Making a Webpage

4 Discussion

5 Practice Problems

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 40 / 45

Page 56: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

As a rule of thumb, MP3-encoded sound takes about 1 MB/minute ofstorage. How big a disk would be required to record everything you haveever heard in your life so far in MP3?

30years

1

1440minutes

1day

365.25days

1year

1MB

minute⇡ 16 · 106MB (1)

16 · 106MB106bytes

MB

⇡ 16 · 1012bytes = 16TB (2)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 41 / 45

Page 57: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

As a rule of thumb, MP3-encoded sound takes about 1 MB/minute ofstorage. How big a disk would be required to record everything you haveever heard in your life so far in MP3?

30years

1

1440minutes

1day

365.25days

1year

1MB

minute⇡ 16 · 106MB (1)

16 · 106MB106bytes

MB

⇡ 16 · 1012bytes = 16TB (2)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 41 / 45

Page 58: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

As a rule of thumb, MP3-encoded sound takes about 1 MB/minute ofstorage. How big a disk would be required to record everything you haveever heard in your life so far in MP3?

30years

1

1440minutes

1day

365.25days

1year

1MB

minute⇡ 16 · 106MB (1)

16 · 106MB106bytes

MB

⇡ 16 · 1012bytes = 16TB (2)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 41 / 45

Page 59: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

A New York Times article on 6/9/04 says that it can take “days” todownload a high quality movie over a DSL line. Suppose that the DSL lineis 1 Mbps, and that a standard movie DVD is about 5 GB. How long doesthe download take under these assumptions?

5GB · 1s

Mbit· 10

3MB

GB· 8bitbyte

⇡ 40 · 103s (3)

40 · 103s1hour3600s

⇡ 11hours (4)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 42 / 45

Page 60: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

A New York Times article on 6/9/04 says that it can take “days” todownload a high quality movie over a DSL line. Suppose that the DSL lineis 1 Mbps, and that a standard movie DVD is about 5 GB. How long doesthe download take under these assumptions?

5GB · 1s

Mbit· 10

3MB

GB· 8bitbyte

⇡ 40 · 103s (3)

40 · 103s1hour3600s

⇡ 11hours (4)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 42 / 45

Page 61: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

A New York Times article on 6/9/04 says that it can take “days” todownload a high quality movie over a DSL line. Suppose that the DSL lineis 1 Mbps, and that a standard movie DVD is about 5 GB. How long doesthe download take under these assumptions?

5GB · 1s

Mbit· 10

3MB

GB· 8bitbyte

⇡ 40 · 103s (3)

40 · 103s1hour3600s

⇡ 11hours (4)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 42 / 45

Page 62: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

How many bits are needed to represent monetary values of up to twentydollars to the nearest penny?

If we have n bits, we can represent 2n values. There are a total of 2000pennies in twenty bucks, so we need at least 2000 unique values.Everybody should know that

210 = 1024, (5)

which is too small, so211 = 2048 (6)

should do it.

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 43 / 45

Page 63: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

How many bits are needed to represent monetary values of up to twentydollars to the nearest penny?

If we have n bits, we can represent 2n values. There are a total of 2000pennies in twenty bucks, so we need at least 2000 unique values.Everybody should know that

210 = 1024, (5)

which is too small, so211 = 2048 (6)

should do it.

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 43 / 45

Page 64: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

Compute the number of bits stored per square inch of recording surface fora CD-ROM.

750MB

CD

CD

((120mm)2 � (15mm)2)⇡

645.16mm2

in28bit

byte

220bytes

MB

(7)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 44 / 45

Page 65: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

Compute the number of bits stored per square inch of recording surface fora CD-ROM.

750MB

CD

CD

((120mm)2 � (15mm)2)⇡

645.16mm2

in28bit

byte

220bytes

MB

(7)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 44 / 45

Page 66: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

At Google, somewhere they store the satellite views of the earth displayedat maps.google.com. Suppose the finest resolution is 1 meter (that is, theystore one pixel for each 1 meter by 1 meter square of the earth’s surface).How many pixels are there if you ignore compression? To save you a tripto Google, the surface of a sphere is 4⇡r2, and the radius of the earth is6000 kilometers.

1pixel

m2

·✓103m

1km

◆2

· 4⇡(6 · 103km)2 (8)

106pixel

km2

· 450 · 106 ⇡ 4.5 · 1014 (9)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 45 / 45

Page 67: University of Maryland - UMIACSjbg/teaching/LBSC_690_2012/lecture_02.pdf · University of Maryland ... /windows/index.html I File extensions ... Also used for “geolocation” (which

Practice Problems

At Google, somewhere they store the satellite views of the earth displayedat maps.google.com. Suppose the finest resolution is 1 meter (that is, theystore one pixel for each 1 meter by 1 meter square of the earth’s surface).How many pixels are there if you ignore compression? To save you a tripto Google, the surface of a sphere is 4⇡r2, and the radius of the earth is6000 kilometers.

1pixel

m2

·✓103m

1km

◆2

· 4⇡(6 · 103km)2 (8)

106pixel

km2

· 450 · 106 ⇡ 4.5 · 1014 (9)

LBSC 690: Jordan Boyd-Graber (UMD) The Web: Moving Data Around the World September 17, 2012 45 / 45