Course on Digital Libraries - nmis.isti.cnr.it · Broke down once a week! ... First Draft of a...

Course on Digital Libraries

Vittore Casarosa
– casarosa@isti.cnr.it
– tel. 050-315 3115
– cell. 348-397 2168
Receiving students on Mondays
Final assessment
– 70% final oral examination
– 30% project

FUB 2012-2013 Vittore Casarosa – Digital Libraries Part 1 -1


Outline of the course

Introduction to Digital Libraries (15%)
Description of Information (30%)
Access to Information (30%)
User Services (10%)
Additional topics (15%)

Building of a (small) digital library

Reference material:
– Ian Witten, David Bainbridge, David Nichols, How to Build a Digital Library, Morgan Kaufmann, 2010, ISBN 978-0-12-374857-7 (Second edition)
– The Web


Where we come from

The role of libraries
– selection
– acquisition
– description
– access
– preservation
Early technodreams
– Vannevar Bush (1890-1974)
– JCR (Joseph Carl Robnett) Licklider (1915-1990)
Evolution of technology
WWW: the World Wide Web


The role of libraries

Centuries and centuries of history
Mediators between information and users
Selection
– Definition of collections
Acquisition
– Physical objects
Description
– Catalogs
Access
– Shelves
Preservation
– Controlled environment


Libraries: some figures

Volumes (in millions)
Institution                1910   1995   2002   2011
Library of Congress        1.8    23     26     33
Harvard Univ.              0.8    12.9   14.9
Yale Univ.                 0.55   9.5    10.9
U Illinois (Urbana)        0.1    8.5    9.9
U California (Berkeley)    0.24   8.1    9.4
British Library            2      15     18     25
Cambridge Univ.            0.5    3.5    7
Oxford Univ.               0.8    4.8    6
Bibl. Nat. de France       3      11     12

Journals
– From 10,000 in 1950 to 150,000 in 2002

Alexandria principle beginning to fade


Vannevar Bush (As We May Think - 1945)

Head of US science during WW2
Use of “knowledge” and team work to advance Science
The Memex: mechanized private archive and library (microfilms)
“trails” of information
– associative links
No “free text” search


JCR Licklider (Libraries of the Future - 1965)

Head of US Dept. of Defense, Information Processing Technologies
The book foresees the research and development needed to build a Digital Library
– Time-sharing just beginning
– “Big” memories around 32K
– Networking “to be invented”
Rather accurate overall view of what a DL could look like in 1995
– Under-estimation of computing power
– Over-estimation of progress in
• Artificial intelligence
• Natural language processing


History of computers

Charles Babbage
1791-1871
Professor of Mathematics, Cambridge University, 1827-1839


Babbage’s engines

Difference Engine 1823
Analytical Engine 1833
– The forerunner of the modern digital computer
Application
– Mathematical Tables – Astronomy
– Nautical Tables – Navy
Technology
– mechanical - gears, Jacquard’s loom, simple calculators


Punched cards


Harvard Mark I

Built in 1944 in IBM Endicott laboratories
– Howard Aiken – Professor of Physics at Harvard
– Essentially mechanical but had some electro-magnetically controlled relays and gears
– Weighed 5 tons and had 750,000 components
– A synchronizing clock that beat every 0.015 seconds (about 66 Hz)

Performance:
– 0.3 seconds for addition
– 6 seconds for multiplication
– 1 minute for a sine calculation

Broke down once a week!


ENIAC - Electronic Numerical Integrator and Computer

Eckert and Mauchly designed and built ENIAC (1943-45) at the University of Pennsylvania
The first, completely electronic, operational, general-purpose analytical calculator!
– 30 tons, 72 square meters, 200 kW
Performance
– Read in 120 cards per minute
– Addition took 200 µs, Division 6 ms
– 1000 times faster than Mark I
Not very reliable!

WW-2 Effort
Application: Ballistic calculations
angle = f (location, tail wind, cross wind, air density, temperature, weight of shell, propellant charge, ... )


Second World War effort

Colossus Mark 1 and Mark 2 were used in WW2 (at Bletchley Park, England) to decipher secret German messages


Dominant Problem: Reliability

Mean time between failures (MTBF)

MIT’s Whirlwind with an MTBF of 20 min. was perhaps the most reliable machine!

Reasons for unreliability:
1. Vacuum Tubes
2. Storage medium
– acoustic delay lines
– mercury delay lines
– Williams tubes
– Selectrons


EDVAC - Electronic Discrete Variable Automatic Computer

ENIAC’s programming system was external
– Sequences of instructions were executed independently of the results of the calculation
– Human intervention required to take instructions “out of order”
Eckert, Mauchly, John von Neumann and others designed EDVAC (1944) to solve this problem
– Solution was the stored program computer
– “program can be manipulated as data”
First Draft of a Report on the EDVAC was published in 1945, but carried only von Neumann’s name
In 1973 a court in Minneapolis attributed the honor of inventing the computer to John Atanasoff


Von Neumann architecture

Control Unit
CPU: Central Processing Unit
RAM: Random Access Memory
I/O: Input and Output Devices
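A minimal sketch of the stored-program idea, assuming a toy machine that is not part of the original slides: the instruction names (LOAD, ADD, STORE, HALT) and the memory layout are invented for illustration. Instructions and data sit in the same memory, and the CPU repeats a fetch-decode-execute cycle.

```python
# Toy stored-program machine: instructions and data share one memory.
# The instruction set (LOAD, ADD, STORE, HALT) is invented for this sketch.

def run(memory):
    """Fetch-decode-execute loop over a single shared memory list."""
    acc = 0   # accumulator register inside the CPU
    pc = 0    # program counter: address of the next instruction
    while True:
        op, arg = memory[pc]          # fetch the instruction at address pc
        pc += 1
        if op == "LOAD":              # decode and execute
            acc = memory[arg]
        elif op == "ADD":
            acc += memory[arg]
        elif op == "STORE":
            memory[arg] = acc
        elif op == "HALT":
            return memory
        else:
            raise ValueError(f"unknown instruction {op!r}")

memory = [
    ("LOAD", 4),    # address 0
    ("ADD", 5),     # address 1
    ("STORE", 6),   # address 2
    ("HALT", 0),    # address 3
    2,              # address 4: data
    3,              # address 5: data
    0,              # address 6: result goes here
]
result = run(memory)
print(result[6])
```

Because STORE can write to any address, a program in this sketch could also overwrite its own instructions, which is the sense in which a "program can be manipulated as data".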


Evolution of technology

Computer technology
– CPU and integrated chips
– Random Access Memories
• RAM – from KB to GB
– External memories
• Tapes, hard disks, floppy disks
• Memory sticks
• CDs
• DVDs
• from MB to GB to TB to PB to EB

Communication technology (networks)
– (Telephone) line speed
– Point to point (leased lines)
– Local Area Networks
– Inter-networking (TCP/IP)


Size of digital information


Early computer communication

From mainframe to mainframe through telephone lines (point to point connection)

Telephone lines:
– slow
– expensive
– regulated


Networking

In the sixties, first studies on “networking”
– Networking means communication between node A and node B through one or more intermediate nodes

In the seventies, fragmentation of the market with the arrival of “minicomputers” provided further motivation for research on networking

At the same time (in the seventies), the arrival of the LANs (Local Area Networks) provided the final impulse for the development of networking
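The definition of networking given above (node A reaching node B through intermediate nodes) can be sketched as a path search over a graph of links. This is only an illustration: the node names and the `links` table are made up, and breadth-first search is one simple way to find a path with the fewest hops, not a description of how any real network chooses routes.

```python
from collections import deque

def find_path(links, source, target):
    """Breadth-first search; returns a path with the fewest hops, or None."""
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None

# Hypothetical topology: A and B are not directly connected,
# so messages must pass through the intermediate nodes X and Y.
links = {
    "A": ["X"],
    "X": ["A", "Y"],
    "Y": ["X", "B"],
    "B": ["Y"],
}
print(find_path(links, "A", "B"))
```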


LAN - Local Area Networks

Private networks
Up to several kilometers
Speed up to 100 Mb/sec

[Figure: Token Ring and Ethernet segments interconnected by LAN switches and Ether switches]


Research on networking

Starting in the late sixties, many research projects on networking, both from universities and industry
– Arpanet, Cyclades, SNA, DECnet

In the late seventies ISO (International Organization for Standardization), under pressure from a group of computer manufacturers, started work on the proposal of a “new” communication standard, called OSI: Open Systems Interconnection

The OSI model, though no longer in use today, has established a number of networking concepts and is still used as a “reference model”


The OSI Model

Protocol: formats and rules for exchanging messages between “partners” (e.g. computers)


The seven layers

Layer 7: The application layer...This is the layer at which communication partners are identified, quality of service is identified, user authentication and privacy are considered, and any constraints on data syntax are identified. (This layer is not the application itself, although some applications may perform application layer functions.)

Layer 6: The presentation layer...This is a layer, usually part of an operating system, that converts incoming and outgoing data from one presentation format to another (for example, from a text stream into a popup window with the newly arrived text). Sometimes called the syntax layer.

Layer 5: The session layer...This layer sets up, coordinates, and terminates conversations, exchanges, and dialogs between the applications at each end. It deals with session and connection coordination.

Layer 4: The transport layer...This layer manages the end-to-end control (for example, determining whether all packets have arrived) and error-checking. It ensures complete data transfer.

Layer 3: The network layer...This layer handles the routing of the data (sending it in the right direction to the right destination on outgoing transmissions and receiving incoming transmissions at the packet level). The network layer does routing and forwarding.

Layer 2: The data-link layer...This layer provides synchronization for the physical level and does bit-stuffing for strings of 1's in excess of 5. It furnishes transmission protocol knowledge and management.

Layer 1: The physical layer...This layer conveys the bit stream through the network at the electrical and mechanical level. It provides the hardware means of sending and receiving data on a carrier.


Layered Protocols

A typical message as it appears on the network.
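A rough sketch of what layering does to a message on the wire, under simplifying assumptions: each layer is reduced to a text header in square brackets (a format invented here), and only the data-link layer adds a trailer. Real protocols use binary headers with many fields.

```python
# Layers that add a header, listed from the top of the stack downwards.
LAYERS = ["application", "presentation", "session",
          "transport", "network", "data-link"]
TRAILER = "[data-link-trailer]"

def encapsulate(message):
    """Going down the stack, each layer prepends its own header."""
    frame = message
    for layer in LAYERS:
        frame = f"[{layer}-hdr]" + frame
    return frame + TRAILER

def decapsulate(frame):
    """Going up the stack, each layer strips the header added by its peer."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing data-link trailer")
    frame = frame[: -len(TRAILER)]
    for layer in reversed(LAYERS):
        header = f"[{layer}-hdr]"
        if not frame.startswith(header):
            raise ValueError(f"missing {layer} header")
        frame = frame[len(header):]
    return frame

wire = encapsulate("hello")
print(wire)               # outermost header belongs to the lowest layer
print(decapsulate(wire))  # the receiver recovers the original message
```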


OSI – Open Systems Interconnection


Mnemonics for OSI layers

From layer 1 up to layer 7:
Please      Physical
Do          Data Link
Not         Network
Throw       Transport
Sausage     Session
Pizza       Presentation
Away        Application

From layer 7 down to layer 1:
All         Application
People      Presentation
Seem        Session
To          Transport
Need        Network
Data        Data Link
Processing  Physical


OSI and Internet

The OSI effort provided a sound and durable foundation for networking, but never became a “market leader”
– Slow development
• Initial opposition from IBM
• “Designed by a Committee”
• Expensive development
– Heavy and slow in operation

In the same period the Internet was defining a number of “light weight” protocols

Most of the market preferred them to OSI


Internet evolution

[Figure: stages of Internet evolution. Labels: Experimental Network (DARPA, Arpanet); Research Network (NSF); Communication Infrastructure (private and public sectors, Internet); The Web]


Internet timeline


Inter-networking

The Internet is basically a (huge) collection of LANs


Internet and Intranets

The growth of the Internet was also due to the adoption of the Internet protocols by private companies

[Figure: an Intranet connected to the Internet through a Firewall]


The Internet layers


OSI and TCP/IP


Internet packets


Internet protocols

CSMA/CD, 802.x, ARP, RARP, etc.
IP (IPv4 and IPv6)
TCP, UDP
DNS, TLS/SSL, FTP, Gopher, HTTP, IMAP, POP3, SMTP, SNMP, SSH, Telnet, Echo


Internetworking


Routing in Internet


Internet routing


IPv4 addressing

Each node in the Internet is identified by (one or more) IP addresses, and each IPv4 address has 32 bits (4 bytes)

An IP address is (was) made of two parts: the network address and the node address within the network

The boundary between the parts is variable, and is identified by the “network mask”

The 1s in the mask identify the net portion and the 0s the host portion

bit 31 ... bit n ... bit 0
network | host
mask: 11111111111111110000000000000000


IP addresses

An IP address is usually indicated with four numbers (from 0 to 255) corresponding to the 4 bytes of the address

IP address: 131.114.1.30
mask: 255.255.255.0

network address 131.114.1
host address 30

Three classes of network addresses (masks 255.0.0.0, 255.255.0.0, 255.255.255.0)

No more IPv4 addresses available today
– Network Address Translation (NAT) commonly used

IPv6 (128 bits) slowly replacing IPv4
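The split of 131.114.1.30 into network and host parts is a bitwise AND with the mask. A minimal sketch follows; the helper names (`to_int`, `to_dotted`, `split_address`) are chosen here for illustration and do not come from any library.

```python
def to_int(dotted):
    """Convert dotted-quad notation to a 32-bit integer."""
    a, b, c, d = (int(part) for part in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

def to_dotted(value):
    """Convert a 32-bit integer back to dotted-quad notation."""
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

def split_address(address, mask):
    """Return (network part, host part) as integers."""
    addr, m = to_int(address), to_int(mask)
    network = addr & m               # keep the bits where the mask has 1s
    host = addr & ~m & 0xFFFFFFFF    # keep the bits where the mask has 0s
    return network, host

network, host = split_address("131.114.1.30", "255.255.255.0")
print(to_dotted(network), host)
```

With the mask 255.255.255.0 the first three bytes form the network part and the last byte is the host part, as in the example on the slide.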


Evolution of computer market

Military applications in early 40s
Scientific/research applications in late 40s
Commercial applications appear in early 50s
Monopoly of IBM starts with 650, 701, 702
Monopoly of IBM continues with 7070, 7090, starting the “mainframe era” and the “invention” of the byte with the 360 series (in the early 60s)
Arrival of the “minicomputers” in the 70s
Arrival of the PC in the 80s
Arrival of the Internet
Arrival of the Web


The World Wide Web

Combination of computer technology and communication technology

It all started with the “hyperlink”
Then came the “browser” (Mosaic)
Then came the first wave
Then came the “dot com, dot gone”
Then came the second wave
Finally came the “information explosion”
– An estimate of 500 to 1000 million hosts
– An estimate of 30 to 50 billion pages on line

And now comes Web 2.0 (with Web 3.0 just around the corner)


The editors

Text processing applications started already in the early days of computers (sixties)

A “text processor” (or editor) has two main functions:
– processing the text (delete, replace, insert, etc.)
– specifying the format (bold, center, new line, etc.)

The first editors used a “mark up” language (i.e. commands intermixed with the text) to provide formatting instructions (only limited interactivity available through typewriter-like terminals)

The “second generation” editors used the WYSIWYG paradigm: What You See Is What You Get (much better interactivity available with display and mouse)


The hyperlink

The idea of the “hyperlink” was (experimentally) proposed in the sixties, as a feature of a “smart editor”
– selecting a portion of the text, it was possible to open a second document, in addition to the one being edited (very awkward to use on a typewriter-like terminal)

With the arrival of display screens and the mouse (eighties) the hyperlink came back in “3D documents”
– clicking on a portion of the text it was possible to open a second document, which was maintained as a second (virtual) screen behind the first one

With the arrival of the (fast) internet, it became the “web hyperlink”
– clicking on a portion of the text it was possible to open a second document, coming from a different computer


The browser

With the arrival of the (web) hyperlink, the problem was then how to properly display a (web) page that had been generated on a different computer, possibly with a different (wysiwyg) editor

The solution was the definition of HTML (Hyper Text Markup Language), i.e. a standard mark up language, and the implementation of smart editors (the browser) capable of correctly displaying pages formatted with HTML, regardless of where they were coming from
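To make the idea concrete, here is a small sketch of a program reading HTML mark-up and picking out the hyperlinks, using the `html.parser` module from the Python standard library. The sample page and its URL are invented for illustration; a real browser does far more (layout, styles, scripts).

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the target of every <a href="..."> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag just opened
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page: formatting commands (tags) are intermixed with the text.
page = """
<html><body>
<h1>Digital Libraries</h1>
<p>See the <b>course</b> <a href="http://example.org/outline.html">outline</a>.</p>
</body></html>
"""
collector = LinkCollector()
collector.feed(page)
print(collector.links)
```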


Number of hosts


Internet usage (1/3)

WORLD INTERNET USAGE AND POPULATION STATISTICS
December 31, 2011

World Regions            Population      Internet Users   Internet Users   Penetration      Growth      Users %
                         (2011 Est.)     Dec. 31, 2000    Latest Data      (% Population)   2000-2011   of Table
Africa                   1,037,524,058   4,514,400        139,875,242      13.5 %           2,988.4 %   6.2 %
Asia                     3,879,740,877   114,304,000      1,016,799,076    26.2 %           789.6 %     44.8 %
Europe                   816,426,346     105,096,093      500,723,686      61.3 %           376.4 %     22.1 %
Middle East              216,258,843     3,284,800        77,020,995       35.6 %           2,244.8 %   3.4 %
North America            347,394,870     108,096,800      273,067,546      78.6 %           152.6 %     12.0 %
Latin America / Carib.   597,283,165     18,068,919       235,819,740      39.5 %           1,205.1 %   10.4 %
Oceania / Australia      35,426,995      7,620,480        23,927,457       67.5 %           214.0 %     1.1 %
WORLD TOTAL              6,930,055,154   360,985,492      2,267,233,742    32.7 %           528.1 %     100.0 %
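The three percentage columns can be derived from the population and user counts. The sketch below assumes the usual definitions (penetration = users / population, growth = increase relative to the 2000 figure, share = users / world users); the function names are chosen here, and only the Asia and World Total figures are taken from the table.

```python
def penetration(users, population):
    """Internet users as a percentage of the population."""
    return round(100 * users / population, 1)

def growth(users_2000, users_latest):
    """Percentage increase in users between 2000 and the latest data."""
    return round(100 * (users_latest - users_2000) / users_2000, 1)

def share(users, world_users):
    """A region's users as a percentage of all users in the table."""
    return round(100 * users / world_users, 1)

# Figures copied from the Asia and WORLD TOTAL rows of the table.
asia = dict(population=3_879_740_877, users_2000=114_304_000,
            users_latest=1_016_799_076)
world = dict(population=6_930_055_154, users_2000=360_985_492,
             users_latest=2_267_233_742)

print(penetration(asia["users_latest"], asia["population"]))
print(growth(asia["users_2000"], asia["users_latest"]))
print(share(asia["users_latest"], world["users_latest"]))
print(penetration(world["users_latest"], world["population"]))
print(growth(world["users_2000"], world["users_latest"]))
```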


Internet usage (2/3)


Internet usage (3/3)

FUB 2012-2013 Vittore Casarosa – Digital Libraries Part 1 -54

Page 55: Course on Digital Libraries - nmis.isti.cnr.it · Broke down once a week! ... First Draft of a report on EDVAC was published in 1945, but just had ... The OSI model, ...

Where we come from

The role of libraries
– selection
– acquisition
– description
– access
– preservation

Early technodreams
– Vannevar Bush (1890-1974)
– JCR (Joseph Carl Robnett) Licklider (1915-1990)

Evolution of technology

WWW: the World Wide Web


Where do we want to go?

Digital Libraries do exist today
– Are they a transformation of “traditional libraries”?
– Are they an evolution of databases?
– Are they (a subset of) the Web?
– Are they useful?

DLs are at the intersection of a number of different disciplines/technologies

A “theory” of Digital Libraries not yet developed

Two perspectives
– The Digital Library Curriculum Project
– The DELOS Reference Model


Digital Library Curriculum Project

1 - Overview
2 - Digital Objects
3 - Collection Development
4 - Info/Knowledge organization
5 - Architectures (agents, mediators)
6 - User Behavior/Interactions
7 - Services
8 - Preservation
9 - Management and Evaluation
10 - DL education and research


Topics underlying DLs (1)

1 - Overview
– 1-a (10-c): Conceptual frameworks, theories, definitions
– 1-b: History of digital libraries and library automation

2 - Digital Objects
– 2-a: Text resources
– 2-b: Multimedia
– 2-c (8-c): File formats, transformation, migration

3 - Collection Development
– 3-a: Collection development/selection policies
– 3-b: Digitization
– 3-c: Harvesting
– 3-d: Document and e-publishing/presentation markup

4 - Info/Knowledge organization
– 4-a: Information architecture (e.g., hypertext, hypermedia)
– 4-b: Metadata, cataloging, metadata markup, metadata harvesting
– 4-c: Ontologies, classification, categorization
– 4-d: Subject description, vocabulary control, thesauri, terminologies
– 4-e: Object description and organization for a specific domain


Topics underlying DLs (2)

5 - Architectures (agents, mediators)
– 5-a: Architecture overviews/models
– 5-b: Application software
– 5-c: Identifiers, handles, DOI, PURL
– 5-d: Protocols
– 5-e: Interoperability
– 5-f: Security

6 - User Behavior/Interactions
– 6-a: Info needs, relevance
– 6-b: Search strategy, info seeking, behavior, user modeling
– 6-c: Sharing, networking, interchange (e.g., social)
– 6-d: Interaction design, info summarization and visualization, usability assessment

7 - Services
– 7-a: Search engines, IR, indexing methods
– 7-b: Reference services
– 7-c: Recommender systems
– 7-d: Routing, community filtering
– 7-e: Web publishing (e.g., wiki, rss, Moodle, etc.)


Topics underlying DLs (3)

8 - Preservation
– 8-a: Approaches to archiving and repository development
– 8-b: Sustainability
– 8-c (2-c): File formats, transformation, migration

9 - Management and Evaluation
– 9-a: Project management
– 9-b: DL case studies
– 9-c: DL evaluation, user studies
– 9-d: Bibliometrics, Webometrics
– 9-e: Legal issues (e.g., copyright)
– 9-f: Cost/economic issues
– 9-g: Social issues

10 - DL education and research
– 10-a: Future of DLs
– 10-b: Education for digital librarians
– 10-c (1-a): Conceptual framework, theories, definitions
– 10-d: DL research initiatives
