
1

Standards In A Digital World: Z39.50, HTML, Java:

Do They Really Work?

Brian Kelly

UK Web Focus

UKOLN

University of Bath

[email protected]

http://www.ukoln.ac.uk/

2

Contents
• Introduction
• HTML
  – Initial Roadmap / The Diversion / Back on Course
  – W3C Standardisation Process
  – Rivals to HTML
  – PDF
  – Viewers
• Scripting
  – Client-side Scripting Languages
  – Server-side Scripting
• Distributed Searching
  – Z39.50
  – Other Protocols
• Conclusions

3

UK Web Focus

UK Web Focus:
• National Web coordination post for the UK HE community
• Based at UKOLN, University of Bath
• Responsibilities include:
  – Technology watch
  – Information dissemination in a variety of ways:
    – Workshops (national, regional)
    – Presentations at conferences and seminars
    – Online
  – Coordination activities
  – Representing JISC on W3C
• Brian Kelly appointed on 1st November 1996
  – Involved with the Web since January 1993
  – Previously worked at the Universities of Newcastle, Leeds, Liverpool and Loughborough

4

The Question
Where do you stand?

The success of the Web is based on competition in the marketplace. Just look at the benefits provided by competition between Netscape and Microsoft.

The success of the Web is based on building on open, non-proprietary standards. Use of proprietary systems has increased costs for the user, and resulted in flawed systems.

5

HTML Roadmap

HTML 1.0: Gets things started

HTML 2.0: CERN / NCSA partnership introduces NCSA Mosaic with support for forms and inline images

HTML+: Proposal for enhancements, including improved layout control (e.g. tables), maths, etc.

Style Sheets: Mechanism for defining appearance, keeping structure separate from appearance. Various proposals (DSSSL, CSS, …)

6

HTML History

HTML 1.0: Unpublished specification. DTD developed by Tim Berners-Lee (CERN).

HTML 2.0: Spec. based on innovations from NCSA (forms and inline images!)

HTML 3.0: Proposed spec. (renamed from HTML+). Very comprehensive. Failed to complete IETF standardisation process. Little implementation experience.

HTML 3.2: Spec. based on description of mainstream innovations in the marketplace

HTML 4.0: Current proposal.

7

HTML Wars

October 1994: Netscape released (Mosaic Communications Corporation). Quality browser, but supported proprietary tags (<BLINK>, <FONT>, etc.)

1995: New versions of Netscape released, supporting additional proprietary tags (<SPACER>, <LAYER>, etc.)

1996: Microsoft respond to competition with their own proprietary tags (<MARQUEE>, etc.)
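As a rough illustration of the maintenance burden these tags create, existing pages can be audited for them with a short script. This is a minimal sketch using Python's standard html.parser; the tag set is only the examples named on these slides (not a complete list of non-standard markup), and the sample page is invented.

```python
from html.parser import HTMLParser

# Proprietary tags named on the slides; illustrative, not exhaustive.
PROPRIETARY_TAGS = {"blink", "font", "spacer", "layer", "marquee"}


class ProprietaryTagFinder(HTMLParser):
    """Collect (tag, line number) pairs for proprietary tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names, so <BLINK> and <blink> both match.
        if tag in PROPRIETARY_TAGS:
            line, _offset = self.getpos()
            self.found.append((tag, line))


def find_proprietary_tags(html_text):
    """Return every proprietary start tag found, in document order."""
    finder = ProprietaryTagFinder()
    finder.feed(html_text)
    finder.close()
    return finder.found


if __name__ == "__main__":
    page = """<HTML><BODY>
<H1>Welcome</H1>
<BLINK>New!</BLINK>
<P><FONT SIZE="+2">Prospectus</FONT></P>
</BODY></HTML>"""
    for tag, line in find_proprietary_tags(page):
        print(f"line {line}: <{tag.upper()}>")
```

Each hit marks a place where content and appearance have been merged and would need re-engineering if the page moved to a standards-based approach.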

8

HTML Wars - The Problems

Device Dependency
• Resources are dependent on a particular browser
• Platform dependency

Costs
• Costs in supporting authoring tools
• Potential costs in re-engineering

Architecture
• Proprietary innovations have been flawed:
  – Merging content and appearance
  – Maintenance of resources
• Accessibility problems:
  – Poor support for access by disabled people (e.g. speaking browsers for the visually impaired)

9

End of the Wars?

Microsoft Pledge on HTML Standards:

"HTML is the most basic and fundamental data format of the Web.

Support for HTML standards ensures that content can be viewed by any browser as the creator intended.

…. agreement on the most basic data format is critical to interoperability and the continued growth of the industry."

Thursday, August 21 1996

See http://www.microsoft.com/internet/html.htm

10

Microsoft Pledge (Cont.)

"Previous proprietary HTML extensions from Microsoft and other vendors have confused the market, hampered interoperability and been ill-conceived with respect to [HTML] design principles ...

Microsoft will agree to:
• Not ship extensions to HTML without first submitting them to W3C.
• Implement all W3C approved HTML standards.
• Clearly identify any not-yet-approved HTML tags we support as such.
• Publish a Document Type Definition (DTD) for its browser as mandated by SGML.
• Follow the architecture principles of HTML and its parent, SGML, when proposing new extensions.

Microsoft agrees to hold itself to these standards. Will all the other Web browser vendors, including Netscape, also agree to this conduct of behavior?"

11

HTML 4.0 and CSS

HTML 4.0 and CSS will provide an architecturally pure, yet functionally rich environment.

HTML 4.0
• Improved forms
• Hooks for style sheets
• Hooks for scripting languages
• Table enhancements
• Better printing

CSS
• Support for all HTML formatting
• Positioning of HTML elements
• Support for multiple media

Problems
Some problems with CSS are being experienced as a result of:
• Use of CSS features which changed during CSS development
• Browser-supported features which changed


12

W3C Process

W3C:
• A consortium of subscribing member organisations
• Areas of work agreed by members
• Working group set up:
  – Charter
  – WG membership (restricted)
• Initial recommendations produced by WG
• Recommendation made public
• Feedback on open mailing lists and to editor
• Recommendation updated
• Members vote

User Interface:
• HTML
• Style Sheets
• Document Object Model
• Maths
• Graphics
• Fonts


13

W3C Process

Pros
• Work can be well-focussed
• Avoids "flaming"
• Battle can take place in private
• Implementation and development of spec closely linked

Cons
• Discussions are closed
• Process undemocratic
• Only rich companies can afford to take part
• Difficult for non-members to contribute their expertise
• Non-members may be developing systems in isolation

14

HTML - The Competition

What are the alternatives to HTML?

HTML: An SGML DTD
• Describes document structure
• Used in conjunction with emerging style sheet proposals
• Agreements on standards emerging

PDF: Adobe's Portable Document Format
• Provides control over appearance
• Proprietary

Native file format
• Store document in native format, and provide user with reader on client machine

SGML / XML
• Richer DTDs

15

PDF

PDF Pros
• Control over appearance not (yet) easily available in HTML
• Functionality of PDF reader can be controlled (e.g. prevent copying, print with watermarks)

PDF Cons
• Does not store document structure
• Proprietary
  – How would we feel about it if it were owned by Microsoft?
  – Remember the GIF patent problems!
• Printing problems

16

Use of Native File Format

Files can be stored in their native file format (Word, PowerPoint, LaTeX, DVI, etc.)

Files may then be viewed using the application, or a viewer which understands the format.

Pros:
• No conversion needed

Cons:
• Viewing software needed
• Format version issues
• Indexing issues
• Viruses
• Proprietary
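One practical side of the viewing-software and format-version problems is working out what a stored file actually is. A minimal sketch, assuming detection by leading "magic" bytes is good enough: it cannot identify plain-text formats such as LaTeX source, and the signature list is a small illustrative selection.

```python
import re

# Leading bytes ("magic numbers") for a few formats; illustrative, not exhaustive.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"GIF87a", "GIF (87a)"),
    (b"GIF89a", "GIF (89a)"),
    # OLE2 compound file header, used by (among others) binary Word and PowerPoint files.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word, PowerPoint)"),
    # DVI files open with the "pre" opcode (247) followed by format identifier 2.
    (b"\xf7\x02", "DVI"),
]


def sniff_format(data: bytes) -> str:
    """Return a format name based on the file's first bytes, or 'unknown'."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    return "unknown"


def pdf_version(data: bytes):
    """Extract the version from a '%PDF-1.x' header, or None if there is none."""
    match = re.match(rb"%PDF-(\d+\.\d+)", data)
    return match.group(1).decode("ascii") if match else None


if __name__ == "__main__":
    sample = b"%PDF-1.2\n%\xe2\xe3\xcf\xd3\n"
    print(sniff_format(sample), pdf_version(sample))
```

For a real file the header would be read with `open(path, "rb").read(16)`; note that even a correct identification says nothing about whether the reader installed on the client handles that version.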

17

XML

XML:
• Extensible Markup Language
• A lightweight SGML designed for network use
• Arbitrary elements can be defined (<STUDENT-NUMBER>, <PART-NO>, etc.)
• Eliminates problems encountered in extending HTML:
  – Extension by fiat, e.g. <FONT>
  – Public experiments, e.g. the <BLINK> tag
  – The standards process, e.g. maths
• Agreement achieved quickly
• Support from industry (SGML vendors, Microsoft, etc.)
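To make "arbitrary elements can be defined" concrete, the sketch below reads a record that uses invented element names such as <STUDENT-NUMBER> and <PART-NO>, using Python's standard xml.etree.ElementTree. The record and its element names are hypothetical; the point is that any XML parser can handle them without the names being added to a shared tag set, provided the document is well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical record using application-defined element names.
RECORD = """<?xml version="1.0"?>
<STUDENT>
  <NAME>A. N. Other</NAME>
  <STUDENT-NUMBER>9712345</STUDENT-NUMBER>
  <COURSE>
    <PART-NO>CM10195</PART-NO>
  </COURSE>
</STUDENT>"""


def read_student(xml_text: str) -> dict:
    """Pull the fields of interest out of a STUDENT record."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.findtext("NAME"),
        "student_number": root.findtext("STUDENT-NUMBER"),
        # iter() walks the whole subtree, so nested PART-NO elements are found.
        "parts": [p.text for p in root.iter("PART-NO")],
    }


def is_well_formed(xml_text: str) -> bool:
    """XML parsers reject markup that is not well-formed (e.g. mismatched tags)."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(read_student(RECORD))
    # Element names are case-sensitive in XML, so this closing tag does not match.
    print(is_well_formed("<PART-NO>1</part-no>"))
```

The strictness shown by `is_well_formed` is part of what lets XML stay lightweight: parsers do not need the error-recovery heuristics that browsers apply to HTML.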

18

XML Support

Microsoft have expressed support for XML:"Internet Explorer version 4.0 will support a few XML applications (such as CDF). Microsoft will be supporting XML in future versions of Internet Explorer" See http://www.microsoft.com/standards/xml-f.htm

Note how they will be supporting a subset of an ISO standard (SGML, ISO 8879)!

19

Metadata

Metadata: the missing architectural component from the initial implementation of the Web
• Addressing: URL
• Data format: HTML
• Transport: HTTP
• Metadata: PICS, TCN, MCF, DSig, DC, ...

20

Metadata Requirements

Imagine a university prospectus on the Web:

Requirement → Protocol
• Available in the Middle East, where porn filters are in use → PICS (rating system)
• Resource discovery (find "Bath prospectus") → Dublin Core
• Legally binding assertion → Digital Signature (DSig)
• Delivered in appropriate format (HTML, PDF) → Transparent Content Negotiation
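The last requirement is about choosing a format per client. Transparent Content Negotiation (RFC 2295) is a richer, agent-driven scheme built on variant lists; the sketch below swaps in the simpler server-driven negotiation on the HTTP Accept header, just to show the idea of picking HTML or PDF from the client's stated preferences. The function names and variant list are hypothetical.

```python
def parse_accept(header: str) -> dict:
    """Map each media range in an Accept header to its quality value (q)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media = parts[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def quality(prefs: dict, media_type: str) -> float:
    """Quality for a concrete type, trying exact, then type/*, then */*."""
    major = media_type.split("/")[0]
    for key in (media_type, major + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def choose_variant(accept_header: str, variants: list):
    """Pick the variant the client rates highest; None if none is acceptable."""
    prefs = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in variants:
        q = quality(prefs, media_type)
        if q > best_q:  # strict comparison: earlier variants win ties
            best, best_q = media_type, q
    return best


if __name__ == "__main__":
    variants = ["text/html", "application/pdf"]
    print(choose_variant("application/pdf, text/html;q=0.5", variants))
```

Listing HTML first in `variants` makes it the tie-breaker, so a client that accepts everything equally receives the structured format.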

21

Metadata Standards

PICS: Agreement within industry (US Communications Decency Act perceived as a threat). Format moving to XML in PICS/NG.

Dublin Core: Pressure from the library community results in changes to HTML 4. Format likely to move to XML.

Digital Signatures: Based on PICS/NG.

W3C to set up a Metadata Coordination Group.
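For resource discovery, one common convention (assumed here) embeds Dublin Core elements in HTML <META> tags named DC.Title, DC.Creator and so on, which an indexing robot can then harvest. A minimal sketch using Python's standard html.parser; the class name and the sample prospectus page are hypothetical.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collect <META NAME="DC.xxx" CONTENT="..."> elements from an HTML page."""

    def __init__(self):
        super().__init__()
        self.records = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lower-cased
        name = attrs.get("name") or ""
        content = attrs.get("content")
        if name.lower().startswith("dc.") and content is not None:
            # Keep every value: an element such as DC.Creator may repeat.
            self.records.setdefault(name[3:].lower(), []).append(content)


def extract_dc(html_text):
    """Return {element: [values]} for the Dublin Core metadata in a page."""
    parser = DublinCoreExtractor()
    parser.feed(html_text)
    parser.close()
    return parser.records


if __name__ == "__main__":
    page = """<HEAD>
<META NAME="DC.Title" CONTENT="University of Bath Prospectus">
<META NAME="DC.Creator" CONTENT="University of Bath">
<META NAME="keywords" CONTENT="courses">
</HEAD>"""
    print(extract_dc(page))
```

Ordinary META tags such as "keywords" are ignored, so a robot using this approach indexes only the fields the author deliberately described.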

22

Other XML Developments

XML seems to be gaining momentum:

PICS: Moving from rating system to key part of metadata architecture

CDF: Channel Definition Format. Microsoft proposal for push technology

OPS: Open Profiling Specification. Microsoft proposal

XML Web Collections: Microsoft proposal for defining relationships between resources

MCF using XML: Netscape proposal for describing metadata for collections of resources using XML

CML: Chemical Markup Language

MML: Math Markup Language

23

Scripting

Background:
• Netscape's JavaScript (renamed from LiveScript) was the first widely deployed scripting language
• Problems with inter-working between different versions
• Problems with inter-working across browsers (Microsoft and JScript)
• Problems with use of multiple scripting languages in a document

24

Scripting

Developments:
• JavaScript handed to standards body (ECMA). See http://www.ecma.ch/memento/tc39.htm
• W3C developing standards for integrating scripting languages with HTML. See http://www.w3.org/TR/WD-script
• W3C working on Document Object Model (DOM): " .. a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents." See http://www.w3.org/MarkUp/DOM/
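Because the DOM is meant to be language-neutral, the same style of calls appears outside the browser. Python's standard xml.dom.minidom is a lightweight implementation of the W3C DOM interfaces, so it can illustrate a program reading and updating a document's content and structure. A minimal sketch, assuming a well-formed (XHTML-style) fragment, since minidom is an XML parser rather than an HTML one; `update_heading` and the sample document are hypothetical.

```python
from xml.dom.minidom import parseString

DOC = "<html><body><h1>Prospectus 1997</h1><p>Courses</p></body></html>"


def update_heading(markup: str, new_text: str) -> str:
    """Change the first <h1>'s text and append a paragraph, via DOM calls."""
    dom = parseString(markup)

    # Content: replace the heading's children with a single new text node.
    heading = dom.getElementsByTagName("h1")[0]
    while heading.firstChild is not None:
        heading.removeChild(heading.firstChild)
    heading.appendChild(dom.createTextNode(new_text))

    # Structure: create a new element and attach it to the body.
    body = dom.getElementsByTagName("body")[0]
    para = dom.createElement("p")
    para.appendChild(dom.createTextNode("Updated via the DOM"))
    body.appendChild(para)

    return dom.documentElement.toxml()


if __name__ == "__main__":
    print(update_heading(DOC, "Prospectus 1998"))
```

A browser script would use the same method names (`getElementsByTagName`, `createElement`, `appendChild`) from JavaScript, which is the interoperability the DOM work aims at.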

25

Java

Java:
• Development began at Sun in the early 1990s (known as Oak)
• Moved to the Web and released in 1995
• Programming language and virtual machine environment (provides portability and security)
• See http://java.sun.com/

26

Java Applications

Java is gaining momentum:
• Interactive applications
• Enhanced user interfaces
• Replacing conventional desktop applications
• Extending browsers

http://www.mini.co.uk/

27

Java Standardisation

Java developments:
• Sun submitting Java to a standards body (ISO/IEC JTC1)
• Concerns over process ("Microsoft believes that .. that Sun wishes to retain full ownership and control over its Java specifications ..")
• See http://java.sun.com/aboutJava/standardization/index.html

28

Distributed Searching - The Problem

End users face difficulties due to the wide variety of search interfaces available

29

Possible Solutions

Agree to use the same software
• Unlikely to happen
• Undesirable

Agree to implement similar interfaces
• Probably not feasible

Have a centralised database
• Scaling problems

Use software which implements a protocol designed to provide a common search interface across diverse services
• e.g. Z39.50
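The last option can be sketched as follows: the client is written once against a single search interface, and each service sits behind that interface, so one query fans out to all of them. This is a minimal in-process sketch; `SearchTarget`, `InMemoryTarget` and `search_all` are hypothetical names, and a real deployment would replace `InMemoryTarget` with a client speaking Z39.50 (or another protocol) over the network.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


class SearchTarget(ABC):
    """The single interface every service exposes (the role a protocol such as Z39.50 plays)."""

    def __init__(self, name: str):
        self.name = name

    @abstractmethod
    def search(self, term: str) -> List[str]:
        """Return matching record titles for a search term."""


class InMemoryTarget(SearchTarget):
    """Hypothetical stand-in for a remote service holding a few records."""

    def __init__(self, name: str, records: List[str]):
        super().__init__(name)
        self.records = records

    def search(self, term: str) -> List[str]:
        term = term.lower()
        return [r for r in self.records if term in r.lower()]


def search_all(targets: List[SearchTarget], term: str) -> Dict[str, List[str]]:
    """Send one query to every target in parallel and group the hits by target name."""
    with ThreadPoolExecutor(max_workers=max(1, len(targets))) as pool:
        return dict(pool.map(lambda t: (t.name, t.search(term)), targets))


if __name__ == "__main__":
    targets = [
        InMemoryTarget("software archive", ["Chemical XML browser"]),
        InMemoryTarget("mailing lists", ["Review: XML browser for chemists"]),
    ]
    print(search_all(targets, "xml browser"))
```

Adding a new service means writing one more `SearchTarget` implementation; the client and the user's interface stay unchanged, which is the contrast with every service presenting its own search form.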

30

An Applications Solution

Metacrawler can be used to search several large search engines.

Problems:
• Breaks if APIs change

• Centralised system

http://www.metacrawler.com/

31

Z39.50 - What Is It?

Z39.50:
• A protocol which specifies data structures and interchange rules that allow a client machine to search databases on a server machine and retrieve records that are identified as a result of the search
• Maintained by the Library of Congress
• Developed by the ZIG (Z39.50 Implementors Group)

Why is it important?
• Powerful searching
• Local, familiar interface
• Retrieves structured data
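The search-then-retrieve split in that definition can be modelled in a few lines. This is a toy, in-memory sketch loosely based on the protocol's Init, Search and Present services: it has no ASN.1/BER encoding, no TCP connection and no real query syntax, and the class and method names are hypothetical. What it does show is that a search only creates a named result set on the server and reports its size, while records are fetched separately by position.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Z3950ServerModel:
    """Toy model of a Z39.50 target: databases of structured records."""
    databases: Dict[str, List[dict]]
    result_sets: Dict[str, List[dict]] = field(default_factory=dict)
    initialised: bool = False

    def init(self) -> bool:
        # Init: client and server agree to start a session.
        self.initialised = True
        return True

    def search(self, database: str, attribute: str, term: str,
               result_set_name: str = "default") -> int:
        # Search: build a named result set on the server; return only its size.
        if not self.initialised:
            raise RuntimeError("Init must precede Search")
        hits = [rec for rec in self.databases.get(database, [])
                if term.lower() in str(rec.get(attribute, "")).lower()]
        self.result_sets[result_set_name] = hits
        return len(hits)

    def present(self, result_set_name: str, start: int, count: int) -> List[dict]:
        # Present: fetch records from a stored result set (1-based start point).
        if start < 1:
            raise ValueError("start point is 1-based")
        records = self.result_sets[result_set_name]
        return records[start - 1:start - 1 + count]


if __name__ == "__main__":
    server = Z3950ServerModel({"books": [
        {"title": "A Guide to Web Standards", "author": "Smith"},
        {"title": "Networking Standards", "author": "Jones"},
    ]})
    server.init()
    print(server.search("books", "title", "standards"))
    print(server.present("default", 1, 1))
```

Because records come back as structured fields rather than formatted pages, each client can display them in its own local, familiar interface.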

32

Z39.50 History

Z39.50 (1988)
• NISO work with roots in OSI work
• "an unimplementable abomination which should never have been adopted"
• "Inspired" WAIS (which was not interoperable)

Z39.50 (1992)
• Implementation experience
• OSI now regarded as a failure

Z39.50 (version 3)
• Accepted as an ISO standard (ISO 23950) in 1996
• Implemented using TCP/IP
• Toolkits, profiles, etc. now available

Taken from Clifford Lynch's article at http://hosted.ukoln.ac.uk/mirrored/lis-journals/dlib/dlib/dlib/april97/04contents.html

33

Z39.50 Pilot

UKOLN is piloting Z39.50 across a number of services (UKOLN web site, BUBL, eLib project database, ...)

Imagine searching across JISC services (and institutions):

Find the chemical XML browser, and relevant reviews and papers: search the HENSA software archive, Mailbase lists, a chemistry gateway and the Imperial College web site.

34

Related Protocols

LDAP: Lightweight Directory Access Protocol
• Derived from the X.500 directory service
• See "Lightweight Directory Access Protocol" at http://ds.internic.net/rfc/rfc1777.txt
• See also http://www.novell.com/products/nds/ldap.html and http://www.critical-angle.com/ldapworld/Welcome.html

whois++: Derived from the whois protocol for finding people (IETF)
• See "Architecture of the Whois++ Index Service" at http://ds.internic.net/rfc/rfc1913.txt

35

What The Software Companies Say

Netscape (see http://search.netscape.com/newsref/std/standards_qa.html)

• [We will] aggressively support open standards wherever they exist

• Work within the open standards process to innovate valuable new functionality in ways that promote openness and interoperability.

• All current Netscape products implement and support the existing open standards appropriate to their functionality.

Microsoft (see http://premium.microsoft.com/msdn/library/sdkdoc/inetcsdk_2htc.htm)

• Microsoft is fully committed to the HTML standards articulated by the World Wide Web Consortium (W3C) and the international Internet community.

36

Caveat Emptor!
Beware of free software - it can be expensive!

Remember Your Music Collection?

7" single: Your favourite single
12" LP: The album containing the hit
12" LP: Greatest hits
CD: When you bought your CD

Record companies are happy to sell you the same information in several formats!


Is The Same True Of Your Information Systems?

Home-grown
Gopher: The hit of 1992
WWW: The HTML 2 version
WWW (2): Revamped, based on Netscapeisms
WWW (3): Revamped, based on HTML 4 and CSS
WWW (4): ??

Microsoft and Netscape will be happy to sell you tools to manipulate the same information!


37

Conclusions

• Without standards, costs are liable to escalate
• Software companies are happy to take our money
• OSI networking standards gave the standardisation process a bad name
• The current IETF / W3C process of developing standards and gaining implementation experience is valuable
• Standards are not frozen
• The difficult choice may be "What standard?"

38

Further Information

List of Standards Bodies
http://www.yahoo.com/Reference/Standards/
http://www.iso.ch/VL/Standards.html
http://www.cmpcmm.com/cc/standards.html

World Wide Web Consortium
http://www.w3.org/

IETF
http://www.ietf.cnri.reston.va.us/home.html
http://info.isoc.org/home.html

ISO
http://www.iso.ch/welcome.html

ECMA
http://www.ecma.ch/

ISO-HTML
ftp://ftp.cs.tcd.ie/isohtml/

Microsoft and Standards
http://www.microsoft.com/standards/

Netscape and Standards
http://search.netscape.com/newsref/std/standards_qa.html

39

On Julius Caesar, Queen Eanfleda, and the lessons from time past

1 Dual standards rather than a single standard cause trouble.

2 If you must have dual standards, specify mandatory conversions or interfaces between them.

3 Never leave anything implementation-dependent.

4 If irregularities are unavoidable in a standard (e.g. because of external constraints), put them where they will do the least damage.

5 Never alter standards to please the rich and powerful, unless the changes can be justified on firm technical grounds.

6 Even the most rich and powerful can be persuaded that they will benefit from changing from their local standard to a general one.

7 The most effective standards are those you take so for granted you don't have to think about them.

8 If provisions of standards are based on external assumptions or constraints unrelated to the purpose of the standard, they are likely to appear irrational.

http://www.kcl.ac.uk/kis/support/cc/staff/brian/caesar.html