
Croatian Electronic Publishing

Results of a survey on e-serials

and usage of metadata

Sofija Klarin, Sonja Pigac, Damir Pavelić


National and University Library, Zagreb

Faculty of Economics, Zagreb

Topics

Part 1
• Context: facts, presumptions and questions

Part 2
• Results of the survey of Croatian remote access e-serials

Part 3
• Use of metadata in e-serials, possibilities for Croatia

1. Electronic publishing using the Internet

• the explosion of publishing activity since the 1990s raises problems of searching, retrieval, identification and preservation of electronic documents
• World Wide Web (1995)
• cataloguer-based management vs. author-based management (Koehler)

1.1 How big is the Web?

Lawrence & Giles (1999):
• 800 million web pages
• 15 TB of information
• 6 TB of text

BrightPlanet - LexiBot software (2000):
• 19 TB - the “surface” Web
• 7,500 TB - the “deep” Web

Kulturarw3 project - Sweden:
• web harvesting
• 7.5 million files
• 300 GB

Croatia (since 1991):
• 8000 .hr domains
• types, number of files?
• types of resources?
• publishers?

Too big?

1.2 Lawrence & Giles (1999):

• 83% of sites on the Web contain commercial content and 6% contain scientific or educational content

Valuable material?

05.08.2000 most visited Croatian sites (Proof)

1.3 Persistence of Web documents (Koehler, 1999)

• Web pages are unstable
– they change (within a year 99% of web pages show some degree of change)
– they disappear – 5% return within a specific period of time
• Two types of change
– change of content (20% in a week)
– change of structure (20% in a week)

Too ephemeral?

1.4 Low use of metadata on the WWW

Lawrence & Giles (1999)

• the simple HTML "keywords" and "description" metatags are used on the homepages of only 34% of sites
• only 0.3% of sites use the Dublin Core metadata standard
• who are Web “publishers”?
– can they accept standards for management and interchange of metadata?

Search/retrieval?

Reliability?

Authenticity?

Interchange?

Publishers?

1.5 Products of electronic publishing

• local access
• hybrid
• remote access resources

• monograph publications (finite publications)
• continuing resources?
– serials
– integrating resources

• data
and/or
• programs

• public access
• restricted access

• static
• dynamic

New types of resources?


2. The survey (January 2000 - April 2001)

• The aim of the survey on e-serials: quantity, categories, persistence, publishers, metadata usage in Croatian web space...

• sample:

- electronic publications which consist of successive parts with numerical or chronological designations

- in Croatian or produced by Croatian publishers, available via WWW

• items excluded:

OPACs or databases, lists/archives, web sites, online services, advertisements


2.1 Identification

• Lists, directories, portals, search engines:

CroLinks http://www.crolinks.com

www.hr - News, media, journals

Iskon - Net.hr portal http://www.iskon.hr

Google, Yahoo

• from their print versions
• from publishers


2.2 Numbers

• Total number: 153

disappeared: 16

changed URL: 12

ceased: 2

changed the title: 1

• NL Denmark - 1069 (2000)
• NL Norway - 299 (1999)

2.3 Categories:

• Religious magazines
• Weekly/fortnightly magazines
• Scientific journals
• Student journals
• Serials published by universities, scientific institutes
• Serials published by civil services
• Serials of unknown type
• Newspapers
• Serials published by societies
• Serials published by companies
• Journals

Counts shown on the slide: 28, 42, 9, 10, 8, 14, 4, 9, 11, 18
Sum: 153

2.4 Editions: electronic, both electronic and printed

• both electronic and print: 110 (e.g. Vjesnik, Večernji list, Slobodna Dalmacija)
• electronic only: 42 (e.g. Mountain Biking, Morsko prase)
• print became electronic: 1 (Internet Monitor)

2.5 Place of publication:

– Zagreb: 115
– Split: 6
– Rijeka: 5
– Osijek, Dubrovnik, Varaždin, Čakovec, Slavonski Brod: 2
– Karlovac, Zadar, Pula, Koprivnica, Ičići, Prelog, Sv. Ivan Zelina, Rovinj, Virovitica: 1
– other (AT): 1
– unknown: 4

2.6 URLs: Croatian domain or …?

.hr 82%
www.vjesnik.hr
www.vecernji-list.hr
www.slobodnadalmacija.hr
www.nacional.hr
www.vef.hr/vetarhiv
www.nn.hr/Glasilo/index.htm
www.hi-fi.hr/hgz
wam.hi-fi.hr
www.agr.hr/smotra/index.htm
www.monitor.hr
www.gradst.hr/engmod
www.bug.hr
etc.

.com 17%
www.hrvatska.com/glas-podravine
duhovno-vrelo.com
www.win-ini.com
cyberdream.croadria.com
www.zarez.com
www.hrvatska.com/bilten.html
www.kapital.com
etc.

other 1%
www.moravek.net/kla
www.hrvatskenovine.at

• 1 item: 3 URLs / domains (.hr .com .net)
• 1 item: 2 URLs / domains (.hr .com)

2.6.1 Domains, URLs

• 28 items have a top-level domain name, e.g. www.vjesnik.hr, www.morsko-prase.hr
• 12 items changed URL:
– 5 from first/second... level domain to top-level domain name, e.g. http://www.hbk.hr/GK/gk.htm → http://www.glas-koncila.hr
– 5 internal changes of the site (domain), e.g. http://www.kdb.hr/projekt/paedro/index.htm → http://www.kdb.hr/paedro/
– 1 .hr → .com
– 1 .com → .hr
• 16 items disappeared:
– 11 .hr: 68.75% (total .hr 82%)
– 5 .com: 31.25% (total .com 17%)
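The shares of disappeared items follow directly from the counts on this slide. A minimal Python sketch of that arithmetic, where the dictionary and function names are illustrative and not part of the survey:

```python
# Counts of disappeared e-serials by domain, as reported on the slide.
disappeared = {".hr": 11, ".com": 5}


def shares(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {domain: round(100 * n / total, 2) for domain, n in counts.items()}


result = shares(disappeared)
print(result)  # {'.hr': 68.75, '.com': 31.25}
```

Set against the overall split of the sample (.hr 82%, .com 17%), the .com titles account for a larger share of the disappearances than of the sample as a whole.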

2.7 Chronological overview

(Bar chart: number of titles per year, 1994-2001)

year      titles
1994      2
1995      6
1996      11
1997      21
1998      26
1999      33
2000      24
2001      2
unknown   26

2.8 Low metadata use

Croatian e-serials
• HTML metatags ”keywords”, “description”, “author”
– 32.8% (September 2000)
– 33.3% (April 2001)
• 1 title - DC metadata standard

Lawrence & Giles (1999)
• simple HTML metatags are only used on the homepages of 34% of sites.
• Only 0.3% of sites use the Dublin Core metadata standard.

2.9 Metadata

<HTML>
<HEAD>
<META NAME="GENERATOR" CONTENT="Adobe PageMill 3.0 Win">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-2">
<TITLE>ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS</TITLE>
<LINK REV="made" HREF="mailto:[email protected]">
<META NAME="keywords" CONTENT="Croatia, agriculture, science, publication, agricultural, economics, rural, sociology, plant, pathology, herbology, animal, nutrition, engineering, soil, amelioration, microbiology, dairy, agronomy, breeding, genetics, botany, zoology, crops, fishery, beekeeping, husbandry, forades, grassland, ornamental, ladnscape, architecture, farm, management, enology, viticulture, pomology">
<META NAME="description" CONTENT="On-line Scientific Journal" AGRICULTURAE CONSPECTUS SCIENTIFICUS PUBLISHED BY FACULTY OF AGRICULTURE UNIVERSITY ZAGREB>
<META NAME="copyright" CONTENT="ACS Agriculture Conspectus Scientificus">
<META NAME="revisit-after" CONTENT="60 Days">
<META NAME="Robot" CONTENT="ALL">
<META NAME="DC.Title" CONTENT="ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS">
<META NAME="DC.Creator" CONTENT="Agriculture Conspectus Scientificus, Faculty of Agriculture, Zagreb CROATIA">
<META NAME="DC.Publisher" CONTENT="Faculty of Agriculture University of Zagreb">
</HEAD>
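A header like the one above can be checked for metadata use automatically. The following is a minimal sketch using only Python's standard `html.parser` module; the class and function names (`MetaCollector`, `summarize`) and the shortened sample header are illustrative assumptions, not the procedure used in the survey.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect NAME/CONTENT pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content is not None:
            self.meta[name.lower()] = content


def summarize(html):
    """Report which basic metatags and which DC.* elements a page carries."""
    parser = MetaCollector()
    parser.feed(html)
    names = set(parser.meta)
    return {
        "basic": sorted(names & {"keywords", "description", "author"}),
        "dublin_core": sorted(n for n in names if n.startswith("dc.")),
    }


# Shortened, hypothetical excerpt modelled on the header shown on the slide.
sample = """<HTML><HEAD>
<META NAME="keywords" CONTENT="Croatia, agriculture, science">
<META NAME="DC.Title" CONTENT="ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS">
<META NAME="DC.Publisher" CONTENT="Faculty of Agriculture University of Zagreb">
</HEAD></HTML>"""

report = summarize(sample)
print(report)
```

Running such a check over the homepage of each surveyed title would give figures comparable to the percentages quoted in 2.8, assuming the same three basic metatags are counted.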


2.10 Metadata questionnaire

• sent in April 2001 by e-mail to 160 e-publishers, editors, webmasters…

• to find out more about their familiarity with metadata, and their intentions to use metadata and cooperate with librarians

• an effort to raise the awareness among publishers of the need for “electronic title page” to be included in their publications

Do you know what metadata is? Do you use metadata?

(Bar chart: percentage of NO and YES answers to each question)

• 27 answers received, representing 32 publications (17,3% or 20,6%)
• 6 incorrect statements:
– 4 claim to use metadata (they don’t!)
– 2 claim not to use metadata (they do!)

The benefits of metadata
• facilitates search and retrieval 69,6%
• promotes the company/publ. 56,5%
• helps identify the author and the content of the publication 52,2%
• everybody uses metadata 13%
• reliability and authenticity of publ. 8,7%
• contains copyright information 4,3%

<title> 95,7%
<keywords> 95,7%
<author> 52,2%
<description> 60,9%
<copyright> 21,7%

Metadata is created by...

(Bar chart: webmaster, editor-in-chief, both; in %)

25,8% don’t use metadata because they:
• know nothing about metadata 50%
• don’t have enough time 12,5%
• don’t have enough employees 12,5%

Metadata generators? (DC-dot, TagGen, DC assist, EdNA, AHDS, Reggie, Nordic DC metadata generator, SAFARI)

• aware of their existence 11%
• not aware 71%
– would like to be informed 100%

Metadata is contained in...

• homepage only 26,1%
• all pages (same metadata) 17,4%
• all pages (different metadata) 47,8%

Metadata standardization?
1. Have you heard of metadata standardization?
2. Which metadata schema do you know of?
3. Would a metadata guideline help you?
4. Is standardization important for your work?
5. Would you like to have standardized metadata in your publ.?

(Bar chart: percentage of answers to each question)

Could librarians help you?

(Bar chart: percentage of YES and NO answers)

YES
• librarians work on standardization of bibl. description 48%
• I’d appreciate any help 44%
• librarians describe print publ. 32%
• librarians work on standardization of metadata 12%
• we are already familiar with library activities (ISBN, ISSN, CIP…) 24%

NO
• librarians don’t know much about the Web 50%
• webmasters should do that 44%
• can do it by myself 25%

E-journals available through the library WebPAC?

YES 93,8%
• it’s useful information for users 75%
• it’s important to treat both print and e-publ. in the same way 75%
• it’s useful for publishers 46,4%

NO 6,3%
• people prefer to use search engines
• web publications often change their URLs - “I’m not sure librarians should catalogue them”


Dublin Core Metadata Initiative survey

From Feb. 20th to March 9th, 2001.

The purpose of the questionnaire was to help achieve some of the DC Libraries Working Group’s objectives for 2001, including: (1) to collect and share examples of Dublin Core use in libraries and (2) to stimulate discussion that will feed into the process of drafting an application profile for the use of Dublin Core in libraries

DC-General and DC-Libraries lists, CORC Users List, and The Alberta Library Metadata List

29 responses from 9 countries

Most used: creator, publisher, title, rights, type, identifier, format, description

Low use of qualifiers

http://dublincore.org

3. Use of metadata in e-serials and possibilities in Croatia

E-serials

- digital / hybrid libraries

- databases (publishers, vendors)

cooperation (BIBLINK) hosted.ukoln.ac.uk/biblink

- separately (web pages)

3.1. Using metadata

1. Inside the document – HTML (XML)

<head> metadata </head>
<body> document described above </body>

2. Separate file

- metadata records + links to e-serials (bibliography, similar serials…)
- file containing metadata – link from a web page with no metadata in the <head> (DC web page)
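Option 1, metadata inside the document, can be illustrated with a short Python sketch that renders a Dublin Core record as `<meta>` tags for the HTML head. The function name `dc_meta_tags` is hypothetical; the title and publisher are taken from the header shown in 2.9, and using the journal URL as `DC.Identifier` is an assumption made only for this example.

```python
from html import escape


def dc_meta_tags(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = []
    for element, value in record.items():
        # Escape quotes and ampersands so the attribute values stay well-formed.
        lines.append('<meta name="DC.%s" content="%s">'
                     % (escape(element, quote=True), escape(value, quote=True)))
    return "\n".join(lines)


# Illustrative record; the Identifier mapping is an assumption.
record = {
    "Title": "ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS",
    "Publisher": "Faculty of Agriculture University of Zagreb",
    "Identifier": "http://www.agr.hr/smotra",
}

head = "<head>\n" + dc_meta_tags(record) + "\n</head>"
print(head)
```

For option 2, the same record would be written to a separate file and linked from the page instead of being embedded.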

3.2 Metadata schemes

- before Internet and electronic publications (cataloguing, exchange – MARC, GILS, CIMI)
- development of Internet (searching, cataloguing, exchange)

Qualified Dublin Core (dublincore.org)
- translated versions (21 languages)
- no Croatian version yet, but the translation is finished

3.3. Creation & conversion tools

- Creating metadata (templates)
Nordic DC metadata creator (including URN generator) (choice of controlled vocabularies, classification, date format, identifier)

- Creation / change of templates
Reggie, Mantis (OCLC) HotMETA (search DC)

- Automatic extraction / gathering from HTML (enter URL)
DC-dot (results in DC, RDF, XHTML - additional corrections possible)
Donor metatagenerator (similar to Nordic DC)

3.3. Creation & conversion tools

- Automatic production
Klarity (automatically generates metadata based on concepts found in text)
Scorpion (automatic classification to DDC)

- Commercial software
TagGen Dublin Core edition (number of schemes and possibilities)
Metabrowser (shows metadata and web pages simultaneously)

http://dublincore.org/tools

3.3. Creation & conversion tools

DC-dot - ( http://www.agr.hr/smotra )

3.3. Creation & conversion tools

Donor - ( http://www.agr.hr/smotra )


3.3. Creation & conversion tools

Metabrowser – “Metabrowser is a web browser that catalogues web pages using schemas such as Dublin Core, GILS, AGLS. Metabrowser allows metadata to be added to web pages accessible from a local or network drive or sent to an external system such as a database or firewalled web server”


3.3. Creation & conversion tools

Conversion:

- DC -> MARC (Dan, Fin, Is, Nor, Swe, US)

Nordic Metadata Project: DC to MARC converter

(www.bibsys.no/mete/d2m)

- Crosswalks: DC, MARC, MARC21, EAD, GILS,ISAD, FGDC

(www.ukoln.ac.uk/metadata/interoperability)

3.3. Creation & conversion tools

Nordic metadata project: DC to MARC converter

008 010508s
245 $a ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS
260 $b Faculty of Agriculture University of Zagreb
856 $u http://www.agr.hr/smotra
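The idea behind such a conversion can be sketched as a small crosswalk table. The Python below is a deliberately simplified illustration, not the Nordic Metadata Project converter: the three mappings (title to 245 $a, publisher to 260 $b, identifier to 856 $u) are chosen to match the fields shown on this slide, the names `CROSSWALK` and `dc_to_marc` are hypothetical, and control fields such as 008 are left out.

```python
# Hypothetical, deliberately tiny crosswalk: DC element -> (MARC tag, subfield).
CROSSWALK = {
    "title": ("245", "a"),
    "publisher": ("260", "b"),
    "identifier": ("856", "u"),
}


def dc_to_marc(record):
    """Map a dict of Dublin Core elements to MARC-style field lines."""
    fields = []
    for element, value in record.items():
        mapping = CROSSWALK.get(element.lower())
        if mapping is None:
            continue  # element not covered by this sketch
        tag, subfield = mapping
        fields.append("%s $%s %s" % (tag, subfield, value))
    return sorted(fields)  # MARC fields are listed in tag order


dc_record = {
    "Title": "ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS",
    "Publisher": "Faculty of Agriculture University of Zagreb",
    "Identifier": "http://www.agr.hr/smotra",
}

marc_lines = dc_to_marc(dc_record)
for line in marc_lines:
    print(line)
```

A real converter also has to handle repeated elements, qualifiers and national MARC variants, which is why the slide lists separate targets (Dan, Fin, Is, Nor, Swe, US).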

3.3. Creation & conversion tools

Conversion MARC -> XML -> MARC
( www.logos.com/marc )
( www.culture.fr/BiblioML ) - additional applications needed

3.4. Which model / scheme?

- company / organization needs
- connection and cooperation with other companies / organizations
- budget
- standardization
- software and upgrading possibilities
- exchange of data / records

Libraries, publishers, vendors: different needs and aims

3.4.1 Choose scheme and strategy - Croatia

Libraries
- bibliographic control
- up-to-date record collections (users benefit)
- exchange

Publishers
- timely, accurate and full exposure of their products and services
- search and retrieval – benefit users and publisher
- standardized record in databases for possible exchange and profit

Cooperation!

3.4.1 Choose scheme and strategy - Croatia

Use knowledge and experience from foreign projects:
- Biblink
- CORC (Cooperative online resources cataloguing)
- DONOR (Directory of Netherlands online resources)

- Inform publishers of standards and possibilities (survey)
- Point out necessity of standardization and use of one primary (major) scheme (Dublin Core ?)
- Show them how to use free web-available tools

3.5 DC – RDF - XML

Dublin Core (qualified) is enough for basic description – serves our needs for the beginning

RDF (Resource Description Framework) is about to become a standard (semantic web)

XML (eXtensible Markup Language) is already a growing standard (structure, exchange, e-business, internal control…)

3.5 DC – RDF - XML

RDF - development is still in process but…
Many projects and tools exist (creation, conversion)
Constant work, often non-commercial (learn & use)

Croatia
- use the same metadata scheme (DC?), enriched with an internal metadata scheme if needed (for publishers' use)
- embed it into HTML documents
- convert to RDF-XML eventually
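The last step, converting embedded Dublin Core to RDF/XML, could look roughly like the Python sketch below. It uses only the standard `xml.etree.ElementTree` module with the RDF and Dublin Core element set namespace URIs; the function name `dc_to_rdf_xml` and the choice of the journal URL as the resource identifier are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_to_rdf_xml(about, record):
    """Serialize a dict of Dublin Core elements as an RDF/XML description."""
    # Register readable prefixes so the output uses rdf: and dc:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("{%s}RDF" % RDF_NS)
    desc = ET.SubElement(root, "{%s}Description" % RDF_NS,
                         {"{%s}about" % RDF_NS: about})
    for element, value in record.items():
        ET.SubElement(desc, "{%s}%s" % (DC_NS, element.lower())).text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_to_rdf_xml(
    "http://www.agr.hr/smotra",  # resource URL, assumed for this example
    {
        "Title": "ACS-AGRICULTURAE CONSPECTUS SCIENTIFICUS",
        "Publisher": "Faculty of Agriculture University of Zagreb",
    },
)
print(xml_text)
```

Because the record is kept as plain element/value pairs, the same data can feed both the HTML `<meta>` tags and the RDF/XML output, which fits the staged approach proposed here.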


3.6. Conclusion

Low use of any metadata scheme opens possibility to adopt one primary scheme (DC?) and emerging standard (RDF?)

Concentrate on the start and strategy, use experience from others

Build environment to help publishers (similar to Biblink)

Cooperation among libraries and publishers is essential


3.7 Links

http://dublincore.org www.ifla.org

www.ukoln.ac.uk www.w3c.org

www.editeur.org www.xml.com

www.logos.com/marc www.culture.fr/BiblioML