GL11 – December 14-15, 2009
Usage of grey literature in open archives
J. Schöpfel (University of Lille 3)
C. Boukacem-Zeghmouri (University of Lille 3)
H. Prost (INIST-CNRS)
Size of repositories and total number of items
[Figure: total number of items (cumulated) against number of items in archives (ranking); series: 2008 and 2009]
Content evolution
1.87 million items
= ×2.7 since 2008
Representativity 10% (?)
Share of GL unchanged (17%)
But: +200,000 new grey items
[Figure: content by category. Legend: Articles; Grey literature; Datasets; Other*, nd]
* = heritage, books…
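The figures on this slide appear mutually consistent, which a quick back-of-the-envelope check can illustrate. The sketch below assumes that the 17% grey literature share applies to the total in both years and that the growth factor is exactly 2.7; the variable names are illustrative only.

```python
# Rough consistency check of the slide's figures.
# Assumptions: the 17% GL share holds in both 2008 and 2009,
# and the growth factor is exactly 2.7.
total_2009 = 1_870_000   # "1.87 million items"
growth_factor = 2.7      # "x2.7 since 2008"
gl_share = 0.17          # "Share of GL unchanged (17%)"

total_2008 = total_2009 / growth_factor
grey_2009 = gl_share * total_2009
grey_2008 = gl_share * total_2008
new_grey_items = grey_2009 - grey_2008

print(f"Estimated 2008 total: {total_2008:,.0f}")
print(f"Estimated new grey items: {new_grey_items:,.0f}")
```

Under these assumptions the estimate comes out at about 200,000 new grey items, in line with the figure quoted above.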
GL document types
Conferences
Reports
ETD
Working papers
Other
Courseware
Repository type and presence of grey literature
[Figure: presence of grey literature (yes / no) by repository type: Institutional, Doc-type, Subject-based, Other]
= 74% of all repositories contain GL
(and 93% of IR)
Size of repository and number of grey items
[Figure: size of repository against number of grey items (standard scores), 2008 and 2009. Labelled repositories: HAL, HAL SHS, PERSEE, IRD, INRA, TEL, I-Revues, HAL-INRIA]
Quality improvement
[Figure: Metadata and Validation (0% to 70%), 2008 and 2009]
Slightly more archives with specific metadata for grey items
Significantly more archives with some kind of content validation and/or quality control
Access to full text…
[Figure: access to full text. All items: 53%; Restricted: 38%; NA: 9%]
(+ 5%)
… but items without fulltext
Half of all open archives contain bibliographic records that don’t link to the document
The share of such records varies from 5% to 90%
Overall share of records without full text: 16%
Usage statistics of GL
[Figure: average downloads per document type]
Importance of grey literature: 2.2 (ETD)
University of Toulouse (OATAO)
Usage statistics of GL
Average downloads per document type
Importance of grey literature:
4.7 - 7 (ETD)
1.4 - 3 (reports)
1.3 (conferences)
IFREMER (Archimer)
Usage statistics of GL
Importance of grey literature: 1.7 - 5 (working papers)
[Figure: average downloads per document type, 1998-2009. Series: Working papers; Articles]
RePEc
Problems
Cumulative statistics
No history
No details (formats, …)
No specific information on GL
[Figure: HAL]
Without metadata, no statistics
Metadata
On the one hand:
- Difficulties in identifying the types of documents.
- Only « published or unpublished » document.
- No count of results.
Metadata
On the other hand:
- Query by: author’s affiliation, scientific department, research theme, document type, keywords
- Choice with date
- Choice with full text or not
- Ranking of results
Log analysis (1)
Diversity of tools (Google Analytics / Sitemap, Webalizer Xtended, AWStats, HAL, PhpMyVisite, Analog …)
(Figures in parentheses refer to the 7-day period ending 16-Nov-2009 00:00).
Successful requests: 132,810 (6,975)
Average successful requests per day: 914 (996)
Successful requests for pages: 132,810 (6,975)
Average successful requests for pages per day: 914 (996)
Failed requests: 84 (0)
Distinct files requested: 530 (526)
Distinct hosts served: 40,015 (3,609)
Corrupt logfile lines: 55
Unwanted logfile entries: 2,743,109
Data transferred: 172.81 gigabytes (9.18 gigabytes)
Average data transferred per day: 1.19 gigabytes (1.31 gigabytes)
INP Toulouse
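The summary figures above can be cross-checked with simple arithmetic. The following sketch only reuses the numbers quoted on the slide; the variable names are illustrative, and the length of the logging period is not stated in the source, so it is inferred here from the totals and the daily averages.

```python
# Figures copied from the log summary quoted on the slide.
total_requests = 132_810
avg_requests_per_day = 914
week_requests = 6_975
total_gb = 172.81
avg_gb_per_day = 1.19
week_gb = 9.18

# Implied length of the logging period, estimated two independent ways.
days_from_requests = total_requests / avg_requests_per_day
days_from_bytes = total_gb / avg_gb_per_day

# Seven-day averages recomputed from the figures in parentheses.
week_avg_requests = week_requests / 7
week_avg_gb = week_gb / 7

print(f"Implied period: {days_from_requests:.0f} days (requests), "
      f"{days_from_bytes:.0f} days (bytes)")
print(f"Last 7 days: {week_avg_requests:.0f} requests/day, "
      f"{week_avg_gb:.2f} GB/day")
```

Both estimates of the period agree to within a day, and the recomputed seven-day averages match the values in parentheses.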
Log analysis (2)
reqs: %bytes: last time: file
• 2044: 0.40%: 29/Nov/09 22:53: García Martinez (2009) Development and validation of the Euler-Lagrange formulation on a parallel a...
• 1283: 0.26%: 29/Nov/09 22:53: Delgado Zambrano (2009) Bioréacteur à membrane externe pour le traitement d'effluents contenant de...
• 1115: 0.30%: 29/Nov/09 22:53: Sepret (2009) Application de la PIV sur traceurs fluorescents à l'étude de l'entraîneme...
• 1063: 0.21%: 29/Nov/09 22:53: Nerisson (2009) Modélisation du transfert des aérosols dans un local ventilé.
• 1057: 0.95%: 29/Nov/09 17:34: Delabrouille (2004) Caractérisation par MET de fissures de corrosion sous contrainte d'alliages...
• 1029: 0.14%: 29/Nov/09 22:53: Rajsiri (2009) Knowledge-based system for collaborative process specification.
• 1014: 0.79%: 29/Nov/09 22:10: Delay (2005) Analyse des écoulements transitoires dans les systèmes d'injection directe...
• 984: 0.88%: 29/Nov/09 22:55: Geneau (2006) Procédé d'élaboration d'agromatériau composite naturel par extrusion biv...
INP Toulouse
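Rows in the "reqs: %bytes: last time: file" layout shown above can be turned into structured data for further analysis. The sketch below is one possible way to do it, assuming each row follows exactly that colon-separated layout; the names `FileReportRow`, `ROW_PATTERN` and `parse_row` are hypothetical and do not belong to any log analysis tool.

```python
import re
from typing import NamedTuple, Optional

class FileReportRow(NamedTuple):
    requests: int
    percent_bytes: float
    last_time: str   # kept as text, e.g. "29/Nov/09 22:53"
    label: str       # author, year and (possibly truncated) title

# Assumed row layout: "<reqs>: <pct>%: <dd/Mon/yy hh:mm>: <label>"
ROW_PATTERN = re.compile(
    r"^\s*(?:•\s*)?(?P<reqs>\d+):\s*(?P<pct>\d+(?:\.\d+)?)%:\s*"
    r"(?P<last>\d{2}/\w{3}/\d{2} \d{2}:\d{2}):\s*(?P<label>.+?)\s*$"
)

def parse_row(line: str) -> Optional[FileReportRow]:
    """Return the parsed row, or None if the line does not match the layout."""
    m = ROW_PATTERN.match(line)
    if m is None:
        return None
    return FileReportRow(int(m["reqs"]), float(m["pct"]), m["last"], m["label"])

sample_lines = [
    "•2044: 0.40%: 29/Nov/09 22:53: García Martinez (2009) Development and "
    "validation of the Euler-Lagrange formulation on a parallel a...",
    "•1057: 0.95%: 29/Nov/09 17:34: Delabrouille (2004) Caractérisation par "
    "MET de fissures de corrosion sous contrainte d'alliages...",
]
rows = [parse_row(line) for line in sample_lines]
for row in rows:
    print(row.requests, row.percent_bytes, row.label[:30])
```

Once parsed, the rows can be sorted or grouped, for example by publication year taken from the label.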
Log analysis (3)
Pastel ParisTech
Access to website: search engines, geographical origin, strategies, etc.
On-site behaviour: bouncing, downloading, duration, domains, etc.
Towards standardization: PIRUS (JISC)
Publisher and Institutional Repository Usage Statistics
For authors and institutions
Article level (DOI)
COUNTER compliant
XML prototype
Article Report 1: <title>Number of Successful Full-Text Article Requests by Month and DOI</title>
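To make the idea behind such a report concrete, here is a minimal sketch of counting successful full-text requests by month and DOI. It is not the PIRUS prototype or the COUNTER XML format: the record layout, the example DOIs and the function name `article_report` are all assumptions made for illustration, and a compliant report would apply further rules that this sketch ignores.

```python
from collections import Counter
from datetime import datetime

# Hypothetical request records: (ISO timestamp, DOI, HTTP status, is_full_text).
requests = [
    ("2009-10-03T10:15:00", "10.1234/example.1", 200, True),
    ("2009-10-17T08:02:00", "10.1234/example.1", 200, True),
    ("2009-10-20T12:00:00", "10.1234/example.2", 404, True),   # failed request
    ("2009-11-01T09:30:00", "10.1234/example.1", 200, False),  # record page only
    ("2009-11-05T14:45:00", "10.1234/example.2", 200, True),
]

def article_report(records):
    """Count successful full-text requests per (month, DOI)."""
    counts = Counter()
    for timestamp, doi, status, is_full_text in records:
        if status == 200 and is_full_text:
            month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
            counts[(month, doi)] += 1
    return counts

report = article_report(requests)
for (month, doi), n in sorted(report.items()):
    print(month, doi, n)
```

Failed requests and requests for the bibliographic record alone are excluded, so only full-text usage is counted.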
Towards standardization: PIRUS 2 (JISC)
COUNTER standards & PIRUS results
Different « Article Reports » (core set of standard usage statistics reports)
Open Source software for production and sharing of usage statistics on article (item) level for OA
Cost analysis
Final report in December 2010
Towards standardization: OA-Statistik (DINI)
For authors (usage follow-up), readers-scientists (relevance, alert), institutions (impact)
Article level (= document)
Tools for transfer/sharing (network)
Added-value services
Towards standardization: other websites, projects
LogEc (http://logec.repec.org/): usage statistics of the RePEc repository
IFABC (http://www.ifabc.org/): definition of usage metrics (user, visit…)
SURF (http://www.surffoundation.nl/nl/projecten/Pages/SURE.aspx): aggregation of log files
JISC Usage statistics review (http://ie-repository.jisc.ac.uk/250/): proposal of standard
Recommendations (1)
Recipients: authors, users, institutions
COUNTER principle: different levels, with a basic minimum level (AR1)
Selection of minimum elements for a basic log analysis (who, what, request type, when, identifier)
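The five minimum elements listed above can be sketched as a small record extracted from a web server log line. The example assumes the Apache Common Log Format and a hypothetical URL layout in which the first path segment is the item identifier; `LogEvent` and `parse_event` are illustrative names, not part of any existing tool.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class LogEvent:
    who: str            # requesting host
    what: str           # requested path
    request_type: str   # HTTP method
    when: datetime      # time of the request
    identifier: str     # item identifier (assumed: first path segment)

# Common Log Format: host ident user [time] "METHOD path protocol" status bytes
CLF = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

def parse_event(line: str) -> Optional[LogEvent]:
    """Extract the five minimum elements, or return None if the line is malformed."""
    m = CLF.match(line)
    if m is None:
        return None
    path = m["path"]
    segments = [s for s in path.split("/") if s]
    return LogEvent(
        who=m["host"],
        what=path,
        request_type=m["method"],
        when=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        identifier=segments[0] if segments else "",
    )

# Hypothetical log line using a documentation-only IP address.
sample = ('192.0.2.10 - - [16/Nov/2009:00:00:01 +0100] '
          '"GET /1234/1/thesis.pdf HTTP/1.0" 200 48213')
event = parse_event(sample)
print(event)
```

In practice the identifier would more likely be a persistent identifier (for example a DOI or a handle) resolved from the repository's own metadata rather than from the path.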
Recommendations (2)
Definition of elements and terminology (access, downloading, visit, request, hit…)
Periodicity (monthly) and delay (30 days)
Distinction full text / records
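Why shared definitions matter can be shown with the difference between hits and visits. The sketch below counts every request as a hit and starts a new visit whenever the same host has been inactive for more than 30 minutes; that threshold is an assumption chosen for illustration, not a value taken from the slides, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

VISIT_TIMEOUT = timedelta(minutes=30)  # assumed threshold, not from the slides

def count_hits_and_visits(events):
    """events: iterable of (host, datetime) pairs. Returns (hits, visits)."""
    hits = 0
    visits = 0
    last_seen = {}
    for host, when in sorted(events, key=lambda e: e[1]):
        hits += 1
        previous = last_seen.get(host)
        # A new visit starts on first contact or after a long enough pause.
        if previous is None or when - previous > VISIT_TIMEOUT:
            visits += 1
        last_seen[host] = when
    return hits, visits

# Hypothetical events for two hosts on the same day.
events = [
    ("host-a", datetime(2009, 11, 16, 9, 0)),
    ("host-a", datetime(2009, 11, 16, 9, 10)),
    ("host-b", datetime(2009, 11, 16, 9, 15)),
    ("host-a", datetime(2009, 11, 16, 11, 0)),
]
hits, visits = count_hits_and_visits(events)
print(hits, visits)
```

The same log yields different numbers depending on the definition applied, which is why statistics from different repositories are hard to compare without agreed terminology.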
Recommendations (3)
Added-value services*:
Modular statistics (collections, document types, time period, etc.)
Summary tables
Assistance-help / FAQ
Link with other tools measuring the impact of deposited items (citations, tagging etc.)
(…)
* see PLoS http://article-level-metrics.plos.org/
Forthcoming
2010 IRIS case study (Lille 1)
2010 Final report of DUAO-F project
2010 Study on search engines
??? Partnership with JISC/COUNTER and DINI
??? Project with CCSD and/or COUPERIN
Thank you!
Joachim Schöpfel
University Charles de Gaulle Lille 3
joachim.schopfel@univ-lille3.fr
++ (0) 33 688 35 01 47
Chérifa Boukacem-Zeghmouri
University Charles de Gaulle Lille 3
boukacemc@yahoo.fr
++ (0) 33 620 62 18 12
Hélène Prost
INIST-CNRS
helene.prost@inist.fr
++33 (0) 383 50 47 12