

The Value of Old Data: Trends in GSA Data Repository Usage

Matt Hudson, Geological Society of America, 3300 Penrose Place, Boulder CO 80301

AIM

The purpose of this study was to determine:
• who is using the GSA Data Repository,
• what data is being used, and
• how data usage compares to journal article usage.

INTRODUCTION

The Geological Society of America (GSA) Data Repository (DR) was established in 1974 as an open file in which authors of articles in GSA journals and books could place information that supplements and expands on their original papers. The DR was not intended as a true data repository: its contents cannot be searched or manipulated on the online site. Even so, its usage provides a glimpse of the value of older data. The online version began in 1996, but only included data back to 1992. In 2004 the online version expanded to include the complete archive, but analysis of DR usage did not become possible until April 2011, when GSA installed Google Analytics.

As of October 2014, the DR contains data from more than 4,600 papers. The number of items deposited per year has grown considerably. Prior to the mid-1990s, the Repository received data from <50 papers per year, but that figure has increased to >350 papers per year.

This increase in DR items parallels a larger trend. Many government funding agencies, such as the National Science Foundation, U.S. Geological Survey, and the Research Councils UK, now have data handling policies. The Registry of Research Data Repositories, an inventory of all data repositories, has now identified more than 900 repositories, and in 2012 Thomson Reuters launched the Data Citation Index, which tracks this growing pool of data.

This growing emphasis on data accessibility has raised questions. Who is using data and how is the data being used? How long should data be stored and made available?

METHODS

GSA began using Google Analytics to track online visits to its DR in 2011. This software provides information about the numbers of visitors, page views, and the technology, location, and behavior of visitors. This usage can then be sorted according to the various pages of a Web site, and since the DR includes the year of the data in all URL subdirectories, it is possible to sort this usage by age of the data.
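The sorting step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poster says the year appears in every DR URL subdirectory but does not give the path layout, so the `/repository/<year>/...` pattern, the function name `views_by_age`, and the sample rows are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed path layout: the source only states that the year of the data
# appears in every DR URL subdirectory; "/<year>/" is a guess at its form.
YEAR_IN_PATH = re.compile(r"/((?:19|20)\d{2})/")


def views_by_age(page_views, report_year):
    """Sum page views by age of data (report year minus year in the path)."""
    totals = defaultdict(int)
    for path, views in page_views:
        match = YEAR_IN_PATH.search(path)
        if match is None:
            continue  # skip index pages and other paths without a year
        age = report_year - int(match.group(1))
        totals[age] += views
    return dict(totals)


# Made-up rows for illustration only; these are not real DR traffic figures.
sample = [
    ("/repository/2013/2013001.pdf", 12),
    ("/repository/2013/2013002.pdf", 8),
    ("/repository/1998/9812.pdf", 3),
    ("/repository/index.htm", 40),
]

print(views_by_age(sample, report_year=2014))
```

In practice the input rows would come from an analytics export of page paths and view counts; pages without a year in the path are left out of the age breakdown rather than guessed at.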

The article-usage data was drawn from GeoScienceWorld usage statistics. GSA is one of the founders of GeoScienceWorld, and currently GSA’s journals are hosted at GeoScienceWorld and on the Society’s own site at www.gsapubs.org. GeoScienceWorld is uniquely capable of providing usage statistics for GSA’s journals that reveal the number of full-text article views for each issue of a journal. In this way, the usage per volume can be determined.

RESULTS

Who is Using GSA Data?

The DR receives visitors from more than 120 countries and territories per year. These visitors account for >20,000 sessions per year that produce >38,000 page views. Google Analytics can determine the location of 99% of these users.

From 2011 to 2014, the top eight visitor-producing countries remained consistent: United States, China, United Kingdom, Germany, Canada, Australia, Japan, and France. Approximately 40% of all visitors came from the United States. The percentage of visitors from most countries remained stable; however, visitors from China increased from 8% in 2011 to 11% in 2014, making it the second highest contributor.

CONCLUSIONS

• The GSA Data Repository receives worldwide usage. Forty percent of visitors come from the United States, and a growing portion of visitors are coming from China, the second highest user.

• Although 83% of the Data Repository usage is for data produced in the previous 10 years, the archival data continues to receive views long after it has been published.

• Articles and data less than one year old both account for a quarter of all views, but archival articles receive more interest than archival data, suggesting that over time the original articles are more valuable to readers than the data behind them.

For more information:

Matt Hudson
Geological Society of America
3300 Penrose Place
Boulder, CO 80301
303-357-1020
[email protected]

How Data Usage Compares to Article Usage

One might expect that the usage of GSA’s data would follow similar patterns to the usage of GSA’s journal articles, particularly for a journal like Geology that is of a similar age to the DR. In some respects this is true. The most recent year’s worth of content accounts for 24% of the annual DR usage and 25% of Geology’s article views. Taken as a whole, however, this trend does not continue.

Only 65% of Geology’s usage is for papers published in the past 10 years, compared to 83% of the DR usage. This is particularly surprising given that there are a number of vehicles in place to drive readers to the most recent articles, such as e-mail alerts and RSS feeds for each new issue. This indicates that there is greater interest in archival articles than there is in archival data. Not surprisingly, the most recent content for GSA Bulletin, which has an archive dating back to 1890, accounts for an even smaller portion of views. When GSA Bulletin’s archive is divided up into 8 equal segments similar to the structure of the Geology and DR analysis, the most recent two time slices account for only 59% of overall usage, well below Geology and the DR.
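The comparison above comes down to summing the two most recent age slices for each source. The following Python sketch makes that arithmetic explicit. It assumes the rounded slice percentages from the poster's usage-by-age charts, with each series matched to its source by the totals quoted in the text (83% for the DR, 65% for Geology); the function and variable names are illustrative.

```python
def recent_share(slices, n_recent=2):
    """Return the combined share (%) of the n most recent age slices."""
    return sum(slices[:n_recent])


# Rounded percentages per age slice, most recent first.
# DR and Geology use 5-year slices; GSA Bulletin uses 8-year slices.
dr = [63, 20, 8, 3, 2, 2, 1, 1]
geology = [50, 15, 13, 8, 6, 4, 2, 1]
bulletin = [40, 18, 12, 9, 6, 8, 3, 3]

for name, series in [("DR", dr), ("Geology", geology), ("GSA Bulletin", bulletin)]:
    print(f"{name}: {recent_share(series)}% of usage in the two most recent slices")
```

The rounded Bulletin slices sum to 58%, one point below the 59% quoted in the text, which is consistent with rounding in the chart labels.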

What Data is Being Viewed?

Google Analytics tracks usage only in the form of views. How much of this data is being used in new studies cannot be determined, but analytics can show what data users are interested in viewing.

Not surprisingly, the most recent data receives the largest percentage of viewers. Between 2011 and 2014, 83% of the DR usage was for data produced in the past 10 years.

Because more data has been added to the DR in recent years, recent data is likely to produce more views in aggregate. To account for this, the average number of views per item was also examined: the most recent data receives ~26 views per item per year, whereas older data receives ~5 views per item per year. The oldest data was excluded because the low number of items made the usage calculations unreliable.
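The per-item normalization can be written as a short Python sketch. The poster does not give the exact formula, the cutoff used to exclude the oldest data, or the underlying counts, so the function name, the `min_items` threshold, and every number below are hypothetical and chosen only to show the calculation.

```python
def views_per_item_per_year(total_views, item_count, years_tracked, min_items=10):
    """Average yearly views per item for one cohort of DR items.

    Returns None when the cohort has fewer than min_items items, mirroring
    the exclusion of sparse cohorts. The threshold of 10 is an assumption.
    """
    if years_tracked <= 0:
        raise ValueError("years_tracked must be positive")
    if item_count < min_items:
        return None  # too few items for a reliable average
    return total_views / item_count / years_tracked


# Invented cohorts for illustration only; not actual DR counts.
recent = views_per_item_per_year(total_views=27300, item_count=350, years_tracked=3.0)
older = views_per_item_per_year(total_views=600, item_count=40, years_tracked=3.0)
sparse = views_per_item_per_year(total_views=90, item_count=4, years_tracked=3.0)

print(recent, older, sparse)
```

Dividing by the number of items in each cohort removes the effect of recent years simply containing more deposits, so any remaining gap reflects interest per item rather than repository growth.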

2013 GSA Bulletin article usage by age
0–8 yr old: 40%
8–16 yr: 18%
16–24 yr: 12%
24–32 yr: 9%
32–40 yr: 6%
40–48 yr: 8%
48–56 yr: 3%
56–64 yr: 3%

2013 Geology article usage by age
0–5 yr old: 50%
5–10 yr: 15%
10–15 yr: 13%
15–20 yr: 8%
20–25 yr: 6%
25–30 yr: 4%
30–35 yr: 2%
35–40 yr: 1%

2011–2014 DR Usage by Age of Data
0–5 yr old: 63%
5–10 yr: 20%
10–15 yr: 8%
15–20 yr: 3%
20–25 yr: 2%
25–30 yr: 2%
30–35 yr: 1%
35–40 yr: 1%

[Chart: Items Added to DR Per Year. Horizontal axis: year, 1974–2014; vertical axis: items added, 0–400.]

[Chart: 2011–2014 views per DR item. Horizontal axis: age of DR item (yr), 0–38; vertical axis: views, 0–30.]

2014 DR Usage by Country
Chart legend: United States, China, United Kingdom, Germany, Canada, Australia, Japan, France, Switzerland, Italy, Other

2011 DR Usage by Country
Chart legend: United States, United Kingdom, China, Canada, Germany, Australia, Japan, France, Italy, Taiwan, Other