Managing and disseminating scientific data and information: A technical discussion

Managing and Disseminating Scientific Data and Information: A Technical Discussion

Sponsored by SIG/STI

Jian Qin, Moderator

Bonnie Carroll, Information International Associates, Inc.

Joe Futrelle, National Center for Supercomputing Applications, UIUC

Jon Jablonski, Knight Library Document Center, University of Oregon

Tsering Wangyal Shawa, Geosciences and Map Library, Princeton University

The management of scientific data and information is becoming increasingly important as data volumes grow rapidly. Scientific data management is affected by many factors in policy, legal, and technical areas. What is involved in scientific data management? What is the status of research and development in this area? How are libraries and information institutions responding to this new area of collections and services? This panel brings together experts in this area to address some of these questions.

Scientific data and information entail a continuum ranging from raw research data to published papers. “Data” include digital observations, scientific monitoring, data from sensors, metadata, model output and scenarios, qualitative or observed behavioral data, visualizations, and statistical data collected for administrative and commercial purposes. “Information” generally refers to conclusions obtained from analysis of data and the results of research. Computational technologies have led to an exponential growth in the volume and types of scientific data and information. Very large databases and data grids have been built around the country in the last couple of decades to store research data, and terms such as “terabytes” and “petabytes” are now used to describe the volume of scientific data and information.

The enormous size and complexity of scientific data have resulted in management problems. For example, many data repositories created by research projects simply use big flat directories, and important metadata remain inaccessible in scientists’ notebooks and heads. These problems also present opportunities, however, for developing methods and tools for effective management of scientific data collections so that scientists can reuse, disseminate, and share the data more easily.
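To make the flat-directory problem concrete, the following is a minimal sketch, not a method described by the panel. It assumes a hypothetical convention in which each data file may be accompanied by a JSON "sidecar" file named `<file>.meta.json`; the function name `build_catalog`, the file names, and the metadata fields are all illustrative assumptions.

```python
import json
import tempfile
from pathlib import Path


def build_catalog(data_dir):
    """Scan a flat directory and separate documented from undocumented files.

    Assumption (hypothetical convention): metadata for 'x.csv' lives in a
    sidecar file named 'x.csv.meta.json' in the same directory.
    """
    data_dir = Path(data_dir)
    catalog = {}        # data file name -> metadata dict
    undocumented = []   # data files with no sidecar
    for path in sorted(data_dir.iterdir()):
        # Skip subdirectories and the sidecar files themselves.
        if not path.is_file() or path.name.endswith(".meta.json"):
            continue
        sidecar = path.with_name(path.name + ".meta.json")
        if sidecar.exists():
            catalog[path.name] = json.loads(sidecar.read_text(encoding="utf-8"))
        else:
            undocumented.append(path.name)
    return catalog, undocumented


def demo():
    """Build a tiny throwaway directory with made-up files and catalog it."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "run_001.csv").write_text("t,accel\n0,0.1\n", encoding="utf-8")
        (tmp / "run_002.csv").write_text("t,accel\n0,0.3\n", encoding="utf-8")
        (tmp / "run_001.csv.meta.json").write_text(
            json.dumps({"instrument": "accelerometer", "units": "g"}),
            encoding="utf-8",
        )
        return build_catalog(tmp)


if __name__ == "__main__":
    catalog, undocumented = demo()
    print("documented:", catalog)
    print("undocumented:", undocumented)
```

In this sketch, files without a sidecar are reported rather than silently skipped, since the missing metadata is exactly what a curator would need to recover from the data producer.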

The International Council for Science (ICSU) Assessment Panel (2004) recently released a report on the policy and management issues in scientific data and information. The panel’s report identified the opportunities and challenges, the roles of the public and private sectors, and current and future research in managing and disseminating scientific data and information. The management areas include creation of logical collections, physical data handling, interoperability support, security support, data ownership, metadata management and access, archiving, persistence, knowledge and information discovery, and data dissemination and publication. The techniques and methods of library and information science can be applied to many of these areas to relieve scientists of data management tasks.

As an active member of the ICSU Committee on Data for Science and Technology (CODATA), the US has responded to the demand for scientific data and information management with a series of studies in this area. The National Science Board (NSB) of NSF (2005) produced a report as the result of two workshops attended by experts from relevant communities. The report reviews the role of long-lived digital data collections in the democratization of science and education, the rapidly increasing need for digital data collections, and the policies and strategies being developed to facilitate the management, preservation, and sharing of digital data. Among the recommendations made to the NSF, information management methods and technologies, as well as data scientist as a future career path, received particular attention.

This panel brings together experts in scientific data management to give an overview of the field and address some of the issues mentioned above. Bonnie Carroll will give an overview of scientific data management activities in the United States, introducing areas of research and application in scientific data management and their potential for information professionals and education.

The second speaker, Joe Futrelle, will present work on creating standards for data and metadata models and representations based on the earthquake engineering community’s use of data. The most recent project in which Futrelle was involved is NEESGrid, an earthquake engineering data repository with distributed data nodes across the country. His presentation will focus on the NEESGrid project and report the steps and procedures taken in the process.

As digital scientific data and information grow, research libraries face challenges in organizing these data and providing data services to the research community. Jon Jablonski and Tsering Wangyal Shawa will present the case of geo-spatial data management and services in academic libraries. The past few years have seen the fastest growth in web-based repositories for geo-spatial data. Academic libraries, local, state, and federal agencies, and non-governmental organizations have all made data available to users of geographic information systems (GIS). More recently, online mapping applications have become de rigueur, opening up cartographic visualization to those without access to expensive GIS software or specialized training. In mid-2005, Google Earth raised the bar even further, giving the general public access to a global, seamless representation of the planet connected to Google's vast keyword index of the World Wide Web. Each of these developments has presented challenges and opportunities to the cartographic information community. Among them are massive datasets in a variety of file formats; (unreasonably?) high expectations of the end-user community; rapidly evolving metadata standards; and varying levels of compliance with those standards. A framework will be presented that the University of Oregon's Map Library is adopting to evaluate existing systems in light of these challenges and opportunities.
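One of the challenges named above, varying levels of compliance with metadata standards, can be illustrated with a small check. This is a sketch under stated assumptions and not the University of Oregon framework: the required field list is hypothetical and does not reproduce any actual geospatial metadata standard, and the sample records are invented.

```python
# Hypothetical list of required fields; a real standard defines its own
# mandatory elements, which are not reproduced here.
REQUIRED_FIELDS = ("title", "abstract", "bounding_box", "coordinate_system", "format")


def missing_fields(record):
    """Return the required fields that are absent or empty in one record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def compliance_report(records):
    """Map each dataset name to its list of missing required fields."""
    return {name: missing_fields(record) for name, record in records.items()}


# Invented sample records, used only to exercise the check.
SAMPLE_RECORDS = {
    "county_roads": {
        "title": "County roads",
        "abstract": "Road centerlines.",
        "bounding_box": [-124.6, 41.9, -116.4, 46.3],
        "coordinate_system": "EPSG:4326",
        "format": "shapefile",
    },
    "old_scan": {
        "title": "Scanned topographic sheet",
        "abstract": "",
        "format": "GeoTIFF",
    },
}

if __name__ == "__main__":
    for name, missing in compliance_report(SAMPLE_RECORDS).items():
        status = "complete" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```

An empty value counts as missing in this sketch, since a blank abstract helps a user no more than an absent one.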