Microsoft Azureweb.stanford.edu/group/dlss/pasig/PASIG_September2014/...Microsoft Azure for Research...

Page 4:

Microsoft Azure for Research Group

@azure4research

• Azure Research Awards (>320 to date)
• Training and Webinar series
• Technical papers & curriculum
• Research community engagements

www.azure4research.com

Page 5:

Preservation and Long-term Access through NETworked Services

• Ensure long-term access to Europe’s cultural and scientific heritage
  − Improve decision-making about long-term preservation
  − Ensure long-term access to valued digital content
  − Control the costs through automation, scalable infrastructure
  − Ensure wide adoption across the user community
  − Establish market place for preservation services and tools

• Build practical solutions
  − Integrate existing expertise, designs and tools
  − Share and build

Page 6:

SCAPE

• Develop scalable services for planning and execution of preservation strategies

• Open source platform for semi-automated workflows for large-scale, heterogeneous collections of complex digital objects.

Presenter notes:

Characterisation: Describing the significant properties of digital objects in a format-neutral manner.

Planning: Enabling organisations to build preservation plans appropriate to their preservation objectives, the characteristics of their collections, and the preservation actions that are available.

Action: A non-destructive action that creates new data from existing data in the archive, with the intent of preserving or increasing access to information stored in the archive.

Testbed: The general goal of the Testbed subproject is to provide a dedicated research environment that allows the systematic execution of experiments by distributed actors. It enables the automated evaluation of experiment results, the reproducibility of experiments, the long-term availability of structured experiment documentation, and shared access to the experiments themselves.

Page 7:

AIT Austrian Institute of Technology GmbH

The British Library

Internet Memory Foundation

Ex Libris Ltd.

Fachinformationszentrum Karlsruhe, Gesellschaft für Wissenschaftlich-Technische Information GmbH

Koninklijke Bibliotheek

KEEP SOLUTIONS LDA

Microsoft Research

Österreichische Nationalbibliothek

Open Planets Foundation

Statsbiblioteket

Science and Technology Facilities Council

Technische Universität Berlin

Technische Universität Wien

The University of Manchester

Université Pierre et Marie Curie – Paris 6

Page 8:

Source formats

• WordPerfect 5
• WordPerfect 6
• DOS Word
• Word 2, 6, 95
• Word 97-2003
• RTF
• ODF
• OpenXML

Target formats

• OpenXML
• ODF
• UOF
• HTML
• XCDL (format defined in PLANETS)

Page 9:

Authentication
• Landing page
• Portal user/visitor
• External links
• Login

Ingest
• Ingest documents
  o Individual documents
  o Collections
• Manage collection

Conversion
• Select documents for conversion
• Format identification
• Select converters
  o Manual converter selection
  o Automatic converter selection
• Start conversion

Quality Assurance (Comparison)
• Select document(s) for comparison
• Select comparison operator
• View visual representation of comparison

Reporting and analysis
• Analyse ingest data
• Analyse conversion data
• Analyse comparison data
• Generate report/log
• Select report/log for viewing
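
The "Automatic converter selection" step in the Conversion group can be illustrated with a minimal sketch. The slides do not say how the portal picks a converter, so everything here is an assumption: the `Converter` type, the registry contents, and the converter names are hypothetical, and only the format names are taken from the source and target format lists in this deck.

```python
from typing import List, NamedTuple, Optional


class Converter(NamedTuple):
    """Hypothetical description of one converter known to the portal."""
    name: str
    sources: frozenset  # source formats the converter accepts
    targets: frozenset  # target formats the converter can produce


# Hypothetical registry; the real portal's converters are not named in the slides.
CONVERTERS: List[Converter] = [
    Converter("legacy-word-converter",
              frozenset({"DOS Word", "Word 2, 6, 95", "Word 97-2003", "RTF"}),
              frozenset({"OpenXML"})),
    Converter("wordperfect-converter",
              frozenset({"WordPerfect 5", "WordPerfect 6"}),
              frozenset({"OpenXML", "ODF"})),
    Converter("open-format-converter",
              frozenset({"OpenXML", "ODF"}),
              frozenset({"OpenXML", "ODF", "HTML"})),
]


def select_converter(source_format: str,
                     target_format: str,
                     registry: List[Converter] = CONVERTERS) -> Optional[Converter]:
    """Return the first registered converter that supports the requested pair.

    Returns None when no converter matches, in which case a portal could
    fall back to manual converter selection.
    """
    for converter in registry:
        if source_format in converter.sources and target_format in converter.targets:
            return converter
    return None
```

In this sketch, format identification is assumed to have already produced `source_format`; a request such as `select_converter("RTF", "OpenXML")` returns the first matching registry entry, and an unsupported pair returns `None`.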

Page 12:

Comparison .DOCX

Format transformation

Page 13:

Comparison .DOCX

Format transformation

[Diagram: comparison of a document before and after format transformation. Labels: .DOCX, .ODT, MS Word, Open Office, Screen Print (XPS), OCR Processing, Feature extraction / comparison.]
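
The "Feature extraction / comparison" step can be sketched as a text comparison between the OCR output of two renderings. The slide does not specify which features are extracted or which comparison operator is used, so this is an assumed stand-in: it treats the OCR text as the only feature and scores it with `difflib.SequenceMatcher` from the Python standard library. The function names and the 0.95 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def text_similarity(ocr_text_a: str, ocr_text_b: str) -> float:
    """Return a similarity ratio between 0 and 1 for two OCR text outputs.

    Both texts are split into words first, so differences in spacing and
    line breaks between the two renderings do not affect the score.
    """
    words_a = ocr_text_a.split()
    words_b = ocr_text_b.split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def conversion_preserved_text(ocr_text_a: str,
                              ocr_text_b: str,
                              threshold: float = 0.95) -> bool:
    """Flag a conversion as acceptable when the similarity meets the threshold.

    The threshold is an illustrative value, not one taken from the slides.
    """
    return text_similarity(ocr_text_a, ocr_text_b) >= threshold
```

A word-level comparison only checks that the text survived the transformation. Layout, fonts, and images would need additional features, which the slide hints at but does not enumerate.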

Page 14:

• Extendible functionality
• Extendible data store
• Scalable computation
• Virtualization
• Common platform for creating services
• Support for client applications on diverse computing platforms

Presenter notes:

Illustrated use for file format migration. So, what is wrong? First, our focus on traditional files may be debilitating because it may point our attention in the wrong direction. Second, this is all based on contemporary technology that will evolve. We are subject to the same vulnerability as the media we wish to preserve, and more so than we are aware of, because the digital is inherently about bit processing and not about bit storing. Bit storing is necessary but not sufficient for digital content to be kept alive.