Digital Antiquity Sustaining Database Semantics Sustaining Database Semantics Keith W. Kintigh School of Human Evolution and Social Change Arizona State University [email protected] In the Session Organized by Stuart Jeffrey Taking the Long View: Putting Sustainability at the Heart of Data Creation CAA Granada 7 April 2010


Digital Antiquity Sustaining Database Semantics

Sustaining Database Semantics

Keith W. Kintigh

School of Human Evolution and Social Change

Arizona State University

[email protected]

In the Session Organized by Stuart Jeffrey

Taking the Long View: Putting Sustainability at the Heart of Data Creation

CAA Granada

7 April 2010


Background

• Today, digital databases (spreadsheets) are often the only loci of irreplaceable records of systematically collected archaeological observations

• In the US, databases are often not curated at all and are rapidly being lost.

• Digital repositories (e.g., ADS and tDAR) can provide preservation and access


What Semantic Metadata are Necessary to Adequately Sustain/Document a Database?

• Sufficient information for an archaeologist not familiar with the specifics of a project to make sensible analytical use of the data

• Necessary for comparative and synthetic research

• Necessary to reevaluate conclusions based on systematic evidence

• Our ethical (legal) obligation is to preserve our data and make the data usable


Adequate Preservation is Rarely Achieved in Museum Contexts

• Too frequently the media, rather than the data, are curated, so there is no long-term preservation of the data

• Semantic metadata is often on paper
  • e.g., existing coding manual, coding keys

• But adequate semantic documentation is more comprehensive than analysts would typically think to write down


Documenting Databases

• Internally encoded: Structure, Table Names, Column Names & Data Types

• Usually not internally encoded:
  • Each Column
    • Nature of the column values (not just string, etc.)
      • Arbitrary (lot number, provenience label)
      • Measurement (units of measure and methods)
      • Coded or abbreviated value (nominal variables)
  • Coded Nominal Values within Columns
    • Label & description of every value and how it is distinguished from others (101=rabbit)
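As a rough illustration of the per-column documentation listed above, the following Python sketch records the nature of each column and reports which semantic details are still missing. The class and field names (`ColumnDoc`, `ValueKind`, `missing`) are hypothetical and are not part of tDAR; only the `101=rabbit` example comes from the slide.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class ValueKind(Enum):
    """Nature of a column's values (hypothetical names for the slide's three cases)."""
    ARBITRARY = "arbitrary"      # e.g. lot number, provenience label
    MEASUREMENT = "measurement"  # needs units of measure and a method
    CODED = "coded"              # nominal variable that needs a coding key


@dataclass
class ColumnDoc:
    """Semantic documentation that a database does not encode internally."""
    name: str
    kind: ValueKind
    units: Optional[str] = None
    method: Optional[str] = None
    coding_key: Dict[str, str] = field(default_factory=dict)

    def missing(self) -> List[str]:
        """List the semantic details this column still lacks."""
        gaps = []
        if self.kind is ValueKind.MEASUREMENT:
            if not self.units:
                gaps.append("units")
            if not self.method:
                gaps.append("method")
        if self.kind is ValueKind.CODED and not self.coding_key:
            gaps.append("coding_key")
        return gaps


# Illustrative columns; the units value is an assumption for the example.
taxon = ColumnDoc("taxon", ValueKind.CODED, coding_key={"101": "rabbit"})
length = ColumnDoc("length", ValueKind.MEASUREMENT, units="mm")
```

Here `taxon.missing()` is empty, while `length.missing()` reports that the measurement method has not been documented.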


More Subtle Points

• Are all values in a coding key used?
  • Fish vs. species of fish; birds, reptiles, etc.
  • Can lead to the conclusion that a species of bird, for example, is absent when in fact species was not recorded to this level (i.e., missing data)

• Academic traditions influence what is needed in more subtle ways.
  • What constitutes an adequate description varies.
  • What works for an Americanist might not work for a European Medievalist

• Probably no absolute adequacy
  • We can do better and we must move forward
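The question of whether all values in a coding key are used can be checked mechanically. The sketch below is only an illustration: the function name and the codes are invented for the example. It lists codes that are defined in a key but never occur in the data, which should be read as "not recorded at this level" rather than as evidence of absence.

```python
def unused_codes(coding_key, observed_values):
    """Return the codes defined in the key that never occur in the data."""
    return sorted(set(coding_key) - set(observed_values))


# Hypothetical coding key and observations, for illustration only.
coding_key = {
    "200": "fish (unspecified)",
    "201": "trout",
    "300": "bird (unspecified)",
    "301": "turkey",
}
observed = ["200", "200", "300", "301"]

never_used = unused_codes(coding_key, observed)
```

In this example only the generic fish code appears in the data, so the unused species-level code cannot be taken to mean that the species was absent.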


Our Approach


Digital Antiquity

Digital Antiquity is a newly established multi-institutional organization based in the US devoted to enhancing preservation and access to the digital records of archaeological investigations:

• to permit scholars to more effectively create and communicate knowledge of the long-term human past;

• to enhance the management of archaeological resources; and

• to provide for the long-term preservation of irreplaceable records of archaeological investigations.

Business model targets technical, financial and sociological sustainability in 4-5 years


Digital Antiquity’s Software

• Aspiring to be an on-line, open source, trusted digital repository for archaeological data and documents

• Provides preservation and free, on-line discovery and access for archaeological data and documents

• Web-based ingest interface: the contributor uploads data and is prompted for detailed metadata

• Advanced tools for data integration across inconsistently recorded databases


Database Ingest

• Elicit Project & Information Resource metadata
  • Location, Time, Keywords, Credit, etc.


Upload the Database


Database Documentation

• For each column in the database
  • Indicate data type (measurement or coded integer)
  • Indicate the material class and nature of variable
  • For each measurement, elicit units (e.g., m, kg)

• For each coded value (string or number)
  • Provide a digital “Coding Sheet” specific to that analyst and dataset that associates codes with labels and descriptions
  • Associate each coded value's label with an ontology node that has a standard definition
  • The original values do not change
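A minimal sketch of the coding-sheet idea, assuming a simple in-memory representation: each code carries a label, a description, and an ontology node, and annotation adds new fields without altering the original coded value. The names (`CodeEntry`, `annotate`), the second code, and the descriptions are hypothetical; this is not tDAR's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CodeEntry:
    """One line of a coding sheet (hypothetical structure)."""
    label: str
    description: str
    ontology_node: str


# Illustrative coding sheet for one analyst and dataset.
coding_sheet: Dict[str, CodeEntry] = {
    "101": CodeEntry("rabbit", "placeholder description", "Rabbit"),
    "102": CodeEntry("deer", "placeholder description", "Deer"),
}


def annotate(rows: List[dict], column: str,
             sheet: Dict[str, CodeEntry]) -> List[dict]:
    """Return copies of the rows with label and ontology node added.

    The original coded value in `column` is kept unchanged.
    """
    out = []
    for row in rows:
        entry: Optional[CodeEntry] = sheet.get(row[column])
        new_row = dict(row)  # copy, so the input rows are not modified
        new_row[column + "_label"] = entry.label if entry else None
        new_row[column + "_node"] = entry.ontology_node if entry else None
        out.append(new_row)
    return out


rows = [{"lot": "A1", "taxon": "101"}, {"lot": "A2", "taxon": "999"}]
annotated = annotate(rows, "taxon", coding_sheet)
```

Codes that are missing from the sheet (such as `999` here) are left without a label, which makes gaps in the documentation visible rather than silently guessed.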


Column Registration


Coding Sheets


Ontologies

• An ontology is a map of the semantic relationships among a set of concepts.

• In tDAR, ontologies are ordinarily hierarchical (tree-like) and represent an arbitrary number of levels of class-subclass relationships

• For a given variable, a user community develops an ontology to enable integration; ontologies are not centrally controlled
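A hierarchical ontology of the kind described above can be sketched as a child-to-parent map. The taxa below are invented for illustration and do not reproduce any tDAR ontology; the function names are likewise assumptions.

```python
from typing import Dict, List, Optional

# Hypothetical fauna ontology: each node points to its parent class.
# The root has no parent.
PARENT: Dict[str, Optional[str]] = {
    "Fauna": None,
    "Mammal": "Fauna",
    "Bird": "Fauna",
    "Rabbit": "Mammal",
    "Deer": "Mammal",
    "Turkey": "Bird",
}


def ancestors(node: str, parent: Dict[str, Optional[str]] = PARENT) -> List[str]:
    """Return the chain of classes from the node's parent up to the root."""
    chain = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]
    return chain


def is_a(node: str, cls: str, parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """True if `node` is `cls` itself or one of its subclasses at any depth."""
    return node == cls or cls in ancestors(node, parent)
```

Because the map allows any number of levels, a community can add finer subclasses (for example, species under "Rabbit") without changing how ancestors are looked up.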


Define Ontology



Map Coding Sheet to Ontology


Integration: Standard Approach

• Standardization at or before the time of data ingest (least common denominator)

• This will fundamentally not work in archaeology
  • For legacy data sets, the least common denominator is very low
  • Different regional traditions in terminology, materials (lithics, ceramics), and their analyses

• Enforced standardization is a non-starter for the profession in the US


tDAR Data Integration

• Because the semantics are digitally encoded and known to the repository

• We have the ability to combine datasets
  • Created by different investigators
  • Using incommensurate coding schemes

• into a dataset in which the observations are analytically comparable


tDAR Process

• Query to Identify Relevant Databases

• User selects databases and moves them into the user workspace

• Select Columns to Integrate

• Specify Filtering & Aggregation of Ontology Values

• Perform Aggregation
  • Obtain integrated database with commensurate observations

• Download Result & Analyze It

• In Place (beta, needs documentation)

http://tdar.org
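The filtering and aggregation steps above can be sketched as follows. This is an illustrative approximation, not tDAR's implementation: the ontology, dataset names, codes, and function names are all hypothetical. Each dataset keeps its own coding scheme, codes are mapped to ontology nodes, and each node is rolled up to the nearest ancestor the user chose to keep.

```python
# Hypothetical ontology as a child -> parent map (the root has parent None).
PARENT = {"Fauna": None, "Mammal": "Fauna", "Bird": "Fauna",
          "Rabbit": "Mammal", "Deer": "Mammal", "Turkey": "Bird"}


def roll_up(node, keep):
    """Walk up from `node` to the first node in `keep`; None if there is none."""
    while node is not None:
        if node in keep:
            return node
        node = PARENT[node]
    return None


def integrate(datasets, keep):
    """Combine datasets coded with different schemes into comparable rows.

    `datasets` is a list of (name, rows, code_to_node) tuples. Rows whose
    taxon does not fall under any node in `keep` are filtered out.
    """
    out = []
    for name, rows, code_to_node in datasets:
        for row in rows:
            node = roll_up(code_to_node.get(row["taxon"]), keep)
            if node is None:
                continue  # filtered: outside the requested ontology values
            out.append({"source": name,
                        "provenience": row["provenience"],  # unchanged
                        "taxon": node})                     # aggregated value
    return out


# Two invented datasets using incommensurate codes for the same taxa.
site_a = ("site_a",
          [{"provenience": "A-1", "taxon": "101"},
           {"provenience": "A-2", "taxon": "301"}],
          {"101": "Rabbit", "301": "Turkey"})
site_b = ("site_b",
          [{"provenience": "B-7", "taxon": "LEP"},
           {"provenience": "B-8", "taxon": "CERV"}],
          {"LEP": "Rabbit", "CERV": "Deer"})

integrated = integrate([site_a, site_b], keep={"Mammal"})
```

With `keep={"Mammal"}`, the rabbit and deer records from both datasets are aggregated to "Mammal" and the turkey record is filtered out; the provenience values are carried over unchanged.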


Query


Add Results to Workspace


Select Databases to Integrate


Define Integration Conditions


Filtering and Aggregation


Initial Datasets

Knowth

Durrington Walls


Integrated Dataset


Output

• Output Database
  • 3 columns, area, FUSD FUSP
  • observations from both datasets (with any filtering eliminating cases)
  • provenience and stratum values are the same as in the original databases
  • Taxon values are values in the ontology with aggregation performed

• Database is downloaded and analysed by user.


Output File


To Come in tDAR Integration

• User-dictated integration is in place

• Query-oriented, ad hoc data integration
  • Based on a query, tDAR identifies databases that satisfy the data requirements of the query, i.e., that are relevant and record the needed variables
  • Interact, as necessary, with the user
  • Perform integration on the fly, i.e., using ontologies, align key portions of the metadata for the selected columns
  • Output is an integrated dataset with maximum resolution and minimal changes
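The first step of the planned query-oriented integration, identifying databases that are relevant and record the needed variables, might look something like the sketch below. Since this feature is described as future work, everything here (the `DatasetRecord` structure, the catalog entries, the keyword matching) is an assumption made for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class DatasetRecord:
    """Hypothetical catalog entry built from a dataset's metadata."""
    name: str
    keywords: FrozenSet[str]
    variables: FrozenSet[str]


def candidates(catalog: Iterable[DatasetRecord], topic: str,
               needed: Iterable[str]) -> List[str]:
    """Names of datasets that match the topic and record every needed variable."""
    needed_set = frozenset(needed)
    return [d.name for d in catalog
            if topic in d.keywords and needed_set <= d.variables]


# Invented catalog entries, for illustration only.
catalog = [
    DatasetRecord("fauna_a", frozenset({"fauna"}),
                  frozenset({"provenience", "taxon", "element"})),
    DatasetRecord("fauna_b", frozenset({"fauna"}),
                  frozenset({"provenience", "taxon"})),
    DatasetRecord("ceramics_c", frozenset({"ceramics"}),
                  frozenset({"provenience", "ware"})),
]

matches = candidates(catalog, "fauna", ["taxon", "element"])
```

Only the first dataset records both needed variables, so it is the only candidate returned; a real system would also need the user interaction and ontology alignment steps listed above.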


Acknowledgments

• Andrew W. Mellon Foundation

• National Science Foundation

• Collaborators at ASU
  • K. Selcuk Candan, Tiffany Clark, Hasan Davulcu, John Howard, Shelby Manney, Ben Nelson, Margaret Nelson, Yan Qi, Katherine Spielmann

• Digital Antiquity Board of Directors
  • Sander van der Leeuw, Arizona State University (ASU) [chair]
  • Carol Ackerson, Girl Scouts Arizona Cactus-Pine Council
  • Jeffrey Altschul, SRI Foundation
  • Kim Bullerdick, Owner, BI, L.L.C.
  • Jaime Casap, Google, Inc.
  • John Howard, University College, Dublin
  • Keith Kintigh, ASU
  • Tim Kohler, Washington State University
  • Fred Limp, University of Arkansas
  • Harry Papp, L. Roy Papp & Associates
  • Julian Richards, University of York
  • Dean Snow, The Pennsylvania State University


Questions?

http://tdar.org