DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.
![Page 1: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/1.jpg)
DDI TRAINING WORKSHOP
Wendy Thomas
November 28-29, 2012
![Page 2: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/2.jpg)
Overview of Workshop – Day 1
• DDI Use Cases
• Identification, Versioning, and Referencing
• Modules (structural overview)
• Questionnaire content and layout
• Concepts, Variables, Logical Record, Physical Store
• Use of DDI within a research process
• Use of DDI within an archival/management system
![Page 3: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/3.jpg)
Overview of Workshop – Day 2
• SND Issue areas / information and discussion
• Geography and DDI
• DDI 3.1 changes and the future of DDI-L
• Tools and resources
• The status of DDI-RDF
![Page 4: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/4.jpg)
Credits
• Unspecified slides - Wendy Thomas (MPC)
• DDI in 60 Seconds – Arofan Gregory (ODaF)
• OAIS diagram - Herve L’Hours (UKDA)
• Remainder of Slides (source indicator in upper left):
  • The slides were developed for several DDI workshops at IASSIST conferences and at GESIS training in Dagstuhl/Germany
  • Major contributors
    • Wendy Thomas, Minnesota Population Center
    • Arofan Gregory, Open Data Foundation
  • Further contributors
    • Joachim Wackerow, GESIS – Leibniz Institute for the Social Sciences
    • Pascal Heus, Open Data Foundation
  • Attribute: http://creativecommons.org/licenses/by-sa/3.0/legalcode
![Page 5: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/5.jpg)
License
Details on next slide.
S01 5
![Page 6: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/6.jpg)
License (cont.)
On-line available at: http://creativecommons.org/licenses/by-sa/3.0/
This is a human-readable summary of the Legal Code at: http://creativecommons.org/licenses/by-sa/3.0/legalcode
S01 6
![Page 7: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/7.jpg)
DDI-L Lifecycle Model
Metadata Reuse
S03 7
![Page 8: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/8.jpg)
Learn DDI-L in 60 Seconds
![Page 9: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/9.jpg)
[Diagram] A Study measures Concepts about Universes, using Survey Instruments made up of Questions.
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010Published under Creative Commons Attribute-ShareAlike 3.0 Unported
![Page 10: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/10.jpg)
[Diagram] Questions collect Responses, resulting in Data Files made up of Variables with values of Categories/Codes, Numbers.
![Page 11: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/11.jpg)
THAT’S PRETTY MUCH IT.
![Page 12: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/12.jpg)
Concepts
Variables
Concepts Codes Categories
Summary Statistics
Physical Location
Studies
![Page 13: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/13.jpg)
DDI-L USE CASES
S06 13
Learning DDI: Pack S06
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010. Published under Creative Commons Attribute-ShareAlike 3.0 Unported
![Page 14: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/14.jpg)
Archival Ingestion and Metadata Value-Add
• This use case concerns how DDI 3 can support the ingest and migration functions of data archives and data libraries.
S06 14
![Page 15: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/15.jpg)
Microdata/Aggregates
<DDI 3> [Full meta-data set]
(?)
+
Data Archive / Data Library
Ingest Processing
<DDI 3> [Full or additional metadata]
Archival events
Supports automation of processing if good DDI metadata is captured upstream
Provides good format & foundation for value-added metadata by archive
Provides a neutral format for data migration as analysis packages are versioned
Preservation Systems
Can package data and metadata for preservation purposes – populate other standard formats
Dissemination Systems
S06
![Page 16: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/16.jpg)
<g:LocalHoldingPackage>
<s:StudyUnit>with full content
OR
<g:Group>with full content
<s:StudyUnit>new value added content
<a:Archive>
<a:LifeCycleEvents>
capture ingest processing events+
S06 16
![Page 17: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/17.jpg)
Data Dissemination/Data Discovery
• This use case concerns how DDI-L can support the discovery and dissemination of data.
S06 17
![Page 18: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/18.jpg)
Microdata/Aggregates
<DDI-L>[Full meta-data set]
Codebooks
+
Registries
Catalogues
Question/Concept/Variable Banks
Databases, repositories
Websites
Research Data Centers
Data-Specific Info Access Systems
<DDI-L> Can add archival events meta-data
Rich metadata supports auto-generation of websites, packages of specific, related materials, and other delivery formats and applications
S06
![Page 19: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/19.jpg)
<c:ConceptScheme>
<c:UniverseScheme>
<c:GeographicStructureScheme>
<c:GeographicLocationScheme>
<d:QuestionScheme>
<d:ControlConstructScheme>
<l:VariableScheme>
<l:CategoryScheme>
<l:CodeScheme>
<p:PhysicalStructureScheme>
<p:RecordLayoutScheme>
<a:OrganizationScheme>
<s:StudyUnit> [descriptive content]
• Store as separate resources
• Use content to feed a different registry structure
S06 19
![Page 20: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/20.jpg)
Question/Concept/Variable Banks
• This use case describes how DDI 3 can support question, concept, and variable banks. These are often termed “registries” or “metadata repositories” because they contain only metadata – links to the data are optional, but provide implied comparability. The focus is metadata reuse.
S06 20
![Page 21: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/21.jpg)
[Diagram]
<DDI 3> Questions, Flow Logic, Codings → Question Bank → <DDI 3> Questions, Flow Logic, Codings → Users and Applications
<DDI 3> Variables, Categories, Codes → Variable Bank → <DDI 3> Variables, Categories, Codes → Users and Applications
<DDI 3> Concepts → Concept Bank → <DDI 3> Concepts → Users and Applications
Supports but does not require ISO 11179
Because DDI has links, each type of bank functions in a modular, complementary way.
S06
![Page 22: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/22.jpg)
<g:ResourcePackage>
• Question Bank
  • <d:QuestionScheme>
  • <d:ControlConstructScheme>
• Variable Bank
  • <l:CategoryScheme>
  • <l:CodeScheme>
  • <l:VariableScheme>
• Concept Bank
  • <c:ConceptScheme>
S06 22
![Page 23: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/23.jpg)
Questionnaire Generation, Data Collection, and Processing
• This use case concerns how DDI 3 can support the creation of various types of questionnaires/CAI, and the collection and processing of raw data into microdata.
S06 23
![Page 24: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/24.jpg)
Paper Questionnaire
<DDI 3> Concepts, Universes, Questions, Flow Logic
Types of Metadata:
• Concepts (conceptual module)
• Universe (conceptual module)
• Questions (datacollection module)
• Flow Logic (datacollection module)
• Variables (logicalproduct module)
• Categories/Codes (logicalproduct module)
• Coding (datacollection module)
<DDI 3> Concepts, Universes, Questions, Flow Logic
Final
Online Survey Instrument
CAI Instrument
<DDI 3> Variables, Coding
+
Raw Data
+
<DDI 3> Categories, Codes, Physical Data Product, Physical Instance
Microdata
DDI captures the content – XML allows for each application to do its own presentation
S06
![Page 25: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/25.jpg)
studyunit.xsd
conceptualcomponent.xsd
datacollection.xsd
logicalproduct.xsd
physicaldataproduct.xsd
physicalinstance.xsd
Previous structure PLUS
<l:LogicalProduct>
  <l:DataRelationship>
  <l:VariableScheme>
<p:PhysicalDataProduct>
  <p:PhysicalStructureScheme>
  <p:RecordLayoutScheme>
<pi:PhysicalInstance>
S06 25
![Page 26: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/26.jpg)
DDI For Use within a Research Project
• This use case concerns how DDI-L can support various functions within a research project, from the conception of the study through collection and publication of the resulting data.
S06 26
![Page 27: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/27.jpg)
<DDI-L> Concepts, Universe, Methods, Purpose, People/Orgs
<DDI-L> Questions, Instrument
<DDI-L> Data Collection, Data Processing
<DDI-L> Funding, Revisions
Submitted Proposal
$ € £
Presentations
Archive/Repository
Publication
+ + + + +
<DDI-L> Variables, Physical Stores
Principal Investigator
Collaborators
Research Staff
Data
S06
![Page 28: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/28.jpg)
<s:StudyUnit>
  <s:Abstract>
  <s:Purpose>
  <r:FundingInformation>
  <c:ConceptualComponents>
    <c:Concepts>
    <c:Universe>
  <d:DataCollection>
    <d:Methodology>
    <d:QuestionScheme>
    <d:ControlConstructScheme>
  <l:LogicalProduct>
    <l:DataRelationship>
    <l:CategoryScheme>
    <l:CodeScheme>
    <l:VariableScheme>
  <p:PhysicalDataProduct>
  <pi:PhysicalInstance>
  <a:Archive>
    <a:OrganizationScheme>
• Version 1.0.0 Preparing the proposal for funding
S06 28
![Page 29: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/29.jpg)
• Version 1.1.0 Entering funding information and revising/versioning earlier content
S06 29
![Page 30: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/30.jpg)
• Version 2.0.0 Preparing for data collection
S06 30
![Page 31: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/31.jpg)
• Version 3.0.0 Completing the study and preparing the data
S06 31
![Page 32: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/32.jpg)
Metadata Mining for Comparison, etc.
• This use case concerns how collections of DDI-L metadata can act as a resource to be explored, providing further insight into the comparability and other features of a collection of data to help researchers identify data sets for re-use.
S06 32
![Page 33: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/33.jpg)
Types of Metadata
• Universe (conceptualcomponent module)
• Concept (conceptualcomponent module)
• Question (datacollection module)
• Variable (logicalproduct module)
Metadata Repositories/Registries
<DDI-L> Instances
Questions, Variable, Concepts, Universe
?
<DDI-L> Comparison
• Questions
• Categories
• Codes
• Variables
• Universe
• Concepts
Recodes
Harmonizations
Data Sets
![Page 34: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/34.jpg)
Register/Administrative Data
• This use case concerns how DDI-L can support the retrieval, organization, presentation, and dissemination of register data
S06 34
![Page 35: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/35.jpg)
Register/Administrative Data Store
Register / Admin. Data File
Query/Request
Other Data Collection
Integrated Data Set
Generation Instruction (data collection module)
Lifecycle Events (Archive module)
Variables, Categories, Codes, Concepts, Etc.
Processing (Data Collection module)
Comparison/mapping (Comparison module)
[Lifecycle continues normally]
S06
![Page 36: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/36.jpg)
<g:Group><cm:Comparison>
<s:StudyUnitReference><s:StudyUnit>
<d:DataCollection><d:Methodology><d:ProcessEvent><l:LogicalProduct>
<l:DataRelationship><l:VariableScheme>
<p:PhysicalDataProduct><pi:PhysicalInstance>
Emphasis is on the process of collection
May include NCube LogicalProduct
If data is obtained from multiple studies, Group and comparison may be used
S06 36
![Page 37: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/37.jpg)
Implementing GSBPM Content
• This use case concerns the use of DDI as an underlying model within GSBPM and how DDI can be used to implement the model
![Page 38: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/38.jpg)
The Generic Statistical Business Process Model (GSBPM)
• The METIS group is a part of UN/ECE which addresses metadata issues for national statistical agencies (and other producers of official statistics)
  • This community uses both SDMX and DDI
• They have produced a reference model of the statistical production process
  • The DDI 3 Lifecycle Model was a major input
  • GSBPM has a much greater level of detail
S20 38
![Page 39: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/39.jpg)
S20 39
![Page 40: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/40.jpg)
Getting into the details
• Some technical basics
  • Identification, Versioning, and Reference
• Overall structures for organizing and packaging metadata
  • Modules and Schemes
• Data capture
  • Questionnaire structure
• Data description and storage
  • Concepts, Variables, Records, Data files (physical stores)
• View from the bottom up
![Page 41: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/41.jpg)
Identification, Versioning and Reference
![Page 42: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/42.jpg)
Rationale
• Because several organizations are involved in the creation of a set of metadata throughout the lifecycle flow:
  • Rules for maintenance, versioning, and identification must be universal
  • Reference to other organizations’ metadata is necessary for re-use – and very common
S08 42
![Page 43: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/43.jpg)
Maintenance Rules
• A maintenance agency is identified by a reserved code based on its domain name (similar to its website and e-mail)
  • There is a register of DDI agency identifiers which we will look at later in the course
• Maintenance agencies own the objects they maintain
  • Only they are allowed to change or version the objects
• Other organizations may reference external items in their own schemes, but may not change those items
  • You can make a copy which you change and maintain, but once you do that, you own it!
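The ownership rule above can be sketched in a few lines of Python. This is only an illustration, not part of DDI or any DDI tool: the `MaintainedItem` class, the `revise` and `copy_and_own` helpers, the agency `example.org`, and the choice to restart a copy at version 1.0.0 are all assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MaintainedItem:
    """Hypothetical stand-in for a maintained DDI object."""
    agency: str   # maintenance agency that owns the item
    id: str
    version: str
    label: str


def revise(item, acting_agency, new_label, new_version):
    """Return a new version of the item; only the owning agency may do this."""
    if acting_agency != item.agency:
        raise PermissionError(
            f"{acting_agency} may reference {item.id} but may not change it"
        )
    return replace(item, label=new_label, version=new_version)


def copy_and_own(item, new_agency):
    """Copy an item into another agency's ownership.

    Restarting at version 1.0.0 is an assumption of this sketch.
    """
    return replace(item, agency=new_agency, version="1.0.0")


original = MaintainedItem("us.mpc", "C1", "1.0.0", "Age")
mine = copy_and_own(original, "example.org")   # now owned by example.org
revised = revise(mine, "example.org", "Age in years", "1.1.0")
```

The original item stays untouched under `us.mpc`; only the copy, which now belongs to the copying agency, is changed.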
S08 43
![Page 44: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/44.jpg)
Versioning Rules
• If a “published” object changes in any way, its version changes
• This will change the version of any containing maintainable object
• Typically, objects grow and are versioned as they move through the lifecycle
• Versionables inherit their agency from the maintainable object they live in at the time of origin
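A minimal Python sketch of the rule that a change to a published object also changes the version of its containing maintainable. The class names, the `revise` method, and the decision to bump the middle version number are assumptions for illustration; DDI itself does not prescribe which part of the version to increment.

```python
from dataclasses import dataclass, field


def bump_minor(version: str) -> str:
    """Increase the middle number of a three-part version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    return f"{major}.{minor + 1}.{patch}"


@dataclass
class Versionable:
    id: str
    version: str = "1.0.0"


@dataclass
class Maintainable(Versionable):
    agency: str = "example.agency"  # hypothetical agency code
    items: dict = field(default_factory=dict)

    def revise(self, item_id: str) -> None:
        """Version one contained item and, as a consequence, this container."""
        item = self.items[item_id]
        item.version = bump_minor(item.version)
        self.version = bump_minor(self.version)


scheme = Maintainable(id="ConceptSchemeX", agency="us.mpc")
scheme.items["A"] = Versionable(id="A")
scheme.items["B"] = Versionable(id="B")
scheme.revise("A")  # Concept A and the scheme both move on; Concept B does not
```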
S08 44
![Page 45: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/45.jpg)
Versioning: Changes
ConceptScheme X V 1.0.0
- Concept A v 1.0.0
- Concept B v 1.0.0
- Concept C v 1.0.0
ConceptScheme X V 1.1.0
- Concept A v 1.1.0
- Concept B v 1.0.0
- Concept C v 1.1.0
Add: Concept D v 1.0.0
ConceptScheme X V 2.0.0
- Concept A v 1.2.0
- Concept B v 1.0.0
- Concept C v 1.2.0
- Concept D v 1.1.0
Add: Concept E v 1.0.0
ConceptScheme X V 3.0.0
- Concept D v 1.1.0
- Concept E v 1.0.0
[Diagram arrows between the scheme versions are labelled "references"]
Note: You can also reference entire schemes and make additions
S08 45
![Page 46: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/46.jpg)
Identifiable Rules
• Identifiers are assigned to each identifiable object, and are unique within their maintainable parent
• Identifiable objects inherit their version from their containing versionable parent (if any) at their time of origin
• Identifiable objects inherit their maintaining agency from the maintainable object they live in at the time of origin
S08 46
![Page 47: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/47.jpg)
Maintainable, Versionable, and Identifiable
• DDI 3 places an emphasis on re-use
  • This creates lots of inclusion by reference!
  • This raises the issue of managing change over time
• The Maintainable, Versionable, and Identifiable scheme in DDI was created to help deal with these issues
• An identifiable object is something which can be referenced, because it has an ID
• A versionable object is something which can be referenced, and which can change over time – it is assigned a version number
• A maintainable object is something which is maintained by a specified agency, and which is versionable and can be referenced – it is given a maintenance agency
S08 47
![Page 48: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/48.jpg)
Basic Element Types
Differences from DDI 1/2
-- Not every element is identifiable
-- Many individual elements or complex elements may be versioned
-- A number of complex elements can be separately maintained
S08 48
![Page 49: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/49.jpg)
DDI 3.1 Identifiers
• There are two ways to provide identification for a DDI 3 object:
  • Using a set of XML fields
  • Using a specially-structured URN
• The structured URN approach is preferred
  • URNs are a very common way of assigning a universal, public identifier to information on the Internet
  • However, they require explicit statement of agency, version, and ID information in DDI 3
• Providing element fields in DDI 3 allows for much information to be defaulted
  • Agency can be inherited from parent element
  • Version can be inherited or defaulted to “1.0.0”
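The defaulting behaviour of element fields might be sketched as follows. The function name, the dictionary keys, and the `inherit_version` switch are hypothetical; the slide does not say when a version is inherited rather than defaulted, so the sketch leaves that choice to the caller.

```python
DEFAULT_VERSION = "1.0.0"


def resolve_identification(element, parent, inherit_version=False):
    """Fill in agency and version for an element that omits them.

    `element` and `parent` are plain dicts standing in for DDI objects.
    Explicit values always win; a missing agency comes from the parent;
    a missing version is inherited or defaulted depending on the flag.
    """
    agency = element.get("agency") or parent["agency"]
    if element.get("version"):
        version = element["version"]
    elif inherit_version:
        version = parent["version"]
    else:
        version = DEFAULT_VERSION
    return {"id": element["id"], "agency": agency, "version": version}


parent_scheme = {"id": "VarSch01", "agency": "us.mpc", "version": "1.4.0"}
defaulted = resolve_identification({"id": "V1"}, parent_scheme)
inherited = resolve_identification({"id": "V1"}, parent_scheme, inherit_version=True)
explicit = resolve_identification({"id": "V1", "version": "1.1.0"}, parent_scheme)
```

A URN has no such fallback: agency, identifier, and version all have to be written out in full.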
S08 49
![Page 50: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/50.jpg)
Parts of the Identification Series
Identifiable Element identifier fields, with example values for a Variable:
• ID: V1
• Identifying Agency: us.mpc
• Version: 1.1.0 [default is 1.0.0]
• Version Date: 2007-02-10
• Version Responsibility: Wendy Thomas
• Version Rationale: Spelling correction
• UserID
• Object Source
S08 50
![Page 51: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/51.jpg)
URN Detailed Example
urn="urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0"
• urn:ddi: this is a URN from DDI
• us.mpc: the scheme agency is us.mpc
• VariableScheme.VarSch01.1.4.0: in a variable scheme with identifier VarSch01, version 1.4.0
• Variable.V1.1.1.0: for a variable with ID V1, version 1.1.0
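To make the structure of the example URN concrete, here is a small Python sketch that splits it into its parts. The function names are hypothetical, and the parser assumes exactly the shape shown on the slide (two colon-separated object segments, identifiers without dots, three-part versions); it is not a general DDI URN parser.

```python
def parse_segment(segment):
    """Split 'Type.ID.major.minor.patch' into type, id and version."""
    type_name, rest = segment.split(".", 1)
    pieces = rest.rsplit(".", 3)
    if len(pieces) != 4:
        raise ValueError(f"unexpected segment: {segment!r}")
    return {"type": type_name, "id": pieces[0], "version": ".".join(pieces[1:])}


def parse_ddi_urn(urn):
    """Parse a URN of the form urn:ddi:<agency>:<maintainable>:<object>."""
    parts = urn.split(":")
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "ddi":
        raise ValueError(f"not a URN of the expected shape: {urn!r}")
    return {
        "agency": parts[2],
        "maintainable": parse_segment(parts[3]),
        "object": parse_segment(parts[4]),
    }


parsed = parse_ddi_urn(
    "urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0"
)
```

Reading the URN this way gives agency `us.mpc`, scheme `VarSch01` at version 1.4.0, and variable `V1` at version 1.1.0.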
S08 51
![Page 52: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/52.jpg)
Referencing
• When referencing an object, you must provide:
  • The maintenance agency
  • The identifier
  • The version
• Often, these are inherited from a maintainable object
  • This is part of their identification
S08 52
![Page 53: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/53.jpg)
DDI References
• References in DDI may be within a single instance or across instances
  • Metadata can be re-packaged into many different groups and instances
  • “Internal” references are made to objects in the same instance
  • “External” references are made to objects in other DDI instances
• Identifiers must provide:
  • The containing maintainable (a module or a scheme)
    • Agency, ID, and Version
  • The identifiable/versionable object
    • ID (and version if versionable)
• Like identifiers, DDI references may be made using URNs or element fields
S08 53
![Page 54: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/54.jpg)
Reference Examples
• Internal
<VariableReference isReference="true" isExternal="false" lateBound="false">
  <Scheme isReference="true" isExternal="false" lateBound="false">
    <ID>VarSch01</ID>
    <IdentifyingAgency>us.mpc</IdentifyingAgency>
    <Version>1.4.0</Version>
  </Scheme>
  <ID>V1</ID>
  <IdentifyingAgency>us.mpc</IdentifyingAgency>
  <Version>1.1.0</Version>
</VariableReference>
S08 54
![Page 55: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/55.jpg)
Reference Examples
• External
<VariableReference isReference="true" isExternal="true" lateBound="false">
  <urn>urn:ddi:us.mpc:VariableScheme.VarSch01.1.4.0:Variable.V1.1.1.0</urn>
</VariableReference>
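The two reference styles carry the same information, which a short Python sketch can show by turning an element-field reference into the equivalent URN. The `reference_to_urn` helper is hypothetical, the type names `VariableScheme` and `Variable` are passed in by the caller because the element fields do not state them, and the XML is written without namespaces, as on the slides; real DDI instances are namespaced.

```python
import xml.etree.ElementTree as ET

# Element-field reference in the simplified, namespace-free slide notation.
REFERENCE_XML = """
<VariableReference isReference="true" isExternal="false" lateBound="false">
  <Scheme isReference="true" isExternal="false" lateBound="false">
    <ID>VarSch01</ID>
    <IdentifyingAgency>us.mpc</IdentifyingAgency>
    <Version>1.4.0</Version>
  </Scheme>
  <ID>V1</ID>
  <IdentifyingAgency>us.mpc</IdentifyingAgency>
  <Version>1.1.0</Version>
</VariableReference>
"""


def reference_to_urn(xml_text, scheme_type, object_type):
    """Build a URN from the agency, ID and version fields of a reference."""
    ref = ET.fromstring(xml_text.strip())
    scheme = ref.find("Scheme")
    agency = ref.findtext("IdentifyingAgency")  # direct child only
    scheme_part = f"{scheme_type}.{scheme.findtext('ID')}.{scheme.findtext('Version')}"
    object_part = f"{object_type}.{ref.findtext('ID')}.{ref.findtext('Version')}"
    return f"urn:ddi:{agency}:{scheme_part}:{object_part}"


urn = reference_to_urn(REFERENCE_XML, "VariableScheme", "Variable")
```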
S08 55
![Page 56: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/56.jpg)
DDI XML Schemas and Main Structures
S09 56
Learning DDI: Pack S09
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010. Published under Creative Commons Attribute-ShareAlike 3.0 Unported
![Page 57: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/57.jpg)
DDI-L Main Structures and Concepts
• XML Schemas
• DDI Modules
• DDI Schemes
• DDI Profiles
• A Simple Example
S09 57
![Page 58: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/58.jpg)
XML Schemas, DDI Modules, and DDI Schemes
[Diagram]
XML Schemas (<file>.xsd) may correspond to DDI Modules
DDI Modules may contain DDI Schemes
DDI Modules correspond to a stage in the lifecycle
S09 58
![Page 59: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/59.jpg)
XML Schemas
• archive
• comparative
• conceptualcomponent
• datacollection
• dataset
• dcelements
• DDIprofile
• ddi-xhtml11
• ddi-xhtml11-model-1
• ddi-xhtml11-modules-1
• group
• inline_ncube_recordlayout
• instance
• logicalproduct
• ncube_recordlayout
• physicaldataproduct
• physicalinstance
• proprietary_record_layout
• reusable
• simpledc20021212
• studyunit
• tabular_ncube_recordlayout
• xml
• set of xml schemas to support xhtml
S09 59
![Page 60: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/60.jpg)
Reminder: DDI Modules and Schemes
• DDI has two important structures:
  • “Modules”
  • “Schemes”
• A module is a package of metadata corresponding to a stage of the lifecycle or a specific structural function
• A scheme is a list of reusable metadata items of a specific type
• Many DDI modules contain DDI schemes
S09 60
![Page 61: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/61.jpg)
XML Schemas, DDI Modules, and DDI Schemes
Instance
Study Unit
Physical Instance
DDI Profile
Comparative
Data Collection
Logical Product
Physical Data Structure
Archive
Conceptual Component
Reusable
Ncube
Inline ncube
Tabular ncube
Proprietary
Dataset
S09 61
![Page 62: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/62.jpg)
S09 62
![Page 63: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/63.jpg)
XML Schemas, DDI Modules, and DDI Schemes
Instance
Study Unit
Physical Instance
DDI Profile
Comparative
Data Collection
· Question Scheme
· Control Construct Scheme
· Interviewer Instruction Scheme
Logical Product
· Category Scheme
· Code Scheme
· Variable Scheme
· NCube Scheme
Physical Data Structure
· Physical Structure Scheme
· Record Layout Scheme
Archive
· Organization Scheme
Conceptual Component
· Concept Scheme
· Universe Scheme
· Geographic Structure Scheme
· Geographic Location Scheme
Reusable
Ncube
Inline ncube
Tabular ncube
Proprietary
Dataset
S09 63
![Page 64: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/64.jpg)
Why Schemes?
• You could ask “Why do we have all these annoying schemes in DDI?”
• There is a simple answer: reuse!
• DDI-L supports the concept of metadata registries (e.g., question banks, variable banks)
• DDI-L also needs to show specifically where something is reused
  • Including metadata by reference helps avoid error and confusion
  • Reuse is explicit
S09 64
![Page 65: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/65.jpg)
Packaging structures
![Page 66: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/66.jpg)
DDI Instance
Citation
Coverage
Other Material / Notes
Translation Information
Study Unit
Group
Resource Package
3.1 Local Holding Package
S04 66
![Page 67: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/67.jpg)
Citation / Series Statement
Abstract / Purpose
Coverage / Universe / Analysis Unit / Kind of Data
Other Material / Notes
Funding Information / Embargo
Conceptual Components
Data Collection
Logical Product
Physical Data Product
Physical Instance
Archive
DDI Profile
Study Unit
S04 67
![Page 68: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/68.jpg)
Group
Conceptual Components
DataCollection
LogicalProduct
PhysicalDataProduct
Sub Group
Archive
DDI Profile
Citation / Series StatementAbstract / Purpose
Coverage / UniverseOther Material / Notes
Funding Information / Embargo
Study Unit Comparison
S04 68
![Page 69: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/69.jpg)
Resource Package
Any module EXCEPT Study Unit, Group, or Local Holding Package
Any Scheme: Organization, Concept, Universe, Geographic Structure, Geographic Location, Question, Interviewer Instruction, Control Construct, Category, Code, Variable, NCube, Physical Structure, Record Layout
Citation / Series StatementAbstract / Purpose
Coverage / UniverseOther Material / Notes
Funding Information / Embargo
S04 69
![Page 70: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/70.jpg)
Local Holding Package (3.1 and later)
Depository Study Unit OR Group Reference:[A reference to the stored version of the deposited study unit.]
Local Added Content:[This contains all content available in a Study Unit whose source is the local archive.]
Citation / Series StatementAbstract / Purpose
Coverage / Universe Other Material / Notes
Funding Information / Embargo
S04 70
![Page 71: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/71.jpg)
Study Unit
• Study Unit
  • Identification
  • Coverage
    • Topical
    • Temporal
    • Spatial
  • Conceptual Components
    • Universe
    • Concept
    • Representation (optional replication)
  • Purpose, Abstract, Proposal, Funding

• Identification is mapped to Dublin Core and basic Dublin Core is included as an option
• Geographic coverage mapped to FGDC / ISO 19115
  • bounding box
  • spatial object
  • polygon description of levels and identifiers
• Universe Scheme, Concept Scheme
  • link of concept, universe, representation through Variable
  • also allows storage as an ISO/IEC 11179 compliant registry
S04 71
![Page 72: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/72.jpg)
Archive
• An archive is whatever organization or individual has current control over the metadata
• Contains persistent lifecycle events
• Contains archive specific information
  • local identification
  • local access constraints
S04 72
![Page 73: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/73.jpg)
Data Collection
• Methodology
• Question Scheme
  • Question
  • Response domain
• Instrument
  • using Control Construct Scheme
• Coding Instructions
  • question to raw data
  • raw data to public file
• Interviewer Instructions

• Question and Response Domain designed to support question banks
  • Question Scheme is a maintainable object
• Organization and flow of questions into Instrument
  • Used to drive systems like CASES and Blaise
• Coding Instructions
  • Reuse by Questions, Variables, and comparison
S04 73
![Page 74: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/74.jpg)
Logical Product
• Category Schemes
• Coding Schemes
• Variables
• NCubes
• Variable and NCube Groups
• Data Relationships

• Categories are used as both question response domains and by code schemes
• Codes are used as both question response domains and variable representations
• Link representations to concepts and universes through references
• Built from variables (dimensions and attributes)
  • Map directly to SDMX structures
  • More generalized to accommodate legacy data
S04 74
![Page 75: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/75.jpg)
Physical storage
• Physical Data Structure
  • Links to Data Relationships
  • Links to Variable or NCube Coordinate
  • Description of physical storage structure
    • in-line, fixed, delimited or proprietary
• Physical Instance
  • One-to-one relationship with a data file
  • Coverage constraints
  • Variable and category statistics
S04 75
![Page 76: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/76.jpg)
Group
• Resource Package
  • Allows packaging of any maintainable item as a resource item
• Group
  • Up-front design of groups – allows inheritance
  • Ad hoc (“after-the-fact”) groups – explicit comparison using comparison maps for Universe, Concept, Question, Variable, Category, and Code
• Local Holding Package
  • Allows attachment of local information to a deposited study without changing the version of the study unit itself
S04 76
![Page 77: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/77.jpg)
DDI Lifecycle Model and Related Modules
StudyUnit
Data Collection
LogicalProduct
PhysicalData Product
PhysicalInstance
Archive
Groups and Resource Packages are a means of publishing any portion or combination of sections of the life cycle
Local Holding Package
S04 77
![Page 78: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/78.jpg)
Building from Component PartsUniverseScheme
ConceptScheme
CategoryScheme
CodeScheme
QuestionScheme
Instrument
Variable Scheme
NCube Scheme
ControlConstructScheme
LogicalRecord
RecordLayout Scheme [Physical Location]
PhysicalInstance
S04 78
![Page 79: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/79.jpg)
Concepts
Universes
Variables
Codes
Categories
Conceptualcomponent
Logicalproduct
Data collection
Questions
Physical dataproduct
RecordLayout
Physicalinstance
CategoryStats
Study Unit Example: Schematic
Study Unit
S09 79
![Page 80: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/80.jpg)
DDI’s “Meta-Module”
• One module is unlike all of the others in DDI – the DDI Profile
• This is a “meta-module” – it talks about how the DDI-L is being used by a specific application or organization
S09 80
![Page 81: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/81.jpg)
DDI Profiles
• The DDI Profile module lets you describe which fields you use in your institution’s flavor of DDI
  • It is useful for performing machine validation of received instances
  • It is useful documentation for human users
• You provide a set of information for each element allowed in a complete DDI instance
  • If it is used or not used
  • If optional fields (per the XML schema) are required
• Provides the ability to describe DDI Templates
  • Element AlternateName, Description and Instructions
  • Required, default, fixed values
S09 81
![Page 82: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/82.jpg)
<pr:DDIProfile xmlns:pr="ddi:profile:3_1" id="DDIProfileSTUDYNO">
  <pr:XPathVersion>1.0</pr:XPathVersion>
  <pr:DDINamespace>3.1</pr:DDINamespace>
  <pr:XMLPrefixMap>
    <pr:XMLPrefix>s</pr:XMLPrefix>
    <pr:XMLNamespace>ddi:studyunit:3_1</pr:XMLNamespace>
  </pr:XMLPrefixMap>
  <pr:Used path="/DDIInstance/VersionResponsibility"/>
  <pr:Used path="/DDIInstance/Citation/Title"/>
  <pr:Used path="/DDIInstance/Citation/Creator" required="true">
    <pr:AlternateName>Author</pr:AlternateName>
  </pr:Used>
  <pr:Used path="/DDIInstance/StudyUnit/Citation/Title"/>
  .....
  <pr:NotUsed path="/DDIInstance/StudyUnit/FundingInformation"/>
</pr:DDIProfile>
S09 82
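To make the "machine validation" use of a profile concrete, here is a minimal Python sketch. It assumes a cut-down version of the profile from the slide and a plain list of paths found in a received instance. The function names `read_profile` and `check_instance` are hypothetical helpers, not part of any DDI tool, and the `required` attribute is taken from the slide as written.

```python
import xml.etree.ElementTree as ET

PR = "ddi:profile:3_1"  # profile namespace as shown on the slide

# Cut-down profile based on the slide example (illustrative only).
PROFILE_XML = """
<pr:DDIProfile xmlns:pr="ddi:profile:3_1" id="DDIProfileSTUDYNO">
  <pr:Used path="/DDIInstance/Citation/Title"/>
  <pr:Used path="/DDIInstance/Citation/Creator" required="true">
    <pr:AlternateName>Author</pr:AlternateName>
  </pr:Used>
  <pr:NotUsed path="/DDIInstance/StudyUnit/FundingInformation"/>
</pr:DDIProfile>
"""

def read_profile(xml_text):
    """Return (required_paths, not_used_paths) declared in a profile."""
    root = ET.fromstring(xml_text)
    required = {
        used.get("path")
        for used in root.findall(f"{{{PR}}}Used")
        if used.get("required") == "true"
    }
    not_used = {nu.get("path") for nu in root.findall(f"{{{PR}}}NotUsed")}
    return required, not_used

def check_instance(paths_present, required, not_used):
    """List problems for an instance described by the paths it contains."""
    present = set(paths_present)
    problems = [f"missing required: {p}" for p in sorted(required - present)]
    problems += [f"not allowed: {p}" for p in sorted(present & not_used)]
    return problems

required, not_used = read_profile(PROFILE_XML)
print(check_instance(["/DDIInstance/Citation/Title"], required, not_used))
```

A real validator would evaluate the XPath expressions against the instance itself; the list of paths here stands in for that step.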
![Page 83: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/83.jpg)
Content details• Questionnaire content and design
• Breaking up content into its component parts • Separating processes that occur at different points in the lifecycle• Sharing common components between different points and objects
within the lifecycle
• Data Dictionary basics• Conceptual components• Variables• Organization of variables into records• Physical data stores
• A quick look from the bottom up
![Page 84: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/84.jpg)
Questions and Instruments
• DDI 3 separates the questions which make up a survey instrument from the survey instrument itself
  • Questions can be re-used!
• There are several different types of question text
  • Many of these are the normal string types found throughout DDI 3
S11 84
![Page 85: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/85.jpg)
Questionnaires• Questions
• Question Text• Response Domains
• Statements• Pre- Post-question text
• Instructions• Routing information• Explanatory materials
• Question Flow
S11 85
![Page 86: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/86.jpg)
Simple Questionnaire
Please answer the following:
1. Sex (1) Male (2) Female
2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)
3. How old are you? ______
4. Who do you live with? __________________
5. What type of school do you attend? (1) Public school (2) Private school (3) Do not attend school
S11 86
![Page 87: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/87.jpg)
Simple QuestionnairePlease answer the following:1. Sex (1) Male (2) Female2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)3. How old are you? ______4. Who do you live with? __________________5. What type of school do you
attend? (1) Public school (2) Private school (3) Do not attend school
• Questions
S11 87
![Page 88: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/88.jpg)
Simple QuestionnairePlease answer the following:1. Sex (1) Male (2) Female2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)3. How old are you? ______4. Who do you live with? __________________5. What type of school do you
attend? (1) Public school (2) Private school (3) Do not attend school
• Questions
• Response Domains• Code• Numeric• Text
S11 88
![Page 89: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/89.jpg)
Representing Response Domains
• There are many types of response domains
  • Many questions have categories/codes as answers
  • Textual responses are common
  • Numeric responses are common
  • Other response domains are also available in DDI 3 (time, mixed responses)
S11 89
![Page 90: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/90.jpg)
Category and Code Domains
• Use CategoryDomain when NO codes are provided for the category response
[ ] Yes
[ ] No
• Use CodeDomain when codes are provided on the questionnaire itself
1. Yes
2. No
S11 90
![Page 91: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/91.jpg)
Category Schemes and Code Schemes
• Use the same structure as variables
• Create the category scheme or schemes first (do not duplicate categories)
• Create the code schemes using the categories
  • A category can be in more than one code scheme
  • A category can have different codes in each code scheme
S11 91
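The relationship between categories and code schemes can be sketched as a small data structure. This is an illustration in Python, not DDI syntax. The identifiers (`C_YES`, `CS_YesNo_01`, and so on) are invented for the example, and the point is only that one category is defined once and then referenced by several code schemes with different codes.

```python
# Categories are defined once (the category scheme).
categories = {"C_YES": "Yes", "C_NO": "No", "C_DK": "Don't know"}

# Each code scheme maps its own code values to shared category IDs.
code_schemes = {
    "CS_YesNo_01": {"0": "C_YES", "1": "C_NO"},
    "CS_YesNoDK_12": {"1": "C_YES", "2": "C_NO", "9": "C_DK"},
}

def label_for(scheme_id, code):
    """Resolve a code to its category label through the reference."""
    return categories[code_schemes[scheme_id][code]]

def codes_for_category(category_id):
    """Show which code a category carries in each scheme that uses it."""
    return {
        scheme_id: code
        for scheme_id, codes in code_schemes.items()
        for code, cat in codes.items()
        if cat == category_id
    }

print(label_for("CS_YesNo_01", "0"))
print(codes_for_category("C_YES"))
```

Because the label lives only in the category, correcting it once updates every code scheme that references it.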
![Page 92: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/92.jpg)
Numeric and Text Domains• Numeric Domain provides information on the range of acceptable numbers that can be entered as a response
• Text domains generally indicate the maximum length of the response and can limit allowed content using a regular expression
• Additional specialized domains such as DateTime are also available
• Structured Mixed Response domain allows for multiple response domains and statements within a single question, when multiple response types are required
S11 92
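As a rough illustration of what numeric and text domains constrain, the following Python sketch checks a response against a range, a maximum length, and an optional regular expression. The class and field names are assumptions made for this example and do not mirror the DDI element names exactly.

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class NumericDomain:
    low: float   # lowest acceptable value
    high: float  # highest acceptable value

    def accepts(self, response: str) -> bool:
        try:
            value = float(response)
        except ValueError:
            return False
        return self.low <= value <= self.high

@dataclass
class TextDomain:
    max_length: int
    pattern: Optional[str] = None  # optional regular expression

    def accepts(self, response: str) -> bool:
        if len(response) > self.max_length:
            return False
        if self.pattern is None:
            return True
        return re.fullmatch(self.pattern, response) is not None

age = NumericDomain(low=0, high=120)
lives_with = TextDomain(max_length=40)
print(age.accepts("17"), age.accepts("abc"), lives_with.accepts("Parents"))
```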
![Page 93: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/93.jpg)
Simple QuestionnairePlease answer the following:1. Sex (1) Male (2) Female2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)3. How old are you? ______4. Who do you live with? __________________5. What type of school do you
attend? (1) Public school (2) Private school (3) Do not attend school
• Questions
• Response Domains• Code• Numeric• Text
• Statements
S11 93
![Page 94: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/94.jpg)
Simple QuestionnairePlease answer the following:1. Sex (1) Male (2) Female2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)3. How old are you? ______4. Who do you live with? __________________5. What type of school do you
attend? (1) Public school (2) Private school (3) Do not attend school
• Questions
• Response Domains• Code• Numeric• Text
• Statements
• Instructions
S11 94
![Page 95: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/95.jpg)
Simple QuestionnairePlease answer the following:1. Sex (1) Male (2) Female2. Are you 18 years or older? (0) Yes (1) No (Go to Question 4)3. How old are you? ______4. Who do you live with? __________________5. What type of school do you
attend? (1) Public school (2) Private school (3) Do not attend school
• Questions
• Response Domains• Code• Numeric• Text
• Statements
• Instructions
• Flow
Skip Q3
S11 95
![Page 96: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/96.jpg)
Question 1 Question 2
Question 3 Question 4 Question 5
Is Q2 = 0 (yes)
Yes
No
S11 96
Statement 1
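The flow in the diagram can be written out as a short routing function. This Python sketch assumes Q2 is coded 0 = Yes and 1 = No, as printed on the questionnaire, and that a "No" skips Question 3; the function name `route` is hypothetical.

```python
def route(q2_answer: int) -> list:
    """Return the items a respondent sees, in order, given the Q2 answer."""
    sequence = ["Statement 1", "Question 1", "Question 2"]
    if q2_answer == 0:
        # IfThenElse condition from the slide: "Is Q2 = 0 (yes)"
        sequence.append("Question 3")
    # Both branches rejoin at Question 4.
    sequence += ["Question 4", "Question 5"]
    return sequence

print(route(0))
print(route(1))
```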
![Page 97: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/97.jpg)
Approach to Survey Analysis• Identify
• Question Text• Statements • Instructions or informative materials• Response Domains (by type)
• Determine the universe structure and concepts• Walk through the flow logic
S11 97
![Page 98: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/98.jpg)
Completing Question Items• Create CodeSchemes reusing common categories• Determine range for NumericDomains• Determine maximum length of TextDomains• Write up control constructs (easiest is to list all
QuestionConstruct, all Statement Items)
S11 98
![Page 99: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/99.jpg)
Example: Reusing Categories
Full list of all categories:
• Yes
• No
• Don’t know
• Yes
• No
• Yes
• No
• Yes, always
• Sometimes
• Some do, some don’t
• Not to my knowledge
• Never – I don’t let them
• Never – I don’t have a television
• Yes
• No
• Not to my knowledge
BECOMES
Shorter list of reusable categories:
• Yes
• No
• Don’t know
• Yes, always
• Sometimes
• Some do, some don’t
• Not to my knowledge
• Never – I don’t let them
• Never – I don’t have a television
S11 99
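The reduction from the full list to the reusable list is an order-preserving de-duplication. A minimal Python sketch, using the category labels from the slide (with plain apostrophes):

```python
full_list = [
    "Yes", "No", "Don't know",
    "Yes", "No",
    "Yes", "No",
    "Yes, always", "Sometimes", "Some do, some don't",
    "Not to my knowledge", "Never – I don't let them",
    "Never – I don't have a television",
    "Yes", "No", "Not to my knowledge",
]

# dict.fromkeys keeps the first occurrence of each label, in order.
reusable = list(dict.fromkeys(full_list))

print(len(full_list), "->", len(reusable))
```

In practice, labels that look identical may still mean different things in different questions, so a person should review the merged list before the categories are shared.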
![Page 100: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/100.jpg)
Flow Logic
• Master Sequence
  • Every instrument has one top-level sequence
• Question and statement order
• Routing – IfThenElse (see next slide)
  • After Statement 2 (all respondents read this)
  • After Q2 Else goes to statement
  • After Q5 Else goes back to a sequence
S11 100
![Page 101: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/101.jpg)
SI 1 Q1 SI 2
Q2
Q5
Q6
Q8 SI 4Q3 Q4
Q7
IfThenElse 3
Else
end
IfThenElse 2
IfThenElse 1
SI 3
Then
Then
Then
Else
Else
S11 101
![Page 102: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/102.jpg)
Example: Master Sequence
• Statement 1
• Question 1
• Statement 2
• IfThenElse 1
• Then Sequence 1
• Question 2
• IfThenElse 2
• Then Sequence 2: Question 3, Question 4, IfThenElse 3, Question 8, Statement 4 [Then Sequence 3 (Question 6, Question 7)]
• Else Statement 3
S11 102
![Page 103: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/103.jpg)
Process Items
• General Coding Instruction
  • Missing Data (left as blanks)
  • Suppression of confidential information such as name or address
• Generation Instructions
  • Recodes
  • Review of text answers where items listed as free text result in more than one nominal level variable
    • Create variable for each with 0=no 1=yes
    • Or a count of the number of different items provided by a respondent
  • Aggregation etc.
  • The creation of new variables whose values are programmatically populated (mostly from existing variables)
S12 103
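The two options for free-text answers (one 0/1 variable per item, or a count of distinct items) can be sketched as follows. This Python example assumes comma-separated answers and a known list of items; both are simplifications, and the function names are made up for illustration.

```python
def mentioned_items(free_text):
    """Split a comma-separated answer into a set of normalized items."""
    return {part.strip().lower() for part in free_text.split(",") if part.strip()}

def indicator_variables(free_text, known_items):
    """One variable per known item: 1 = mentioned, 0 = not mentioned."""
    found = mentioned_items(free_text)
    return {item: int(item in found) for item in known_items}

def item_count(free_text):
    """Count of the number of different items given by the respondent."""
    return len(mentioned_items(free_text))

answer = "Mother, brother"
print(indicator_variables(answer, ["mother", "father", "brother"]))
print(item_count(answer))
```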
![Page 104: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/104.jpg)
Conceptual Components
• Conceptual components are defined early in the study process. They are the who, what, where, and when of the study.
S10 104
![Page 105: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/105.jpg)
Difference Between Conceptual Components and Coverage• Conceptual Components
• Coverage• Spatial Coverage
• Topical Coverage
• Temporal Coverage
• For use by the study, organization, community
• High level search and links to geographic systems
• High level search and links to broader world of knowledge
• High level search
S10 105
![Page 106: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/106.jpg)
Concepts
• A concept may be structured or unstructured and consists of a Name, a Label, and a Description. A description is needed if you want to support comparison. Concepts are what questions and variables are designed to measure and are normally assigned by the study (organization or investigator).
S10 106
![Page 107: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/107.jpg)
Universe
• This is the universe of the study, which can combine the who, what, when, and where of the data
• Census top level universe: “The population and households within Kenya in 2010”
• Sub-universes: Households, Population, Males, Population between 15 and 64 years of age, …
S10 107
![Page 108: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/108.jpg)
Universe Structure
• Hierarchical
  • Makes clear that “Owner Occupied Housing Units” are part of the broader universe “Housing Units”
  • Can be generated from the flow logic of a questionnaire
• Referenced by variables and question constructs
  • Provides implicit comparability when 2 items reference the same universe
S10 108
![Page 109: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/109.jpg)
Population and Housing Units in Kenya in 2010
Housing Units
Population
Males Persons 15 years and Older
Variable A Universe Reference:
Males, 15 years of age and older in Kenya in 2010
S10 109
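A universe hierarchy like the one in the diagram can be held as a simple child-to-parent map. The Python sketch below uses the universe names from the slide. The helper names are hypothetical, and treating "Males" and "Persons 15 years and Older" as direct children of "Population" is an assumption about how the diagram is laid out.

```python
TOP = "Population and Housing Units in Kenya in 2010"

# child universe -> parent universe
parent = {
    "Housing Units": TOP,
    "Population": TOP,
    "Males": "Population",
    "Persons 15 years and Older": "Population",
}

def ancestors(universe):
    """Walk up the hierarchy and return the broader universes, nearest first."""
    chain = []
    while universe in parent:
        universe = parent[universe]
        chain.append(universe)
    return chain

def is_within(universe, broader):
    """True when `universe` is a sub-universe of `broader`."""
    return broader in ancestors(universe)

print(ancestors("Males"))
print(is_within("Males", "Housing Units"))
```

Two variables that reference the same universe identifier are comparable on that dimension without any further mapping, which is the "implicit comparability" mentioned above.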
![Page 110: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/110.jpg)
ISO/IEC 11179-1
International Standard ISO/IEC 11179-1: Information technology – Specification and standardization of data elements – Part 1: Framework for the specification and standardization of data elements. Technologies de l’information – Spécification et normalisation des éléments de données – Partie 1: Cadre pour la spécification et la normalisation des éléments de données. First edition 1999-12-01 (p26) http://metadata-standards.org/11179-1/ISO-IEC_11179-1_1999_IS_E.pdf
Universe
Concept
Variable RepresentationQuestion Response Domain
Variable ORQuestion Construct
Data ElementConcept
S20 110
New in 3.2 Data Element
![Page 111: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/111.jpg)
Variables
• Variables are created as a result of data processing, either from questions or other data collection/harvesting activities.
S12 111
![Page 112: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/112.jpg)
General Variable Components
• VariableName, Label and Description
• Links to Concept, Universe, Question, and Embargo information
• Provides Analysis and Response Unit
• Provides basic information on its role:
  • isTemporal
  • isGeographic
  • isWeight
• Describes Representation
S12 112
![Page 113: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/113.jpg)
Representation
• Detailed description of the role of the variable
• References related weights (standard and variable)
• References all instructions regarding coding and imputation
• Describes concatenated values
• Additivity and aggregation method
• Value representation
• Specific Missing Value description (proposed DDI 3.2)
  • Can be used in combination with any representation type
S12 113
![Page 114: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/114.jpg)
Value Representation
• Provides the following elements/attributes to all representation types:
  • classification level (“nominal”, “ordinal”, “interval”, “ratio”, “continuous”)
  • blankIsMissingValue (“true” “false”)
  • missingValue (expressed as an array of values)
  • These last 2 may be replaced in 3.2 by a missing values representation section
• Is represented by one of four representation types (numeric, text, code, date time)
  • Additional types are under development (e.g., scales)
S12 114
![Page 115: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/115.jpg)
Code Representation
• Code schemes link category labels and content to a code used in the data file
• Codes can be numeric or text
• Hierarchies are described by level, completeness, and relationship of items contained in a level
S12 115
![Page 116: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/116.jpg)
Code Scheme Options
• Use in its entirety
• Use only specified levels
• Use only most discrete items (higher levels are treated as group labels)
• Use only the specified codes or code range
S12 116
![Page 117: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/117.jpg)
<l:CodeScheme id="CS_1">
  <l:CategorySchemeReference>
    <r:ID>CatScheme_1</r:ID>
  </l:CategorySchemeReference>
  <l:HierarchyType>Irregular</l:HierarchyType>
  <l:Level levelNumber="1">
    <l:Name>2 digit code</l:Name>
  </l:Level>
  <l:Level levelNumber="2">
    <l:Name>4 digit code</l:Name>
  </l:Level>
  .....
S12 117
![Page 118: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/118.jpg)
  ....
  <l:Code isDiscrete="false" levelNumber="1">
    <l:CategoryReference><r:ID>C_1</r:ID></l:CategoryReference>
    <l:Value>10</l:Value>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_2</r:ID></l:CategoryReference>
      <l:Value>1010</l:Value>
    </l:Code>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_3</r:ID></l:CategoryReference>
      <l:Value>1020</l:Value>
    </l:Code>
  </l:Code>
  <l:Code isDiscrete="true" levelNumber="1">
    <l:CategoryReference><r:ID>C_4</r:ID></l:CategoryReference>
    <l:Value>20</l:Value>
  </l:Code>
</l:CodeScheme>
S12 118
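Two of the code scheme options ("use only most discrete items" and "use only specified levels") can be tried against the example. This Python sketch embeds the Code elements from the slide. The namespace URIs bound to `l` and `r` are an assumption (the slide does not declare them), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed DDI 3.1 namespace bindings; the slide omits the declarations.
L = "ddi:logicalproduct:3_1"
NS = {"l": L, "r": "ddi:reusable:3_1"}

CODE_SCHEME_XML = """
<l:CodeScheme xmlns:l="ddi:logicalproduct:3_1" xmlns:r="ddi:reusable:3_1" id="CS_1">
  <l:Code isDiscrete="false" levelNumber="1">
    <l:CategoryReference><r:ID>C_1</r:ID></l:CategoryReference>
    <l:Value>10</l:Value>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_2</r:ID></l:CategoryReference>
      <l:Value>1010</l:Value>
    </l:Code>
    <l:Code isDiscrete="true" levelNumber="2">
      <l:CategoryReference><r:ID>C_3</r:ID></l:CategoryReference>
      <l:Value>1020</l:Value>
    </l:Code>
  </l:Code>
  <l:Code isDiscrete="true" levelNumber="1">
    <l:CategoryReference><r:ID>C_4</r:ID></l:CategoryReference>
    <l:Value>20</l:Value>
  </l:Code>
</l:CodeScheme>
"""

def _pair(code):
    """(code value, referenced category ID) for one Code element."""
    return (
        code.findtext("l:Value", namespaces=NS),
        code.findtext("l:CategoryReference/r:ID", namespaces=NS),
    )

def discrete_codes(xml_text):
    """Most discrete items only; higher levels act as group labels."""
    root = ET.fromstring(xml_text)
    return [_pair(c) for c in root.iter(f"{{{L}}}Code") if c.get("isDiscrete") == "true"]

def codes_at_level(xml_text, level):
    """Codes belonging to one specified level of the hierarchy."""
    root = ET.fromstring(xml_text)
    return [_pair(c) for c in root.iter(f"{{{L}}}Code") if c.get("levelNumber") == str(level)]

print(discrete_codes(CODE_SCHEME_XML))
print(codes_at_level(CODE_SCHEME_XML, 1))
```

Note that code 20 is discrete even though it sits at level 1, which is what makes this hierarchy "Irregular".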
![Page 119: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/119.jpg)
Numeric
• Use for variables where numeric response is self explanatory (e.g., age in years)
• Continuous or discrete
• Specific valid levels or ranges
• Missing value codes can be identified
• Data is intended to be analyzed as numbers
S12 119
![Page 120: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/120.jpg)
Text
• Data is intended to be analyzed as text
  • Geographic codes may be numbers but are analyzed as text or string (leading zeros used)
• Content can be any text
  • Constrain length
  • Constrain regular expression
• A US ZIP Code is text
  • 5 characters
  • numeric characters 0-9 only
S12 120
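The ZIP Code case shows why such values belong in a text representation. A short Python sketch, assuming the 5-digit constraint from the slide; the sample value is only an illustration.

```python
import re

# Text constraint from the slide: exactly 5 characters, digits 0-9 only.
ZIP_PATTERN = re.compile(r"[0-9]{5}")

def is_valid_zip(value: str) -> bool:
    return ZIP_PATTERN.fullmatch(value) is not None

zip_as_text = "02134"
zip_as_number = int(zip_as_text)  # numeric handling drops the leading zero

print(is_valid_zip(zip_as_text))
print(is_valid_zip(str(zip_as_number)))
```

Once the value has passed through a numeric type, the leading zero is gone and the code no longer satisfies the constraint.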
![Page 121: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/121.jpg)
Date Time• Allows specification of format• Allows statistical software to handle appropriately
S12 121
![Page 122: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/122.jpg)
Data Relationship
S14 122
Learning DDI: Pack S14
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010
Published under Creative Commons Attribution-ShareAlike 3.0 Unported
![Page 123: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/123.jpg)
What we’re covering
• How Data Relationship provides the link between the physical record storage and their logical intellectual content
• How Variables and NCubes are grouped into Logical Records
• How Logical Records define complex file relationships
S14 123
![Page 124: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/124.jpg)
Understanding Data Relationships
• Data files can be described as following a structure
  • What are the record types?
  • What variables make up each record type?
  • How do I know which record type I have?
  • How can I find a unique record of a specific type?
  • How are records related?
• DDI provides the information to automate processing of the data files themselves
S14 124
![Page 125: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/125.jpg)
Logical vs. Physical• Every data file has one or more “logical records” (a record of analysis rather than a physical record)
• The logical description separates the support provided in the variable content from the physical structure
• DDI provides both human readable and machine actionable information to support programming
• Minimal information is REQUIRED even for single record type simple files. The LogicalRecord ID is the link between the physical store of the data and the logical description of its content
S14 125
![Page 126: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/126.jpg)
Data Relationship
• Logical Record:
  • Assigns an ID to the logical record
  • Provides information on the logical record type
  • Identifies support for breaking the logical record into 2 or more physical segments in a storage structure
  • Explains unique case identification
  • Provides the content of logical record (Variables and NCubes)
• Record Relationship
  • Provides links or “keys” between logical record types
S14 126
![Page 127: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/127.jpg)
Logical Record
• Identification
• Description
• hasLocator [boolean] and Variable Value Reference
  • To a variable that declares the record type
• Support for Multiple Segments
  • Specifies variable for this information
• Case Identification
  • Options for identifying a unique case within a record type
• Variables OR NCubes in record and variableQuantity [integer]
S14 127
![Page 128: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/128.jpg)
Logical Record Minimum Requirements
• Identification
• Description
• hasLocator [boolean] and Variable Value Reference
• Support for Multiple Segments
• Case Identification
• Variables OR NCubes in record and variableQuantity [integer]
S14 128
![Page 129: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/129.jpg)
Record Type Locator
• EXAMPLE 1:• Household Record
• variable rectype = “H”
• Person Record
• variable rectype = “P”
• EXAMPLE 2:• Record Type A
• variable chariter = [blank]
• Record Type B
• variable chariter ≥ ‘000’
S14 129
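Example 1 can be sketched as a reader that uses the locator variable to sort lines of a mixed file into record types. This Python illustration assumes `rectype` sits in the first character of each line; the sample lines and the function name are invented for the example.

```python
def split_by_record_type(lines, start=0, length=1):
    """Group raw lines by the value of the record type locator."""
    by_type = {}
    for line in lines:
        rectype = line[start:start + length]  # e.g. "H" or "P"
        by_type.setdefault(rectype, []).append(line)
    return by_type

# Hypothetical hierarchical file: household lines followed by their persons.
data = [
    "H0001 45000",
    "P0001 01 34",
    "P0001 02 31",
    "H0002 52000",
    "P0002 01 58",
]

records = split_by_record_type(data)
print({rectype: len(rows) for rectype, rows in records.items()})
```

In DDI terms, the position and length would come from the physical description of the locator variable, and the values "H" and "P" from the Variable Value Reference of each logical record.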
![Page 130: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/130.jpg)
Case Identification• Simple case examples:
• Case Number• Survey form number• Any single variable unique number within a record type
• Complex case identification:• Concatenated keys• Conditional concatenated keys
S14 130
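A concatenated key combines several variables to identify a case when no single variable is unique. The Python sketch below uses made-up variable names (`HouseholdID`, `PersonNumber`) and checks a key for duplicates within one record type.

```python
from collections import Counter

def case_key(record, key_variables):
    """Concatenated key: the tuple of values of the key variables."""
    return tuple(record[name] for name in key_variables)

def duplicate_keys(records, key_variables):
    """Keys that occur more than once, i.e. fail to identify a unique case."""
    counts = Counter(case_key(r, key_variables) for r in records)
    return [key for key, n in counts.items() if n > 1]

persons = [
    {"HouseholdID": "0001", "PersonNumber": "01", "Age": 34},
    {"HouseholdID": "0001", "PersonNumber": "02", "Age": 31},
    {"HouseholdID": "0002", "PersonNumber": "01", "Age": 58},
]

print(duplicate_keys(persons, ["PersonNumber"]))
print(duplicate_keys(persons, ["HouseholdID", "PersonNumber"]))
```

Here `PersonNumber` alone repeats across households, while the concatenation of household and person number is unique.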
![Page 131: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/131.jpg)
Complex Files and Record Relationships
• Complex files consist of more than one record type stored in one or more files
• Contains the complete Logical Record description for each record type
• Provides information on the relationship between records• Provides the link(s) to other records• Provides the link(s) between waves
S14 131
![Page 132: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/132.jpg)
RecordRelationship
• A pairwise relationship of a source and target record
• Describes the relationship:
  • Source and target record
  • Type of relationship (=, >, <, ≠, ≤, ≥)
• Notice that the case identification of a record type is frequently used as a key for the relationship link
S14 132
![Page 133: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/133.jpg)
[Diagram: two data files and their logical record structures]
Data File: Persons
Logical Record Structure: PersonID, Age, Gender, HouseholdID
Data File: Households
Logical Record Structure: HouseholdID, Household Income, Housing Unit Type, Household Type
Note: this is a logical relationship – the fact that the records are in two files instead of one is unimportant.
S14 133
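The Persons/Households relationship can be sketched as a simple key lookup in Python. The data values are invented, and the choice of HouseholdID as the linking key with an "=" relationship is an assumption based on the variable appearing in both record structures.

```python
# Hypothetical records for the two logical record types.
persons = [
    {"PersonID": 1, "Age": 34, "Gender": "F", "HouseholdID": 10},
    {"PersonID": 2, "Age": 36, "Gender": "M", "HouseholdID": 10},
    {"PersonID": 3, "Age": 71, "Gender": "F", "HouseholdID": 20},
]
households = [
    {"HouseholdID": 10, "HouseholdIncome": 52000,
     "HousingUnitType": "apartment", "HouseholdType": "family"},
    {"HouseholdID": 20, "HouseholdIncome": 18000,
     "HousingUnitType": "house", "HouseholdType": "single"},
]

# Pairwise relationship: source = Person, target = Household,
# key = HouseholdID, relationship type "=" (assumed for this sketch).
household_by_id = {h["HouseholdID"]: h for h in households}


def link(person):
    """Attach the matching household variables to a person record."""
    return {**person, **household_by_id[person["HouseholdID"]]}


linked = [link(p) for p in persons]
for row in linked:
    print(row["PersonID"], row["HouseholdID"], row["HouseholdIncome"])
```

Whether the two lists come from one file or two makes no difference to `link`, which mirrors the note that the relationship is logical rather than physical.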
![Page 134: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/134.jpg)
Describing Data Storage
• To describe how data is stored, DDI-L separates the storage structures from the file actually containing the data
  • The storage structures are reused
• The storage structure is called a physical data product
• The data files are called physical instances
S15 134
![Page 135: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/135.jpg)
[Diagram showing: Study Unit; Data Collection; Logical Data File; Physical Structure 1; Physical Structure 2; Physical Instance (full file); Physical Instance (subset of records); Physical Instance (full file)]
S16 135
![Page 136: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/136.jpg)
Linkages Step 1
• Define and identify the LogicalRecord within Data Relationship in the Logical Product
• PhysicalDataProduct – Physical Structure
  • Format
  • Default values
  • Link to LogicalRecord
  • Declaration of its physical segments (in terms of its storage in this structure)
S15 136
![Page 137: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/137.jpg)
Linkages Step 2
• PhysicalDataProduct – Record Layout
  • Link from RecordLayout to PhysicalRecordSegment
  • Link from DataItem to a Variable or NCube description and to the physical location of the data in the data file
• PhysicalInstance
  • Link to the RecordLayout(s) found in the file
  • Link to the actual file of data
S15 137
![Page 138: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/138.jpg)
[Diagram: chain of references from the Logical Product to the Data File]
Logical Product: LogicalRecord, Variables
PhysicalDataProduct
  PhysicalStructure: REF: LogicalProduct; defines PhysicalRecordSegments
  RecordLayout: REF: PhysicalRecordSegment
    DataItem: REF: Variable; gives physical location in record
Physical Instance: REF: RecordLayout; REF: Data File
  Summary Statistics: REF: Variables
Data File
Cardinalities shown: 1..n, n..n, 1..1
Note: technically a 1..n, but the additional data files must be the equivalent of an identical backup copy
S15 138
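The chain of references can be mimicked with a few Python dataclasses. This is a simplified sketch, not the DDI schema: the class fields, the zero-based `start`/`width` convention, and the sample lines are assumptions, and the PhysicalStructure level is collapsed into the layout for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogicalRecord:
    name: str
    variables: tuple  # names of the variables in the record


@dataclass(frozen=True)
class DataItem:
    variable: str  # REF: Variable
    start: int     # zero-based start column (assumed convention)
    width: int     # number of characters


@dataclass(frozen=True)
class RecordLayout:
    logical_record: LogicalRecord  # reached through the PhysicalStructure in DDI-L
    items: tuple                   # DataItem objects


@dataclass(frozen=True)
class PhysicalInstance:
    layout: RecordLayout  # REF: RecordLayout
    lines: tuple          # stands in for the referenced data file


def read_records(instance):
    """Decode each line of the instance using the layout it references."""
    rows = []
    for line in instance.lines:
        row = {}
        for item in instance.layout.items:
            row[item.variable] = line[item.start:item.start + item.width].strip()
        rows.append(row)
    return rows


person = LogicalRecord("Person", ("PersonID", "Age", "Gender"))
layout = RecordLayout(
    person,
    (DataItem("PersonID", 0, 4), DataItem("Age", 4, 3), DataItem("Gender", 7, 1)),
)

# Two physical instances reuse one layout: a full file and a subset of records.
full_file = PhysicalInstance(layout, ("0001 34F", "0002 36M", "0003 71F"))
subset = PhysicalInstance(layout, ("0003 71F",))

print(read_records(full_file))
print(read_records(subset))
```

Because the layout is a separate object, it is described once and reused by both instances, which is the reason for the indirection shown in the diagram.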
![Page 139: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/139.jpg)
Complexity
• May seem like a lot of referencing and indirection for a simple example
• Structure is designed to handle much more complex structures in a consistent manner
• For example health interview surveys may have records for multiple person types, incidence or event records, biomarkers, and relationship or situational change records stored and linked in many different ways
• Same structure handles all levels of complexity
S15 139
![Page 140: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/140.jpg)
Describing the Physical Store
• Link to a LogicalRecord
• Different structures to describe different storage formats
  • We use XML Schema substitution groups
  • ASCII, internal, proprietary, etc.
• Information on relational links between record types stored in one or many data files (physical relationship)
• Links to Variables and NCube cells
S15 140
![Page 141: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/141.jpg)
Physical Description
• Physical Data Product
  • Can describe any number of physical stores of data
• Describes the gross record layout
  • Reference to a LogicalRecord
  • Information on the use of multiple physical segments to store the data in the LogicalRecord
  • Provides default values for various data typing information
• Describes the record layout in detail
  • Links to a GrossRecordStructure
  • Provides a detailed link between a specific variable and its physical storage location
S15 141
![Page 142: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/142.jpg)
PhysicalDataProduct
[Diagram: contents of the PhysicalDataProduct]
PhysicalStructureScheme: Reference to LogicalRecord
  GrossRecordStructure: Identifies PhysicalSegments
RecordLayoutScheme: Reference to PhysicalStructure
  BaseRecordLayout
  Alternates for BaseRecordLayout (uses XML Schema “substitution groups”): ncube_recordlayout, inline_ncube_recordlayout, tabular_ncube_recordlayout, proprietary, DataSet
S15 142
![Page 143: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/143.jpg)
PhysicalDataProduct
• ncube_recordlayout
  • Allows for a record per aggregation case containing multiple ncubes listed in a fixed or comma delimited layout (used by the example)
• inline_ncube_recordlayout
  • Allows the data to be listed as a table in-line in the PhysicalDataProduct
• tabular_ncube_recordlayout
  • Describes a 2-dimensional tabular layout as used by spreadsheets
S15 143
![Page 144: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/144.jpg)
Proprietary Record Layout
• Used for describing data files for proprietary software packages
  • Statistical packages (SPSS, SAS, etc.)
  • Relational databases (Oracle, SQL Server, etc.)
• Uses a “handle” (DataItemAddress) to define the variable location, instead of a known location within the file
  • The files are typically binary, so a positional or delimited location does not work
  • Examples: variable name, column name
• Allows for proprietary datatypes, outputs, and properties
  • These describe software-specific parameters that can be defined by the user, according to the software package they use
S15 144
![Page 145: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/145.jpg)
Data Set
• Allows for capturing the data in a DDI-specific XML format, as part of the DDI file
• Useful for archival storage of the data, where the data and metadata live in the same file/package
• Useful for feeding temporary data files to visualization packages/Web services
  • Usually subsets of the full data file
  • Many visualization packages expect data in XML format
  • Web services demand that the communications are performed in an XML format
• This is a very verbose way of expressing the data – files get much larger!
S15 145
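The verbosity point can be seen with a small Python comparison of the same rows written as CSV and as XML. The element and attribute names (`DataSet`, `Record`, `Value`, `variable`) are hypothetical stand-ins chosen for this sketch; they are not the element names of the DDI schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical records; all values are kept as strings for serialization.
rows = [
    {"PersonID": "1", "Age": "34", "Gender": "F"},
    {"PersonID": "2", "Age": "36", "Gender": "M"},
    {"PersonID": "3", "Age": "71", "Gender": "F"},
]


def to_csv(rows):
    """Serialize the rows as comma-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows):
    """Serialize the rows as XML, one element per record and per value."""
    root = ET.Element("DataSet")
    for row in rows:
        record = ET.SubElement(root, "Record")
        for name, value in row.items():
            item = ET.SubElement(record, "Value", variable=name)
            item.text = value
    return ET.tostring(root, encoding="unicode")


csv_text = to_csv(rows)
xml_text = to_xml(rows)
print(len(csv_text), "characters as CSV")
print(len(xml_text), "characters as XML")
```

Every value carries its own markup in the XML form, so the size gap grows with the number of records and variables.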
![Page 146: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/146.jpg)
Physical Instance
S16 146
Learning DDI: Pack S16
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010
Published under Creative Commons Attribution-ShareAlike 3.0 Unported
![Page 147: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/147.jpg)
Files of Data
• Data files are represented in the DDI metadata with a module called a physical instance
• This is just a metadata object which represents the existence of a physical file
  • It also carries summary and category statistics because these change from data file to data file
S16 147
![Page 148: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/148.jpg)
Physical Instance
• Has a one-to-one relationship with a physical file of data (plus a back-up if one exists)
• Allows for full record subsets of a large data set using record selection
• Houses summary and category statistics that are specific to a particular file
  • Note that these can be in-line, referenced in another physical instance, or referenced as a separate data file (with complete logical product, physical data structure, and physical instance)
S16 148
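Why statistics belong to the physical instance rather than to the variable can be shown with a brief Python sketch. The records and the helper names are invented for illustration; the same variables produce different statistics in a full file and in a subset file.

```python
from collections import Counter


def summary_statistics(values):
    """Basic summary statistics for a numeric variable in one file."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }


def category_statistics(codes):
    """Frequency of each category code for a coded variable in one file."""
    return dict(Counter(codes))


# Hypothetical full file and a record subset (female cases only).
full_file = [
    {"age": 34, "sex": "F"},
    {"age": 36, "sex": "M"},
    {"age": 71, "sex": "F"},
    {"age": 19, "sex": "M"},
]
subset = [r for r in full_file if r["sex"] == "F"]

for label, records in (("full", full_file), ("subset", subset)):
    ages = [r["age"] for r in records]
    sexes = [r["sex"] for r in records]
    print(label, summary_statistics(ages), category_statistics(sexes))
```

The variable definitions are identical for both files; only the statistics attached to each instance differ.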
![Page 149: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/149.jpg)
Record of a Physical Instance
• Link to a physical storage structure
• Specifics of the range of records in the file
  • Record type selection
  • Geographic selection
  • Topical selection
• Summary statistics
• Identification and location of the actual data file
S16 149
![Page 150: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/150.jpg)
Record Subsets
• PhysicalRecordSegment
  • 79 record segments with each segment in its own file (US 2000 Census SF3)
• Geography
  • Using SpatialCoverage to limit to a single country (Eurobarometer – Germany)
• Date/Time
  • Use TemporalCoverage to limit to a single year (General Social Survey – 1998)
• Topic
  • Use TopicalCoverage to limit to a single topical definition (Female cases only)
S16 150
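Record selection by coverage can be sketched as a filter in Python. The records and the keyword names (`country`, `year`, `sex`) are hypothetical stand-ins for spatial, temporal, and topical coverage limits; they are not DDI element names.

```python
# Hypothetical records from a multi-country, multi-year file.
records = [
    {"country": "DE", "year": 1998, "sex": "F"},
    {"country": "DE", "year": 1999, "sex": "M"},
    {"country": "FR", "year": 1998, "sex": "F"},
    {"country": "FR", "year": 1998, "sex": "M"},
]


def select(records, **coverage):
    """Keep the records whose values match every coverage limit given."""
    return [
        r for r in records
        if all(r[name] == value for name, value in coverage.items())
    ]


germany_only = select(records, country="DE")   # spatial limit
year_1998 = select(records, year=1998)         # temporal limit
female_only = select(records, sex="F")         # topical limit

print(len(germany_only), len(year_1998), len(female_only))
```

Each result would correspond to its own physical instance, described by the same record layout plus the coverage limit that produced it.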
![Page 151: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/151.jpg)
NHGIS Processing: NHGIS project separates physical data files by geographic type
[Diagram: the same records arranged two ways]
One file per state with all geographic levels: Alabama, Alaska, Arizona, Arkansas (each holding State, County, Place, Tract)
One file per geographic level with all states: State, County, Place, Tract (each holding Alabama, Alaska, Arizona, Arkansas)
S16 151
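The two arrangements can be reproduced with a small Python partitioning sketch. The record structure and the `partition` helper are assumptions for illustration and do not describe the actual NHGIS processing code.

```python
from collections import defaultdict

STATES = ["Alabama", "Alaska", "Arizona", "Arkansas"]
LEVELS = ["State", "County", "Place", "Tract"]

# Hypothetical records: one per state and geographic level.
records = [{"state": s, "level": l} for s in STATES for l in LEVELS]


def partition(records, key):
    """Split the records into separate 'files' keyed by one variable."""
    files = defaultdict(list)
    for record in records:
        files[record[key]].append(record)
    return dict(files)


by_state = partition(records, "state")   # one file per state, all levels
by_level = partition(records, "level")   # one file per level, all states

print(sorted(by_state), [len(v) for v in by_state.values()])
print(sorted(by_level), [len(v) for v in by_level.values()])
```

Both partitions hold exactly the same records, so one logical description can serve either set of physical files.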
![Page 152: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/152.jpg)
PhysicalInstance
[Diagram: contents of the PhysicalInstance]
Identification
Reference to RecordLayout(s)
ID/location of Data File
GrossFileStructure [Check sums and processing info]
Summary Statistics [Variables]
Category Statistics [allows for single level filters]
Coverage limitations
Data Fingerprint
S16 152
![Page 153: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/153.jpg)
From the Bottom Up
• This section summarizes what we have learned starting from the data item and working up to the full metadata description
S17 153
Learning DDI: Pack S17
Copyright © GESIS – Leibniz Institute for the Social Sciences, 2010
Published under Creative Commons Attribution-ShareAlike 3.0 Unported
![Page 154: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/154.jpg)
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
DDI-L breaks down a data file into three major components:
- The LogicalProduct describes the data dictionary
- The PhysicalDataProduct describes the file structure
- The PhysicalInstance describes an actual instance of the file
S17 154
![Page 155: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/155.jpg)
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo
The PhysicalInstance identifies the file (name, path/URI), holds statistics (#recs, #vars, freq, min, max, etc.) and other applicable proprietary info.
The PhysicalInstance refers to a record layout in the PhysicalDataProduct (the same data can be stored in different formats/locations, or the same record can contain different data).
S17 155
![Page 156: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/156.jpg)
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo
PhysicalStructureScheme/PhysicalStructure
RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout
The PhysicalDataProduct describes the PhysicalStructure of the file (what the data components are) and its RecordLayout (variable location, formatting, etc.).
The same structure can be used by multiple layouts. Different layouts are used to describe text and proprietary files.
The PhysicalStructure refers to a logical record in the LogicalProduct (the same set of variables can be stored in different ways).
S17 156
![Page 157: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/157.jpg)
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo
PhysicalStructureScheme/PhysicalStructure
RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout
The LogicalProduct describes the data dictionary: Variables (name, label, formats, etc.), the Codes & Categories (classifications), as well as the Logical Record for storage.
The Data Relationship can describe complex hierarchical structures and indexes.
The LogicalProduct results from earlier life cycle stages (this information is not available in traditional data files).
VariableScheme/Variable
DataRelationship/LogicalRecord
CategoryScheme/Category
CodeScheme/Code
S17 157
![Page 158: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/158.jpg)
DDIInstance/StudyUnit
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
VariableScheme/Variable
DataRelationship/LogicalRecord
CategoryScheme/Category
CodeScheme/Code
PhysicalStructureScheme/PhysicalStructure
RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout
DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo
DataCollection
ConceptualComponents
QuestionScheme/Question
ControlConstruct, Instruction, Instrument, …
ConceptScheme/Concept
UniverseScheme/Universe
Abstract, Coverage, Purpose, …
The XML is contained by a StudyUnit wrapped by a DDIInstance.
The ConceptualComponents module describes the concepts and universe, and the DataCollection module captures the questionnaire and survey instrument.
S17 158
![Page 159: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/159.jpg)
DDIInstance/StudyUnit
DDI-L from the data item up
PhysicalInstance
PhysicalDataProduct
LogicalProduct
VariableScheme/Variable
DataRelationship/LogicalRecord
CategoryScheme/Category
CodeScheme/Code
PhysicalStructureScheme/PhysicalStructure
RecordLayoutScheme/RecordLayout OR RecordLayoutScheme/ProprietaryRecordLayout
DataFileIdentification, GrossFileStructure, Statistics, ProprietaryInfo
DataCollection
ConceptualComponents
QuestionScheme/Question
ControlConstruct, Instruction, Instrument, …
ConceptScheme/Concept
UniverseScheme/Universe
Abstract, Coverage, Purpose, …
S17 159
![Page 160: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/160.jpg)
DDI in context
• Managing Research
  • Individual research
  • Large research projects – longitudinal multi-researcher
• Managing digital resources
![Page 161: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/161.jpg)
Managing research
![Page 162: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/162.jpg)
Individual researchers
• Tools – using the software they know
• Clarifying what metadata needs to be captured for future preservation and discovery
• Building/locating metadata resources that support comparison
• Getting metadata from individual researchers is not a new problem – DDI can’t solve it but can provide some direction
![Page 163: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/163.jpg)
The Longitudinal Version of GSBPM
• In 2011 at a Dagstuhl workshop on Longitudinal metadata, a modification of the GSBPM was developed to describe data production for large on-going research projects
• This work is still under development but may result in a more detailed lifecycle model for DDI moving forward
![Page 164: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/164.jpg)
S01 164
Note the similarity to the DDI Combined Lifecycle Model and the top level of the GSBPM
![Page 165: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/165.jpg)
S01 165
![Page 166: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/166.jpg)
S01 166
![Page 167: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/167.jpg)
Upstream Metadata Capture
• Because there is support throughout the lifecycle, you can capture the metadata as it occurs
• It is re-useable throughout the lifecycle
• It is versionable as it is modified across the lifecycle
• It supports production at each stage of the lifecycle
• It moves into and out of the software tools used at each stage
S05 167
![Page 168: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/168.jpg)
Metadata Driven Data Capture
• Questions can be organized into survey instruments documenting flow logic and dynamic wording
  • This metadata can be used to create control programs for Blaise, CASES, CSPro and other CAI systems
• Generation Instructions can drive data capture from registry sources and/or inform data processing post capture
S05 168
![Page 169: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/169.jpg)
Reuse of Metadata
• You can reuse many types of metadata, benefitting from the work of others
  • Concepts
  • Variables
  • Categories and codes
  • Geography
  • Questions
• Promotes interoperability and standardization across organizations
• Can capture (and re-use) common cross-walks
S05 169
![Page 170: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/170.jpg)
Managing digital resources
![Page 171: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/171.jpg)
Management of Data and Metadata
• Managing metadata:
  • Capture – goal is to capture at point of origin
  • Reuse – reduce burden, reduce error, comparison
  • Quality control – reuse, replication, paradata
  • Preservation – metadata in a non-proprietary format
  • Provenance – how the data was created
  • Processing – metadata driven processing
  • Discovery and access
  • Analysis support and information
• Digital objects
  • Data as a unique object – without metadata it’s just a number
![Page 172: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/172.jpg)
Data/Metadata Mgmt Activities
• Data Capture
  • determining what is to be collected from whom and how
• Data Processing
  • cleaning, normalizing, aggregating, harmonizing, creation of data products
• Process evaluation and revision
  • quality control, process improvement, evaluation and analysis
• Data Discovery
  • Finding data, accessing data
• Preservation
  • short term and archival
• Administrative tracking
  • who has control, where in the process
![Page 173: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/173.jpg)
Data/Metadata Management
• Downside: There’s a lot more to manage
  • Greater depth than many other digital objects
  • Greater detail that can be leveraged for discovery, access, and application
  • Costly to translate into a standard format
• Upside: We’ve been managing digital data for over 40 years
  • No need to reinvent the wheel
  • DDI as a metadata structure is not an “all or nothing” approach
  • DDI uptake has moved out of the archives and is moving into the production process
![Page 174: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/174.jpg)
Working within a Library/Archive System
• Actionable and informational metadata
  • What do you need to “do” with the metadata?
• Discovery
  • How deep do you want to go?
  • How integrated do you want the results to be?
• Visualization / Manipulation
• Analysis
• Preservation / Archive
![Page 175: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/175.jpg)
Archive/Data Discovery and Delivery
• Data and Metadata are generally received from external organizations
• Focus is on moving data and metadata to a preservation format and supporting discovery and delivery tools
• Management of ingest process (process management)
• “Value Added” material
![Page 176: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/176.jpg)
![Page 177: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/177.jpg)
Archive/Data Discovery and Delivery
• Capturing full content
  • Machine actionable
  • Information for discovery
  • Retaining links to other materials, collections and grouping
• Added value metadata from archive
  • Variable, question, and data element groups related to subject and keyword access
  • Linking to a common geography description
  • Linking to an overall organization description
  • Tracking archival management activities and processes
![Page 178: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/178.jpg)
Working with producers/researchers
• How much can you influence depositors?
  • Ingest tools that result in DDI metadata
  • Provision of reusable materials (schemes) or controlled vocabularies
  • Metadata management tools
  • Training
• What can be pushed back to long term depositors?
  • Resource package material?
  • Metadata of deposited data so that only differences are reported?
  • Tools to manage change over time?
![Page 179: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/179.jpg)
General use statements
![Page 180: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/180.jpg)
Upstream Metadata Capture
• Because there is support throughout the lifecycle, you can capture the metadata as it occurs
• It is re-useable throughout the lifecycle
• It is versionable as it is modified across the lifecycle
• It supports production at each stage of the lifecycle
• It moves into and out of the software tools used at each stage
S05 180
![Page 181: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/181.jpg)
Reuse of Metadata
• You can reuse many types of metadata, benefitting from the work of others
  • Concepts
  • Variables
  • Categories and codes
  • Geography
  • Questions
• Promotes interoperability and standardization across organizations
• Can capture (and re-use) common cross-walks
S05 181
![Page 182: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/182.jpg)
Metadata Driven Data Capture
• Questions can be organized into survey instruments documenting flow logic and dynamic wording
  • This metadata can be used to create control programs for Blaise, CASES, CSPro and other CAI systems
• Generation Instructions can drive data capture from registry sources and/or inform data processing post capture
S05 182
![Page 183: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/183.jpg)
Management of Information, Data, and Metadata
• An organization can manage its organizational information, metadata, and data within repositories using DDI 3 to transfer information into and out of the system to support:
  • Controlled development and use of concepts, questions, variables, and other core metadata
  • Development of data collection and capture processes
  • Quality control operations
  • Development of data access and analysis systems
S05 183
![Page 184: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/184.jpg)
DAY 2
![Page 185: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/185.jpg)
DDI-C and DDI-L
• DDI has 2 development lines
  • DDI Codebook (DDI-C)
  • DDI Lifecycle (DDI-L)
• Both lines will continue to be improved
  • DDI-C focusing just on single study codebook structures
  • DDI-L focusing on a more inclusive lifecycle model and support for machine actionability
S01 185
![Page 186: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/186.jpg)
Background
• Concept of DDI and definition of needs grew out of the data archival community
• Established in 1995 as a grant funded project initiated and organized by ICPSR
• Members:
  • Social Science Data Archives (US, Canada, Europe)
  • Statistical data producers (including US Bureau of the Census, the US Bureau of Labor Statistics, Statistics Canada and Health Canada)
• February 2003 – Formation of DDI Alliance
  • Membership based alliance
  • Formalized development procedures
S02 186
![Page 187: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/187.jpg)
Early DDI: Characteristics of DDI-C
• Focuses on the static object of a codebook
• Designed for limited uses
  • End user data discovery via the variable or high level study identification (bibliographic)
• Only heavily structured content relates to information used to drive statistical analysis
• Coverage is focused on single study, single data file, simple survey and aggregate data files
• Variable contains majority of information (question, categories, data typing, physical storage information, statistics)
S02 187
![Page 188: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/188.jpg)
Limitations of these Characteristics
• Treated as an “add on” to the data collection process
• Focus is on the data end product and end users (static)
• Limited tools for creation or exploitation
• The Variable must exist before metadata can be created
• Producers hesitant to take up DDI creation because it is a cost and does not support their development or collection process
S02 188
![Page 189: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/189.jpg)
Origins of the DDI Alliance
• DDI-C was developed by an informal network of individuals from the social science community and official statistics
  • Funding was through grants
• It was decided that a more formal organization would help to drive the development of the standard forward
  • Many new features were requested
  • The DDI Alliance was born to facilitate the development in a consistent and on-going fashion
S03 189
![Page 190: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/190.jpg)
DDI Alliance Structure
• DDI-L specifications are created by committees drawn from among the member organizations
  • Some outside experts are invited to attend
• The Steering Committee governs the organization
• The Expert Committee votes to approve all published work
  • One representative per member organization
• The Technical Implementation Committee (TIC) creates the technical work products (XML schemas, UML models, documentation, etc.)
• Working Groups are short term groups working on future DDI topical content (e.g., Survey Design & Implementation)
• Tools Catalog Group describing tools and software to work with DDI
• Web Site Maintenance Group
S03 190
![Page 191: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/191.jpg)
Moving from DDI-C to DDI-L
• DDI Alliance members wished to support current DDI-C users and will continue to support this specification
• The limitations of DDI-C needed to be addressed in order to move the standard forward to a broader audience and user base
• Requirements for DDI-L came out of the original committee as well as the broader data archive community
• The development of the first wave of software for DDI-C raised additional requirements
S03 191
![Page 192: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/192.jpg)
Requirements for DDI-L
• Improve and expand the machine-actionable aspects of the DDI to support programming and software systems
• Support CAI instruments through expanded description of the questionnaire (content and question flow)
• Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.)
• Support comparison, in particular comparison by design but also comparison after the fact (harmonization)
• Improve support for describing complex data files (record and file linkages)
• Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.)
S03 192
![Page 193: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/193.jpg)
DDI Lifecycle Model
Metadata Reuse
S03 193
![Page 194: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/194.jpg)
Relationship to Other Standards: Archival
• Dublin Core
  • Basic bibliographic citation information
  • Basic holdings and format information
• METS
  • Upper level descriptive information for managing digital objects
  • Provides specified structures for domain specific metadata
• OAIS
  • Reference model for the archival lifecycle
• PREMIS
  • Supports and documents the digital preservation process
S20 194
![Page 195: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/195.jpg)
Relationship to Other Standards: Non-Archival
• ISO 19115 – Geography
  • Metadata structure for describing geographic feature files such as shape, boundary, or map image files and their associated attributes
• ISO/IEC 11179
  • International standard for representing metadata in a Metadata Registry
  • Consists of a hierarchy of “concepts” with associated properties for each concept
• ISO 17369 SDMX
  • Exchange of statistical information (time series/indicators)
  • Supports metadata capture as well as implementation of registries
S20 195
![Page 196: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/196.jpg)
Mining the Archive
• With metadata about relationships and structural similarities
  • You can automatically identify potentially comparable data sets
  • You can navigate the archive’s contents at a high level
  • You have much better detail at a low level across divergent data sets
S05 196
![Page 197: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/197.jpg)
Metadata Coverage
• Dublin Core
• ISO/IEC 11179
• ISO 19115
• Statistical Packages
• METS
• PREMIS
• SDMX
• DDI
• [Packaging]
• Citation
• Geographic Coverage
• Temporal Coverage
• Topical Coverage
• Structure information
• Physical storage description
• Variable (name, label, categories, format)
• Source information
• Methodology
• Detailed description of data
• Processing
• Relationships
• Life-cycle events
• Management information
• Tabulation/aggregation
S20 197
![Page 198: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/198.jpg)
Moving from DDI 1/2 to DDI 3
• DDI Alliance members wished to support current DDI 1/2 users and will continue to support this specification
• The limitations of DDI 1/2 needed to be addressed in order to move the standard forward to a broader audience and user base
• Requirements for DDI 3 came out of the original committee as well as the broader data archive community
• The development of the first wave of software for DDI 1/2 raised additional requirements
S03 198
![Page 199: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/199.jpg)
Requirements for 3.0
• Improve and expand the machine-actionable aspects of the DDI to support programming and software systems
• Support CAI instruments through expanded description of the questionnaire (content and question flow)
• Support the description of data series (longitudinal surveys, panel studies, recurring waves, etc.)
• Support comparison, in particular comparison by design but also comparison after the fact (harmonization)
• Improve support for describing complex data files (record and file linkages)
• Provide improved support for geographic content to facilitate linking to geographic files (shape files, boundary files, etc.)
S03 199
![Page 200: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/200.jpg)
S03 200
DDI 1 / 2
Document Description
• Citation of the codebook document
• Guide to the codebook
• Document status
• Source for the document
Study Description
• Citation for the study
• Study Information
• Methodology
• Data Accessibility
• Other Study Material
File Description
• File Text (record and relationship information)
• Location Map (required for nCubes, optional for microdata)
Data Description
• Variable Group and nCube Group
• Variable (variable specification, physical location, question, & statistics)
• nCube
Other Material
![Page 201: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/201.jpg)
Our Initial Thinking…
The metadata payload from DDI 1/2 was reorganized to cover these areas.
S03 201
![Page 202: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/202.jpg)
S03 202
Study Citation / Document Source / Study Information
Study Methodology / Questions
File Text / Location Map / Physical Location / Statistics
Variable specification / nCubes / Variable & nCube Groups
Data Accessibility
Other Material
![Page 203: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/203.jpg)
Wrapper
For later parts of the lifecycle, metadata is reused heavily from earlier Modules.
The discovery and analysis itself creates data and metadata, re-used in future cycles.
S03 203
![Page 204: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/204.jpg)
Realizations
• Many different organizations and individuals are involved throughout this process
  • This places an emphasis on versioning and exchange between different systems
• There is potentially a huge amount of metadata reuse throughout an iterative cycle
  • We needed to make the metadata as reusable as possible
• Every organization acts as an “archive” (that is, a maintainer and disseminator) at some point in the lifecycle
  • When we say “archive” in DDI 3, it refers to this function
S03 204
![Page 205: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/205.jpg)
DDI 3 and the Data Life Cycle
• A survey is not a static process: It dynamically evolves across time and involves many agencies/individuals
• DDI 1/2 is about archiving; DDI 3 covers the entire “life cycle”
• DDI 3 focuses on metadata reuse (minimizes redundancies/discrepancies, supports comparison)
• Also supports multilingual content, grouping, geography, and others
• DDI 3 is extensible
S03 205
![Page 206: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/206.jpg)
Approach
• Shift from the codebook-centric model of early versions of DDI to a lifecycle model, providing metadata support from data study conception through analysis and repurposing of data
• Shift from an XML Document Type Definition (DTD) to an XML Schema model to support the lifecycle model, reuse of content, and increased controls to support programming needs
• Redefine a “single DDI instance” to include a “simple instance” similar to DDI 1/2 which covered a single study and “complex instances” covering groups of related studies. Allow a single study description to contain multiple data products (for example, a microdata file and aggregate products created from the same data collection).
• Incorporate the requested functionality in the first published edition
S03 206
![Page 207: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/207.jpg)
Development of DDI 3
• 2004 – Acceptance of a new DDI paradigm
  • Lifecycle model
  • Shift from the codebook-centric / variable-centric model to capturing the lifecycle of data
  • Agreement on expanded areas of coverage
• 2005
  • Presentation of schema structure
  • Focus on points of metadata creation and reuse
• 2006
  • Presentation of first complete 3.0 model
  • Internal and public review
• 2007
  • Vote to move to Candidate Version
  • Establishment of a set of use cases to test application and implementation
• 2008
  • April: DDI 3.0 published
• 2009
  • DDI 3.1 approved for publication in May 2009
  • Published October 2009
  • Bugs and feature corrections identified during the first year of use; some were backward incompatible
S03 207
![Page 208: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/208.jpg)
DDI 3.2
• Currently working on DDI 3.2 which will address bug and feature corrections
• Publication for review in 2011
• Noted areas of correction:
  • Broader support for controlled vocabularies
  • Clarification of record relationship
  • Clarification of ID and URN structures
  • Missing value declarations
  • Expanded Response Domain/Representation options
S03 208
![Page 209: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/209.jpg)
Change
• DDI 3 is a major change from DDI 1/2 in terms of content and structure. Let’s step back and look at:
  • Basic differences between DDI 1/2 and DDI 3
  • Applications for DDI 1/2 and DDI 3
  • Differences that allow DDI 3 to do more
  • How these differences provide support for better management of information, data, and metadata
S05 209
![Page 210: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/210.jpg)
Differences Between DDI 1/2 and 3
• DDI 1/2
  • Codebook based
  • Format XML DTD
  • After-the-fact
  • Static
  • Metadata replicated
  • Simple study
  • Limited physical storage options
• DDI 3
  • Lifecycle based
  • Format XML Schema
  • Point of occurrence
  • Dynamic
  • Metadata reused
  • Simple study, series, grouping, inter-study comparison
  • Unlimited physical storage options
S05 210
![Page 211: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/211.jpg)
DDI 1/2 Applications
• Simple survey capture
• High level study description with variable information for stand alone studies
• Descriptions of basic nCubes (individual statistical tables)
• Replicating the contents of a codebook including the data dictionary
• Collection management beyond bibliographic records
S05 211
![Page 212: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/212.jpg)
DDI 3 Applications
• Describing a series of studies such as a longitudinal survey or cross-cultural survey
• Capturing comparative information between studies
• Sharing and reusing metadata outside the context of a specific study
• Capturing data in the XML
• Capturing process steps from conception of study through data capture to data dissemination and use
• Capturing lifecycle information as it occurs, and in a way that can inform and drive production
• Management of data and metadata within an organization for internal use or external access
S05 212
![Page 213: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/213.jpg)
Why can DDI 3 do more?
• It is machine-actionable, not just documentary
• It’s more complex with a tighter structure
• It manages metadata objects through a structured identification and reference system that allows sharing between organizations
• It has greater support for related standards
• It supports reuse of metadata within the lifecycle of a study and between studies
S05 213
![Page 214: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/214.jpg)
Reuse Across the Lifecycle
• This basic metadata is reused across the lifecycle
• Responses may use the same categories and codes which the variables use
• Multiple waves of a study may re-use concepts, questions, responses, variables, categories, codes, survey instruments, etc. from earlier waves
S05 214
![Page 215: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/215.jpg)
Reuse by Reference
• When a piece of metadata is re-used, a reference can be made to the original
• In order to reference the original, you must be able to identify it
• You also must be able to publish it, so it is visible (and can be referenced)
  • It is published to the user community – those users who are allowed access
S05 215
![Page 216: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/216.jpg)
Change over Time
• Metadata items change over time, as they move through the data lifecycle
  • This is especially true of longitudinal/repeat cross-sectional studies
• This produces different versions of the metadata
• The metadata versions have to be maintained as they change over time
  • If you reference an item, it should not change: you reference a specific version of the metadata item
S05 216
![Page 217: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/217.jpg)
DDI Support for Metadata Reuse
• DDI allows for metadata items to be identifiable
  • They have unique IDs
  • They can be re-used by referencing those IDs
• DDI allows for metadata items to be published
  • The items are published in resource packages
• Metadata items are maintainable
  • They live in “schemes” (lists of items of a single type) or in “modules” (metadata for a specific purpose or stage of the lifecycle)
  • All maintainable metadata has a known owner or agency
• Maintainable metadata may be versionable
  • Versions reflect changes over time
  • The versionable metadata has a version number
S05 217
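The identification rules on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not DDI's XML or URN syntax: the `MetadataItem` and `Reference` classes, their field names, and the sample agency, ID, and label values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataItem:
    """A versionable metadata item (hypothetical model, not the DDI schema)."""
    agency: str   # known owner of the maintainable that holds the item
    item_id: str  # unique ID
    version: str  # version number
    label: str


@dataclass(frozen=True)
class Reference:
    """Points at one specific version of an item instead of copying it."""
    agency: str
    item_id: str
    version: str


def resolve(ref: Reference, published: list[MetadataItem]) -> MetadataItem:
    """Look up the referenced item among the published items."""
    for item in published:
        if (item.agency, item.item_id, item.version) == (
            ref.agency, ref.item_id, ref.version
        ):
            return item
    raise KeyError(f"not published: {ref}")


# Study A publishes a variable in a resource package (illustrative values).
resource_package = [MetadataItem("example.org", "X", "1.0", "Age at last birthday")]

# Study B re-uses the variable by reference rather than duplicating it.
study_b_ref = Reference("example.org", "X", "1.0")
reused = resolve(study_b_ref, resource_package)
print(reused.label)
```

Because Study B holds only the reference, the variable's content lives in one place and both studies see the same item.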
![Page 218: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/218.jpg)
[Diagram: Study A uses Variable ID=“X”, which is published in a Resource Package; Study B re-uses it by reference (Ref=“Variable X”).]
S05 218
![Page 219: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/219.jpg)
[Diagram: Variable ID=“X” changes over time from Version=“1.0” to Version=“1.1” to Version=“2.0”; the versions are contained in Variable Scheme ID=“123” Agency=“GESIS”.]
S05 219
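A rough sketch of the versioning idea in this diagram: a scheme keeps every version of a variable, so a reference pinned to version 1.0 still resolves to the same content after 1.1 and 2.0 are added. The `VariableScheme` class, its methods, and the labels are assumptions made for illustration; only the IDs, version numbers, and agency come from the slide.

```python
class VariableScheme:
    """Hypothetical container that keeps every version of each variable."""

    def __init__(self, scheme_id: str, agency: str) -> None:
        self.scheme_id = scheme_id
        self.agency = agency
        self._versions: dict[str, dict[str, str]] = {}  # item ID -> {version: label}

    def add_version(self, item_id: str, version: str, label: str) -> None:
        versions = self._versions.setdefault(item_id, {})
        if version in versions:
            # A published version is never overwritten; changes need a new version.
            raise ValueError(f"{item_id} version {version} is already published")
        versions[version] = label

    def get(self, item_id: str, version: str) -> str:
        return self._versions[item_id][version]

    def latest(self, item_id: str) -> str:
        # Compare version parts numerically so that "1.10" sorts after "1.9".
        versions = self._versions[item_id]
        return max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))


scheme = VariableScheme("123", "GESIS")
scheme.add_version("X", "1.0", "Age")
pinned = ("X", "1.0")  # a reference made while 1.0 was current

scheme.add_version("X", "1.1", "Age in years")
scheme.add_version("X", "2.0", "Age in completed years")

print(scheme.get(*pinned))  # still the content of version 1.0
print(scheme.latest("X"))
```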
![Page 220: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/220.jpg)
Management of Information, Data, and Metadata
• An organization can manage its organizational information, metadata, and data within repositories using DDI 3 to transfer information into and out of the system to support:
  • Controlled development and use of concepts, questions, variables, and other core metadata
  • Development of data collection and capture processes
  • Quality control operations
  • Development of data access and analysis systems
S05 220
![Page 221: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/221.jpg)
Upstream Metadata Capture
• Because there is support throughout the lifecycle, you can capture the metadata as it occurs
• It is re-useable throughout the lifecycle
  • It is versionable as it is modified across the lifecycle
• It supports production at each stage of the lifecycle
  • It moves into and out of the software tools used at each stage
S05 221
![Page 222: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/222.jpg)
Metadata Driven Data Capture
• Questions can be organized into survey instruments documenting flow logic and dynamic wording
  • This metadata can be used to create control programs for Blaise, CASES, CSPro and other CAI systems
• Generation Instructions can drive data capture from registry sources and/or inform data processing post capture
S05 222
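As a loose illustration of how question metadata with flow logic could drive an instrument, the sketch below walks a small question set and follows the routing stored with each question. The dictionary layout, question IDs, and wording are invented for the example; it does not produce Blaise, CASES, or CSPro code and is not the DDI instrument model.

```python
# Question metadata: text plus routing by answer (hypothetical structure).
questions = {
    "Q1": {"text": "Are you currently employed?", "next": {"yes": "Q2", "no": None}},
    "Q2": {"text": "How many hours do you work per week?", "next": {}},
}


def run_instrument(questions, start, answers):
    """Follow the flow logic and return the IDs of the questions asked, in order."""
    asked = []
    current = start
    while current is not None:
        asked.append(current)
        answer = answers.get(current)
        # An answer with no routing entry ends the interview.
        current = questions[current]["next"].get(answer)
    return asked


print(run_instrument(questions, "Q1", {"Q1": "yes", "Q2": "40"}))
print(run_instrument(questions, "Q1", {"Q1": "no"}))
```

The same metadata that routes the interview can later document it, which is the point the slide makes about metadata-driven capture.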
![Page 223: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/223.jpg)
Reuse of Metadata
• You can reuse many types of metadata, benefitting from the work of others
  • Concepts
  • Variables
  • Categories and codes
  • Geography
  • Questions
• Promotes interoperability and standardization across organizations
• Can capture (and re-use) common cross-walks
S05 223
![Page 224: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/224.jpg)
Virtual Data
• When researchers use data, they often combine variables from several sources
• This can be viewed as a “virtual” data set
• The re-coding and processing can be captured as useful metadata
• The researcher’s data set can be re-created from this metadata
• Comparability of data from several sources can be expressed
S05 224
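One way to picture a “virtual” data set is to store the recoding steps as data and replay them against the source records. The sketch below assumes a made-up instruction format (`source`, `target`, `bins`) and sample records; it is not a DDI processing-instruction syntax.

```python
# Recode instructions kept as metadata (hypothetical structure).
recode_steps = [
    {
        "source": "age",
        "target": "age_group",
        "bins": [(0, 17, "minor"), (18, 120, "adult")],  # inclusive ranges
    },
]


def apply_recodes(records, steps):
    """Re-create the derived data set by replaying the recorded steps."""
    result = []
    for record in records:
        row = dict(record)  # copy so the source data stays untouched
        for step in steps:
            value = row[step["source"]]
            for low, high, label in step["bins"]:
                if low <= value <= high:
                    row[step["target"]] = label
                    break
        result.append(row)
    return result


source_data = [{"id": 1, "age": 12}, {"id": 2, "age": 40}]
virtual = apply_recodes(source_data, recode_steps)
print(virtual)
```

Anyone holding the source data and the recorded steps can rebuild the researcher's data set without receiving a copy of it.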
![Page 226: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/226.jpg)
DDI - Codebook
![Page 227: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/227.jpg)
Nesstar – HANDS ON
![Page 228: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/228.jpg)
Data Collection/Processing
• Data collection in the lifecycle
• Representing question text
• Questions and questionnaires
• Representing response domains
• Processing collected data
S11 228
![Page 229: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/229.jpg)
Production
• Evaluate current processes
  • What is done?
  • Who does it?
  • How is it done (existing software, processes)?
• Where do sections of DDI 3 fit into the process?
  • Where does metadata first come into existence?
  • What metadata can be reused?
• What sections of metadata can be “produced” directly from existing metadata?
  • Time/cost savings
  • Consistency
S11 229
![Page 230: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/230.jpg)
Internal Consistency
• Standards within an organization or community
  • Concept schemes
  • Question schemes
  • Coding schemes
• Interoperability between different proprietary software systems
  • Allows forward flexibility for software decisions
  • Allows specialized software for sub-processes
S11 230
![Page 231: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/231.jpg)
Data Collection / Production
• End use is no longer the only focus
• A major selling point of DDI 3 to production organizations is its ability to “inform and drive the process”
• Metadata content is reused in DDI 3 so capturing it early is an advantage to the producer
• Metadata captured early can drive the production process
S11 231
![Page 232: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/232.jpg)
Metadata-Driven Processing: An Example
[Diagram elements: DDI 3 Question Bank; Survey Design Tool; CAI Tool; Interviewer (“What is your socio-economic status?”) and Respondent (“I’m very, very wealthy!”); Survey Documentation, generated from DDI 3.]
This replaces older processes where surveys/CAI were created by hand, and documented after-the-fact.
S11 232
![Page 233: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/233.jpg)
Capturing and Reusing Metadata
• Whether captured at inception or created after-the-fact, some sections must be completed before other sections can be completed
• The capture of metadata at point of inception in a non-proprietary structure that can be transferred out of and into process software provides incentive for metadata creation during the life cycle of the data
S11 233
![Page 234: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/234.jpg)
Metadata Flow
• DDI is built on the life cycle of the data and some information naturally occurs earlier than other information
• Reuse of and reference to certain types of information such as universe, concepts, categories, and coding schemes prescribe a creation order
S11 234
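The point that references prescribe a creation order can be sketched as a dependency sort: anything that is referenced has to exist before the item that references it. The dependency map below is an assumption made for illustration (for example, that a coding scheme references categories and a variable references a universe, concept, coding scheme, and question); it is not taken from the DDI schema. The sort uses Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each key lists the metadata it references (assumed for illustration).
references = {
    "UniverseScheme": set(),
    "ConceptScheme": set(),
    "CategoryScheme": set(),
    "CodingScheme": {"CategoryScheme"},
    "QuestionScheme": {"ConceptScheme", "CodingScheme"},
    "VariableScheme": {
        "UniverseScheme",
        "ConceptScheme",
        "CodingScheme",
        "QuestionScheme",
    },
}

# static_order() yields every referenced item before the items that use it.
creation_order = list(TopologicalSorter(references).static_order())
print(creation_order)
```

Several valid orders usually exist; the sort only guarantees that nothing is created before the items it refers to.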
![Page 235: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/235.jpg)
[Diagram: order of metadata creation in seven steps (Step 3 is optional). Boxes shown: Universe Scheme; Concept Scheme; Organization / Individual; StudyUnit Citation; StudyUnit Coverage; Category Scheme; Coding Scheme; Question Scheme; Variable Scheme; Processing Event (coding); Data Relationships; NCube; Record Structure; Remaining Physical Data Product Items; Physical Instance; Archive / Group / etc.; Remaining Logical Product items; Instrument; Control Construct Scheme.]
S11 235
![Page 236: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/236.jpg)
Questions to Variables
Question Development Software
• Identifying Universe and Concepts
• Building or Importing Question Text and Response Domains
Instrument Development Software / CAI
• Organizing questions and flow logic
• Capturing raw response data and process data
Data Processing Software
• Data cleaning and verification
• Recoding and/or deriving new data elements using existing or new categories or coding schemes
[Diagram labels: DDI, DDI, REGISTRY]
S11 236
![Page 237: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/237.jpg)
DAY 2
![Page 238: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/238.jpg)
Geographic Structure
• Level
  • Code, Name, coverage limitation, description
• Parent
  • Reference to a single parent geography
  • This is used to describe single hierarchies
• OR Geographic Layer
  • References multiple base levels where multiple hierarchies are layered to create a resulting polygon
S10 238
![Page 239: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/239.jpg)
STATE
County
S10 239
![Page 240: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/240.jpg)
COUNTY
S10 240
County Subdivision
Census Tract
Place
![Page 241: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/241.jpg)
Hierarchies and Layers
• State (040)
  • County (050)
    • County Subdivision (060)
    • Census Tract (140)
  • Place (160)
• Portion of a Census Tract within a County Subdivision within a Place
  • Layer References:
    • 140
    • 060
    • 160
S10 241
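The difference between a single-parent hierarchy and a layer can be sketched with a small lookup table. The level codes and the three layer references come from the slide, and the State parent (010, country) comes from the Structure and Location slide; the other parent assignments (County Subdivision and Census Tract under County, Place under State) are assumptions based on common U.S. Census geography, and the function name is hypothetical.

```python
# Level code -> (name, parent level code).
LEVELS = {
    "010": ("Country", None),
    "040": ("State", "010"),
    "050": ("County", "040"),
    "060": ("County Subdivision", "050"),  # parent assumed
    "140": ("Census Tract", "050"),        # parent assumed
    "160": ("Place", "040"),               # parent assumed
}


def hierarchy(level):
    """Walk the single-parent chain from a level up to the top."""
    chain = []
    current = level
    while current is not None:
        name, parent = LEVELS[current]
        chain.append(name)
        current = parent
    return chain


# A geographic layer references several base levels at once.
layer_references = ["140", "060", "160"]
layer_description = " within ".join(LEVELS[code][0] for code in layer_references)

print(hierarchy("060"))
print(layer_description)
```

A parent reference can only describe one chain at a time, which is why the overlapping polygon on the slide needs the layer form.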
![Page 242: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/242.jpg)
Geographic Location
• Level description and/or a reference to the level description in the Geographic Structure
• Reference to the variable containing the identifier of the geographic location
• Description of a specific geographic location:
  • Code
  • Name
  • Geographic time
  • Bounding Polygon
  • Excluding Polygon
S10 242
![Page 243: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/243.jpg)
Structure and Location
STRUCTURE:
• Level: 040
• Name: State
• U.S. State or state equivalent including Legal Territories and the District of Columbia
• Parent: 010 [country]
LOCATION:
• Level Reference: 040
• Variable Reference: STATEFP
• Name: Minnesota
• Code Value: 27
• Geographic Time: Start: 1857 End: 9999
• Bounding Polygon or Shape File Reference: for each boundary over time
S10 243
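The LOCATION entry on this slide maps naturally onto a small record with a validity period. The sketch below is an assumption-laden illustration: the `GeographicLocation` class and its field names are hypothetical, and treating 9999 as an open-ended end date is an interpretation of the slide, not a documented rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeographicLocation:
    """One specific place at a given level (hypothetical model)."""
    level_reference: str     # level code in the geographic structure
    variable_reference: str  # variable that holds the location identifier
    name: str
    code_value: str
    start_year: int
    end_year: int            # 9999 is read here as "no end date" (assumption)

    def valid_in(self, year: int) -> bool:
        """True if the location description applies in the given year."""
        return self.start_year <= year <= self.end_year


minnesota = GeographicLocation("040", "STATEFP", "Minnesota", "27", 1857, 9999)
print(minnesota.valid_in(2012))
```

A location whose boundary changed would carry several such records, one per period, each pointing at its own polygon or shape file.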
![Page 244: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/244.jpg)
DDI Basics (continued)
• Study level information (continued)
  • Data capture
    • Questions, question flow
    • Collection and processing events
  • Variables
    • Data dictionary contents
    • Record relationships
    • Physical storage
    • Statistics
• From the bottom up
  • Grouping and comparison
![Page 245: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/245.jpg)
Comparison
• There are two types of comparison in DDI 3:
  • Comparison by design
  • Ad-hoc (after-the-fact) comparison
• Comparison by design can be expressed using the grouping and inheritance mechanism
• Ad-hoc comparison can be described using the comparison module
• The comparison module is also useful for describing harmonization when performing case selection activities
S18 245
![Page 246: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/246.jpg)
Data Comparison
• To compare data from different studies (or even waves of the same study) we use the metadata
  • The metadata explains which things are comparable in data sets
• When we compare two variables, they are comparable if they have the same set of properties
  • They measure the same concept for the same high-level universe, and have the same representation (categories/codes, etc.)
  • For example, two variables measuring “Age” are comparable if they have the same concept (e.g., age at last birthday) for the same top-level universe (e.g., people, as opposed to houses), and express their value using the same representation (e.g., an integer from 0-99)
• They may be comparable if the only difference is their representation (e.g., one uses 5-year age cohorts and the other uses integers) but this requires a mapping
S18 246
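The comparability test described on this slide can be written down as a short check on three properties, with a mapping function covering the case where only the representation differs. The `Variable` class, the variable names, and the cohort labels are illustrative assumptions; DDI expresses this through metadata, not through code like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variable:
    name: str
    concept: str         # what is measured
    universe: str        # top-level universe
    representation: str  # how values are expressed


def comparability(a: Variable, b: Variable) -> str:
    """Classify two variables using the three properties from the slide."""
    if a.concept != b.concept or a.universe != b.universe:
        return "not comparable"
    if a.representation == b.representation:
        return "comparable"
    return "comparable with mapping"


def to_cohort(age: int) -> str:
    """Map an integer age onto a 5-year cohort label such as '35-39'."""
    low = (age // 5) * 5
    return f"{low}-{low + 4}"


age_a = Variable("AGE", "age at last birthday", "people", "integer 0-99")
age_b = Variable("AGEGRP", "age at last birthday", "people", "5-year cohorts")

print(comparability(age_a, age_b))
print(to_cohort(37))
```

The mapping only runs one way: integer ages can be collapsed into cohorts, but cohorts cannot be expanded back into exact ages.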
![Page 247: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/247.jpg)
DDI Support for Comparison
• For data which is completely the same, DDI provides a way of showing comparability: Grouping
  • These things are comparable “by design”
  • This typically includes longitudinal/repeat cross-sectional studies
• For data which may be comparable, DDI allows for a statement of what the comparable metadata items are: the Comparison module
  • The Comparison module provides the mappings between similar items (“ad-hoc” comparison)
  • Mappings are always context-dependent (e.g., they are sufficient for the purposes of particular research, and are only assertions about the equivalence of the metadata items)
S18 247
![Page 248: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/248.jpg)
Comparability
• The comparability of a question or variable can be complex. You must look at all components. For example, with a question you need to look at:
  • Question text
  • Response domain structure
    • Type of response domain
    • Valid content, category, and coding schemes
• The following table looks at levels of comparability for a question with a coded response domain
• More than one comparability “map” may be needed to accurately describe comparability of a complex component
S18 248
![Page 249: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/249.jpg)
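Detail of question comparability
[Table: eight Comparison Map rows for a Question. Column groups: Textual Content of Main Body (Same / Similar), Category (Same / Similar), Code Scheme (Same / Different). Each row marks one option in each column group; the position of the marks was lost in extraction.]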
S18 249
![Page 250: DDI TRAINING WORKSHOP Wendy Thomas November 28-29, 2012.](https://reader030.fdocuments.us/reader030/viewer/2022012918/5516b377550346f6208b5340/html5/thumbnails/250.jpg)
Tools and resources
Tools/Projects
• DDI-L has only been an official standard since April 2008
• Despite this, many tools are being developed
• Some useful tools already exist
• Some tools are available, others are projects which would be willing to share code (or partner) as the basis for further development
• The list may not be complete
• IASSIST has a DDI Tools panel every year – see online presentations
• There is an online tools database at the DDI Alliance site
S20 251
Tools/Projects (cont.)
• Nesstar (developed by Norwegian Social Science Data Services)
  • Commercial product supporting DDI 1.*/2.* (Editor is free.)
  • Provides an editing interface, visualization/tabulation, and server-to-server data exchange
  • Nesstar editor is used by the IHSN Metadata Toolkit, which adds publishing functionality for HTML, PDF, and CD-ROMs
  • Useful for migration to DDI-L
S20 252
Tools/Projects (cont.)
• DDI Foundation Tools Program
  • Joint initiative by several organizations to develop open-source tools for DDI-L
  • Includes DeXT (UKDA) and GESIS-developed tools for transformations to and from DDI 1.0 – 3.0 and statistical packages (SAS, SPSS, Stata)
  • Provides a utilities package for Java development, including validation, XML beans, URN resolution
  • Now developing a suite of tools for editing DDI-L instances based on a common application framework (work is led out of the Danish Data Archive)
S20 253
Tools/Projects (cont.)
• Canadian RDC Network
  • Producing DDI-L-based tools for many DDI use cases
    • Editing
    • Migration from DDI-Codebook
    • Registries
    • Repositories
    • Metadata mining
  • All tools will be open-source when completed (over next 2 years)
  • Some available now on request
S20 254
Tools/Projects (cont.)
• Colectica (by Algenta)
  • Commercial tool supporting survey instrument creation and other DDI editing functions
  • Has a repository component
  • Has Web and PDF publishing functionality
  • Supports DDI-C, DDI-L, Blaise, CASES, and CSPro files
  • DDI 3.1 is the native file format
• CSPro
  • Is currently developing support for DDI-L
  • Already supports DDI-C
  • Free product
S20 255
Tools/Projects (cont.)
• Space-Time Research
  • Has DDI-C and DDI-L support in their line of products (SuperCross, SuperWeb, etc.), for loading microdata into their proprietary databases
  • Commercial tool providing point-and-click functionality for tabulation of microdata
  • Support for SDMX expression of tabulations
  • Uses SDMX RESTful Web services (sort of…)
• Questasy
  • Based on an online documentation tool for the LISS panel study at CentERdata
  • Willing to partner to productize the code base
  • Database-driven application using PHP and other easy Web development technologies
Tools/Projects (cont.)
• Exanda
  • Online tabulation system based on DDI-L
  • Intended to be released as open source, but no committed delivery date
  • Uses freely available software components (Flex, Apache Cocoon, etc.)
• QDDS
  • Documentation system for questionnaires developed by GESIS - Leibniz Institute for the Social Sciences
  • Uses DDI-C, plans for supporting DDI-L in future
  • Freely available, but not open source
Tools/Projects (cont.)
• University of Tokyo
  • Producing a multi-lingual DDI editor
  • English-language interface not yet available (2012/13?)
  • Will be open-source
• Stat/Transfer
  • Has implemented support for converting between statistical packages and DDI 3.1
Tools/Projects (cont.)
• Blaise
  • Has support for exporting DDI-L descriptions of surveys
  • Developed at University of Michigan (ISR - SRO)
• Various (GESIS, University of Kansas, etc.)
  • Code for exporting DDI from statistical packages (SAS, SPSS)
  • Generally available free if you know who to ask
DDI Resources
• DDI Alliance Site
  • http://www.ddialliance.org
  • General link to all resources/news
  • Link to SourceForge for standards distributions
  • Link to prototype page – good for examples
  • There is a DDI newsletter you can subscribe to
• Tools/Resources Page
  • http://tools.ddialliance.org
  • Best place for tools, slides, and resources
S20 260
DDI Resources (cont.)
• Mailing Lists
  • www.icpsr.umich.edu/mailman/admin/
  • All of the lists starting with “DDI” are related to DDI topics
    • General list
    • List for each sub-committee
    • Not all groups are active
  • User list is the best general place
• Open Data Foundation Site
  • www.opendatafoundation.org
  • White papers, other resources/tools
S20 261
DDI Resources (cont.)
• DDI Agency Registry
  • http://tools.ddialliance.org/?lvl1=community&lvl2=agencyid
  • Sign up for unique global agency identifier – helps provide interoperability between organizations
  • Currently deploying permanent registry
• International Household Survey Network
  • http://surveynetwork.org
  • DDI-C-based toolkit available for developing countries (some free tools)
  • Catalog of surveys, many documented in DDI (NADA) – open source
S20 262
Best Practices (available at DDI Alliance website)
• Implementation and Governance
• Work flows - Data Discovery and Dissemination: User Perspective
• Work flows - Archival Ingest and Metadata Enhancement
• Work flows for Metadata Creation Regarding Recoding, Aggregation and Other Data Processing Activities
• Controlled Vocabularies
• Creating a DDI Profile
• DDI 3.0 Schemes
• Versioning and Publication
• DDI as Content for Registries
• Management of DDI 3.0 Unique Identifiers
• DDI 3.0 URNs and Entity Resolution
• High-Level Architectural Model for DDI Applications
S20 263
Use Cases (available at DDI Alliance website)
• Questasy: Documenting and Disseminating Longitudinal Data Online Using DDI 3
• Building a Modular DDI 3 Editor
• Using DDI 3 for Comparison
• Extracting Metadata From the Data Analysis Workflow
• Questionnaire Management and DDI: The QDDS Case
• Grouping of Survey Series Using DDI 3
• An Archive's Perspective on DDI 3
S20 264
DDI Events
• IASSIST
  • www.iassistdata.org
  • Not an official DDI event, but many DDI-related presentations and meetings
  • DDI Alliance Expert Committee meets before or after the conference every year
  • 38th Meeting in Washington DC, hosted by NORC, June 2012
  • 39th Meeting in Köln, Germany, hosted by GESIS - Leibniz Institute for the Social Sciences
  • DDI Workshops are often given the day before the meeting
  • Annual meeting locations rotate: US, Canada, US, outside North America, then the cycle repeats
S20 265
DDI Events (cont.)
• European DDI User’s Group
  • 3rd Meeting was last December at Gothenburg, Sweden
  • 4th Meeting will be in Bergen, Norway, December 2012
  • Preceded by a DDI Implementers workshop
  • North American User Group now being formed
• GESIS-Sponsored Autumn Events
  • Schloss Dagstuhl workshops
• Open Data Foundation meetings
  • Spring meeting in Europe
  • Winter meeting in the US
  • DDI is a major topic of discussion
S20 266