OASIS Electronic Trial Master File Standard Technical Committee Metadata Component Layer Discussion
OASIS Electronic Trial Master File Standard Technical Committee
Content Classification Layer
January 20, 2014
9:00 – 10:00 AM PST
Agenda
Time         Topic                                                Presenter
9:00-9:05    Call to Order & Roll Call                            Zack Schmidt
9:05-9:10    Approval of Minutes                                  All
             https://www.oasis-open.org/committees/documents.php?wg_abbrev=etmf
             TC Process and Administration (deferred)             Chet Ensign
9:10-9:20    Outreach Subcommittee - All                          Jennifer Alpert
9:20-9:50    Tech presentation – Content Classification Layer     Z. Schmidt/Aliaa
9:50-9:55    New Business                                         All
9:55-10:00   Next meeting agenda / Date                           Z. Schmidt
Roll Call
Name                        Company              Voting Status   Present?
Jennifer Alpert Palchak     CareLex              Voter           y
Aliaa Badr                  CareLex              Voter           y
Oleksiy (Alex) Palinkash    CareLex              Voter           y
Troy Jacobson               Forte Research       Voter           y
Lou Chappuie                Individual           Voter           y
Lisa Mulcahy                Individual           Non-Voter       y
Robert Gehrke               Mayo Clinic          Voter           n
Rich Lustig                 Oracle               Non-Voter       y
Michael Agard               Paragon Solutions    Non-Voter       y
Christopher McSpiritt       Paragon Solutions    Non-Voter       y
Jamie O’Keefe               Paragon Solutions    Non-Voter       n
Fran Ross                   Paragon Solutions    Non-Voter       y
Peter Alterman              SAFE-BioPharma       Voter           y
Catherine Schmidt           SterlingBio          Voter           y
Zack Schmidt                SureClinical         Voter           y
Trish Whetzel, PhD          SureClinical         Non-Voter       y
Peter Junge                 Beijing Sursen       Observer        n
Laura Hilty                 Forte Research       Observer        n
Tony O’Hare                 Forte Research       Observer        n
Eldin Rammell               Rammell Consulting   Observer        n
Robin Cover                 OASIS staff          Non-Voter       n
Chet Ensign                 OASIS staff          Non-Voter       n
Meeting Etiquette
• Announce your name prior to making comments or suggestions
• Keep your phone on mute when not speaking (#6)
• Do not put your phone on hold
  – Hang up and dial in again when finished with your other call
  – Hold = Elevator Music = very frustrated speakers and participants
• Meetings will be recorded and posted
  – Another reason to keep your phone on mute when not speaking!
• Use the join.me “Chat” feature for questions / comments / votes
• We will follow Robert’s Rules of Order
NOTE: This meeting is being recorded and minutes will be posted on the TC page after the meeting
From eTMF Std TC to Participants: Hi everyone: remember to keep your phone on mute
Outreach Subcommittee
• Status – New Members:
  – Oracle – Joined
  – In Progress: EMC, Kaiser Permanente, Shire, Medtronic
• Activities / Milestones
Tech Discussion
• Status
• Timeline
• In parallel with other Tech work from charter
Content Classification System Discussion
– Classification System Components:
• Classification Categories
  – Taxonomy, hierarchy
• Metadata (‘Tags’)
  – Characterizes content
• Content Model
  – Published set of classifications, metadata for a domain (e.g., eTMF)
Classification Categories Component
– Hierarchy of categories
• Categories, subcategories, content types
– Defined relationships with rules: Parent-Child
– All categories, content types required to have unique names and machine codes
– Each content type is associated with Metadata Properties (includes core and domain-specific)
– Content items are linked to content types.
– Unique classification and term codes based on Universal Decimal Classification System (UDC) numbering, widely used in libraries worldwide. Human and machine readable; infinitely expandable
– Can be described, edited and validated using an OWL editor (such as the open source editor Protégé)
– Supports any simple text vocabulary, including TMF Ref Model and other vocabularies
– W3C OWL2 and RDF/XML supported
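The rules listed above (parent-child hierarchy, unique names and codes, metadata attached to content types) can be pictured with a small data structure. This is only a rough sketch: the class names `Node` and `Classification`, the example codes and names, and the rule that content types are leaves are all illustrative assumptions, not taken from the spec.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A category, subcategory, or content type (hypothetical model)."""
    code: str                                      # unique machine code
    name: str                                      # unique human-readable name
    kind: str                                      # "category", "subcategory", "content_type"
    metadata: dict = field(default_factory=dict)   # metadata properties of a content type
    children: list = field(default_factory=list)   # parent-child relationship


class Classification:
    """Registry that rejects duplicate names and duplicate codes."""

    def __init__(self):
        self._by_code = {}
        self._names = set()

    def add(self, node, parent=None):
        # Validate everything first so a failed add leaves the registry unchanged.
        if node.code in self._by_code:
            raise ValueError(f"duplicate code: {node.code}")
        if node.name in self._names:
            raise ValueError(f"duplicate name: {node.name}")
        if parent is not None and parent.kind == "content_type":
            # Assumption: content types are leaves of the hierarchy.
            raise ValueError("content types cannot have children")
        if parent is not None:
            parent.children.append(node)
        self._by_code[node.code] = node
        self._names.add(node.name)
        return node

    def get(self, code):
        return self._by_code[code]


# Usage with made-up codes and names.
tree = Classification()
category = tree.add(Node("100", "Example Category", "category"))
subcategory = tree.add(Node("100.10", "Example Subcategory", "subcategory"), parent=category)
content_type = tree.add(
    Node("100.10.10", "Example Content Type", "content_type", {"core": ["File Properties"]}),
    parent=subcategory,
)
```

Keeping the uniqueness checks in one registry means a duplicate name or code is rejected no matter where in the hierarchy it is added.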
Classification Categories Hierarchy
[Figure: classification categories hierarchy diagram; recoverable labels: Study, Digital Content]
Metadata Component
– Used to tag or index digital content items
Metadata Classes:
• Core – Comprised of four areas: File Properties, Classification, Audit Trail, Business Process
• Domain-specific – Metadata for a domain in life sciences such as eTMF, finance, legal administration, or others. Uses standards-based terms from groups like NCI
• Org Specific – Metadata that meets an organization's needs; not standards based
• General – Obtained from public standards-based vocabulary terminology resources like Dublin Core
Annotation Properties
– Metadata about classification categories and metadata: Core, Org-Specific metadata
Metadata Component
Core Metadata Example – File Properties:
Content Model Component
– Contains classification hierarchy, metadata in machine readable format:
Classification System – Term Sources
Term Sourcing Concepts:
• Terms adopted by standards bodies should be used first in eTMF model
Primary Term Sources for eTMF Classification System:
– Internet Standards Dev Orgs: W3C, IETF, ISO, etc.
  » Required for interoperability of machine code
– NIH NCI Thesaurus: Term database for FDA, CDISC, HL7, other orgs
  » Required for interoperability of clinical / health sciences data
Secondary Term Sources for eTMF Classification System:
• Industry sources – widely used terms in enterprise content mgmt software, TMF RM
*Spec, Table 6, p21
Classification Categories Component
– Classification hierarchy and numbering is based on UDC library numbering standard and XML naming
– Digital dot notation – Designed for human and machine readability
– Each number is also a unique code for naming and ordering in the hierarchy
– Primary Categories (PC): Three digit. eTMF: 100-200
– Subcategories (SC): Two digit: 10-99
– Content Types (CT): Two digit: 10-99
– Maximum number of Sub-Category divisions is 5, excluding the 3-digits for the Primary Category
[1] Per spec section 2.1.1; 6.0
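As a rough illustration of the numbering rules above, the sketch below validates and sorts codes written in dot notation. The exact layout is an assumption: a three-digit primary category followed by zero to five two-digit divisions in the range 10-99, separated by dots (for example `100.10.10`). The slides do not show a full code, so both the pattern and the function name `parse_code` are hypothetical.

```python
import re

# Assumed layout: PPP followed by up to five ".SS" divisions, each 10-99.
# No segment starts with 0, so leading zeros never occur.
CODE_RE = re.compile(r"[1-9][0-9]{2}(?:\.[1-9][0-9]){0,5}")


def parse_code(code):
    """Return the code as a tuple of integers, or raise ValueError."""
    if not CODE_RE.fullmatch(code):
        raise ValueError(f"not a valid classification code: {code!r}")
    return tuple(int(part) for part in code.split("."))


# Codes sort by hierarchy position when compared as integer tuples.
codes = ["101", "100.20", "100.10.10", "100.10"]
ordered = sorted(codes, key=parse_code)
print(ordered)  # ['100.10', '100.10.10', '100.20', '101']
```

Comparing integer tuples rather than raw strings keeps the ordering correct even if a later revision allows segments of different widths.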
Classification Categories Component
Classification Categories Hierarchy and Numbering [1]:
Hierarchy Numbering/Naming Considerations:
• Flexible, standards-based approach (W3C XML compliant naming*)
• Ability to add multiple hierarchy divisions / levels
• Proposed: 5 divisions = (100 × 90^5) ≈ 5.9 × 10^11 Content Types
• Uniqueness of numbers – usable as machine code identifiers
• Machine readable, human readable
• No sorting issues, no need for leading zeros*, no special chars
*Leading zeros in XML syntax are ignored: http://www.w3.org/TR/REC-xml/
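The capacity figure quoted for five divisions can be checked with a short calculation. It reads the slide's expression as 100 primary category codes multiplied by 90 possible values (10 through 99) for each of five two-digit divisions. The variable names are illustrative.

```python
# Assumption: 100 primary category codes in the eTMF block.
primary_categories = 100
# Each two-digit division can take the values 10..99, i.e. 90 values.
values_per_division = 99 - 10 + 1
divisions = 5

capacity = primary_categories * values_per_division ** divisions
print(capacity)           # 590490000000
print(f"{capacity:.1e}")  # 5.9e+11
```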
Numbering and Naming Scheme
Numbering
• Primary Categories and Sub-Categories :
– Category Code number
• Content Type:
– Content Type ID
Naming
• Primary Categories and Sub-Categories
– Simple text-based names
– Unique name, 64 char limit
– Abbreviation – 16 char limit suggested
– Compatible with W3C XML naming standards :
No special characters :
( ) < > ? / % # @ !
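A minimal sketch of how the naming rules above might be checked. It covers only what the slide states: a 64-character limit on names, a suggested 16-character limit on abbreviations, and the listed special characters. The function name `check_name` is hypothetical, and the forbidden set is limited to the characters shown on the slide; the spec may exclude more.

```python
# Characters listed on the slide as not allowed in names.
FORBIDDEN = set("()<>?/%#@!")
MAX_NAME_LENGTH = 64
MAX_ABBREVIATION_LENGTH = 16  # suggested limit, not a hard rule


def check_name(name, abbreviation=None):
    """Return a list of problems; an empty list means the name passes."""
    problems = []
    if not name:
        problems.append("name is empty")
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"name exceeds {MAX_NAME_LENGTH} characters")
    bad = sorted(FORBIDDEN.intersection(name))
    if bad:
        problems.append("name contains special characters: " + " ".join(bad))
    if abbreviation is not None and len(abbreviation) > MAX_ABBREVIATION_LENGTH:
        problems.append(
            f"abbreviation exceeds the suggested {MAX_ABBREVIATION_LENGTH} characters"
        )
    return problems


print(check_name("Site Management", "SiteMgmt"))  # []
print(check_name("Site/Management"))
```

Uniqueness of the name is not checked here, since that depends on the rest of the classification.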
Classification Categories Component
Example: Classification Categories Hierarchy, Naming, Numbering
Modifying Classification Category Entities – General Editing Rules
Domain Specific
– Classifications cannot be deleted –> Reserve/Unreserve
– Modifications allowed to some annotation properties (see spec)
– Codes (Category Codes, CT Type ID) cannot be generated
Organization Specific
– Classifications can be deleted
– Modifications allowed for classification metadata, annotations
– Codes (Category Codes, CT Type ID) can be generated
Classification Categories Component
Classification Category, Content Type Editing Rules*
Type                    Import Terms   Generate Code   Add/Modify   Delete/Reserve
Domain Specific         Yes            No              No/Yes**     Reserve/Unreserve
Organization Specific   Yes            Yes             Yes/Yes      Delete
*Spec, Table 6, p21
**Annotation metadata
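The editing rules in the table can be encoded as a simple lookup, sketched below. The keys and function names are hypothetical. Reading "No/Yes**" as "adding is not allowed, modifying is allowed for annotation metadata only" is an interpretation of the table, not a statement from the spec.

```python
# Permissions per classification type, transcribed from the table above.
PERMISSIONS = {
    "domain": {
        "import_terms": True,
        "generate_code": False,
        "add": False,
        "modify_annotations": True,               # the ** footnote
        "modify_classification_metadata": False,  # assumed from "No/Yes**"
    },
    "organization": {
        "import_terms": True,
        "generate_code": True,
        "add": True,
        "modify_annotations": True,
        "modify_classification_metadata": True,
    },
}

# How an entry is taken out of use for each type.
REMOVAL = {"domain": "reserve/unreserve", "organization": "delete"}


def is_allowed(scope, action):
    """Look up whether an action is permitted for a classification type."""
    return PERMISSIONS[scope][action]


def removal_mode(scope):
    """Return how a classification of this type is removed."""
    return REMOVAL[scope]


print(is_allowed("domain", "generate_code"))  # False
print(removal_mode("organization"))           # delete
```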
Classification Editing Tool – Free, Open Source Protégé (from Stanford University: http://protege.stanford.edu/)
Protégé Editor:
- Edit Classification Taxonomy and Metadata Terms
- Validate Taxonomy and Term name compliance
- Create valid RDF/XML Ontology
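To give a feel for the kind of RDF/XML an editor such as Protégé reads and writes, the sketch below builds a tiny OWL class hierarchy with Python's standard library. Several things here are assumptions: the `http://example.org/etmf#` namespace is a placeholder, the codes and labels are invented, and mapping parent-child categories to `rdfs:subClassOf` is one plausible encoding rather than the one the spec prescribes.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://example.org/etmf#"  # hypothetical namespace

# Use the conventional prefixes when serializing.
for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)


def build_ontology(classes):
    """classes: iterable of (code, label, parent_code or None) tuples."""
    root = ET.Element(f"{{{RDF}}}RDF")
    ET.SubElement(root, f"{{{OWL}}}Ontology", {f"{{{RDF}}}about": BASE.rstrip("#")})
    for code, label, parent in classes:
        cls = ET.SubElement(root, f"{{{OWL}}}Class", {f"{{{RDF}}}about": BASE + code})
        ET.SubElement(cls, f"{{{RDFS}}}label").text = label
        if parent:
            # Parent-child relationship expressed as a subclass link.
            ET.SubElement(
                cls, f"{{{RDFS}}}subClassOf", {f"{{{RDF}}}resource": BASE + parent}
            )
    return ET.tostring(root, encoding="unicode")


xml_text = build_ontology([
    ("100", "Example Category", None),
    ("100.10", "Example Subcategory", "100"),
])
print(xml_text)
```

A file produced this way should open in an OWL editor, but validation against the eTMF content model would still be needed.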
Proposed Classification System has following Properties:
• Based on Naming and Numbering that is W3C XML compliant
– No special characters: ( ) & # @ / … etc.
– No leading zeros in classification numbers
• Based on Universal Decimal Classification (UDC) system for content classification:
– 100–199: eTMF Domain
– UDC system used in 170+ countries worldwide; expandable, human and machine readable, sortable http://en.wikipedia.org/wiki/Universal_Decimal_Classification
• Flexible and customizable for organizations, yet interoperable
– Domain classifications – Standardized; Organization-specific classifications – Editable
• Defined set of rules for Editing, modifying Taxonomy
• Any Organization can Modify/Edit taxonomy using open source editors like Protégé
Classification Categories - Summary
Appendix
Content Classification System – Core Terms needed for Architecture – Objectives:
• Classification, Subclassification concept -
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone - conveys concept
– Conveys hierarchy
– No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority – Source terms from standards bodies
Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:
Classification System – Core Terms
Term Options: Category, SubCategory
Source: NIH NCI Thesaurus
Definition: Category: ‘This term is used informally to mean a class of things’ (NCI code: C25372); Subcategory: ‘A subdivision that has common differentiating characteristics within a larger category.’ (NCI code: C25692)

Term Options: Class, SubClass
Source: W3C OWL
Definition: Class: ‘Resources may be divided into groups called classes’; SubClass: ‘Subclasses are classes; If a class C is a subclass of a class C', then all instances of C will also be instances of C'.’ (W3C RDF Class def)

Term Options: TMF Zone, Section
Source: TMF Ref Model
Definition: TMF Zone = Primary Classification (no published def found online); Section = SubClassification (no published def found online)
Proposed Term
Content Classification System – Core Terms needed for Architecture
• Classification, Subclassification term concept:
Classification System – Core Terms
Term Options: Category, SubCategory
Source: NIH NCI Thesaurus
+/-:
+ Everyone knows it
+ Describes hierarchy
+ In use by standards body (NIH NCI Thesaurus)
+ Generic

Term Options: Class, SubClass
Source: W3C OWL
+/-:
+ Describes hierarchy
+ In use by standards body
+ Generic
- Could be a reserved word for some development tools

Term Options: TMF Zone, Section
Source: TMF Ref Model
+/-:
+ In use by TMF RM users
- Doesn’t convey hierarchy
- Not in use by standards body
- Not Generic
Proposed Term
Content Classification System – Core Terms needed for Architecture – Objectives:
• Content Type concept
– Supports RDF/XML, OWL languages
– Non-domain specific, generic terms
– Easily understandable by anyone – conveys concept
– No conflicts – not a reserved term in RDF/XML, OWL or other compilers / IDEs
– First priority – Source terms from standards bodies
Classification System – Core Terms
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:
Classification System – Core Terms
Term: Content Type
Source: W3C & CareLex, Oracle
Definition:
W3C: ‘Specifies the nature of a linked resource’ (W3C, [RFC2045] and [RFC2046])
CareLex: A content type is a reusable collection of metadata, business processes, behavior, and other settings for a category of items or documents in electronic content material.
Oracle: Content types are used to define the metadata that you can associate with content.

Term: Artifact
Source: TMF Ref Model
Definition: ‘A collection of documents’ Wikipedia (Not published)
Proposed Term
Content Classification System – Core Terms needed for Architecture
• Content Type term concept:
Classification System – Core Terms
Term: Content Type
Source: W3C
+/-:
+ Widely used in internet SW
+ ECM SW use - Microsoft, Oracle, Alfresco, etc.
+ In use by standards body (W3C)
+ Generic

Term: Artifact
Source: TMF Ref Model
+/-:
+ In use by TMF RM users
- Not in use by standards body
- Not Generic
- Doesn’t convey concept of metadata
Proposed Term
Draft Agenda: Next Meeting
• Roll call
• Reports
  – Outreach
  – Tech Discussion: Classification Layer: Core Metadata (Charter item 2, p.2)
• New business