2nd ISA-TAB workshop
Outcome/Summary (to date)
Workshops on Data Standards (WODS) – EBI, Cambridge, UK 16th, 17th and 18th June 2008
This workshop is funded by the BBSRC Tools and Resources (WODS, BB/E025080/1),
with contributions from EBI and NERC Bioinformatics Center
Monday, 16th
Reviewing XSLT
XSLT issues discussed and solutions
FuGE extensions’ status
– ACTION (Andy) to create a ‘portal’ on the FuGE page to list all the extensions, their status, contacts, links, examples, etc.
  • This can help maximize interactions and advertise the status of each extension (like MIBBI does for the checklists)
– E.g. the RNAi group (which is building a FuGE extension in this domain) needs to develop something to describe a microtiter plate, and the work on the Array Design can be reused
  • If there is a FuGE-derived MAGE-ML*, then this group could ‘reuse’ the ADF part or other parts
  • ACTION (Javier, Helen) explore encoding of the microplate representation in the data files and referencing it from Assay
– *Problem is that we do not have a final decision on whether there will be an MGED extension of FuGE
  • ACTION (Helen) to check with the MAGE list + MGED board
XSLT issues discussed and solutions
Namespace inconsistency
– Should we have FuGE ‘controlled’ namespaces?
  • The OBO Foundry is considering doing it for ontologies
– ACTION (Andy) to ask the list if desirable
  • Namespace is a critical issue for XSL processing, not so much for other parsing methods
  • This is probably a recommendation for those who wish to use XSLT for presentation purposes; a dedicated page will be set up on the ISA-TAB website to list such XSLT recommendations. ACTION (Philippe)
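Why the namespace matters for XSL processing can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` rather than XSLT itself, but the effect is the same: a lookup bound to one namespace URI matches nothing in a document that declares a different URI, even when the element names are identical. The URIs and the `Protocol` element used here are hypothetical, not taken from any FuGE extension.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; real FuGE extensions define their own.
NS_A = "http://example.org/fuge/ext-a"
NS_B = "http://example.org/fuge/ext-b"

# Two documents with the same element name but different namespace URIs.
doc_a = f'<root xmlns:f="{NS_A}"><f:Protocol name="p1"/></root>'
doc_b = f'<root xmlns:f="{NS_B}"><f:Protocol name="p1"/></root>'


def count_protocols(xml_text, ns_uri):
    """Count Protocol child elements bound to the given namespace URI."""
    root = ET.fromstring(xml_text)
    return len(root.findall("f:Protocol", {"f": ns_uri}))


same_ns = count_protocols(doc_a, NS_A)   # lookup and document agree
other_ns = count_protocols(doc_b, NS_A)  # document uses a different URI
```

A stylesheet written against one namespace would silently skip the second document in the same way, which is one reason a list of ‘controlled’ namespaces could help.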
Annotation overloading
– Descriptions are used as term-gathering fields
– We could recommend on-the-fly ‘term creation’ (collection of terms as supplied by users)?
– ACTION (Andy, Ally) add recommendations on FuGE wiki (explain the use of FuGE)
  • Paper soon out on recommendations to extend FuGE
Name attribute optionality
- When this is missing, XSLT uses the ID, giving a less ‘human readable’ transformation
- We could recommend that name is used when readability is required/preferred; ACTION (Philippe) to modify XSL templates
Way to categorize assays is not in FuGE
- How to code ‘technology’ and ‘endpoint’ to categorize the assay (InvestigationComponent)?
- It can be done implicitly, but it would be useful to have these as explicit objects
- However, as there will not be a FuGE v1.1, ‘workarounds’ for any issues or needs will be done via recommendations; ACTION (Andy) to add in FuGE wiki
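The name-attribute fallback discussed on this slide can be sketched in a few lines. This is a Python illustration of the rule only, not the workshop's XSL templates; the attribute names `name` and `identifier` and the sample `Material` elements are assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def display_label(element):
    """Prefer the human-readable name; fall back to the identifier."""
    # Attribute names are assumed for illustration.
    return element.get("name") or element.get("identifier") or ""


# Hypothetical elements: one with a name, one with only an identifier.
named = ET.fromstring('<Material identifier="urn:x:1" name="Liver sample"/>')
unnamed = ET.fromstring('<Material identifier="urn:x:2"/>')

label_named = display_label(named)
label_unnamed = display_label(unnamed)
```

When the name is present the output stays readable; when it is missing the identifier is shown instead, which is the less readable case the recommendation aims to avoid.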
Reagent info in ISA-TAB
– Flow cytometry examples have more depth/granularity, e.g. all the reagents are listed; they have coded it via Material
  - Even though FuGE recommends doing it via structured Protocol (see GelML)
– ACTION (Andy, Ally) point them to design patterns on FuGE wiki
– ISA-TAB can (somehow) deal with it via the ‘Protocol Component’ field, just added
XSLT issues discussed and solutions
More FuGE ML files are needed to test the current scripts
- ACTION (Ally) to give more examples to Philippe from Symba
- ACTION (Philippe) to send script to Ally
- ACTION (Philippe) to set up an XSLT page on ISA-TAB to post all the scripts
Then the scripts will be tested with FuGE extensions, e.g. GelML
- More examples are needed to test, evaluate and finalize the scripts
- ACTION (Frank, Philippe) collaboratively finalize the scripts for GelML
Final comment on scope of ISA-TAB in relation to FuGE
- FuGE or other XMLs are more granular/expressive
- We have to accept the fact that when transforming into ISA-TAB we will/may lose/compress some of the info
XSLT scripts’ library next steps
To be done by AE team
MAGE-TAB to ISA-TAB converter
Tuesday, 17th
Reviewing the ISA-TAB
Investigation File – Changes and decisions (1)
Add ‘Investigation PubMed ID’ and ‘Investigation Publication DOI’ to the Investigation section
- Only for papers describing work across Studies
The Studies section needs to be singular (Study)
Comment point: the Study header is sufficient to ‘separate’ the sections; no need for a ‘start/end repeatable block’
If developers want to add a ‘comment’, then this would be ‘#this is a comment thingy’.
- A comment must have # as the first char
- But in Study/Assay this is done by adding a ‘column comment’ (see Table 5 in spec v0.3) for the users
Create a new section ‘Study Publications’ where we group the publication’s attributes, moving the id, description and date under ‘Study Section’
Create a new section ‘Study Design Descriptors’
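The comment rule above (a comment must have # as its first character) can be sketched as a small filter. This is a minimal Python illustration, assuming the Investigation file has already been read into a list of lines; the sample lines are hypothetical.

```python
def strip_comments(lines):
    """Keep only lines whose first character is not '#'."""
    return [line for line in lines if not line.startswith("#")]


# Hypothetical tab-delimited Investigation file content.
raw_lines = [
    "#this is a comment thingy",
    "STUDY",
    "Study Title\tExample study",
    " #not a comment: '#' is not the first char",
]

kept = strip_comments(raw_lines)
```

Checking only the first character keeps the rule trivial to implement, at the cost of treating an indented `#` line as data.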
Investigation File – Changes and decisions (2)
All field names are case sensitive
- Edit: every field must have first letter upper case
- Section headers go all upper case
- To allow easy visualization when ‘imported’ in a spreadsheet
File will be interpreted as Unicode
Any subsections within a repeatable block (Study) must remain within the block
- But the order of the subsections within the block can vary
Use the triplet (type, accession, source ref) consistently if an ontology/CV is used; if not, Name and Type are entered as free text (add example in the spec)
Add ‘Protocol Parameter Name’, followed by ‘Protocol Parameter Type’ and ‘Protocol Parameter Type Term Accession’, ‘Protocol Parameter Type Term Source REF’
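The naming decisions on this slide lend themselves to simple checks. The sketch below is one possible reading, not an agreed validation rule: it treats a label written entirely in upper case as a section header, a label starting with an upper-case letter as a field name, and expands a base field into the (type, accession, source ref) triplet following the ‘Protocol Parameter Type’ pattern above. The function names and the `STUDY PROTOCOLS` header are hypothetical.

```python
def is_section_header(label):
    """Section headers are written entirely in upper case."""
    return label.isupper()


def is_field_name(label):
    """Field names start with an upper-case letter (one reading of the rule)."""
    return bool(label) and label[0].isupper() and not label.isupper()


def term_triplet(base):
    """Expand a base field into its type / accession / source-ref triplet."""
    return [base, f"{base} Term Accession", f"{base} Term Source REF"]


header_ok = is_section_header("STUDY PROTOCOLS")
field_ok = is_field_name("Protocol Parameter Type")
lowercase_rejected = not is_field_name("protocol parameter type")
triplet = term_triplet("Protocol Parameter Type")
```

Because field names are case sensitive, checks like these have to compare labels exactly rather than normalizing case first.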
Investigation File – Changes and decisions (3)
Correct to allow for multiple values for the ‘Protocol Parameter Type/Term/Source’ and ‘Study Design Type/Term/Source’ triplets
Add ‘Protocol URI’ and ‘Protocol Version’ fields in the ‘Study Protocol’ subsection
- The pointer to external file(s) allows users to provide these in the format they wish
- URI should be resolvable
- Ultimately these requirements are up to the implementers; similarly whether to make other Protocol fields mandatory
Remove ‘Protocol Component Parameter, Instrument Component, Software’ triplets and ‘Processing’ fields
Add ‘Protocol Component Name’, ‘Protocol Component Types’, ‘Protocol Component Types Term Accession’, ‘Protocol Component Types Term Source REF’
- Used for listing, e.g. instruments, software, reagents, operator
- Semicolon separator; ACTION (Marco) to provide examples of options
Clean up all the field names and make them ‘unique’ by prefixing them with the name of the section, e.g. ‘Investigation PubMed ID’ vs ‘Study PubMed ID’
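How semicolon-separated ‘Protocol Component’ values might be read is sketched below. The exact layout was still open at the workshop (examples of the options were an action item), so this is only an assumption: names and types are taken to be parallel semicolon-separated lists, and every value shown is invented for illustration.

```python
def split_values(cell):
    """Split a semicolon-separated cell into trimmed, non-empty values."""
    return [value.strip() for value in cell.split(";") if value.strip()]


# Hypothetical cell contents for one protocol.
component_names = split_values("scanner A; analysis package B; antibody C")
component_types = split_values("instrument; software; reagent")

# Pair each component name with its type, position by position.
components = list(zip(component_names, component_types))
```

Pairing by position only works if both cells list the same number of values in the same order, which a validator would need to check.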
Study/Assay File – Recommendations
The table represents a graph and each edge needs to appear at least once; nodes do not need to be repeated, e.g. Microarray (technology) / Gene expression (endpoint) tab
- Document how to represent the case when 2 different analysis protocols are applied to the same set of data files
- In this case we follow MAGE-TAB by repeating vertically the data file names (only; no need to repeat the previous columns) followed by a new analysis protocol and output data file names
Factor Values can be referenced in both Study and Assay tabs
- But the same value cannot be in both tabs; examples to be added
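The ‘table represents a graph’ recommendation can be made concrete with a short sketch. It is a simplification under stated assumptions: only node columns are modelled (protocol columns are left out), blank cells mean ‘not repeated’, and each pair of neighbouring non-empty cells in a row is read as one edge. The rows and file names are hypothetical.

```python
def edges_from_rows(rows):
    """Collect the distinct (from, to) edges implied by the table rows."""
    edges = set()
    for row in rows:
        nodes = [cell for cell in row if cell]  # skip cells left blank
        edges.update(zip(nodes, nodes[1:]))     # neighbouring cells form an edge
    return edges


# Hypothetical rows: source, sample, raw data file, derived data file.
rows = [
    ["source1", "sample1", "raw1.txt", "norm1.txt"],
    ["source1", "sample2", "raw2.txt", "norm2.txt"],
    # Second analysis of the same data file: only the file name is repeated.
    ["", "", "raw1.txt", "clustered1.txt"],
]

edges = edges_from_rows(rows)
```

The third row adds a new edge from `raw1.txt` without restating its source and sample, which is the MAGE-TAB style of vertical repetition described above.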
Tuesday, 17th
Tools and implementations
Scripts and tools’ plans
From ISA-TAB to FuGE ML
- To be done (Phil, Ally wants this ;-)
- Map the ISAcreator Java model to FuGE general elements
- Ally to help checking/validating mapping
From FuGE ML to ISA-TAB
- Work in progress, XSLT under development
- Philippe, Frank, Ally, Nigel and Andy
ISA-TAB creation
- ISAcreator (will be open source)
- Other tools from participating systems...
ISA-TAB validation
- Common, minimum validation rules/scripts to be defined/developed (e.g. structure, case sensitivity)
- Use part of the ISAcreator configuration as library
  -> Google doc with list of basic rules (to be identified when creating the v1 spec)
  -> The ISAcreator config code will be stripped down to the basic rules and posted on the ISA-TAB sf site (SVN)
ISA-TAB and MAGE-TAB
- Helen and Susanna to talk to MGED, ref to NIH grant
Wednesday, 18th
Next steps and publication plans
Release plans
Release candidate 1, ISA-TAB v1
- Philippe, Marco to edit/add all the agreed changes in the spec
  -> done by June 27th
- Dave, Ally, Kieran check and review
  -> done by July 18th
- All to read and comment/suggest
  -> wiki pages will be set up on ISA-TAB site to facilitate discussion
  -> all comments received by end of August
- Philippe to fix the current ISA-TAB examples to reflect the new spec
-> Release candidate 1, ISA-TAB v1 out by mid Sept
This version will include details on fields in the Investigation file and the list of fields allowed in the Study and Assay files
- The specific Assay files defined by the participating communities will be listed, and new ones can be added without having to release new versions
Pending issue
Reference system for SEND and CDISC (SDTM)
- Take this discussion to the ISA-TAB list with interested parties, Michael and Steve in particular
- Subject ID in SDTM should be the same as Source Name in ISA-TAB (or add it as another subject ID column?); then add the file as external
- Each SDTM file has a Study ID, Domain ID, Subject ID (2 types of these, probably we can use the UsubjID) and Idvar (column) and Idvarval (column value)
Publication and next workshop
Publication – suggested content
- Rationale and use case for ISA-TAB
- History from MAGE-TAB to ISA-TAB
- Present it as a format (and interface to a format), not a ‘standard’
  -> Describe scripts making it ‘interoperable’ with other formats
- Examples of implementations to date
  -> Tools/systems that have output/input in this format
  -> Also (simply) more real examples from communities posted on this site
- Start writing ~end of this year, to submit ~early next year; journal to be decided later
Next workshop would be a users meeting (in 2009)
- To fix minor issues, recommendations, ambiguities, sharing development approaches, components, etc.