
2nd ISA-TAB workshop

Outcome/Summary (to date)

Workshops on Data Standards (WODS) – EBI, Cambridge, UK 16th, 17th and 18th June 2008

This workshop is funded by the BBSRC Tools and Resources (WODS, BB/E025080/1), with contributions from EBI and NERC Bioinformatics Center

Monday, 16th

Reviewing XSLT

XSLT issues discussed and solutions

FuGE extensions’ status
– ACTION (Andy) to create a ‘portal’ on the FuGE page to list all the extensions, their status, contacts, links, examples etc.
  • This can help maximize interactions and advertise the status of each extension (like MIBBI does for the checklists)
– E.g. the RNAi group (that is building a FuGE extension in this domain) needs to develop something to describe a microtiter plate, and the work on the Array Design can be reused
  • If there will be a FuGE-derived MAGE-ML* then this group could ‘reuse’ the ADF part or other parts
  • ACTION (Javier, Helen) to explore encoding of the microplate representation in the data files and referencing it from Assay
– *The problem is that we do not have a final decision on whether there will be an MGED extension of FuGE
  • ACTION (Helen) to check with the MAGE list + MGED board


Namespace inconsistency
– Should we have FuGE ‘controlled’ namespaces?
  • The OBO Foundry is considering doing it for ontologies
– ACTION (Andy) to ask the list whether this is desirable
  • Namespaces are a critical issue for XSL processing, not so much for other parsing methods
  • This is probably a recommendation for those who wish to use XSLT for presentation purposes; a dedicated page will be set up on the ISA-TAB website to list such XSLT recommendations, ACTION (Philippe)
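To illustrate why namespace consistency matters for XSLT-style processing, here is a minimal Python sketch. The namespace URIs, the element names and the `protocol_names` helper are hypothetical and not taken from FuGE. The only point is that a namespace-qualified path written against one namespace finds nothing in a document that declares another.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: same structure, different namespace URI.
DOC_CORE = '<FuGE xmlns="http://example.org/fuge/1.0"><Protocol name="p1"/></FuGE>'
DOC_EXT = '<FuGE xmlns="http://example.org/fuge/ext"><Protocol name="p1"/></FuGE>'

# The path below is bound to one namespace only, as an XSLT template would be.
NS = {"fuge": "http://example.org/fuge/1.0"}


def protocol_names(xml_text):
    """Return the name of each Protocol child found under the bound namespace."""
    root = ET.fromstring(xml_text)
    return [p.get("name") for p in root.findall("fuge:Protocol", NS)]


core_result = protocol_names(DOC_CORE)  # matches: namespace agrees
ext_result = protocol_names(DOC_EXT)    # no match: namespace differs
```

Parsers that ignore namespaces or match on local names would not be affected in the same way, which fits the remark that this is mostly an issue for XSL processing.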

Annotation overloading
– Descriptions are used as term gathering fields
– We could recommend on the fly ‘term creation’ (collection of terms as supplied by users)?
– ACTION (Andy, Ally) to add recommendations on the FuGE wiki (explain the use of FuGE)
  • Paper soon out on recommendations to extend FuGE

Name attribute optionality
- When this is missing, XSLT uses the ID, giving less ‘human readable’ transformations
- We could recommend that name is used when readability is required/preferred; ACTION (Philippe) to modify XSL templates

Way to categorize assays is not in FuGE
- How to code ‘technology’ and ‘endpoint’ to categorize the assay (InvestigationComponent)?
- It can be done implicitly, but it would be useful to have these as explicit objects
- However, as there will not be a FuGE v1.1, ‘workarounds’ for any issues or needs will be done via recommendations; ACTION (Andy) to add to the FuGE wiki
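The name fallback noted under ‘Name attribute optionality’ can be sketched in a few lines of Python. This is an illustration only, not the XSL templates under discussion. The attribute names `name` and `identifier`, the element name and the `display_label` helper are assumptions.

```python
import xml.etree.ElementTree as ET


def display_label(element):
    """Prefer the optional name attribute; fall back to the identifier."""
    name = element.get("name")
    if name and name.strip():
        return name.strip()
    # Assumed attribute name for the ID; adjust to the actual schema.
    return element.get("identifier", "")


# Hypothetical elements, one with a name and one without.
named = ET.fromstring('<Material identifier="urn:example:m1" name="Liver sample"/>')
unnamed = ET.fromstring('<Material identifier="urn:example:m2"/>')

named_label = display_label(named)
unnamed_label = display_label(unnamed)
```

The unnamed element ends up labelled with its raw identifier, which is the less readable output the recommendation is meant to avoid.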

Reagents info are on ISA-TAB
– Flow cytometry examples have more depth/granularity, e.g. all the reagents are listed; they have coded it via Material
  - Even if FuGE recommends to do it via structured Protocol (see Gel-ML)
– ACTION (Andy, Ally) to point them to design patterns on the FuGE wiki
– ISA-TAB can (somehow) deal with it via the ‘Protocol Component’ field, just added


More FuGE ML files are needed to test the current scripts
- ACTION (Ally) to give more examples to Philippe from Symba
- ACTION (Philippe) to send script to Ally
- ACTION (Philippe) to set up an XSLT page on ISA-TAB to post all the scripts

Then the scripts will be tested with FuGE extensions, e.g. GelML
- More examples to test, to evaluate and finalize the scripts
- ACTION (Frank, Philippe) to collaboratively finalize the scripts for GelML

Final comment on scope of ISA-TAB in relation to FuGE
- FuGE or other XMLs are more granular/expressive
- We have to accept the fact that when transforming into ISA-TAB we will/may lose/compress some of the info

XSLT scripts’ library next steps

To be done by AE team

MAGE-TAB to ISA-TAB converter

Tuesday, 17th

Reviewing the ISA-TAB

Investigation File – Changes and decisions (1)

Add ‘Investigation PubMed ID’ and ‘Investigation Publication DOI’ to Investigation section
- Only for papers describing across Studies

Studies section needs to be singular (Study)

Comment point: Study header is sufficient to ‘separate’ the sections, no need to have ‘start/end repeatable block’

If developers want to add a ‘comment’ then this would be ‘#this is a comment thingy’.
- Comment must have # as the first char
- But in Study/Assay by adding a ‘column comment’ (see Table 5 in spec v0.3) for the users

Create a new section ‘Study Publications’ where we group publication’s attributes, moving the id, description and date under ‘Study Section’

Create a new section ‘Study Design Descriptors’
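The comment rule above (# as the first character of the line) could be handled by a reader along these lines. This is a minimal sketch, not part of any ISA-TAB tool. The `strip_comment_lines` helper and the field names and values in the sample are made up for illustration.

```python
def strip_comment_lines(lines):
    """Drop lines whose very first character is '#'; keep everything else."""
    return [line for line in lines if not line.startswith("#")]


# Hypothetical Investigation file content (tab-delimited label/value pairs).
sample_lines = [
    "#this is a comment thingy",
    "Study Title\tExample study",
    "Study Description\tValue containing a # character",
]

kept_lines = strip_comment_lines(sample_lines)
```

Only a leading # marks a comment here, so a # inside a value is left untouched.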


All field names are case sensitive
- Edit: every field must have the first letter in upper case
- Section headers go all upper case
- To allow easy visualization when ‘imported’ in a spreadsheet

File will be interpreted as Unicode

Any subsections within a repeatable block (Study) must remain within the block
- But the order of the subsections within the block can vary

Use the triplet (type, accession, source ref) consistently if an ontology/CV is used; if not, Name and Type are entered as free text (add example in the spec)

Add ‘Protocol Parameter Name’, followed by ‘Protocol Parameter Type’, ‘Protocol Parameter Type Term Accession’ and ‘Protocol Parameter Type Term Source REF’
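The case rules listed above lend themselves to a simple check. The sketch below is one possible reading, assuming that ‘first letter upper case’ applies to the first character of the field name and that a section header must contain no lower-case letters. The function names are hypothetical.

```python
def field_name_ok(name):
    """True if the field name starts with an upper-case letter (assumed reading)."""
    return bool(name) and name[0].isupper()


def section_header_ok(header):
    """True if the header has at least one letter and no lower-case letters."""
    return any(c.isalpha() for c in header) and header == header.upper()


checks = {
    "Protocol Parameter Name": field_name_ok("Protocol Parameter Name"),
    "protocol parameter name": field_name_ok("protocol parameter name"),
    "STUDY": section_header_ok("STUDY"),
    "Study": section_header_ok("Study"),
}
```

Because field names are case sensitive, a check of this kind would reject a mis-cased name instead of silently normalising it.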


Correct to allow for multiple values, for ‘Protocol Parameter Type/Term/Source’ and ‘Study Design Type/Term/Source’ triplets

Add ‘Protocol URI’ and ‘Protocol Version’ fields in the ‘Study Protocol’ subsection
- The pointer to external file(s) allows users to provide these in the format they wish
- URI should be resolvable
- Ultimately these requirements are up to the implementers; similarly whether to make e.g. other Protocol fields mandatory

Remove ‘Protocol Component Parameter, Instrument Component, Software’ triplets and ‘Processing’ fields

Add ‘Protocol Component Name’, ‘Protocol Component Types’, ‘Protocol Component Types Term Accession’, ‘Protocol Component Types Term Source REF’
- Used for listing e.g. instruments, software, reagents, operator
- Semicolon separator; ACTION (Marco) to provide examples of options

Clean up all the field names and make them ‘unique’ by prefixing them with the name of the section, e.g. ‘Investigation PubMed ID’ vs ‘Study PubMed ID’
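As a rough illustration of the semicolon separator for the ‘Protocol Component’ fields, the sketch below splits three parallel field values and lines them up as (type, accession, source ref) triplets. The helper names and all values are invented. How the spec will handle lists of unequal length is not decided here, so padding with empty strings is purely an assumption.

```python
from itertools import zip_longest


def split_multi(value):
    """Split a semicolon-separated field value into trimmed parts."""
    return [part.strip() for part in value.split(";")] if value else []


def component_triplets(types, accessions, source_refs):
    """Align the three fields position by position, padding with ''."""
    return list(
        zip_longest(
            split_multi(types),
            split_multi(accessions),
            split_multi(source_refs),
            fillvalue="",
        )
    )


# Hypothetical field values for one protocol.
triplets = component_triplets("instrument;software", "ACC:1;ACC:2", "SRC;SRC")
free_text = component_triplets("operator", "", "")
```

The second call shows the free-text case, where no ontology/CV is used and the accession and source ref stay empty.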

Study/Assay File – Recommendations

The table represents a graph and each edge needs to appear at least once; nodes do not need to be repeated, e.g. Microarray (technology) / Gene expression (endpoint) tab
- Document how to represent the case when 2 different analysis protocols are applied to the same set of data files
- In this case we follow MAGE-TAB by repeating vertically the data file names (only, no need to repeat the previous columns) followed by a new analysis protocol and output data file names

Factor Values can be referenced in both Study and Assay tabs

But the same value cannot be in both tabs; examples to be added
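The ‘table as graph’ recommendation above can be sketched as follows: each pair of adjacent non-empty cells in a row is read as one edge, and edges are collected in a set so that repeats do not matter. The column names, file names and the `rows_to_edges` helper are hypothetical, and treating every column as a node is a simplification made for this sketch.

```python
def rows_to_edges(header, rows):
    """Collect edges between adjacent non-empty cells of each row."""
    edges = set()
    for row in rows:
        nodes = [(col, val) for col, val in zip(header, row) if val]
        edges.update(zip(nodes, nodes[1:]))
    return edges


header = ["Sample Name", "Data File", "Analysis Protocol", "Output Data File"]
rows = [
    ["s1", "raw1.txt", "normalisation", "norm1.txt"],
    # Second analysis of the same data file: earlier columns left blank.
    ["", "raw1.txt", "clustering", "clust1.txt"],
]

edges = rows_to_edges(header, rows)
sample_edge = (("Sample Name", "s1"), ("Data File", "raw1.txt"))
```

The sample-to-data-file edge appears in the first row only, yet the second analysis protocol still attaches to the same data file node.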

Tuesday, 17th

Tools and implementations

Scripts and tools’ plans

From ISA-TAB to FuGE ML
- To be done (Phil, Ally wants this ;-)
- Map the ISAcreator Java model to FuGE general elements
- Ally to help checking/validating mapping

From FuGE ML to ISA-TAB
- Work in progress, XSLT under development
- Philippe, Frank, Ally, Nigel and Andy

ISA-TAB creation
- ISAcreator (will be open source)
- Other tools from participating systems….

ISA-TAB validation
- Common, minimum validation rules/scripts to be defined/developed (e.g. structure, case sensitive)
- Use part of the ISAcreator configuration as library
-> Google doc with list of basic rules (to be identified when creating the v1 spec)
-> The ISAcreator config code will be stripped down to the basic rules and posted on the ISA-TAB sf site (SVN)

ISA-TAB and MAGE-TAB
- Helen and Susanna to talk to MGED, ref to NIH grant

Wednesday, 18th

Next steps and publication plans

Release plans

Release candidate 1, ISA-TAB v1
- Philippe, Marco to edit/add all the agreed changes in the spec
-> done by June 27th
- Dave, Ally, Kieran to check and review
-> done by July 18th
- All to read and comment/suggest
-> wiki pages will be set up on ISA-TAB site to facilitate discussion
-> all comments received by end of August
- Philippe to fix the current ISA-TAB examples to reflect new spec
-> Release candidate 1, ISA-TAB v1 out by mid Sept

This version will include details on fields in the Investigation file and the list of fields allowed in the Study and Assay files
- The specific Assay files defined by the participating communities will be listed and new ones can be added, without having to release new versions

Pending issue

Reference system for SEND and CDISC (SDTM)
- Take this discussion to the ISA-TAB list with interested parties, Michael and Steve in particular
- Subject ID in SDTM should be the same as Source Name in ISA-TAB (or add another subject ID column?); then add the file as external
- Each SDTM file has a Study ID, Domain ID, Subject ID (2 types of these, probably we can use the UsubjID) and Idvar (column) and Idvarval (column value)

Publication and next workshop

Publication – suggested content
- Rationale and use case for ISA-TAB
- History from MAGE-TAB to ISA-TAB
- Present it as a format (and interface to a format), not a ‘standard’
-> Describe scripts making it ‘interoperable’ with other formats
- Examples of implementations to date
-> Tools/systems that have output/input in this format
-> Also (simply) more real examples from communities posted on this site
- Start writing ~end of this year, to submit ~early next year; journal to be decided later

Next workshop would be a users meeting (in 2009)
- To fix minor issues, recommendations, ambiguities, sharing development approaches, components etc….
