Data Curation and Management activities within the UCT Computational Biology Group Dr Nicky Mulder.
Data Curation and Management activities within the UCT
Computational Biology Group
Dr Nicky Mulder
Outline
- Activities at UCT:
  - High-throughput biology data
  - Sequence annotation
  - DAS annotation development
- Issues we face
- A note on standards and ontologies
High-throughput biology data
- Close ties with CPGR
- Microarray data storage: BASE
- Proteomics data:
  - Annotation: pipeline required
  - Storage: LIMS required
BASE
- BioArray Software Environment
- Open source database for storage of array-type data
- Manages raw data (images) and annotations
- Has limited LIMS options
- Can include specifications for MIAME compliance
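As a rough illustration of what a MIAME compliance check could look like, the sketch below tests whether an experiment record covers the six MIAME elements. The field names and the example record are hypothetical; they are not BASE's actual schema or interface.

```python
# The six MIAME elements, keyed by hypothetical field names (not BASE's schema).
MIAME_ELEMENTS = {
    "raw_data": "raw data for each hybridisation",
    "processed_data": "final processed (normalised) data",
    "sample_annotation": "sample annotation and experimental factors",
    "experimental_design": "experimental design and sample-data relationships",
    "array_annotation": "annotation of the array design",
    "protocols": "laboratory and data-processing protocols",
}


def missing_miame_elements(record):
    """Return the MIAME field names that are absent or empty in `record`."""
    return [field for field in MIAME_ELEMENTS if not record.get(field)]


# Hypothetical experiment record: one element is empty, one is missing.
experiment = {
    "raw_data": ["scan_01.tif", "scan_02.tif"],
    "processed_data": "normalised_intensities.tsv",
    "sample_annotation": {"organism": "Homo sapiens", "treatment": "control"},
    "experimental_design": "",
    "array_annotation": "array_design_v1.txt",
}

for field in missing_miame_elements(experiment):
    print(f"Missing: {field} ({MIAME_ELEMENTS[field]})")
```

A check of this kind could run before data are submitted to a public repository, so that gaps are caught while the researcher is still available to fill them.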
BASE Sample Information
BASE experimental info
Proteomics Data
- Still in progress
- Peptide identification programs
- Additional cross-linking from results to public database annotations
- Storage of experimental data and resulting identifications
- Include MIAPE compliance
- Linking to genomics data: standards required
Sequence Annotation 1
- Paeano pipeline for annotation of cDNAs from non-model organisms
- Uses collection of publicly available and custom software
- Results are stored under projects
- Links provided to array data in BASE
Sequence Annotation 2
- Glossina (tsetse) EST annotation project
- Held annotation jamboree at UWC
- Worked with Twiki tool developed by JBIRC
- Data to be submitted to public databases
Twiki system
DAS Annotation Tool
- Distributed Annotation System: allows viewing of annotation from different sources
- Can overlay your own data/annotation
- Facilitates information sharing without issue of updates
- Repositories distributed in different geographical locations
- Extension of DASTy2: developed at NBN
- Development of DAS annotation tool underway
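The overlay idea can be sketched in a few lines: features for the same sequence are fetched from several DAS sources and merged into one view. The example below is a minimal sketch only. The two XML documents are made-up, simplified responses modelled on the DAS 1.x `DASGFF` layout, and the server name, source names and helper functions are hypothetical rather than part of DASTy2 or the tool described here.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up responses modelled on the DAS 1.x DASGFF layout.
REFERENCE_SOURCE = """
<DASGFF><GFF><SEGMENT id="SEQ1" start="1" stop="300">
  <FEATURE id="f1" label="Signal peptide">
    <TYPE id="signal">signal peptide</TYPE><START>1</START><END>22</END>
  </FEATURE>
  <FEATURE id="f2" label="Kinase domain">
    <TYPE id="domain">domain</TYPE><START>80</START><END>250</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>
"""

LOCAL_SOURCE = """
<DASGFF><GFF><SEGMENT id="SEQ1" start="1" stop="300">
  <FEATURE id="u1" label="Curator note: active site">
    <TYPE id="site">site</TYPE><START>120</START><END>120</END>
  </FEATURE>
</SEGMENT></GFF></DASGFF>
"""


def features_url(server, source, segment):
    """Build a DAS 1.x style features request URL (server is hypothetical)."""
    return f"{server.rstrip('/')}/das/{source}/features?segment={segment}"


def parse_features(xml_text, source_name):
    """Extract positional features from a DASGFF document as plain dicts."""
    root = ET.fromstring(xml_text.strip())
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.findall("FEATURE"):
            features.append({
                "source": source_name,
                "segment": segment.get("id"),
                "label": feature.get("label"),
                "type": feature.find("TYPE").get("id"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


def overlay(*feature_lists):
    """Merge features from several sources, ordered by position."""
    merged = [f for features in feature_lists for f in features]
    return sorted(merged, key=lambda f: (f["start"], f["end"]))


print(features_url("http://das.example.org", "reference", "SEQ1:1,300"))
tracks = overlay(
    parse_features(REFERENCE_SOURCE, "reference"),
    parse_features(LOCAL_SOURCE, "local"),
)
for f in tracks:
    print(f'{f["start"]:>4}-{f["end"]:<4} {f["source"]:<10} {f["label"]}')
```

Because each source is queried at view time, the local annotation never has to be copied into the reference repository, which is what makes sharing possible without update problems.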
DASTy
Links to other DAS viewers
DAS annotation tool
- Collaborative visual annotation tool
  - Annotation
  - Comments
  - Sequences
  - Features
  - Non positional features
  - Methodology of trust on a collaborative annotation process
Data curation and management issues
- HTB software licenses are expensive
- Open Source not always maintained
- Ensuring regular backups (data size)
- Keeping data up to date
- Researchers leave data after project: not updated to new versions
- Privacy: researchers share data only with collaborators, patient data is private
- Sharing and linking data
Standards and ontologies
- Use a controlled vocabulary (controlled list of terms) or ontology (set of terms with relations)
- Enables easy data retrieval and sharing
- Easy comparison of results from different labs
- Compatibility with other labs/databases world-wide
- Ease of uploading data into public databases
- Unambiguous report of research
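The difference between the two can be shown with a small sketch: a controlled vocabulary only answers "is this term allowed?", while an ontology can also answer "is this term a kind of that one?". All terms, sample names and function names below are hypothetical examples, not taken from any published vocabulary.

```python
# Controlled vocabulary: a flat list of allowed terms (hypothetical examples).
TISSUE_VOCABULARY = {"liver", "kidney", "whole blood"}

# Ontology: terms plus relations; each term maps to its is_a parents.
IS_A = {
    "hepatocyte": ["liver cell"],
    "liver cell": ["cell"],
    "erythrocyte": ["blood cell"],
    "blood cell": ["cell"],
    "cell": [],
}


def is_valid_term(term, vocabulary):
    """Check a free-text entry against the controlled vocabulary."""
    return term.strip().lower() in vocabulary


def ancestors(term, is_a):
    """Collect every term reachable from `term` by following is_a links."""
    found = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(is_a.get(parent, []))
    return found


def annotated_with(samples, query, is_a):
    """Return samples annotated with `query` or with any more specific term."""
    return [
        name for name, term in samples.items()
        if term == query or query in ancestors(term, is_a)
    ]


# Two labs annotate at different levels of detail, yet stay comparable.
samples = {"lab_A_s1": "hepatocyte", "lab_B_s7": "erythrocyte", "lab_B_s9": "liver cell"}
print(is_valid_term("Liver", TISSUE_VOCABULARY))
print(is_valid_term("hepatic tissue", TISSUE_VOCABULARY))
print(annotated_with(samples, "liver cell", IS_A))
```

The relations are what allow results annotated at different levels of detail in different labs to be retrieved with a single query.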
Open Biomedical Ontologies
- Central location for accessing well-structured controlled vocabularies and ontologies for use in the biological and medical sciences
- Provides simple format for ontologies
- Scope includes anatomy, phenotype, development, disease, “omics”, experiment, etc.
- http://obo.sourceforge.net
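To give a feel for how simple the OBO flat-file format is, the sketch below reads `[Term]` stanzas from a short made-up file. The identifiers with the `EX:` prefix are hypothetical, and the parser is a minimal illustration that handles only the `id`, `name` and `is_a` tags; it is not a complete OBO reader.

```python
# Made-up ontology fragment written in the OBO flat-file style.
SAMPLE_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: cell

[Term]
id: EX:0000002
name: liver cell
is_a: EX:0000001 ! cell

[Term]
id: EX:0000003
name: hepatocyte
is_a: EX:0000002 ! liver cell
"""


def parse_obo_terms(text):
    """Return {term id: {"name": ..., "is_a": [...]}} for each [Term] stanza."""
    terms = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are collected.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if not line or current is None or ":" not in line:
            continue  # blank line, header line, or ignored stanza
        tag, value = line.split(":", 1)
        value = value.split("!", 1)[0].strip()  # drop trailing "! comment"
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


for term_id, term in parse_obo_terms(SAMPLE_OBO).items():
    print(term_id, term["name"], term["is_a"])
```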
Data exchange standards
- Microarray standards: MIAME and MAGE
- Proteomics Standards Initiative (PSI)
- Systems Biology Markup Language (SBML): computer-readable format for representing models of networks
- Biological Pathways Exchange (BioPAX): format for representing pathways
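What "computer-readable" means for a network model can be seen in a short sketch that reads the reactions out of an SBML-style document. The model below is a made-up, heavily simplified fragment (for instance, it omits the compartment list that a valid SBML file would need), and the function name is hypothetical. For real models a dedicated library such as libSBML would be the usual choice.

```python
import xml.etree.ElementTree as ET

# Made-up, simplified fragment in the style of SBML Level 2.
SAMPLE_SBML = """
<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy_pathway">
    <listOfSpecies>
      <species id="glucose" compartment="cell"/>
      <species id="g6p" compartment="cell"/>
      <species id="f6p" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="hexokinase_step">
        <listOfReactants><speciesReference species="glucose"/></listOfReactants>
        <listOfProducts><speciesReference species="g6p"/></listOfProducts>
      </reaction>
      <reaction id="isomerase_step">
        <listOfReactants><speciesReference species="g6p"/></listOfReactants>
        <listOfProducts><speciesReference species="f6p"/></listOfProducts>
      </reaction>
    </listOfReactions>
  </model>
</sbml>
"""

NS = {"sbml": "http://www.sbml.org/sbml/level2"}


def reaction_edges(sbml_text):
    """Return (reactant, reaction id, product) triples for every reaction."""
    root = ET.fromstring(sbml_text.strip())
    edges = []
    for reaction in root.iterfind(".//sbml:reaction", NS):
        reactants = [
            ref.get("species")
            for ref in reaction.iterfind("sbml:listOfReactants/sbml:speciesReference", NS)
        ]
        products = [
            ref.get("species")
            for ref in reaction.iterfind("sbml:listOfProducts/sbml:speciesReference", NS)
        ]
        for reactant in reactants:
            for product in products:
                edges.append((reactant, reaction.get("id"), product))
    return edges


for reactant, reaction_id, product in reaction_edges(SAMPLE_SBML):
    print(f"{reactant} --[{reaction_id}]--> {product}")
```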
Conclusions
- Some tools in place for curation and management of different data types
- Need better education of researchers to encourage this
- Ontologies and standards are important in digital data curation and management; need to encourage compliance with international standards
Acknowledgements
- Funding:
- Collaborations:
  - CPGR
  - Researchers at UCT