High-throughput Biological Data The data deluge
![Page 1: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/1.jpg)
High-throughput Biological Data: The data deluge
• Hidden in these data is information that reflects
– existence, organization, activity, functionality … of biological machineries at different levels in living organisms
Using this information effectively will prove essential for Integrative Bioinformatics
![Page 2: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/2.jpg)
Data Issues …
• Data collection: getting the data
• Data representation: data standards, data normalisation …
• Data organisation and storage: database issues …
• Data analysis and data mining: discovering “knowledge”, patterns/signals, from data, establishing associations among data patterns
• Data utilisation and application: from data patterns/signals to models for bio-machineries
• Data visualization: viewing complex data …
• Data transmission: data collection, retrieval, …
• …
![Page 3: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/3.jpg)
Bio-Data Analysis and Data Mining
• Existing/emerging bio-data analysis and mining tools for
– DNA sequence assembly
– Genetic map construction
– Sequence comparison and database searching
– Gene finding
– ….
– Gene expression data analysis
– Phylogenetic tree analysis, e.g. to infer horizontally-transferred genes
– Mass spec. data analysis for protein complex characterization
– ……
• Current mode of work:
Often enough: developing ad hoc tools for each individual application
![Page 4: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/4.jpg)
Bio-Data Analysis and Data Mining
• As the amount and types of data and their cross connections increase rapidly
• the number of analysis tools needed will go up “exponentially”
– blast, blastp, blastx, blastn, … from BLAST family of tools
– gene finding tools for human, mouse, fly, rice, cyanobacteria, …
– tools for finding various signals in genomic sequences, protein-binding sites, splice junction sites, translation start sites, …
![Page 5: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/5.jpg)
Bio-Data Analysis and Data Mining
Many of these data analysis problems are fundamentally the same problem(s) and can be solved using the same set of tools: e.g. clustering or optimal segmentation by Dynamic Programming
Developing ad hoc tools for each application (by each group of individual researchers) may soon become inadequate as bio-data production capabilities ramp up further
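As one illustration of how a single generic tool can serve several applications, here is a minimal sketch of optimal segmentation by dynamic programming. It assumes the data are a list of numbers (for example, a signal measured along a sequence) and that "optimal" means the smallest total squared deviation from each segment's mean. That cost function, and the name `segment`, are assumptions made for this sketch and do not come from the slides.

```python
from typing import List, Tuple


def segment(values: List[float], k: int) -> Tuple[float, List[int]]:
    """Split `values` into k contiguous segments with minimal total squared error.

    Returns (cost, boundaries), where boundaries are the start indices of
    segments 2..k (an empty list when k == 1).
    """
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")

    # Prefix sums of values and squared values give O(1) segment costs.
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i: int, j: int) -> float:
        """Squared deviation of values[i:j] from its own mean (requires j > i)."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # best[m][j]: minimal cost of splitting values[:j] into m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    # back[m][j]: start index of the last segment in that optimal split.
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                c = best[m - 1][i] + cost(i, j)
                if c < best[m][j]:
                    best[m][j] = c
                    back[m][j] = i

    # Trace back the segment start indices from the last segment to the first.
    starts = []
    j = n
    for m in range(k, 0, -1):
        j = back[m][j]
        starts.append(j)
    starts.reverse()
    return best[k][n], starts[1:]  # drop the leading 0
```

The same recurrence works with a different `cost` function (for instance a likelihood-based score over nucleotide composition), which is what makes the approach reusable across applications. The triple loop is O(k·n²), so this version only suits short inputs.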
![Page 6: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/6.jpg)
Bio-data Analysis, Data Mining and Integrative Bioinformatics
To have analysis capabilities covering a wide range of problems, we need to discover the common fundamental structures of these problems;
HOWEVER in biology one size does NOT fit all…
The goal is the development of a data analysis infrastructure in support of Genomics and beyond
![Page 7: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/7.jpg)
Algorithms in bioinformatics
• string algorithms
• dynamic programming
• machine learning (Neural Networks, k-Nearest Neighbour, Support Vector Machines, Genetic Algorithms, ..)
• Markov chain models
• hidden Markov models
• Markov Chain Monte Carlo (MCMC) algorithms
• stochastic context free grammars
• EM algorithms
• Gibbs sampling
• clustering
• tree algorithms
• text analysis
• hybrid/combinatorial techniques and more…
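To make one item from the list above concrete, here is a small sketch of a first-order Markov chain model over DNA. It estimates transition probabilities from example sequences and scores a new sequence by a log-odds ratio between two models. The training sequences are made-up toy data, and the function names and the pseudocount of 1 are choices made for this sketch only.

```python
import math
from typing import Dict, Iterable

ALPHABET = "ACGT"
Model = Dict[str, Dict[str, float]]


def train_markov_chain(seqs: Iterable[str], pseudocount: float = 1.0) -> Model:
    """Estimate P(next base | current base) from sequences, with pseudocounts."""
    counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
    for seq in seqs:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    model: Model = {}
    for a in ALPHABET:
        total = sum(counts[a].values())
        model[a] = {b: counts[a][b] / total for b in ALPHABET}
    return model


def log_likelihood(seq: str, model: Model) -> float:
    """Log-probability of the transitions in `seq` (the first base is not scored)."""
    return sum(math.log(model[a][b]) for a, b in zip(seq, seq[1:]))


def log_odds(seq: str, model_a: Model, model_b: Model) -> float:
    """Positive when `seq` is better explained by model_a than by model_b."""
    return log_likelihood(seq, model_a) - log_likelihood(seq, model_b)


# Toy training data, invented for illustration only.
GC_RICH = ["CGCGCGGCGC", "GCGCGCCGCG"]
AT_RICH = ["ATATTAATAT", "TATAATTATA"]

gc_model = train_markov_chain(GC_RICH)
at_model = train_markov_chain(AT_RICH)
```

Calling `log_odds("CGCGCG", gc_model, at_model)` gives a positive score and `log_odds("ATATAT", gc_model, at_model)` a negative one. The pseudocount keeps unseen transitions from having zero probability, which would otherwise make the logarithm undefined.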
![Page 8: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/8.jpg)
Sequence analysis and homology searching
![Page 9: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/9.jpg)
Finding genes and regulatory elements
![Page 10: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/10.jpg)
Expression data
![Page 11: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/11.jpg)
Functional genomics
• Monte Carlo
![Page 12: High-throughput Biological Data The data deluge](https://reader036.fdocuments.us/reader036/viewer/2022083006/56813e42550346895da82946/html5/thumbnails/12.jpg)
Protein translation