Biological Databases - Computational Bioscience Program at ...compbio.ucdenver.edu/77112015/Dowell...

Biological Databases

What will we discuss today?
•  Why are databases the backbone of bioinformatics?
•  The basic structure of a database
•  Data versus annotation
•  Types of DBs: GenBank, PubMed and NCBI
•  Query strategies
•  Quality of data issues

http://techcrunch.com/2012/11/25/the-big-data-fallacy-data-%E2%89%A0-information-%E2%89%A0-insights/

Biologists Collect Lots of Data

•  Hundreds of thousands of species
•  Millions of articles in scientific journals
•  Genetic information:
   –  gene names (thousands)
   –  phenotype of mutants (infinite?)
   –  location of genes/mutations on chromosomes
   –  linkage (distances between genes)
•  High-throughput technology
   –  Rapid, inexpensive DNA sequencing
   –  Many methods of collecting genotype data
      •  Assays for specific polymorphisms
      •  Genome-wide SNP chips
•  Must have data quality assessment prior to analysis

One sequencer => 1-2Tb/week !!

Curated Biological Data DNA, nucleotide sequences

Gene boundaries, topology Gene structure

Introns, exons, ORFs, splicing

Expression data

Mass spectrometry (metabolomics, proteomics)

Post-Translational protein Modification (PTM)

Curated Biological Data Proteins, residue sequences

MCTUYTCUYFSTYRCCTYFSCD Extended sequence information

Secondary structure

Hydrophobicity, motif data

Protein-protein interaction

Curated Biological data 3D Structures, folds

WHAT is a database?
•  A collection of data that needs to be:
   –  Structured
   –  Searchable
   –  Updated (periodically)
   –  Cross-referenced
•  Challenge:
   –  To turn “meaningless” data into useful information that can be accessed and analysed in the best way possible.

For example: HOW would YOU organize all biological sequences so that the biological information is optimally accessible?

http://en.wikibooks.org/wiki/Data_Management_in_Bioinformatics

A Spreadsheet can be a Database

•  Columns are Fields
•  Rows are Records
•  Can search for a term within just one field
•  Or combine searches across several fields

SNP ID | SNPSeq ID | Gene | +primer | -primer | Hap A | Hap B | Hap C
D1Mit160_1 | 10.MMHAP67FLD1.seq | lymphocyte antigen 84 | AAGGTAAAAGGCAATCAGCACAGCC | TCAACCTGGAGTCAGAGGCT | C | — | A
M-05554_1 | 12.MMHAP31FLD3.seq | procollagen, type III, alpha | TGCGCAGAAGCTGAAGTCTA | TTTTGAGGTGTTAATGGTTCT | C | — | A
M-05554_2 | X60184 | complement component factor i | ACTTCCAGCCCTGGCTCT | ATATGCCACCAAGAAGCA | A | C | —
M-09947_3 | AF067835 | caspase 8 | TCACAGAGGGAAACATGAAG | CTCCACATTGAACCAAAGCA | | |
M-11415_1 | U02023 | insulin-like growth factor binding protein | GGGAAAAGCCTGAAAGAAGC | AGCTGAAACCGGACATCAAT | T | G | —
D1Mit284_3 | J05234 | nucleolin | TGTTGGAACCGACTTCTTCA | AAGAGTCAAAGAATTTATGGAATGA | | |
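The idea of fields, records, and single-field versus combined searches can be sketched in a few lines of Python. This is only an illustration: the field names (`snp_id`, `gene`, `hap_a`, ...) and the `search` helper are made up for this sketch, three rows are copied from the example table, and the dash cells are stored as "-".

```python
# Each dict is one record (row); each key is one field (column).
# Values come from three rows of the example table; "-" stands for a dash cell,
# "" for a cell that is empty in the table.
records = [
    {"snp_id": "D1Mit160_1", "gene": "lymphocyte antigen 84",
     "hap_a": "C", "hap_b": "-", "hap_c": "A"},
    {"snp_id": "M-05554_2", "gene": "complement component factor i",
     "hap_a": "A", "hap_b": "C", "hap_c": "-"},
    {"snp_id": "M-09947_3", "gene": "caspase 8",
     "hap_a": "", "hap_b": "", "hap_c": ""},
]


def search(rows, **criteria):
    """Return the rows in which every named field contains its search term."""
    return [
        row for row in rows
        if all(term.lower() in row[field].lower()
               for field, term in criteria.items())
    ]


# Search within just one field ...
one_field = search(records, gene="caspase")
# ... or combine searches across several fields (implicit AND).
combined = search(records, gene="antigen", hap_a="C")

print([row["snp_id"] for row in one_field])
print([row["snp_id"] for row in combined])
```

A spreadsheet filter does the same thing interactively; the point is only that a search is always tied to one or more named fields.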

•  Internal organization
   –  Controls speed and flexibility
•  A set of programs that
   –  Store
   –  Extract
   –  Modify

Database

Store Extract Modify

USER(S)

DBMS organisation types
•  Flat file databases (flat DBMS)
   –  Simple, restrictive, table
•  Hierarchical databases (hierarchical DBMS)
   –  Simple, restrictive, tables
•  Relational databases (RDBMS)
   –  Complex, versatile, tables
•  Object-oriented databases (ODBMS)
   –  Complex, versatile, objects
•  Data warehouses and distributed databases

Information system

Query system

Storage System

Why are flat files still used?

Structured Data

•  Repository of information
•  Managed and accessed differently
•  Flat-file (text)
•  Relational (key)
•  “Talk” to each other

Relational databases

•  Data is stored in multiple related tables
•  Data relationships across tables can be either many-to-one or many-to-many
•  A few rules allow the database to be viewed in many ways

Relational Databases

•  What have we achieved?
   –  No repeating information
   –  Less storage space
   –  Better representation of reality
   –  Easy modification/management
   –  Easy usage of any combination of records
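A minimal sketch of "no repeating information", using Python's built-in sqlite3 module. The two-table schema, the table and column names, and the SNP identifiers (`toy_snp_1` ...) are assumptions invented for this illustration; only the gene names come from the spreadsheet example.

```python
import sqlite3

# In-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gene (
    gene_id INTEGER PRIMARY KEY,
    name    TEXT NOT NULL
);
CREATE TABLE snp (
    snp_id  TEXT PRIMARY KEY,
    gene_id INTEGER NOT NULL REFERENCES gene(gene_id)  -- many SNPs to one gene
);
""")
conn.executemany("INSERT INTO gene VALUES (?, ?)",
                 [(1, "caspase 8"), (2, "nucleolin")])
conn.executemany("INSERT INTO snp VALUES (?, ?)",
                 [("toy_snp_1", 1), ("toy_snp_2", 1), ("toy_snp_3", 2)])


def snps_with_gene(connection):
    """Join the two tables so each SNP is shown with its gene name."""
    return connection.execute(
        "SELECT snp.snp_id, gene.name "
        "FROM snp JOIN gene ON gene.gene_id = snp.gene_id "
        "ORDER BY snp.snp_id"
    ).fetchall()


before = snps_with_gene(conn)

# The gene name is stored once, so one UPDATE fixes it for every related SNP.
conn.execute("UPDATE gene SET name = ? WHERE gene_id = ?",
             ("caspase 8 (CASP8)", 1))
after = snps_with_gene(conn)

print(before)
print(after)
```

In a single flat table the gene name would be repeated on every SNP row and would have to be corrected row by row.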

Three reasons to care …

•  Database proliferation
   –  Dozens to hundreds at the moment
•  More and more scientific discoveries result from inter-database analysis and mining
•  Rising complexity of required data combinations
   –  E.g. translational medicine: “from bench to bedside” (genomic data vs. clinical data)

Standard Data Formats
•  DNA sequence = ACGT, but what about gaps, unknown letters, etc.
   –  How many letters per line ???
   –  ?? Spaces, numbers, headers, etc.
   –  Store as a string, code as binary numbers, etc.
•  Use a completely different format for proteins?

Need standard formats!!

FASTA Format •  William Pearson (1985)

•  The FASTA format is now universal for all databases and software that handle DNA and protein sequences

>URO1 uro1.seq Length: 2018 November 9, 2000 11:50 Type: N Check: 3854 ..
CGCAGAAAGAGGAGGCGCTTGCCTTCAGCTTGTGGGAAATCCCGAAGATGGCCAAAGACA
ACTCAACTGTTCGTTGCTTCCAGGGCCTGCTGATTTTTGGAAATGTGATTATTGGTTGTT
GCGGCATTGCCCTGACTGCGGAGTGCATCTTCTTTGTATCTGACCAACACAGCCTCTACC
CACTGCTTGAAGCCACCGACAACGATGACATCTATGGGGCTGCCTGGATCGGCATATTTG
TGGGCATCTGCCTCTTCTGCCTGTCTGTTCTAGGCATTGTAGGCATCATGAAGTCCAGCA
GGAAAATTCTTCTGGCGTATTTCATTCTGATGTTTATAGTATATGCCTTTGAAGTGGCAT
CTTGTATCACAGCAGCAACACAACAAGACTTTTTCACACCCAACCTCTTCCTGAAGCAGA
TGCTAGAGAGGTACCAAAACAACAGCCCTCCAAACAATGATGACCAGTGGAAAAACAATG

One header line: it starts with > and ends with a [return]. All other characters are part of the sequence.

Multi-Sequence FASTA file

>FBpp0074027 type=protein; loc=X:complement(16159413..16159860,16160061..16160497); ID=FBpp0074027; name=CG12507-PA; parent=FBgn0030729,FBtr0074248; dbxref=FlyBase:FBpp0074027,FlyBase_Annotation_IDs:CG12507-PA,GB_protein:AAF48569.1,GB_protein:AAF48569; MD5=123b97d79d04a06c66e12fa665e6d801; release=r5.1; species=Dmel; length=294;
MRCLMPLLLANCIAANPSFEDPDRSLDMEAKDSSVVDTMGMGMGVLDPTQ
PKQMNYQKPPLGYKDYDYYLGSRRMADPYGADNDLSASSAIKIHGEGNLA
SLNRPVSGVAHKPLPWYGDYSGKLLASAPPMYPSRSYDPYIRRYDRYDEQ
YHRNYPQYFEDMYMHRQRFDPYDSYSPRIPQYPEPYVMYPDRYPDAPPLR
DYPKLRRGYIGEPMAPIDSYSSSKYVSSKQSDLSFPVRNERIVYYAHLPE
IVRTPYDSGSPEDRNSAPYKLNKKKIKNIQRPLANNSTTYKMTL
>FBpp0082232 type=protein; loc=3R:complement(9207109..9207225,9207285..9207431); ID=FBpp0082232; name=mRpS21-PA; parent=FBgn0044511,FBtr0082764; dbxref=FlyBase:FBpp0082232,FlyBase_Annotation_IDs:CG32854-PA,GB_protein:AAN13563.1,GB_protein:AAN13563; MD5=dcf91821f75ffab320491d124a0d816c; release=r5.1; species=Dmel; length=87;
MRHVQFLARTVLVQNNNVEEACRLLNRVLGKEELLDQFRRTRFYEKPYQV
RRRINFEKCKAIYNEDMNRKIQFVLRKNRAEPFPGCS
>FBpp0091159 type=protein; loc=2R:complement(2511337..2511531,2511594..2511767,2511824..2511979,2512032..2512082); ID=FBpp0091159; name=CG33919-PA; parent=FBgn0053919,FBtr0091923; dbxref=FlyBase:FBpp0091159,FlyBase_Annotation_IDs:CG33919-PA,GB_protein:AAZ52801.1,GB_protein:AAZ52801; MD5=c91d880b654cd612d7292676f95038c5; release=r5.1; species=Dmel; length=191;
MKLVLVVLLGCCFIGQLTNTQLVYKLKKIECLVNRTRVSNVSCHVKAINW
NLAVVNMDCFMIVPLHNPIIRMQVFTKDYSNQYKPFLVDVKIRICEVIER
RNFIPYGVIMWKLFKRYTNVNHSCPFSGHLIARDGFLDTSLLPPFPQGFY
QVSLVVTDTNSTSTDYVGTMKFFLQAMEHIKSKKTHNLVHN
>FBpp0070770 type=protein; loc=X:join(5584802..5585021,5585925..5586137,5586198..5586342,5586410..5586605); ID=FBpp0070770; name=cv-PA; parent=FBgn0000394,FBtr0070804; dbxref=FlyBase:FBpp0070770,FlyBase_Annotation_IDs:CG12410-PA,GB_protein:AAF46063.1,GB_protein:AAF46063; MD5=0626ee34a518f248bbdda11a211f9b14; release=r5.1; species=Dmel; length=257;
MEIWRSLTVGTIVLLAIVCFYGTVESCNEVVCASIVSKCMLTQSCKCELK
NCSCCKECLKCLGKNYEECCSCVELCPKPNDTRNSLSKKSHVEDFDGVPE
LFNAVATPDEGDSFGYNWNVFTFQVDFDKYLKGPKLEKDGHYFLRTNDKN
LDEAIQERDNIVTVNCTVIYLDQCVSWNKCRTSCQTTGASSTRWFHDGCC
ECVGSTCINYGVNESRCRKCPESKGELGDELDDPMEEEMQDFGESMGPFD
GPVNNNY
…
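Reading a multi-sequence FASTA file needs very little code, because the only rule is "a line starting with > opens a new record". The sketch below is an assumption-laden illustration, not a reference parser: `read_fasta` and `header_fields` are names invented here, and the `key=value;` header layout is specific to FlyBase-style headers like the ones shown, not part of FASTA itself. The toy sequences are shortened.

```python
import io
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines are joined together."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def header_fields(header: str) -> Dict[str, str]:
    """Split a 'ID key=value; key=value;' style header into a dict."""
    identifier, _, rest = header.partition(" ")
    fields = {"id": identifier}
    for item in rest.split(";"):
        key, sep, value = item.strip().partition("=")
        if sep:
            fields[key] = value
    return fields


# Toy input: real identifiers from the example, sequences cut short.
toy = io.StringIO(
    ">FBpp0074027 type=protein; species=Dmel; length=294;\n"
    "MRCLMPLLLA\nNCIAANPSFE\n"
    ">FBpp0082232 type=protein; species=Dmel; length=87;\n"
    "MRHVQFLART\n"
)
parsed = list(read_fasta(toy))
first_fields = header_fields(parsed[0][0])
print(parsed[0][1], first_fields["species"])
```

For real work an established library parser is usually the safer choice; the sketch only shows how little structure the format itself imposes.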

Reformatting Data Files

•  Much of the routine (yet annoying) work of bioinformatics involves messing around with data files to get them into formats that will work with various software
•  Then messing around with the results produced by that software to create a useful summary…

Accession Numbers!! (keys)
•  Databases are designed to be searched by accession numbers (and locus IDs)
•  These are guaranteed to be non-redundant, accurate, and not to change.
•  Searching by gene names and keywords is doomed to frustration and probable failure

Neither scientists nor computers can be trusted to accurately and consistently annotate database entries!!

Accessing database information

•  A request for data from a database is called a query
•  Queries can take three forms:
   –  Choose from a list of parameters
   –  Query by example (QBE)
   –  Structured Query Language (SQL)

Web Query

•  Most databases have a web-based query tool

•  It may be simple…

… or complex

Query Languages
•  The standard
   –  SQL (Structured Query Language), originally called SEQUEL (Structured English QUEry Language)
   –  Developed at IBM in 1974; introduced commercially in 1979 by Oracle Corp.
   –  Standard interactive and programming language for getting information from and updating a database.
   –  RDBMS (SQL), ODBMS (Java, C++, OQL, etc.)

Database Searching
A database can only be searched in ways that it was designed to be searched

Boolean: "AND" and "OR" searches

Bad to search for "human hemoglobin" in a 'Description' field

Much better to search for "Homo sapiens" in 'Organism' AND "HBB" in 'Gene name'
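The difference between a free-text search and a field-restricted Boolean search can be shown with a toy SQL table. Everything in this sketch is hypothetical: the `entry` table, its column names, the `TOY...` accessions and the description strings are invented for illustration and do not reflect how any real sequence database is laid out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE entry ("
    "accession TEXT PRIMARY KEY, organism TEXT, gene_name TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO entry VALUES (?, ?, ?, ?)",
    [
        ("TOY0001", "Homo sapiens", "HBB",
         "Homo sapiens hemoglobin subunit beta, mRNA"),
        ("TOY0002", "Homo sapiens", "HBB",
         "H. sapiens beta-globin gene, complete cds"),
        ("TOY0003", "Mus musculus", "Hbb-b1",
         "Mouse homolog of human hemoglobin beta"),
    ],
)

# Bad: free-text phrase in the description field.
free_text = [row[0] for row in conn.execute(
    "SELECT accession FROM entry WHERE description LIKE ? ORDER BY accession",
    ("%human hemoglobin%",),
)]

# Better: exact values in dedicated fields, combined with AND.
by_field = [row[0] for row in conn.execute(
    "SELECT accession FROM entry "
    "WHERE organism = ? AND gene_name = ? ORDER BY accession",
    ("Homo sapiens", "HBB"),
)]

print("free text:", free_text)
print("by field: ", by_field)
```

In this toy data the free-text query misses both human records (neither description happens to contain the exact phrase) and returns the mouse record instead, while the field query returns exactly the two human HBB entries.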

Strategies

•  Use accession numbers whenever possible
•  Start with broad keywords and narrow the search using more specific terms
•  Try variants of spelling, numbers, etc.
•  Search all relevant databases
•  Be persistent!!

Data versus metadata (annotation)

Primary vs derived data

Heterogeneity in data (Scientific data domains)

Gene Ontology
•  Biology is a messy science
•  Assortment of names, mutants, odd phenotypes
   –  “sonic hedgehog”
•  Gene Ontology (GO)
   –  Molecular function (specific tasks)
   –  Biological process (broad biological goal)
   –  Cellular component (location)

GIGO (garbage in, garbage out): Data Quality Matters

AATDB, AceDb, ACUTS, ADB, AFDB, AGIS, AMSdb, ARR, AsDb, BBDB, BCGD, Beanref, BioImage, BioMagResBank, BIOMDB, BLOCKS, BovGBASE, BOVMAP, BSORF, BTKbase, CANSITE, CarbBank, CARBHYD, CATH, CAZY, CCDC, CD40Lbase, CGAP, ChickGBASE, Colibri, COPE, CottonDB, CSNDB, CUTG, CyanoBase, dbCFC, dbEST, dbSTS, DDBJ, DGP, DictyDb, Dicty_cDB, DIP, DOGS, DOMO, DPD, DPInteract, ECDC, ECGC, ECO2DBASE, EcoCyc, EcoGene, EMBL, EMD db, ENZYME, EPD, EpoDB, ESTHER, FlyBase, FlyView, GCRDB, GDB, GENATLAS, GenBank, GeneCards, Genline, GenLink, GENOTK, GenProtEC, GIFTS, GPCRDB, GRAP, GRBase, gRNAsdb, GRR, GSDB, HAEMB, HAMSTERS, HEART-2DPAGE, HEXAdb, HGMD, HIDB, HIDC, HIVdb, HotMolecBase, HOVERGEN, HPDB, HSC-2DPAGE, ICN, ICTVDB, IL2RGbase, IMGT, Kabat, KDNA, KEGG, Klotho, LGIC, MAD, MaizeDb, MDB, Medline, Mendel, MEROPS, MGDB, MGI, MHCPEP, Micado, MitoDat, MITOMAP, MJDB, MmtDB, Mol-R-Us, MPDB, MRR, MutBase, MycDB, NDB, NRSub, O-GlycBase, OMIA, OMIM, OPD, ORDB, OWL, PAHdb, PatBase, PDB, PDD, Pfam, PhosphoBase, PigBASE, PIR, PKR, PMD, PPDB, PRESAGE, PRINTS, ProDom, Prolysis, PROSITE, PROTOMAP, RatMAP, RDP, REBASE, RGP, SBASE, SCOP, SeqAnalRef, SGD, SGP, SheepMap, Soybase, SPAD, SRNA db, SRPDB, STACK, StyGene, Sub2D, SubtiList, SWISS-2DPAGE, SWISS-3DIMAGE, SWISS-MODEL Repository, SWISS-PROT, TelDB, TGN, tmRDB, TOPS, TRANSFAC, TRR, UniGene, URNADB, V BASE, VDRR, VectorDB, WDCM, WIT, WormPep, YEPD, YPD, YPM, etc. !!!

Some Biological databases …

Some statistics
•  More than 1000 different databases
•  Generally accessible through the web (useful link: www.expasy.ch/alinks.html)
•  Variable size: <100 Kb to >10 Gb
   –  DNA: >10 Gb
   –  Protein: 1 Gb
   –  3D structure: 5 Gb
   –  Other: smaller
•  Update frequency: daily to annually

NAR Database Issue

•  Online collection of biological databases: http://www.oxfordjournals.org/nar/database/c/

GenBank

DDBJ EMBL

Entrez

getentry

NIG CIB EBI

• Submissions • Updates

Public Sequence Databases Same sequence information in all three, but different tools for searching and retrieval

GenBank
•  Contains all DNA and protein sequences described in the scientific literature or collected in publicly funded research
•  Flat file: composed entirely of text
•  Each submitted sequence is a record
•  Has fields for Organism, Date, Author, etc.
•  Unique identifier for each sequence
   –  Locus and Accession #

Growth of Genbank

GenBank Flat File (GBFF)

LOCUS       MUSNGH       1803 bp    mRNA            ROD       29-AUG-1997
DEFINITION  Mouse neuroblastoma and rat glioma hybridoma cell line NG108-15
            cell TA20 mRNA, complete cds.
ACCESSION   D25291
NID         g1850791
KEYWORDS    neurite extension activity; growth arrest; TA20.
SOURCE      Murinae gen. sp. mouse neuroblastma-rat glioma hybridoma
            cell_line:NG108-15 cDNA to mRNA.
  ORGANISM  Murinae gen. sp.
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Chordata;
            Vertebrata; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae;
            Murinae.
REFERENCE   1  (sites)
  AUTHORS   Tohda,C., Nagai,S., Tohda,M. and Nomura,Y.
  TITLE     A novel factor, TA20, involved in neuronal differentiation: cDNA
            cloning and expression
  JOURNAL   Neurosci. Res. 23 (1), 21-27 (1995)
  MEDLINE   96064354
REFERENCE   3  (bases 1 to 1803)
  AUTHORS   Tohda,C.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-1993) to the DDBJ/EMBL/GenBank databases.
            Chihiro Tohda, Toyama Medical and Pharmaceutical University,
            Research Institute for Wakan-yaku, Analytical Research Center for
            Ethnomedicines; 2630 Sugitani, Toyama, Toyama 930-01, Japan
            (E-mail:CHIHIRO@ms.toyama-mpu.ac.jp, Tel:+81-764-34-2281(ex.2841),
            Fax:+81-764-34-5057)
COMMENT     On Feb 26, 1997 this sequence version replaced gi:793764.
FEATURES             Location/Qualifiers
     source          1..1803
                     /organism="Murinae gen. sp."
                     /note="source origin of sequence, either mouse or rat, has
                     not been identified"
                     /db_xref="taxon:39108"
                     /cell_line="NG108-15"
                     /cell_type="mouse neuroblastma-rat glioma hybridoma"
     misc_signal     156..163
                     /note="AP-2 binding site"
     GC_signal       647..655
                     /note="Sp1 binding site"
     TATA_signal     694..701
     gene            748..1311
                     /gene="TA20"
     CDS             748..1311
                     /gene="TA20"
                     /function="neurite extensiion activity and growth arrest
                     effect"
                     /codon_start=1
                     /db_xref="PID:d1005516"
                     /db_xref="PID:g793765"
                     /translation="MMKLWVPSRSLPNSPNHYRSFLSHTLHIRYNNSLFISNTHLSRR
                     KLRVTNPIYTRKRSLNIFYLLIPSCRTRLILWIIYIYRNLKHWSTSTVRSHSHSIYRL
                     RPSMRTNIILRCHSYYKPPISHPIYWNNPSRMNLRGLLSRQSHLDPILRFPLHLTIYY
                     RGPSNRSPPLPPRNRIKQPNRIKLRCR"
     polyA_site      1803
BASE COUNT      507 a    458 c    311 g    527 t
ORIGIN
        1 tcagtttttt tttttttttt tttttttttt tttttttttt tttttttttg ttgattcatg
       61 tccgtttaca tttggtaagt tcacaggcct cagtcaacac aattggactg ctcaggaaat
      121 cctccttggt gaccgcagta tacttggcct atgaacccaa gccacctatg gctaggtagg
      181 agaagctcaa ctgtagggct gactttggaa gagaatgcac atggctgtat cgacatttca
      241 catggtggac ctctggccag agtcagcagg ccgagggttc tcttccgggc tgctccctca
      301 ctgcttgact ctgcgtcagt gcgtccatac tgtgggcgga cgttattgct atttgccttc
      361 cattctgtac ggcattgcct ccatttagct ggagagggac agagcctggt tctctagggc
      421 gtttccattg gggcctggtg acaatccaaa agatgagggc tccaaacacc agaatcagaa
      481 ggcccagcgt atttgtaaaa acaccttctg gtgggaatga atggtacagg ggcgtttcag
      541 gacaaagaac agcttttctg tcactcccat gagaaccgtc gcaatcactg ttccgaagag
      601 gaggagtcca gaatacacgt gtatgggcat gacgattgcc cggagagagg cggagcccat
      661 ggaagcagaa agacgaaaaa cacacccatt atttaaaatt attaaccact cattcattga
      721 cctacctgcc ccatccaaca tttcatcatg atgaaacttt gggtcccttc taggagtctg
      781 cctaatagtc caaatcatta caggtctttt cttagccata cactacacat cagatacaat
      841 aacagccttt tcatcagtaa cacacatttg tcgagacgta aattacgggt gactaatccg
      901 atatatacac gcaaacggag cctcaatatt ttttatttgc ttattccttc atgtcggacg
      961 aggcttatat tatggatcat atacatttat agaaacctga aacattggag tacttctact
     1021 gttcgcagtc atagccacag catttatagg ctacgtcctt ccatgaggac aaatatcatt
     1081 ctgaggtgcc acagttatta caaacctcct atcagccatc ccatatattg gaacaaccct
     1141 agtcgaatga atttgagggg gcttctcagt agacaaagcc accttgaccc gattcttcgc
     1201 tttccacttc atcttaccat ttattatcgc ggccctagca atcgttcacc tcctcttcct
     1261 ccacgaaaca ggatcaaaca acccaacagg attaaactca gatgcagata aaattccatt
     1321 tcacccctac tatacatcaa agatatccta ggtatcctaa tcatattctt aattctcata
     1381 accctagtat tatttttccc agacatacta ggagacccag acaactacat accagctaat
     1441 ccactaaaca ccccacccca tattaaaccc gaatgatatt tcctatttgc atacgccatt
     1501 ctacgctcaa tccccaataa actaggaggt gtcctagcct taatcttatc tatcctaatt
     1561 ttagccctaa tacctttcct tcatacctca aagcaacgaa gcctaatatt ccgcccaatc
     1621 acacaaattt tgtactgaat cctagtagcc aacctactta tcttaacctg aattgggggc
     1681 caaccagtag acacccattt attatcattg gccaactagc ctccatctca tacttctcaa
     1741 tcatcttaat tcttatacca atctcaggaa ttatcgaaga caaaatacta aaattatatc
     1801 cat
//
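Because the flat file is plain text with keyword-led lines, pulling the accession and sequence out of a record and rewriting them as FASTA is a typical small reformatting job. The sketch below is a simplified illustration under stated assumptions: `genbank_to_fasta` is a name invented here, only the first DEFINITION line is kept, continuation lines and the feature table are ignored, and the toy input is the first two sequence lines of the record above rather than the whole entry.

```python
import re


def genbank_to_fasta(text: str, width: int = 60) -> str:
    """Convert one GenBank flat-file record to a FASTA record (simplified)."""
    accession, definition = "", ""
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("//"):        # end-of-record marker
            break
        if in_origin:
            # Sequence lines carry position numbers and spaces; keep letters only.
            seq_parts.append(re.sub(r"[^A-Za-z]", "", line))
        elif line.startswith("ACCESSION"):
            accession = line.split()[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
    sequence = "".join(seq_parts).upper()
    wrapped = "\n".join(
        sequence[i:i + width] for i in range(0, len(sequence), width)
    )
    return f">{accession} {definition}\n{wrapped}\n"


toy_record = """\
LOCUS       MUSNGH       1803 bp    mRNA            ROD       29-AUG-1997
DEFINITION  Mouse neuroblastoma and rat glioma hybridoma cell line NG108-15
            cell TA20 mRNA, complete cds.
ACCESSION   D25291
ORIGIN
        1 tcagtttttt tttttttttt tttttttttt tttttttttt tttttttttg ttgattcatg
       61 tccgtttaca tttggtaagt tcacaggcct cagtcaacac aattggactg ctcaggaaat
//
"""

fasta = genbank_to_fasta(toy_record)
print(fasta)
```

Note how much is thrown away in the conversion: everything in the header and feature table except the accession and one definition line.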

Features (AA seq)

DNA Sequence

Header • Title • Taxonomy • Citation

Fields

http://www.ncbi.nlm.nih.gov/Genbank

•  Once upon a time, GenBank mailed out sequences on CD-ROM disks a few times per year.
•  GenBank at least doubles in size every 18 months
•  There are approximately 130,671,233,801 bases from 142,284,608 reported sequences in the traditional GenBank divisions as of August 2011.

Distribution of sequence databases

•  Books, articles 1968 -> 1985
•  Computer tapes 1982 -> 1992
•  Floppy disks 1984 -> 1990
•  CD-ROM 1989 -> ?
•  FTP 1989 -> ?
•  On-line services 1982 -> 1994
•  WWW 1993 -> ?
•  DVD 2001 -> ?
•  Mailing hard drives 2009 -> ?

•  Many sequences in GenBank correspond to the same gene
•  Genomic clones, full-length mRNA, various kinds of ESTs, submitted by different investigators
•  RefSeq is the “Reference Sequence” for a gene, as determined by GenBank curators
   –  best guess given the current evidence, can change
   –  usually based on the longest mRNA
   –  usually has both 5’ and 3’ UTR
•  Not necessarily reliable
   –  A lot is not yet known… e.g., alternative splicing

Last thoughts on Genbank ...

•  Often we only use FASTA files (e.g. for BLAST)
•  GBFF are simply human-readable versions of these records
•  GBFF have become a vehicle for a lot more information than they were meant to carry
•  Keep in mind that GenBank is DNA-centric and is a poor vehicle for protein and mRNA expression/interaction information

Many Datasets at NCBI
•  The NCBI hosts a huge interconnected database system that, in addition to DNA and protein, includes:
   –  Journal Articles (PubMed)
   –  Genetic Diseases (OMIM)
   –  Polymorphisms (dbSNP)
   –  Gene Expression (GEO)
   –  Cytogenetics (CGH/SKY/FISH & CGAP)
   –  Taxonomy
   –  Chemistry (PubChem)

Ensembl at EBI/EMBL

http://genome.cshlp.org/content/14/5/971.full

KEGG: Kyoto Encyclopedia of Genes and Genomes
•  Enzymatic and regulatory pathways
•  Mapped out by EC number and cross-referenced to genes in all known organisms (wherever sequence information exists)
•  Parallel maps of regulatory pathways

http://www.wwpdb.org

Golden Rules

•  Use published databases and methods
   –  Supported, maintained, trusted by community
•  Document what you have done !!!
   –  Sequence identification numbers
   –  Server, database, program VERSION
   –  Program parameters
•  Assess reliability of results
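One lightweight way to follow the "document what you have done" rule is to write a small machine-readable record next to every result. The sketch below is only a suggestion: the `make_run_record` helper, the field names, and all example values except the accession D25291 are hypothetical placeholders.

```python
import json


def make_run_record(accessions, database, database_release,
                    program, program_version, parameters):
    """Bundle the facts needed to repeat an analysis into one JSON string."""
    record = {
        "accessions": sorted(accessions),        # sequence identification numbers
        "database": database,                    # which server/database was queried
        "database_release": database_release,
        "program": program,
        "program_version": program_version,      # the VERSION, not just the name
        "parameters": parameters,                # every non-default setting
    }
    return json.dumps(record, indent=2, sort_keys=True)


# All values below are placeholders for illustration.
log_entry = make_run_record(
    accessions=["D25291"],
    database="GenBank",
    database_release="release as reported by the server",
    program="example-aligner",
    program_version="0.0.0",
    parameters={"word_size": 11},
)
print(log_entry)
```

Saving such a record alongside each output file makes it possible to tell later which database release and program version produced a given result.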

Bio-databases: A short word on problems

•  Even today we face some key limitations
   –  There is no single standard format
      •  Every database or program has its own format
   –  There is no standard nomenclature
      •  Every database has its own names
   –  Data is not fully optimized
      •  Some datasets have missing information without indications of it
   –  Data errors
      •  Data is sometimes of poor quality, erroneous, misspelled
      •  Error propagation resulting from computer annotation

What to take home
•  Databases are a collection of data
   –  Need to access and maintain easily and flexibly
•  Biological information is vast and sometimes very redundant
•  Computers can only create data; they do not give answers
•  Learn to use the big reliable databases (e.g. NCBI)
•  Open access to sequences is essential for all of the work we do: without it there would be no bioinformatics, no BLAST, no Computational Bioscience Program
•  Open access to sequence information is not all that needs to be open. We also need open access to the literature.

http://mibiol.biol.lu.se.webbhotell.ldc.lu.se/Bioinformatics/Exercises/databases.html

http://wiki.bio.dtu.dk/teaching/index.php/Exercise:_Searching_the_GenBank_database

http://biocourse.sanbi.ac.za/wp-content/uploads/2013/02/Biological-Databases-Exercises.pdf

RECOMMENDED EXERCISES
