Get'a'good'text'editor' 'Week'1,'Lecture'2' white

6
8/30/12 1 2012 ( BMMB 597D: Analyzing Next Genera>on Sequencing Data Week 1, Lecture 2 István Albert Biochemistry and Molecular Biology and Bioinforma>cs Consul>ng Center Penn State Get a good text editor Desired features: syntax highligh3ng, line numbering, ability to view white space Komodo Edit Sublime Text TextMate There are many other op>ons. Download the data for the lecture The url sent out via email (also on the course webpage) hYp://downloads.yeastgenome.org/cura>on/chromosomal_feature/saccharomyces_cerevisiae.gff Biological file formats Each file format represents 1. Informa3on – types of knowledge that are stored in the file 2. Op3miza3on – types of opera>ons that are easy/efficient to perform The above implies that some informa>on may not be present or cannot be easily extracted from a certain file format.

Transcript of Get'a'good'text'editor' 'Week'1,'Lecture'2' white

Page 1: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

1'

2012'('BMMB'597D:'Analyzing'Next'Genera>on'Sequencing'Data'

''Week'1,'Lecture'2'

István'Albert''

Biochemistry'and'Molecular'Biology''and'Bioinforma>cs'Consul>ng'Center'

'Penn'State'

Get'a'good'text'editor'

Desired'features:'syntax'highligh3ng,'line'numbering,'ability'to'view'white'space''•  Komodo'Edit'•  Sublime'Text'•  TextMate''

There'are'many'other'op>ons.''

Download'the'data'for'the'lecture'

The'url'sent'out'via'email'(also'on'the'course'webpage)''

hYp://downloads.yeastgenome.org/cura>on/chromosomal_feature/saccharomyces_cerevisiae.gff'''

Biological'file'formats'

Each'file'format'represents' '

1.   Informa3on'–'types'of'knowledge'that'are' stored'in'the'file ''

2.   Op3miza3on'–''types'of'opera>ons'that'are'easy/efficient'to'perform'

The'above'implies'that'some'informa>on'may'not'be'present'or'cannot'be'easily'extracted'from'a'certain'file'format. '

Page 2: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

2'

Tabular'formats'

•  Many'common'bioinforma>cs'data'formats'are'column'based'and'tab(separated''

•  First'format'we'deal'with'will'be'the''

GFF3 '–'Generic''Feature''Format'

(search'for'GFF3'to'see'the'specifica>on'for'version'3 )''

hYp://www.sequenceontology.org/gff3.shtml' '

The'GFF3'specifica>on'

GFF'format'Search'for'GFF3'!'hYp://www.sequenceontology.org/gff3.shtml'

Tab'separated'with'9'columns.'Missing'aYributes'may'be'replaced'with'a''dot'!'.'

1.   Seqid'''''''''''(usually'chromosome)'2.   Source'''''''''(where'is'the'data'coming'from)'3.   Type'''''''''''''(usually'a'term'from'the'sequence'ontology)'4.   Start'''''''''''''(interval'start'rela>ve'to'the'seqid)'5.   End'''''''''''''''(interval'end'rela>ve'to'the'seqid)'6.   Score''''''''''''(the'score'of'the'feature,'a'floa>ng'point'number)'7.   Strand''''''''''(+/(/.)'8.   Phase'''''''''''(used'to'indicate'reading'frame'for'coding'sequences)'9.   APributes''''(semicolon'separated'aYributes'!'Name=ABC;ID=1)'

Example'aYribute'specifica>on:'name=REB1;id=YP33546

Variants'of'GFF'–'GTF'2 ''

GTF'2'–'Gene'Transfer'Format' same'9'columns'as'the'GFF''

hPp://mblab.wustl.edu/GTF2 .html'

Differences''1.  Only'a'subset'of'types'are'allowed'in'column'3:'CDS, start_codon, stop_codon a nd'a'

few'more''

2.  AYribute'column'format'change,'key'values'are'separated'by'space'and'not'semicolon'=''3.  Two'mandatory'aYributes'at'the'end'of'the'record:'

'•  gene_id'value;'''''A'globally'unique'iden>fier'for'the'genomic'source'of'the'transcript'

'•  transcript_id'value;'''''A'globally'unique'iden>fier'for'the'predicted'transcript.'

'Example'aYribute'specifica>on:'name “REB1”; id “YP33546”'

Page 3: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

3'

What'do'the'terms'mean?' Sequence'ontology'browser'

Searching'for''

X_element_combinatorial_repeat''

Unix'commands'in'this'lecture'

'•  wc, cat, head, tail, sort, cut, grep, more, clear

Handy'Tips''

CTRL(C'!'interrupts'any'process'that'may'be'running''

clear'!'clears'the'screen''

'cursor'keys'allow'you'to'recall'past'commands'''

'auto(complete'!'write'part'of'the'filename'then'press'TAB '

Page 4: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

4'

Inves>gate'your'data' Check'head/tail'of'the'file'

Paging'data'with:'less'(more)'

•  q'or'ESC'!'quits'the'pager'

•  SPACE'or'f'!'go'forward,'next'page'

•  b'!'go'backward'

•  /'word'!'search'for'a'word'''

•  /'!'repeats'the'search'for'the'last'word'

Find'paYerns'in'the'file'

Page 5: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

5'

Connec>ng'streams'

•  Input'streams:'entry'from'the'keyboard'or''files'

•  Output'streams:'print'on'screen,'into'files'

Stream'redirec>on'the'symbols'of'“arrows”'<,'>''

Input'stream'redirec>on'from'file:''<'filename'Output'stream'redirec>on'to'a'file:'>'filename''

Redirec>ng'to'a'file''creates/overwrites'that'file'

Piping'streams'

•  The'pipe'character''|'channels'the'output'of'one'command'into'the'other'

'(located'above'the'ENTER'key)'

'

You'can'pipe'mul>ple'commands'together'

Piping'commands'

Page 6: Get'a'good'text'editor' 'Week'1,'Lecture'2' white

8/30/12'

6'

Isola>ng'relevant'parts'of'our'file' How'many'of'each'elements'

Find'out'how'many'of'each'features' Homework'2'

•  Create'a'file'that'lists'all'possible'ontology'terms'that'are'present'in'the'provided'GFF'file'with'a'count'of'how'many'>mes'this'element'occurs'in'the'yeast'genome.'Sort'this'file'by'this'count'in'reverse'order'(hint:'man'sort)'

•  Pick'an'ontology'term'that'is'unfamiliar'to'you'and'look'it'up'in'the'Sequence'Ontology,'paste'the'explana>on'into'the'homework'