Lecture 2 · 2013. 11. 15.

Introduction to Bioinformatics for Computer Scientists Lecture 2

Preliminaries

● Knowledge test → I didn't expect you to know all that stuff

● Oral exam

→ We can do it in German if you like

● Summer seminar: Hot topics in Bioinformatics

→ I need to think whether talks can be in German as well

● Please send me an email such that I can set up a course mailing list

● Email: [email protected]

● Email list

● Slides at: www.exelixis-lab.org

Preliminaries II

● Who has heard about suffix trees before in another lecture?

● There are some lectures about this in the course taught by Dr. Johannes Fischer (chair of P. Sanders)

Wishes in Questionnaire

● Protein folding → we won't do this
● Answers to the biological questions → today
● Answers to CS questions → at some later point
● Course will contain a mix of algorithmic and HPC aspects

Last lecture

● Sequence data/sequence

● Nucleotide/base-pair

● DNA/RNA

● Ambiguity coding

● Generation time

● Sequencing

● Sanger Sequencing

● Next Generation Sequencing

● Genome

● Transcriptome

● Model Organism

● Double-stranded DNA

● Chromosome

● Coding versus non-coding DNA

● Protein data

Today's outline

● More terminology & background
● Q & A session with Paschalia toward the end of the lecture

Shotgun Sequencing

● In the last lecture: we can read fragments up to a length of approx. 1000 bp

→ 1000 bp correspond roughly to the length of an average gene

● What do we do for reading genomes?

1) Break up genome randomly into fragments

2) Read fragments

3) Assemble fragments into a genome with computers

● Important terms & parameters:
   ● Coverage: the number of fragments/reads that contain a given nucleotide of the genome

[Figure: four reads (AAAGGG, AAAGGTT, AAGGC, TTTT) aligned against a genome of 10 positions; per-position coverage: 1 2 3 3 3 3 3 2 1 1]
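The coverage row of the figure can be recomputed with a few lines of C. This is only a minimal sketch: the type `read_placement` and the function `compute_coverage` are names invented for this illustration, and the read start positions used in the usage note are an assumption (one placement that reproduces the coverage values on the slide).

```c
#include <assert.h>
#include <stddef.h>

/* A read placed on the genome: 0-based start position and length.
 * (Hypothetical helper type, not part of any library.) */
typedef struct {
    size_t start;
    size_t length;
} read_placement;

/* Fills coverage[i] with the number of reads that contain genome position i. */
void compute_coverage(const read_placement *reads, size_t num_reads,
                      unsigned *coverage, size_t genome_length)
{
    assert(reads != NULL && coverage != NULL);

    for (size_t i = 0; i < genome_length; i++)
        coverage[i] = 0;

    for (size_t r = 0; r < num_reads; r++) {
        size_t end = reads[r].start + reads[r].length;  /* exclusive end */
        if (end > genome_length)
            end = genome_length;                        /* clip overhanging reads */
        for (size_t i = reads[r].start; i < end; i++)
            coverage[i]++;
    }
}
```

Placing reads of length 6, 7, 5 and 4 at the assumed start positions 0, 1, 2 and 6 on a genome of length 10 yields the coverage 1 2 3 3 3 3 3 2 1 1 shown on the slide.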

Shotgun Sequencing

● Important terms & parameters:
   ● Coverage
   ● Fragment length
   ● Paired-end reads
   ● De novo versus by reference assembly

Shotgun Sequencing

This is a simplistic view, omitting many technical (lab) details

Shotgun Sequencing

The length, coverage, and other properties of the fragments are important for designing assembly algorithms!

De novo versus by reference assembly

● There are two ways to conduct assemblies

● By reference: we want to assemble the genome of species X

→ there is a closely related species Y whose genome is already available

→ map reads of X to genome of Y to assemble them

→ also known as read mapping

[Figure: the reads of X placed along the genome of Y, each at its best match on Y]
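The idea of finding the best match for each read of X on the genome of Y can be sketched naively as follows. This is an illustration under simplifying assumptions, not how production read mappers work: it only counts mismatches (no insertions or deletions), it tries every position of the reference, and the function name `best_match` is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the 0-based position on `reference` where `read` fits with the
 * fewest mismatches (the first such position on ties) and stores that
 * mismatch count in *mismatches. Returns -1, leaving *mismatches untouched,
 * if the read is empty or longer than the reference. */
long best_match(const char *reference, const char *read, size_t *mismatches)
{
    assert(reference != NULL && read != NULL && mismatches != NULL);

    size_t n = strlen(reference), m = strlen(read);
    if (m == 0 || m > n)
        return -1;

    long best_pos = -1;
    size_t best = m + 1;                 /* worse than any real placement */
    for (size_t pos = 0; pos + m <= n; pos++) {
        size_t d = 0;
        /* stop counting as soon as this placement cannot beat the best one */
        for (size_t j = 0; j < m && d < best; j++)
            if (reference[pos + j] != read[j])
                d++;
        if (d < best) {
            best = d;
            best_pos = (long)pos;
        }
    }
    *mismatches = best;
    return best_pos;
}
```

The running time is proportional to the product of reference length and read length for every read, which is one reason why real read mappers build an index of the reference instead.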

● De novo: we want to assemble the genome of species X

→ there is no closely related species of X whose genome is already available

→ assemble genome out of read soup

→ computational problem is much harder, in particular when reads are short

[Figure: the read soup is assembled into the genome of X]

Paired-end Reads

● Two reads, one from either end of the same DNA fragment

● AAAGGGTTT-------------TTTTTTAAAGGC

● We know the distance between the two reads, denoted by “-” here, which is 13

● This distance is the same for all paired-end reads

→ contains additional info

→ makes assembly process easier
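The additional information in a paired-end read can be seen as a constraint on where the two ends may be placed. The sketch below assumes, as on the slide, that the gap is known exactly; the type `paired_end_read` and the function `placement_is_consistent` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A paired-end read: two sequenced ends and the known number of
 * unread bases between them. (Hypothetical helper type.) */
typedef struct {
    const char *left;
    const char *right;
    size_t gap;
} paired_end_read;

/* Returns 1 if placing the left end at left_pos and the right end at
 * right_pos (0-based genome positions) respects the known gap, else 0. */
int placement_is_consistent(const paired_end_read *pair,
                            size_t left_pos, size_t right_pos)
{
    assert(pair != NULL && pair->left != NULL);
    return right_pos == left_pos + strlen(pair->left) + pair->gap;
}
```

For the example on the slide (left end AAAGGGTTT, gap 13), a left end placed at position 0 forces the right end to start at position 22, so an assembler can discard every other candidate position for it.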

Back to DNA

● DNA encodes
   ● Protein information
   ● RNA information

● DNA is also known as the blueprint of life

● In a cell, the DNA is organized in long molecules called chromosomes

● Keep in mind
   ● Some parts of the DNA are coding
   ● Some parts are non-coding

What's a gene?

● Genes are the coding parts of the DNA
● Each gene (a contiguous string of DNA) encodes
   ● Either RNA
   ● Or a protein

RNA & Protein sequences

● In RNA we just replace character T by U

● Protein data has a 20 letter alphabet!

● 3 DNA/RNA characters encode for one protein character!

● We call such a triplet of DNA/RNA characters a Codon!

● With 4 DNA/RNA characters we could encode for 4 * 4 * 4 = 64 characters

● … but we only have 20!

● There are some redundancies and other special cases
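Two of the statements above are easy to check in code: the T → U replacement and the count of 4 * 4 * 4 = 64 possible codons. The following is a small sketch; `dna_to_rna` and `count_words` are names chosen for this illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Rewrites a DNA string in place as RNA by replacing every T with U. */
void dna_to_rna(char *seq)
{
    assert(seq != NULL);
    for (; *seq != '\0'; seq++)
        if (*seq == 'T')
            *seq = 'U';
}

/* Number of distinct words of length word_length over an alphabet
 * with alphabet_size characters. */
unsigned long count_words(unsigned alphabet_size, unsigned word_length)
{
    unsigned long n = 1;
    for (unsigned i = 0; i < word_length; i++)
        n *= alphabet_size;
    return n;
}
```

`count_words(4, 3)` gives 64, while `count_words(4, 2)` gives only 16, which is fewer than the 20 protein characters. Words of length 2 would therefore not suffice, and words of length 3 leave room for redundancy.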

Protein Alphabet

[Table: the protein characters and their codons, plus a compressed representation of the codons that uses the IUPAC ambiguous DNA character encoding we saw last time]

Protein Alphabet

From the RAxML source code:

meaningAA['A'] = 0;  /* alanine */
meaningAA['R'] = 1;  /* arginine */
meaningAA['N'] = 2;  /* asparagine */
meaningAA['D'] = 3;  /* aspartic acid */
meaningAA['C'] = 4;  /* cysteine */
meaningAA['Q'] = 5;  /* glutamine */
meaningAA['E'] = 6;  /* glutamic acid */
meaningAA['G'] = 7;  /* glycine */
meaningAA['H'] = 8;  /* histidine */
meaningAA['I'] = 9;  /* isoleucine */
meaningAA['L'] = 10; /* leucine */
meaningAA['K'] = 11; /* lysine */
meaningAA['M'] = 12; /* methionine */
meaningAA['F'] = 13; /* phenylalanine */
meaningAA['P'] = 14; /* proline */
meaningAA['S'] = 15; /* serine */
meaningAA['T'] = 16; /* threonine */
meaningAA['W'] = 17; /* tryptophan */
meaningAA['Y'] = 18; /* tyrosine */
meaningAA['V'] = 19; /* valine */

Ambiguous characters:

meaningAA['B'] = 20; /* asparagine, aspartic 2 and 3 */
meaningAA['Z'] = 21; /* glutamine glutamic 5 and 6 */

Protein Alphabet

Note that mainly the third codon position differs among the codons of one amino acid → a mutation at the third position is less likely to change the encoded amino acid than a mutation at the 1st or 2nd codon position

Protein Evolution

● This redundancy plays a role in protein evolution

● We distinguish between

1) Synonymous substitutions/mutations (GCC → GCT ≡ Alanine → Alanine)

versus

2) Non-synonymous substitutions/mutations (GGT → GTT ≡ Glycine → Valine)
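The distinction can be checked mechanically by translating both codons and comparing the results. The sketch below assumes the standard genetic code, written as a 64-character string with the bases ordered T, C, A, G at each codon position; the names `CODE`, `translate_codon` and `is_synonymous` are chosen for this illustration and are not taken from RAxML.

```c
#include <assert.h>
#include <stddef.h>

/* Standard genetic code; index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
 * '*' marks a stop codon. */
static const char CODE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

static int base_index(char b)
{
    switch (b) {
    case 'T': return 0;
    case 'C': return 1;
    case 'A': return 2;
    case 'G': return 3;
    default:  return -1;   /* not an unambiguous DNA character */
    }
}

/* Translates one DNA codon (3 characters) into its amino acid letter,
 * or returns '?' if the input is not a valid codon. */
char translate_codon(const char *codon)
{
    assert(codon != NULL);
    int idx = 0;
    for (int i = 0; i < 3; i++) {
        int b = base_index(codon[i]);
        if (b < 0)
            return '?';
        idx = 4 * idx + b;
    }
    return CODE[idx];
}

/* Returns 1 if replacing codon `before` by codon `after` leaves the
 * encoded amino acid unchanged, else 0. */
int is_synonymous(const char *before, const char *after)
{
    char a = translate_codon(before);
    char b = translate_codon(after);
    return a != '?' && a == b;
}
```

With this table, GCC and GCT both translate to A (alanine), so the substitution is synonymous, whereas GGT translates to G (glycine) and GTT to V (valine), so that substitution is non-synonymous.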

Translating DNA ↔ Protein data

● DNA → Protein: not ambiguous, but redundant
● Protein → DNA: ambiguous, several DNA triplets encode one amino acid
● In bioinformatics we sometimes directly use the codons (triplets) instead of amino acids to use all information available!

● See for instance Codon evolution models later-on!
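How ambiguous the direction Protein → DNA is can be quantified by counting the DNA sequences that translate to a given protein string. The following sketch again assumes the standard genetic code (bases ordered T, C, A, G); `codons_for` and `count_encodings` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Standard genetic code; index = 16*b1 + 4*b2 + b3 with T=0, C=1, A=2, G=3.
 * '*' marks a stop codon. */
static const char CODE[] =
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG";

/* Number of codons that the table above assigns to amino acid `aa`
 * (0 if the letter does not occur in the table). */
unsigned codons_for(char aa)
{
    unsigned count = 0;
    for (size_t i = 0; i < 64; i++)
        if (CODE[i] == aa)
            count++;
    return count;
}

/* Number of distinct DNA sequences that translate to `protein`.
 * Returns 0 if the string contains a letter that is not in the table.
 * Note: the product overflows for long proteins; this is only a sketch. */
unsigned long long count_encodings(const char *protein)
{
    assert(protein != NULL);
    unsigned long long total = 1;
    for (; *protein != '\0'; protein++)
        total *= codons_for(*protein);
    return total;
}
```

For the short protein string AEFFQQP the count is 4 * 2 * 2 * 2 * 2 * 2 * 4 = 512 different DNA sequences, which illustrates how much information is lost when only the amino acids are kept.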

Top-level view

Chromosome: a long DNA molecule

[Figure: a chromosome divided into coding and non-coding DNA; the coding parts are genes, which encode RNA, RNA, Protein, RNA]

Gene lengths vary: a typical gene is ≈1000 bp long

Average Protein gene Lengths

[Figure: histogram; x-axis: protein sequence length; y-axis: number of protein-coding genes, logarithmic scale]

● Protein sequence length → this is counted in # amino acid characters, not nucleotides; multiply by three to obtain the DNA length!
● Data for Caenorhabditis elegans (C. elegans) → yet another model organism → a roundworm

Top-level view

[Figure: genes along the chromosome, encoding RNA, RNA, Protein, RNA]

● How do we know where genes start?
● How do we know where genes end?
● Gene boundaries → special START/STOP codons (DNA triplets)
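As a rough illustration of how START/STOP codons delimit a protein-coding region, the sketch below scans a DNA string for the first START codon that is followed, in the same reading frame, by a STOP codon. It assumes the standard genetic code (START = ATG; STOP = TAA, TAG, TGA), which the slides do not spell out, and it looks at one strand only; real gene finding is considerably more involved. The function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if the three characters at c form a stop codon of the
 * standard genetic code. strncmp stops at the terminating NUL, so a
 * shorter remainder is handled safely. */
static int is_stop(const char *c)
{
    return strncmp(c, "TAA", 3) == 0 ||
           strncmp(c, "TAG", 3) == 0 ||
           strncmp(c, "TGA", 3) == 0;
}

/* Searches dna (written 5' to 3') for the first ATG that is followed in
 * the same reading frame by a stop codon. On success stores the index of
 * the A of ATG in *start, the index one past the stop codon in *end, and
 * returns 1. Returns 0 if no such region exists. */
int find_orf(const char *dna, size_t *start, size_t *end)
{
    assert(dna != NULL && start != NULL && end != NULL);

    size_t n = strlen(dna);
    for (size_t s = 0; s + 3 <= n; s++) {
        if (strncmp(dna + s, "ATG", 3) != 0)
            continue;
        /* walk codon by codon from the START codon onwards */
        for (size_t p = s + 3; p + 3 <= n; p += 3) {
            if (is_stop(dna + p)) {
                *start = s;
                *end = p + 3;
                return 1;
            }
        }
    }
    return 0;
}
```

In the string CCATGGCCGGTTAACC the region found spans indices 2 to 13 (ATG GCC GGT TAA); a string such as ATGGCC contains a START but no STOP codon in frame, so nothing is reported.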

All Codons

Now we have all 64

Proteins

● What do they do?

● Structural proteins → tissue building blocks

● Enzymatic proteins → catalysts (steering/accelerating) of specific biochemical reactions in the body

● Examples: oxygen transport, immune defense, provide & store energy etc.

● Because there are many such processes we need many proteins

● Homo sapiens ≈ 20,000 proteins → number disputed

● Again: a protein is a sequence/string of amino acid characters

● Terminology: Instead of counting nucleotides/base pairs we count protein letters as residues

● Example: This protein string: AEFFQQP has 7 residues

Protein Structure

Role of Structure

● A protein does not only consist of a string of residues (called primary structure)

● A protein sequence also has:

1) Secondary

2) Tertiary

3) Quaternary

structure!

● The structure determines the function/effect of a protein

● One would like to predict the structure from the protein sequence (primary structure)

● Still a challenging problem

● We will not deal with this in our course though!

Protein Structure Prediction

● Some protein structures are known → Crystallography

● Test programs on these
● Contest: The Critical Assessment of protein Structure Prediction (CASP), www.predictioncenter.org
● Blind testing and benchmarking of programs

Another challenging problem

● Can we predict the function of a gene and/or protein based on its sequence?

● It's generally known as gene function prediction
● We will also omit this topic though

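3' and 5'

[Figure: two paired DNA strands with their sugar-phosphate backbones and labelled 5' and 3' ends; the strands run in opposite directions: AGTACG pairs with CGTACT, each written 5' → 3']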

Back to DNA again

● DNA comes in a double helix
● A single string of DNA without the complement is also called a DNA strand
● The bases A, C, G, T are connected via a backbone of sugar molecules, each consisting of 5 carbon atoms labelled 1', 2',...,5'
● Backbone connections via the 3' and 5' units
● Every DNA strand has a direction
● By convention we write DNA sequences in the direction from 5' → 3'
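Because both strands are written 5' → 3' and run in opposite directions, the sequence of the partner strand is obtained by complementing every base and reversing the result. A minimal sketch follows; the function names are chosen for this illustration, and any character other than A, C, G, T is mapped to N as a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Watson-Crick complement of a single base; unknown characters become N. */
static char complement(char b)
{
    switch (b) {
    case 'A': return 'T';
    case 'T': return 'A';
    case 'C': return 'G';
    case 'G': return 'C';
    default:  return 'N';
    }
}

/* Replaces seq (written 5' to 3') in place by the sequence of the
 * opposite strand, also written 5' to 3'. */
void reverse_complement(char *seq)
{
    assert(seq != NULL);
    size_t n = strlen(seq);
    for (size_t i = 0; i < n / 2; i++) {
        char left = complement(seq[i]);
        seq[i] = complement(seq[n - 1 - i]);
        seq[n - 1 - i] = left;
    }
    if (n % 2 == 1)                      /* middle base of odd-length input */
        seq[n / 2] = complement(seq[n / 2]);
}
```

Applied to AGTACG this yields CGTACT, the pair of strands shown on the 3' and 5' slide.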

Top-level view

→ Genes have a direction!
→ It depends on which strand of the double helix encodes the gene
→ They must be read from the correct side to be recognized!

[Figure: genes along the chromosome, encoding RNA, RNA, Protein, RNA]

The domains of life

Classic paper: Woese C, Kandler O, Wheelis M (1990). "Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya". Proc Natl Acad Sci USA 87 (12): 4576–9

[Figure: tree of the domains of life, annotated with “Salty environments” and “Hot environments”; the position of the root is marked “???”]

● Where is the common ancestor?
● Prokaryota: cells without a nucleus, mostly unicellular organisms
● Eukaryota: organisms with a cell nucleus

More about genes

● Prokaryot{es|a}: A gene encodes a protein or an RNA

● Eukaryot{es|a}: it's more complicated
   ● Not the entire gene sequence may encode a protein, just parts of it
   ● Within a eukaryotic gene we distinguish between
      – Introns → not used in protein synthesis
      – Exons → parts of the gene used for protein synthesis

● Genomic DNA contains both exons and introns
● Complementary DNA (cDNA) is synthesized from spliced messenger RNA by reverse transcription and therefore contains only the exons

● cDNA has some important applications in wet-lab DNA sequencing which we will not cover here!

What does RNA do?

● As we already know RNA is similar to DNA
● There are some chemical differences
● RNA does not form a double-stranded helix
● DNA stores information
● Like proteins, RNA performs different functions in the cell
● An analogy:
   ● DNA is something like the hard disk
   ● RNA and proteins are processing elements

An overview

RNA

● RNA is the product of DNA transcription

● RNA is a copy of a coding DNA strand (a gene)
● Transcription is the first step in constructing either:

1) A protein: DNA → RNA → Protein

The second step, RNA → Protein, is called translation (coding RNA)

2) A non-coding RNA: DNA → RNA that has some other direct function in the cell

RNA Splicing in Eukaryota

Gene (DNA): Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3

→ Transcription

RNA: Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3

→ RNA splicing (the introns are recycled in the nucleus)

Messenger RNA: Exon 1 | Exon 2 | Exon 3

→ Translation (Protein Synthesis)

Protein

Eukaryotic RNA

● Remember: Not the entire gene sequence may be used for protein synthesis

● Introns → not used
● Exons → used
● Introns are spliced out (cut out) from the RNA strand (corresponding to the full gene) after transcription
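On the level of strings, splicing amounts to concatenating the exon intervals of the transcribed RNA and dropping everything in between. The sketch below assumes that the exon boundaries are already known and given as half-open intervals; the type `interval` and the function `splice` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Half-open interval [start, end) on the transcript. (Hypothetical type.) */
typedef struct {
    size_t start;
    size_t end;
} interval;

/* Copies the given exons of `transcript`, in the order listed, into `out`
 * (capacity out_size, including the terminating NUL). Returns the length
 * of the spliced sequence, or -1 if an interval is invalid or `out` is
 * too small. */
long splice(const char *transcript, const interval *exons, size_t num_exons,
            char *out, size_t out_size)
{
    assert(transcript != NULL && exons != NULL && out != NULL);
    if (out_size == 0)
        return -1;

    size_t n = strlen(transcript), len = 0;
    for (size_t e = 0; e < num_exons; e++) {
        if (exons[e].start > exons[e].end || exons[e].end > n)
            return -1;
        size_t exon_len = exons[e].end - exons[e].start;
        if (len + exon_len + 1 > out_size)
            return -1;
        memcpy(out + len, transcript + exons[e].start, exon_len);
        len += exon_len;
    }
    out[len] = '\0';
    return (long)len;
}
```

Passing only a subset of the exons, for instance the first and the third, produces a different messenger RNA from the same transcript.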

Alternative Splicing

Gene (DNA): Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3

→ Transcription

RNA: Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3

→ Alternative RNA splicing (the removed parts are recycled in the nucleus)

Messenger RNA, variant A: Exon 1 | Exon 3 → Translation (Protein Synthesis) → Protein A

Messenger RNA, variant B: Exon 1 | Exon 2 → Translation (Protein Synthesis) → Protein B

→ Greatly increases the “coding power” of a gene!

Types of RNA

● mRNA: messenger RNA used to synthesize a protein

→ transports RNA data to the ribosome for protein synthesis

● rRNA: ribosomal RNA

→ carries out the translation in the ribosome via catalysis

● tRNA: transfer RNA

→ brings in the amino acids

The importance of ribosomal RNA

● Different species do not have the same set of genes

● Only few genes are common to all species
● The genes encoding rRNA are among them
● The most well-known of these is the 16S gene
● Therefore, it can be used to infer evolutionary relationships among all species

RNA secondary structure

● RNA is a single-stranded sequence!

● Secondary structure has an influence on the function of the molecule

● There's also a tertiary structure

● Stem: complementary bases bind: A ↔ U, C ↔ G

● Loops: no matching bases
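The stem/loop idea can be illustrated with a toy check of how many base pairs form when the two ends of an RNA string fold back onto each other. This is only a sketch and not a secondary structure prediction method: it considers a single hairpin, uses only the pairs A ↔ U and C ↔ G named above, and the parameter `min_loop` (minimum number of unpaired loop bases) is an assumption introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if bases a and b can bind (A-U or C-G), else 0. */
static int pairs(char a, char b)
{
    return (a == 'A' && b == 'U') || (a == 'U' && b == 'A') ||
           (a == 'C' && b == 'G') || (a == 'G' && b == 'C');
}

/* Counts the base pairs of the stem that forms when the first base binds
 * the last, the second binds the second to last, and so on, stopping at
 * the first non-matching pair or when fewer than min_loop unpaired bases
 * would remain in the loop. */
size_t hairpin_stem_length(const char *rna, size_t min_loop)
{
    assert(rna != NULL);
    size_t i = 0, j = strlen(rna), count = 0;  /* j is one past the right base */
    while (i + 1 < j &&
           (j - 1) - i - 1 >= min_loop &&
           pairs(rna[i], rna[j - 1])) {
        i++;
        j--;
        count++;
    }
    return count;
}
```

For GGGAAAUCCC and a minimum loop size of 3 the function reports a stem of 3 pairs (G-C three times), leaving AAAU as the loop.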

RNA Secondary Structure

● Importance for evolution of RNA

→ matching bases in a stem cannot mutate independently of each other

● Research on predicting secondary structure from plain RNA sequence

The Synthesis Process

Central Dogma of Molecular Biology

● DNA → DNA: replication
● DNA → RNA: transcription
● RNA → Protein: translation
● RNA → DNA: reverse transcription → serves some functions mainly in viruses (1975 Nobel Prize)

An overview

Gene Order Rearrangements

● Many organisms have similar or identical genes

… but they
   ● may be arranged differently on the genome
   ● may point in a different direction

The one and only paper by Bill Gates

● W. Gates, C. Papadimitriou: “Bounds for sorting by prefix reversal”, 1979.
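To give an impression of what sorting by prefix reversal means, the sketch below sorts an array using only reversals of a prefix, which can be read as a very simplified model of reordering genes. It implements the simple greedy strategy (move the largest remaining element to the front, then to its final position) and not the bounds derived in the paper; gene directions are ignored, and the function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Reverses the first k elements of a. */
static void reverse_prefix(int *a, size_t k)
{
    for (size_t i = 0, j = k; i + 1 < j; i++, j--) {
        int tmp = a[i];
        a[i] = a[j - 1];
        a[j - 1] = tmp;
    }
}

/* Sorts a in ascending order using only prefix reversals and returns the
 * number of reversals performed (at most 2 per element placed). */
size_t sort_by_prefix_reversals(int *a, size_t n)
{
    assert(a != NULL || n == 0);
    size_t flips = 0;
    for (size_t size = n; size > 1; size--) {
        /* find the largest element among the first `size` elements */
        size_t max_idx = 0;
        for (size_t i = 1; i < size; i++)
            if (a[i] > a[max_idx])
                max_idx = i;
        if (max_idx == size - 1)
            continue;                    /* already in its final place */
        if (max_idx != 0) {
            reverse_prefix(a, max_idx + 1);  /* bring it to the front */
            flips++;
        }
        reverse_prefix(a, size);             /* move it to position size-1 */
        flips++;
    }
    return flips;
}
```

For the order 3 1 4 2 this strategy needs 4 prefix reversals; how many reversals are needed in the worst case is the question that the paper gives bounds for.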