REMINDERS


Page 1: REMINDERS

REMINDERS
2nd Exam on Nov. 17
Coverage:
• Central Dogma of DNA
  – Replication
  – Transcription
  – Translation
• Cell structure and function
• Recombinant DNA technology and molecular biology
• Protein analysis

Page 2: REMINDERS

BIOINFORMATICS

Page 3: REMINDERS

BIOINFORMATICS
• Study of the structure of biological information and biological systems
• Integrates theories and tools of mathematics/statistics, computer science and information technology
• Involves the use of hardware and software to study vast amounts of biological data

Page 4: REMINDERS

What is Bioinformatics?
• The field of science in which biology, computer science, and information technology merge to form a single discipline
• Application of information technology to the storage, management and analysis of biological information
• Facilitated by the use of computers

Page 5: REMINDERS
Page 6: REMINDERS

FUNCTIONS
• Data Management
  – Storage
  – Retrieval
• Data Analysis

*Literature/Bibliography, Sequence, Structure, Taxonomy, Expression, etc.

Page 7: REMINDERS

BIOLOGICAL DATABASES
• Systematic data storage/retrieval
• Maintained on a regular basis
• Can contain various types of data (integration)
  – Sequence
  – Structure
  – Other pertinent information
• Nucleotides and proteins are most common

Page 8: REMINDERS

DATABASES

a large, organized body of persistent data, usually associated with computerized software designed to update, query, and retrieve components of the data stored within the system

Biological databases usually consist of the nucleic acid sequences of the genetic material of various organisms, as well as protein sequences and structures

Page 9: REMINDERS

DATABASES

e.g. a nucleotide sequence database typically contains information such as
• Contact name
• The input sequence with a description of the type of molecule
• The scientific name of the source organism from which it was isolated

Additional requirements
• Easy access to the information
• A method for extracting only that information needed to answer a specific biological question
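
The record fields and the "extract only what is needed" requirement can be sketched in a few lines of Python. This is only an illustration: the class name `SequenceRecord`, its field names, the helper `find_by_organism`, and the sample entries are all hypothetical and do not reflect the schema of any real database.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    """One hypothetical database entry (field names are assumptions)."""
    contact_name: str    # who submitted the entry
    sequence: str        # the input sequence
    molecule_type: str   # description of the type of molecule
    organism: str        # scientific name of the source organism


# Made-up entries, for illustration only.
records = [
    SequenceRecord("A. Cruz", "ATGGCCATTG", "DNA", "Saccharomyces cerevisiae"),
    SequenceRecord("B. Reyes", "AUGGCCAUUG", "mRNA", "Homo sapiens"),
    SequenceRecord("C. Santos", "ATGAAACGTA", "DNA", "Homo sapiens"),
]


def find_by_organism(entries, organism):
    """Return only the records isolated from the given organism."""
    return [r for r in entries if r.organism == organism]


human = find_by_organism(records, "Homo sapiens")
print([r.molecule_type for r in human])
```

A real database would answer such a query through its own search interface; the point here is only that a query returns the subset of records relevant to one question.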

Page 10: REMINDERS

DATABASES
• Sequence
  – GenBank, European Nucleotide Archive (ENA) and DNA Data Bank of Japan (DDBJ); managed by the International Nucleotide Sequence Database Collaboration (INSDC)
  – UniGene
  – Saccharomyces Genome Database (SGD)
  – UniProtKB (UniProtKB/Swiss-Prot or UniProtKB/TrEMBL)
  – ExPASy

Page 11: REMINDERS

DATABASES
• Structure
  – Nucleic Acid Database (NDB)
  – Protein Data Bank (PDB)
  – Worldwide Protein Data Bank (wwPDB)
  – ExPASy

Page 12: REMINDERS

DATA MINING
• Process by which testable hypotheses are created regarding the function/structure of a gene/protein of interest by identifying similar sequences in “more established” organisms
• Tools:
  – Text-term search
  – Sequence similarity search

Page 13: REMINDERS

Machine Learning
• Studies methods and the design of computer programs that improve based on past experience
• Why?
  – New methods are being introduced
  – Old ones should be improved

Page 14: REMINDERS

“Units” of Information
• DNA (genome)
• RNA (transcriptome)
• Protein (proteome)

Page 15: REMINDERS

What is Being Analyzed?
• Sequence
• Structure
• Interactions
• Pathways
• Mutations/Evolution

Page 16: REMINDERS

Why?
• Increasing amount of biological information entails
  – Organization
  – Archiving
• Global unification/harmonization
• More biological discoveries
  – Functional/Structural similarities
  – Phylogenetic/Evolutionary patterns

Page 17: REMINDERS

Applications
• Medicine
• Pharmaceuticals
• Biotechnology
• Agriculture

Page 18: REMINDERS

STRUCTURE DATABASES

Page 19: REMINDERS

Molecular Data

• When you draw a molecule,
  – You start with atoms
  – Then proceed with the structure
  – And the three-dimensional data
• What can be stored?
  – Coordinates
  – Sequences
  – Chemical graphs
    • Atoms and bonds
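
To make "coordinates" and "chemical graphs" concrete, here is a minimal Python sketch that stores a small molecule as atoms with 3D coordinates plus a list of bonds. The layout (a dictionary of atoms and a list of bonded pairs), the helper names, and the approximate water-molecule coordinates are assumptions chosen for illustration, not any database's actual format.

```python
import math

# Atoms: name -> (element, x, y, z). Coordinates are approximate,
# illustrative values for a water molecule, in angstroms.
atoms = {
    "O": ("O", 0.000, 0.000, 0.000),
    "H1": ("H", 0.957, 0.000, 0.000),
    "H2": ("H", -0.240, 0.927, 0.000),
}

# Chemical graph: each bond is a pair of atom names.
bonds = [("O", "H1"), ("O", "H2")]


def bond_length(a, b):
    """Euclidean distance between two atoms, from their stored coordinates."""
    return math.dist(atoms[a][1:], atoms[b][1:])


def neighbors(name):
    """Atoms bonded to the given atom, read from the chemical graph."""
    return [b if a == name else a for a, b in bonds if name in (a, b)]


for a, b in bonds:
    print(a, b, round(bond_length(a, b), 3))
print(neighbors("O"))
```

The coordinates carry the three-dimensional data, while the bond list carries the connectivity; structure viewers need both to draw a molecule.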

Page 20: REMINDERS

Databases
• Protein Data Bank (PDB)
• Molecular Modeling Database (MMDB)

Page 21: REMINDERS

Techniques in the Laboratory
• X-ray Crystallography
• Nuclear Magnetic Resonance

Page 22: REMINDERS

Formats
• PDB
• mmCIF
• MMDB

Page 23: REMINDERS

Structure Viewers
• Cn3D
• RasMol
• WebMol
• Mage
• VRML
• CAD
• Swiss PDB Viewer

Page 24: REMINDERS

Promises of bioinformatics
• Medicine
  – Knowledge of protein structure facilitates drug design
  – Understanding of genomic variation allows the tailoring of medical treatment to the individual’s genetic make-up
  – Genome analysis allows the targeting of genetic diseases
  – The effect of a disease or of a therapeutic on RNA and protein levels can be elucidated
• The same techniques can be applied to biotechnology, crop and livestock improvement, etc.

Page 25: REMINDERS

Challenges in bioinformatics
• Explosion of information
  – Need for faster, automated analysis to process large amounts of data
  – Need for integration between different types of information (sequences, literature, annotations, protein levels, RNA levels, etc.)
  – Need for “smarter” software to identify interesting relationships in very large data sets
• Lack of “bioinformaticians”
  – Software needs to be easier to access, use and understand
  – Biologists need to learn about the software, its limitations, and how to interpret its results

Page 26: REMINDERS

SEQUENCE ALIGNMENT

Page 27: REMINDERS

Two or More Sequences
• Measure similarity
• Determine correspondences between residues
• Find patterns of conservation
• Derive evolutionary relationships

Page 28: REMINDERS

Alignment
• Correspondences of nucleotides/amino acids in two or more sequences are assigned
• An assignment of correspondences that preserves the order of the residues within the sequences is an alignment
  – Gaps are used to achieve this
• Sequence alignment refers to the identification of residue-residue correspondences
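
The definition can be checked mechanically. The sketch below, in Python, assumes an alignment is written as two equal-length rows with `_` as the gap symbol (the same symbol the Scoring slide uses); the function names are hypothetical. It verifies that removing the gaps gives back the original sequences in order, and lists the residue-residue correspondences.

```python
GAP = "_"


def is_valid_alignment(row_a, row_b, seq_a, seq_b):
    """True if the rows are a gapped, order-preserving layout of the sequences."""
    same_length = len(row_a) == len(row_b)
    # Dropping the gaps must give back each sequence unchanged.
    order_kept = (row_a.replace(GAP, "") == seq_a
                  and row_b.replace(GAP, "") == seq_b)
    # A column holding two gaps carries no information, so disallow it.
    no_double_gap = all(not (a == GAP and b == GAP)
                        for a, b in zip(row_a, row_b))
    return same_length and order_kept and no_double_gap


def correspondences(row_a, row_b):
    """List (index in seq_a, index in seq_b) for each aligned residue pair."""
    pairs = []
    i = j = 0
    for a, b in zip(row_a, row_b):
        if a != GAP and b != GAP:
            pairs.append((i, j))
        if a != GAP:
            i += 1
        if b != GAP:
            j += 1
    return pairs


print(is_valid_alignment("ACGT", "A_GT", "ACGT", "AGT"))
print(correspondences("ACGT", "A_GT"))
```

Here the gap in the second row lets residue C of the first sequence go unmatched while the remaining residues keep their order.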

Page 29: REMINDERS

Uses
• Homology
  – Similarities
  – “Ancestry”
• Genome annotation
  – Assigning structure and function to genes
• Database queries
  – For newly-discovered/unknown sequences

Page 30: REMINDERS

Tools
• Dot Plots
  – Diagonal lines of dots showing similarities between two sequences
• Scoring Matrices
  – Score reflects quality of each possible alignment; best possible score is identified
  – Scoring scheme is crucial
  – PAM (Point Accepted Mutation) and BLOSUM (BLOCKS Substitution Matrix)
• Dynamic Programming
  – Algorithmic technique that reuses previous computations
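
A dot plot is simple enough to sketch directly. The Python snippet below marks a cell whenever the residue in one sequence equals the residue in the other; the function name `dot_plot`, the `*`/`.` symbols, and the two example sequences are illustrative choices, and practical tools usually add window and threshold filtering that is omitted here.

```python
def dot_plot(seq_a, seq_b):
    """Return one text row per residue of seq_a: '*' on a match, '.' otherwise."""
    return ["".join("*" if a == b else "." for b in seq_b) for a in seq_a]


seq_a, seq_b = "GATTACA", "GATCACA"
print("  " + seq_b)                      # column labels
for residue, row in zip(seq_a, dot_plot(seq_a, seq_b)):
    print(residue + " " + row)           # row label, then the dots
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree; isolated marks off the diagonal are chance matches.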

Page 31: REMINDERS

Scoring
Penalties/Scores
• Match (e.g. A – A)
• Mismatch (e.g. A – C)
• Gap (e.g. A – _)
  – Linear Gap Penalty: Uniform
  – Affine Gap Penalty: Gap Existence vs. Gap Extension
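
The difference between the two gap models, and how match/mismatch/gap scores add up, can be sketched in Python. All numeric values below are arbitrary examples, not values from the slides, and the affine formula assumes one common convention (an opening cost for the first gap position plus an extension cost for each further position); other conventions exist.

```python
def linear_gap_penalty(length, gap=-2):
    """Uniform cost: every gap position costs the same."""
    return gap * length


def affine_gap_penalty(length, gap_open=-3, gap_extend=-1):
    """Opening a gap costs more than extending one (assumed convention)."""
    if length == 0:
        return 0
    return gap_open + gap_extend * (length - 1)


def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=-2):
    """Sum column scores of a gapped alignment using a linear gap penalty."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "_" or b == "_":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


for length in (1, 2, 4):
    print(length, linear_gap_penalty(length), affine_gap_penalty(length))
print(score_alignment("GATTACA", "GA_CACA"))
```

With these example values a single long gap is cheaper under the affine model than under the linear one, which is the usual motivation for separating gap existence from gap extension.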

Page 32: REMINDERS

Local vs. Global Alignments
• Global Alignment
  – Similarities between majority of two sequences
• Local Alignment
  – Similarities between specific parts of two sequences

Page 33: REMINDERS

Programs
• Pairwise Sequence Alignment
  – BLAST
  – VAST
  – FASTA
• Multiple Sequence Alignment
  – MAFFT

Page 34: REMINDERS

Needleman-Wunsch Algorithm
• Can be used for global alignments and, with modification, local alignments
• Maximum-value function
• A simple scoring scheme is assumed
• Three steps
  – Initialization
  – Matrix fill (scoring)
  – Traceback (alignment)
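
The three steps can be sketched in Python as follows. This is a minimal global-alignment version under an assumed simple scoring scheme (match +1, mismatch -1, linear gap -1); the function name, parameter names, and scores are illustrative choices rather than values from the slides, and when several alignments tie for the best score only one of them is returned.

```python
def needleman_wunsch(seq_a, seq_b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)

    # Step 1: initialization. First row and column hold cumulative gap costs.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    # Step 2: matrix fill. Each cell takes the maximum of three moves,
    # reusing scores already computed (dynamic programming).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in seq_b
                              score[i][j - 1] + gap)       # gap in seq_a

    # Step 3: traceback. Walk from the bottom-right cell back to the origin.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                row_a.append(seq_a[i - 1])
                row_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(seq_a[i - 1])
            row_b.append("_")
            i -= 1
        else:
            row_a.append("_")
            row_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

The "maximum-value function" on the slide corresponds to the `max` over the three moves in the matrix-fill step; the traceback then recovers one alignment that achieves the score in the final cell.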