Page 1: Computational Biology, Part 12 Spreadsheet Basics I Robert F. Murphy Copyright  1996, 1999-2001. All rights reserved.

Page 2: Cells

A spreadsheet is a two-dimensional array of cells. Each cell is uniquely identified by the row and column at whose intersection it lies. Most spreadsheets use letters to specify columns and numbers to specify rows. Thus cell C7 is in column C (the 3rd column) and row 7.

Cells can contain values or formulas.
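The letter-and-number addressing described on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `parse_cell` is hypothetical, and the handling of multi-letter columns (AA = 27, as in Excel) goes beyond what the slide states.

```python
import re
from typing import Tuple


def parse_cell(ref: str) -> Tuple[int, int]:
    """Return (column_number, row_number) for a reference such as 'C7'."""
    match = re.fullmatch(r"([A-Za-z]+)([0-9]+)", ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    letters, digits = match.groups()
    column = 0
    for ch in letters.upper():
        # Treat the letters as a base-26 number with A=1 ... Z=26.
        column = column * 26 + (ord(ch) - ord("A") + 1)
    return column, int(digits)


# Cell C7 lies in the 3rd column and row 7.
print(parse_cell("C7"))
```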

Page 3: Values

A value is a constant entered into a cell. A value may be numeric or textual.

Numeric values include integers, real numbers expressed as decimals, or real numbers expressed in scientific notation.
Examples: “5”, “7.11”, “2e-6”

Textual values normally consist of one or more “printable” characters.
Examples: “Mass”, “Created by R. Stuart”

Page 4: Labels

A label is a textual value used to mark, identify or clarify other cells. A label might be a heading on top of a column of numbers or an identifier beside an important value.
Examples: “Temperature=”, “Concentration”

Page 5: Formulas

A formula is an entry in a cell that specifies one or more calculations to be done to create a value for that cell. Formulas
  must be identified to the program as distinct from textual values (normally by preceding them with an = or +)
  may refer to other cells
  may use operators, such as * and ^
  may invoke functions provided by the spreadsheet program

Page 6: Functions

A function is something provided by the spreadsheet that is replaced by a value during evaluation of a formula (it “returns a value”).

A function may or may not require arguments.

Examples: SIN, AVERAGE, DATE

Page 7: Operator Hierarchy

It is important to know the order in which operators are evaluated in spreadsheet formulas. This order is referred to as the operator hierarchy. When parentheses are not present, exponentiation is performed first, followed by multiplication and division, followed by addition and subtraction. When operators of equal hierarchy are present, they are evaluated from left to right.

Page 8: Operator Hierarchy Examples

Assume: A1=1, A2=2, A3=5, A4=13

What is A1 + A2*A3?            11
What is A1 + A2*A3 + A4?       24
What is (A1 + A2)*(A3 + A4)?   54
What is A1*A4/A2*A3?           32.5
What is A1*A4/(A2*A3)?         1.3
What is 10^A2/10^A3?           0.001
What is 10^(A2/10)^A3?         10
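The answers on this slide can be checked outside a spreadsheet. The Python sketch below re-evaluates each formula with the same cell values. One caveat: Python's `**` operator groups right to left, whereas the slides state that spreadsheet operators of equal hierarchy are evaluated left to right, so the last formula needs explicit parentheses to reproduce the spreadsheet result. The dictionary name `results` is just an illustrative choice.

```python
import math

# Cell values assumed on the slide.
A1, A2, A3, A4 = 1, 2, 5, 13

results = {
    "A1 + A2*A3": A1 + A2 * A3,
    "A1 + A2*A3 + A4": A1 + A2 * A3 + A4,
    "(A1 + A2)*(A3 + A4)": (A1 + A2) * (A3 + A4),
    "A1*A4/A2*A3": A1 * A4 / A2 * A3,       # left to right: ((A1*A4)/A2)*A3
    "A1*A4/(A2*A3)": A1 * A4 / (A2 * A3),
    "10^A2/10^A3": 10 ** A2 / 10 ** A3,
    # Spreadsheet order is (10^(A2/10))^A3; Python would otherwise
    # group this as 10 ** ((A2/10) ** A3), so parenthesize explicitly.
    "10^(A2/10)^A3": (10 ** (A2 / 10)) ** A3,
}

for formula, value in results.items():
    print(f"{formula:22s} = {value:g}")

# The last result is subject to floating-point rounding.
assert math.isclose(results["10^(A2/10)^A3"], 10)
```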

Page 9: Formats

Values (including results of formulas) may be displayed in a variety of formats. For numeric values, the precision controls how many decimal places are displayed. In Excel, the precision is set on the Number tab after selecting Cells... under the Format menu, or by using the toolbar buttons.

Page 10: Relative vs. Absolute References

A central feature of spreadsheet programs is the ability to automatically change cell references when the contents of a cell are copied to other cells. This allows a formula to be entered once but evaluated for many different cases.

To illustrate this, we will generate a model that calculates the fraction of an ionizable group that is charged at various pH values.

Page 11: Simple model: Acid Dissociation

For the dissociation of a weak acid

$$\mathrm{HB} \rightleftharpoons \mathrm{B^-} + \mathrm{H^+}$$

HB is referred to as the conjugate acid and B- is referred to as the conjugate base.

The equilibrium equations are

$$K_{eq} = \frac{[\mathrm{H^+}][\mathrm{B^-}]}{[\mathrm{HB}]}$$

$$\mathrm{pH} = \mathrm{p}K + \log\frac{[\mathrm{B^-}]}{[\mathrm{HB}]}$$
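The second equilibrium equation follows from the first by taking negative base-10 logarithms. The short derivation below assumes the standard definitions pH = -log10[H+] and pK = -log10 Keq, which the slide uses but does not spell out.

```latex
\begin{align*}
K_{eq} &= \frac{[\mathrm{H^+}][\mathrm{B^-}]}{[\mathrm{HB}]} \\
-\log_{10} K_{eq} &= -\log_{10}[\mathrm{H^+}] - \log_{10}\frac{[\mathrm{B^-}]}{[\mathrm{HB}]} \\
\mathrm{p}K &= \mathrm{pH} - \log_{10}\frac{[\mathrm{B^-}]}{[\mathrm{HB}]} \\
\mathrm{pH} &= \mathrm{p}K + \log_{10}\frac{[\mathrm{B^-}]}{[\mathrm{HB}]}
\end{align*}
```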

Page 12: Ionization equilibria for amino acids

Need to consider which groups on an amino acid can be protonated/unprotonated.

The carboxyl and amino groups that are involved in peptide bonds have relatively constant pKs of ~2 and ~9.

The side chain pKs vary considerably. Illustrations for Arg and Tyr follow.

Page 13: [Figure: ionization states of arginine, labeled Arg-, HArg, H2Arg+, H3Arg++]

Page 14: [Figure: ionization states of tyrosine, labeled Tyr--, HTyr-, H2Tyr, H3Tyr+]

Page 15: Example: An Ionizable Group

Task: Given a pKa and a pH, calculate the fraction of base in unprotonated form

Step 1: Enter pKa and pH as constant values into two cells

Page 16: Example: An Ionizable Group

Step 2: Enter formula using references to constants

$$\frac{[\mathrm{B^-}]}{[\mathrm{HB}]} = \frac{10^{\mathrm{pH}}}{10^{\mathrm{p}K}}$$

Page 17: Example: An Ionizable Group

Note: The formula was made visible in the spreadsheet by clicking the Formulas box on the View tab after selecting Preferences... under the Tools menu

Page 18: Example: An Ionizable Group

Step 3: Convert to fraction of B
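The spreadsheet formula for this step appears only in a screenshot that did not survive in the transcript. A plausible reading is that the ratio r = [B]/[HB] from Step 2 is converted to a fraction as r / (1 + r), since [B] / ([B] + [HB]) = r / (1 + r). The Python sketch below follows that assumption; the function name `fraction_unprotonated` is hypothetical.

```python
def fraction_unprotonated(pH: float, pKa: float) -> float:
    """Fraction of the group present as B (unprotonated) at a given pH."""
    # Step 2: ratio [B]/[HB] = 10^pH / 10^pK
    ratio = 10 ** pH / 10 ** pKa
    # Step 3 (assumed form): [B] / ([B] + [HB]) = ratio / (1 + ratio)
    return ratio / (1 + ratio)


# When pH equals pKa the group is half unprotonated.
print(fraction_unprotonated(7.0, 7.0))
```

Two pH units above the pKa the ratio is 100, so the fraction is 100/101, or about 0.99.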

Page 19: Example: An Ionizable Group

Switch to viewing results of formulas rather than the formulas themselves

Page 20: Example: An Ionizable Group

New Task: Calculate fraction of B for more than one pH value

Step 4: Rearrange cells so that each row can be devoted to one pH value

Page 21: Example: An Ionizable Group

Step 5: Enter a formula to generate a series of increasing pH values

Page 22: Example: An Ionizable Group

Step 6: Copy the formula from cell B7 down to cell B8

Page 23: Example: An Ionizable Group

Note that the reference to the pH value (cell A7) changed to A8 (which we wanted to happen) but that the reference to the pKa (cell B4) changed to B5 (which we didn’t want)

Page 24: Relative References

“Normally” the names of cells contained in a formula (called references to those cells) are updated when that formula is copied to another cell

The row number is incremented by the difference in row numbers between the original location of the formula and the new location

The column number is incremented by the difference in column numbers

Page 25: Relative vs. Absolute References

Such a reference is termed a relative reference because the reference is implicitly relative to the current cell

We may want to “fix” or “hold” a reference so that it doesn’t change when a formula is copied

This is termed an absolute reference and in Excel is created by putting a dollar sign ($) in front of the column letter, the row number, or both
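The copying rule for relative and absolute references can be mimicked in a few lines of Python. This is a simplified sketch and not how any spreadsheet program is actually implemented: the function names are hypothetical, the example formula `=10^A7/10^B4` is only a guess at the kind of formula used in the demonstration, and the pattern would wrongly treat function names that end in digits (such as LOG10) as cell references.

```python
import re

# Optional $, column letters, optional $, row digits (e.g. A7, $B$4, $A7).
REF = re.compile(r"(\$?)([A-Z]+)(\$?)([0-9]+)")


def col_to_num(letters: str) -> int:
    """Convert column letters to a 1-based column number (A=1, AA=27)."""
    n = 0
    for ch in letters:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Convert a 1-based column number back to column letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def copy_formula(formula: str, d_rows: int, d_cols: int) -> str:
    """Shift every relative reference by the copy offset; leave $ parts fixed."""
    def shift(m: "re.Match[str]") -> str:
        col_abs, col, row_abs, row = m.groups()
        if not col_abs:
            col = num_to_col(col_to_num(col) + d_cols)
        if not row_abs:
            row = str(int(row) + d_rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return REF.sub(shift, formula)


# Copying down one row: both relative references change.
print(copy_formula("=10^A7/10^B4", 1, 0))
# With the pKa reference made absolute, only the pH reference changes.
print(copy_formula("=10^A7/10^$B$4", 1, 0))
```

A mixed reference such as `$A7` keeps its column fixed while its row is still allowed to change, which is the form used later in the deck for the relative name.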

Page 26: Example: An Ionizable Group

Step 7: Change the reference to the pKa to an absolute reference

Page 27: Example: An Ionizable Group

Step 8: Copy the formula in cell C7 down (note that the reference to B7 updates to B8)

Page 28: Example: An Ionizable Group

Step 9: Copy the formulas in cells A8:C8 down (note that the references to A7, A8, and B8 increment but $B$4 doesn’t)

Page 29: Example: An Ionizable Group

Step 10: Switch back to viewing values rather than formulas to see results

Page 30: Names

To make it easier to read formulas containing many cell references, some spreadsheet programs allow the creation of names for cells (like variable names in programs)

Page 31: Example: An Ionizable Group

New Task: Define absolute name for cell containing pKa

Step 11: Select cell B4 then Define Name

Page 32: Example: An Ionizable Group

Note that Excel has chosen a name based on the label in the adjacent cell and that the default is for the name to refer to the currently selected cell

Page 33: Example: An Ionizable Group

Step 12: Use Apply Name to replace all references to B4 in the spreadsheet with the new name

Page 34: Example: An Ionizable Group

Note that the formulas now show the name

Page 35: Example: An Ionizable Group

New Task: Define relative name for pH

Step 13: Select cell A7 and Define Name

Page 36: Example: An Ionizable Group

Note that the name is now chosen based on the label above B7. Change the reference from $A$7 (the default is absolute for names) to $A7 (the row number is allowed to be relative)

Page 37: Example: An Ionizable Group

Step 14: Apply Name

Page 38: Complex Models and Graphing

Before creating complex models, it is important to think about what graphical or tabular output is desired from the model. The organization of the spreadsheet should be optimized for this output. For example, if graphing of [P] vs. t is desired, try to place all values for t and [P] in consecutive cells in adjacent rows or columns.

Page 39: A Model with a Single Output

(Demonstration D2)

Page 40: A Workaround for Graphing

A spreadsheet that calculates some desired quantity (e.g., net charge) for a single value of some independent variable (e.g., pH) can be used for graphing by adding cells in which various values of the dependent variable are manually tabulated as the independent variable is changed

(Demonstration D3)

Page 41: Expanding a Model for Graphing

A better approach to the same problem is to make many copies of the original spreadsheet (using copy and paste) and enter a different value of the independent variable in each copy. The results can be collected for graphing using references.

(Demonstration D4)

Page 42: Grouping Constants

The best method is to redesign the original sheet so that only one row (or column) is needed for each value of the independent variable. This allows using fill down for the subsequent rows of the sheet.

(Demonstration D5)
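The one-row-per-value layout has a direct analogue in code: constants are grouped at the top and a single rule is applied to every row. The Python sketch below reuses the ionizable-group model from earlier in the deck rather than the contents of Demonstration D5, which are not in the transcript. The constant values (`PKA`, `PH_START`, `PH_STEP`, `N_ROWS`) are hypothetical, and the r / (1 + r) form of the fraction is an assumption.

```python
# Grouped constants (hypothetical values), like a block of cells at the top.
PKA = 4.0
PH_START = 2.0
PH_STEP = 1.0
N_ROWS = 5


def fraction_unprotonated(pH: float, pKa: float) -> float:
    """Fraction of the group in the unprotonated form B."""
    ratio = 10 ** pH / 10 ** pKa
    return ratio / (1 + ratio)


# Each loop pass plays the role of one filled-down spreadsheet row:
# the pH changes from row to row, the pKa stays fixed.
rows = []
for i in range(N_ROWS):
    pH = PH_START + i * PH_STEP
    rows.append((pH, fraction_unprotonated(pH, PKA)))

print("pH    fraction of B")
for pH, fraction in rows:
    print(f"{pH:<5.1f} {fraction:.4f}")
```

Because every row is produced by the same rule, the two columns are already in the adjacent, consecutive layout that the deck recommends for graphing.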

Page 43: Assigned Reading For Next Class

Yeargers
  Chapter 1
  Section 2.1
  Chapter 3 through Section 3.4
  Chapter 4 through Section 4.2