Computer Modeling and Visualization in Chemistry, Spring 2007, 211 Havemeyer, Columbia University
1
Computer Modeling and Visualization in Chemistry
Spring 2007, 211 Havemeyer, Columbia University
Sat 10:00am – 12:30pm
Teachers: Eric Knoll, Li Tian, and Robert Abel.
Class Format: The 1st half of each class will consist of a lecture introducing you to the new material. The 2nd half will be a lab section where you can perform some of the calculations and visualizations on a computer. Many of the classes will introduce a new subject, and the course will briefly touch on a wide area of subjects in chemistry.
Homework: Mostly NONE!
Grading: No letter grade, but you do need to attend the classes to receive the certificate and remain in the program.
Topics covered (tentative):
1. Structure of Molecules
  Atoms, electronic structure, bonding, molecular conformations.
2. Chemical Reactions
  Energies of reactions and reaction mechanisms.
3. Proteins and DNA
  The Protein Data Bank, protein folding, classification of proteins, gene therapy.
4. Molecular Modeling
  Molecular mechanics, quantum mechanics, nanotechnology, supercomputers.
The above topics may be altered depending on student interest in the given or other related subjects.
2
Class Description
This survey course is for students who are interested in chemistry, medicine, nanotechnology, computer science, or biotechnology and who want to discover real-world applications of computer technology that go beyond typical undergraduate chemistry. The class will touch on advanced topics such as molecular mechanics and quantum chemistry, which are the foundations of the simulation software packages that are now standard research tools in areas such as organic chemistry, biochemistry, and drug discovery. For the majority of the classes, students will get hands-on experience using these software packages to visualize and study the structure and reactivity of organic molecules, proteins, and DNA.
3
Class Schedule
#  Date  Teacher  Topic (tentative)
1 Jan. 27 Eric Intro to Computational Science
2 Feb. 3 Eric Review of Chem. Bonding & Molecular Orbitals
3 Feb. 10 Robert Chemical Physics
4 Feb. 17 Robert Kinetics / Molecular Theories
5 Feb. 24 Eric Molecular Mechanics & Quantum Mechanics
6 Mar. 3 Robert Molecular Dynamics, Stat Mech, Monte Carlo
7 Mar. 10 Li Hydrogen Bonding
Mar. 17 NO CLASS
8 Mar. 24 Robert The Hydrophobic Effect
9 Mar. 31 Eric Molecular Conformations / Intro to Proteins
Apr. 7 NO CLASS
10 Apr. 14 Li Enzymes
11 Apr. 21 Eric Bioinformatics: Diseases, NCBI
12 Apr. 28 Li DNA / RNA
13 May 5 Li Spectroscopy Techniques in Biochem: NMR
4
Computational Science
- Science Honors Program -
Computer Modeling and Visualization in Chemistry
Eric Knoll
5
This presentation is for educational, non-profit purposes only. Please do not post or distribute this presentation to anyone outside of this course.
Most of the slides in this presentation are from a course called “Parallel Computing” taught by Prof. David Keyes at Columbia University
6
Three “pillars” of scientific investigation
Experiment
Theory
Simulation (“theoretical experiments”)
Computational simulation :=
“a means of scientific discovery that employs a computer system to simulate a physical system according to laws derived from theory and experiment”
7
“There will be opened a gateway and a road to a large and excellent science
into which minds more piercing than mine shall penetrate to recesses still deeper.”
Galileo (1564-1642) (on ‘experimental mathematical analysis of nature’, appropriated here for ‘simulation science’)
8
An early visionary: L. F. Richardson
from his book on numerical weather prediction (1922)
9
How well can we predict the weather?
Path of Hurricane Katrina, 2005
10
Ocean Simulations: Tsunami
Sumatra-Andaman Earthquake Tsunami. >275,000 killed.
11
The third pillar
Computational science has been the stuff of fantasy for generations (Galileo, Richardson, etc.)
Modern version, hosted on digital computers, has been foreseen for about 60 years (Von Neumann, 1946)
Aggressively promoted as a national agenda for about 15 years (Wilson, 1989)
Just now beginning to earn acceptance beyond the circle of practitioners

Experimental publication: except for the team that did the experiments, everybody believes it.
Computational publication: except for the team that did the computations, nobody believes it.
12
The “Grand Challenges” of Wilson
A Magna Carta of high performance computing (1989)
Supercomputer as “scientific instrument”
Attention to quality research indicators in computational science
Sample “grand challenge”: electronic structure
Prospects for computer technology
Why the NSF supercomputer centers
13
Wilson’s burden
“In this paper, I address some of the tougher requirements on … grand challenge research to ensure that it has enduring value.”
Algorithm development
Error control
Software productivity
Fostering technological advances in computers
14
Wilson’s fear
“… Often advocated is that because computers of a fixed performance are dropping rapidly in price, one should only buy inexpensive computers … expecting that today’s supercomputer performance will be achieved … in a few years’ time. This … would be terrible… It would violate the whole spirit of science, of pushing at the frontiers of knowledge and technology simultaneously.”
15
Wilson’s six examples
Weather prediction
Astronomy
Materials science
Molecular biology
Aerodynamics
Quantum field theory
16
Wilson’s six examples
Weather prediction
  curse of dimensionality (r^3 in space; r^4 in time)
  chaotic behavior
Astronomy
Materials science
Molecular biology
Aerodynamics
Quantum field theory
17
Wilson’s six examples
Weather prediction
Astronomy
  need to escape limits of observational record
  curse of dimensionality
Materials science
Molecular biology
Aerodynamics
Quantum field theory
18
Wilson’s six examples
Weather prediction
Astronomy
Materials science
  Electronic structure problem: 3N-dimensional Schrödinger equation
  way behind Newton, Maxwell
Molecular biology
Aerodynamics
Quantum field theory
19
Wilson’s six examples
Weather prediction
Astronomy
Materials science
Molecular biology
  Conformation problem: combinatorial
  Protein folding: “stiff”
Aerodynamics
Quantum field theory
20
Wilson’s six examples
Weather prediction
Astronomy
Materials science
Molecular biology
Aerodynamics
  Turbulence
  Full system analysis, full envelope analysis
Quantum field theory
21
Wilson’s six examples
Weather prediction
Astronomy
Materials science
Molecular biology
Aerodynamics
Quantum field theory
  QED is perturbative
  QCD is fundamentally nonlinear
22
A “perfect storm” for simulation
[Diagram: applications at the top, resting on hardware infrastructure (architectures) and a stack of enabling technologies]
scientific models (1686)
numerical algorithms (1947)
computer architecture (1976)
scientific software engineering
(dates are symbolic)
“Computational science is undergoing a phase transition.” – D. Hitchcock, DOE
23
Movement towards simulation science
Standards
  Tools: languages, libraries, interfaces, formats, templates
  Results: validation and verification
Publications
  Journals, e.g., IEEE/APS Computing in Science and Engineering
  Book series, e.g., Springer’s LNCS&E
Degree programs
  Approximately 50 US-based programs (http://www.nhse.org/cse_edu.html)
Birds-of-a-feather meetings at conferences
25
HPCC Bluebook (1992, OSTP)
Proposed 30% increase in federal support of HPCC (to $638M/yr)
Four major components:
  High Performance Computing Systems
  Advanced Software Technology and Algorithms
  National Research and Education Network
  Basic Research and Human Resources
27
It’s not just government…
200 of the “Top 500” computer systems in the world are operated by industry (http://www.top500.org/)
15 “Fortune 500” companies were sponsors of the NCSA:
  Banking: J.P. Morgan
  Information: The Tribune Company
  Insurance: Allstate
  Manufacturing: Caterpillar, FMC, Kodak, Motorola
  Merchandising: Sears
  Petroleum: Phillips, Schlumberger, Shell
  Pharmaceuticals: Lilly
  Transportation: Boeing, Ford, SABRE
28
Computation vs. Theory
Computation is usually better for:
  Generality (dimension, geometry, properties, boundary conditions)
  Transferability of technique (to less expert users)
Theory is usually better for:
  Compactness
  Generalizability
  Insight
“The purpose of computing is insight, not numbers.” – R. W. Hamming
29
Computation vs. Experiment
Computation is usually better for:
  Economy
  Feasibility
  Latency
  Idealizations
  Safety and/or political repercussions
Experiment is usually better for:
  Reliability
  Reality
30
Lexical soup of related terms
Computer science: the science of organizing and operating computers, including algorithms
Information science: the science of acquiring, converting, storing, retrieving, and conceptualizing information
Computational mathematics/numerical analysis: mathematics of computation, esp. focused on practical difference between real arithmetic and computer arithmetic and other resolution limitations of computers in performing well-defined mathematical operations
Computational Science (& Engineering): the science of using computers in pursuit of the natural sciences (& engineering), especially those aspects that are not specific to a particular discipline
31
Lexical soup of related terms, cont.
Scientific computing: a combination of computational science, numerical analysis, and computer architecture primarily concentrating on efficient and accurate algorithms for approximating the solution of operator (and other) equations
Computational “X” (where “X” is a particular natural or engineering science, such as physics, chemistry, biology, geophysics, fluid dynamics, structural mechanics, electrodynamics, etc.): a specialized subset of scientific computing concentrating on techniques and practices particular to problems from “X”, together with support technologies from CS&E
32
Clarifying examples
Computer science: architecture, systems software, data structures, algorithmic complexity, networks, software engineering, intelligent agents, profiling, benchmarking, performance modeling, performance tuning
Information science: databases, data mining, data compression, pattern recognition
Computational mathematics: error analysis, algorithmic stability, convergence
Computational science: scientific visualization, computational steering, parallel partitioning and mapping, multidisciplinary computing
33
Case studies from O’Leary (1997)
Cellular radio transmission: placement of transmitters in a building to avoid dead spots (50% physics/engineering, 10% numerical analysis, 40% computer science); ray tracing, attenuation modeling
Image processing: correction of Hubble images (25% astronomy, 25% signal processing, 25% mathematics, 25% computer science); large, ill-conditioned inverse problem
Information retrieval: latent semantic indexing of large databases (50% disciplinary field, 10% mathematics, 40% computer science); singular value decomposition
Smoke plume modeling: predict spread of smoke and heat in a burning building (25% physics/engineering, 50% mathematics, 25% computer science); large-scale parallelism, uncertainty quantification
What does Computational Chemistry Involve?
34
Single Processor
35
Moore’s Law
In 1965, Gordon Moore of Intel observed an exponential growth in the number of transistors per integrated circuit and optimistically predicted that this trend would continue. It has. “Moore’s Law” refers to a doubling of transistors per chip every 18 months, which translates into performance, though not quite at the same rate.
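As a quick check on that claim, here is a minimal Python sketch (ours, not from the slides) of what an 18-month doubling period implies; the function name is our own.

```python
# A minimal sketch of Moore's Law as stated above: transistor counts
# double every 18 months, so 36 years is 24 doubling periods.
def moore_growth(years, doubling_months=18):
    """Growth factor in transistor count after `years` years."""
    doublings = years * 12 / doubling_months
    return 2 ** doublings

print(moore_growth(1.5))  # 2.0 (one doubling period)
print(moore_growth(36))   # 2**24 = 16,777,216: about 16 million
```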
36
Prefix review
“flop/s” means “floating point operations per sec”

1,000                  Kiloflop/s  Kf
1,000,000              Megaflop/s  Mf
1,000,000,000          Gigaflop/s  Gf  ← “your laptop”
1,000,000,000,000      Teraflop/s  Tf  ← “lab engine”
1,000,000,000,000,000  Petaflop/s  Pf
37
Pipelining
Often, an operation (e.g., a multiplication of two floating point numbers) is done in several stages: input → stage 1 → stage 2 → output
Each stage occupies different hardware and can be operating on a different multiplication
Like assembly lines for airplanes, cars, and many other products
38
Consider laundry pipelining
Anne, Bing, Cassandra, and Dinesh must each wash (30 min), dry (40 min), and fold (20 min) laundry. If each waits until the previous is finished, the four loads require 6 hours.
39
Laundry pipelining, cont.
If Bing starts his wash as soon as Anne finishes hers, and then Cassandra starts her wash as soon as Bing finishes his, etc., the four loads require only 3.5 hours.
Note that in the middle of the task set, all three stations are in use simultaneously.
For long streams, ideal speed-up approaches three – the number of available stations.
Imbalance between the stages, and pipe filling and draining effects make actual speedup less.
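The figures above are easy to reproduce. Below is a minimal Python sketch (ours, not the course's) that simulates the three stations, assuming each station serves one load at a time.

```python
# Laundry pipelining: wash 30 min, dry 40 min, fold 20 min, four loads.
stages = [30, 40, 20]                  # minutes per stage
loads = 4

# Sequential: each person waits until the previous one is finished.
sequential = loads * sum(stages)       # 4 * 90 = 360 min

# Pipelined: a load enters a stage as soon as it has finished the
# previous stage AND that station is free.
free_at = [0] * len(stages)            # time each station becomes free
for _ in range(loads):
    t = 0                              # this load's current time
    for i, duration in enumerate(stages):
        start = max(t, free_at[i])     # wait for a busy station
        free_at[i] = start + duration
        t = free_at[i]
pipelined = free_at[-1]                # 210 min

print(sequential / 60, pipelined / 60) # 6.0 3.5 (hours)
```

The dryer (40 min) is the slowest stage, so in the long run one load finishes every 40 minutes: exactly the stage-imbalance effect mentioned above.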
40
Arithmetic pipelining
An arithmetic operation may have 5 stages:
  Instruction fetch (IF)
  Read operands from registers (RD)
  Execute operation (OP)
  Access memory (AM)
  Write back to memory (WB)
[Diagram: successive instructions staggered in time, each occupying the IF, RD, OP, AM, WB stages one cycle apart]
…Actually, each of these stages may be superpipelined further!
41
Benefits of pipelining
Allows the computer to be physically larger
Signals need travel only from one stage to the next per clock cycle, not over the entire computer
42
Problems with pipelining
Must find many operations to do independently, since results of earlier scheduled operations are not immediately available for the next; waiting may stall the pipe
Conditionals may require partial results to be discarded
If the pipe is not kept full, the extra hardware is wasted, and the machine is slow
[Diagram: an instruction that creates “x” followed by one that consumes “x”; the consumer stalls in the pipe until “x” is written back]
43
Parallelism
44
Parallelism
Often, a large group of operations can be done concurrently, without memory conflicts
In our airplane example, each cell update involves only cells on neighboring faces
Cells that do not share a face can be updated simultaneously
[Figure: no purple cell quantities are involved in each other’s updates]
45
Parallelism in building a wall
Each worker has an interior “chunk” of independent work, but workers require periodic coordination with their neighbors at their boundaries. One slow worker will eventually stall the rest. Potential speedup is proportional to the number of workers, less coordination overhead.
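Here is a minimal Python sketch (hypothetical, not the course's code) of the same idea in one dimension: each worker owns a chunk of the wall, and each step needs only the neighbors' edge values, a so-called halo exchange.

```python
# Domain decomposition with halo exchange, emulated serially.
import numpy as np

bricks = np.arange(16, dtype=float)    # the whole row of the wall
workers = 4
chunks = np.split(bricks, workers)     # each worker's independent chunk

def step(chunk, left_ghost, right_ghost):
    """Update a chunk with a 3-point average; ghosts come from neighbors."""
    padded = np.concatenate(([left_ghost], chunk, [right_ghost]))
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0

# One coordinated step: exchange boundary values, then update chunks.
new_chunks = []
for i, c in enumerate(chunks):
    left = chunks[i - 1][-1] if i > 0 else c[0]            # halo values
    right = chunks[i + 1][0] if i < workers - 1 else c[-1]
    new_chunks.append(step(c, left, right))

wall = np.concatenate(new_chunks)
```

Only the two edge values cross worker boundaries each step; the interior work is fully independent, which is why the speedup is roughly proportional to the number of workers.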
46
Benefits of parallelism
Allows the computer to be physically larger
If we had one million computers, then each computer would only have to do 8×10^9 operations per second
This would allow the computers to be about 3 cm apart
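A back-of-envelope Python check (ours) of those two numbers, assuming signals are bounded by the speed of light:

```python
# One million machines at 8e9 op/s each give 8e15 op/s in total,
# and light travels only a few cm during one 8 GHz clock period.
c = 3.0e8                       # speed of light, m/s
rate = 8.0e9                    # operations per second per machine
total = 1.0e6 * rate            # 8e15 op/s across a million machines
per_cycle = c / rate            # distance a signal can cover per op
print(total, per_cycle * 100)   # 8e+15  3.75 (cm, i.e., ~3 cm apart)
```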
47
Parallel processor configurations
In the airplane example, each processor in the 3D array (left) can be made responsible for a 3D chunk of space.
The global cross-bar switch is overkill in this case. A mesh network (below) is sufficient.
48
SMP & MPP paradigms
Symmetric Multi-Processor (SMP):
• two to hundreds of processors
• shared memory
• global addressing
[Diagram: several CPUs attached through a fast interconnect to one shared memory]

Massively Parallel Processor (MPP):
• thousands of processors
• distributed memory
• local addressing
[Diagram: many CPU + memory nodes joined by an interconnect]
49
Concurrency has also grown
DOE’s ASCI roadmap is to go to 100 Teraflop/s by 2006
Variety of vendors engaged: Compaq, Cray, Intel, IBM, SGI
Up to 8,192 processors
Relies on commodity processor/memory units, with tightly coupled network
50
Bird’s-eye View of the Earth Simulator System (Japan’s Earth Simulator)
[Diagram, 65 m × 50 m machine hall: processor node (PN) cabinets, interconnection network (IN) cabinets, a double floor for IN cables, disks, a cartridge tape library system, a power supply system, and an air conditioning system]
52
New architecture on horizon: Blue Gene/L
180 Tflop/s configuration (65,536 dual processor chips)
To be delivered to LLNL in 2004 by IBM
53
Gordon Bell Prize “peak performance”
Year  Type  Application  No. Procs  System        Gflop/s
1988  PDE   Structures           8  Cray Y-MP         1.0
1989  PDE   Seismic          2,048  CM-2              5.6
1990  PDE   Seismic          2,048  CM-2               14
1992  NB    Gravitation        512  Delta             5.4
1993  MC    Boltzmann        1,024  CM-5               60
1994  IE    Structures       1,904  Paragon           143
1995  MC    QCD                128  NWT               179
1996  PDE   CFD                160  NWT               111
1997  NB    Gravitation      4,096  ASCI Red          170
1998  MD    Magnetism        1,536  T3E-1200        1,020
1999  PDE   CFD              5,832  ASCI BluePac      627
2000  NB    Gravitation         96  GRAPE-6         1,349
2001  NB    Gravitation      1,024  GRAPE-6        11,550
2002  PDE   Climate         ~5,000  Earth Sim      26,500
Four orders of magnitude in 13 years
54
Gordon Bell Prize “price performance”
Year  Application           System   $ per Mflop/s
1989  Reservoir modeling    CM-2             2,500
1990  Electronic structure  IPSC             1,250
1992  Polymer dynamics      cluster          1,000
1993  Image analysis        custom             154
1994  Quant molecular dyn   cluster            333
1995  Comp fluid dynamics   cluster            278
1996  Electronic structure  SGI                159
1997  Gravitation           cluster             56
1998  Quant chromodyn       custom            12.5
1999  Gravitation           custom             6.9
2000  Comp fluid dynamics   cluster            1.9
2001  Structural analysis   cluster           0.24
Four orders of magnitude in 12 years
55
Gordon Bell Prize outpaces Moore’s Law
Four orders of magnitude in 13 years
[Slide shows photos of Gordon Moore and Gordon Bell (plus, as a joke, Demi Moore); the gap between the Gordon Bell Prize curve and the Moore’s Law curve is labeled “CONCURRENCY!!!”]
56
Problems with parallelism
Must find massive concurrency in the task
Still need many computers, each of which must be fast
Communication between computers becomes a dominant factor
Amdahl’s Law limits the speedup available based on remaining non-concurrent work
57
Amdahl’s Law (1967)
In 1967 Gene Amdahl (then at IBM) formulated his famous pessimistic formula about the speedup available from concurrency. If f is the fraction of the code that is parallelizable and P is the number of processors available, then the time T_P to run on P processors, as a function of the time T_1 to run on 1, is:

T_P = (f/P + (1 - f)) T_1

Speedup = T_1 / T_P = 1 / (f/P + (1 - f))

lim (P → ∞) Speedup = 1 / (1 - f)

[Plot: speedup vs. number of processors (4 to 1024) for f = 0.8, 0.9, 0.95, 0.99, 0.999, and 1.0; every curve except f = 1.0 flattens toward its 1/(1 - f) ceiling]
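A minimal Python sketch (ours) of the formula above, reproducing the plateau that the plot shows:

```python
# Amdahl's Law: speedup = 1 / ((1 - f) + f / P),
# bounded by 1 / (1 - f) no matter how large P becomes.
def amdahl_speedup(f, P):
    return 1.0 / ((1.0 - f) + f / P)

for f in (0.8, 0.9, 0.95, 0.99, 0.999, 1.0):
    row = [round(amdahl_speedup(f, P), 1) for P in (4, 16, 64, 256, 1024)]
    print(f, row)
# Even at f = 0.99, 1024 processors yield a speedup of only ~91.
```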
58
Algorithms
59
Algorithms are key
“I would rather have today’s algorithms on yesterday’s computers than vice versa.”
Philippe Toint
60
The power of optimal algorithms
Advances in algorithmic efficiency can rival advances in hardware architecture
Consider Poisson’s equation on a cube of size N = n^3
If n = 64, this implies an overall reduction in flops of ~16 million
Year  Method       Reference                Storage  Flops
1947  GE (banded)  Von Neumann & Goldstine  n^5      n^7
1950  Optimal SOR  Young                    n^3      n^4 log n
1971  CG           Reid                     n^3      n^3.5 log n
1984  Full MG      Brandt                   n^3      n^3

∇²u = f on a 64 × 64 × 64 grid
* Six months is reduced to 1 s
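The “~16 million” figure is just the ratio of the 1947 and 1984 flop counts at n = 64, as this one-line Python check (ours) shows:

```python
# Banded Gaussian elimination costs O(n^7) flops; full multigrid O(n^3).
n = 64
print(n**7 // n**3)  # 16777216 = n**4 = 2**24, about 16 million
```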
61
Algorithms and Moore’s Law
[Plot: relative speedup vs. year for the algorithmic advances above]
This advance took place over a span of about 36 years, or 24 doubling times for Moore’s Law
2^24 ≈ 16 million: the same as the factor from algorithms alone!
62
Computational Limitations: Triple-finiteness of computation
Finite wordlength
  A 64-bit word can represent how many numbers? 2^64 = (2^10)^6 × 2^4 ≈ 16 × (10^3)^6 = 16 × 10^18 (IEEE floating point allocates 53 of these bits to the mantissa)
Finite memory
  A 64-bit architecture typically addresses about 10^15 distinct words of memory; this is small compared to, e.g., Avogadro’s number
Finite number of operations
  A typical computation can get at most a few days on a teraflops-class machine, or about 10^17 flops
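The wordlength numbers above are easy to verify; the Python sketch below (ours) also shows the practical effect of the 53-bit mantissa:

```python
# A 64-bit word has 2**64 distinct patterns; IEEE double precision
# spends 53 bits (counting the implicit leading bit) on the mantissa.
import sys

print(2**64)                    # 18446744073709551616, ~1.8e19
print(sys.float_info.mant_dig)  # 53
print(1.0 + 2.0**-53 == 1.0)    # True: below the last mantissa bit
```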
63
Implications of triple-finiteness
Computational resolution fundamentally limited
Computational accuracy fundamentally limited, so that some computations that appear to be achievable are not, because of sensitivity to arithmetic error
High-performance computing has growth potential in an employment sense
64
Computational Difficulties: Conditioning, accuracy, stability
A problem is ill-conditioned if a small change in the input data can make a large change in the output
  characteristic of the problem, not of the process
  in severe cases, no algorithm can help
A process is unstable if the result it produces is sensitive to floating point errors made during its execution
  characteristic of the process, not of the problem
  another algorithm may help
Only if a stable algorithm is applied to a well-conditioned problem may an accurate result be expected
  if either stability or accuracy is missing, the computation may be a waste of time and the result misleading
We avoid at present the subject of nonuniqueness
  important in nonlinear problems
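A classic illustration of process instability, not taken from the slides: the textbook quadratic formula loses the small root of x^2 - 1e8 x + 1 = 0 to cancellation, while an algebraically equivalent rearrangement recovers it.

```python
# Solving x^2 - 1e8 x + 1 = 0: the roots are ~1e8 and ~1e-8.
import math

b, c = -1e8, 1.0
sq = math.sqrt(b * b - 4 * c)

unstable = (-b - sq) / 2    # subtracts two nearly equal numbers
stable = 2 * c / (-b + sq)  # rationalized form avoids the subtraction

print(unstable)  # ~7.5e-09: wrong (catastrophic cancellation)
print(stable)    # 1e-08: correct
```

The problem itself is well-conditioned; only the first process is unstable, so another algorithm helps, exactly as the slide says.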
65
Applications
66
Today’s “terascale simulation”
Environment: global climate, contaminant transport
Lasers & Energy: combustion, ICF
Engineering: crash testing, aerodynamics
Biology: drug design, genomics
Applied Physics: radiation transport, supernovae
In these, and many other areas, simulation is an important complement to experiment: experiments can be controversial, dangerous, prohibited or impossible, difficult to instrument, or expensive.
72
Examples: multiple-scale applications
Biopolymers, nanotechnology
  10^12 range in time, from 10^-15 sec (quantum fluctuation) to 10^-3 sec (molecular folding time)
  typical computational model ignores smallest scales, works on classical dynamics only, but scientists increasingly want both
Galaxy formation
  10^20 range in space, from binary star interactions to the diameter of the universe
  heroic computational model handles all scales with localized adaptive meshing
Supernovae simulation
  massive ranges in time and space scales for radiation, turbulent convection, diffusion, chemical reaction, nuclear reaction
[Image: supernova simulation, c/o A. Mezzacappa, ORNL]
73
Parallel Computation: Discretization of Space (Grid)
Construct “grid” of triangles
Construct “control volumes” surrounding each vertex
Compute effluxes
Compute influxes
Compute internal sources
Finally, sum all fluxes and sources (with proper sign) and adjust value at vertex; then loop over all such vertices.
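A minimal Python sketch (hypothetical data layout and flux function, not the course's code) of the loop just described: accumulate fluxes edge by edge with the proper sign, add the sources, and adjust each vertex value.

```python
# Vertex-centered control-volume update on a toy 1D vertex chain.
import numpy as np

n_vertices = 5
u = np.linspace(0.0, 1.0, n_vertices)     # value stored at each vertex
source = np.zeros(n_vertices)             # internal source per vertex
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]  # control-volume faces

def face_flux(ui, uj):
    """Toy flux, proportional to the jump across the face."""
    return uj - ui

dt = 0.1
flux_sum = np.zeros(n_vertices)
for i, j in edges:
    f = face_flux(u[i], u[j])
    flux_sum[i] += f                   # influx to vertex i
    flux_sum[j] -= f                   # efflux from vertex j (sign!)
u += dt * (flux_sum + source)          # adjust value at each vertex
```

Because each edge touches only two vertices, vertices whose control volumes share no face can be updated simultaneously: the same independence exploited in the airplane example earlier.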
74
Adaptive Cartesian mesh
far field / near field
inviscid shock
75
Adaptive triangular mesh
viscous boundary layer
76
Unstructured grid for complex geometry
slat, flaps
77
Scientific visualization adds insight
The computer becomes an experimental laboratory, like a wind tunnel, and can be outfitted with diagnostics and imaging intuitive to wind-tunnel experimentalists.
78
What would scientists do with 100-1000x? Example: predicting future climates
Resolution
  refine horizontal from 160 to 40 km
  refine vertical from 105 to 15 km
New physics
  atmospheric chemistry
  carbon cycle
  dynamic terrestrial vegetation (nitrogen and sulfur cycles and land-use and land-cover changes)
Improved representation of subgrid processes
  clouds
  atmospheric radiative transfer
79
What would scientists do with 100-1000x? Example: predicting future climates
Simulations at various resolutions using the Los Alamos Parallel Ocean Program (POP) have demonstrated that, because equatorial meso-scale eddies have diameters of ~10-200 km, the grid spacing must be < 10 km to adequately resolve the eddy spectrum. This is illustrated qualitatively in the set of four images of sea-surface temperature (SST). Figure (a) shows a snapshot of SST from satellite observations, while the three other figures are snapshots from POP simulations at resolutions of (b) 2°, (c) 0.28°, and (d) 0.1°. The narrow, meandering current off the coast of Japan is the Kuroshio Current.
80
Parallel Computation: N-body problem
Examples: Molecular Mechanics, Galaxy Modeling
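As a baseline for both examples, here is a minimal direct-sum Python sketch (ours, with toy units): every particle interacts with every other particle, which is the O(N^2) cost that parallel computers, and cleverer algorithms, aim to tame.

```python
# Direct-sum N-body accelerations (gravity-like, toy units).
import numpy as np

rng = np.random.default_rng(0)
n = 100
pos = rng.random((n, 3))       # particle positions
mass = np.ones(n)              # particle masses
G = 1.0                        # coupling constant (illustrative)

acc = np.zeros((n, 3))
for i in range(n):
    r = pos - pos[i]                        # vectors to all others
    d2 = (r * r).sum(axis=1) + 1e-12        # softened squared distance
    d2[i] = np.inf                          # skip self-interaction
    acc[i] = G * (mass[:, None] * r / d2[:, None] ** 1.5).sum(axis=0)
# Each of the N rows touches all N particles: N**2 pair interactions.
```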