Applying a new software development paradigm to biology
Cover Page
Uploaded June 26, 2011
Applying a New
Software Development
Paradigm to Biology
Authors: M. C. Giddings and Jeffrey G. Long ([email protected])
Date: May 7, 2003
Forum: Poster session presented at the Genome Informatics Conference, sponsored
by Cold Spring Harbor Laboratory.
Contents
Page 1: Abstract
Pages 2‐20: Slides (but no text) for presentation
License
This work is licensed under the Creative Commons Attribution‐NonCommercial
3.0 Unported License. To view a copy of this license, visit
http://creativecommons.org/licenses/by‐nc/3.0/ or send a letter to Creative
Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.
Genome Informatics Long Preference: Oral presentation

APPLYING A NEW SOFTWARE DEVELOPMENT PARADIGM TO BIOLOGY

M.C. Giddings, University of North Carolina; J. Long

Rules are typically hard-coded into software applications, and the maintenance of these rules as they change, due to updated domain knowledge or user requirements, results in a significant time and cost expenditure. Subject experts must communicate the rules they wish to see automated to programmers who often are not experts in the subject matter of the application; much can be lost in the translation. As this process continues through time, software systems become large and unwieldy, such that no one involved in a project can comprehend or manage it as a whole. There have been numerous initiatives directed at solving these problems, but the solutions have been only partially useful because the problems they address are actually secondary and symptomatic rather than primary.

The premise of Ultra-Structure theory is that these issues can be addressed by removing most rules and all knowledge of the world from software and instead representing them the same way we represent data, i.e. as tables in a relational database. This approach combines key features of the normally disparate areas of management information systems, expert systems, and simulations, borrowing the strengths of each and potentially eliminating some of the known problems of each.

Ultra-Structure has been applied to a variety of rule-based systems, and we are investigating its utility for biology. In particular, we’ve been building a multi-function prototype that can be used to store, in an integrated and manageable way, laboratory results, simulations, and general biological knowledge pertaining to microbial genomics and proteomics research efforts. Based on results thus far we believe the approach warrants further investigation.
The presentation is intended to introduce Ultra-Structure theory, discuss the prototype biological system being developed, and generate discussion with our peers about the benefits and pitfalls of this approach.
Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time
Morgan Giddings and Jeff Long
Genome Informatics
[email protected], [email protected]
Fundamental Hypothesis of Notational Engineering
Many problems in government, science, business, the arts, and engineering exist solely because of the way we currently represent them. These problems present an apparent “complexity barrier” and cannot be resolved with more computing power or more money. Their resolution requires a new abstraction, which becomes the basis of a notational revolution and solves a whole class of previously intractable problems.
May 2003
A New Notational System Often Requires a Change of Paradigm
A way of looking at a subject
An example, pattern, archetype, or model
A set of unconscious assumptions we have about a subject
Current Paradigm Assumption 1
Computer applications are defined in terms of algorithms and data
Algorithms are the rules which are used to manipulate the data; data and rules are distinct
The model for this is the abacus
When using computer systems, algorithms are implemented as software
But all knowledge should be stored in a formal (executable), “public”, and readily updateable format
Current Paradigm Assumption 2
Software can be designed using the same approaches as other engineering fields
– e.g. civil, electrical, or aeronautical engineering, using the “waterfall” development methodology
– but it’s not the same: in addition to being complex, software and the requirements it supports are dynamic and change greatly over short periods of time
A new design approach is required that can handle both complexity and changing requirements
Current Paradigm Assumption 3
Subject experts can communicate their requirements to programmers
– but their expertise took many years to acquire
– their own understanding will evolve
But subject experts must see working prototypes, not paper representations (e.g. flowcharts, OO diagrams), in order to truly understand what they will be getting
Subject experts must be able to directly and continuously update an application’s rules as needed
Ultra-Structure Addresses These Issues
Remove 99% of all rules from the software
Represent them in a standard If/Then form (multiple ‘Ifs’, multiple ‘Thens’)
Represent them as records of data within a very small set of tables
Distinction between rules and data largely disappears!
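As a rough illustration of this idea, the sketch below contrasts a hard-coded rule with the same rule held as a data record in If/Then form. The record layout, the field names, the example rules, and the `animate` function are all hypothetical; they are not taken from the prototype.

```python
# Hypothetical sketch: rules as data records rather than code.
# All names and example rules here are illustrative assumptions.

# Hard-coded version: changing the rule means changing the software.
def hard_coded(sample):
    if sample["kind"] == "protein" and sample["purity_ok"]:
        return ["run_mass_spec", "record_result"]
    return []

# Data version: each record carries multiple 'Ifs' and multiple 'Thens'.
RULES = [
    {"ifs": {"kind": "protein", "purity_ok": True},
     "thens": ["run_mass_spec", "record_result"]},
    {"ifs": {"kind": "protein", "purity_ok": False},
     "thens": ["repurify"]},
]

def animate(facts, rules):
    """Return the 'Thens' of every rule whose 'Ifs' all match the facts."""
    considered = []
    for rule in rules:
        if all(facts.get(key) == value for key, value in rule["ifs"].items()):
            considered.extend(rule["thens"])
    return considered

facts = {"kind": "protein", "purity_ok": True}
print(animate(facts, RULES))
```

In this sketch, adding or changing a rule means editing a record in `RULES`; `animate` itself stays the same.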
We Need a More Insightful Way to Look at Complex Systems and Processes
[Figure: three levels of structure. Rules (middle structure) generate observables (surface structure); groups of rules (deep structure) constrain the rules.]
The Ruleform Hypothesis
Complex system structures are created by not-necessarily complex processes; and these processes are created by the animation of competency rules. Competency rules can be grouped into a small number of classes whose form is prescribed by "ruleforms". While the competency rules of a system change over time, the ruleforms remain constant. A well-designed collection of ruleforms can anticipate all logically possible competency rules that might apply to the system, and constitutes the deep structure of the system.
How are Rules Best Represented?
Statement of rules and device for executing them can be different; need not be software for both
Rules can be reformulated into a canonical form of “If a and b and c... then consider x and y and z”
Thousands or millions of rules can be grouped into 10-50 ruleforms (classes of rules) based on their syntax and semantics
These ruleforms can be implemented as tables in a RDBMS and managed easily by standard RDBMS tools; the application essentially becomes an Expert System using a RDBMS
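A minimal sketch of one ruleform held as a relational table, using Python's standard sqlite3 module as a stand-in RDBMS. The table name, the column names, and the example rows are assumptions made for illustration only; the slides do not specify a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One ruleform = one table. Factor columns hold the 'Ifs';
# consideration columns hold the 'Thens'. (Hypothetical schema.)
conn.execute("""
    CREATE TABLE protocol_ruleform (
        factor_a TEXT, factor_b TEXT, factor_c TEXT,
        consider_x TEXT, consider_y TEXT, consider_z TEXT
    )
""")
conn.executemany(
    "INSERT INTO protocol_ruleform VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("order", "domestic", "in_stock", "pick", "pack", "ship"),
        ("order", "domestic", "out_of_stock", "backorder", "notify", None),
    ],
)

def consider(conn, a, b, c):
    """If a and b and c, return the considerations x, y, z that apply."""
    rows = conn.execute(
        "SELECT consider_x, consider_y, consider_z FROM protocol_ruleform "
        "WHERE factor_a = ? AND factor_b = ? AND factor_c = ?",
        (a, b, c),
    ).fetchall()
    # Flatten the rows and drop empty consideration slots.
    return [value for row in rows for value in row if value is not None]

# A rule is changed with an ordinary UPDATE, not a code change.
conn.execute(
    "UPDATE protocol_ruleform SET consider_y = 'email_customer' "
    "WHERE factor_c = 'out_of_stock'"
)
print(consider(conn, "order", "domestic", "out_of_stock"))
```

Because the rules are ordinary rows, standard RDBMS tools (queries, backups, access control) apply to them directly.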
What is the Design Process?
Design proceeds by iterative prototype with monthly feedback from users; small prototypes can easily evolve to any necessary level of complexity
Basic design process is to:
– define what exists (existential rules)
– define relations between these (network & authorization rules)
– define processes (protocol & meta-protocol rules)
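The three design steps above might map onto tables roughly as follows. This is only a sketch under assumed table and column names (`entity`, `network`, `protocol`); authorization and meta-protocol rules are left out for brevity, and the example rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Existential rules: what exists.
    CREATE TABLE entity (name TEXT PRIMARY KEY, kind TEXT);
    -- Network rules: how existing things relate to each other.
    CREATE TABLE network (subject TEXT, relation TEXT, object TEXT);
    -- Protocol rules: what to do, in what order, when something happens.
    CREATE TABLE protocol (event_kind TEXT, step INTEGER, activity TEXT);
""")
conn.executemany("INSERT INTO entity VALUES (?, ?)",
                 [("amino_acid", "class"), ("glycine", "compound")])
conn.executemany("INSERT INTO network VALUES (?, ?, ?)",
                 [("glycine", "is_a", "amino_acid")])
conn.executemany("INSERT INTO protocol VALUES (?, ?, ?)",
                 [("sample_received", 2, "assign_technician"),
                  ("sample_received", 1, "log_sample")])

def related(conn, subject, relation):
    """Follow one network rule from a subject."""
    rows = conn.execute(
        "SELECT object FROM network WHERE subject = ? AND relation = ?",
        (subject, relation),
    ).fetchall()
    return [row[0] for row in rows]

def steps_for(conn, event_kind):
    """Return the activities for an event, ordered by step number."""
    rows = conn.execute(
        "SELECT activity FROM protocol WHERE event_kind = ? ORDER BY step",
        (event_kind,),
    ).fetchall()
    return [row[0] for row in rows]

print(related(conn, "glycine", "is_a"))
print(steps_for(conn, "sample_received"))
```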
Ultra-Structure Benefits
Software size is reduced by 2+ orders of magnitude
– simpler to create, manage, understand, test, document, and teach
– remaining software has no knowledge of the world; it provides basic control logic that knows what tables to check in what order, how to resolve conflicts, etc.
The development team is very small (e.g. <10 people) and is therefore much more manageable than a large team of dozens or hundreds of developers, and it does a better job by any metric
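The kind of control logic described above (which tables to check in what order, and how to resolve conflicts) could look something like the sketch below. The tables, the wildcard convention, and the "most specific rule wins" conflict policy are assumptions chosen for illustration; the slides do not say how conflicts are actually resolved.

```python
# Hypothetical animation procedure: fixed control logic, no domain knowledge.
WILDCARD = "*"

# Each "table" is a list of (factors, consideration) records.
TABLES = {
    "authorization": [
        (("technician", "*"), "allow"),
        (("technician", "delete_result"), "deny"),
    ],
    "protocol": [
        (("*", "*"), "log_event"),
    ],
}

# The software knows only the order in which to consult the tables.
CHECK_ORDER = ["authorization", "protocol"]

def matches(pattern, facts):
    """A pattern matches when every factor is a wildcard or equals the fact."""
    return all(p == WILDCARD or p == f for p, f in zip(pattern, facts))

def specificity(pattern):
    """Count the non-wildcard factors; higher means more specific."""
    return sum(p != WILDCARD for p in pattern)

def animate(facts):
    """Check each table in order; the most specific matching rule wins."""
    decisions = []
    for name in CHECK_ORDER:
        hits = [(pat, out) for pat, out in TABLES[name] if matches(pat, facts)]
        if hits:
            _, outcome = max(hits, key=lambda hit: specificity(hit[0]))
            decisions.append((name, outcome))
    return decisions

print(animate(("technician", "delete_result")))
print(animate(("technician", "enter_result")))
```

Nothing in `animate` mentions technicians or results; that knowledge lives entirely in `TABLES`.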
Ultra-Structure Benefits (cont’d)
Most knowledge is externalized and is in a form anyone can see and understand
Subject experts can enter, change, and otherwise manage rules (knowledge) directly, without going to programmers for assistance
Knowledge is actionable not only by subject experts (e.g. as an encyclopedia) but also by the computer, for reasoning, simulations, decision support, etc.
Ultra-Structure Benefits (cont’d)
Programmers do not need to know or understand all rules, just enough to determine the classes of rules and the proper animation procedures
Serious prototyping becomes feasible; communication with users improves
Testing & QA can be far more rigorous
Documentation can be more complete
Early Prototype of Biology Model
An integrated prototype has been developed to:
– simulate simple RNA->polypeptide process
– store and analyze laboratory results
– store general biological and chemical knowledge
– compare simulated and actual lab results
– track sources of knowledge
Key conceptual components of model include:
– BioEntities (chemical elements and compounds, biological objects such as amino acids and RNA, lab techs)
– BioEvents (activities engaged in by BioEntities)
– resources (people, books, lab equipment that provided information used in model)
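To make the RNA->polypeptide simulation concrete, here is a small sketch in which a few entries of the standard genetic code are held as data and a generic loop applies them. It is not the prototype's implementation: the function name, the data layout, the example sequence, and the "observed" lab result are all invented for illustration, and the codon table is deliberately partial.

```python
# Partial genetic code held as data (codon -> amino acid); not a full table.
CODON_RULES = {
    "AUG": "Met",  # also the start codon
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}

def translate(rna, rules):
    """Translate from the first AUG until a stop codon or the end of the RNA."""
    start = rna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        codon = rna[i:i + 3]
        residue = rules.get(codon)
        if residue is None:
            raise KeyError(f"no rule for codon {codon}")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide

simulated = translate("GGAUGUUUGGCAAAUAAGC", CODON_RULES)
observed = ["Met", "Phe", "Gly", "Lys"]  # made-up stand-in for a lab result
print(simulated, simulated == observed)
```

Comparing `simulated` with `observed` mirrors, in miniature, the prototype's goal of comparing simulated and actual lab results.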
Examples of BioEntities
Possible Relations between BioEntities and/or BioEvents
Hopefully, this model can be generalized (The CoRE Hypothesis)
We can create “Competency Rule Engines”, or CoREs, consisting of <50 ruleforms, that are sufficient to represent all rules found among systems sharing broad family resemblances, e.g. all corporations. Their definitive deep structure will be permanent, unchanging, and robust for all members of the family, whose differences in manifest structures and behaviors will be represented entirely as differences in competency rules. The animation procedures for each engine will be relatively simple compared to current applications, requiring less than 100,000 lines of code in a third generation language.
References
Long, J., and Denning, D., “Ultra-Structure: A design theory for complex systems and processes.” In Communications of the ACM (January 1995)
Long, J., “A new notation for representing business and other rules.” In Long, J. (guest editor), Semiotica Special Issue on Notational Engineering, Volume 125-1/3 (1999)
Long, J., “How could the notation be the limitation?” In Long, J. (guest editor), Semiotica Special Issue on Notational Engineering, Volume 125-1/3 (1999)
Long, J., “Automated Identification of Sensitive Information in Documents Using Ultra-Structure.” In Proceedings of the 20th Annual ASEM Conference, American Society for Engineering Management (October 1999)