Applying a new software development paradigm to biology

Cover Page. Uploaded June 26, 2011. Applying a New Software Development Paradigm to Biology. Authors: M. C. Giddings and Jeffrey G. Long ([email protected]). Date: May 7, 2003. Forum: Poster session presented at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory. Contents: Page 1: Abstract; Pages 2-20: Slides (but no text) for presentation. License: This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

description

May 7-11, 2003: Giddings, M. C. and Long, J. “Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time”. Poster session presented with Dr. M. C. Giddings, of the University of North Carolina, Chapel Hill, at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory.

Transcript of Applying a new software development paradigm to biology

Page 1: Applying a new software development paradigm to biology

Cover Page

Uploaded June 26, 2011

Applying a New Software Development Paradigm to Biology

Authors: M. C. Giddings and Jeffrey G. Long ([email protected])

Date: May 7, 2003

Forum: Poster session presented at the Genome Informatics Conference, sponsored by Cold Spring Harbor Laboratory.

Contents

Page 1: Abstract

Pages 2-20: Slides (but no text) for presentation

License

This work is licensed under the Creative Commons Attribution-NonCommercial 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/ or send a letter to Creative Commons, 444 Castro Street, Suite 900, Mountain View, California, 94041, USA.

Page 2: Applying a new software development paradigm to biology

Genome Informatics

Long

Preference: Oral presentation

APPLYING A NEW SOFTWARE DEVELOPMENT PARADIGM TO BIOLOGY

M.C. Giddings, University of North Carolina; J. Long

Rules are typically hard-coded into software applications, and the maintenance of these rules as they change, due to updated domain knowledge or user requirements, results in a significant time and cost expenditure. Subject experts must communicate the rules they wish to see automated to programmers who often are not experts in the subject matter of the application; much can be lost in the translation. As this process continues through time, software systems become large and unwieldy, such that no one involved in a project can comprehend or manage it as a whole. There have been numerous initiatives directed at solving these problems, but the solutions have been only partially useful because the problems they address are actually secondary and symptomatic rather than primary.

The premise of Ultra-Structure theory is that these issues can be addressed by removing most rules and all knowledge of the world from software and instead representing them the same way we represent data, i.e. as tables in a relational database. This approach combines key features of the normally disparate areas of management information systems, expert systems, and simulations, borrowing the strengths of each and potentially eliminating some of the known problems of each.

Ultra-Structure has been applied to a variety of rule-based systems, and we are investigating its utility for biology. In particular, we’ve been building a multi-function prototype that can be used to store, in an integrated and manageable way, laboratory results, simulations, and general biological knowledge pertaining to microbial genomics and proteomics research efforts. Based on results thus far we believe the approach warrants further investigation.

The presentation is intended to introduce Ultra-Structure theory, discuss the prototype biological system being developed, and generate discussion with our peers about the benefits and pitfalls of this approach.

Page 3: Applying a new software development paradigm to biology

Applying a New Software Development Paradigm to Biology: Developing applications that handle complexity and stand the test of time

Morgan Giddings and Jeff Long

Genome Informatics

[email protected], [email protected]

Page 4: Applying a new software development paradigm to biology

Fundamental Hypothesis of Notational Engineering

Many problems in government, science, business, the arts, and engineering exist solely because of the way we currently represent them. These problems present an apparent “complexity barrier” and cannot be resolved with more computing power or more money. Their resolution requires a new abstraction, which becomes the basis of a notational revolution and solves a whole class of previously intractable problems.

May 2003

Page 5: Applying a new software development paradigm to biology

A New Notational System Often Requires a Change of Paradigm

A way of looking at a subject

An example, pattern, archetype, or model

A set of unconscious assumptions we have about a subject


Page 6: Applying a new software development paradigm to biology

Current Paradigm Assumption 1

Computer applications are defined in terms of algorithms and data

Algorithms are the rules which are used to manipulate the data; data and rules are distinct

The model for this is the abacus

When using computer systems, algorithms are implemented as software

But all knowledge should be stored in a formal (executable), “public”, and readily updateable format

Page 7: Applying a new software development paradigm to biology

Current Paradigm Assumption 2

Software can be designed using the same approaches as other engineering fields

– e.g. civil, electrical, or aeronautical engineering, using the “waterfall” development methodology

– but it’s not the same: in addition to being complex, software and the requirements it supports are dynamic and change greatly over short periods of time

A new design approach is required that can handle both complexity and changing requirements


Page 8: Applying a new software development paradigm to biology

Current Paradigm Assumption 3

Subject experts can communicate their requirements to programmers

– but their expertise took many years to acquire

– their own understanding will evolve

But subject experts must see working prototypes, not paper representations (e.g. flowcharts, OO diagrams), in order to truly understand what they will be getting

Subject experts must be able to directly and continuously update an application’s rules as needed


Page 9: Applying a new software development paradigm to biology

Ultra-Structure Addresses These Issues

Remove 99% of all rules from the software

Represent them in a standard If/Then form (multiple ‘Ifs’, multiple ‘Thens’)

Represent them as records of data within a very small set of tables

Distinction between rules and data largely disappears!
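As a rough illustration of rules held as records rather than code, the sketch below stores each rule as a plain data row with several "if" factors and several "then" considerations. The field names, the sample rules, and the `applicable` helper are hypothetical and are not taken from the authors' prototype.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One rule stored as data: several 'if' factors, several 'then' considerations."""
    ifs: tuple    # every factor must be present for the rule to apply
    thens: tuple  # considerations returned when the rule applies


# Hypothetical sample rules; in Ultra-Structure these would be rows in a table.
RULES = [
    Rule(ifs=("sample_is_protein", "ph_below_3"),
         thens=("flag_denaturation", "notify_lab")),
    Rule(ifs=("sample_is_protein",),
         thens=("store_at_4C",)),
]


def applicable(rules, facts):
    """Return the considerations of every rule whose 'if' factors all hold."""
    found = []
    for rule in rules:
        if all(factor in facts for factor in rule.ifs):
            found.extend(rule.thens)
    return found
```

Changing behavior here means editing `RULES`, not `applicable`, which is the sense in which the rule/data distinction fades.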


Page 10: Applying a new software development paradigm to biology

We Need a More Insightful Way to Look at Complex Systems and Processes

[Diagram: three levels of structure. Surface structure (observables) is generated by middle structure (rules), which is constrained by deep structure (groups of rules).]


Page 11: Applying a new software development paradigm to biology

The Ruleform Hypothesis

Complex system structures are created by not-necessarily complex processes; and these processes are created by the animation of competency rules. Competency rules can be grouped into a small number of classes whose form is prescribed by "ruleforms". While the competency rules of a system change over time, the ruleforms remain constant. A well-designed collection of ruleforms can anticipate all logically possible competency rules that might apply to the system, and constitutes the deep structure of the system.


Page 12: Applying a new software development paradigm to biology

How are Rules Best Represented?

Statement of rules and device for executing them can be different; need not be software for both

Rules can be reformulated into a canonical form of “If a and b and c... then consider x and y and z”

Thousands or millions of rules can be grouped into 10-50 ruleforms (classes of rules) based on their syntax and semantics

These ruleforms can be implemented as tables in a RDBMS and managed easily by standard RDBMS tools; the application essentially becomes an Expert System using a RDBMS
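A minimal sketch of one ruleform as a relational table, assuming SQLite as a stand-in RDBMS. The table name `protocol_rule`, its columns, and the sample rows are hypothetical; the slides do not give the prototype's actual schema.

```python
import sqlite3


def build_db():
    """Create an in-memory database holding one hypothetical ruleform table."""
    conn = sqlite3.connect(":memory:")
    # Each row is one rule in the canonical form "if a and b then consider x and y".
    conn.execute(
        """CREATE TABLE protocol_rule (
               if_a   TEXT NOT NULL,
               if_b   TEXT NOT NULL,
               then_x TEXT NOT NULL,
               then_y TEXT
           )"""
    )
    conn.executemany(
        "INSERT INTO protocol_rule VALUES (?, ?, ?, ?)",
        [
            ("protein_sample", "gel_run", "record_mass", "compare_to_simulation"),
            ("protein_sample", "received", "assign_lab_tech", None),
        ],
    )
    return conn


def consider(conn, a, b):
    """Return every 'then' consideration of the rules matching factors a and b."""
    rows = conn.execute(
        "SELECT then_x, then_y FROM protocol_rule WHERE if_a = ? AND if_b = ?",
        (a, b),
    ).fetchall()
    return [value for row in rows for value in row if value is not None]
```

Adding a rule is an `INSERT` into `protocol_rule`; the `consider` function itself does not change.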

Page 13: Applying a new software development paradigm to biology

What is the Design Process?

Design proceeds by iterative prototype with monthly feedback from users; small prototypes can easily evolve to any necessary level of complexity

Basic design process is to:

– define what exists (existential rules)

– define relations between these (network & authorization rules)

– define processes (protocol & meta-protocol rules)
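The first two design steps can be sketched as two tables, one saying what exists and one relating those things, with the database refusing relations between things that have not been defined. This is an assumption-laden illustration using SQLite: the table names `existential` and `network`, their columns, and the helper functions are hypothetical.

```python
import sqlite3


def build_schema():
    """Create hypothetical existential and network ruleform tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(
        """
        CREATE TABLE existential (name TEXT PRIMARY KEY, kind TEXT NOT NULL);
        CREATE TABLE network (
            parent   TEXT NOT NULL REFERENCES existential(name),
            relation TEXT NOT NULL,
            child    TEXT NOT NULL REFERENCES existential(name)
        );
        """
    )
    return conn


def add_entity(conn, name, kind):
    """Step 1: define what exists."""
    conn.execute("INSERT INTO existential (name, kind) VALUES (?, ?)", (name, kind))


def relate(conn, parent, relation, child):
    """Step 2: define a relation between two things that already exist."""
    conn.execute(
        "INSERT INTO network (parent, relation, child) VALUES (?, ?, ?)",
        (parent, relation, child),
    )


def children(conn, parent, relation):
    """List everything linked to parent by the given relation."""
    rows = conn.execute(
        "SELECT child FROM network WHERE parent = ? AND relation = ? ORDER BY child",
        (parent, relation),
    )
    return [row[0] for row in rows]
```

Relating an undefined name raises `sqlite3.IntegrityError`, so the order of the design steps is enforced by the data model rather than by application code.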

Page 14: Applying a new software development paradigm to biology

Ultra-Structure Benefits

Software size is reduced by 2+ orders of magnitude

– simpler to create, manage, understand, test, document, and teach

– remaining software has no knowledge of the world; it provides basic control logic that knows what tables to check in what order, how to resolve conflicts, etc.

The development team is very small (e.g. <10 people) and is therefore much more manageable than a large team of dozens or hundreds of developers, and it does a better job by any metric
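One way to picture the remaining control logic is a small loop that knows only which rule tables to check, in what order, and how to break ties. The sketch below assumes a priority field for conflict resolution; that choice, the `animate` function, and the sample tables are hypothetical and not described in the slides.

```python
def animate(tables, order, facts):
    """Check rule tables in the given order; the highest-priority match in each table wins."""
    decisions = []
    for table_name in order:
        matches = [
            rule for rule in tables[table_name]
            if set(rule["ifs"]) <= facts  # all 'if' factors are among the known facts
        ]
        if matches:
            winner = max(matches, key=lambda rule: rule["priority"])
            decisions.append((table_name, winner["then"]))
            facts = facts | {winner["then"]}  # later tables can build on this decision
    return decisions


# Hypothetical rule tables; the engine above knows nothing about their subject matter.
TABLES = {
    "authorization": [
        {"ifs": ["user_is_lab_tech"], "then": "may_enter_results", "priority": 1},
    ],
    "protocol": [
        {"ifs": ["may_enter_results", "result_is_mass_spec"],
         "then": "store_result", "priority": 1},
        {"ifs": ["may_enter_results", "result_is_mass_spec", "mass_out_of_range"],
         "then": "flag_for_review", "priority": 2},
    ],
}
```

For example, `animate(TABLES, ["authorization", "protocol"], {"user_is_lab_tech", "result_is_mass_spec"})` consults the authorization table first and then the protocol table, with no domain knowledge in the loop itself.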

Page 15: Applying a new software development paradigm to biology

Ultra-Structure Benefits (cont’d)

Most knowledge is externalized and is in a form anyone can see and understand

Subject experts can enter, change, and otherwise manage rules (knowledge) directly, without going to programmers for assistance

Knowledge is actionable not only by subject experts (e.g. as an encyclopedia) but also by the computer, for reasoning, simulations, decision support, etc.

Page 16: Applying a new software development paradigm to biology

Ultra-Structure Benefits (cont’d)

Programmers do not need to know or understand all rules, just enough to determine the classes of rules and the proper animation procedures

Serious prototyping becomes feasible; communication with users improves

Testing & QA can be far more rigorous

Documentation can be more complete

Page 17: Applying a new software development paradigm to biology

Early Prototype of Biology Model

An integrated prototype has been developed to:

– simulate simple RNA->polypeptide process

– store and analyze laboratory results

– store general biological and chemical knowledge

– compare simulated and actual lab results

– track sources of knowledge

Key conceptual components of model include:

– BioEntities (chemical elements and compounds, biological objects such as amino acids and RNA, lab techs)

– BioEvents (activities engaged in by BioEntities)

– resources (people, books, lab equipment that provided information used in model)
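To make the RNA->polypeptide simulation concrete, the sketch below keeps the codon-to-amino-acid rules as data and uses a generic loop to apply them, then compares the simulated result with an observed one. It is an illustration only: the table holds a small subset of the standard genetic code, and `translate` and `mismatches` are hypothetical names, not the prototype's.

```python
# Illustrative subset of the standard genetic code, held as data rather than code.
CODON_RULES = {
    "AUG": "Met",
    "UUU": "Phe",
    "GGC": "Gly",
    "GCU": "Ala",
    "AAA": "Lys",
    "UAA": "STOP",
    "UAG": "STOP",
    "UGA": "STOP",
}


def translate(rna, rules=CODON_RULES):
    """Read codons from the first AUG; stop at a stop codon or a codon with no rule."""
    start = rna.find("AUG")
    if start == -1:
        return []
    peptide = []
    for i in range(start, len(rna) - 2, 3):
        residue = rules.get(rna[i:i + 3])
        if residue is None or residue == "STOP":
            break
        peptide.append(residue)
    return peptide


def mismatches(simulated, observed):
    """Return positions where residues differ (only overlapping positions are compared)."""
    return [i for i, (s, o) in enumerate(zip(simulated, observed)) if s != o]
```

Extending the simulation to more codons means adding entries to `CODON_RULES`; `translate` stays the same.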

Page 18: Applying a new software development paradigm to biology

Examples of BioEntities


Page 19: Applying a new software development paradigm to biology

Possible Relations between BioEntities and/or BioEvents


Page 20: Applying a new software development paradigm to biology

Hopefully, this model can be generalized (The CoRE Hypothesis)

We can create “Competency Rule Engines”, or CoREs, consisting of <50 ruleforms, that are sufficient to represent all rules found among systems sharing broad family resemblances, e.g. all corporations. Their definitive deep structure will be permanent, unchanging, and robust for all members of the family, whose differences in manifest structures and behaviors will be represented entirely as differences in competency rules. The animation procedures for each engine will be relatively simple compared to current applications, requiring less than 100,000 lines of code in a third generation language.

