Ernst - Unknown - Comparative Genomics and the Gene Concept


COMPARATIVE GENOMICS AND THE GENE CONCEPT

ZACHARY ERNST

ABSTRACT. The gene concept has fallen on hard times in the philosophy of biology. Although we are confronted on a regular basis with reports that ‘the gene for such-and-such’ has been discovered, the received view in the philosophy of biology is that current work in genomics shows that there is no such thing as the gene. In this paper, I argue that such a skeptical conclusion is unwarranted. In fact, contemporary work in genomics not only shows us that the gene does exist, but also points the way toward a precise characterization of the gene concept. In the course of making this argument, I provide an overview of one contemporary approach to gene discovery and genome annotation that makes crucial use of techniques from computer science.

1. INTRODUCTION

If there is a philosophical consensus on the status of the gene, it is that current research into molecular biology shows the gene to be an outmoded concept. John Dupré put the point succinctly when he said that such modern research was ‘the beginning of the end’ of the traditional concept of the Mendelian gene. This argument owes much to the work of David Hull [8,9], whose classic skeptical stance on the reality of the gene has become something of a received view.

But the received view is mistaken; we have good reason to hold onto a suitably revised gene concept. In this paper, I will argue that doubts about the gene concept are rooted in a faulty theory of reference for theoretical terms. When we critically examine how the theory of reference should be applied to terms such as ‘gene’, we see that we must attend to the details of contemporary genomics research if we are to determine whether genes exist. Accordingly, this paper provides an overview of one approach to comparative genomics research. This research strongly suggests a revised but recognizable gene concept, one that crucially appeals to the evolution of modularity. Thus, while I propose a positive solution to the problem of characterizing the gene concept, genomics research also focuses our attention on another (and perhaps more important) problem: understanding why natural selection sometimes seems to favor the evolution of highly modular structures.

Date: April 10, 2008.

Many thanks to Ross Overbeek for his instruction at Argonne National Laboratory, and to Alexander Rosenberg for saving me from a couple of awful howlers in this paper.


2. THE LOGIC OF SKEPTICAL ARGUMENTS

In order to motivate the central argument of this paper, I shall summarize and critique common arguments that aim to establish that the gene does not exist. Once these skeptical arguments have been criticized, we shall have better motivation to examine current research into genomics in the following sections.

Current research into molecular biology has dashed all hope of a simple molecular implementation of the Mendelian gene. If we had hoped that genes would supervene on simple, contiguous, easily identifiable stretches of DNA, then we must at least lower our expectations. Although classical genetics makes use of notions of dominant or recessive genes, it is now well understood that such concepts are, at best, useful but severe idealizations. Genes (if they exist) are neither implemented nor inherited in a simple, straightforward manner.

It is from these uncontroversial premises that skeptics about the gene – including John Dupré and David Hull – make their arguments. These arguments draw upon premises that are often pressed into service for anti-reductionist arguments concerning the gene. Indeed, I shall argue that the skeptical arguments are too closely related to these anti-reductionist arguments.

According to these skeptical arguments, the term ‘gene’ is supposed to refer to whatever entity implements the mechanisms of inheritance in a way that approximates the classical Mendelian theory of inheritance. So genes exist only if something does implement inheritance in such a way. But when we begin to investigate how various segments of DNA implement the mechanisms of inheritance, we quickly discover that there is no simple story to be told. The same, or functionally equivalent, phenotypic characteristics are famously understood to be multiply realized by many different possible segments of DNA [26]. Furthermore, owing to complications arising from developmental facts, identical segments of DNA may instantiate different phenotypic characteristics. The point is a familiar one from anti-reductionist arguments, namely, that the relationship between genotype and phenotype is hopelessly many-many, and not capable of any simple characterization by any finite set of bridge laws.

It should strike us as odd that these premises – which are typically the premises of anti-reductionist arguments – should be pressed into service to support a non-existence claim about genes. After all, reductionist theses are typically understood as claims about explanations and terms; that is, reductionism is a linguistic thesis. But existence claims are obviously ontological theses. Alan Garfinkel puts the point succinctly:

So reductionism, which is on its face an ontological question, is really a question about the possibility of explanation: to say that something reduces to something else is to say that certain kinds of explanations exist. [5, p. 443]


Thus, it might appear at first glance to be a non sequitur when Hull and Dupré argue that genes do not exist by providing premises of anti-reductionist arguments. So it is important to try to reconstruct this line of reasoning in more detail.

It is well appreciated that, during the Modern Synthesis, it was Mendel’s work on the mechanisms of inheritance that allowed Darwin’s theory of evolution by natural selection to be put on a solid theoretical foundation. Although he had no way of knowing the physical implementation of inheritance, Mendel’s insight was to recognize that the observed facts of inheritance could be explained by positing theoretical entities (later called ‘genes’) that would somehow influence the development of organisms, while also following simple rules of transmission from parent to offspring.

Mendel’s rules of inheritance assumed that these posited entities would fall into various categories, including ‘dominant’ and ‘recessive’; that each of a parent’s two copies of a gene would have an equal probability of being passed along to offspring, and that different genes would be transmitted independently of one another (the so-called ‘independence of assortment’ assumption); and that they would affect the development of the organism in a straightforward manner. Of course, none of these assumptions has been borne out in the long run. Inheritance, for example, can be affected by so-called ‘driving genes’, and the mechanisms of assortment are severely affected by the location of particular stretches of DNA along the chromosome. Specifically, if two stretches of DNA are close together on the chromosome, then the inheritance of one by the offspring is positively correlated with the inheritance of the other. So these Mendelian assumptions have turned out to be false.
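The linkage point can be made concrete with a small simulation. What follows is a toy sketch, not part of the original argument, under simplifying assumptions: a single parent carries alleles A and B on one chromosome and a and b on its homolog, and a crossover between the two loci occurs with probability r (the recombination fraction). The function names are hypothetical and chosen for illustration only. Independence of assortment corresponds to r = 0.5, where parental combinations (AB or ab) make up about half of all gametes; as the loci get closer (r near 0), nearly every gamete carries a parental combination, which is the positive correlation described above.

```python
import random


def make_gamete(r, rng):
    """Return the alleles (locus 1, locus 2) carried by one gamete.

    Toy model (an assumption for illustration): the parent carries one
    chromosome with alleles ('A', 'B') and a homolog with ('a', 'b').
    Each chromosome is passed on with probability 1/2; with probability r
    a crossover between the two loci swaps which chromosome supplies the
    allele at locus 2.
    """
    from_first = rng.random() < 0.5
    crossover = rng.random() < r
    locus1 = "A" if from_first else "a"
    # Locus 2 comes from the same chromosome as locus 1 unless a crossover occurred.
    locus2_from_first = from_first != crossover
    locus2 = "B" if locus2_from_first else "b"
    return locus1, locus2


def parental_fraction(r, n=100_000, seed=0):
    """Estimate the fraction of gametes carrying a parental combination (AB or ab)."""
    rng = random.Random(seed)
    parental = {("A", "B"), ("a", "b")}
    hits = sum(make_gamete(r, rng) in parental for _ in range(n))
    return hits / n


if __name__ == "__main__":
    for r in (0.5, 0.1, 0.01):
        # The estimate should come out close to 1 - r.
        print(f"r = {r}: parental fraction ~ {parental_fraction(r):.3f}")
```

In this toy model, setting r = 0.5 recovers the independence-of-assortment assumption as a special case; any smaller value makes transmission at one locus informative about transmission at the other.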

Skeptics about the gene have used these complications in a deceptively simple argument. If the theoretical term ‘gene’ refers to an entity that controls inheritance and assorts independently, then there simply is no such thing that answers to that description. Hence, the term ‘gene’ fails to refer to anything at all, and we are to conclude that genes do not exist.

Hull puts the argument in an interesting way. According to Hull, we have to distinguish two possible scenarios that could play out in a reduction of one theory to another. On the one hand, it may turn out that the reduced theory is incorrect in some relatively minor ways; in order to carry out the reduction, we would first have to ‘correct’ it to bring it into line with the reducing theory. Such would presumably be the case when we discover how to reduce, for example, Newton’s law of cooling to statistical mechanics – whereas we originally had a deterministic, non-probabilistic theory, we ‘correct’ it by introducing statistical factors into the theory. But according to Hull, this is not a problematic case, because the theory is recognizably the same both before and after the reduction has been carried out.

On the other hand, it is possible to discover that the reduced theory must be modified beyond recognition in order to bring it into line with the reducing theory. In such a case, we cannot simply say that we are ‘correcting’ the reduced theory – instead, we are replacing it. As Hull puts the point regarding the reduction of classical Mendelian genetics:


My intuitive impression continues to be that the differences between the corrected and uncorrected versions of these theories are too numerous and too fundamental to consider the relationship between the two corrected theories reduction in the formal sense of the term. Pre-analytically, the relation between Mendelian and molecular genetics is a paradigm case of theory reduction, but from the point of view of the logical empiricist analysis of theory reduction, it looks more like replacement. [8, p. 660]

However, the simplicity of that argument belies a deep difficulty concerning the reference of theoretical terms. For nowhere else do we tie a term’s denotation to its original intensional meaning. For example, although it is certainly true that the term ‘atom’ was originally introduced to refer to an ‘indivisible thing’, and that there is no such (known) indivisible thing, we do not conclude that atoms do not exist. Rather, we simply recognize that the original conception of the atom was in error. Indeed, if any theoretical term is ‘unrecognizable’ from the perspective of its original meaning, the term ‘atom’ is.

In general, we feel free to allow the sense of a theoretical term to shift under the influence of new information concerning that term. Thus, as we discovered that atoms were indeed capable of being divided into component parts, we simply allowed the term ‘atom’ to continue to refer to those entities, in spite of the fact that they turned out not to answer to their original conception. This strategy is underwritten by the causal theory of reference, attributed primarily to Quine [25] and Kripke [15]. According to the causal theory of reference, although proper names and theoretical terms may initially have their reference fixed with the help of a connotative definition, their reference is in fact fixed by virtue of a causal chain which runs from the user of the term (e.g. a practicing scientist) back through a series of experiences which may include conversation, writing and so on. That chain will eventually terminate in some causal influence that the entity in question had on someone who fixed the referent of the term by stipulation. The upshot of the causal theory of reference is that it is this causal relationship, and not a set of necessary and sufficient conditions, that fixes the referent of a theoretical term. In this way, we are able to account for the continuity of a scientific theory in the face of radical theory change. For although the meaning of a theoretical term may eventually change to the point at which it is unrecognizable to its original users, the causal chain leading from that entity to the users of the term remains.

For the present purposes, the lesson is straightforward. We do not attempt to defend the view that the Mendelian concept of the gene is alive and well. But we ought to question Hull’s assumption that there is any cut-off point after which the term has been so dramatically revised that it loses its ability to refer to the same entity. We should not expect the meaning of the term ‘gene’ to remain fixed in the light of ongoing scientific research any more than we should expect the term ‘atom’ to retain its original connotation. It is a mistake to assume that any set of conditions can be attached to the term that are necessary for it to refer. Specifically, contiguity on the chromosome, independence of assortment, a simple developmental story from genotype to phenotype, and all other such conditions are not necessary (singly or jointly) for the term ‘gene’ to refer.

2.1. Indispensability. According to a line of argument that has become widely accepted, we are justified in claiming that a theoretical term refers to a real entity if the use of that term is indispensable in explaining observed phenomena. Normally, however, when we are able to formulate bridge laws relating some supervenient entity A to its underlying physical implementation B in a straightforward, suitably non-disjunctive way, that reduction may be taken to show that we can replace any mention of A in our explanations with a translation into the language of B. In other words, a successful reduction is taken to show that the reduced entity is dispensable. Thus, if a reduction is evidence at all concerning existence, then it should speak against the existence of the reduced entity. Conversely, when we find that we are unable to carry out a reduction, we will typically assume that we are correspondingly unable to eliminate the term in question. Thus, the use of that term is more likely to be indispensable.

For example, suppose that a metaphysical argument is proposed that only basic substances such as subatomic particles exist, but not the everyday objects such as tables and chairs that we ordinarily take to be composed of those basic substances. Such metaphysical arguments typically proceed by showing that there is no explanation or causal power possessed by tables and chairs that cannot be fully explained by the causal powers of the particles that (we ordinarily take to) compose tables and chairs. Thus, the argument goes, we can – at least in principle – replace any talk of these everyday objects with talk of basic substances. And so, the dispensability of these entities is taken as defeating any reason to believe that they do exist.

So regarding the gene, we find a tension between antireductionist arguments and arguments purportedly establishing that genes do not exist. For normally, the premises of antireductionist arguments are taken to imply that the irreducible entity is indispensable, and that we therefore have reason to believe that the entity exists. On the other hand, if the entity in question can be reduced, then the use of that term is dispensable, and we thereby lack at least some important justification for saying that the entity exists. But if we were to accept the arguments of Hull and others, then the gene would be a striking exception, for they take the premises of antireductionist arguments to show that genes do not exist. This tension between antireductionism and indispensability provides a further reason to question arguments purporting to show that the gene does not exist.


3. FUNCTIONAL CHARACTERIZATION OF GENES?

In the best of all possible worlds, we could simply define each particular gene as a particular sequence of nucleic acids, located in a specific place on the chromosome. It is a point not worth belaboring here that such a definition is hopeless [26]. Obviously, if genes do exist, then there will be numerous small changes to the particular sequence of nucleic acids that will not affect the identity of the gene. Furthermore, as philosophers of biology have long understood, the same gene type may be tokened at two or more different locations on the chromosome without affecting the identity of the gene. Indeed, such shifts appear to play a crucial role in evolutionary processes, and the reconstruction of the history of such changes gives us valuable insight into the evolution of various species.1

For a philosopher of science, when a physical characterization fails, the obvious next step is to try for a functional characterization. That is, for any particular gene, we may try to define it using the following schema:

(3.1) Gene X =def any nucleic acid sequence performing function F

Unfortunately, as is also well appreciated by philosophers of biology, it is common for a particular sequence that performs one function in some species to perform a different function in another species. Intuitively, we would like to be able to claim – if we were to have a workable gene concept – that the same gene performs two different functions. However, schema (3.1) will not countenance such a claim. Of course, one could always hold out for a disjunctive version of schema (3.1), but there is no a priori way to set an upper limit on the number of possible functions that a gene could perform. In fact, one could reasonably suspect that, without setting an arbitrary limit on the number of possible contexts in which a particular sequence might appear, there is no upper limit to be had at all. Thus, it looks as if neither a physical, reductive definition nor a functional, non-reductive definition will work for defining the gene.

No wonder, then, that philosophers of biology have despaired of coming up with a workable definition of the gene.

4. CAUSATION AND THEORETICAL TERMS

At this point, we have a trio of problematic proposals regarding the reference of the theoretical term ‘gene’. First, we have the traditional Mendelian gene concept, which is well known to be incorrect, or at least so severely idealized that it is not to be found in the genome. Second, we have the philosophical positions advocated by Hull and Dupré, according to which the term ‘gene’ simply fails to refer at all. But as I have argued above, their negative arguments ultimately fail because they rely upon problematic theories about the reference of theoretical terms. Third, we have the possibility that a functional characterization of the gene concept can be made out. But for familiar reasons having to do with multiple realizability, this approach fails as well.

1See below, in section 6.

These difficulties may properly be considered symptoms of a deeper problem regarding the gene concept. For the question of whether the gene exists should be interpreted as the question of whether the theoretical term ‘gene’ successfully refers. Thus, the question of whether the gene exists is primarily a question for the philosophy of language, and specifically for the theory of reference. And the gene concept provides a particularly difficult test case for a theory of reference.

As I have argued above, it is too quick to argue that the term ‘gene’ fails to refer merely because our current understanding of genetics demonstrates that the Mendelian gene concept is inadequate. For such an argument implicitly depends upon a theory of reference that fixes the reference of a term by giving something like a definite description of it. And such a picture has long been recognized to be inadequate for the task of accounting for theory change. Thus, we should not be surprised to find that such a theory of reference turns out to be inadequate for characterizing as complex a theory as that of genetic inheritance. Accordingly, a defense of the gene concept requires (at least an outline of) a defense of a theory of reference that is plausible on its own, while remaining compatible with the view that the term ‘gene’ successfully refers.

Unfortunately, the subject of the reference of theoretical terms is far too complex to treat fully in the current paper. However, I think that it is possible to argue that a causal theory of reference allows us to retain a meaningful gene concept, one that emerges from current research into genomics. Furthermore, standard objections to the causal theory of reference – as it is applied to theoretical terms – are problematic. This will be the subject of the current section.

4.1. Ostension and Theoretical Terms. The obvious alternative to a theory of reference that is based on definite descriptions or other intensional meanings is a causal theory. Indeed, the causal theory of reference has become the received view for theoretical terms precisely because it is capable of accounting for how terms maintain their reference while their sense changes significantly. Thus, adopting a causal theory is a promising strategy for accounting for the gene concept.

However, we immediately run into difficulties if we try to apply the causal theory straightforwardly to this case. For on a standard picture of the causal theory, a term acquires its reference through an initial ‘baptism’, in which a demonstrative is used to fix the reference of a term. For example, a parent may fix the reference of the name ‘Joe’ by indicating a child and using the demonstrative: ‘that child shall be called “Joe” from now on’. Thus, the reference of a name may be fixed without having in mind a definite description of the object named. Furthermore, when a person uses the name to refer to the object, she may successfully do so despite the fact that her own understanding of the object’s properties is quite incorrect. So long as their use of the


are what we might call ‘hybrid theories’, which are causal but require intensional information about at least some of the terms in order to fix their reference – the theories of Enç and Nola are examples of this kind of hybrid theory.

The reference of the term ‘gene’ is threatened under any hybrid theory of reference, since the intension of the term has obviously changed a great deal in the history of genetics. However, hybrid theories of reference face difficulties because the distinction between ostensible and non-ostensible terms is extremely problematic. This is simply because the ability of an entity to be directly observed just is a particular kind of causal power the entity possesses. Thus, even an entity that is directly ostensible is ostensible because it has the causal power to affect our sense organs in a particular way. This is apparent in Kripke’s discussion of the term ‘heat’, where he describes the causal powers of molecular motion in terms of their ability to create certain effects in our nervous system. In characterizing the manner in which we refer to heat by ostension, Kripke seems to equate reference by direct ostension with reference by more indirect methods:

At any rate, we are able to identify heat, and be able to sense it by the fact that it produces in us a sensation of heat. It might here be so important to the concept that its reference is fixed in this way, that if someone else detects heat by some sort of instrument, but is unable to feel it, we might want to say, if we like, that the concept of heat is not the same even though the referent is the same. [15, p. 131]

In short, because the ability to be observed is an instance of a causal power that could figure into the use of the schema (S), it is far from clear how to draw a distinction between ostensible and non-ostensible terms.2 But even if we put aside this difficulty for the time being, we can still identify two general types of cases that have traditionally been used to motivate a hybrid account of how the reference of theoretical terms is fixed. These two cases are:

(1) cases in which the intensional meaning of a term is inadequate for fixing its reference, and
(2) cases in which we are more likely to abandon the term than to radically revise its intensional meaning.

My contention here is that by attending to these cases, we are led to a better modification of the theory of reference for theoretical terms, and that this modification makes sense of the continued use of the term ‘gene’. I shall discuss each in turn, before outlining the positive proposal.

2The difficulty of distinguishing between ostensible and non-ostensible terms is parallel to the familiar difficulty of distinguishing between observable and unobservable entities. For ‘direct’ observation requires the observed thing to exert a causal influence upon our sense organs, together with a (perhaps implicit) theory of how the resulting sensory impressions reveal facts about it. In fact, I think it is reasonable to suspect that the distinctions between ostensible and non-ostensible entities on the one hand, and observable and unobservable entities on the other, stand or fall together.


4.2. Thales and the amber. Nola has contended that if the bare causal theory were correct, then people would be in a position to fix the reference of theoretical terms when they clearly lack the necessary level of understanding to do so. In particular, anyone who was in a position to observe the effects of some theoretical entity would be able to stipulate a name for whatever it is that happens to be the cause of those effects. But according to Nola, it is clear that (at least in many cases) more is required.

For example, Nola recounts a story about Thales, who observed (what turned out to be) the buildup of electrical charge on a piece of amber after it had been rubbed. If the bare causal theory were correct, then Thales would have been in a position – with no further information about electricity – to stipulate a term for ‘whatever it is that causes the attractive effects of amber after it has been rubbed’, and would thereby have fixed the reference of a term upon electricity. But according to Nola, this sort of case should strike us as wrong – it attributes ‘too much scientific prescience to Thales in the absence of any theory about the item so picked out’ [18, p. 516]. Rather, in order for Thales to have successfully picked out electricity, he would have had to have had some theory about how the entity causally brings about its effects.

However, even if we share Nola’s intuitions about Thales’s alleged inability to fix the reference of any term upon electricity, there is still a difficult problem with requiring that Thales would have to have had a theory about how electricity causes the attractive powers of the amber. This difficulty can be brought out as a dilemma, for we must either require that the theory be correct (or nearly correct), or we must waive the requirement. The first horn of the dilemma is unattractive for two reasons. First, it is plainly too demanding, and would put the cart before the horse, since it is often necessary to fix the reference of a term before engaging in the kind of research that could lead to the correct theory about the entity’s causal powers. Second, if we require a correct theory of the entity’s causal powers, then we come too close to a definite description theory of reference – for the correct theory could simply be used to fix the reference of the theoretical term without our having to worry about a causal theory of reference at all.

But we cannot weaken the requirement of truth, either. For suppose that we require only that, in order for Thales to be able to fix the reference of the term, he have some theory or other – even a false one. Although it is certainly true that a term may have its reference fixed in spite of the fact that the intensional meaning of the term is wrong, it is strange to require such a theory while admitting that it might be totally false. To put the point rhetorically, it is fair to wonder what a false theory adds to Thales’s reference-fixing ability that could not otherwise be achieved while remaining agnostic about how the entity causes its observable effects. I thus conclude that cases such as this one do not pose a difficulty for a bare causal theory of reference.

4.3.   Phlogiston.  We need now to consider cases in which the use of some term is abandoned

as we discover new information suggesting that the term fails to refer. The standard example

of this phenomenon is the failure of the theoretical term ‘phlogiston’ to refer to any real entity.

Enç and Nola both argue that the reference of ‘phlogiston’ was to have been fixed partially in

virtue of the intensional meaning of the term; thus, when it was discovered that its intensional

meaning was not satisfied by any real entity, that discovery ‘was tantamount to discovering that

phlogiston did not exist’ [2, p. 271]. But the interesting feature of this example, which makes it poor support for any hybrid

theory of reference, is that the intensional meaning of the term was inextricably bound up with

the causal powers that were attributed to phlogiston. The following discussion from Enç is in-

structive:

For example, in the phlogiston case, when the term “phlogiston” was introduced,

it was at least believed that whatever causes fire can saturate air during combus-

tion and that when the air is saturated the fire dies out... Furthermore, the belief 

that this substance had the power to restore the metallic properties of calx and

to lead to death by suffocation... led to the belief that the substance in question

 was a new kind of substance. [2, p. 271].

Thus, when these beliefs were discovered to be false – i.e. that there is no substance meeting 

that description – scientists concluded that phlogiston does not exist. From this, Enç concludes

that ‘in introducing a term, the scientist is not just naming whatever it is that is responsible for

such and such phenomena, he is rather naming a kind of object partially specified by the kind-

constituting properties he believes the object to have and by the context in which the object

plays its explanatory role’ [2, p. 271]. According to this argument, a bare causal theory would

have it that the scientists were referring to oxygen (since oxygen is what is responsible for combustion), and they would merely have discovered that ‘phlogiston’ actually refers to oxygen, but

that some of their other beliefs about phlogiston were false (for example, that it is responsible

for suffocation).

Kyle Stanford and Philip Kitcher call this the ‘no failure of reference problem’ for the causal

theory [28]. In general form, the problem is that so long as the person who introduces the term

defines it as ‘the cause of X’, where X is some real effect of some cause or other, then the term

is guaranteed to refer to that cause, whatever it may turn out to be. But their intuition, which is

plausible enough, is that if the cause turns out to be totally different from what the introducer

of the term has supposed it to be, then we are better off judging that the term fails to refer at all.

However, it is not so clear that the bare causal theory of reference really does lack the re-

sources to yield the correct judgment that ‘phlogiston’ fails to refer. In short, I think it is fair

to say that those who cite this episode in the history of science against the causal theory have cherry-picked

certain features of the example. To see this, consider a simplified and fictional case resembling 

the historical example. Let us suppose that a scientist we shall call ‘Williams1’ inquires as to the

cause of combustion, supposing that there may be some such substance, and he accordingly

defines ‘phlogiston1’ as ‘whatever substance causes combustion’. Our fictional scientist may

develop all sorts of other beliefs about phlogiston1, many or all of which may be mistaken. He

may believe, for example, that it is emitted from burning bodies or that it has a negative weight.

But let us suppose that his original reference-fixing stipulation makes recourse only to the

particular causal property of causing combustion. When Lavoisier discovers oxygen, Williams1 may quite reasonably assert that ‘phlogiston1 is oxygen’, in spite of the fact that many of his spe-

cific beliefs about phlogiston1 will have to be revised or abandoned completely. And of course,

this is just as the bare causal theory would have it.

Now let us complicate the example somewhat. Suppose that another scientist – Williams2 –

inquires as to the causes of combustion and suffocation, hypothesizing that some substance is

the common cause of both. Then he stipulates that ‘phlogiston2’ shall refer to ‘whatever sub-

stance is the cause of combustion and suffocation’. Like his counterpart, he may form a variety

of other beliefs about this new substance, but these play no role in fixing the reference of the

term ‘phlogiston2’. Also like his counterpart, Williams2 stipulates the reference of ‘phlogiston2’ according to schema (S) above, but in this case, Φ is conjunctive.

 According to the ‘no failure of reference’ objection, the bare causal theorist is committed to

the untenable thesis that ‘phlogiston2’ refers to something, when in fact it fails to refer to any-

thing at all. However, the bare causal theory has the resources to yield the correct conclusion.

 After all, there is no substance whatsoever that is both the cause of combustion and suffocation.

So the bare causal theory does not erroneously say that ‘phlogiston2’ refers to oxygen (or any

other substance). Rather, a bare causal theory may rightly conclude that the term simply fails

to refer at all.

This example suggests that when the reference of a theoretical term ‘T’ is fixed by stipulating that ‘T’ refers to whatever is the cause of Φ, then it is possible that ‘T’ will not refer if there is no

single kind of entity that is the cause of Φ. One way that this can happen is if it is supposed that

Φ has some singular cause, but in fact, two or more different kinds of entity are the cause of 

Φ. And of course, this is precisely the type of case which proponents of hybrid theories use for

support.3
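The point about conjunctive stipulations can be made concrete with a toy model. This Python sketch is a hypothetical illustration, not part of the argument above: it assumes a "world" that simply maps each kind of entity to the set of effects it causes, and it treats a baptism via schema (S) as succeeding only when exactly one kind causes every effect in Φ. The kind labelled `other_kind` is a placeholder, not a claim about real chemistry.

```python
from typing import Dict, FrozenSet, Optional, Set

# Hypothetical toy world: each kind of entity maps to the effects it causes.
# "other_kind" is a placeholder, not a claim about real chemistry.
WORLD: Dict[str, FrozenSet[str]] = {
    "oxygen": frozenset({"combustion"}),
    "other_kind": frozenset({"suffocation"}),
}


def referent(phi: Set[str], world: Dict[str, FrozenSet[str]]) -> Optional[str]:
    """Return the referent of a term baptized as 'whatever is the cause of phi'.

    The term refers only if exactly one kind of entity causes every effect in
    phi; if no kind does, or several kinds do, the term fails to refer (None).
    """
    causes = [kind for kind, effects in world.items() if phi <= effects]
    return causes[0] if len(causes) == 1 else None


print(referent({"combustion"}, WORLD))                 # 'phlogiston1': oxygen
print(referent({"combustion", "suffocation"}, WORLD))  # 'phlogiston2': None
```

On this model, Williams1's term picks out oxygen, while Williams2's conjunctive stipulation fails to refer, and neither verdict appeals to the intensional meaning of the term.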

 A variety of other objections have been made to the bare causal theory of reference.4 I believe

that these other objections can be met. However, because the purpose of this paper is simply 

to defend the reference of one particular theoretical term – ‘gene’ – I shall assume at this point

that I have sufficiently motivated some doubts about the need for adopting a hybrid theory of 

reference.

3To take another example, Enç [2] discusses a hypothetical case in which Jones uses the name ‘Snowwhite’ to pick 

out the entity – whatever it is – that ate his lettuce and carrots last night. As the example proceeds, however, Jones

attributes many other events to Snowwhite (e.g. breaking Jones’s teacup, getting into the peanut butter). And as

these other causal powers are attributed to Snowwhite, Enç motivates the intuition that Jones is not successfully 

referring to anything. But this case may be dealt with in the same way as the phlogiston example above.

4For instance, see Kitcher’s discussion of the so-called ‘qua problem’.


5. CONNECTING REFERENCE TO RESEARCH

I have argued that standard objections to the bare causal theory of reference ultimately fail,

and in particular that the criticisms leveled against the term ‘gene’ fail. However,

there is a motivation behind these criticisms that is worth examining in more detail, with the aim

of saying something more positive about the reference of theoretical terms. The motivation for

criticisms of the causal theory seems to be that when the intensional meaning of the term has

changed beyond recognition, the research program within which the term was to play a role

must be abandoned or changed entirely. Frederick Kroon expresses this motivation explicitly:

Once again, then, the burden of reference for the term introduced rests broadly 

on the theory within which the term is embedded, and not on some cautious

causal descriptions of the form: ‘whatever it is that is responsible for such and

such phenomena’... [16, p. 50]

Here, I think that Kroon uses a correct observation to support a criticism that is too general. Specifically, Kroon is right to say that ‘the burden of reference... rests broadly on the theory

 within which the term is embedded’. But when Kroon goes on to say that the ‘cautious causal

description’ – what we have been calling schema (S) – does not underpin the reference of the

theoretical term, this suggests the dubious position that the causal description of the entity can

be separated from the role of the referring term in the underlying research programme.

However, the nature of the research programme within which the term is embedded is de-

termined largely by the causal powers we attribute to the referent of that term. For example,

Kroon considers the case of Neptune, which was used by Kripke to support his causal theory 

of reference. Kroon asks us to consider the following purported counterexample to the causal theory. Suppose that the term ‘Neptune’ was introduced to refer to whatever it is that is the

cause of some observed perturbations in the orbits of various planets. Of course, the original

intensional meaning of ‘Neptune’ included the proposition that the entity is an unobserved

planet. Now suppose we were to discover that, through a very indirect and subtle route, Earth

is responsible for the observed perturbations.

 According to Kroon, the causal theory of reference is committed to the view that ‘Neptune’

refers to the planet Earth, whereas the correct conclusion is that the term ‘Neptune’ does not

refer at all. Although this may be the correct conclusion to draw in this specific case, I think 

that Kroon, Enç, and Nola have misdiagnosed the motivation for abandoning theoretical terms

(when it is appropriate to do so). For what motivates us to abandon a particular theoretical term

is that the entire research program ‘within which the term is embedded’ is given up. In contrast,

Kroon, Enç, and Nola assume that the divergence from some intensional meaning of the term

is what is responsible for the failure of the term to refer. But these are distinct phenomena – it

is possible for the intensional meaning to change without dramatically affecting the research

programme, and vice-versa.

To see this, let us consider variants on Kripke’s Neptune example. Suppose scientists stipu-

late that the term ‘Neptune’ shall refer to whatever (heretofore unobserved) planet causes the

observed perturbations in the orbits of other planets. Here, our research programme into the

nature of Neptune would be to calculate – based on Newtonian physics – what the mass and

position of such a planet would have to be. Then we would try to observe whether there was in fact a new planet at the expected location.

Now consider two different ways in which such a research programme could yield surprising 

results. First, suppose that after the appropriate calculations, it turned out that there was not

a planet, but a large asteroid or other body in the appropriate location. In such a case, the

intensional meaning of the term ‘Neptune’ would have to be revised dramatically. However,

 we would not conclude that Neptune does not exist; instead, we would conclude that the term

‘Neptune’ has turned out – surprisingly – to refer to an asteroid instead of a planet.

In contrast, consider a scenario like the one discussed by Nola. In this second case, it turns

out that our understanding of gravitational attraction is dramatically wrong; it is not an unobserved planet that causes the perturbations, but the Earth (through a circuitous and surprising

route). In this case, Nola is right when he says that the correct conclusion would be that the

term ‘Neptune’ does not refer at all.

In both cases, the intensional meaning of the term is importantly wrong; in the first case, it

turns out that there is no planet that causes the orbital perturbations. In the second, it turns

out that there is a planet, but not an unobserved one. Note that it would be a mistake to con-

clude that the intensional meaning was obviously more mistaken in one case than in the other.

For in one case, Neptune turns out not to be a planet at all; but in the second case, there is at

least a planet (namely, Earth) causing the observed phenomena. So if the cases yield different intuitions about the reference of the term ‘Neptune’, it is not because of obvious differences

regarding their intensional meanings. Rather, what explains the difference between these two

cases is that the research programme for investigating the cause of the observed phenomena

must be given up entirely in the second case, while it remains intact in the first case. Accord-

ingly, we judge that the term continues to refer when the research programme remains intact;

but when the research programme must be given up, we judge that the term fails to refer.

Although it is obviously difficult to say when a particular research programme has

been given up, even a rough-and-ready judgment is good enough to make sense of traditional

examples that are used in discussions of the reference of theoretical terms. But more impor-

tantly, we better understand why a bare causal theory of reference is so plausible when it is

applied to theoretical terms, and why it seems to fail in some especially problematic cases.

Clearly, when one baptizes a theoretical term via schema (S ), and thereby attributes some

causal power to the putative entity named by the term, then that attribution will guide research

into the nature of that entity. For example, if one supposes that phlogiston or oxygen is the


cause of combustion, then a researcher will try to discover the nature of phlogiston or oxygen

by observing what happens when combustion occurs. If one were to discover – as in the case of 

phlogiston – that the causal powers of the entity were radically misdescribed (perhaps because

nothing really has those causal powers), then the research program must end or be revised

beyond recognition. In such a case, the natural conclusion to draw is that there is no referent of the problematic theoretical term.

This suggests that the plausibility of the causal theory for theoretical terms stems from the

relation between the causal powers of an entity and the relevant research programme. One

may explain why a theoretical term refers or fails to refer by citing the appropriate facts about

the research programme, not by citing facts about the original intensional meaning of the term.

As the research programme evolves, the intensional meaning may change without losing the

referent of the term.

If my arguments so far are sound, then the lesson for reconstructing the gene concept is

straightforward. We must place primary importance on understanding the research programme that purports to discover genes and elucidate their properties. If we want to understand whether

the term ‘gene’ refers, then we must understand whether contemporary research into genes

is actually tracing observed phenomena back to a referent of the term ‘gene’. The question of 

whether this is indeed occurring may be determined only by understanding the methodological

assumptions that are required by contemporary research. Thus, a discussion of contemporary 

genomics is required.

6. COMPARATIVE GENOMICS – A BIASED OVERVIEW

For the philosophy of biology, genomics provides an extremely valuable area of research. The

novelty of this methodology and the startling successes of genomics raise philosophical issues

that deserve a great deal of attention from philosophers of science. Furthermore, in addition to

raising new problems for study, the field of genomics also helps us to settle existing problems

relating to the definition and reference of theoretical terms, the status of reductionism, the role

of information processing technologies in the special sciences, and a host of other issues.5

However, the techniques used in genomics are so unfamiliar that it does require some time

to become sufficiently acquainted with them. So in this section, I shall offer a biased overview 

of one current approach that is making fast progress toward identifying genes, and determin-

ing the function of particular genes. This is merely one such approach – no representation is

made here that it is the best approach (on any particular measure). But I do allege that it is an

extremely informative approach, deserving of careful study by philosophers of biology.

In what follows, I shall use the term ‘gene’ uncritically, following the usage that has become

standard in genomics research. In later sections, I shall turn to a critical analysis of this concept,

5Some of these other issues raised by contemporary genomics are surveyed in [3].


and I shall argue that a useful and fairly traditional gene concept can be elaborated from this

usage.

6.1.  Preliminaries.  What makes it possible for an outsider to understand this particular re-

search programme is that the methodology outlined here is highly abstract – so abstract, in fact,

that many biological details may be omitted. So here, I shall give an overview of this research at

a high level of abstraction.6

First of all, it is useful to distinguish between two complementary projects in genomics re-

search. The project that is most familiar to philosophers of science as well as to the general

public is genome sequencing – this is the process of making a catalogue of the specific sequence

of nucleotides that make up the genome of a particular species. After this process is com-

pleted, we are left with an immensely long sequence of the familiar A, T, G, C characters that

standardly represent the genome. Of course, the most famous genome sequencing project is the

human genome project, which has successfully completed the sequencing of an entire human

genome.

However, for our purposes, the more interesting project is genome annotation. In many ways,

this is the more difficult project – for it aims at extracting useful information from the nucleic

acid sequences that are provided by genome sequencing. Genome annotation includes the

process of so-called ‘gene discovery’, as well as the extraction of information about how the

genes function together to implement the processes that are required for the organism. It is

the difference between genome sequencing and genome annotation that explains why signifi-

cant advances in gene therapy, diagnosis, and other areas did not follow immediately upon the

heels of the human genome project. For those advances require genome annotation, for which

genome sequencing is merely a necessary preliminary step.

6.2.   The Subsystems Approach.   Much of the research currently being conducted in genomics

concerns the synthesis of various compounds that are required for the cell to function. In

particular, the process of synthesizing these compounds consists of absorbing nutrients through the

cell wall and driving them through a multi-stage process in which various intermediary compounds

are gradually transformed into others, eventually resulting in the final synthesis of the required

chemical.

 At this point, we must introduce the necessary vocabulary for describing such a process at a

sufficiently high level of abstraction.7 We shall use the term subsystem to refer to any multi-stage process that takes as input a particular chemical compound and outputs a new compound that

is synthesized by the cell. These subsystems may be multiply-realized – that is, there may be

6Indeed, it is an interesting feature of genomics research that it is common for computer scientists with no formal

training in biology to play an important role. This is both due to, and the cause of, the high level of abstraction that

is so common in genomics research.

7Here, I outline the methodology and employ the terminology used in a series of papers primarily by Ross Overbeek

and Rick Stevens [19–23].


FIGURE 6.1. Subsystem diagram for lysine biosynthesis.

many different combinations of distinct steps that will transform the same input compound

into the same output compound. Each of these possible implementations shall be referred to

as a  pathway . So in the language that is usually associated with antireductionist arguments in

the philosophy of biology, we say that the same subsystem may be multiply realized by many 

different pathways.

 We may thus represent any particular pathway by a diagram that resembles a directed graph;

each vertex of the graph represents a discrete step in the pathway, where that step is responsi-

ble for performing one transformation of a chemical compound into a different chemical com-

pound (and possibly giving off a different compound as a by-product). Genomicists refer to these discrete steps as functional roles. Thus, a pathway is said to consist of a discrete ordered

set of functional roles.

The various possible implementations of a subsystem may be represented simultaneously 

in one diagram, which we shall call a  subsystem diagram . This is like a graph of a pathway,

except that it is the union of the set of possible pathway implementations. Thus, a subsystem

diagram will typically have branches representing the different paths and sets of functional roles

by which an input compound may be transformed into the required output.
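The relation between a subsystem diagram and its pathways can be sketched as a small graph computation. In this hypothetical Python sketch, compounds are nodes, each edge is labelled with the functional role that performs the transformation (putting roles on edges rather than vertices, which carries the same information), and every path from the input compound to the output compound is read off as one pathway. All compound and role names are placeholders, not KEGG entries, and the sketch assumes the diagram has no cycles.

```python
from typing import Dict, List, Tuple

# Hypothetical subsystem diagram: each compound maps to the steps that can
# consume it, given as (functional_role, product) pairs. Names are placeholders.
Diagram = Dict[str, List[Tuple[str, str]]]

TOY_SUBSYSTEM: Diagram = {
    "input": [("role_a", "X"), ("role_b", "Y")],  # branch: two realizations
    "X": [("role_c", "output")],
    "Y": [("role_d", "Z")],
    "Z": [("role_e", "output")],
}


def pathways(diagram: Diagram, start: str, goal: str) -> List[List[str]]:
    """Enumerate every pathway (ordered list of functional roles) from start to goal.

    Each returned list is one realization of the subsystem; the diagram as a
    whole is the union of these pathways. Assumes the diagram is acyclic.
    """
    if start == goal:
        return [[]]
    found: List[List[str]] = []
    for role, product in diagram.get(start, []):
        for rest in pathways(diagram, product, goal):
            found.append([role] + rest)
    return found


for pathway in pathways(TOY_SUBSYSTEM, "input", "output"):
    print(" -> ".join(pathway))
```

Here one subsystem is multiply realized by two pathways, role_a -> role_c and role_b -> role_d -> role_e, and the diagram is simply their union.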

Genes are taken to be sequences of nucleic acids on the chromosome that encode the

proteins implementing a particular functional role. So genomics researchers assume that for

any particular functional role appearing in a pathway, there will be a corresponding gene im-

plementing that role. As I shall argue later, this quick gloss is not the full picture of what a gene

is, but it is the preliminary, rough-and-ready notion that is used in genomics research.

 With this hierarchy in mind – consisting of subsystems, pathways, functional roles, and genes

– we can describe the major problems of genome annotation that are most important for ge-

nomics research. Because almost every living organism will have to perform many of the same

tasks at the cellular level, subsystems frequently reappear across many different species. For ex-

ample, one compound – biohistidine – must be synthesized by virtually any living thing. Thus,

some token of the biohistidine synthesis subsystem will have to appear in the central machin-

ery of the cell in almost every living organism. However, multiple realizability ensures that this

subsystem may be implemented by more than one pathway, with potentially many different genes implementing the necessary combination of functional roles.

 We have an open problem of genome annotation when we discover that some species must

implement (e.g.) the biohistidine synthesis subsystem, but we do not know either which path-

 way is the appropriate token of that subsystem, or which genes implement the functional roles

of the pathway. This sort of problem has been called the ‘missing genes problem’, and the pro-

cess of discovering the genes that implement those functional roles is one of the most interest-

ing activities from a philosophy of biology perspective, for reasons that will become apparent.
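A missing genes problem of this kind can be stated as a simple lookup. The Python sketch below is hypothetical: it assumes the candidate pathways for a subsystem are already known as ordered lists of functional roles, and that a genome's current annotation is a mapping from roles to gene identifiers. It reports, for each pathway, which roles lack an implementing gene, and it uses a crude heuristic, assumed for illustration rather than taken from the source, to pick the most complete pathway as a working guess at which pathway the species uses.

```python
from typing import Dict, List

# Hypothetical inputs: two candidate pathways for one subsystem, and a
# partial annotation mapping functional roles to gene identifiers.
PATHWAYS: List[List[str]] = [
    ["role_a", "role_c"],
    ["role_b", "role_d", "role_e"],
]
ANNOTATION: Dict[str, str] = {"role_b": "gene_17", "role_e": "gene_42"}


def missing_roles(pathways: List[List[str]], annotation: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each pathway index to the roles that no annotated gene implements."""
    return {
        i: [role for role in roles if role not in annotation]
        for i, roles in enumerate(pathways)
    }


def most_complete(pathways: List[List[str]], annotation: Dict[str, str]) -> int:
    """Heuristic (an assumption of this sketch): guess the pathway with the
    smallest fraction of unannotated roles."""
    gaps = missing_roles(pathways, annotation)
    return min(gaps, key=lambda i: len(gaps[i]) / len(pathways[i]))


print(missing_roles(PATHWAYS, ANNOTATION))  # {0: ['role_a', 'role_c'], 1: ['role_d']}
print(most_complete(PATHWAYS, ANNOTATION))  # 1
```

If the species does use the second pathway, the gene implementing role_d is the missing gene to look for.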

6.3.  Evidence Available Through Genomics Research.  An important advantage of the framework outlined above is that any given missing genes problem can concisely be represented in a

simple spreadsheet diagram. Indeed, the perspicuity of this representation of the missing genes

problem is an important clue to the right gene concept, or so I shall argue below.

The comparative genomics approach to the missing genes problem takes advantage of the

fact that many nucleic acid sequences are orthologs, where an orthologous sequence is one

that performs a related function in two or more species, and whose appearance in the genomes

of those species is due to common descent (thus, orthologs are a particular type of homologous

trait – see Sober [27]). Thus, partially completed genome annotations from some species may

provide important clues for solving missing genes problems that arise for others.

Once a missing genes problem has been specified, the genomics approach to its solution

begins by constructing a spreadsheet. That is, we create an inventory of the known implemen-

tations of the subsystems in question – this information may be accessed through public and

private databases, including the KEGG map database8, which I rely upon throughout this sec-

tion. Particular attention is paid to available genome data from species that are known to be

closely related to the species in question because they are more likely to contain orthologous

sequences.

 After a set of species has been identified with known or partially known implementations of 

the subsystem, that information is organized into a spreadsheet. This representation clearly 

highlights the exact information that is available for the target genome, and the information

that is missing. When the spreadsheet has been compiled, it is easy to see how the various

annotated genomes for other species implement the subsystem. In particular, the spreadsheet

representation makes it clear how various functional roles cluster together; it shows how the

8This database may be found at http://www.genome.jp/kegg/pathway.html.


implementation of a pathway, then this information may be used to confirm the hypothesis

that another organism uses the same sequence in the same way, provided that the two species

are appropriately related.

Of course, confirmation relations are typically symmetric – if one piece of evidence confirms

another, then the reverse is also true. Thus, there is a feedback between inferring phylogeny and annotating the genome. When we better understand how genomes are annotated, this informa-

tion provides important clues about the evolutionary history of the species and its relationship

to other species. Indeed, the core machinery of the cell evolved so long ago (in comparison to

other traits that are less central to the operation of the organism) that genome annotation of 

those subsystems allows us to look further back in evolutionary history than a similar analysis

of other phenotypic traits would allow.9

6.4.  A Simple Example.  Figure (6.1) is adapted from the KEGG pathway database – a freely-

accessible database of information about known pathways and subsystems in many different

species. It shows the subsystem that synthesizes lysine. One may think of the diagram as repre-

senting all the known pathways by which lysine is synthesized from other chemicals. Boxes in

the diagram with a period-delimited set of numbers – called the ‘EC number’ – represent func-

tional roles, and the circles represent the chemical product that is produced after that func-

tional role has operated. The arrows are used to show the order of steps by which the functional

roles produce the various compounds that are necessary for the synthesis of lysine.

 As is typically the case, the product of this subsystem may be used by other subsystems to

produce other compounds that are required by the cell. Accordingly, the subsystem diagram

indicates that lysine may be used as an input to the alkaloid biosynthesis subsystem, and that L-Homoserine may be used in the glycine metabolism subsystem. As we have seen above, any

given subsystem may be implemented by one of several different pathways. These options are

shown in the subsystem diagram by places where there is more than one arrow leading from a

circle.
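Locating these branch points is mechanical once the diagram is stored as a graph. In this hypothetical Python sketch, each compound (a circle) maps to its outgoing steps, given as (functional role, product) pairs, and a compound with more than one outgoing step marks a choice between pathways. The compound and role names are placeholders, not data from the KEGG lysine diagram.

```python
from typing import Dict, List, Tuple

# Hypothetical diagram: compound -> outgoing (functional_role, product) steps.
# Names are placeholders, not entries from the lysine subsystem.
TOY_DIAGRAM: Dict[str, List[Tuple[str, str]]] = {
    "input": [("role_a", "X"), ("role_b", "Y")],
    "X": [("role_c", "output")],
    "Y": [("role_d", "Z"), ("role_f", "output")],
    "Z": [("role_e", "output")],
}


def branch_points(diagram: Dict[str, List[Tuple[str, str]]]) -> Dict[str, List[str]]:
    """Return each compound with more than one outgoing arrow, together with
    the alternative functional roles that can consume it."""
    return {
        compound: [role for role, _ in steps]
        for compound, steps in diagram.items()
        if len(steps) > 1
    }


print(branch_points(TOY_DIAGRAM))  # {'input': ['role_a', 'role_b'], 'Y': ['role_d', 'role_f']}
```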

Figure (6.2) is a representative spreadsheet diagram for a portion of the lysine biosynthesis

subsystem. It collects a portion of the available information for nine bacterial genomes; this in-

formation is taken from the current version of the KEGG database. It corresponds to a portion

of the subsystem diagram (6.1). In the spreadsheet, the species names are listed along the left

side; the various functional roles (indicated by their EC numbers) are listed at the top. A darkened rectangle means that the species has an identified sequence implementing the functional

role. Where the rectangle is empty, there is no known implementation of that functional role.
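The spreadsheet itself is a presence/absence matrix, which a short script can build from per-species annotations. The Python sketch below is hypothetical: the species and role labels are placeholders, and the data are invented for illustration rather than taken from KEGG or from the spreadsheet described above. A `#` plays the part of a darkened rectangle and a `.` an empty one.

```python
from typing import Dict, List, Set

# Invented, placeholder annotations: species -> roles with an identified sequence.
ANNOTATIONS: Dict[str, Set[str]] = {
    "species_1": {"r1", "r2", "r3"},
    "species_2": {"r1", "r3"},
    "species_3": {"r2"},
}
ROLES: List[str] = ["r1", "r2", "r3"]


def presence_matrix(annotations: Dict[str, Set[str]], roles: List[str]) -> Dict[str, List[bool]]:
    """Row per species, column per role: True if an implementing sequence is known."""
    return {sp: [role in found for role in roles] for sp, found in annotations.items()}


def render(matrix: Dict[str, List[bool]], roles: List[str]) -> None:
    """Print the matrix as a text spreadsheet ('#' = known, '.' = none known)."""
    name_w = max(len(sp) for sp in matrix)
    col_w = max(len(r) for r in roles)
    print(" " * (name_w + 1) + " ".join(r.ljust(col_w) for r in roles))
    for sp, row in matrix.items():
        cells = " ".join(("#" if has else ".").ljust(col_w) for has in row)
        print((sp.ljust(name_w) + " " + cells).rstrip())


render(presence_matrix(ANNOTATIONS, ROLES), ROLES)
```

Keeping the matrix separate from its rendering means the same data can feed later steps, such as looking for roles that tend to appear together across species.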

In the spreadsheet, I have divided the functional roles into two groups, which are labeled (A)

and (B). If we examine the functional roles from each of these two groups, we see that there is

9Indeed, one of the reasons for focusing on the central machinery of the cell is that some researchers hope that by 

so doing, it will finally become possible to make reasonable hypotheses about prebiotic evolution.


implementation. By the same token, we might conjecture that Listeria monocytogenes imple-

ments role 3.5.1.47.
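Conjectures of this kind can be generated automatically from the spreadsheet. The Python sketch below assumes a simple rule, chosen for illustration rather than taken from the source: if a species already has annotated sequences for at least a threshold fraction of the roles in a group that tend to occur together, conjecture that it also implements the remaining roles of that group. The threshold, species, and role labels are hypothetical placeholders.

```python
from typing import Dict, List, Set

# Placeholder annotations: species -> functional roles with a known sequence.
ANNOTATIONS: Dict[str, Set[str]] = {
    "species_1": {"r1", "r2", "r3"},
    "species_2": {"r1", "r3"},
    "species_3": {"r2"},
}
GROUP: List[str] = ["r1", "r2", "r3"]  # roles assumed to cluster together


def conjectures(annotations: Dict[str, Set[str]], group: List[str],
                threshold: float = 0.5) -> Dict[str, List[str]]:
    """Conjecture unannotated roles for species that already cover most of a group.

    A species qualifies if it has annotations for at least `threshold` of the
    roles in `group` (a hypothetical cut-off); its missing roles are returned
    as conjectures, which remain defeasible until a sequence is found.
    """
    result: Dict[str, List[str]] = {}
    for species, found in annotations.items():
        present = [r for r in group if r in found]
        missing = [r for r in group if r not in found]
        if missing and len(present) / len(group) >= threshold:
            result[species] = missing
    return result


print(conjectures(ANNOTATIONS, GROUP))  # {'species_2': ['r2']}
```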

Clearly, the quick generation of such conjectures is a highly valuable feature of the comparative genomics approach. Furthermore, due to the presence of orthologous sequences, we have reasonable, but defeasible, hypotheses about how those functional roles are implemented. Specifically, we should look at those sequences that are known to implement those functional roles in other species. For example, if we wonder which gene implements role 2.3.1.89 in Streptococcus pneumoniae, then a reasonable first step is to examine its genome for sequences that are similar to the ones implementing that role in other bacterial species such as Listeria monocytogenes, Staphylococcus aureus, and Bacillus subtilis. Such a search is now very simple to conduct in an automated fashion, since genome data is simply digital information that can be searched like any other large dataset.
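The automated search can be sketched in Python. Real annotation pipelines use alignment tools such as BLAST; as a deliberately crude stand-in, this sketch scores similarity as the Jaccard overlap of short substrings (k-mers). The reference and target sequences are invented protein-like toy strings, and the function names, `k`, and `threshold` are hypothetical choices rather than parameters of any published tool.

```python
def kmers(seq, k=3):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard overlap of k-mer sets: a crude stand-in for an alignment score."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def rank_candidates(references, target_genes, k=3, threshold=0.3):
    """Score each target gene by its best match to any reference sequence
    known to implement the role elsewhere; keep those above threshold."""
    scored = []
    for gene_id, seq in target_genes.items():
        best = max((similarity(seq, ref, k) for ref in references), default=0.0)
        if best >= threshold:
            scored.append((gene_id, best))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data: sequences implementing the role in two reference genomes,
# and two unannotated genes in the target genome.
REFERENCES = ["MKTAYIAKQR", "MKTAYLAKQR"]
TARGET = {"t1": "MKTAYIAKQL", "t2": "GGSSPLVVNE"}

if __name__ == "__main__":
    print(rank_candidates(REFERENCES, TARGET))
```

Taking the best score over all reference sequences mirrors the idea of comparing against several species at once: a candidate only needs to resemble one known implementation to be worth a closer look.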

This example should make it clear why researchers are optimistic about the progress that is possible in genome annotation. For although this is a simple example, it faithfully illustrates three distinct stages of genomics research. We may think of those stages roughly as follows.

• Formulation of the problem. A missing genes problem can be formulated by discovering which functional roles appear to be missing from the annotations of species. This can be automated by treating subsystem diagrams as directed graphs and then identifying which paths through the graph are only partially annotated.

• Search through reference genomes. Other genomes can be identified that are known to implement the missing functional roles. The sequences that implement those roles there serve as models for candidate sequences in the target genome.

• Confirmation of the hypothesis. If such a sequence is discovered in the target genome, we may obtain confirming evidence by testing whether the sequence is clustered on the genome with other sequences that are required for the pathway.
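The first stage, treating a subsystem diagram as a directed graph and flagging partially annotated paths, can be sketched in Python. This is a minimal illustration under stated assumptions: compounds are nodes, each edge carries the functional role that performs the conversion, and the subsystem, role labels, and helper names are all hypothetical. A path counts as a missing genes problem when some, but not all, of its roles are annotated in the target genome.

```python
from typing import Dict, List, Set, Tuple

# node -> list of (next_node, functional_role) edges
Graph = Dict[str, List[Tuple[str, str]]]

# Hypothetical subsystem with two alternative pathways from input to output.
SUBSYSTEM: Graph = {
    "input": [("X", "r1"), ("Y", "r4")],
    "X": [("output", "r2")],
    "Y": [("Z", "r5")],
    "Z": [("output", "r6")],
}


def role_paths(graph: Graph, start: str, goal: str) -> List[List[str]]:
    """Role sequences along every simple path from start to goal (depth-first)."""
    paths: List[List[str]] = []

    def walk(node: str, roles: List[str], seen: Set[str]) -> None:
        if node == goal:
            paths.append(roles)
            return
        for nxt, role in graph.get(node, []):
            if nxt not in seen:
                walk(nxt, roles + [role], seen | {nxt})

    walk(start, [], {start})
    return paths


def missing_gene_problems(graph: Graph, start: str, goal: str,
                          annotated: Set[str]) -> List[Tuple[List[str], List[str]]]:
    """Paths that are partially, but not fully, annotated, with their gaps."""
    problems = []
    for path in role_paths(graph, start, goal):
        missing = [role for role in path if role not in annotated]
        if 0 < len(missing) < len(path):
            problems.append((path, missing))
    return problems


if __name__ == "__main__":
    # The target genome has identified sequences for r1, r2 and r4 only.
    print(missing_gene_problems(SUBSYSTEM, "input", "output", {"r1", "r2", "r4"}))
```

Paths with no annotated roles at all are deliberately skipped: the species may simply use the alternative pathway, so only partial annotation signals a likely missing gene.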

Of course, it may turn out that no such sequence is discovered in any of the comparison genomes. But that would not show that the comparative genomics approach fails in that case. For if some unknown sequence implements the functional role in that particular species, then it is quite reasonable to suspect that the same sequence implements that role in other species. This suggests a search through other genomes that also lack an identified implementation of the functional role. If there is a sequence that lies near the pathway's other genes on the chromosome, and which is found in several of the target genomes, then one may conjecture that it implements the functional role. This is an important point, because a comparative genomics approach is not limited to cases in which the sequence has already been discovered in some other species by traditional ‘wet lab’ techniques. Rather, the computational methods used in comparative genomics may take the lead by guiding traditional wet lab techniques such as gene knockout studies. Indeed, many genomics researchers believe that one of the most important benefits of these computational methods is that they help molecular biologists in the laboratory focus their research on the most promising hypotheses.
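The clustering heuristic just described can be sketched in Python. It is a simplified illustration, not the procedure of any particular annotation system: genes are represented only by their order along the chromosome and by a hypothetical orthologous-family label, "nearby" means within a fixed number of gene positions (`window`), and all genomes and family names are invented. A family is proposed as a candidate when it sits next to the pathway's known genes in at least `min_genomes` of the genomes that lack an identified implementation.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Each genome: list of (position in gene order, orthologous family label).
Genome = List[Tuple[int, str]]


def neighbouring_families(genome: Genome, pathway: Set[str], window: int) -> Set[str]:
    """Families within `window` positions of any known pathway gene."""
    anchors = [pos for pos, fam in genome if fam in pathway]
    return {
        fam
        for pos, fam in genome
        if fam not in pathway and any(abs(pos - a) <= window for a in anchors)
    }


def clustered_candidates(genomes: Dict[str, Genome], pathway: Set[str],
                         window: int = 2, min_genomes: int = 2) -> List[str]:
    """Families found near the pathway genes in at least `min_genomes` genomes."""
    counts: Counter = Counter()
    for genome in genomes.values():
        counts.update(neighbouring_families(genome, pathway, window))
    return sorted(fam for fam, n in counts.items() if n >= min_genomes)


# Toy genomes that all lack an identified implementation of the missing role.
GENOMES = {
    "g1": [(1, "p1"), (2, "p2"), (3, "u7"), (9, "u3")],
    "g2": [(4, "p1"), (5, "u7"), (20, "u3")],
    "g3": [(1, "u3"), (10, "p2"), (11, "u9")],
}

if __name__ == "__main__":
    print(clustered_candidates(GENOMES, {"p1", "p2"}))
```

Requiring agreement across several genomes guards against a family that happens to sit beside the pathway in just one of them.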

7. THE MODULARITY OF THE GENOME

If we take common usage among researchers as definitive, then we would be forced to conclude immediately that genes exist. But the discussion in the previous sections has suggested a more critical method for determining whether genes exist (and if so, what they are). That is, we reinterpret the problem of the existence of genes as a problem about the referent of the term ‘gene’. With the problem formulated in this way, two questions remain to be settled:

(1) Does the term ‘gene’ refer?

(2) If so, to what does the term ‘gene’ refer (to the best of our knowledge)?

We should note that these two questions are independent, in the sense that we may give a positive answer to the first without being able to answer the second. Also, it is important to note that the first question belongs to the philosophy of language; in contrast, the second question is a scientific one, which is philosophical only in that the philosophy of science should indicate which empirical information bears upon it.

The previous discussion suggests that the best way to determine whether the term ‘gene’ refers is to look to the research programme within which the term is deployed. If there is an ongoing research programme that is dedicated to discovering the characteristics of genes, and that research is guided by the fact that particular causal powers are attributed to genes, then we have good reason to hold onto the view that the term ‘gene’ refers. But if the research programme has been abandoned, or if it has continued in name only (perhaps only by attributing totally distinct causal powers to ‘genes’), then we should hold (with Hull and Dupré) that genes do not exist. For in such a case, the research programme has been abandoned, and with it any available context upon which to fix the referent of the term.

So we ask what characteristics of genes are assumed by current research. When we consider comparative genomics, the feature of this research that stands out is that it crucially assumes that genes are, in an important sense, modular units on the chromosome. In particular, we can identify the following features that genes are assumed to have, which we shall collectively label the ‘modularity of the genome’ hypothesis:

(1) Genes correspond to functional segments of nucleic acids on the chromosome.

(2) These sequences code for proteins, which perform identifiable functions (what we have called ‘functional roles’).

(3) Genes tend to be conserved by natural selection: once a gene has evolved, it is likely to be inherited by descendants of the originating species.

(4) Genes are interchangeable modules: a gene may appear in one pathway of a particular species, but be part of a different pathway in another species.

Given the exposition of comparative genomics in section 6, it is easy to see that this research programme crucially assumes the truth of theses (1) through (4). To confirm this, we may briefly consider each thesis in turn. Thesis (1) is obvious, since the annotation process assumes (as does everyone) that genes are to be identified by their location on the chromosome. As for thesis (2), comparative genomics researchers must assume it as a working hypothesis, or else it would be impossible to formulate a missing genes problem by noting that a particular functional role has not been identified with a sequence of nucleic acids. Genomicists assume the truth of thesis (3) in several ways; but most obviously, there would be no reason to compare the annotations of several related species if there were no presumption that these annotations would likely be shared by related species. And of course, closely related species would be expected to have them in common precisely because genes (and their functional roles) tend to be conserved by natural selection as species evolve. Lastly, thesis (4) is assumed when comparative genomics researchers, in the course of investigating a missing genes problem, look to related, but distinct, functional roles in other species.

Genes, then, are implicitly identified with a particular kind of sequence: namely, sequences that are functional and modular in the sense given by theses (1) through (4), and whose modularity is a product of evolution and natural selection.

At this relatively early stage of research into genomics, I am skeptical that it is possible to give a more detailed characterization of the gene concept. But this should not be surprising: only recently have large amounts of genomics data become available, and this is a science that is still in its infancy. And as I have noted above, it is perfectly ordinary to say that a particular theoretical term refers without being able to give it a full characterization. But in spite of our inability to give a thorough intensional definition of the concept, there are important benefits to conceiving of genes as conserved, functional, modular sequences of nucleic acids. In the remainder of this section, I shall briefly detail some of these benefits.

7.1. Evidence of genes. Given the complexity of the relationship between sequences and genes, one can hardly blame Dupré for announcing the end of the gene concept. However, as I have argued, such pessimism is unwarranted. Indeed, one of the more interesting corollaries of the comparative genomics concept of the gene may be that it indicates what is right about earlier gene concepts. In particular, it shows us that these earlier gene concepts supply evidence of the existence of genes, although they cannot define the gene concept.

For example, consider the hope, which has turned out to be naive, that genes would correspond to contiguous sequences of nucleic acids on the chromosome. Of course, we now recognize that this is sometimes not the case. However, if we understand genes as evolved and conserved functional modules on the chromosome, then it turns out that contiguity on the chromosome is (defeasible) evidence of the existence of genes. For recombination and other genetic shuffling on the chromosome make it more likely that a sequence will be preserved intact if it is not spread out over the chromosome. Thus, because genes are functional modules, we would expect their physical characteristics to help them be conserved during those reshuffling processes. And indeed, as comparative genomics research has shown, genes often are contiguous for just this reason.
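The claim that a compact sequence is more likely to survive reshuffling intact can be made concrete with a deliberately simple toy model, which is an assumption of this sketch rather than a model taken from the text: suppose k breakpoints fall independently and uniformly along a chromosome of length G. A module spanning L bases (from its first to its last base) survives intact only if no breakpoint lands inside that span, which happens with probability (1 − L/G)^k. The Python sketch computes this closed form and checks it by simulation; all numbers are arbitrary.

```python
import random


def intact_probability(span: float, genome_length: float, breakpoints: int) -> float:
    """Chance that no uniformly placed breakpoint falls inside the module's span."""
    return (1 - span / genome_length) ** breakpoints


def simulate(span: float, genome_length: float, breakpoints: int,
             trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same probability (module placed at [0, span))."""
    rng = random.Random(seed)
    intact = 0
    for _ in range(trials):
        cuts = [rng.uniform(0, genome_length) for _ in range(breakpoints)]
        if not any(cut < span for cut in cuts):
            intact += 1
    return intact / trials


if __name__ == "__main__":
    G, K = 1_000_000, 10  # arbitrary chromosome length and number of breakpoints
    for span in (1_000, 50_000):  # the same module, compact vs. spread out
        print(span, round(intact_probability(span, G, K), 3), simulate(span, G, K))
```

With these arbitrary settings the compact module survives in roughly 99% of cases and the spread-out one in roughly 60%, which is the qualitative asymmetry the paragraph describes; real recombination is neither uniform nor independent, so the numbers carry no empirical weight.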

The lesson here is that we must not confuse evidential facts with definitional ones. In particular, the modularity of genes increases the probability that genes will be contiguous; thus, the contiguity of an alleged gene is positive evidence that we have in fact identified a gene. But like most evidential facts, these are defeasible. Some genes may be discontiguous and yet be functional modules. In general, when a proposed mark of genes is found not to hold universally, we should not conclude that genes do not exist.

Indeed, the modular nature of genes shows why not only their contiguity, but also their location on the chromosome, is evidential without being definitional. For the working hypothesis of comparative genomics is that genes are modular in at least two senses: not only are they functional modular units themselves, but they are also embedded in a hierarchy of modules consisting of functional roles, pathways, and subsystems. Because these ‘higher-level’ modules are conserved by evolution and natural selection, genes occurring in the same pathway are more likely to be located near each other, for the same reason that nucleic acids in the same gene are likely to be near each other. But again, this fact about the location of genes on the chromosome does not serve as any part of the definition of what a gene is; it is merely confirming evidence that particular sequences of nucleic acids are genes.

7.2. Why is it so difficult to characterize genes? It is an important virtue of this proposal that it not only replaces some failed attempts to say what genes are, but also explains why it is so difficult to characterize the gene concept in the first place. In fact, it is easy to see why the gene concept is so elusive. For although modularity, as I have argued, is central to the nature of genes, we do not yet understand how modularity evolves.

Examples of evolved structures that display modularity are easy to come by. In the philosophical literature, the best-known discussion of modularity is undoubtedly the one instigated by Jerry Fodor regarding the modularity of mind [4]. Other examples are less well known in philosophical discussions. For instance, recent research in the nascent field of neuroeconomics has uncovered neural structures that appear to function as discrete modules (e.g. see [1, 6, 7]). And it is well known in computer science that when neural networks are subjected to evolutionary pressures through so-called ‘genetic algorithms’, the resulting structures commonly exhibit modularity (e.g. [12–14]).

If the research methodology of comparative genomics is borne out in the long run (as I believe it will be), then it will turn out that modularity has evolved not only in gross anatomical structures, but in the genome as well. Thus, to see why it is so difficult to characterize the nature of genes, we should recognize this problem as an instance of a more general, and extremely difficult, problem. Let us call this ‘the problem of evolved modularity’.

The problem of evolved modularity has been addressed in the philosophical literature, but not in a technically satisfying way. For example, Günter Wagner has discussed two major processes that may bring about the evolution of modularity, which he calls ‘parcellation’ and ‘integration’ [30, p. 38]. Applying these concepts to genes, we may say that ‘parcellation’ refers to the ‘elimination of pleiotropic effects’ between different sets of genes or nucleic acid sequences and the ‘maintenance and/or augmentation of pleiotropic effects’ within genes or nucleic acid sequences. The concept of integration is concerned with the construction of higher-level modularity; it is the ‘creation of pleiotropic effects’ among genes. Thus, in the context of the evolution of genes and pathways, if we consider genes to be the lowest level in a hierarchy of modularity, parcellation would be a general term referring to the processes whereby the modularity of the gene is produced. At a higher level of modularity, integration is the general process whereby genes become organized into pathways.

It should not be controversial at all that processes of parcellation and integration must take place in the evolution of genes, pathways, and higher levels of modularity. Indeed, these terms, as defined by Wagner, are so general that almost no substantive empirical claim is made by asserting that these processes take place. The interesting challenge, which may be framed in terms of these two processes, is therefore to determine by what evolutionary mechanisms parcellation and integration do take place. And it is here that comparative genomics is extremely useful. For as I have outlined above, there is a useful positive feedback loop between genome annotation and the discovery of phylogenetic history. Genome annotation, as it is practiced in comparative genomics, depends crucially on our having at least a partial phylogenetic history of the species, because the technique requires comparisons among more or less closely related species. Conversely, when a set of annotations is completed, reference to existing sequence data for other species may suggest previously unknown phylogenetic relationships. Thus, as more sequence data and annotated sequences become available, we are able to look farther back in the evolutionary history of the species. In fact, this process may allow us to reconstruct how a gene arose in the first place, and to learn about the timing and process whereby genes became organized into particular pathways. It is important to note that this is not merely speculation; in an increasing number of cases, this has been accomplished.10
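The "conversely" direction of this feedback loop, completed annotations hinting at relationships, can be illustrated with a toy Python sketch. It is not a phylogenetic method (real inference builds trees from aligned sequences under explicit evolutionary models); it only compares genomes by the Jaccard distance between their sets of annotated functional roles and reports the most similar one. Genome names and role sets are hypothetical.

```python
from typing import Dict, Set


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|; 0.0 means identical role content."""
    union = a | b
    if not union:
        return 0.0
    return 1 - len(a & b) / len(union)


def closest_relative(query: str, annotations: Dict[str, Set[str]]) -> str:
    """Annotated genome whose set of functional roles is nearest to the query's."""
    others = [name for name in annotations if name != query]
    return min(others, key=lambda name: jaccard_distance(annotations[query], annotations[name]))


# Hypothetical annotations: genome -> functional roles with identified sequences.
ANNOTATIONS = {
    "g1": {"r1", "r2", "r3", "r4"},
    "g2": {"r1", "r2", "r3"},
    "g3": {"r1", "r5", "r6"},
}

if __name__ == "__main__":
    print(closest_relative("g1", ANNOTATIONS))
```

Shared role content is only a rough proxy for relatedness, since roles can be lost or acquired independently; the sketch shows the direction of inference, not its reliability.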

10 For example, comparative genomics has made it possible to reconstruct the evolutionary origin of the Prosthecobacter tubulin genes [11], of lysine biosynthesis [17], and of specific functional roles in pathways in the lysine biosynthesis subsystem [29]. An informative discussion of methodology may be found in [31].


This positive feedback loop between phylogenetic inference and genome annotation also makes contact with the distinctively philosophical problem of analyzing the gene concept. Because it is essential to the gene that it is an evolved modular structure, a fully satisfactory account of the gene will require an understanding of how such modular structures evolve. At this time, we can only gesture at the mechanisms by which modularity evolves; but comparative genomics will allow us to learn how modularity arises (when it does). At that point, we will be able to offer a specific, etiological account of the gene.

8. CONCLUSION

Although it takes a significant amount of work to get clear on the research methodology of comparative genomics, there is more than enough philosophical payoff for doing so. In particular, it turns out that the fact about genes that is crucial to understanding comparative genomics is that this research must assume that genes are modular. Genes are conceived of as discrete, functional units that are interchangeable among various pathways and subsystems, and which are also conserved by evolution. If my arguments are correct, then the various alleged features of genes (such as contiguity, location on the chromosome, etc.) that have been seized upon as essential features of genes are actually by-products of the modularity of genes.

If the arguments in this paper are correct, then the most significant payoff of the current study might not be a positive characterization of the gene, but instead the identification of a worthwhile and neglected research problem. For we will not be able to provide a fully adequate gene concept without first understanding the evolution of modularity. An adequate theory of the evolution of modularity would also elucidate other philosophical problems, including the modularity of the mind and perhaps the units of selection problem. Fortunately, comparative genomics is beginning to provide valuable empirical data on how a complex modular structure has evolved. Thus, the problem of characterizing the gene concept may be a route to understanding other philosophical problems that may be illuminated through a better understanding of modularity.


REFERENCES

1. Colin Camerer, George Loewenstein, and Drazen Prelec, Neuroeconomics: How neuroscience can inform economics, Journal of Economic Literature 43 (2005), 9–64.
2. Berent Enç, Reference of theoretical terms, Noûs 10 (1976), no. 3, 261–282.
3. Zachary Ernst, Philosophical issues arising from genomics, Oxford Handbook of Philosophy of Biology (Michael Ruse, ed.), Oxford University Press, 2008.
4. J.A. Fodor, The modularity of mind, MIT Press, Cambridge, MA, 1983.
5. Alan Garfinkel, Reductionism, The Philosophy of Science (Richard Boyd, Philip Gasper, and J.D. Trout, eds.), MIT Press, 1991, pp. 443–459.
6. Paul W. Glimcher, Decisions, uncertainty, and the brain: The science of neuroeconomics, MIT Press, Cambridge, Massachusetts, 2003.
7. Paul W. Glimcher and Aldo Rustichini, Neuroeconomics: The consilience of brain and decision, Science 306 (2004), 447–452.
8. David L. Hull, Informal aspects of theory reduction, Philosophy of Science Association (1974), 653–670.
9. D.L. Hull, Reduction in Genetics–Biology or Philosophy?, Philosophy of Science 39 (1972), no. 4, 491–499.
10. N. Ivanova, A. Sorokin, I. Anderson, N. Galleron, B. Candelon, V. Kapatral, A. Bhattacharyya, G. Reznik, N. Mikhailova, A. Lapidus, et al., Genome sequence of Bacillus cereus and comparative analysis with Bacillus anthracis, Nature 423 (2003), no. 6935, 87–91.
11. Cheryl Jenkins, Ram Samudrala, et al., Genes for the cytoskeletal protein tubulin in the bacterial genus Prosthecobacter, Proceedings of the National Academy of Sciences of the United States of America 99 (2002), 17049–17054.
12. Nadav Kashtan and Uri Alon, Spontaneous Evolution of Modularity and Network Motifs, Proceedings of the National Academy of Sciences of the United States of America 102 (2005), no. 39, 13773–13778.
13. B. Kosko, Hidden patterns in combined and adaptive knowledge networks, International Journal of Approximate Reasoning 2 (1988), no. 4, 377–393.
14. B. Kosko, Neural networks and fuzzy systems: a dynamical systems approach to machine intelligence, Prentice-Hall, 1992.
15. Saul Kripke, Naming and necessity, Harvard University Press, Cambridge, 1980.
16. Frederick W. Kroon, Theoretical terms and the causal view of reference, Australasian Journal of Philosophy 63 (1985), no. 2, 143–166.
17. Hiromi Nishida, Makoto Nishiyama, Nobuyuki, Takehide Dosuge, Takayuki Hoshino, and Hisakazu Yamane, A Key to the Evolution of Amino Acid Biosynthesis, Genome Research 9 (1999), 1175–1183.
18. Robert Nola, Fixing the reference of theoretical terms, Philosophy of Science 47 (1980), no. 4, 505–531.
19. R. Overbeek, M. Fonstein, M. D’Souza, G.D. Pusch, and N. Maltsev, The use of gene clusters to infer functional coupling, Proc Natl Acad Sci U S A 96 (1999), no. 6, 2896–2901.
20. Ross Overbeek, Genomics: what is realistically achievable?, Genome Biology 1 (2000), 1–3.
21. Ross Overbeek, Terry Disz, and Rick Stevens, The SEED: A peer-to-peer environment for genome annotation, Communications of the Association for Computing Machinery 47 (2004), 46–51.
22. Ross Overbeek et al., The ERGO genome analysis and discovery system, Nucleic Acids Research 31 (2003), no. 1, 164–171.
23. Ross Overbeek et al., The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes, Nucleic Acids Research 33 (2005), no. 17, 5691–5702.
24. Hilary Putnam, Meaning and Reference, The Journal of Philosophy 70 (1973), no. 9, 699–711.


25. Willard Van Orman Quine, Reference and modality, From a Logical Point of View, Harvard University Press, 1953.
26. Alexander Rosenberg, Instrumental biology or the disunity of science, University of Chicago Press, Chicago, 1994.
27. Elliott Sober, Reconstructing the past: Parsimony, evolution, and inference, MIT Press, Cambridge, Massachusetts, 1988.
28. P. Kyle Stanford and Philip Kitcher, Refining the causal theory of reference for natural kind terms, Philosophical Studies 97 (2000), 99–129.
29. A.M. Velasco, J.I. Leguina, and A. Lazcano, Molecular Evolution of the Lysine Biosynthetic Pathways, Journal of Molecular Evolution 55 (2002), 445–459.
30. Günter Wagner, Homologues, Natural Kinds and the Evolution of Modularity, American Zoologist 36 (1996), 36–43.
31. Itai Yanai and Charles DeLisi, The society of genes: networks of functional links between genes from comparative genomics, Genome Biology 3 (2002), no. 11, 1–12.


DEPARTMENT OF PHILOSOPHY, UNIVERSITY OF MISSOURI-COLUMBIA

URL: www.missouri.edu/~ernstz