Nature of polymorphism in HLA-A, -B, and -C molecules


Proc. Natl. Acad. Sci. USA Vol. 85, pp. 4005-4009, June 1988 Immunology

Nature of polymorphism in HLA-A, -B, and -C molecules
(histocompatibility/diversity/evolution/peptides/T-cell receptor)

P. PARHAM*‡, C. E. LOMEN*, D. A. LAWLOR*, J. P. WAYS*, N. HOLMES*, H. L. COPPIN†, R. D. SALTER*, A. M. WAN*, AND P. D. ENNIS*
Departments of *Cell Biology and †Medical Microbiology, Sherman Fairchild Science Building, Stanford University School of Medicine, Stanford, CA 94305

Communicated by Harden M. McConnell, February 1, 1988 (received for review January 14, 1988)

ABSTRACT Diversity in 39 HLA-A, -B, and -C molecules is derived from 20 amino acid positions of high variability and 71 positions of low variability. Variation in the structurally homologous α1 and α2 domains is distinct and may correlate with partial segregation of peptide and T-cell receptor binding functions. Comparison of 15 HLA-A with 20 HLA-B molecules reveals considerable locus-specific character, due primarily to differences at polymorphic residues. The results indicate that genetic exchange between alleles of the same locus has been a more important mechanism in the generation of HLA-A, -B, and -C diversity than genetic exchange between alleles of different loci.

Class I major histocompatibility complex glycoproteins are peptide-binding proteins that present processed antigens to cytotoxic T lymphocytes. The genes coding for these molecules are the most polymorphic loci known in higher vertebrates, and for humans a total of 19 HLA-A, 37 HLA-B, and 8 HLA-C molecules have been defined (1). Although the basic features of class I molecules are well defined (2), accumulation of allelic sequences has been slow. The paucity of sequences has limited our understanding of the scope of the polymorphism, its function, and its generation. We present here a comparison of 39 HLA-A, -B, and -C sequences and a general assessment of their patterns of diversity.

MATERIALS AND METHODS

Genomic clones encoding HLA-A1, -B8, -B14, -B18, -Bw41, -Bw42, -B44.2, -Bw65, and -Cw2.2 were isolated from the following cell lines: S. Gar (HLA-Aw24,3;B18,w41;Cw6), BB (HLA-Aw68.2,30;Bw42,w65), FMB (HLA-A1,32;B44,w57;Cw5,w6), MRWC (HLA-A2,32;B14,27;Cw2), MVL (HLA-Aw32;B27;Cw2), and LCL721 (HLA-A1,2;B8,5). The cloned genes are underlined. Construction of libraries, isolation and identification of genes, and sequencing of genes with exon-specific oligonucleotide primers were as described (3). Genomic clones encoding HLA-A1 and HLA-B8 were kindly provided by H. T. Orr (University of Minnesota) (4).

RESULTS AND DISCUSSION

Variability in HLA-A, -B, and -C Molecules. Genes encoding HLA-A1, -B8, -B14, -B18, -B44, -Bw41, -Bw42, -Bw65, and -Cw2 were isolated, and their exons were sequenced. The sequences of the HLA-B44 and HLA-Cw2 proteins each differ by 3 amino acids from identically typed molecules isolated from other cell lines (5, 6). These represent distinct subtypes, which we have designated HLA-B44.2 and HLA-Cw2.2, as distinct from HLA-B44.1 and HLA-Cw2.1 for the published sequences (5, 6).

Comparison of 15 HLA-A, 20 HLA-B, and 4 HLA-C proteins shows that substitutions are found at 91 of the 274 positions in the three extracellular domains (α1, α2, and α3) that interact with antigenic peptides, the T-cell receptor, and the CD8 molecule. Because a single residue predominates at most positions, a consensus sequence can be derived (Fig. 1). Individual molecules differ from the consensus at 12-30 residues, showing that all have diverged considerably from a common ancestor.

Without knowing the total number of alleles, it is difficult to predict how many sequences are required to gain an accurate description of HLA-A, -B, and -C polymorphism. As an empirical assessment, we compared the variability plot (23) obtained from all 39 sequences with the variability plot made from the first 23 sequences obtained (Fig. 2). Comparison of the two variability plots reveals both similarities and differences. It is first helpful to define two distinct groups of polymorphic positions based on the magnitude of their variability. The number of polymorphic positions in the α3 domain and the magnitude of their variability are much lower than those in the α1 and α2 domains. This correlates with a suppression of coding substitutions in the exon coding for α3 compared to the exons coding for α1 and α2 (9, 10) and suggests that polymorphism in α1 and α2 is qualitatively different from that in α3. We therefore divided the positions of polymorphism in α1 and α2 according to whether their variability was greater than or comparable to that found in α3. Positions with variability ≥5.0 are designated as high variability, and those with variability between 1.0 and 5.0 are designated as low variability.

Comparison of the plots made with 23 or 39 sequences shows no dramatic changes in positions of high variability. Eighteen positions with high variability (positions 9, 24, 45, 62, 65, 66, 67, 70, 71, 74, 77, 80, 95, 97, 114, 116, 156, and 163) are held in common (Figs. 1 and 2). Position 43 has high variability only when 23 sequences are considered; positions 69 and 76 have high variability only when 39 sequences are considered. The magnitude of variability at these positions shows modest gains with the increase in sequence number, yet remains significantly lower than the values obtained for hypervariable residues of immunoglobulins and T-cell receptors (23, 24). For this reason we have not designated these positions as having allelic hypervariability, although this nomenclature has been widely adopted in analyses of class II major histocompatibility complex sequences (25). In conclusion, the pattern of polymorphism at residues of high variability is not greatly changed in considering 39 versus 23 sequences and is unlikely to undergo major change as further sequences accrue.
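The variability measure behind Fig. 2 is that of Wu and Kabat (23): at each aligned position, the number of different amino acids divided by the frequency of the most common amino acid, so a fully conserved position scores 1.0. The Python sketch below computes a consensus, each molecule's number of differences from it, per-position variability, and the high/low classes defined above. It assumes gap-free, equal-length aligned protein sequences; the function names, the tie-breaking rule in `consensus`, and the toy sequences are illustrative assumptions, not the authors' software.

```python
from collections import Counter


def consensus(seqs: list[str]) -> str:
    """Most common residue at each aligned position.

    Ties are broken in favour of the residue seen first in that column
    (Counter.most_common keeps first-seen order among equal counts).
    """
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*seqs))


def differences_from(seq: str, reference: str) -> int:
    """Number of aligned positions at which `seq` differs from `reference`."""
    return sum(a != b for a, b in zip(seq, reference))


def wu_kabat_variability(seqs: list[str]) -> list[float]:
    """Per position: (number of distinct residues) / (frequency of the commonest residue)."""
    n = len(seqs)
    values = []
    for col in zip(*seqs):
        counts = Counter(col)
        commonest_freq = counts.most_common(1)[0][1] / n
        values.append(len(counts) / commonest_freq)
    return values


def classify(variability: float) -> str:
    """Thresholds from the text: >= 5.0 is high; above 1.0 but below 5.0 is low."""
    if variability >= 5.0:
        return "high"
    if variability > 1.0:
        return "low"
    return "conserved"  # exactly 1.0: a single residue at this position


if __name__ == "__main__":
    # Toy aligned fragments (hypothetical, not HLA data).
    toy = ["ACDE", "ACDF", "ACEF", "GCDF"]
    cons = consensus(toy)
    print(cons, [differences_from(s, cons) for s in toy])
    print([(round(v, 2), classify(v)) for v in wu_kabat_variability(toy)])
```

With real data, the same per-position values plotted against residue number give the variability plots compared in Fig. 2.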

In contrast, a significant difference is found in positions showing low variability. The number of these residues increases from 58 to 71 when 39 sequences rather than 23 are compared, and this number can be expected to increase as more molecules are sequenced. Although the modest variations at these "newer" positions of polymorphism do not greatly affect general assessments of variation, they commonly represent allele-specific residues and may be of functional and serological consequence.

‡To whom reprint requests should be addressed.

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.

[Fig. 1: alignment of the α1 (residues 1-90), α2 (91-182), and α3 (183-274) domains of the consensus sequence and 39 molecules: HLA-A2.1, -A2.2Y, -A2.2F, -A2.3, -A2.4a, -A2.4b, -Aw69, -Aw68.2, -Aw68.1, -A3.1, -A3.2, -A11, -A1, -A32, -Aw24, -Bw58, -B27.1, -B27.2, -B27.3, -B27.4, -B27f, -B44.1, -B44.2, -B13, -Bw47, -Bw65, -B14, -B18, -B40*, -Bw41, -Bw60, -B8, -Bw42, -B7.1, -B7.2, -Cw1, -Cw2.1, -Cw2.2, and -Cw3. The aligned sequences are not recoverable from the extracted text.]

FIG. 1. The amino acid sequences of the α1, α2, and α3 domains of 39 HLA-A, -B, and -C molecules. The standard one-letter amino acid code is used. A dash indicates that a residue is identical to that found in the consensus sequence. Below the sequence is a schematic of the secondary structure, with arrows representing β-strands, squiggles representing α-helices, and bars representing turns and bends (7). Residues of the four central β-strands and the α-helical regions that have accessible side chains are shown in color (8). Orange residues point into the peptide-binding site, yellow residues point up from the binding site (toward the T-cell receptor), and blue residues point away from the site. The arrowheads above the sequence of each domain indicate residues with high variability (≥5.0). Sequences were obtained from refs. 5, 6, and 9-22. Residue 116 in HLA-B7.2, although different from the consensus Y, is unknown and is indicated with an X (22). Certain errors in translation in previous reports have been corrected (6, 11, 13).

[Fig. 2: variability (y axis) plotted against position in sequence (x axis) across the α1, α2, and α3 domains; panel A, 23 sequences; panel B, 39 sequences.]

FIG. 2. Variability in the α1, α2, and α3 domains of HLA-A, -B, and -C molecules. Protein sequences were analyzed as described by Wu and Kabat (23). (A) Twenty-three sequences were analyzed. (B) All 39 of the sequences shown in Fig. 1 were used. Molecules excluded from the analysis in A were HLA-A2.2F, -A2.4b, -A11, -A1, -B27.4, -B27f, -B13, -Bw47, -B14, -B18, -B8, -Bw41, -Bw42, -Cw1, -Cw2.1, and -Cw2.2.

At this point we cannot reliably assess the total number of positions exhibiting polymorphism. Therefore, the assignment of positions as conserved or monomorphic must remain tentative. In considering the total number of polymorphic positions, it is of interest that combining eight H-2K, -D, and -L sequences (2) with the HLA-A, -B, and -C sequences gives 127 polymorphic positions (46% of the total number of positions), of which 43 are specific to HLA, 36 are specific to H-2, and only 48 are held in common.
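The split of the 127 polymorphic positions is a set partition: positions polymorphic only in HLA, only in H-2, and in both. A minimal sketch, assuming each group's polymorphic positions are already available as a set of residue numbers (the sets in the example are toy values, and the function names are hypothetical):

```python
def partition_positions(first: set[int], second: set[int]) -> dict[str, set[int]]:
    """Split polymorphic positions into group-specific and shared subsets."""
    return {
        "first_only": first - second,
        "second_only": second - first,
        "shared": first & second,
    }


def percent_of_total(n_positions: int, total: int = 274) -> float:
    """Percentage of the 274 positions in the alpha1-alpha3 domains."""
    return 100 * n_positions / total


if __name__ == "__main__":
    # Toy position sets, not the real HLA and H-2 data.
    print(partition_positions({9, 24, 45, 62}, {9, 62, 152}))
    # Counts reported in the text: 43 HLA-specific + 36 H-2-specific + 48 shared.
    union = 43 + 36 + 48
    print(union, round(percent_of_total(union)))
```

The sizes of the three subsets always sum to the size of the union, which is how 43 + 36 + 48 gives the 127 positions quoted above.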

Variability in α1 and α2 Domains Is Distinct. Although they show little homology in primary sequence, the α1 and α2 domains have a similar three-dimensional structure that consists of an amino-terminal region of β-pleated sheet and a carboxyl-terminal region of α-helices. The four β-strands of each domain combine to produce a planar structure that forms the floor of the peptide-combining site (7). The α-helices lie on top of the β-sheet and define the sides of the site. Despite their structural similarity, we find striking differences in their patterns of sequence variability. Greater diversity is found in the α-helical region of α1 than in that of α2, whereas the diversity in the β-strand region of α2 is greater than that found in α1.

The α-helical region of α1 has a cluster of 11 residues of high diversity (residues 62, 65, 66, 67, 69, 70, 71, 74, 76, 77, and 80), whereas only 2 such residues are found in the corresponding region of α2 (residues 156 and 163). The accessible amino acid side chains of the α-helical regions have been divided into three groups according to the direction in which the side chain points: (i) into the binding site, where it could interact with peptide; (ii) up from the binding site, where it could interact with the T-cell receptor; or (iii) away from the binding site (orange, yellow, or blue, respectively, in Fig. 1) (8). With the exception of residue 45, the side chains of all residues with high variability either point into or up from the binding site; this is in general agreement with the conclusions of Bjorkman et al. (8), based on a total of 22 sequences. In addition, we find a segregation of putative peptide-binding residues to the α1-helices and of T-cell receptor-binding residues to the α2-helices. There are 11 orange and 7 yellow residues in the helical region of α1 and 7 orange and 11 yellow residues in the helical region of α2. This suggests that the α1-helix may predominate in interactions with the bound peptide, whereas the α2-helix plays a greater role in interaction with the T-cell receptor. The differences in variability between the two helical regions may directly reflect this potential functional segregation. The lower diversity in the α2-helices may reflect greater constraints on the interaction between the class I major histocompatibility complex molecule and the T-cell receptor than on the interaction between the class I major histocompatibility complex molecule and a peptide. This interpretation is also consistent with the predominance of substitutions in the α-helical region of α2 found in variants and mutants defined by cytotoxic T cells (2, 8).
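The helix comparison above is a tally of accessible side chains by domain and orientation class. The sketch below assumes a hypothetical per-residue annotation mapping a residue number to its domain and to one of "into", "up", or "away" (the orange, yellow, and blue classes of Fig. 1); the toy annotation is not the real assignment of ref. 8, for which the text reports 11 orange and 7 yellow residues in the α1 helix and 7 orange and 11 yellow in the α2 helix.

```python
from collections import Counter, defaultdict

# Orientation classes and their colours in Fig. 1.
COLOURS = {"into": "orange", "up": "yellow", "away": "blue"}


def tally_orientations(annotations: dict[int, tuple[str, str]]) -> dict[str, Counter]:
    """Count orange/yellow/blue side chains separately for each domain."""
    tallies: dict[str, Counter] = defaultdict(Counter)
    for domain, orientation in annotations.values():
        if orientation not in COLOURS:
            raise ValueError(f"unknown orientation: {orientation!r}")
        tallies[domain][COLOURS[orientation]] += 1
    return dict(tallies)


if __name__ == "__main__":
    # Toy residue numbers and orientations (hypothetical).
    toy = {
        1: ("alpha1", "into"),
        2: ("alpha1", "into"),
        3: ("alpha1", "up"),
        4: ("alpha2", "up"),
        5: ("alpha2", "up"),
        6: ("alpha2", "into"),
    }
    for domain, counts in sorted(tally_orientations(toy).items()):
        print(domain, counts["orange"], "orange /", counts["yellow"], "yellow")
```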

In contrast to the helical regions, where diversity is concentrated in the α1 domain, one finds greater diversity in the β-strand region of the α2 domain. There are four positions of high variability (residues 95, 97, 114, and 116) compared to three positions (residues 9, 24, and 45) in α1, and the magnitude of variability at these positions is greater than that found in the α1 domain. The β-strands form the bottom of the peptide-binding site and may contact only the peptide, not the T-cell receptor. These observations suggest a greater role for the β-strands of α1 in interacting with conserved elements, perhaps the backbone of bound peptides, and for the β-strands of α2 in interacting with the variable side chains of bound peptides.

Distinct Patterns of Polymorphism in HLA-A, -B, and -C Molecules. Codominant expression of three class I major histocompatibility complex loci in humans (HLA-A, -B, and -C) and mice (H-2K, -D, and -L) raises the question of their structural and functional divergence. One indicator would be the existence of locus-specific residues. In contrast to the 183 positions at which an amino acid is conserved among all HLA-A, -B, and -C molecules, only 6 positions have locus-specific residues (positions 52, 138, 183, 189, 239, and 268). At no position are all three loci different; the pattern is always one of two loci having the same residue, with the third locus being unique. HLA-A-specific residues are Met-138 and Met-189; Arg-239 is an HLA-B-specific residue; and Val-52, Glu-183, and Glu-268 are HLA-C-specific residues.
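Finding locus-specific residues amounts to scanning each aligned position for a residue that is invariant within every locus but shared by only one locus. A minimal sketch, assuming gap-free alignments of equal length and requiring every locus to be invariant at the position (an assumption about how such residues are defined); the function name and toy sequences are hypothetical:

```python
def locus_specific_residues(
    seqs_by_locus: dict[str, list[str]], first_residue: int = 1
) -> list[tuple[int, str, str]]:
    """Return (position, locus, residue) where one locus carries a unique invariant residue."""
    length = len(next(iter(seqs_by_locus.values()))[0])
    hits = []
    for i in range(length):
        residue_per_locus = {}
        for locus, seqs in seqs_by_locus.items():
            column = {s[i] for s in seqs}
            if len(column) != 1:
                break  # polymorphic within this locus, so skip the position
            residue_per_locus[locus] = column.pop()
        else:  # runs only when every locus is invariant at position i
            values = list(residue_per_locus.values())
            for locus, aa in residue_per_locus.items():
                # Unique to one locus, and the position is not conserved across all loci.
                if values.count(aa) == 1 and len(set(values)) > 1:
                    hits.append((i + first_residue, locus, aa))
    return hits


if __name__ == "__main__":
    toy = {"A": ["MKV", "MKV"], "B": ["MRV", "MRV"], "C": ["MRL", "MRL"]}
    print(locus_specific_residues(toy))
```

With three loci, a position where all three differed would report every locus; the text notes that this pattern never occurs in the 39 sequences.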

In addition to this small number of locus-specific residues, many polymorphic features of human class I molecules are segregated between loci. When variability plots are limited to the products of a single locus, values of variability are significantly reduced and distinct patterns are obtained for HLA-A and HLA-B (Fig. 3). (Insufficient sequences precluded analysis of HLA-C.) Only three positions for HLA-A (positions 62, 77, and 156) and four positions for HLA-B (positions 45, 67, 97, and 116) exhibit more than three different amino acids, compared to 14 positions (positions 9, 45, 62, 67, 69, 70, 77, 80, 95, 97, 114, 116, 156, and 163) when all loci are combined. Of a total of 73 variable positions in HLA-A and -B molecules, we find only 25 that show polymorphism in both loci, with 27 positions being polymorphic at the A locus alone and 21 at the B locus alone. Variation at many of the high-variability residues is predominantly the contribution of a single locus. At position 76 the variability is derived entirely from the A locus; at positions 45, 67, and 71, it is derived entirely from the B locus. Variability at positions 62, 65, 66, and 156 is predominantly due to HLA-A locus differences, and variability at positions 116 and 163 is due to HLA-B locus differences. The variation seen in α3 is primarily due to substitutions at the HLA-A locus.
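Restricting the count of amino acids per position to one locus at a time, as in Fig. 3, can be sketched as below. The sketch assumes gap-free aligned sequences grouped by locus; the function names and toy sequences are hypothetical. The toy data show how pooling loci can push a position above the three-residue threshold even when neither locus does so on its own.

```python
def positions_with_many_residues(
    seqs: list[str], threshold: int = 3, first_residue: int = 1
) -> list[int]:
    """Residue numbers at which more than `threshold` different amino acids occur."""
    return [
        pos
        for pos, column in enumerate(zip(*seqs), start=first_residue)
        if len(set(column)) > threshold
    ]


def per_locus_and_pooled(
    seqs_by_locus: dict[str, list[str]], threshold: int = 3
) -> dict[str, list[int]]:
    """Apply the count to each locus separately and to all loci combined."""
    result = {
        locus: positions_with_many_residues(seqs, threshold)
        for locus, seqs in seqs_by_locus.items()
    }
    pooled = [s for seqs in seqs_by_locus.values() for s in seqs]
    result["pooled"] = positions_with_many_residues(pooled, threshold)
    return result


if __name__ == "__main__":
    # Toy sequences: each locus has two residues at position 1, four when pooled.
    print(per_locus_and_pooled({"A": ["QK", "RK"], "B": ["EK", "WK"]}))
```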

[Fig. 3: variability plotted against position in sequence for HLA-A and HLA-B molecules.]

FIG. 3. Variability of HLA-A and HLA-B molecules. The variability of the sequences of the 15 HLA-A and 20 HLA-B molecules shown in Fig. 1 is plotted as described in Fig. 2.

To quantitate the degree of locus specificity at positions of polymorphism, we calculated the number of amino acid differences between all possible pairs of sequences. The frequency distribution of these values was plotted, distinguishing between comparisons of products of the same locus and of different loci (Fig. 4). Two distinct distributions occur with only a small overlap. The values for pairs involving products of different loci range from 24 to 50 with a modal value of 41. In contrast, the differences between products of the same locus range from 1 to 32 with a modal value of 20. There are no molecules for which all or a majority of either the interlocus or the intralocus comparisons fall into the overlap region. Thus one might now determine the locus of an unknown molecule on the basis of its pattern of polymorphic substitution. Separation of the interlocus comparison into its three components indicates that HLA-B and HLA-C locus products are the most closely related, whereas HLA-A and HLA-C are the most divergent (data not shown).

[Fig. 4: frequency plotted against number of differences for pairs of molecules from the same locus and from different loci.]

FIG. 4. HLA-A, -B, and -C products can be distinguished on the basis of their polymorphic differences. The number of amino acid substitutions in the extracellular domains (α1, α2, and α3) between all pairs of HLA-A, -B, and -C molecules was calculated. The frequency distributions of the values obtained when pairs of molecules from the same locus and from different loci are compared are shown, together with the overlap of the two distributions.

The distribution given by pairwise comparisons within a locus has a third component, with a modal value of 3, that represents sequences that are almost identical. This results from comparisons made between subtypes of particular HLA-A, -B, and -C molecules.

These results demonstrate that products of one HLA locus are much more similar to each other than they are to products of another locus. This is in part due to locus-specific residues but is primarily the result of polymorphisms that are found specifically or predominantly within the products of one locus. This similarity in the coding sequences is consistent with locus-specific sequences in the untranslated regions of the mRNA (26).

Mechanisms of Diversification of HLA-A, -B, and -C. Spontaneous mutants of the murine class I molecule H-2Kb have been shown to arise through genetic exchange events between different class I genes (27, 28). These findings initiated a debate as to whether this mechanism is the major generator of diversity for other class I genes (29, 30). If it were true for HLA-A, -B, and -C, one would expect to find greater similarities between alleles of different loci than between alleles of the same locus, and this is not the case. The existence of considerable locus specificity in polymorphic sequences indicates that exchange of sequences between the HLA-A, HLA-B, and HLA-C loci has not been a major contributor to diversity at these genes.
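The expectation tested above, that interlocus exchange would make alleles of different loci more similar than alleles of the same locus, comes down to the pairwise comparison of Fig. 4: count amino acid differences for every pair of molecules and keep same-locus and different-locus pairs apart. The sketch assumes gap-free aligned α1-α3 sequences and uses toy data. The mean-distance rule in the hypothetical `assign_locus` is only one possible way to act on the remark that an unknown molecule's locus could be inferred from its substitutions; it is not a procedure described in the text.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def n_differences(a: str, b: str) -> int:
    """Amino acid differences between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_by_locus(molecules: dict[str, tuple[str, str]]) -> tuple[list[int], list[int]]:
    """Split all pairwise differences into same-locus and different-locus lists.

    `molecules` maps a molecule name to (locus, aligned sequence).
    """
    same, different = [], []
    for (locus_a, seq_a), (locus_b, seq_b) in combinations(molecules.values(), 2):
        target = same if locus_a == locus_b else different
        target.append(n_differences(seq_a, seq_b))
    return same, different


def modal_value(values: list[int]) -> int:
    """Most frequent value (first seen wins a tie)."""
    return Counter(values).most_common(1)[0][0]


def assign_locus(query: str, molecules: dict[str, tuple[str, str]]) -> str:
    """Assign `query` to the locus whose members differ from it least on average."""
    by_locus: dict[str, list[int]] = {}
    for locus, seq in molecules.values():
        by_locus.setdefault(locus, []).append(n_differences(query, seq))
    return min(by_locus, key=lambda locus: mean(by_locus[locus]))


if __name__ == "__main__":
    toy = {
        "A-1": ("A", "AAAA"),
        "A-2": ("A", "AAAC"),
        "B-1": ("B", "CCCC"),
        "B-2": ("B", "CCCA"),
    }
    same, different = pairwise_by_locus(toy)
    print(sorted(same), sorted(different), assign_locus("AACA", toy))
```

If interlocus exchange dominated, the different-locus list would not sit at consistently higher values than the same-locus list, which is the opposite of what Fig. 4 shows.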

In comparisons of allelic sequences one can discern a characteristic patchwork-quilt pattern in which various small elements of sequence are arranged in various combinations (Fig. 1), a good example being provided by residues 62-84 of HLA-B molecules. These patterns are to be expected if point mutations in combination with intraallelic exchange events are the predominant mechanisms generating new alleles. We have shown (3) that HLA-Aw69 results from allelic recombination between HLA-A2.1 and HLA-Aw68.1 genes, and two further examples of in vivo exon shuffling are found in the following sequences. HLA-Bw42 is identical to HLA-B7 in the α1 domain and to HLA-B8 in the α2 domain. HLA-Bw41 is identical to HLA-Bw60 in the α1 domain and has an α2 domain that, with two exceptions (Leu-95 → Trp and Ser-97 → Arg), is identical to that of HLA-B8.

We do not rule out a role for exchange events between different loci and have hypothesized that such events were responsible for HLA-A2 and HLA-Bw58 sharing residues 62-65 (14) and for HLA-Aw24, HLA-A32, and various HLA-B locus molecules sharing residues 79-83 (12). However, unlike in H-2Kb mutants (27, 28), this mechanism has not had a major impact in shaping the patterns of HLA-A, -B, and -C diversity. H-2Kb has an unusually high frequency of mutation compared with other H-2 genes and may represent a special haplotype in which the rate of gene conversion is unusually high (31). Alternatively, species differences in the nature of class I diversification may exist, as suggested by the lack of locus-specific residues in H-2K, -D, and -L (2). Finally, it must be emphasized that these results do not provide information on the relative rates of genetic exchange between loci and between alleles, as we have only studied those molecules that have been selected and fixed in the human population.
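Statements such as "HLA-Bw42 is identical to HLA-B7 in the α1 domain and to HLA-B8 in the α2 domain" come from comparing each domain separately against candidate parent molecules. A minimal sketch, assuming the domain sequences have already been excised and aligned to equal length; the function name and toy sequences are hypothetical. A query whose domains are each closest to a different reference, at zero or very few differences, shows the mosaic pattern described for Bw42 and Bw41.

```python
def closest_per_domain(
    query: dict[str, str], references: dict[str, dict[str, str]]
) -> dict[str, tuple[str, int]]:
    """For each domain of `query`, return (closest reference name, number of differences).

    `query` maps domain name -> aligned sequence; `references` maps molecule
    name -> {domain name -> aligned sequence}. Equal-length sequences are assumed.
    Ties are broken alphabetically by reference name. A domain absent from all
    references makes min() raise ValueError.
    """
    result = {}
    for domain, seq in query.items():
        scored = [
            (sum(a != b for a, b in zip(seq, ref[domain])), name)
            for name, ref in references.items()
            if domain in ref
        ]
        diffs, name = min(scored)
        result[domain] = (name, diffs)
    return result


if __name__ == "__main__":
    # Toy "parents" and a mosaic query (hypothetical sequences).
    parents = {
        "B7": {"alpha1": "AAAA", "alpha2": "CCCC"},
        "B8": {"alpha1": "GGGG", "alpha2": "TTTT"},
    }
    print(closest_per_domain({"alpha1": "AAAA", "alpha2": "TTTW"}, parents))
```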

Functional Implications. This analysis indicates that all or most of the positions of high variability in HLA-A, -B, and -C have been identified. With one exception they appear to be directly involved in the peptide-binding site as defined by the structure of HLA-A2 (7, 8). These substitutions will undoubtedly alter the repertoire of peptides and T-cell receptors that functionally interact with HLA-A, -B, and -C molecules. In addition, many positions of the α1, α2, and α3 domains that do not line the peptide-binding site show low variability. These polymorphisms may also affect the affinities for peptides and T-cell receptors through conformationally induced, and perhaps more subtle, changes in the sites of interaction. The finding that patterns of polymorphism at the HLA-A, -B, and -C loci are different suggests that there is functional diversification between the products of these loci, which may have evolved to bind structurally distinct groups of peptides or T-cell receptors.

We thank D. Shook for technical assistance and Dr. Pamela Bjorkman for insight into the structure of HLA-A2. This research was supported by Grants AI17892 and AI24258 from the Public Health Service. P.P. is a scholar of the Leukemia Society of America.

1. Albert, E. D., Baur, M. P. & Mayr, W. R., eds. (1984) Histocompatibility Testing 1984 (Springer, Berlin).

2. Maloy, W. L. (1987) Immunol. Res. 6, 11-29.

3. Holmes, N. & Parham, P. (1985) EMBO J. 4, 2849-2854.

4. Koller, B. H., Geraghty, D., Orr, H. T., Shimizu, Y. & DeMars, R. (1987) Immunol. Res. 6, 1-10.

5. Kottmann, A. H., Seemann, G. H. A., Guessow, H. D. & Roos, M. H. (1986) Immunogenetics 23, 396-400.

6. Gussow, D., Rein, R. S., Meijer, I., deHoog, W., Seemann, G. H. A., Hochstenbach, F. M. & Ploegh, H. L. (1987) Immunogenetics 25, 313-322.

7. Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987) Nature (London) 329, 506-512.

8. Bjorkman, P. J., Saper, M. A., Samraoui, B., Bennett, W. S., Strominger, J. L. & Wiley, D. C. (1987) Nature (London) 329, 512-518.

9. Holmes, N., Ennis, P., Wan, A. M., Denney, D. W. & Parham, P. (1987) J. Immunol. 139, 936-941.

10. Strachan, T., Sodoyer, R., Damotte, M. & Jordan, B. R. (1984) EMBO J. 3, 887-894.

11. Cowan, E. P., Jelachich, M. L., Biddison, W. E. & Coligan, J. E. (1987) Immunogenetics 25, 241-250.

12. Wan, A. M., Ennis, P., Parham, P. & Holmes, N. (1986) J. Immunol. 137, 3671-3674.

13. N'Guyen, C., Sodoyer, R., Trucy, J., Strachan, T. & Jordan, B. R. (1985) Immunogenetics 21, 479-489.

14. Ways, J. P., Coppin, H. L. & Parham, P. (1985) J. Biol. Chem. 260, 11924-11933.

15. Seemann, G. H. A., Rein, R., Brown, C. S. & Ploegh, H. L. (1986) EMBO J. 5, 547-552.

16. Zemmour, J., Ennis, P. D., Parham, P. & Dupont, B. (1988) Immunogenetics 27, 281-287.

17. Ways, J. P., Lawlor, D. A., Wan, A. M. & Parham, P. (1987) Immunogenetics 25, 323-328.

18. Sodoyer, R., Damotte, M., Delovitch, T. L., Trucy, J., Jordan, B. R. & Strachan, T. (1984) EMBO J. 3, 879-885.

19. Orr, H. T., Lopez de Castro, J. A., Lancet, D. & Strominger, J. L. (1979) Biochemistry 18, 5711-5720.

20. Domenech, N., Ezquerra, A., Castano, R. & Lopez de Castro, J. A. (1988) Immunogenetics 27, 196-202.

21. Rojo, S., Aparicio, P., Choo, S. Y., Hansen, J. A. & Lopez de Castro, J. A. (1987) J. Immunol. 139, 831-836.

22. Taketani, S., Krangel, M. S., Spits, H., deVries, J. & Strominger, J. L. (1984) J. Immunol. 133, 816-821.

23. Wu, T. T. & Kabat, E. A. (1970) J. Exp. Med. 132, 211-250.

24. Davis, M. M. (1985) Annu. Rev. Immunol. 3, 537-560.

25. Benoist, C. O., Mathis, D. J., Kanter, M. R., Williams, V. E. & McDevitt, H. O. (1983) Cell 34, 169-177.

26. Koller, B. H., Sidwell, B., DeMars, R. & Orr, H. T. (1984) Proc. Natl. Acad. Sci. USA 81, 5175-5178.

27. Nathenson, S. G., Geliebter, J., Pfaffenbach, G. M. & Zeff, R. A. (1986) Annu. Rev. Immunol. 4, 471-502.

28. Flavell, R. A., Allen, H., Burkly, L. C., Sherman, D. H., Waneck, G. L. & Widera, G. (1986) Science 233, 437-443.

29. Klein, J. (1984) Transplantation 38, 327-329.

30. Pease, L. R. (1985) Transplantation 39, 227-231.

31. Klein, J. (1978) Adv. Immunol. 26, 55-146.
